In recent years, the explosion of available data and the growing complexity of prediction problems have increased the need for large amounts of manually labelled data, posing a challenge to supervised machine learning. For this reason, weak supervision, which trains on noisy or inaccurately labelled data, is an attractive alternative.
We present the broader area of weak supervision, focusing on the Snorkel framework. We construct several predictive models as weak classifiers and use them as labelling functions for Snorkel's generative label model. We then compare the accuracy of final models trained with the true labels against those trained with Snorkel's probabilistic labels, and show that the models trained with Snorkel labels perform comparably to, or even better than, the models trained with the true labels.
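The pipeline above, in which weak classifiers emit noisy votes that are combined into probabilistic training labels, can be sketched in miniature. The snippet below uses a simple majority vote over non-abstaining labelling functions as a stand-in for Snorkel's generative label model; the threshold classifiers and data points are invented for illustration, and only the abstain convention (a reserved value for "no vote") mirrors Snorkel's.

```python
ABSTAIN = -1  # reserved vote meaning "this labelling function abstains"

# Three hypothetical weak classifiers acting as labelling functions
# over a single feature x in [0, 1].
def lf_high(x):
    return 1 if x > 0.7 else ABSTAIN

def lf_low(x):
    return 0 if x < 0.3 else ABSTAIN

def lf_mid(x):
    return 1 if x > 0.5 else 0

def probabilistic_label(x, lfs):
    """Combine labelling-function votes into P(y=1) by majority vote,
    ignoring abstentions (a simplification of the generative model)."""
    votes = [v for v in (lf(x) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return 0.5  # no labelling function fired: uninformative label
    return sum(votes) / len(votes)

lfs = [lf_high, lf_low, lf_mid]
print(probabilistic_label(0.9, lfs))  # both firing LFs vote 1 -> 1.0
print(probabilistic_label(0.1, lfs))  # both firing LFs vote 0 -> 0.0
```

The resulting probabilistic labels can then train a discriminative end model in place of the true labels; Snorkel replaces the majority vote with a generative model that learns the accuracies and correlations of the labelling functions.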