Distant Supervision

Created November 22, 2021 · Updated March 4, 2026

Distant supervision is a learning scheme in which a classifier is trained on a weakly labeled training set: the training data is labeled automatically by heuristics or rules rather than by hand, so some of the labels are wrong.

A distant supervision algorithm usually has the following ingredients:

It may have some labeled training data.
It has access to a pool of unlabeled data.
It has an operator that lets it sample examples from this unlabeled data and label them automatically. The labels this operator produces are noisy.
It uses both the originally labeled training data and the newly, noisily labeled data to train the final classifier.
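
The ingredients above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the toy sentences, the keyword rules inside `noisy_label`, and the word-count classifier in `train` and `predict` are all hypothetical and chosen only to keep the example self-contained.

```python
import random
from collections import Counter

# Hypothetical toy data. Label 1 = positive, 0 = negative.
labeled = [("great movie", 1), ("terrible plot", 0)]

unlabeled_pool = [
    "great acting and a good story",
    "terrible pacing and a bad script",
    "good soundtrack",
    "bad ending",
    "the cinema was crowded",
    "not good at all",  # the keyword rule below mislabels this one (noise)
]

def noisy_label(text):
    """Heuristic labeling rule (an assumption for this sketch).

    Returns 1 or 0 when a keyword matches, or None when no rule fires.
    """
    words = set(text.split())
    if words & {"great", "good"}:
        return 1
    if words & {"terrible", "bad"}:
        return 0
    return None

def sample_and_label(pool, k, rng):
    """The noisy operator: sample k unlabeled texts and label them by rule."""
    pairs = [(text, noisy_label(text)) for text in rng.sample(pool, k)]
    # Keep only the texts for which some rule produced a label.
    return [(text, label) for text, label in pairs if label is not None]

def train(examples):
    """Toy classifier: count how often each word appears under each label."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training words overlap most with the text."""
    scores = {
        label: sum(word_counts[w] for w in text.split())
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)

rng = random.Random(0)
noisy = sample_and_label(unlabeled_pool, k=len(unlabeled_pool), rng=rng)

# Train on the hand-labeled data together with the noisy, rule-labeled data.
model = train(labeled + noisy)
```

Calling `predict(model, "great story")` then classifies a new sentence with the combined model. Note that `"not good at all"` enters the training set with a positive label: this is the kind of label noise the final classifier has to tolerate.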

References

https://stats.stackexchange.com/questions/46685/distant-supervision-supervised-semi-supervised-or-both