Sequence labeling with multiple annotators

Authors

Filipe Rodrigues
Francisco Câmara Pereira
Bernardete Ribeiro

Abstract

The growing use of crowdsourcing to obtain labeled data has made the machine learning community widely aware of the problem of supervised learning from multiple annotators. Several approaches have been proposed to deal with this problem, but they disregard sequence labeling, even though such problems are very common, for example in Natural Language Processing and Bioinformatics. In this paper, we present a probabilistic approach for sequence labeling using Conditional Random Fields (CRF) for situations where label sequences from multiple annotators are available but there is no actual ground truth. The approach uses the Expectation-Maximization algorithm to jointly learn the CRF model parameters, the reliability of the annotators, and the estimated ground truth. The proposed method (CRF-MA) significantly outperforms typical approaches such as majority voting.
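For orientation, the majority-voting baseline named in the abstract can be sketched at the token level as follows. This is only an illustration of the baseline, not the CRF-MA method, which is not reproduced here. The function names `majority_vote` and `agreement_with_vote`, the tie-breaking rule, and the example labels are assumptions made for this sketch and do not come from the paper.

```python
from collections import Counter


def majority_vote(annotations):
    """Combine label sequences from several annotators, token by token.

    `annotations` holds one label sequence per annotator, all of the same
    length. Ties go to the label encountered first among the annotators
    (an assumption of this sketch).
    """
    if not annotations:
        return []
    length = len(annotations[0])
    if any(len(seq) != length for seq in annotations):
        raise ValueError("all annotators must label the same tokens")
    voted = []
    for token_labels in zip(*annotations):
        # most_common(1) returns [(label, count)] for the top label
        voted.append(Counter(token_labels).most_common(1)[0][0])
    return voted


def agreement_with_vote(annotations, voted):
    """Fraction of tokens on which each annotator matches the voted sequence.

    A crude stand-in for annotator reliability, not the paper's estimate.
    """
    if not voted:
        return [0.0 for _ in annotations]
    return [
        sum(a == v for a, v in zip(seq, voted)) / len(voted)
        for seq in annotations
    ]


# Hypothetical example: three annotators tagging a four-token sentence.
annotations = [
    ["B-PER", "I-PER", "O", "O"],
    ["B-PER", "O", "O", "O"],
    ["B-PER", "I-PER", "O", "B-LOC"],
]
voted = majority_vote(annotations)
scores = agreement_with_vote(annotations, voted)
```

Voting per token treats every annotator as equally reliable and ignores dependencies between neighbouring labels, which is the limitation the abstract's joint estimation of annotator reliability and ground truth is meant to address.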

Keywords

Multiple annotators, Crowdsourcing, Conditional random fields, Latent variable models, Expectation maximization

Subject

Machine learning

Related Project

Crowds - Understanding urban land use from digital footprints of crowds

Journal

Machine Learning, Springer, Hal Daume, III, October 2013

Cited by

Year 2016: 1 citation

C Long, G Hua, A Kapoor, A joint Gaussian process model for active visual recognition with expertise estimation in crowdsourcing, International Journal of Computer Vision, 2016

Year 2015: 1 citation

RG Brace, Physician Participation in Crowdsourcing: Effect of Intrinsic and Extrinsic Motivation, Publication/NA, 2015

Year 2014: 2 citations

D Hovy, B Plank, A Søgaard, Experiments with crowdsourced re-annotation of a POS tagging data set, ACL (2), 2014

H Fromreide, A Søgaard, NER in Tweets Using Bagging and a Small Crowdsourced Dataset, Advances in Natural Language Processing, 2014