CAREER: Achieving Quality Crowdsourcing Across Tasks, Data Scales, and Operational Settings

Matthew Lease (PI)
Information Retrieval and Crowdsourcing Lab
University of Texas at Austin

UT Austin Press Release (May 22, 2013)

SUMMARY. While nascent crowdsourcing methods are transforming the practice of data collection in research and industry, ensuring the quality of the collected data remains difficult in practice and exposes projects to significant risk. This reduces the benefits of crowdsourcing for both current adopters and a wider community of potential beneficiaries. Although diverse communities have proposed statistical algorithms for quality assurance, the splintered nature of these communities has led to relatively little comparative benchmarking or integration of alternative techniques. A dearth of reference implementations and shared datasets has further impeded progress, as have evaluations based on tightly-coupled systems, domain-specific tasks, and excessive simulation. A near-exclusive focus on a single crowdsourcing platform, Amazon's Mechanical Turk (MTurk), has particularly shaped prior research and findings. To summarize: 1) the state of the art for crowdsourced quality assurance remains uncertain, particularly across diverse tasks, data scales, workforces, and operational settings; 2) progress is difficult to measure; and 3) the much-lauded savings of crowdsourcing often remain elusive in practice. This CAREER project will investigate, integrate, and rigorously benchmark diverse quality assurance algorithms across a range of tasks, data scales, labor sources, and operational settings. Open source reference implementations and new public test collections will facilitate reproducible findings, benchmarking, re-use, and continuing advancement.

Acknowledgement: This material is based upon work supported by the National Science Foundation under Grant No. 1253413. Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Project Duration: 3/1/13-2/28/19

Software & Data

Publications

An Thanh Nguyen, Matthew Lease, and Byron C. Wallace. Explainable Modeling of Annotations in Crowdsourcing. In Proceedings of the 24th Annual ACM Intelligent User Interfaces (IUI) conference, pages 575-579, 2019.
Tanya Goyal, Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, and Matthew Lease. Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations. In 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), pages 41-49, 2018. The online version includes corrections to the official version from the proceedings.
An Thanh Nguyen, Aditya Kharosekar, Matthew Lease, and Byron C. Wallace. An Interpretable Joint Graphical Model for Fact-Checking from Crowds. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.
Matthew Lease and Omar Alonso. Crowdsourcing and Human Computation, Introduction. Encyclopedia of Social Network Analysis and Mining, pages 1-12, 2017.
Akash Mankar, Riddhi J. Shah, and Matthew Lease. Design Activism for Minimum Wage Crowd Work. In 5th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, 2017.
An Thanh Nguyen, Junyi Jessy Li, Ani Nenkova, Byron C. Wallace, and Matthew Lease. Aggregating and Predicting Sequence Labels from Crowd Annotations. In Proceedings of the 55th annual meeting of the Association for Computational Linguistics (ACL), pages 299-309, 2017.
Brandon Dang, Miles Hutson, and Matthew Lease. MmmTurkey: A Crowdsourcing Framework for Deploying Tasks and Recording Worker Behavior on Amazon Mechanical Turk. In 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, 2016. 3 pages. arXiv:1609.00945.
An Thanh Nguyen, Matthew Halpern, Byron C. Wallace, and Matthew Lease. Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), pages 149-158, 2016.
An Thanh Nguyen, Byron C. Wallace, and Matthew Lease. A Correlated Worker Model for Grouped, Imbalanced and Multitask Data. In Proceedings of the 32nd International Conference on Uncertainty in Artificial Intelligence (UAI), 2016.
An Thanh Nguyen, Byron C. Wallace, and Matthew Lease. Combining Crowd and Expert Labels using Decision Theoretic Active Learning. In Proceedings of the 3rd AAAI Conference on Human Computation (HCOMP), pages 120-129, 2015.
Hyun Joon Jung and Matthew Lease. Modeling Temporal Crowd Work Quality with Limited Supervision. In Proceedings of the 3rd AAAI Conference on Human Computation (HCOMP), pages 83-91, 2015.
Hyun Joon Jung and Matthew Lease. Forecasting Crowd Work Quality via Multi-dimensional Features of Workers. In ICML Workshop on Crowdsourcing and Machine Learning (CrowdML), 2015.
Hyun Joon Jung and Matthew Lease. A Discriminative Approach to Predicting Assessor Accuracy. In Proceedings of the European Conference on Information Retrieval (ECIR), 2015. Received Samsung Human-Tech Paper Award: Silver Prize in Computer Science.
Hyun Joon Jung, Yubin Park, and Matthew Lease. Predicting Next Label Quality: A Time-Series Model of Crowdwork. In Proceedings of the 2nd AAAI Conference on Human Computation (HCOMP), pages 87-95, 2014.
Hyun Joon Jung. Quality Assurance in Crowdsourcing via Matrix Factorization based Task Routing. In Proceedings of World Wide Web (WWW) Ph.D. Symposium, Companion Publication, pages 3-8, 2014.
Hyun Joon Jung and Matthew Lease. Crowdsourced Task Routing via Matrix Factorization. Technical report, University of Texas at Austin, October 2013. arXiv:1310.5142.
Donna Vakharia and Matthew Lease. Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms. In Proceedings of the iConference, 2015.
Ethan Petuchowski and Matthew Lease. TurKPF: TurKontrol as a Particle Filter. Technical report, University of Texas at Austin, April 2014. arXiv:1404.5078.
Tatiana Josephy, Matt Lease, Praveen Paritosh, Markus Krause, Mihai Georgescu, Michael Tjalve, and Daniela Braga. Workshops Held at the First AAAI Conference on Human Computation and Crowdsourcing: A Report. AI Magazine, 35(2):75-78, 2014.
Matthew Lease, Praveen Paritosh, and Tatiana Josephy, editors. Proceedings of the AAAI Human Computation Workshop on Crowdsourcing at Scale (CrowdScale). Palm Springs, CA, November 2013.
Aashish Sheshadri. A Collaborative Approach to IR Evaluation. Master's thesis, Department of Computer Science, University of Texas at Austin, May 2014. Co-Supervisors: Kristen Grauman and Matthew Lease.
Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), pages 156-164, 2013.
Aashish Sheshadri and Matthew Lease. SQUARE: Benchmarking Crowd Consensus at MediaEval. In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013. CEUR Workshop (ceur-ws.org) Proceedings Vol-1043, ISSN 1613-0073.