SQUARE (Statistical QUality Assurance Robustness Evaluation) is a benchmark for comparative evaluation of consensus methods for human computation / crowdsourcing (i.e., how to generate the best possible answer for each question, given multiple judgments per question). Like any benchmark, SQUARE's goals are to assess the relative benefit of new methods, understand where further research is needed, and measure field progress over time. SQUARE includes benchmark datasets, defined tasks, evaluation metrics, and reference implementations with empirical results for several popular methods.
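To make the consensus problem concrete, a minimal majority-vote baseline (one of the simplest consensus methods a benchmark like SQUARE evaluates; this sketch is illustrative and not SQUARE's own API) can be written as:

```python
from collections import Counter

def majority_vote(judgments):
    """Return the most frequent label among the worker judgments for one question.

    judgments: a list of labels, one per worker who judged the question.
    Ties are broken arbitrarily by insertion order.
    """
    return Counter(judgments).most_common(1)[0][0]

def consensus(judgments_by_question):
    """Aggregate a dict {question_id: [labels]} into {question_id: consensus_label}."""
    return {q: majority_vote(labels) for q, labels in judgments_by_question.items()}

# Example: two questions, each judged by several workers
answers = consensus({
    "q1": ["relevant", "relevant", "not_relevant"],
    "q2": ["not_relevant", "not_relevant", "relevant", "not_relevant"],
})
```

More sophisticated consensus methods weight workers by estimated reliability rather than counting all judgments equally; comparing such methods against simple baselines like this one is exactly what the benchmark supports.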
PAPER: Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), 2013. [ bib ]
- See also: Aashish Sheshadri and Matthew Lease. SQUARE: Benchmarking Crowd Consensus at MediaEval. In Proceedings of MediaEval: Crowdsourcing in Multimedia Task, 2013. [ bib ]
CODE: SQUARE is released as an open-source library, and we welcome community participation and contributions. Download the code.
- October 26, 2015: Version 2.0 of SQUARE is now released! (It lives in a new Git repo to avoid impacting those using Version 1.0.) See the 2.0 page for details of what's new!