Crowdsourcing for Information Retrieval: Principles, Methods, and Applications

Crowdsourcing has emerged in recent years as an exciting new avenue for leveraging the tremendous potential and resources of today’s digitally-connected, diverse, distributed workforce. Generally speaking, crowdsourcing describes outsourcing tasks to a large group of people instead of assigning them to an in-house employee or contractor. Crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower have gained particular attention as active online marketplaces for reaching and tapping into this still largely under-utilized workforce. Crowdsourcing offers intriguing new opportunities for accomplishing different kinds of tasks or achieving broader participation than previously possible, as well as completing standard tasks more accurately, in less time, and at lower cost. Unlocking the potential of crowdsourcing in practice, however, requires an understanding of three things: principles, platforms, and best practices. This tutorial will introduce the opportunities and challenges of crowdsourcing while discussing each of these three areas. This will provide attendees with a basic foundation to begin applying crowdsourcing in the context of their own particular tasks.

The tutorial is designed for those with little to intermediate familiarity with crowdsourcing who want to learn about the capabilities and limitations of crowdsourcing techniques for information retrieval (IR). Recommended background includes basic familiarity with IR, IR evaluation, and experimental design. The tutorial will provide academic and industrial participants alike with a solid introduction to:

  • State of the art in crowdsourcing/human computation research
  • When to use crowdsourcing for an experiment
  • How to use Mechanical Turk via the user interface and API
  • Overview of other tools
  • Design guidelines for maximizing results
  • Crowdsourcing methods to evaluate search and blend automation with human computation


  • Introduce crowdsourcing, wisdom of crowds, and human computation
  • Survey recent “killer apps” exemplifying successful application of crowdsourcing principles
  • Provide practical “how to” guidance for effectively using Amazon’s Mechanical Turk
  • Summarize recent surveys of crowd demographics to better know who’s doing the work
  • Discuss a variety of incentive structures available to encourage quality and quantity of work
  • Emphasize the importance of design for successful execution of experiments
  • Describe use of human-centric practices with statistical methods to achieve quality assurance
  • Summarize key best practices for achieving efficient, inexpensive, and accurate work
  • Review current opportunities, untapped potential, and open challenges for IR crowdsourcing

Course Materials

Attendees will be supplied with a full set of tutorial notes and supporting bibliography.


Matthew Lease is an Assistant Professor in the School of Information at the University of Texas at Austin. He is co-organizing the 2nd SIGIR Workshop on Crowdsourcing for Information Retrieval at SIGIR this year as well as the TREC 2011 Crowdsourcing Track. He previously was an organizer of the WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining and the 1st SIGIR Workshop on Crowdsourcing in 2010. He served on the Program Committee at CrowdConf 2010 and is co-editing a special issue of Springer’s Information Retrieval journal on Crowdsourcing in 2011. He has published various papers in the area and taught a graduate-level course on crowdsourcing at UT Austin in Spring 2011.

Omar Alonso is a Technical Lead on the Bing team at Microsoft. He has been working on crowdsourcing for the last few years, both in industry and as a researcher, applying this technique to a diverse set of applications. He has published a number of articles on human computation/crowdsourcing and participated in many workshops and meetups. His interests are information retrieval, temporal retrieval, human computation/crowdsourcing, evaluation, and information visualization. He holds a PhD in CS from UC Davis.