Practical Online Retrieval Evaluation

Online evaluation is amongst the few evaluation techniques available to the information retrieval community that is guaranteed to reflect how users actually respond to improvements developed by the community. Broadly speaking, online evaluation refers to any evaluation of retrieval quality conducted while observing user behavior in a natural context. However, it is rarely employed outside of large commercial search engines due primarily to a perception that it is impractical at small scales. The goal of this tutorial is to familiarize information retrieval researchers with state-of-the-art techniques in evaluating information retrieval systems based on natural user clicking behavior, as well as to show how such methods can be practically deployed. In particular, our focus will be on demonstrating how the Interleaving approach and other click based techniques contrast with traditional offline evaluation, and how these online methods can be effectively used in academic-scale research. In addition to lecture notes, we will also provide sample software and code walk-throughs to showcase the ease with which Interleaving and other click-based methods can be employed by students, academics and other researchers.

Topics include:

  • Overview of online evaluation
  • Collecting usage data: How to be their search engine (with code walk-through)
  • The Interleaving approach for click-based evaluation
  • Practical issues in deploying Interleaving experiments (with code walk-through)
  • Analyzing and interpreting Interleaving results
  • Quantitative comparison of Interleaving with other evaluation methods (both online and offline)
  • Tricky issues, extensions, and limitations
  • From evaluation to optimization: Deriving reliable training data from user feedback

Filip Radlinski is a researcher at Microsoft and a contributor to Bing, where he works on machine learning approaches to information retrieval. He completed his dissertation on learning to rank from implicit feedback at Cornell University in 2008. His recent research focuses on learning personalized rankings, measuring ambiguity in queries and user intents, and studying how to assess the quality of ranked lists of documents from the user perspective by using click information.

Yisong Yue is a postdoctoral researcher in the Machine Learning Department and the Heinz College at Carnegie Mellon University, where he works on machine learning approaches to information retrieval. He completed his dissertation on structured prediction and interactive approaches to learning to rank at Cornell University in 2010. His recent research focuses on machine learning approaches to leveraging user feedback, diversified retrieval, and interactive information retrieval.