From federated to aggregated search

Federated search refers to the brokered retrieval of content from a set of auxiliary retrieval systems instead of from a single, centralized retrieval system. Federated search tasks occur in, for example, digital libraries (where documents from several retrieval systems must be seamlessly merged) or peer-to-peer information retrieval (where documents distributed across a network of local indexes must be retrieved).

In the context of web search, aggregated search refers to the integration of non-web content (e.g. images, videos, news articles, maps, tweets) into a web search result page. This is in contrast with classic web search where users are presented with a ranked list consisting exclusively of general web documents. As in other federated search situations, the non-web content is often retrieved from auxiliary retrieval systems (e.g. image or video databases, news indexes).

Although aggregated search can be seen as an instance of federated search, several aspects make aggregated search a unique and compelling research topic. These include large sources of evidence (e.g. click logs) for deciding what non-web items to return, constrained interfaces (e.g. mobile screens), and a very heterogeneous set of available auxiliary resources (e.g. images, videos, maps, news articles). Each of these aspects introduces problems and opportunities not addressed in the federated search literature.

Aggregated search is an important future research direction for information retrieval. All major search engines now provide aggregated search results. As the number of available auxiliary resources grows, deciding how to efficiently surface content from each will become increasingly important.

The goal of this tutorial is to provide an overview of federated search and aggregated search techniques for information retrieval researchers at an intermediate level. The content will also be valuable for practitioners in industry. We will take the audience through the state of the art and the most influential work in these areas. We will also list some of the new challenges that arise in aggregated search and discuss directions for future work.

Outline:

1. Introduction
2. History and Terminology
3. Architecture
4. Resource Representation
5. Resource Selection
6. Result Presentation
7. Evaluation
8. Conclusions and Open Problems
9. Live Demo

Bio

Fernando Diaz is a research scientist at Yahoo! Labs. His primary research interest concerns formal models of information retrieval. His research experience includes distributed information retrieval approaches to web search, interactive and faceted retrieval, mining of temporal patterns from news and query logs, cross-lingual information retrieval, graph-based retrieval methods, and synthesizing information from multiple corpora. He received his PhD from the University of Massachusetts Amherst in 2008.

Mounia Lalmas joined Yahoo! Research in January 2011 as a visiting principal scientist and now works on models and measures of user engagement. Prior to this, she was a Microsoft Research/RAEng Research Professor at the University of Glasgow. From 2002 until 2007, she co-led INEX, a large-scale project with over 80 participating organisations worldwide, responsible for defining the nature of XML retrieval and how it should be evaluated. She also works on result presentation and evaluation for aggregated search, and on technologies for bridging the digital divide. She has written a short overview, “Aggregated Search”, to appear in Advanced Topics in Information Retrieval, a book edited by Melucci and Baeza-Yates.

Milad Shokouhi is an applied researcher working for Bing at Microsoft Research Cambridge. Before joining Microsoft in 2007, he completed his PhD on federated search at the Royal Melbourne Institute of Technology (RMIT) University under the supervision of Justin Zobel. His research interests are federated search, query alteration, user studies, and web search evaluation. Together with Luo Si, he has written a comprehensive survey entitled “Federated Search”, which appeared in Foundations and Trends in Information Retrieval.