Beyond Search: Statistical Topic Models for Text Analysis

Session Name:

Keynote speech [July 26, 2011 Tuesday 08:45-10:00]

ChengXiang Zhai

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL, USA

PDF version

Abstract:

Search is generally a means to the end of finishing a task. While the current search engines are useful to users for finding relevant information, they offer little help to users for further digesting and analyzing the overwhelming found information needed for finishing a complex task. In this talk, I will discuss how statistical topic models can be used to help users analyze and digest the found relevant information and turn search results into actionable knowledge needed to complete a task. I will present several general statistical topic models for extracting and analyzing topics and their patterns in text, and show sample applications of such models in tasks such as opinion integration, comparative summarization, contextual topic trend analysis, and event impact analysis. The talk will conclude with a discussion of novel challenges raised in extending a search engine to an analysis engine that can go beyond search to provide more complete support for users to finish their tasks.

About the speaker

ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he also holds a joint appointment at the Graduate School of Library and Information Science, Institute for Genomic Biology, and Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and bioinformatics. He is an Associate Editor of ACM Transactions on Information Systems, and Information Processing and Management, and serves on the editorial board of Information Retrieval Journal. He is a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist and the recipient of an Alfred P. Sloan Research Fellowship, IBM Faculty Award, ACM SIGIR 2004 Best Paper Award, and a Presidential Early Career Award for Scientists and Engineers (PECASE).