Logistic Regression Reloaded: Online Algorithms

Logistic regression is a highly effective technique for supervised learning of classifiers. It is widely used in information retrieval (particularly for text categorization), data mining, computational advertising, and a range of other fields. In recent years, online algorithms have become the dominant approach to logistic regression for large-scale problems. Advantages of online algorithms include simple implementation (most can be written down on a single PowerPoint slide), anytime behavior (a classifier is available at any point during training), small memory footprint, and high effectiveness in time-constrained massive data settings.

This tutorial discusses when logistic regression is a good approach for a learning problem, how to choose between batch and online approaches, the basics of implementing online logistic regression, tips for practical applications (particularly to textual data), experimental results, available software, and a brief look at recent theoretical analyses.
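
To make "the basics of implementing online logistic regression" concrete, here is a minimal sketch of one common online method, stochastic gradient descent on the log loss, using sparse feature dictionaries as one might for text. It is an illustration under stated assumptions and not the algorithm or API of any package named in this abstract: the class name, the learning_rate and l2 parameters, and the toy features are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Hypothetical sketch: SGD on log loss with optional L2 shrinkage."""

    def __init__(self, learning_rate=0.5, l2=0.0):
        self.learning_rate = learning_rate  # assumed fixed step size
        self.l2 = l2                        # assumed L2 penalty strength
        self.weights = {}                   # sparse: feature name -> weight
        self.bias = 0.0

    def predict_proba(self, features):
        """Return P(label = 1) for a sparse {feature: value} dict."""
        z = self.bias
        for name, value in features.items():
            z += self.weights.get(name, 0.0) * value
        return sigmoid(z)

    def update(self, features, label):
        """Process one example (label is 0 or 1) and adjust the weights."""
        error = self.predict_proba(features) - label  # d(log loss)/dz
        for name, value in features.items():
            w = self.weights.get(name, 0.0)
            # Only features present in this example are touched.
            self.weights[name] = w - self.learning_rate * (error * value + self.l2 * w)
        self.bias -= self.learning_rate * error


# Toy stream of (features, label) pairs; the data are made up for illustration.
stream = [
    ({"cheap": 1.0, "pills": 1.0}, 1),
    ({"meeting": 1.0, "agenda": 1.0}, 0),
    ({"cheap": 1.0, "offer": 1.0}, 1),
    ({"project": 1.0, "meeting": 1.0}, 0),
]

model = OnlineLogisticRegression()
for _ in range(50):                 # several passes over the toy stream
    for features, label in stream:
        model.update(features, label)
        # Anytime behavior: model.predict_proba(...) is usable here.
```

The sketch reflects the advantages listed above: the update rule is a few lines, the model can be queried after any example, and memory grows only with the number of distinct features seen. Updating only the features present in each example is a simplification; it means the L2 shrinkage is applied lazily rather than to every weight at every step.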


Dave Lewis is a Chicago-based consulting computer scientist working in the areas of information retrieval, text mining, machine learning, e-discovery, and applied statistics. He has more than 75 published papers and eight patents, lectures widely, and is a Fellow of the American Association for the Advancement of Science.

Dr. Lewis has applied logistic regression in a variety of consulting engagements and has published several research papers on logistic regression algorithms and applications. He is co-designer and project manager for the widely used open-source C++ Bayesian logistic regression packages BBR, BMR, and BXR (http://www.bayesianregression.org), and for the new Java online logistic regression package, BOXER.