When: April 19, 2017, 12:30 PM
Location: 3rd Floor Orchard View Room , Discovery Building
Contact: 608-316-4401, email@example.com
Heavy hitters via cluster-preserving clustering
In the “heavy hitters” or “frequent items” problem, one must process a stream of items and report those items that occur frequently. For example, a telecommunications company may wish to find popular destination IP addresses in a packet stream across one of their links, or a search engine may wish to report popular query words. A more general problem is when there are two streams and we must report those items whose frequencies significantly deviate between them; the former problem is a special case since we can artificially pretend the first of the two streams the empty stream. Such a problem naturally arises in trend detection and anomaly detection. Several algorithms were known in the literature solving these problems, such as the CountMin sketch, CountSketch, Hierarchical CountSketch and others. The goal in designing such algorithms is (1) to guarantee finding frequent items for as lax a definition of “frequent” as possible while still limiting output size, and while having low (2) memory consumption, (3) processing time per stream item, (4) query time to report the list of frequent times, and (5) failure probability. Previous solutions could perform well on various subsets of these metrics, but not on all 5 simultaneously.
We design a new algorithm, ExpanderSketch, which performs well on all 5. Our main innovation is a novel reduction to a new graph-clustering problem we formulate, in which finding most of every cluster guarantees finding all the frequent items. We then solve this problem by devising a novel spectral clustering algorithm potentially of independent interest, based on divide-and-conquer and local search.
Jelani Nelson is Assistant Professor of Computer Science at Harvard University. His main research interest is in algorithm design and analysis, with recent focus on streaming algorithms, dimensionality reduction, compressed sensing, and randomized linear algebra algorithms. He completed his Ph.D. in computer science at MIT in 2011, receiving the George M. Sprowls Award for best computer science doctoral dissertations at MIT. He is the recipient of an NSF CAREER Award, ONR Young Investigator Award, ONR Director of Research Early Career Award, Alfred P. Sloan Research Fellowship, and Presidential Early Career Award for Scientists and Engineers (PECASE).
SILO is a lecture series with speakers from the UW faculty, graduate students or invited researchers that discuss mathematical related topics. The seminars are organized by WID’s Optimization research group and sponsored by generous support of the Advance Technology Group of the 3M Company and the Analytics Group of Northwestern Mutual.
SILO’s purpose is to provide a forum that helps connect and recruit mathematically-minded graduate students. SILO is a lunch-and-listen format, where speakers present interesting math topics while the audience eats lunch.