When: March 25, 2015, 12:30 PM
Contact: 608-316-4401, firstname.lastname@example.org
Large-scale Problems on a Tight Budget
In this talk, I will discuss how we scale up a few classic problems, by understanding the fundamental bottlenecks arising as the problem size increases. These bottlenecks influence our access model and platform choices and inspire new algorithms and systems features. For all problems we provide some sort of guarantee that is much better than worst-case. For the first part of this talk, I will present our work on streaming, memory-limited Principal Component Analysis (PCA). Therein, we give the first finite-sample, global-convergence guarantees for an algorithm that solves PCA in the single-pass streaming setting. The algorithm is shown to exhibit optimal sample complexity, within logarithmic factors, under the spiked covariance model. It uses optimal memory, when the true components are dense. Then, motivated by storage and network capacity limitations, I will propose a method for fast PageRank approximations on graph engines. Our system uses random walks as a quantized alternative to the power iteration, eliminates the need of global jumps for PageRank, and introduces randomized synchronization. This last contribution is a novel extension to the GraphLab engine that could be useful to other workloads as well, as it reduces network usage. We achieve a very good approximation and a serious speedup (7-10x) over the vanilla system. Our theory bounds the resulting approximation error. Finally, I will discuss a new method for finding dense subgraphs in very large graphs; a significant improvement over the previous state of the art. The problem of the densest k-subgraph is NP-hard and hard to approximate. Our algorithm furnishes a k-subgraph, along with a graph-dependent guarantee that tells us that, in most real graphs we test, we achieve at least 70% of the optimal density. It scales almost linearly in the size of the graph and parallelizes well. Our MapReduce implementation works on billion-edge graphs.
SILO is a lecture series with speakers from the UW faculty, graduate students or invited researchers that discuss mathematical related topics. The seminars are organized by WID’s Optimization research group.
SILO’s purpose is to provide a forum that helps connect and recruit mathematically-minded graduate students. SILO is a lunch-and-listen format, where speakers present interesting math topics while the audience eats lunch.
Speaker: Ioannis Mitlliagkas, University of Texas–Austin