When: February 1, 2017, 12:30 PM
Location: 3rd Floor Orchard View Room , Discovery Building
Contact: 608-316-4401, firstname.lastname@example.org
Towards a better understanding of best arm identification in bounded multi-armed bandits
We present ongoing work regarding best arm identification in multi-armed bandit problems, when the reward distributions are bounded. Although this is a standard assumption in this context, state of the art methods (such as the lil-UCB algorithm) use sub-Gaussian concentration bounds for the mean rewards.
However, one can take advantage of the boundedness of the rewards to derive stronger concentration inequalities for the empirical means that involve the Kullback-Leibler divergence of certain Bernoulli distributions. Although such methods exist in the literature, they exhibit sub-optimal performance in other parameters of the problem (such as the number of arms). Our goal is to combine ideas of the lil-UCB algorithm and the KL concentration bounds to obtain methods with improved performance.
This is joint work with Robert Nowak and Kevin Jamieson.
SILO is a lecture series with speakers from the UW faculty, graduate students or invited researchers that discuss mathematical related topics. The seminars are organized by WID’s Optimization research group and sponsored by generous support of the Advance Technology Group of the 3M Company and the Analytics Group of Northwestern Mutual.
SILO’s purpose is to provide a forum that helps connect and recruit mathematically-minded graduate students. SILO is a lunch-and-listen format, where speakers present interesting math topics while the audience eats lunch.