
Self-accelerated Thompson sampling with near-optimal regret upper bound

Volume: 399, Pages: 37 - 47
Published: Jul 1, 2020
Abstract
Thompson sampling uses a Bayesian heuristic strategy to balance the exploration-exploitation trade-off. It has been applied in a variety of practical domains with great success. Despite being empirically efficient and powerful, Thompson sampling has eluded theoretical analysis. Existing analyses of Thompson sampling provide only a regret upper bound of $\tilde{O}(d^{3/2}\sqrt{T})$ for linear contextual bandits, which is worse than the information-theoretic...