Self-accelerated Thompson sampling with near-optimal regret upper bound

Thompson sampling uses a Bayesian heuristic strategy to balance the exploration-exploitation trade-off. It has been applied in a variety of practical domains with great success. Despite being empirically efficient and powerful, Thompson sampling has eluded theoretical analysis. Existing analyses of Thompson sampling only provide a regret upper bound of $\tilde{O}(d^{3/2}\sqrt{T})$ for linear contextual bandits, which is worse than the information-theoretic...

Paper Details

DOI

doi.org/10.1016/j.neucom.2020.01.086

Published Date

Jul 1, 2020

Journal

Neurocomputing

Volume

399

Pages

37 - 47
