In an online decision-making system, historical data is used to determine the current decision. At the same time, the outcomes of those decisions are collected and fed back into the system for future use. An efficient online decision-making algorithm should therefore not only optimize against past information but also aim to generate useful data. The key challenge lies in balancing exploitation of what is already known, to maximize the immediate outcome, against exploration of new information that may improve future performance. Thompson sampling is a systematic method for managing this exploration-exploitation tradeoff. It has shown strong empirical performance in certain domains and achieves provably optimal performance in some decision-making problems. In this talk, I will describe how Thompson sampling is used in various applications, and I will discuss its limitations and potential ways to improve it.
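For readers unfamiliar with the method, the following is a minimal sketch of Thompson sampling in its simplest textbook setting, a Bernoulli multi-armed bandit with a uniform Beta(1, 1) prior on each arm. It is an illustration only and is not taken from the talk: the function name `thompson_sampling`, the parameter `true_rates`, and the simulated rewards are all hypothetical. In each round the algorithm draws one plausible success rate per arm from its current posterior, plays the arm with the largest draw, and updates that arm's posterior with the observed reward. Arms with uncertain posteriors occasionally produce large draws, which is where the exploration comes from.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_rates: hypothetical success probability of each arm (unknown to
                the algorithm; used only to simulate rewards).
    Returns (pulls, successes), one entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(rounds):
        # Draw one sample per arm from its Beta posterior.
        # The +1 terms encode the uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate the outcome of the decision and feed it back.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, successes


if __name__ == "__main__":
    pulls, successes = thompson_sampling([0.1, 0.9], rounds=2000)
    print("pulls per arm:", pulls)
    print("successes per arm:", successes)
```

In a run like the one above, most pulls should concentrate on the better arm as its posterior sharpens, while the weaker arm is still tried now and then early on. The conjugate Beta update used here is an assumption of this toy setting; problems without a closed-form posterior need approximate sampling instead.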
Everyone is welcome!