Recommender systems, online marketing, medical diagnosis, network security, etc. require on-going learning and decision-making in real time. These problems often involve a very large set of complex actions that are composed of simple actions, and have multiple and possibly conflicting objectives, which make efficient online learning a very challenging task. In this talk, I will present recent pieces of work addressing these challenges.
The first is about optimal learning in a combinatorial bandit problem, where the selected complex action probabilistically triggers a set of simple actions/arms. For this problem, two important cases will be considered: (i) the case when the arm triggering probabilities are strictly positive, and (ii) the case when the arm triggering probabilities depend on the side information that is provided to the learner at the beginning of each epoch. These cases enjoy different regret bounds and have interesting real-world applications such as the online influence maximization problem. The second is about a multi-objective contextual bandit problem where one objective dominates the other objective, for which I will highlight the challenges and techniques for designing an optimal learning algorithm.
Bienvenue à tous!