Regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) typically consist of two lines of treatment. We present an adaptive reinforcement learning approach for discovering personalized dynamic treatment regimes from a specially designed randomized clinical trial in patients with advanced NSCLC who have not previously received systemic therapy. The goal is to select, for each patient, the best of several available treatments at each line of therapy. We also seek the optimal time to initiate second-line therapy relative to completion of the first line. The approach combines Q-learning, a reinforcement learning method, with a version of support vector regression that can be applied to right-censored time-to-event data. Simulation studies show that the procedure can identify optimal treatment strategies for both lines of treatment, including the optimal timing of the second line, while accounting for the heterogeneity of NSCLC across patients.
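The backward-induction idea behind Q-learning for a two-line regime can be illustrated with a minimal Python sketch. It is not the procedure evaluated in the simulation studies: it assumes a single binary patient covariate `x`, two candidate treatments per line, fully observed (uncensored) outcomes, and no timing decision, and it substitutes simple cell means for support vector regression. All names and numeric values (`MEAN_R1`, `MEAN_R2`, `simulate_trial`, and so on) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical true mean rewards (e.g. months of survival gained per line).
# First line: treatment 0 looks better if only the immediate reward is considered.
MEAN_R1 = {0: 3.0, 1: 2.0}  # keyed by first-line treatment a1
# Second line: keyed by (covariate x, first-line a1, second-line a2).
MEAN_R2 = {
    (0, 0, 0): 3.0, (0, 0, 1): 1.0, (0, 1, 0): 2.0, (0, 1, 1): 6.0,
    (1, 0, 0): 2.0, (1, 0, 1): 5.0, (1, 1, 0): 3.0, (1, 1, 1): 1.0,
}


def simulate_trial(reps=20, seed=0):
    """Simulate a balanced two-stage randomized trial (assumed design)."""
    rng = random.Random(seed)
    data = []
    for _ in range(reps):
        for (x, a1, a2), m2 in MEAN_R2.items():
            r1 = MEAN_R1[a1] + rng.uniform(-0.25, 0.25)
            r2 = m2 + rng.uniform(-0.25, 0.25)
            data.append((x, a1, r1, a2, r2))
    return data


def cell_means(pairs):
    """Saturated regression: average the outcome within each key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in pairs:
        sums[key] += y
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def fit_q_learning(data):
    """Fit Q2 first, then Q1 on the stage-1 reward plus the best stage-2 value."""
    q2 = cell_means(((x, a1, a2), r2) for x, a1, r1, a2, r2 in data)
    q1 = cell_means(
        ((x, a1), r1 + max(q2[(x, a1, 0)], q2[(x, a1, 1)]))
        for x, a1, r1, a2, r2 in data
    )
    return q1, q2


def best_regime(q1, q2, x):
    """Return the estimated optimal (first-line, second-line) treatments."""
    a1 = max((0, 1), key=lambda a: q1[(x, a)])
    a2 = max((0, 1), key=lambda a: q2[(x, a1, a)])
    return a1, a2


q1, q2 = fit_q_learning(simulate_trial())
regimes = {x: best_regime(q1, q2, x) for x in (0, 1)}
print(regimes)
```

In this toy setting, a patient with `x = 0` does better overall on first-line treatment 1, even though treatment 0 has the larger immediate reward, because treatment 1 leads to a more effective second line. Fitting the second-stage Q-function first and carrying its maximum back into the first stage is what allows the estimated regime to reflect that long-term trade-off.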
Group for Research in Decision Analysis