Groupe d’études et de recherche en analyse des décisions


Semi\(^+\)-supervised learning under sample selection bias


In time-to-event data analysis, the main object of interest is the time elapsed between the occurrence of two ordered events, say \(E_1\) and \(E_2\). In follow-up studies, the gold standard is sampling from the incident population, i.e., subjects who have experienced the incidence of \(E_1\) before being sampled, regardless of the occurrence of \(E_2\). In practice, however, it is often more feasible to sample from the prevalent population, i.e., subjects who have already experienced \(E_1\) but not \(E_2\). It is well known that the prevalent sampling design induces sample selection bias. Moreover, time-to-event data are usually subject to censoring, which causes a partial loss of information on a fraction of the subjects. Here, we discuss the inefficiency of conventional learning methods that ignore sample selection bias and show how this problem can be avoided by properly incorporating the selection bias into the analysis. The arguments are supported by simulation studies.
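To make the selection bias of prevalent sampling concrete, here is a toy simulation. It is a minimal sketch under strong simplifying assumptions that are not taken from this report. Onsets of \(E_1\) occur uniformly over a long window, which approximates stationary incidence. Durations follow an assumed Gamma(shape 2, scale 1) law with true mean 2. There is no censoring. Only subjects with \(E_1\) before, and \(E_2\) after, the sampling time are kept. Under stationarity, such a sample is length-biased, so its naive mean estimates \(E[Y^2]/E[Y] = 6/2 = 3\) rather than 2. The correction shown, the harmonic mean of the sampled durations, is a standard adjustment for length-biased data. It is not presented as the method developed in this report. The function name `simulate_prevalent_sample` and its parameters are hypothetical.

```python
import numpy as np


def simulate_prevalent_sample(n_onsets=2_000_000, horizon=1000.0, seed=2024):
    """Hypothetical toy simulator of prevalent (cross-sectional) sampling.

    Onsets of E1 are drawn uniformly on [0, horizon] (approximately stationary
    incidence); the subjects sampled at time `horizon` are those whose E2 has
    not yet occurred. Returns (all durations, durations of sampled subjects).
    Censoring is ignored: sampled subjects are assumed followed until E2.
    """
    rng = np.random.default_rng(seed)
    onsets = rng.uniform(0.0, horizon, size=n_onsets)
    # Assumed duration distribution: Gamma(shape=2, scale=1), true mean 2.
    durations = rng.gamma(shape=2.0, scale=1.0, size=n_onsets)
    # Prevalent at the sampling time: E1 before `horizon`, E2 after it.
    prevalent = onsets + durations > horizon
    return durations, durations[prevalent]


durations, sampled = simulate_prevalent_sample()

population_mean = durations.mean()           # what incident sampling targets
naive_mean = sampled.mean()                  # ignores the selection bias
corrected_mean = 1.0 / np.mean(1.0 / sampled)  # harmonic-mean correction

print(f"sampled subjects: {sampled.size}")
print(f"population mean:  {population_mean:.3f}")
print(f"naive mean:       {naive_mean:.3f}")
print(f"corrected mean:   {corrected_mean:.3f}")
```

With these settings, the naive mean should land near 3 and the corrected mean near 2. Longer durations are more likely to straddle the sampling time, and weighting each sampled subject by the inverse of its duration undoes that over-representation. Handling censoring, which the abstract also raises, would require more than this sketch provides.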

7 pages