Groupe d’études et de recherche en analyse des décisions


Performance of n-Grams for a Question Retrieval System in the Context of Approximated Spelling


Question retrieval systems, unlike question answering systems, exploit the knowledge contained in previously answered questions: they respond to a new question by returning already answered questions that may meet the user's information needs. In our experiment, we work on improving a French-language question retrieval system that mostly handles questions from young people written with approximate spelling. To address the spelling problem, character n-gram features have been proposed in the literature as an alternative to classical word-based features. In the present study, we compare the performance of question retrieval models using character n-grams (for n = 3, 4 and 5) with that obtained using the classical word-based baseline features. Furthermore, we test the "simplified French" procedure, which attempts to improve the performance of the model by simplifying French writing. Our results show that n-grams do not perform as well as might be expected, but that 4-grams perform rather well when combined with the simplified French procedure.
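
As a rough illustration of why character n-grams are considered for approximate spelling, the sketch below compares word-based and character n-gram features on a pair of questions. It is not the model evaluated in the study: the Jaccard overlap, the space padding, the function names and the two example questions are all assumptions chosen for brevity.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams of a lowercased string.

    The string is padded with one space on each side so that word
    boundaries at the start and end also produce n-grams (an assumption
    of this sketch, not a detail taken from the study).
    """
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def word_features(text):
    """Return the set of whitespace-separated lowercased words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(q1, q2, n=None):
    """Compare two questions with word features (n=None) or character n-grams."""
    if n is None:
        return jaccard(word_features(q1), word_features(q2))
    return jaccard(char_ngrams(q1, n), char_ngrams(q2, n))


# Hypothetical example: a stored question and a misspelled variant of it.
stored = "comment arreter de fumer"
query = "coment areter de fumé"

print("words  :", similarity(stored, query))
for n in (3, 4, 5):
    print(f"{n}-grams:", similarity(stored, query, n))
```

In this toy pair, the misspelled query shares only one whole word with the stored question, while many of its character 3-grams still match. This is the intuition behind n-gram features; whether it translates into better retrieval is what the study measures.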

21 pages