GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies.
Lin L., Spreng RL., Seaton KE., Dennison SM., Dahora LC., Schuster DJ., Sawant S., Gilbert PB., Fong Y., Kisalu N., Pollard AJ., Tomaras GD., Li J.
Despite substantial progress in vaccine research, the level of protection conferred by vaccination can vary considerably across individuals. Understanding this immunologic variation is therefore important for developing next-generation efficacious vaccines, and accurate outcome prediction together with the identification of predictive biomarkers would be a significant step towards this goal. In early-phase vaccine clinical trials, however, small datasets are prevalent, which creates both the need for and the challenge of building a robust, explainable prediction model that can reveal heterogeneity in such data. We propose a new model, Generative Mixture of Logistic Regression (GeM-LR), which combines characteristics of a generative model and a discriminative model. We also propose a set of model selection strategies to enhance the robustness and interpretability of the model. GeM-LR extends a linear classifier to a non-linear classifier without losing interpretability, and it supports predictive clustering, that is, characterizing data heterogeneity in connection with the outcome variable. We demonstrate the strengths and utility of GeM-LR by applying it to data from several studies. GeM-LR achieves better prediction results than other popular methods while providing interpretations at different levels.
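To make the idea of combining a generative mixture with cluster-specific logistic regressions concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not state the model's exact form, so everything here is an assumption: Gaussian clusters with diagonal covariance, one logistic regression per cluster, hand-set (not fitted) parameters, and the hypothetical helper names `gaussian_pdf_diag`, `responsibilities`, and `predict_proba`.

```python
import math

def gaussian_pdf_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance, evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def responsibilities(x, components):
    """Generative part: posterior probability of each cluster given x."""
    weighted = [c["weight"] * gaussian_pdf_diag(x, c["mean"], c["var"])
                for c in components]
    total = sum(weighted)
    return [w / total for w in weighted]

def predict_proba(x, components):
    """Discriminative part: per-cluster logistic regressions,
    blended by the cluster responsibilities, giving P(y = 1 | x)."""
    prob = 0.0
    for r, c in zip(responsibilities(x, components), components):
        z = c["intercept"] + sum(b * xi for b, xi in zip(c["coef"], x))
        prob += r * sigmoid(z)
    return prob

# Hypothetical, hand-set parameters: two clusters over two features.
# The second feature has opposite effects on the outcome in each cluster.
components = [
    {"weight": 0.5, "mean": [-2.0, 0.0], "var": [1.0, 1.0],
     "intercept": 0.0, "coef": [0.0, 2.0]},
    {"weight": 0.5, "mean": [2.0, 0.0], "var": [1.0, 1.0],
     "intercept": 0.0, "coef": [0.0, -2.0]},
]

p_left = predict_proba([-2.0, 1.0], components)   # point near cluster 1
p_right = predict_proba([2.0, 1.0], components)   # point near cluster 2
```

In this toy setting the same value of the second feature raises the predicted probability near one cluster and lowers it near the other, so the overall decision rule is non-linear even though each cluster's coefficients can still be read like those of an ordinary logistic regression.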