Framework for making better predictions by directly estimating variables’ predictivity

Proceedings of the National Academy of Sciences, 113 (50): 14277–14282 (December 2016)
DOI: 10.1073/pnas.1616647113

Abstract

We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic’s predictive performance on sample data. We conjecture that using the partition retention and I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.
