#R2
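R2 measures how much better a model fits the data than simply predicting the mean of the output variable. R2 > 0 means the model beats the mean, R2 = 0 means it is no better than the mean, and R2 < 0 means it is worse.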
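To make the comparison against the mean concrete, here is a minimal sketch that computes R2 from the usual definition, 1 - SS_res / SS_tot. The function name `r_squared` and the toy data are chosen for illustration only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of the mean baseline
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r_squared(y, y)                        # exact fit
r2_mean = r_squared(y, np.full_like(y, y.mean()))   # always predict the mean
r2_worse = r_squared(y, y[::-1])                    # worse than the mean

print(r2_perfect, r2_mean, r2_worse)  # 1.0 0.0 -3.0
```

The reversed prediction shows that R2 is not bounded below by 0: a model that fits worse than the mean gets a negative value.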
##Properties/Variants
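This article states that adding new input variables will always increase the R2 value. This is true when the model is fitted and R2 is calculated on the same data, i.e., on all available data (more precisely, for a least-squares fit R2 can never decrease in that setting). However, as soon as we have a training/test split, this does not (and should not) hold: the additional variables can fit noise in the training data and lower R2 on the test data. For example, consider a very strong outlier in the test data.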
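The difference between the two settings can be sketched with a small simulation. This is an illustration under assumptions, not a general proof: the data are synthetic, the model is ordinary least squares via `numpy.linalg.lstsq`, and the helper names (`r_squared`, `fit_ols`, `predict`) are made up for this example. The extra variable is pure noise, unrelated to the output.

```python
import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    return np.column_stack([np.ones(len(X)), X]) @ coef

rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=(n, 1))               # informative input
noise_feature = rng.normal(size=(n, 1))   # unrelated to y (assumption of this demo)
y = 2.0 * x[:, 0] + rng.normal(scale=1.0, size=n)

X_small = x
X_big = np.hstack([x, noise_feature])

# Case 1: fit and evaluate on the same data; R2 cannot decrease.
r2_in_small = r_squared(y, predict(X_small, fit_ols(X_small, y)))
r2_in_big = r_squared(y, predict(X_big, fit_ols(X_big, y)))

# Case 2: fit on the training part, evaluate on the held-out part;
# here the larger model may score higher or lower.
train, test = slice(0, 40), slice(40, n)
r2_test_small = r_squared(y[test], predict(X_small[test], fit_ols(X_small[train], y[train])))
r2_test_big = r_squared(y[test], predict(X_big[test], fit_ols(X_big[train], y[train])))

print("same data:", r2_in_small, r2_in_big)
print("test data:", r2_test_small, r2_test_big)
```

Only the first comparison is guaranteed: `r2_in_big >= r2_in_small` holds for any data, because the smaller model is a special case of the larger one. The direction of the test-set comparison depends on the particular split and random seed.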
##Note on adjusted R2
"Adjusted R2 does not have the same interpretation as R2—while R2 is a measure of fit, adjusted R2 is instead a comparative measure of suitability of alternative nested sets of explanators.[citation needed] As such, care must be taken in interpreting and reporting this statistic. Adjusted R2 is particularly useful in the feature selection stage of model building."
This means that the **adjusted R2** is mainly intended for fitting models (not necessarily for prediction) where the model should be kept as simple as possible, i.e., it implements a form of Occam's razor. This is similar to the general behavior of Bayesian model comparison, where the better fit to the observations gained by adding variables is weighed against the complexity of the model (see MixedTrails).
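The complexity penalty can be seen in the standard formula, adjusted R2 = 1 - (1 - R2) (n - 1) / (n - p - 1), with n samples and p input variables. The following sketch applies it; the function and parameter names are chosen for illustration, and the numbers are made up.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Same plain R2, same number of samples, different model sizes.
adj_small = adjusted_r2(0.80, n_samples=30, n_features=2)
adj_large = adjusted_r2(0.80, n_samples=30, n_features=10)

print(round(adj_small, 4), round(adj_large, 4))  # 0.7852 0.6947
```

For an identical plain R2, the model with more input variables gets the lower adjusted value, so an added variable only pays off if it improves the fit by more than the penalty.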