Article

Improving prediction models with new markers: a comparison of updating strategies

D. Nieboer, Y. Vergouwe, D. Ankerst, M. Roobol, and E. Steyerberg.
BMC Medical Research Methodology, 16 (1): 128 (December 2016). Note: Models predictius.
DOI: 10.1186/s12874-016-0231-2

Abstract

BACKGROUND New markers hold the promise of improving risk prediction for individual patients. We aimed to compare the performance of different strategies to extend a previously developed prediction model with a new marker.

METHODS Our motivating example was the extension of a risk calculator for prostate cancer with a new marker that was available in a relatively small dataset. Performance of the strategies was also investigated in simulations. Development, marker and test sets with different sample sizes, all originating from the same underlying population, were generated. A prediction model was fitted using logistic regression in the development set, extended using the marker set and validated in the test set. The extension strategies considered were re-estimating individual regression coefficients, updating predictions using conditional likelihood ratios (LR), and imputing marker values in the development set and subsequently fitting a model in the combined development and marker sets. Sample sizes considered for the development and marker sets were 500 and 100, 500 and 500, and 100 and 500 patients. Discriminative ability of the extended models was quantified using the concordance statistic (c-statistic) and calibration was quantified using the calibration slope.

RESULTS All strategies led to extended models with increased discrimination (c-statistic increase from 0.75 to 0.80 in test sets). Strategies estimating a large number of parameters (re-estimation of all coefficients and updating using conditional LR) led to overfitting (calibration slope below 1). Parsimonious methods, which limit the number of coefficients to be re-estimated or apply shrinkage after model revision, limited the amount of overfitting. Combining the development and marker sets by imputing the missing marker values led to consistently well-performing models in all scenarios. Similar results were observed in the motivating example.

CONCLUSION When the sample with the new marker information is small, parsimonious methods are required to prevent overfitting of a new prediction model. Combining all data with imputation of missing marker values is an attractive option, even if a relatively large marker data set is available.
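The two performance measures named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it covers only one of the strategies mentioned (re-estimating all coefficients in the marker set), and the function names, the true coefficients, the random seed and the test-set size of 2000 are all assumptions chosen for illustration. Only the development and marker sample sizes (500 and 100) come from the abstract.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def c_statistic(y, score):
    """Share of event/non-event pairs in which the event has the higher score (ties count 1/2)."""
    pos = score[y == 1]
    neg = score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def calibration_slope(y, linear_predictor):
    """Slope of a logistic regression of the outcome on the model's linear predictor."""
    X = np.column_stack([np.ones_like(linear_predictor), linear_predictor])
    return fit_logistic(X, y)[1]


rng = np.random.default_rng(0)  # assumed seed


def simulate(n):
    """Draw one established predictor, one new marker and a binary outcome (assumed true model)."""
    x = rng.normal(size=n)
    m = rng.normal(size=n)
    lp = -1.0 + 1.0 * x + 0.8 * m  # assumed true coefficients
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lp)))
    return x, m, y


x_dev, _, y_dev = simulate(500)          # development set: marker not available
x_mark, m_mark, y_mark = simulate(100)   # marker set: small, marker measured
x_test, m_test, y_test = simulate(2000)  # test set: assumed size

# Original model, fitted in the development set without the marker.
X_dev = np.column_stack([np.ones(500), x_dev])
beta_base = fit_logistic(X_dev, y_dev)

# Extended model: all coefficients re-estimated in the marker set.
X_mark = np.column_stack([np.ones(100), x_mark, m_mark])
beta_ext = fit_logistic(X_mark, y_mark)

lp_base = beta_base[0] + beta_base[1] * x_test
lp_ext = beta_ext[0] + beta_ext[1] * x_test + beta_ext[2] * m_test

c_base = c_statistic(y_test, lp_base)
c_ext = c_statistic(y_test, lp_ext)
slope_ext = calibration_slope(y_test, lp_ext)

print(f"c-statistic, original model: {c_base:.3f}")
print(f"c-statistic, extended model: {c_ext:.3f}")
print(f"calibration slope, extended model: {slope_ext:.3f}")
```

A calibration slope below 1 in the test set indicates that the extended model's predictions are too extreme, which is the overfitting pattern the abstract reports for strategies that estimate many parameters from a small marker set.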

BibTeX key: Nieboer2016
entry type: article
year: 2016
month: 12
journal: BMC Medical Research Methodology
number: 1
pages: 128
volume: 16
pmid: 27678479
issn: 1471-2288
DOI: 10.1186/s12874-016-0231-2
url: http://www.ncbi.nlm.nih.gov/pubmed/27678479 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5039804 http://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-016-0231-2
note: Models predictius

Cite this publication

%0 Journal Article
%1 Nieboer2016
%A Nieboer, D.
%A Vergouwe, Y.
%A Ankerst, Danna P.
%A Roobol, Monique J.
%A Steyerberg, Ewout W.
%D 2016
%J BMC Medical Research Methodology
%K Logisticregression Modelupdating Predictionmodel Prostatecancer
%N 1
%P 128
%R 10.1186/s12874-016-0231-2
%T Improving prediction models with new markers: a comparison of updating strategies
%U http://www.ncbi.nlm.nih.gov/pubmed/27678479 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5039804 http://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-016-0231-2
%V 16
%X BACKGROUND New markers hold the promise of improving risk prediction for individual patients. We aimed to compare the performance of different strategies to extend a previously developed prediction model with a new marker. METHODS Our motivating example was the extension of a risk calculator for prostate cancer with a new marker that was available in a relatively small dataset. Performance of the strategies was also investigated in simulations. Development, marker and test sets with different sample sizes originating from the same underlying population were generated. A prediction model was fitted using logistic regression in the development set, extended using the marker set and validated in the test set. Extension strategies considered were re-estimating individual regression coefficients, updating of predictions using conditional likelihood ratios (LR) and imputation of marker values in the development set and subsequently fitting a model in the combined development and marker sets. Sample sizes considered for the development and marker set were 500 and 100, 500 and 500, and 100 and 500 patients. Discriminative ability of the extended models was quantified using the concordance statistic (c-statistic) and calibration was quantified using the calibration slope. RESULTS All strategies led to extended models with increased discrimination (c-statistic increase from 0.75 to 0.80 in test sets). Strategies estimating a large number of parameters (re-estimation of all coefficients and updating using conditional LR) led to overfitting (calibration slope below 1). Parsimonious methods, limiting the number of coefficients to be re-estimated, or applying shrinkage after model revision, limited the amount of overfitting. Combining the development and marker set using imputation of missing marker values approach led to consistently good performing models in all scenarios. Similar results were observed in the motivating example. CONCLUSION When the sample with the new marker information is small, parsimonious methods are required to prevent overfitting of a new prediction model. Combining all data with imputation of missing marker values is an attractive option, even if a relatively large marker data set is available.

@article{Nieboer2016,
  abstract = {BACKGROUND New markers hold the promise of improving risk prediction for individual patients. We aimed to compare the performance of different strategies to extend a previously developed prediction model with a new marker. METHODS Our motivating example was the extension of a risk calculator for prostate cancer with a new marker that was available in a relatively small dataset. Performance of the strategies was also investigated in simulations. Development, marker and test sets with different sample sizes originating from the same underlying population were generated. A prediction model was fitted using logistic regression in the development set, extended using the marker set and validated in the test set. Extension strategies considered were re-estimating individual regression coefficients, updating of predictions using conditional likelihood ratios (LR) and imputation of marker values in the development set and subsequently fitting a model in the combined development and marker sets. Sample sizes considered for the development and marker set were 500 and 100, 500 and 500, and 100 and 500 patients. Discriminative ability of the extended models was quantified using the concordance statistic (c-statistic) and calibration was quantified using the calibration slope. RESULTS All strategies led to extended models with increased discrimination (c-statistic increase from 0.75 to 0.80 in test sets). Strategies estimating a large number of parameters (re-estimation of all coefficients and updating using conditional LR) led to overfitting (calibration slope below 1). Parsimonious methods, limiting the number of coefficients to be re-estimated, or applying shrinkage after model revision, limited the amount of overfitting. Combining the development and marker set using imputation of missing marker values approach led to consistently good performing models in all scenarios. Similar results were observed in the motivating example. CONCLUSION When the sample with the new marker information is small, parsimonious methods are required to prevent overfitting of a new prediction model. Combining all data with imputation of missing marker values is an attractive option, even if a relatively large marker data set is available.},
  added-at = {2023-02-03T11:44:35.000+0100},
  author = {Nieboer, D. and Vergouwe, Y. and Ankerst, Danna P. and Roobol, Monique J. and Steyerberg, Ewout W.},
  biburl = {https://www.bibsonomy.org/bibtex/2168a8bcacd091add4f6746cb86b5ad1e/jepcastel},
  doi = {10.1186/s12874-016-0231-2},
  interhash = {eb3f7017ca85353e5aec980745c19e8a},
  intrahash = {168a8bcacd091add4f6746cb86b5ad1e},
  issn = {1471-2288},
  journal = {BMC Medical Research Methodology},
  keywords = {Logisticregression Modelupdating Predictionmodel Prostatecancer},
  month = {12},
  note = {Models predictius},
  number = 1,
  pages = 128,
  pmid = {27678479},
  timestamp = {2023-02-03T11:44:35.000+0100},
  title = {Improving prediction models with new markers: a comparison of updating strategies},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/27678479 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5039804 http://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-016-0231-2},
  volume = 16,
  year = 2016
}
