Article

Estimating the Number of Subpopulations (K) in Structured Populations

Genetics, 203 (4): 1827-1839 (August 2016)
DOI: 10.1534/genetics.115.180992

Abstract

A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model. However, the evidence in favor of a particular value of K cannot usually be computed exactly, and instead programs such as Structure make use of heuristic estimators to approximate this quantity. We show, using simulated data sets small enough that the true evidence can be computed exactly, that these heuristics often fail to estimate the true evidence and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology we demonstrate the effectiveness of this approach, using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are more accurate and precise than those based on heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Finally, we test our solution in a reanalysis of a white-footed mouse data set. The TI methodology is implemented for models both with and without admixture in the software MavericK1.0.
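
As a rough illustration of the general TI idea mentioned in the abstract, the sketch below estimates a log evidence on a toy problem where the exact answer is known. It is not the population-structure model of the paper and not the MavericK implementation. It assumes a normal likelihood with known variance and a zero-mean normal prior on the mean, so that each "power posterior" (prior times likelihood raised to a power beta) can be sampled directly. The function names, the power-law spacing of the beta values, and all default settings are hypothetical choices made for this example. TI uses the identity that the log evidence equals the integral over beta from 0 to 1 of the expected log-likelihood under the power posterior, approximated here with the trapezoid rule.

```python
import math
import random


def log_likelihood(mu, n, sx, sxx, sigma2):
    """Log-likelihood of n draws from N(mu, sigma2), using sufficient statistics."""
    sq = sxx - 2.0 * mu * sx + n * mu * mu  # sum of (x_i - mu)^2
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * sq / sigma2


def ti_log_evidence(x, sigma2=1.0, tau2=4.0, rungs=50, draws=5000,
                    power=3.0, seed=1):
    """TI estimate of the log evidence for x_i ~ N(mu, sigma2), mu ~ N(0, tau2).

    Assumption: beta values follow (i / rungs) ** power, which places more
    rungs near beta = 0, where the integrand changes fastest in this toy model.
    """
    rng = random.Random(seed)
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    betas = [(i / rungs) ** power for i in range(rungs + 1)]
    means = []
    for beta in betas:
        # Power posterior is normal here, so it can be sampled exactly.
        prec = 1.0 / tau2 + beta * n / sigma2
        post_mean = (beta * sx / sigma2) / prec
        post_sd = math.sqrt(1.0 / prec)
        total = 0.0
        for _ in range(draws):
            mu = rng.gauss(post_mean, post_sd)
            total += log_likelihood(mu, n, sx, sxx, sigma2)
        means.append(total / draws)  # Monte Carlo estimate of E_beta[log L]
    # Trapezoid rule over the (unevenly spaced) beta values.
    return sum(0.5 * (betas[i + 1] - betas[i]) * (means[i + 1] + means[i])
               for i in range(rungs))


def exact_log_evidence(x, sigma2=1.0, tau2=4.0):
    """Closed-form log evidence: x ~ MVN(0, sigma2 * I + tau2 * ones)."""
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau2)
    quad = (sxx - tau2 * sx * sx / (sigma2 + n * tau2)) / sigma2
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad


if __name__ == "__main__":
    data_rng = random.Random(0)
    data = [data_rng.gauss(1.0, 1.0) for _ in range(20)]
    print("TI estimate:", ti_log_evidence(data))
    print("Exact value:", exact_log_evidence(data))
```

In a real structured-population model the power posterior cannot be sampled directly, so each beta value would need its own MCMC run; the toy model sidesteps this so that the TI estimate can be checked against the closed-form value.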
