Abstract
Large-scale genomic data offers the prospect of deciphering the genetic
architecture of natural selection. To characterize natural selection, various
analytical methods for detecting candidate genomic regions have been developed.
We propose to perform genome-wide scans of natural selection using principal
component analysis. We show that the common Fst index of genetic
differentiation between populations can be viewed as a proportion of variance
explained by the principal components. Looking at the correlations between
genetic variants and each principal component provides a conceptual framework
to detect genetic variants involved in local adaptation without any prior
definition of populations. To validate the PCA-based approach, we consider the
1000 Genomes data (phase 1) after removal of recently admixed individuals
resulting in 850 individuals from Africa, Asia, and Europe. The data set
contains on the order of 36 million genetic variants, obtained at a low-coverage
sequencing depth (3X). The correlations between genetic variation and each
principal component provide well-known targets for positive selection (EDAR,
SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN,
KCNMA, MYO5C) and non-coding RNAs. In addition to identifying genes involved in
biological adaptation, we identify two biological pathways involved in
polygenic adaptation that are related to the innate immune system (beta
defensins) and to lipid metabolism (fatty acid omega oxidation). PCA-based
statistics retrieve well-known signals of human adaptation, which is
encouraging for future whole-genome sequencing projects, especially in non-model
species for which defining populations can be difficult. The PCA-based genome
scan is implemented in the open-source and freely available PCAdapt software.
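
The core idea of the abstract, correlating each genetic variant with each principal component and treating the most strongly correlated variants as candidates, can be illustrated with a minimal NumPy sketch. This is not the PCAdapt implementation: the function name `pc_correlations`, the two-population simulation, and all parameter values are assumptions made for illustration only. No population labels are passed to the analysis; they are used only to simulate the data.

```python
import numpy as np

def pc_correlations(genotypes, n_components=2):
    """Pearson correlation between each variant and each principal component.

    genotypes: (n_individuals, n_variants) array of 0/1/2 allele counts.
    Returns an (n_variants, n_components) array of correlations.
    Hypothetical helper for illustration, not part of PCAdapt.
    """
    G = np.asarray(genotypes, dtype=float)
    n = G.shape[0]
    G = G - G.mean(axis=0)                  # center each variant
    sd = G.std(axis=0)                      # population std (ddof=0)
    Z = np.zeros_like(G)
    polymorphic = sd > 0                    # monomorphic variants stay at zero
    Z[:, polymorphic] = G[:, polymorphic] / sd[polymorphic]
    # Left singular vectors are unit-norm, zero-mean PC scores.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components]
    # Each column of Z has norm sqrt(n), so this is the Pearson correlation.
    return Z.T @ scores / np.sqrt(n)

# Simulated example (assumed values): two populations with mild drift,
# plus one strongly differentiated variant at index 0.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 100, 1000
ancestral = rng.uniform(0.2, 0.8, n_snps)
freq_a = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
freq_b = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
freq_a[0], freq_b[0] = 0.05, 0.95           # variant under "local adaptation"

genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
])

r = pc_correlations(genotypes, n_components=2)
# Squared correlation with PC1 is the share of a variant's variance
# explained by that component; the largest value marks the top candidate.
top_variant = int(np.argmax(r[:, 0] ** 2))
print("top candidate variant on PC1:", top_variant)
```

In this toy setting the first principal component separates the two simulated populations, so the squared correlation with PC1 plays a role similar to a per-variant differentiation measure. A real scan would also need a calibrated test statistic and a multiple-testing correction, which this sketch omits.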