Abstract
Due to the complexity of host–parasite relationships, discrimination
between fish populations using parasites as biological tags is
difficult. This study introduces, to our knowledge for the first time,
random forests (RF) as a new modelling technique in the application
of parasite community data as biological markers for population assignment
of fish. This novel approach is applied to a dataset
with a complex structure comprising 763 parasite infracommunities
in population samples of Atlantic cod, Gadus morhua, from the
spawning/feeding areas in five regions in the North East Atlantic
(Baltic, Celtic, Irish and North seas and Icelandic waters). The learning
behaviour of RF is evaluated in comparison with two other algorithms
applied to class assignment problems, the linear discriminant
function analysis (LDA) and artificial neural networks (ANN). The
three algorithms are used to develop predictive models
applying three cross-validation procedures in a series of experiments
(252 models in total). Applying RF, LDA and ANN to the same
datasets demonstrates the competitive potential of RF for developing
predictive models: RF predicted more accurately than
LDA and ANN when assigning fish to their regions of
sampling using parasite community data. The comparative analyses and
the validation experiment with a “blind” sample confirmed
that RF models performed more effectively with a large and diverse
training set and a large number of variables. The discrimination
results obtained for a migratory fish species with largely overlapping
parasite communities reflect the high potential of RF for developing
predictive models using data that are both complex and noisy, and
indicate that it is a promising tool for parasite tag studies.
Our results suggest that parasite community data can be used successfully
to discriminate individual cod from the five different regions
of the North East Atlantic studied using RF.
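
The evaluation logic summarised above (fit a classifier on part of the data, then score its region assignments on held-out fish) can be illustrated with a minimal sketch. It is not the procedure used in the study. The data are synthetic, the two features are hypothetical stand-ins for parasite abundances, and a nearest-centroid rule is used only as a placeholder classifier where RF, LDA or ANN would be plugged in. All function names are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin into folds
    return folds


def fit_nearest_centroid(X, y):
    """Placeholder classifier (NOT RF, LDA or ANN): returns a predict function."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        # Assign to the class whose centroid is closest (squared Euclidean).
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Proportion of samples assigned correctly when each fold is held out once."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


# Synthetic, well-separated data: 20 "fish" per region, 2 hypothetical features.
rng = random.Random(1)
regions = ["Baltic", "Celtic", "Irish", "North", "Iceland"]
X, y = [], []
for r_idx, region in enumerate(regions):
    for _ in range(20):
        X.append([rng.gauss(10.0 * r_idx, 1.0), rng.gauss(-10.0 * r_idx, 1.0)])
        y.append(region)

folds = stratified_folds(y, k=5)
accuracy = cross_val_accuracy(fit_nearest_centroid, X, y, k=5)
```

Because `cross_val_accuracy` accepts any `fit` function that returns a predictor, competing algorithms can be scored on identical folds, which is the property a comparison of this kind relies on.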