Abstract
Random forests are a combination of tree predictors
such that each tree depends on the values of a
random vector sampled independently and with the
same distribution for all trees in the forest.
The generalization error for forests converges almost surely (a.s.) to a
limit as the number of trees in the forest becomes
large. The generalization error of a forest of tree
classifiers depends on the strength of the individual
trees in the forest and the correlation between them.
Using a random selection of features to split each
node yields error rates that compare favorably to
Adaboost (Freund and Schapire, 1996), but are more
robust with respect to noise. Internal estimates
monitor error, strength, and correlation, and these are
used to show the response to increasing the number
of features used in the splitting. Internal estimates
are also used to measure variable importance. These
ideas are also applicable to regression.
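
As a rough illustration of the ideas summarized above, the following minimal sketch grows each tree on a bootstrap sample, lets only a random subset of features compete at each node, and uses the cases left out of each bootstrap sample as an internal error estimate. It is not the procedure specified in this abstract: the bootstrap sampling, the reading of "internal estimates" as out-of-bag estimates, the Gini split criterion, the midpoint thresholds, the depth cap, the synthetic data, and all names (`build_tree`, `fit_forest`, `oob_error`, `n_try`) are assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, idx, n_try, rng, depth=0, max_depth=4):
    """Grow a tree on the rows in idx; a leaf is a label, a node is a tuple."""
    labels = [y[i] for i in idx]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    # Random feature selection: only n_try randomly drawn features
    # are candidates for the split at this node.
    for f in rng.sample(range(len(X[0])), n_try):
        vals = sorted({X[i][f] for i in idx})
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2.0  # midpoint threshold (an assumption)
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the sampled features
        return majority(labels)
    _, f, t, left, right = best
    return (f, t,
            build_tree(X, y, left, n_try, rng, depth + 1, max_depth),
            build_tree(X, y, right, n_try, rng, depth + 1, max_depth))


def predict_tree(node, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, n_try=1, seed=0):
    """Return a list of (tree, out-of-bag row indices) pairs."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree(X, y, boot, n_try, rng)
        forest.append((tree, set(range(n)) - set(boot)))
    return forest


def oob_error(forest, X, y):
    """Error rate when each case is voted on only by trees that never saw it."""
    wrong = counted = 0
    for i, x in enumerate(X):
        votes = Counter(predict_tree(t, x) for t, oob in forest if i in oob)
        if votes:
            counted += 1
            wrong += int(votes.most_common(1)[0][0] != y[i])
    return wrong / counted if counted else float("nan")


# Synthetic two-class data: two informative features and one noise feature.
data_rng = random.Random(1)
X, y = [], []
for label in (0, 1):
    for _ in range(30):
        base = 2.0 * label
        X.append([base + data_rng.random(),
                  base + data_rng.random(),
                  3.0 * data_rng.random()])
        y.append(label)

forest = fit_forest(X, y, n_trees=25, n_try=2, seed=0)
err = oob_error(forest, X, y)
print(f"out-of-bag error estimate: {err:.3f}")
```

Varying `n_try` in this sketch gives a simple way to observe how the internal error estimate responds to the number of features considered at each split.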