Abstract
The proposal of Reshef et al. (2011) is an interesting new approach for
discovering non-linear dependencies among pairs of measurements in exploratory
data mining. However, it has a potentially serious drawback. The authors laud
the fact that MIC has no preference for some alternatives over others, but as
the authors know, there is no free lunch in statistics: tests that strive to
have high power against all alternatives can have low power in many important
situations. To investigate this, we ran simulations to compare the power of MIC
to that of standard Pearson correlation and distance correlation (dcor). We
simulated pairs of variables with different relationships (most of which were
considered by Reshef et al.), but with varying levels of noise added. To
determine proper cutoffs for testing the independence hypothesis, we simulated
independent data with the appropriate marginals. As one can see from the
Figure, MIC has lower power than dcor in every case except the somewhat
pathological high-frequency sine wave. MIC is sometimes less powerful than
Pearson correlation as well, the linear case being particularly worrisome.
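
The simulation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (`distance_correlation`, `abs_pearson`, `estimate_power`), the uniform distribution for x, the sample size, the noise levels, and the number of replicates are all assumptions chosen for the example. MIC is left out because it needs an external implementation; only Pearson correlation and dcor are compared. The cutoff for each statistic is the (1 - alpha) quantile of its values on independent data with the same marginals, and power is the fraction of dependent samples exceeding that cutoff.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation for two 1-D arrays of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference (distance) matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each matrix: subtract row and column means, add grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()    # squared distance covariance
    dvar_x2 = (A * A).mean()  # squared distance variance of x
    dvar_y2 = (B * B).mean()  # squared distance variance of y
    denom = np.sqrt(dvar_x2 * dvar_y2)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def abs_pearson(x, y):
    """Absolute Pearson correlation, used as a two-sided test statistic."""
    return float(abs(np.corrcoef(x, y)[0, 1]))


def estimate_power(stat, f, noise_sd, n=100, n_sim=200, alpha=0.05, seed=0):
    """Estimate the power of `stat` against y = f(x) + Gaussian noise.

    Assumed setup (not taken from the source): x ~ Uniform(0, 1).
    """
    rng = np.random.default_rng(seed)
    null_stats = np.empty(n_sim)
    alt_stats = np.empty(n_sim)
    for i in range(n_sim):
        x = rng.uniform(0.0, 1.0, n)
        y = f(x) + rng.normal(0.0, noise_sd, n)
        alt_stats[i] = stat(x, y)
        # Null sample: a fresh, independent x paired with the same y, so the
        # marginals match the alternative but the pair is independent.
        x_indep = rng.uniform(0.0, 1.0, n)
        null_stats[i] = stat(x_indep, y)
    cutoff = np.quantile(null_stats, 1.0 - alpha)
    return float(np.mean(alt_stats > cutoff))


if __name__ == "__main__":
    relationships = {
        "linear": lambda x: x,
        "parabolic": lambda x: 4.0 * (x - 0.5) ** 2,
    }
    for name, f in relationships.items():
        for noise_sd in (0.1, 0.5, 1.0):
            p_cor = estimate_power(abs_pearson, f, noise_sd)
            p_dcor = estimate_power(distance_correlation, f, noise_sd)
            print(f"{name:10s} noise={noise_sd:.1f} "
                  f"pearson={p_cor:.2f} dcor={p_dcor:.2f}")
```

Simulating the null distribution directly, rather than relying on an asymptotic approximation, keeps the comparison fair across statistics whose null distributions differ. Adding MIC to this sketch would only require passing another statistic with the same `(x, y)` signature to `estimate_power`.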