Abstract
Astronomical data is full of holes. Although data can be missing for many
reasons, some values are missing at random, caused by, for example, data
corruption or unfavourable observing conditions. We test some simple data
imputation methods (Mean, Median, Minimum, Maximum and k-Nearest Neighbours
(kNN)), as well as two more complex methods (Multivariate Imputation by
Chained Equations (MICE) and Generative Adversarial Imputation Network (GAIN)),
on data in which an increasing fraction of values is randomly set to missing.
We then use the imputed datasets to estimate the redshift of the galaxies,
using the kNN and Random Forest machine learning (ML) techniques. We find that
the MICE algorithm provides the
lowest Root Mean Square Error and consequently the lowest prediction error,
with the GAIN algorithm the next best.
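
The masking-and-imputation experiment described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic Gaussian values standing in for a catalogue, the function names (mask_completely_at_random, impute_columns, imputation_rmse) are hypothetical, and only the four simplest strategies are covered. kNN, MICE, GAIN and the redshift estimation step are left out.

```python
import math
import random
import statistics


def mask_completely_at_random(rows, fraction, rng):
    """Return a copy of rows with roughly `fraction` of the entries set to None."""
    return [[None if rng.random() < fraction else v for v in row] for row in rows]


def impute_columns(rows, strategy):
    """Fill None entries column by column with strategy(observed values)."""
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        fills.append(strategy(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def imputation_rmse(truth, masked, imputed):
    """Root mean square error computed over the masked entries only."""
    squared = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: 200 synthetic objects with 4 magnitude-like features each.
truth = [[rng.gauss(20.0, 1.5) for _ in range(4)] for _ in range(200)]

strategies = {
    "mean": statistics.mean,
    "median": statistics.median,
    "minimum": min,
    "maximum": max,
}

results = {}
for fraction in (0.05, 0.15, 0.30):
    masked = mask_completely_at_random(truth, fraction, rng)
    for name, strategy in strategies.items():
        imputed = impute_columns(masked, strategy)
        results[(fraction, name)] = imputation_rmse(truth, masked, imputed)
        print(f"missing={fraction:.2f} method={name:8s} RMSE={results[(fraction, name)]:.3f}")
```

Scoring only the masked entries keeps the error from being diluted by values that were never imputed. On real data, the same loop would be repeated for each imputation method before passing the completed tables to a redshift estimator.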