A Machine Learning Based Framework for Verification and Validation of Massive Scale Image Data

IEEE Transactions on Big Data, (2017)

Abstract

Big data validation and system verification are crucial for ensuring the quality of big data applications. However, a rigorous technique for such tasks has yet to emerge. Over the past decade, we have developed a big data system called CMA for investigating the classification of biological cells based on cell morphology captured in diffraction images. CMA includes a group of scientific software tools, machine learning algorithms, and a large-scale cell image repository. We have also developed a framework for rigorous validation of the massive-scale image data and verification of both the software systems and the machine learning algorithms. Different machine learning algorithms integrated with image processing techniques are used to automate the selection and validation of the massive-scale image data in CMA. An experiment-based technique guided by a feature selection algorithm is introduced in the framework to select optimal machine learning features. An iterative metamorphic testing approach is applied for testing the scientific software. Because the scientific software is non-testable, a machine learning approach is introduced to develop test oracles iteratively, ensuring the adequacy of the test coverage criteria. The performance of the machine learning algorithms is evaluated with stratified N-fold cross-validation and confusion matrices. We describe the design of the proposed framework with CMA as the case study. The effectiveness of the framework is demonstrated by verifying and validating the data set, software systems, and algorithms in CMA.
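
The evaluation step named in the abstract, stratified N-fold cross-validation summarized in a confusion matrix, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the one-dimensional toy features, and the nearest-centroid classifier are hypothetical stand-ins and are not taken from CMA or from the paper.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Split sample indices into folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(by_class[label]):
            folds[pos % n_folds].append(idx)
    return folds


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    position = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[position[t]][position[p]] += 1
    return matrix


def nearest_centroid_predict(train_x, train_y, test_x):
    """Toy 1-D classifier (hypothetical stand-in, not the paper's model)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(train_x, train_y):
        sums[y] += x
        counts[y] += 1
    centroids = {c: sums[c] / counts[c] for c in sums}
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]


def cross_validate(features, labels, n_folds):
    """Accumulate one confusion matrix over all held-out folds."""
    classes = sorted(set(labels))
    total = [[0] * len(classes) for _ in classes]
    for fold in stratified_folds(labels, n_folds):
        held_out = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = nearest_centroid_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in fold],
        )
        fold_matrix = confusion_matrix([labels[i] for i in fold], predictions, classes)
        for r in range(len(classes)):
            for c in range(len(classes)):
                total[r][c] += fold_matrix[r][c]
    return classes, total


# Made-up data: two well-separated classes, four samples each.
features = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
labels = ["a"] * 4 + ["b"] * 4
classes, matrix = cross_validate(features, labels, n_folds=2)
```

Stratifying the folds keeps each class represented in every held-out set, which matters when class sizes are unequal. Summing the per-fold matrices gives one table in which the diagonal counts correct predictions and off-diagonal cells show which classes are confused with each other.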