Abstract
We present a proof of concept that machine learning techniques can be used to
predict the properties of CNOHF energetic molecules from their molecular
structures. We focus on a small but diverse dataset consisting of 109 molecular
structures spread across ten compound classes. Up until now, candidate
molecules for energetic materials have been screened using predictions from
expensive quantum simulations and thermochemical codes. We present a
comprehensive comparison of machine learning models and several molecular
featurization methods: sum over bonds, custom descriptors, Coulomb matrices,
bag of bonds, and fingerprints. The best featurization was sum over bonds (bond
counting), and the best model was kernel ridge regression. Despite having a
small dataset, we obtain acceptable errors and Pearson correlations for the
out-of-sample prediction of detonation pressure, detonation velocity,
explosive energy, heat of formation, density, and other properties. By including another
dataset with 309 additional molecules in our training, we show how the error can
be pushed lower, although the convergence with number of molecules is slow. Our
work paves the way for future applications of machine learning in this domain,
including automated lead generation and interpreting machine learning models to
obtain novel chemical insights.
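
To make the sum over bonds (bond counting) featurization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical input format in which each molecule is given as a list of (element, element, bond order) tuples; the function names and the two toy molecules (nitromethane and hydrazine, written as simple Lewis structures) are illustrative only and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical input format (an assumption, not from the paper):
# each molecule is a list of bonds, each bond a tuple
# (element_a, element_b, bond_order).
ORDER_SYMBOL = {1: "-", 2: "=", 3: "#"}


def bond_label(a, b, order):
    """Canonical label such as 'C-N' or 'N=O'; elements are sorted so C-N == N-C."""
    first, second = sorted((a, b))
    return f"{first}{ORDER_SYMBOL[order]}{second}"


def sum_over_bonds(molecules):
    """Return (bond_types, vectors): one bond-count vector per molecule."""
    counts = [Counter(bond_label(*bond) for bond in mol) for mol in molecules]
    # The feature columns are all bond types seen anywhere in the dataset.
    bond_types = sorted(set().union(*counts))
    vectors = [[c[t] for t in bond_types] for c in counts]
    return bond_types, vectors


# Toy molecules for illustration only.
nitromethane = [("C", "H", 1)] * 3 + [("C", "N", 1), ("N", "O", 2), ("N", "O", 1)]
hydrazine = [("N", "N", 1)] + [("N", "H", 1)] * 4

bond_types, vectors = sum_over_bonds([nitromethane, hydrazine])
for name, vec in zip(["nitromethane", "hydrazine"], vectors):
    print(name, dict(zip(bond_types, vec)))
```

Fixed-length count vectors of this kind could then be passed to a regression model, for example the `KernelRidge` estimator in scikit-learn; which kernel and hyperparameters suit this problem is not implied by the sketch.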