Abstract
Many applications of machine learning require a model to make accurate
predictions on test examples that are distributionally different from training
ones, while task-specific labels are scarce during training. An effective
approach to this challenge is to pre-train a model on related tasks where data
is abundant, and then fine-tune it on a downstream task of interest. While
pre-training has been effective in many language and vision domains, it remains
an open question how to effectively use pre-training on graph datasets. In this
paper, we develop a new strategy and self-supervised methods for pre-training
Graph Neural Networks (GNNs). The key to the success of our strategy is to
pre-train an expressive GNN at the level of individual nodes as well as entire
graphs so that the GNN can learn useful local and global representations
simultaneously. We systematically study pre-training on multiple graph
classification datasets. We find that naive strategies, which pre-train GNNs at
the level of either entire graphs or individual nodes, give limited improvement
and can even lead to negative transfer on many downstream tasks. In contrast,
our strategy avoids negative transfer and improves generalization significantly
across downstream tasks, leading to up to 9.4% absolute improvements in ROC-AUC
over non-pre-trained models and achieving state-of-the-art performance for
molecular property prediction and protein function prediction.
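To make the two-level idea concrete, below is a minimal, illustrative sketch (not the paper's implementation) of combining a node-level self-supervised objective with a graph-level supervised objective when pre-training a GNN encoder. The toy encoder, the attribute-masking reconstruction loss, the mean-pooling readout, and all hyperparameters are assumptions made for illustration only.

```python
# Minimal sketch (assumptions, not the authors' code): jointly pre-train a toy
# GNN encoder with a node-level loss (masked attribute reconstruction) and a
# graph-level loss (binary label prediction from pooled node embeddings).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGNN(nn.Module):
    """Tiny message-passing encoder over a dense adjacency matrix."""

    def __init__(self, in_dim, hid_dim, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)]
        )

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops
        for layer in self.layers:
            x = torch.relu(layer(adj @ x))
        return x  # (N, hid_dim) node embeddings


def node_level_loss(gnn, decoder, x, adj, mask_rate=0.15):
    """Node-level self-supervision: mask some node attributes and
    reconstruct them from the node embeddings."""
    mask = torch.rand(x.size(0)) < mask_rate
    if not mask.any():
        mask[0] = True  # ensure at least one node is masked
    x_masked = x.clone()
    x_masked[mask] = 0.0
    h = gnn(x_masked, adj)
    return F.mse_loss(decoder(h[mask]), x[mask])


def graph_level_loss(gnn, head, x, adj, y):
    """Graph-level supervision: mean-pool node embeddings and predict
    a binary graph label (e.g., a molecular property)."""
    h_graph = gnn(x, adj).mean(dim=0, keepdim=True)  # (1, hid_dim)
    return F.binary_cross_entropy_with_logits(head(h_graph), y)


# Toy usage on a single random graph; real pre-training would iterate over
# large collections of unlabeled and labeled graphs.
torch.manual_seed(0)
N, IN_DIM, HID_DIM = 8, 16, 32
x = torch.randn(N, IN_DIM)
adj = (torch.rand(N, N) < 0.3).float()
adj = ((adj + adj.T + torch.eye(N)) > 0).float()  # symmetrize, add self-loops
y = torch.ones(1, 1)  # dummy graph-level label

gnn = ToyGNN(IN_DIM, HID_DIM)
decoder = nn.Linear(HID_DIM, IN_DIM)
head = nn.Linear(HID_DIM, 1)
opt = torch.optim.Adam(
    list(gnn.parameters()) + list(decoder.parameters()) + list(head.parameters()),
    lr=1e-3,
)

for step in range(10):
    opt.zero_grad()
    loss = node_level_loss(gnn, decoder, x, adj) + graph_level_loss(gnn, head, x, adj, y)
    loss.backward()
    opt.step()
```

The point of the sketch is only that both losses update the same encoder, so the learned representations capture local (node-level) and global (graph-level) structure, which is the property the abstract argues naive single-level pre-training strategies lack.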