FORK: First-Order Relational Knowledge Distillation for Machine Learning Interatomic Potentials

Hyukjun Lim, Seokhyun Choung, Jeong Woo Han
Department of Materials Science and Engineering
Seoul National University

Visualization of Embeddings for the O* Subset.
The UMAP visualizations demonstrate that FORK (right) more accurately replicates the teacher's embedding space (left) for the O* subset, compared to a standard node-to-node (n2n) distillation (center). This highlights FORK's superior ability to capture the geometry of the learned potential energy surface from the teacher model.

Abstract

State-of-the-art equivariant Graph Neural Networks (GNNs) have significantly advanced molecular simulation by approaching quantum mechanical accuracy in predicting energies and forces. However, their substantial computational cost limits adoption in large-scale molecular dynamics simulations. Knowledge distillation (KD) offers a promising solution, but existing methods for Machine Learning Force Fields (MLFFs) often resort to simplistic atom-wise feature matching or complex second-order information distillation, overlooking fundamental first-order relational knowledge: how the teacher represents the potential energy surface (PES) through learned interatomic interactions. This paper introduces FORK, First-Order Relational Knowledge Distillation, a novel KD framework that directly distills interatomic relational knowledge by modeling each interaction as a relational vector derived from bonded atom embeddings. FORK employs contrastive learning to train the student to generate relational vectors that are uniquely identifiable with their teacher counterparts, effectively teaching the student the geometry of the teacher's learned PES. On the challenging OC20 benchmark, FORK enables a compact 22M-parameter student model to achieve superior energy and force prediction accuracy, significantly outperforming strong distillation baselines and demonstrating more effective transfer of physical knowledge.

Method

Relational Vectors: FORK models each interatomic interaction as a relational vector derived from bonded atom embeddings (e.g., z_src - z_dst). These vectors serve as proxies for the teacher's learned representation of the potential along specific interactions.
Contrastive Objective: An InfoNCE-style loss trains the student to produce relational vectors that are discriminatively similar to the teacher's corresponding vectors, effectively teaching the student the geometry of these interactions (an illustrative sketch of both steps follows this list).
Physics-Informed Distillation: Unlike conventional atom-wise feature matching, FORK directly distills the fundamental physics of interatomic potentials, focusing on how atoms interact rather than treating them as isolated entities.
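
The two ingredients above can be illustrated with a short PyTorch sketch. This is a minimal illustration under stated assumptions, not the released implementation: the edge_index tensor of bonded pairs, the optional projection head proj, and the temperature value are assumptions; only the difference-of-embeddings relational vector and the InfoNCE pairing of each student edge with its own teacher edge follow the description above.

import torch
import torch.nn.functional as F

def relational_vectors(z, edge_index):
    """One relational vector per bonded pair: the difference of the
    source and destination atom embeddings (z_src - z_dst)."""
    src, dst = edge_index            # edge_index: (2, E) bonded atom pairs (assumed layout)
    return z[src] - z[dst]           # shape (E, d)

def fork_contrastive_loss(z_student, z_teacher, edge_index,
                          proj=None, temperature=0.1):
    """InfoNCE-style objective: each student relational vector should match
    its own teacher counterpart (diagonal) and be dissimilar to the teacher
    vectors of all other edges in the batch (off-diagonal negatives)."""
    r_s = relational_vectors(z_student, edge_index)
    r_t = relational_vectors(z_teacher, edge_index)
    if proj is not None:             # hypothetical head mapping student dim -> teacher dim
        r_s = proj(r_s)
    r_s = F.normalize(r_s, dim=-1)
    r_t = F.normalize(r_t, dim=-1)
    logits = r_s @ r_t.t() / temperature                  # (E, E) similarity matrix
    targets = torch.arange(r_s.size(0), device=r_s.device)
    return F.cross_entropy(logits, targets)               # positives lie on the diagonal

Treating all other edges in the batch as negatives is what makes each relational vector discriminatively identifiable, which is the sense in which the student is taught the geometry of the teacher's learned interactions.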

Overview of FORK

Architecture of FORK.
FORK employs contrastive learning to distill interatomic relational knowledge from teacher to student models, focusing on the geometry of learned potential energy surfaces through relational vectors derived from bonded atom embeddings.
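
In a full training run, the distillation term would be added to the student's usual energy and force objectives. The sketch below, which reuses fork_contrastive_loss from the sketch above, shows one plausible way to combine them; the model interfaces, the frozen no-grad teacher pass, the L1 losses, the batch attributes, and the weighting coefficient lambda_kd are assumptions rather than the paper's reported recipe.

import torch
import torch.nn.functional as F

def training_step(student, teacher, batch, lambda_kd=1.0):
    """One hypothetical distillation step: standard energy/force regression
    plus the FORK relational term. `batch` is assumed to expose .energy,
    .forces, and .edge_index."""
    with torch.no_grad():                      # teacher is frozen during distillation
        _, _, z_t = teacher(batch)             # only the teacher's atom embeddings are used here
    e_s, f_s, z_s = student(batch)             # student energies, forces, atom embeddings

    loss_energy = F.l1_loss(e_s, batch.energy)
    loss_forces = F.l1_loss(f_s, batch.forces)
    loss_kd = fork_contrastive_loss(z_s, z_t, batch.edge_index)

    return loss_energy + loss_forces + lambda_kd * loss_kd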

Results

FORK achieves superior performance in energy and force prediction on the challenging OC20 benchmark. The method enables a compact 22M-parameter student model to significantly outperform strong distillation baselines, including conventional node-to-node feature matching and Hessian-based distillation. FORK demonstrates more effective transfer of physical knowledge by directly modeling the geometry of interatomic interactions learned by the teacher model.

BibTeX

@inproceedings{lim2025fork,
  title={FORK: First-Order Relational Knowledge Distillation for Machine Learning Interatomic Potentials},
  author={Hyukjun Lim and Seokhyun Choung and Jeong Woo Han},
  booktitle={arXiv},
  year={2025},
}