FORK: First-Order Relational Knowledge Distillation for Machine Learning Interatomic Potentials

Hyukjun Lim, Seokhyun Choung, Jeong Woo Han
Department of Materials Science and Engineering
Seoul National University

Visualization of Embeddings for the O* Subset.
The UMAP visualizations demonstrate that FORK (right) more accurately replicates the teacher's embedding space (left) for the O* subset, compared to a standard node-to-node (n2n) distillation (center). This highlights FORK's superior ability to capture the geometry of the learned potential energy surface from the teacher model.

Abstract

State-of-the-art equivariant Graph Neural Networks (GNNs) have significantly advanced molecular simulation by approaching quantum mechanical accuracy in predicting energies and forces. However, their substantial computational cost limits adoption in large-scale molecular dynamics simulations. Knowledge distillation (KD) offers a promising solution, but existing methods for Machine Learning Force Fields (MLFFs) often resort to simplistic atom-wise feature matching or complex second-order information distillation, overlooking fundamental first-order relational knowledge: how the teacher represents the potential energy surface (PES) through learned interatomic interactions. This paper introduces FORK, First-Order Relational Knowledge Distillation, a novel KD framework that directly distills interatomic relational knowledge by modeling each interaction as a relational vector derived from bonded atom embeddings. FORK employs contrastive learning to train the student to generate relational vectors that are uniquely identifiable with their teacher counterparts, effectively teaching the student the geometry of the teacher's learned PES. On the challenging OC20 benchmark, FORK enables a compact 22M-parameter student model to achieve superior energy and force prediction accuracy, significantly outperforming strong distillation baselines and demonstrating more effective transfer of physical knowledge.

Method

Relational Vectors: FORK models each interatomic interaction as a relational vector derived from bonded atom embeddings (e.g., z_src - z_dst). These vectors serve as proxies for the teacher's learned representation of the potential along specific interactions.
Contrastive Objective: An InfoNCE-style loss trains the student to produce relational vectors that are discriminatively similar to the teacher's corresponding vectors, effectively teaching the student the geometry of these interactions (an illustrative sketch of both steps follows this list).
Physics-Informed Distillation: Unlike conventional atom-wise feature matching, FORK directly distills the fundamental physics of interatomic potentials, focusing on how atoms interact rather than treating them as isolated entities.
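
The two ingredients above can be illustrated with a short PyTorch sketch. This is a minimal illustration under stated assumptions, not the released implementation: the edge_index tensor of bonded pairs, the optional projection head proj, and the temperature value are assumptions; only the difference-of-embeddings relational vector and the InfoNCE pairing of each student edge with its own teacher edge follow the description above.

import torch
import torch.nn.functional as F

def relational_vectors(z, edge_index):
    """One relational vector per bonded pair: the difference of the
    source and destination atom embeddings (z_src - z_dst)."""
    src, dst = edge_index            # edge_index: (2, E) bonded atom pairs (assumed layout)
    return z[src] - z[dst]           # shape (E, d)

def fork_contrastive_loss(z_student, z_teacher, edge_index,
                          proj=None, temperature=0.1):
    """InfoNCE-style objective: each student relational vector should match
    its own teacher counterpart (diagonal) and be dissimilar to the teacher
    vectors of all other edges in the batch (off-diagonal negatives)."""
    r_s = relational_vectors(z_student, edge_index)
    r_t = relational_vectors(z_teacher, edge_index)
    if proj is not None:             # hypothetical head mapping student dim -> teacher dim
        r_s = proj(r_s)
    r_s = F.normalize(r_s, dim=-1)
    r_t = F.normalize(r_t, dim=-1)
    logits = r_s @ r_t.t() / temperature                  # (E, E) similarity matrix
    targets = torch.arange(r_s.size(0), device=r_s.device)
    return F.cross_entropy(logits, targets)               # positives lie on the diagonal

Treating all other edges in the batch as negatives is what makes each relational vector discriminatively identifiable, which is the sense in which the student is taught the geometry of the teacher's learned interactions.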

Overview of FORK

Architecture of FORK.
FORK employs contrastive learning to distill interatomic relational knowledge from teacher to student models, focusing on the geometry of learned potential energy surfaces through relational vectors derived from bonded atom embeddings.
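
In a full training run, the distillation term would be added to the student's usual energy and force objectives. The sketch below, which reuses fork_contrastive_loss from the sketch above, shows one plausible way to combine them; the model interfaces, the frozen no-grad teacher pass, the L1 losses, the batch attributes, and the weighting coefficient lambda_kd are assumptions rather than the paper's reported recipe.

import torch
import torch.nn.functional as F

def training_step(student, teacher, batch, lambda_kd=1.0):
    """One hypothetical distillation step: standard energy/force regression
    plus the FORK relational term. `batch` is assumed to expose .energy,
    .forces, and .edge_index."""
    with torch.no_grad():                      # teacher is frozen during distillation
        _, _, z_t = teacher(batch)             # only the teacher's atom embeddings are used here
    e_s, f_s, z_s = student(batch)             # student energies, forces, atom embeddings

    loss_energy = F.l1_loss(e_s, batch.energy)
    loss_forces = F.l1_loss(f_s, batch.forces)
    loss_kd = fork_contrastive_loss(z_s, z_t, batch.edge_index)

    return loss_energy + loss_forces + lambda_kd * loss_kd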

Results

FORK achieves superior performance in energy and force prediction on the challenging OC20 benchmark. The method enables a compact 22M-parameter student model to significantly outperform strong distillation baselines, including conventional node-to-node feature matching and Hessian-based distillation. FORK demonstrates more effective transfer of physical knowledge by directly modeling the geometry of interatomic interactions learned by the teacher model.

BibTeX

@inproceedings{lim2025fork,
  title={FORK: First-Order Relational Knowledge Distillation for Machine Learning Interatomic Potentials},
  author={Hyukjun Lim and Seokhyun Choung and Jeong Woo Han},
  booktitle={arXiv},
  year={2025},
}