The OpenLAM Challenges

How a Global Hunt for Crystal Structures is Powering the Next Scientific Revolution

AI4Science Materials Discovery Open Source

Introduction

In the world of artificial intelligence, a quiet revolution is underway—one that is shifting focus from the words we speak to the very atoms that make up our physical world. Just as large language models like GPT have transformed how we work with text, a new class of artificial intelligence known as Large Atom Models (LAMs) is emerging to reshape our understanding of the molecular universe.

As the oft-quoted lament puts it, "We wanted flying cars, instead we got 140 characters" [5]. The line, a jab at Twitter, highlights the gap between technological progress in the digital and physical realms.

At the forefront of this movement is the OpenLAM Initiative, an ambitious, community-driven project that aims to "Conquer the Periodic Table" by developing open-source foundation models capable of simulating and designing materials at the atomic level.

The significance of this endeavor extends far beyond academic curiosity. The development of new materials, whether for more efficient batteries, smarter pharmaceuticals, or advanced semiconductors, has traditionally been a slow, expensive process of trial and error. OpenLAM seeks to shorten this cycle by creating AI infrastructure that can dramatically accelerate scientific discovery and materials design.

Accelerated Discovery

Dramatically reducing the time needed for materials development from years to months or weeks.

Collaborative Science

Harnessing global collective intelligence through open challenges and shared datasets.

The Rise of Large Atom Models: From Language to Matter

What are Large Atom Models?

Large Atom Models (LAMs) are sophisticated AI systems designed to understand and predict the behavior of atomic systems. Just as large language models learn the patterns and relationships between words, LAMs learn the fundamental physical principles that govern how atoms interact with each other.

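These models approximate the universal potential energy surface (PES): the function that maps any arrangement of atoms, given by their chemical elements and positions, to the system's potential energy [4]. The forces on the atoms are the negative gradient of this energy with respect to their positions, so one model of the PES yields both energies and forces.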
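
To make "energy as a function of atomic positions" concrete, here is a toy sketch in Python. It is not a LAM. A hand-written Lennard-Jones pair potential stands in for the learned energy function; it is an assumption chosen purely for illustration and uses reduced units. The names `energy`, `forces` and `numerical_force` are hypothetical helpers. The finite-difference check confirms the defining relation that forces are the negative gradient of the energy. A LAM replaces the formula with a neural network fitted to quantum-mechanical reference data.

```python
import math
from typing import List, Sequence

# Toy stand-in for a learned energy model: a classical Lennard-Jones pair potential
# in reduced (dimensionless) units. A LAM would replace this hand-written formula
# with a neural network trained on quantum-mechanical reference data.
EPSILON = 1.0  # depth of the pair energy well
SIGMA = 1.0    # distance at which the pair energy crosses zero


def energy(positions: Sequence[Sequence[float]]) -> float:
    """Potential energy E(r_1, ..., r_N): one number for one atomic arrangement."""
    e = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            sr6 = (SIGMA / math.dist(positions[i], positions[j])) ** 6
            e += 4.0 * EPSILON * (sr6 * sr6 - sr6)
    return e


def forces(positions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Analytic forces F_i = -dE/dr_i for the same potential."""
    n = len(positions)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (SIGMA / r) ** 6
            de_dr = 4.0 * EPSILON * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r  # dE_pair/dr
            for k in range(3):
                fk = -de_dr * d[k] / r  # chain rule: dr/dr_i = (r_i - r_j) / r
                f[i][k] += fk
                f[j][k] -= fk           # equal and opposite force on atom j
    return f


def numerical_force(positions, atom: int, axis: int, h: float = 1e-6) -> float:
    """Central finite-difference estimate of -dE/dr for one coordinate of one atom."""
    plus = [list(p) for p in positions]
    minus = [list(p) for p in positions]
    plus[atom][axis] += h
    minus[atom][axis] -= h
    return -(energy(plus) - energy(minus)) / (2.0 * h)


trimer = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.5, 0.9, 0.0)]
print(energy(trimer))
print(forces(trimer)[0][0], numerical_force(trimer, 0, 0))  # analytic vs. numerical
```

The two numbers on the last line should agree to many decimal places. That agreement is exactly what "forces are the negative gradient of the energy" means.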

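Traditional vs. LAM Approach

  • Traditional methods (DFT): computationally intensive, taking days or weeks for complex systems; the cost of conventional density functional theory grows roughly with the cube of the number of electrons.
  • LAM approach: similar calculations in a fraction of the time, with remarkable accuracy [8].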
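
A large part of the speed difference comes from locality. Deep Potential-style models, like most machine-learned interatomic potentials, write the total energy as a sum of per-atom contributions. Each contribution depends only on the neighbors within a cutoff radius $r_c$, and on the chemical species involved. At fixed density each atom has a bounded number of neighbors, so evaluating energies and forces costs time linear in the number of atoms $N$. Conventional Kohn-Sham DFT instead scales roughly as the cube of the number of electrons $N_e$. The form below is a generic sketch; the per-atom function $E_i$ and its descriptors differ between architectures such as DPA-2.

```latex
\begin{aligned}
E(\mathbf{r}_1,\dots,\mathbf{r}_N) &\approx \sum_{i=1}^{N} E_i\bigl(\{\mathbf{r}_j-\mathbf{r}_i : \lVert\mathbf{r}_j-\mathbf{r}_i\rVert < r_c\}\bigr),
\qquad \mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i},\\
\text{cost}_{\text{local model}} &= O(N), \qquad \text{cost}_{\text{Kohn-Sham DFT}} \sim O(N_e^{3}).
\end{aligned}
```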

The AI4Science Movement

The development of LAMs is part of the broader AI for Science (AI4Science) movement, which applies advanced machine learning techniques to long-standing scientific challenges.

Professor E Weinan of Peking University puts it this way: "AI4Science has enormous potential and will comprehensively transform the process from scientific research to industrial application" [8].

This transformation is already visible across multiple domains, from AI systems that can predict weather patterns with unprecedented accuracy to models that are helping astronomers analyze cosmic data thousands of times faster than previously possible [8].

Inside the OpenLAM Initiative: An Open-Source Crusade

Origins and Vision

The OpenLAM Initiative was formally launched by the Deep Potential team in early 2024, though its roots trace back to 2022, when the team began actively pretraining LAMs [2][3]. The project's ambitious slogan, "Conquer the Periodic Table!", reflects its comprehensive scope: to create an open-source ecosystem around large atomic models that can span the entire periodic table [5].

The initiative operates on a simple but powerful premise: that open collaboration will accelerate the development of more robust and capable atomic models. By sharing curated datasets, algorithms, and relevant workflows, the project aims to democratize access to cutting-edge AI tools for scientific discovery.

OpenLAM Roadmap

  • 2024: Universal property learning capability
  • 2025: Universal cross-modal capability
  • 2026: Target-oriented atomic-scale universal generation and planning capability [5]

Community-Driven Infrastructure

OpenLAM is more than a research project; it is a growing ecosystem with multiple components:

  • Model development: regular updates to architectures and training strategies [7]
  • Data curation: "LAM-ready" datasets for pretraining and evaluation [5]
  • Competitions: benchmarking of atomic modeling methods [2]
  • Education: events for developers and users [5]

The Crystal Philately Competition: A Case Study in Collaborative Science

The Competition Framework

At the heart of the OpenLAM Challenges is the LAM Crystal Philately competition, an innovative approach to building a comprehensive database of crystal structures. The competition's name evokes philately (stamp collecting), but instead of stamps, participants "collect" unique atomic configurations with arbitrary chemical compositions [2][3].

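The competition mechanics are straightforward. Participants submit proposed crystal structures, which a LAM then validates against energy and force criteria. The stability of each structure is assessed using the OpenLAM convex hull, which is built from all structures in the database [2][3]. When formation energy is plotted against chemical composition, the lower convex envelope of the points marks the configurations that are thermodynamically stable at zero temperature. For those configurations, no combination of other known structures with the same overall composition has a lower energy. A structure's vertical distance above this envelope, its "energy above hull", measures how far it is from stability.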
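
For a binary system A-B, the hull construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenLAM's implementation. Each structure is reduced to a point (x, E_f), where x is the fraction of element B and E_f is the formation energy per atom. E_f is assumed to be already computed, for example by a LAM, relative to the pure elements, which sit at E_f = 0. The lower hull uses the lower half of Andrew's monotone-chain algorithm. The energy above hull is the vertical gap between a point and the hull segment beneath it. The helper names are hypothetical. Systems with three or more elements need a higher-dimensional hull; pymatgen provides one through its `PhaseDiagram` class.

```python
from typing import List, Tuple

# A point is (x, e_form): x = fraction of element B in an A-B compound,
# e_form = formation energy per atom (eV/atom) relative to pure A and pure B.
Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull via the lower half of Andrew's monotone chain."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Remove the last hull point while it lies on or above the segment from
        # hull[-2] to p (clockwise or collinear turn): it cannot be on the lower hull.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x: float, e_form: float, hull: List[Point]) -> float:
    """Vertical distance from (x, e_form) down to the hull (0.0 means on the hull)."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x1 == x0:
            continue  # skip a vertical segment (two entries at the same composition)
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return e_form - (e0 + t * (e1 - e0))
    raise ValueError("composition lies outside the range covered by the hull")


# Made-up example: pure elements at x = 0 and x = 1, plus four candidate structures.
points = [(0.0, 0.0), (1.0, 0.0), (0.25, -0.10), (0.5, -0.30), (0.75, -0.05), (0.5, -0.20)]
hull = lower_hull(points)
for x, e in points:
    print(f"x={x:.2f}  E_f={e:+.2f} eV/atom  above hull={energy_above_hull(x, e, hull):.3f}")
```

With these made-up points, the structure at x = 0.5 with E_f = -0.30 eV/atom lies on the hull. The other three candidates sit 0.05 to 0.10 eV/atom above it.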

Competition Workflow
  1. Participants submit crystal structures
  2. LAM validates structures based on energy and force criteria
  3. Stability assessed using OpenLAM convex hull
  4. Stable structures added to the database
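
Step 2 of the workflow can be illustrated with a short filter. Everything here is hypothetical, because this article does not spell out the competition's actual energy and force criteria. The sketch rejects non-finite energies. It also requires the largest per-atom force predicted by the LAM to fall below an assumed threshold, since a structure near a local energy minimum should feel almost no net force. Because the competition rewards unique configurations, the sketch also drops repeats using a deliberately crude fingerprint. A real pipeline would compare structures geometrically, for example with pymatgen's `StructureMatcher`. The names `Submission`, `passes_checks`, `collect` and `MAX_FORCE_EV_PER_ANGSTROM` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical threshold, chosen for illustration only (not an OpenLAM value).
MAX_FORCE_EV_PER_ANGSTROM = 0.05


@dataclass
class Submission:
    formula: str                                   # reduced formula, e.g. "NaCl" (assumed)
    energy_per_atom: float                         # LAM-predicted energy, eV/atom
    forces: Sequence[Tuple[float, float, float]]   # LAM-predicted force on each atom, eV/Å


def passes_checks(s: Submission) -> bool:
    """Hypothetical energy and force criteria for one submitted structure."""
    if not math.isfinite(s.energy_per_atom) or not s.forces:
        return False
    # Largest force magnitude on any atom; close to zero for a relaxed structure.
    max_force = max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in s.forces)
    return max_force <= MAX_FORCE_EV_PER_ANGSTROM


def collect(submissions: List[Submission]) -> List[Submission]:
    """Keep submissions that pass the checks and do not repeat an earlier one."""
    seen = set()
    accepted = []
    for s in submissions:
        if not passes_checks(s):
            continue
        # Crude fingerprint: same formula and same energy after rounding to 1 meV/atom.
        key = (s.formula, round(s.energy_per_atom, 3))
        if key in seen:
            continue
        seen.add(key)
        accepted.append(s)
    return accepted
```

Calling `collect` on a list of `Submission` objects returns, in submission order, the first copy of each structure that passes the checks. In the workflow above, those survivors would then go on to the convex-hull step.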

Impressive Results

The first round of the Crystal Philately competition has yielded striking results, collecting over 19.8 million valid structures, including approximately 350,000 on the OpenLAM convex hull [2][3].

Metric | Mid-2024 (2024 Q3 report) | First-round total
Total valid structures | 13+ million [7] | 19.8 million [2][3]
Structures on convex hull | Not reported | ~350,000 [2][3]
Contributed directly by participants | 5+ million [7] | Not reported

The 2024 Q3 report captured an intermediate snapshot. By mid-2024 the competition database already held over 13 million crystal structures, more than 5 million of them contributed directly by participants [7]. All structure information in the database is open source and can be accessed either through a Python API or through a dedicated application called CrystalCraft, which supports multiple search functions and structure analysis [7].

The Scientist's Toolkit: Key Components of the OpenLAM Ecosystem

The OpenLAM initiative brings together a sophisticated collection of computational tools and frameworks that enable researchers to participate in this scientific frontier.

Tool/Component | Function | Significance
DeePMD-kit | Software package for training deep-learning interatomic potentials and running them in molecular dynamics simulations (e.g. through LAMMPS) | Provides the foundation for training and running Deep Potential models [7]
DPA-2 Architecture | Neural network design for large atomic models | Incorporates three-body encoding information for improved accuracy [7]
LAMBench | Benchmarking system for evaluating LAMs | Enables standardized comparison of different models across domains [4]
OpenLAM API | Programming interface for accessing structure data | Allows researchers to programmatically query the competition database [7]
CrystalCraft App | Application for visualizing and analyzing crystal structures | Provides user-friendly access to the growing database of structures [7]

Measuring Progress: Benchmarking and Performance Gains

The LAMBench Framework

As the field of Large Atom Models has expanded, the need for standardized evaluation has become increasingly important. The LAMBench benchmarking system was developed to address this need, providing a comprehensive framework for evaluating LAMs in terms of their generalizability, adaptability, and applicability [4].

LAMBench assesses models across three critical dimensions:

  • Generalizability: How well a model performs on atomistic systems not included in its training data
  • Adaptability: A model's capacity to be fine-tuned for tasks beyond basic potential energy prediction
  • Applicability: The stability and efficiency of deploying LAMs in real-world simulations [4]
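
To make "generalizability" concrete, the sketch below scores a model on held-out domains, that is, groups of systems absent from its training data. The scoring rule is an assumption made for illustration and is not LAMBench's actual metric. For each domain, it computes the energy RMSE of the model and of a reference baseline, takes their ratio, and averages the ratios. Lower is better, and 1.0 means no improvement over the baseline. Normalizing per domain keeps a domain with large absolute errors from dominating the average. The function names and domain labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (predicted, reference) energy pairs for the structures in one held-out domain.
Pairs = List[Tuple[float, float]]


def rmse(pairs: Pairs) -> float:
    """Root-mean-square error between predicted and reference values."""
    return math.sqrt(sum((pred - ref) ** 2 for pred, ref in pairs) / len(pairs))


def generalizability_score(model: Dict[str, Pairs], baseline: Dict[str, Pairs]) -> float:
    """Hypothetical score: mean over domains of RMSE(model) / RMSE(baseline).

    Both dictionaries map a domain name to that domain's (predicted, reference)
    pairs and are assumed to cover the same domains. Lower is better.
    """
    ratios = [rmse(model[domain]) / rmse(baseline[domain]) for domain in model]
    return sum(ratios) / len(ratios)


# Made-up numbers for two held-out domains.
model_preds = {"alloys": [(1.0, 1.1), (2.0, 1.9)], "molecules": [(0.0, 0.2)]}
baseline_preds = {"alloys": [(1.0, 1.5), (2.0, 1.5)], "molecules": [(0.0, 0.4)]}
print(generalizability_score(model_preds, baseline_preds))
```

Here the per-domain ratios are 0.2 and 0.5, so the script prints a value of about 0.35.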

Tangible Performance Improvements

The OpenLAM Initiative has demonstrated steady progress in improving the accuracy and efficiency of its models. The 2024 Q3 report highlighted substantial gains in both accuracy and speed for the DPA-2 model [7].

Metric | DPA-2-b3 (previous) | DPA-2-b4-medium (new) | Improvement
Energy RMSE (weighted) | 18.5 meV/atom | 13.1 meV/atom | ~29% lower
Force RMSE (weighted) | 130.8 meV/Å | 113.1 meV/Å | ~14% lower
Training time (100 steps) | 15.9 s | 8.4 s | ~47% less time (~1.9× faster)
Inference time (100 runs) | 6.3 s | 3.7 s | ~41% less time (~1.7× faster)
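
The improvement figures can be recomputed from the raw values in the table. For the error rows, the natural measure is the relative reduction. For the timing rows, two measures are useful: the relative reduction in time and the speedup factor (old time divided by new time). They are not the same number: cutting run time by 47% corresponds to roughly a 1.9× speedup. The short script below, with illustrative helper names, does the arithmetic.

```python
# Values copied from the DPA-2 comparison table (previous b3 vs. new b4-medium).
ERROR_ROWS = [
    ("Energy RMSE (meV/atom)", 18.5, 13.1),
    ("Force RMSE (meV/Å)", 130.8, 113.1),
]
TIMING_ROWS = [
    ("Training, 100 steps (s)", 15.9, 8.4),
    ("Inference, 100 runs (s)", 6.3, 3.7),
]


def relative_reduction(old: float, new: float) -> float:
    """Fraction by which a quantity (an error or a run time) dropped."""
    return (old - new) / old


def speedup(old_time: float, new_time: float) -> float:
    """How many times faster the new run is: old time divided by new time."""
    return old_time / new_time


for name, old, new in ERROR_ROWS:
    print(f"{name}: {100 * relative_reduction(old, new):.1f}% lower")
for name, old, new in TIMING_ROWS:
    print(f"{name}: {100 * relative_reduction(old, new):.1f}% less time, "
          f"{speedup(old, new):.2f}x speedup")
```

Running it reports reductions of 29.2% in energy error and 13.5% in force error. Training takes 47.2% less time and inference 41.3% less, which correspond to speedups of 1.89× and 1.70×.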

These improvements are particularly significant because accuracy and speed improved together rather than one at the expense of the other. This suggests the OpenLAM team is successfully navigating the trade-off between accuracy and computational efficiency, a critical challenge in developing practical AI tools for scientific research.

Real-World Impact: From Algorithms to Applications

The ultimate test of any scientific tool lies in its ability to solve real-world problems, and here, the OpenLAM Initiative and related AI4Science approaches are already showing remarkable promise.

Accelerating Materials Discovery

A research team in China trained an AI system for catalyst screening. From an initial pool of more than 14,000 candidates, the system identified four formulas that gave highly satisfactory results.
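
The timescale in the quote below follows from simple arithmetic. It assumes one experiment per candidate, which the quote implies but does not state, and it uses the quoted pace of about 20 experiments per analyst per year.

```latex
\frac{14\,000\ \text{candidates}}{20\ \text{analysts} \times 20\ \text{experiments per analyst per year}} = 35\ \text{years}
```

Against that baseline, six months is roughly a 70-fold reduction in elapsed time.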

Liao Zengtai of Wanhua Chemical described the difference: "Using conventional approaches, a seasoned analyst could perform approximately 20 experiments annually. To explore such a vast array of candidates, it would have required the efforts of 20 analysts over the span of 35 years... However, with the adoption of AI, this extensive endeavor was completed in a mere six months" [8].

Transforming Drug Development

In the pharmaceutical industry, companies like MindRank have leveraged AI drug discovery platforms to identify preclinical drug candidates in record time.

Their system identified a promising molecule for treating obesity and type 2 diabetes from nearly 100 candidates in just eight months. The resulting drug candidate, MDR-001, has since received clinical trial approvals in both China and the United States [8].

Solving Fundamental Scientific Puzzles

OpenLAM approaches are also being applied to fundamental scientific challenges such as the growth of lithium dendrites in batteries. These branch-like metal deposits can short-circuit a lithium battery and render it inoperative, yet how they grow has remained poorly understood.

As Zhang Linfeng of the AI for Science Institute put it: "The use of AI, particularly large language models, offers a promising avenue to achieve both precision and efficiency in this challenging domain" [8].

Conclusion: The Future of Atomic-Scale Science

The OpenLAM Challenges represent more than just a series of technical competitions—they embody a fundamental shift in how scientific research is conducted and who gets to participate in the process. By creating an open, collaborative ecosystem around Large Atom Models, the initiative is democratizing access to cutting-edge research tools that were previously available only to well-funded institutions.

The progress achieved through the Crystal Philately competition and related efforts demonstrates the power of this approach. With millions of validated structures added to community databases and consistent improvements in model performance, the project is building momentum toward its ambitious goal of "conquering the periodic table."

Looking Ahead

As the initiative continues to evolve, its focus on openness, standardization, and community engagement offers a compelling model for approaching other complex scientific challenges. As the OpenLAM team frames its vision, the ultimate goal is to achieve "Large Atom Embodied Intelligence" for atomic-scale intelligent scientific discovery and synthetic design within 5 to 10 years [5].

If the current pace of progress is any indication, this vision may be closer than we think—promising a future where AI-powered discovery unlocks new materials, medicines, and technologies that today exist only in our imagination.

References