Atom World

Testing LLMs' ability to operate on 3D atomic structures.

"Forget the messy details, I just need a model that can play Lego with atoms." ⚛️🤖


Installation

pip install -e .
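
This assumes you are inside a local clone of the repository. Starting from scratch:

git clone https://github.com/MasterAI-EAM/atomworld.git
cd atomworld
pip install -e .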

Usage of the Benchmark

To run the benchmark on your own model, implement it in src/models/ and add the corresponding parameters in config/models.yaml. Currently, we have implemented openai_model, azure_openai_model, huggingface_model, and vllm_model.
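
A new model wrapper might look like the sketch below. The class name, constructor arguments, and generate() method are assumptions for illustration; check the existing implementations in src/models/ for the actual interface the runner expects.

# Minimal sketch of a custom model wrapper. The interface shown here is
# hypothetical; mirror the existing classes in src/models/ instead.
class MyModel:
    def __init__(self, model_name: str, api_key: str):
        self.model_name = model_name
        self.api_key = api_key

    def generate(self, prompt: str) -> str:
        # Call your model's API here and return the raw text completion.
        raise NotImplementedError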

Run the Benchmark

python ./src/run_benchmark.py -t [benchmark_type] -m [model_name] -a [action_name] -b [batch_size] -n [num_batch]

Arguments:

Argument        Description
benchmark_type  Benchmark to run; see Available Benchmarks.
model_name      Model to test (e.g., deepseek_chat).
action_name     Action to test (see Available Actions); applies only to AtomWorld and PointWorld.
batch_size      Number of parallel LLM calls (default: 50).
num_batch       Number of batches to test (default: all data).
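
For example, to test deepseek_chat on the PointWorld move action with two batches of 50 samples each (the batch numbers are illustrative):

python ./src/run_benchmark.py -t pointworld -m deepseek_chat -a move -b 50 -n 2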

Available Benchmarks

  • atomworld: AtomWorld
  • pointworld: PointWorld
  • cifgen: CIFGen
  • cifrepair: CIFRepair

For the StructProp task, see below.


Available Actions

AtomWorld:

  • add_atom_action
  • change_atom_action
  • delete_around_atom_action
  • delete_below_atom_action
  • insert_between_atoms_action
  • move_around_atom_action
  • move_atom_action
  • move_selected_atoms_action
  • move_towards_atom_action
  • remove_atom_action
  • rotate_around_atom_action
  • swap_atoms_action

PointWorld:

  • move
  • move_towards
  • insert_between
  • rotate_around

StructProp Task

To obtain CIFs from an LLM for StructProp:

python ./src/struct_prop_bench/inferring.py -m [model_name] -p [property] -b [batch_size] -n [num_batch]

Then run your own calculation pipelines. Save the results in a format similar to ./results/StructPropBench/dft_statistics.csv so that ./src/scripts/analyze_structprop_results.py can compute the final metrics, or modify the analysis script to fit your own results.
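
Before running the analysis script, it can help to check that your pipeline's output matches the reference layout. A minimal sketch, assuming pandas and a hypothetical output file my_dft_statistics.csv:

import pandas as pd

# Compare your pipeline's columns against the reference file shipped in the repo.
reference = pd.read_csv("./results/StructPropBench/dft_statistics.csv")
mine = pd.read_csv("my_dft_statistics.csv")  # hypothetical path to your own results
missing = set(reference.columns) - set(mine.columns)
print("missing columns:", missing or "none")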


Analyze the Results

In the current version of the code, results are saved in ./results/[BenchmarkType]/[ModelName]/[ActionName]/[Timestamp]/. evaluation_results.csv contains the correct results, evaluation_wrongs.csv contains the incorrect ones, and metrics.json contains a summary of the metrics.
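
To inspect a run programmatically, something like the following works. The directory names below are placeholders following the pattern above, and the metrics.json keys are not documented here, so the snippet simply prints whatever the file contains.

import json
from pathlib import Path

# Placeholder run directory following ./results/[BenchmarkType]/[ModelName]/[ActionName]/[Timestamp]/
run_dir = Path("./results") / "AtomWorld" / "deepseek_chat" / "move_atom_action" / "2025-01-01_12-00-00"
metrics = json.loads((run_dir / "metrics.json").read_text())
for key, value in metrics.items():
    print(f"{key}: {value}")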

Plotting after evaluation

You can generate an automatic max_dist histogram after a benchmark run by adding the --plot flag to run_benchmark.py. The runner supports plotting for the atomworld, pointworld, and cifgen benchmarks. The plot is saved to the same results folder as evaluation_results.csv and does not open an interactive window by default.

Examples:

python ./src/run_benchmark.py -t atomworld -m deepseek_chat -a move_atom_action -b 10 -n 1 --plot
python ./src/run_benchmark.py -t cifgen -m deepseek_chat -b 10 -n 1 --plot

Construct Your Own Data with mp-api

The actions and data_generator modules are currently being refactored, and the pipeline will be updated soon. To construct your own data, follow the steps below:

  1. (Optional) Download random structures (an example invocation follows this list):
    python src/scripts/download_random_mp_data.py --api_key [YOUR_API_KEY] --out_path [path] --min_natoms [min_atoms] --max_natoms [max_atoms] --num_entries [total_entries]
    The input CIFs we used are available in ./src/data/input_cifs.zip.
  2. Generate data:
    python src/atom_world/data_generator.py
  3. Convert to h5:
    python src/scripts/convert_cifs_to_h5.py
  4. Put the generated [action_name].csv and [action_name].h5 files in ./src/data/. Then you can run the benchmark with your own data.
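
For step 1, a concrete invocation might look like this; the output path and numeric values are illustrative, and you need your own Materials Project API key:

python src/scripts/download_random_mp_data.py --api_key YOUR_API_KEY --out_path ./my_input_cifs --min_natoms 4 --max_natoms 32 --num_entries 500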

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

@misc{lv2025atomworldbenchmarkevaluatingspatial,
      title={AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials}, 
      author={Taoyuze Lv and Alexander Chen and Fengyu Xie and Chu Wu and Jeffrey Meng and Dongzhan Zhou and Bram Hoex and Zhicheng Zhong and Tong Xie},
      year={2025},
      eprint={2510.04704},
      archivePrefix={arXiv},
      primaryClass={cond-mat.mtrl-sci},
      url={https://arxiv.org/abs/2510.04704}, 
}
