Atom World

Testing LLMs' ability to operate on 3D atomic structures.

"Forget the messy details, I just need a model that can play Lego with atoms." ⚛️🤖


Installation

pip install -e .
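
This assumes you are inside a local clone of the repository. Starting from scratch:

git clone https://github.com/MasterAI-EAM/atomworld.git
cd atomworld
pip install -e .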

Usage of the Benchmark

To run the benchmark on your own model, implement it in src/models/ and add the corresponding parameters in config/models.yaml. Currently, we have implemented openai_model, azure_openai_model, huggingface_model, and vllm_model.
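
A new model wrapper might look like the sketch below. The class name, constructor arguments, and generate() method are assumptions for illustration; check the existing implementations in src/models/ for the actual interface the runner expects.

# Minimal sketch of a custom model wrapper. The interface shown here is
# hypothetical; mirror the existing classes in src/models/ instead.
class MyModel:
    def __init__(self, model_name: str, api_key: str):
        self.model_name = model_name
        self.api_key = api_key

    def generate(self, prompt: str) -> str:
        # Call your model's API here and return the raw text completion.
        raise NotImplementedError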

Run the Benchmark

python ./src/run_benchmark.py -t [benchmark_type] -m [model_name] -a [action_name] -b [batch_size] -n [num_batch]

Arguments:

Argument        Description
benchmark_type  Benchmark to run; see Available Benchmarks.
model_name      Model to test (e.g., deepseek_chat).
action_name     Action to test (see Available Actions); applies only to AtomWorld and PointWorld.
batch_size      Number of parallel LLM calls (default: 50).
num_batch       Number of batches to test (default: all data).
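
For example, to test deepseek_chat on the PointWorld move action with two batches of 50 samples each (the batch numbers are illustrative):

python ./src/run_benchmark.py -t pointworld -m deepseek_chat -a move -b 50 -n 2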

Available Benchmarks

  • atomworld: AtomWorld
  • pointworld: PointWorld
  • cifgen: CIFGen
  • cifrepair: CIFRepair

For the StructProp task, see below.


Available Actions

AtomWorld:

  • add_atom_action
  • change_atom_action
  • delete_around_atom_action
  • delete_below_atom_action
  • insert_between_atoms_action
  • move_around_atom_action
  • move_atom_action
  • move_selected_atoms_action
  • move_towards_atom_action
  • remove_atom_action
  • rotate_around_atom_action
  • swap_atoms_action

PointWorld:

  • move
  • move_towards
  • insert_between
  • rotate_around

StructProp Task

To obtain CIFs from an LLM for StructProp:

python ./src/struct_prop_bench/inferring.py -m [model_name] -p [property] -b [batch_size] -n [num_batch]

Then run your own calculation pipelines. Save the results in a format similar to ./results/StructPropBench/dft_statistics.csv so that ./src/scripts/analyze_structprop_results.py can compute the final metrics, or modify the analysis script to fit your own results.
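
Before running the analysis script, it can help to check that your pipeline's output matches the reference layout. A minimal sketch, assuming pandas and a hypothetical output file my_dft_statistics.csv:

import pandas as pd

# Compare your pipeline's columns against the reference file shipped in the repo.
reference = pd.read_csv("./results/StructPropBench/dft_statistics.csv")
mine = pd.read_csv("my_dft_statistics.csv")  # hypothetical path to your own results
missing = set(reference.columns) - set(mine.columns)
print("missing columns:", missing or "none")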


Analyze the Results

In the current version of the code, results are saved in ./results/[BenchmarkType]/[ModelName]/[ActionName]/[Timestamp]/. evaluation_results.csv contains the correct results, evaluation_wrongs.csv contains the incorrect ones, and metrics.json contains a summary of the metrics.
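
To inspect a run programmatically, something like the following works. The directory names below are placeholders following the pattern above, and the metrics.json keys are not documented here, so the snippet simply prints whatever the file contains.

import json
from pathlib import Path

# Placeholder run directory following ./results/[BenchmarkType]/[ModelName]/[ActionName]/[Timestamp]/
run_dir = Path("./results") / "AtomWorld" / "deepseek_chat" / "move_atom_action" / "2025-01-01_12-00-00"
metrics = json.loads((run_dir / "metrics.json").read_text())
for key, value in metrics.items():
    print(f"{key}: {value}")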

Plotting after evaluation

You can generate an automatic max_dist histogram after a benchmark run by adding the --plot flag to run_benchmark.py. The runner supports plotting for the atomworld, pointworld, and cifgen benchmarks. The plot is saved to the same results folder as evaluation_results.csv and does not open an interactive window by default.

Examples:

python ./src/run_benchmark.py -t atomworld -m deepseek_chat -a move_atom_action -b 10 -n 1 --plot
python ./src/run_benchmark.py -t cifgen -m deepseek_chat -b 10 -n 1 --plot

Construct Your Own Data with mp-api

The actions and data_generator modules are currently being refactored, and the pipeline will be updated soon. To construct your own data, follow the steps below:

  1. (Optional) Download random structures (an example invocation follows this list):
    python src/scripts/download_random_mp_data.py --api_key [YOUR_API_KEY] --out_path [path] --min_natoms [min_atoms] --max_natoms [max_atoms] --num_entries [total_entries]
    The input CIFs we used are available in ./src/data/input_cifs.zip.
  2. Generate data:
    python src/atom_world/data_generator.py
  3. Convert to h5:
    python src/scripts/convert_cifs_to_h5.py
  4. Put the generated [action_name].csv and [action_name].h5 files in ./src/data/. Then you can run the benchmark with your own data.
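
For step 1, a concrete invocation might look like this; the output path and numeric values are illustrative, and you need your own Materials Project API key:

python src/scripts/download_random_mp_data.py --api_key YOUR_API_KEY --out_path ./my_input_cifs --min_natoms 4 --max_natoms 32 --num_entries 500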

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

@misc{lv2025atomworldbenchmarkevaluatingspatial,
      title={AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials}, 
      author={Taoyuze Lv and Alexander Chen and Fengyu Xie and Chu Wu and Jeffrey Meng and Dongzhan Zhou and Bram Hoex and Zhicheng Zhong and Tong Xie},
      year={2025},
      eprint={2510.04704},
      archivePrefix={arXiv},
      primaryClass={cond-mat.mtrl-sci},
      url={https://arxiv.org/abs/2510.04704}, 
}
