EarthLink is the first AI "copilot" for Earth scientists that can automate the entire research process, enabling systematic, large-scale exploration across over 5 petabytes of cross-disciplinary data.
A comprehensive understanding of Earth is essential to address climate change, but the fragmentation and explosive growth of data make it impossible for scientific discovery to keep pace with planetary change. EarthLink addresses this challenge by providing an autonomous AI agent system that achieves performance comparable to junior scientists across core research tasks, including bias diagnosis and future climate projection.
- Fully Automated Research Pipeline: From hypothesis generation to data analysis and result interpretation
- Massive Data Integration: Access to 5+ petabytes of cross-disciplinary Earth science data (CMIP6, OBS, etc.)
- Novel Discovery Capability: Demonstration of AI autonomously formulating and verifying physical mechanisms
- Multi-Agent Architecture: Specialized agents for planning, recipe generation, diagnostics, and result analysis
EarthLink employs a multi-agent architecture with specialized components:
User Request → Input Guard → Plan Agent → Recipe Agent → Diagnostic Agent → Image Analysis Agent → Results
- Input Guard Agent: Validates and filters user requests for Earth science relevance
- Plan Agent: Generates comprehensive experiment plans based on user requirements
- Recipe Agent: Creates ESMValTool recipes for data processing and analysis
- Diagnostic Agent: Executes diagnostic scripts and monitors the analysis process
- Image Analysis Agent: Interprets results and generates scientific summaries
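The hand-off between these agents can be pictured as a sequential pipeline over a shared state object. The sketch below is illustrative only: `RunState`, `make_stage`, and the keyword-based guard are hypothetical stand-ins, not the classes or logic used in the EarthLink source, where each stage is presumably backed by an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunState:
    """Hypothetical container handed from one stage to the next."""
    request: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Stage = Callable[[RunState], RunState]


def input_guard(state: RunState) -> RunState:
    # Placeholder relevance check (assumption): a simple keyword match.
    keywords = ("temperature", "precipitation", "climate", "tas", "enso")
    if not any(k in state.request.lower() for k in keywords):
        raise ValueError("request rejected: not Earth-science related")
    state.log.append("input_guard")
    return state


def make_stage(name: str, produces: str) -> Stage:
    """Build a dummy stage that records one named artifact."""
    def stage(state: RunState) -> RunState:
        state.artifacts[produces] = f"<{produces} from {name}>"
        state.log.append(name)
        return state
    return stage


# Stage order mirrors the flow described above.
PIPELINE: List[Stage] = [
    input_guard,
    make_stage("plan_agent", "plan"),
    make_stage("recipe_agent", "recipe"),
    make_stage("diagnostic_agent", "figures"),
    make_stage("image_analysis_agent", "summary"),
]


def run_pipeline(request: str) -> RunState:
    state = RunState(request=request)
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Calling `run_pipeline("plot tas map in 2000-12 in E3SM-2-0")` walks all five stages in order, while an unrelated request is stopped at the guard before any plan is generated.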
- Python 3.12.0
- Access to LLM API (e.g., OpenAI GPT-5, or compatible models)
- Access to embedding API
- Tavily API key (for web search capabilities)
- ESMValTool and ESMValCore installed
- Clone the repository
git clone https://github.com/OpenEarthLab/EarthLink.git
cd EarthLink
- Install ESMValTool
conda create -n earthlink python=3.12
conda activate earthlink
# Follow ESMValTool installation guide
# https://docs.esmvaltool.org/en/latest/quickstart/installation.html
conda install esmvaltool=2.12.0
- Install dependencies
pip install -r requirements.txt
- Configure environment variables
cp .env.example .env
# Edit .env with your API keys and configurations
Required environment variables:
DEFAULT_MODEL=gpt-5 # or your preferred model
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=your_openai_api_key
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
EMBEDDING_BASE_URL=your_provider_embedding_api_url
EMBEDDING_API_KEY=your_embedding_api_key
DATA_DIR=./data
TAVILY_API_KEY=your_tavily_api_key # https://tavily.com/
- Prepare data directories and retrieval database
# Create necessary data directories
mkdir -p data/CMIP6 data/OBS data/obs4MIPs
unzip retrieval_db.zip
⚠️ Important Note on Data Access:
For Full Data Access: We recommend using our online platform which provides direct access to the complete 5+ petabytes dataset without requiring local data preparation.
For Local Deployment: If you wish to run the code repository on your own infrastructure, you will need to prepare and download the required climate datasets yourself. We recommend you refer to the ESMValTool Input Data Documentation for data preparation.
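Before the first local run, it can help to confirm that the configuration and data directories from the installation steps are in place. The following is a minimal sketch, not part of the repository: `check_setup` is a hypothetical helper, and it assumes the variables from `.env` have already been exported into the process environment (or are passed in as a dictionary).

```python
import os
from pathlib import Path

# Variable names taken from the "Required environment variables" list above.
REQUIRED_VARS = [
    "DEFAULT_MODEL", "LLM_BASE_URL", "LLM_API_KEY",
    "EMBEDDING_MODEL", "EMBEDDING_BASE_URL", "EMBEDDING_API_KEY",
    "DATA_DIR", "TAVILY_API_KEY",
]
# Subdirectories created by the mkdir step above.
DATA_SUBDIRS = ["CMIP6", "OBS", "obs4MIPs"]


def check_setup(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [
        f"missing environment variable: {name}"
        for name in REQUIRED_VARS
        if not env.get(name)
    ]
    data_dir = env.get("DATA_DIR")
    if data_dir:
        for sub in DATA_SUBDIRS:
            path = Path(data_dir) / sub
            if not path.is_dir():
                problems.append(f"missing data directory: {path}")
    return problems
```

Running `print("\n".join(check_setup()))` from the repository root lists anything still missing. The check only verifies that directories exist; it does not verify that they contain usable datasets.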
To quickly test EarthLink with a pre-configured example:
python -m src.main \
--request "plot tas map in 2000-12 in E3SM-2-0" \
--output_dir ./output_example \
--run_name "example_run"
This will run a simple test case that generates a temperature map. The complete example output is available in output_example/example_run/, which includes:
- Generated experiment plans
- ESMValTool recipe
- Diagnostic scripts
- Output figures and data
- Image analysis results
Run with a text request:
python -m src.main --request "Analyze the annual mean and seasonal cycle of surface air temperature using CMIP6 historical simulations and compare with ERA5 observations from 1980 to 2014."
Run with a request file:
python -m src.main --request request_examples/level1-annual_mean_seasonal_cycle-surface_air_temperature.txt
Run with a custom experiment plan:
python -m src.main --plan path/to/your/experiment_plan.md
Resume from a previous run:
# Resume from plan stage
python -m src.main --resume_from plan --resume_dir path/to/previous/run
# Resume from recipe stage
python -m src.main --resume_from recipe --resume_dir path/to/previous/run
For optimal results, follow these guidelines when writing requests:
- Be Specific: Provide detailed requirements, including:
  - Types of visualizations needed (time series, spatial maps, bar charts, etc.)
  - Time period for analysis (e.g., 1980-2014)
  - Metrics or statistical characteristics (climatology, trends, anomalies, etc.)
- Specify Datasets: Explicitly list the datasets to be used. Refer to:
  - files/avail_CMIP6_data.xlsx: Available CMIP6 datasets
  - files/avail_obs_data.xlsx: Available observational datasets
  - files/avail_obs4mips_data.xlsx: Available obs4MIPs datasets
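One way to apply these guidelines consistently is to assemble the request text from explicit fields, so that the period, datasets, metrics, and figure types are never left out. The helper below is a hypothetical convenience sketch and not part of EarthLink; the wording of the generated sentence is an assumption, and any clear natural-language phrasing should work equally well.

```python
def build_request(variable, start_year, end_year, datasets, plots, metrics):
    """Compose a specific request string from explicit analysis fields."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    if not datasets or not plots or not metrics:
        raise ValueError("datasets, plots, and metrics must be non-empty")
    return (
        f"Analyze the {' and '.join(metrics)} of {variable} "
        f"from {start_year} to {end_year} using {', '.join(datasets)}. "
        f"Produce the following figures: {', '.join(plots)}."
    )


request = build_request(
    variable="surface air temperature",
    start_year=1980,
    end_year=2014,
    datasets=["CMIP6 historical simulations", "ERA5"],
    plots=["time series", "spatial maps"],
    metrics=["annual mean", "seasonal cycle"],
)
```

The resulting string can be passed directly to `--request` or saved to a text file for reuse.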
We provide comprehensive examples organized by complexity level:
Level 1 - Simple Statistical Analysis: Performs basic climatological tasks, including data retrieval, preprocessing, calculation of annual means, spatial distributions, and interannual variability, with visualizations supporting initial model evaluation.
- Annual mean and seasonal cycles
- Climatology and standard deviation
- Change trends
- Regional mapping
- Multi-observation comparisons
Level 2 - Mechanistic Diagnosis: Solves moderately complex climate problems, such as estimating Equilibrium Climate Sensitivity (ECS) and Transient Climate Response (TCR), by understanding the physical diagnostic framework, invoking common analyses of multiple experiment datasets, and applying simple mathematical tools.
- Climate change detection
- Equilibrium climate sensitivity
- Transient climate response
- Future projections
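As an illustration of the kind of "simple mathematical tool" a Level 2 task involves, the sketch below estimates ECS with the widely used Gregory regression: the net top-of-atmosphere radiative imbalance N from an abrupt CO2-quadrupling experiment is regressed on the global-mean surface temperature change ΔT, and ECS is taken as half of the ΔT at which the fitted line reaches N = 0. This is a generic textbook formulation, not necessarily the diagnostic EarthLink generates, and the input series are synthetic values chosen only to make the arithmetic easy to follow.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gregory_ecs(delta_t, net_toa):
    """Estimate ECS (K) from abrupt-4xCO2 anomalies via Gregory regression.

    slope     -> climate feedback parameter (W m-2 K-1), expected negative
    intercept -> effective 4xCO2 forcing (W m-2)
    Dividing by 2 converts the 4xCO2 equilibrium warming to 2xCO2.
    """
    slope, intercept = linear_fit(delta_t, net_toa)
    if slope >= 0:
        raise ValueError("feedback slope must be negative for a stable climate")
    return -intercept / (2.0 * slope)


# Synthetic, noise-free example: N = 8.0 - 1.0 * dT (not model output).
delta_t = [0.5 * i for i in range(1, 11)]
net_toa = [8.0 - 1.0 * t for t in delta_t]
ecs = gregory_ecs(delta_t, net_toa)
```

With these synthetic inputs the fitted forcing is 8 W m-2 and the feedback parameter is -1 W m-2 K-1, so the estimate is 4 K. Real model output is noisy and typically uses 150 years of annual means.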
Level 3 - Complex Scientific Reasoning: Decomposes complex climate analyses into clear, logical subtasks. Integrates advanced analytical methods (e.g., Empirical Orthogonal Function (EOF), composite analysis) with specialized knowledge to study complex phenomena such as El Niño-Southern Oscillation (ENSO) diversity, requiring rigorous methodology and extended reasoning chains.
- ENSO diversity and periodicity
- Atlantic Meridional Overturning Circulation (AMOC)
- Climate change attribution
- Heat budget analysis
Level 4 - Semi-Open Scientific Problem: Automatically selects appropriate datasets based on detailed human problem descriptions, combining physical understanding with adaptive workflows to address open-ended climate problems. Applies constraint methods (e.g., emergent constraints) to identify the constraint factor and produce constrained forecasts and preliminary decision-oriented recommendations.
- Emergent constraint applications
See request_examples/ for detailed examples.
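To work through several example requests in one go, the commands can be generated programmatically. The sketch below only builds the argument lists, using the `--request`, `--output_dir`, and `--run_name` options of `src.main`; the helper names and the assumption that request files use a `.txt` extension are illustrative.

```python
import sys
from pathlib import Path


def build_command(request_file, output_dir):
    """Build the argument list for one EarthLink run."""
    request_file = Path(request_file)
    return [
        sys.executable, "-m", "src.main",
        "--request", str(request_file),
        "--output_dir", str(output_dir),
        # Use the file name (without extension) as the run name.
        "--run_name", request_file.stem,
    ]


def build_batch(request_dir, output_dir):
    """Build one command per .txt request file, in sorted order."""
    return [
        build_command(path, output_dir)
        for path in sorted(Path(request_dir).glob("*.txt"))
    ]
```

Each command can then be executed from the repository root with `subprocess.run(cmd, check=True)`. Running them sequentially keeps API usage and disk load predictable.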
python -m src.main [OPTIONS]
Options:
--request TEXT User request message or path to request file
--plan TEXT Path to experiment plan file (skips plan generation)
--run_name TEXT Custom name for the run
--output_dir TEXT Custom output directory
--resume_from [plan|plan_list|recipe]
Resume from specific stage
--resume_dir TEXT Directory to resume from
Each run creates an organized output directory:
output_dir/
└── [run_name]/
    ├── request.txt              # Original user request
    ├── experiment_plans/        # Generated experiment plans
    │   └── final_plan.md
    ├── recipe.yml               # ESMValTool recipe
    ├── diag_scripts/            # Diagnostic scripts
    ├── output/                  # Final organized outputs
    │   ├── figure/              # Generated figures
    │   └── data/                # Processed data
    ├── image_analysis/          # Image analysis results
    ├── image_analysis_pdf/      # PDF reports
    ├── run_log.log              # Execution log
    └── time_cost.txt            # Performance metrics
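A quick way to see how far a run progressed is to compare its directory against this layout. The following sketch assumes the layout shown above is complete for a successful run; `missing_artifacts` is a hypothetical helper, not a utility shipped with EarthLink.

```python
from pathlib import Path

# Relative paths taken from the output layout above.
EXPECTED_ARTIFACTS = [
    "request.txt",
    "experiment_plans/final_plan.md",
    "recipe.yml",
    "diag_scripts",
    "output/figure",
    "output/data",
    "image_analysis",
    "image_analysis_pdf",
    "run_log.log",
    "time_cost.txt",
]


def missing_artifacts(run_dir):
    """Return the expected files or directories absent from run_dir."""
    run_dir = Path(run_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (run_dir / rel).exists()]
```

For example, `missing_artifacts("output_example/example_run")` returns an empty list when every expected artifact is present. A run that stopped after plan generation would typically report `recipe.yml` and everything after it, which suggests resuming with `--resume_from plan`.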
EarthLink has been successfully applied to:
- Bias Diagnosis: Identifying systematic biases in climate models
- Future Climate Projection: Generating and analyzing climate projections
- Physical Mechanism Discovery: Autonomously identifying Atlantic Niño precursors and formulating interpretable mechanisms
- Multi-Model Evaluation: Comparing performance across CMIP6 model ensembles
- Climate Change Attribution: Analyzing drivers of observed climate changes
If you find this work useful, please cite our paper:
@article{earthlink2025,
title={A self-evolving AI agent system for climate science},
author={Guo, Zijie and Wang, Jiong and Ling, Fenghua and Wei, Wangxu and Yue, Xiaoyu and Jiang, Zhe and Xu, Wanghan and Luo, Jing-Jia and Cheng, Lijing and Ham, Yoo-Geun and others},
journal={arXiv preprint arXiv:2507.17311},
year={2025}
}
For questions, suggestions, or collaborations:
- Emails: guozijie@pjlab.org.cn, lingfenghua@pjlab.org.cn
- Project Page: http://www.openearthlab.com/EarthLink
- Website: https://earthlink.intern-ai.org.cn/
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
You are free to:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made
- NonCommercial: You may not use the material for commercial purposes
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license
For more details, visit CC BY-NC-SA 4.0.
EarthLink builds upon the excellent work of:
- ESMValTool: Earth System Model Evaluation Tool
- CMIP6: Coupled Model Intercomparison Project Phase 6
- Other Earth Science Datasets: Including various observational datasets and reanalysis products
We thank all contributors and the Earth science community for their support.
Built with ❤️ for the Earth Science Community
© 2026 OpenEarthLab. All rights reserved.
