
OpenEarthLab/EarthLink

EarthLink: A Self-Evolving AI Agent System for Climate Science


📖 Overview

EarthLink is the first AI "copilot" for Earth scientists that can automate the entire research process, enabling systematic, large-scale exploration across over 5 petabytes of cross-disciplinary data.

A comprehensive understanding of Earth is essential to address climate change, but the fragmentation and explosive growth of data make it impossible for scientific discovery to keep pace with planetary change. EarthLink addresses this challenge by providing an autonomous AI agent system that achieves performance comparable to junior scientists across core research tasks, including bias diagnosis and future climate projection.

🌟 Key Highlights

  • Fully Automated Research Pipeline: From hypothesis generation to data analysis and result interpretation
  • Massive Data Integration: Access to 5+ petabytes of cross-disciplinary Earth science data (CMIP6, OBS, etc.)
  • Novel Discovery Capability: Demonstration of AI autonomously formulating and verifying physical mechanisms
  • Multi-Agent Architecture: Specialized agents for planning, recipe generation, diagnostics, and result analysis

🏗️ System Architecture

EarthLink employs a multi-agent architecture with specialized components:

User Request → Input Guard → Plan Agent → Recipe Agent → Diagnostic Agent → Image Analysis Agent → Results
  1. Input Guard Agent: Validates and filters user requests for Earth science relevance
  2. Plan Agent: Generates comprehensive experiment plans based on user requirements
  3. Recipe Agent: Creates ESMValTool recipes for data processing and analysis
  4. Diagnostic Agent: Executes diagnostic scripts and monitors the analysis process
  5. Image Analysis Agent: Interprets results and generates scientific summaries
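The hand-off between agents can be pictured as a chain of stages that each enrich a shared run state, with the input guard able to stop a run early. The Python sketch below is an illustration only: `RunState`, `RejectedRequest` and the stage functions are hypothetical stand-ins (the real agents call an LLM and ESMValTool), and the keyword test in `input_guard` is a crude placeholder for EarthLink's LLM-based relevance check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunState:
    """Hypothetical container passed from one agent to the next."""
    request: str
    artifacts: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


class RejectedRequest(Exception):
    """Raised when the input guard judges a request out of scope."""


def input_guard(state: RunState) -> RunState:
    # Placeholder relevance test; the real guard is LLM-based, not keyword-based.
    keywords = ("climate", "temperature", "precipitation", "cmip6", "enso", "tas")
    if not any(word in state.request.lower() for word in keywords):
        raise RejectedRequest(state.request)
    state.log.append("input_guard: accepted")
    return state


def plan_agent(state: RunState) -> RunState:
    state.artifacts["plan"] = f"Experiment plan for: {state.request}"
    state.log.append("plan_agent: plan drafted")
    return state


def recipe_agent(state: RunState) -> RunState:
    state.artifacts["recipe"] = "recipe.yml"  # stands in for an ESMValTool recipe
    state.log.append("recipe_agent: recipe written")
    return state


def diagnostic_agent(state: RunState) -> RunState:
    state.artifacts["figures"] = ["figure_1.png"]  # stands in for real diagnostic output
    state.log.append("diagnostic_agent: diagnostics run")
    return state


def image_analysis_agent(state: RunState) -> RunState:
    figures = state.artifacts["figures"]
    state.artifacts["summary"] = f"Interpreted {len(figures)} figure(s)"
    state.log.append("image_analysis_agent: summary written")
    return state


PIPELINE: List[Callable[[RunState], RunState]] = [
    input_guard, plan_agent, recipe_agent, diagnostic_agent, image_analysis_agent,
]


def run_pipeline(request: str) -> RunState:
    state = RunState(request=request)
    for stage in PIPELINE:  # strictly sequential, following the diagram above
        state = stage(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("plot tas map in 2000-12 in E3SM-2-0")
    print("\n".join(result.log))
```

Keeping each stage as a plain function over one state object makes it easy to see why a run can be resumed part-way: everything a later stage needs is already stored in the state produced by the earlier ones.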

🚀 Quick Start

Prerequisites

  • Python 3.12.0
  • Access to LLM API (e.g., OpenAI GPT-5, or compatible models)
  • Access to embedding API
  • Tavily API key (for web search capabilities)
  • ESMValTool and ESMValCore installed

Installation

  1. Clone the repository
git clone https://github.com/OpenEarthLab/EarthLink.git
cd EarthLink
  2. Create a conda environment and install ESMValTool
conda create -n earthlink python=3.12
conda activate earthlink
# Follow ESMValTool installation guide
# https://docs.esmvaltool.org/en/latest/quickstart/installation.html
conda install -c conda-forge esmvaltool=2.12.0
  3. Install dependencies
pip install -r requirements.txt
  4. Configure environment variables
cp .env.example .env
# Edit .env with your API keys and configurations

Required environment variables:

DEFAULT_MODEL=gpt-5  # or your preferred model
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=your_openai_api_key
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
EMBEDDING_BASE_URL=your_provider_embedding_api_url
EMBEDDING_API_KEY=your_embedding_api_key
DATA_DIR=./data
TAVILY_API_KEY=your_tavily_api_key # https://tavily.com/
  5. Prepare data directories and extract the retrieval database
# Create necessary data directories
mkdir -p data/CMIP6 data/OBS data/obs4MIPs

unzip retrieval_db.zip
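Before the first run it can help to confirm that `.env` is filled in and that the data directories exist. The sketch below relies only on the variable and directory names listed above; `parse_env_file` and `check_setup` are hypothetical helpers, the parser is a simplified stand-in for a proper dotenv loader (it ignores quoting and escapes), and treating values that still start with `your_` as unfilled placeholders is an assumption based on the example values.

```python
from pathlib import Path
from typing import Dict, List

# Variable and directory names copied from the installation steps.
REQUIRED_VARS = (
    "DEFAULT_MODEL", "LLM_BASE_URL", "LLM_API_KEY",
    "EMBEDDING_MODEL", "EMBEDDING_BASE_URL", "EMBEDDING_API_KEY",
    "DATA_DIR", "TAVILY_API_KEY",
)
DATA_SUBDIRS = ("CMIP6", "OBS", "obs4MIPs")


def parse_env_file(path) -> Dict[str, str]:
    """Read KEY=VALUE lines, skipping blanks, comments and trailing ' # ...' notes."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def check_setup(env_path, base_dir=".") -> List[str]:
    """Return human-readable problems; an empty list means the setup looks complete."""
    problems: List[str] = []
    env = parse_env_file(env_path)
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value.startswith("your_"):
            problems.append(f"{name} is missing or still a placeholder")
    data_dir = Path(base_dir) / env.get("DATA_DIR", "./data")
    for sub in DATA_SUBDIRS:
        if not (data_dir / sub).is_dir():
            problems.append(f"missing data directory: {data_dir / sub}")
    return problems


if __name__ == "__main__":
    for problem in check_setup(".env"):
        print("setup:", problem)
```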

⚠️ Important Note on Data Access:

  • For Full Data Access: We recommend using our online platform, which provides direct access to the complete 5+ petabyte dataset without requiring local data preparation.

  • For Local Deployment: If you run EarthLink on your own infrastructure, you will need to download and prepare the required climate datasets yourself. See the ESMValTool Input Data Documentation for guidance on data preparation.

Quick Test

To quickly test EarthLink with a pre-configured example:

python -m src.main \
    --request "plot tas map in 2000-12 in E3SM-2-0" \
    --output_dir ./output_example \
    --run_name "example_run"

This runs a simple test case that generates a map of near-surface air temperature (tas) for December 2000 from the E3SM-2-0 model. The complete example output is available in output_example/example_run/, which includes:

  • Generated experiment plans
  • ESMValTool recipe
  • Diagnostic scripts
  • Output figures and data
  • Image analysis results

Basic Usage

Run with a text request:

python -m src.main --request "Analyze the annual mean and seasonal cycle of surface air temperature using CMIP6 historical simulations and compare with ERA5 observations from 1980 to 2014."

Run with a request file:

python -m src.main --request request_examples/level1-annual_mean_seasonal_cycle-surface_air_temperature.txt

Run with a custom experiment plan:

python -m src.main --plan path/to/your/experiment_plan.md

Resume from a previous run:

# Resume from plan stage
python -m src.main --resume_from plan --resume_dir path/to/previous/run

# Resume from recipe stage
python -m src.main --resume_from recipe --resume_dir path/to/previous/run
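When many requests need to be run, for example every file in request_examples/, it may be convenient to drive the CLI from Python. The sketch below only assembles the flags documented in this README and hands them to `subprocess.run`; `build_command` and `run_request_files` are hypothetical helpers, and it assumes the request files are `.txt` files directly inside the folder and that the script is launched from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional

RESUME_STAGES = ("plan", "plan_list", "recipe")  # from the Command-Line Options list


def build_command(request: Optional[str] = None, plan: Optional[str] = None,
                  run_name: Optional[str] = None, output_dir: Optional[str] = None,
                  resume_from: Optional[str] = None,
                  resume_dir: Optional[str] = None) -> List[str]:
    """Translate keyword arguments into an argv list for `python -m src.main`."""
    if resume_from is not None and resume_from not in RESUME_STAGES:
        raise ValueError(f"unknown resume stage: {resume_from!r}")
    cmd = [sys.executable, "-m", "src.main"]
    options = [("--request", request), ("--plan", plan), ("--run_name", run_name),
               ("--output_dir", output_dir), ("--resume_from", resume_from),
               ("--resume_dir", resume_dir)]
    for flag, value in options:
        if value is not None:  # only pass flags the caller actually set
            cmd += [flag, str(value)]
    return cmd


def run_request_files(example_dir: str = "request_examples",
                      output_dir: str = "./batch_output") -> Dict[str, int]:
    """Run one EarthLink job per request file and collect the exit codes."""
    exit_codes: Dict[str, int] = {}
    for path in sorted(Path(example_dir).glob("*.txt")):
        cmd = build_command(request=str(path), run_name=path.stem, output_dir=output_dir)
        exit_codes[path.name] = subprocess.run(cmd).returncode
    return exit_codes
```

For example, `build_command(resume_from="recipe", resume_dir="path/to/previous/run")` reproduces the last shell command above, using the current Python interpreter.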

📚 Documentation

User Manuals

Guidelines for Writing Requests

For optimal results, follow these guidelines when writing requests:

  1. Be Specific: Provide detailed requirements including:

    • Types of visualizations needed (time series, spatial maps, bar charts, etc.)
    • Time period for analysis (e.g., 1980-2014)
    • Metrics or statistical characteristics (climatology, trends, anomalies, etc.)
  2. Specify Datasets: Explicitly list the datasets to be used. Refer to:

Request Examples

We provide comprehensive examples organized by complexity level:

Level 1 - Simple Statistical Analysis: Performs basic climatological tasks, including data retrieval, preprocessing, calculation of annual means, spatial distributions, and interannual variability, with visualizations supporting initial model evaluation.

  • Annual mean and seasonal cycles
  • Climatology and standard deviation
  • Change trends
  • Regional mapping
  • Multi-observation comparisons

Level 2 - Mechanistic Diagnosis: Solves moderately complex climate problems, such as estimating Equilibrium Climate Sensitivity (ECS) and Transient Climate Response (TCR), by understanding the physical diagnostic framework, combining standard analyses of multiple experiment datasets, and applying simple mathematical tools.

  • Climate change detection
  • Equilibrium climate sensitivity
  • Transient climate response
  • Future projections

Level 3 - Complex Scientific Reasoning: Decomposes complex climate analyses into clear, logical subtasks and integrates advanced analytical methods (e.g., Empirical Orthogonal Function (EOF) analysis, composite analysis) with specialized knowledge to study complex phenomena such as El Niño-Southern Oscillation (ENSO) diversity, which requires rigorous methodology and extended reasoning chains.

  • ENSO diversity and periodicity
  • Atlantic Meridional Overturning Circulation (AMOC)
  • Climate change attribution
  • Heat budget analysis

Level 4 - Semi-Open Scientific Problem: Automatically selects appropriate datasets based on a detailed problem description provided by the user, combining physical understanding with adaptive workflows to address open-ended climate problems. Applies constraint methods (e.g., emergent constraints) to identify the constraining factor and produce constrained forecasts and preliminary decision-oriented recommendations.

  • Emergent constraint applications

See request_examples/ for detailed examples.


🔧 Advanced Usage

Command-Line Options

python -m src.main [OPTIONS]

Options:
  --request TEXT              User request message or path to request file
  --plan TEXT                 Path to experiment plan file (skips plan generation)
  --run_name TEXT             Custom name for the run
  --output_dir TEXT           Custom output directory
  --resume_from [plan|plan_list|recipe]
                              Resume from a specific stage
  --resume_dir TEXT           Directory to resume from
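For readers wrapping or extending the CLI, the options above map naturally onto `argparse`. The sketch below is hypothetical and is not the contents of `src/main.py`: the requirement that `--resume_from` be paired with `--resume_dir`, the rule that at least one of `--request`, `--plan` or `--resume_from` is given, and the file-or-text handling in `resolve_request` are all assumptions chosen to show how the documented options fit together.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m src.main")
    parser.add_argument("--request", help="User request message or path to request file")
    parser.add_argument("--plan", help="Path to experiment plan file (skips plan generation)")
    parser.add_argument("--run_name", help="Custom name for the run")
    parser.add_argument("--output_dir", help="Custom output directory")
    parser.add_argument("--resume_from", choices=["plan", "plan_list", "recipe"],
                        help="Resume from a specific stage")
    parser.add_argument("--resume_dir", help="Directory to resume from")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation rules; parser.error() prints usage and exits with status 2.
    if args.resume_from and not args.resume_dir:
        parser.error("--resume_from requires --resume_dir")
    if not (args.request or args.plan or args.resume_from):
        parser.error("one of --request, --plan or --resume_from is required")
    return args


def resolve_request(value: str) -> str:
    """Read the request from a file if `value` names one, else use it as literal text."""
    # os.path.isfile returns False instead of raising for strings that are not valid paths.
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            return handle.read().strip()
    return value
```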

Output Structure

Each run creates an organized output directory:

output_dir/
└── [run_name]/
    ├── request.txt                    # Original user request
    ├── experiment_plans/              # Generated experiment plans
    │   └── final_plan.md
    ├── recipe.yml                     # ESMValTool recipe
    ├── diag_scripts/                  # Diagnostic scripts
    ├── output/                        # Final organized outputs
    │   ├── figure/                    # Generated figures
    │   └── data/                      # Processed data
    ├── image_analysis/                # Image analysis results
    ├── image_analysis_pdf/            # PDF reports
    ├── run_log.log                    # Execution log
    └── time_cost.txt                  # Performance metrics

🔬 Research Applications

EarthLink has been successfully applied to:

  • Bias Diagnosis: Identifying systematic biases in climate models
  • Future Climate Projection: Generating and analyzing climate projections
  • Physical Mechanism Discovery: Autonomously identifying Atlantic Niño precursors and formulating interpretable mechanisms
  • Multi-Model Evaluation: Comparing performance across CMIP6 model ensembles
  • Climate Change Attribution: Analyzing drivers of observed climate changes

📝 Citation

If you find this work useful, please cite our paper:

@article{earthlink2025,
  title={A self-evolving AI agent system for climate science},
  author={Guo, Zijie and Wang, Jiong and Ling, Fenghua and Wei, Wangxu and Yue, Xiaoyu and Jiang, Zhe and Xu, Wanghan and Luo, Jing-Jia and Cheng, Lijing and Ham, Yoo-Geun and others},
  journal={arXiv preprint arXiv:2507.17311},
  year={2025}
}

📧 Contact

For questions, suggestions, or collaborations:


📄 License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

You are free to:

  • Share: copy and redistribute the material in any medium or format
  • Adapt: remix, transform, and build upon the material

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made
  • NonCommercial: You may not use the material for commercial purposes
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license

For more details, visit CC BY-NC-SA 4.0.


🙏 Acknowledgments

EarthLink builds upon the excellent work of:

  • ESMValTool: Earth System Model Evaluation Tool
  • CMIP6: Coupled Model Intercomparison Project Phase 6
  • Other Earth Science Datasets: Including various observational datasets and reanalysis products

We thank all contributors and the Earth science community for their support.


🔗 Related Resources


Built with ❤️ for the Earth Science Community

© 2026 OpenEarthLab. All rights reserved.
