
Add CMIP7 database support #503

Merged
lewisjared merged 9 commits into main from cmip7-database-v2 on Feb 5, 2026


@lewisjared (Contributor) commented Feb 3, 2026

Description

Add database schema and adapter for CMIP7 datasets based on CMIP7 Global Attributes v1.0 specification (DOI: 10.5281/zenodo.17250297).

Changes

  • CMIP7Dataset model (models/dataset.py): New SQLAlchemy model with core DRS attributes (activity_id, institution_id, source_id, experiment_id, variant_label, variable_id, grid_label, frequency, region, branding_suffix, version), additional mandatory attributes (mip_era, realm, nominal_resolution), parent info fields, and variable metadata (standard_name, long_name, units)
  • tracking_id column: Added to DatasetFile table for CMIP7 file-level identifiers (handle-based)
  • CMIP7DatasetAdapter (datasets/cmip7.py): New adapter implementing find_local_datasets() with instance_id construction following CMIP7 DRS format
  • Database migration: Creates cmip7_dataset table with indexes on source_id, experiment_id, and instance_id
  • Factory update (datasets/__init__.py): Registers CMIP7DatasetAdapter in get_dataset_adapter()
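
The instance_id construction mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the adapter's actual code: the function name `build_instance_id`, the `CMIP7` prefix, the dot separator, and the ordering of the components are all assumptions, and the attribute values are made up. The CMIP7 DRS specification remains the authority on the real format.

```python
# Hypothetical component order, for illustration only. The field names come
# from the PR description; the ordering is an assumption.
DRS_FIELDS = (
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "variant_label",
    "region",
    "frequency",
    "variable_id",
    "branding_suffix",
    "grid_label",
    "version",
)


def build_instance_id(attrs: dict, prefix: str = "CMIP7") -> str:
    """Join the DRS components into a single dot-separated identifier."""
    # Refuse to build a partial identifier if any component is absent or empty.
    missing = [name for name in DRS_FIELDS if not attrs.get(name)]
    if missing:
        raise ValueError(f"missing DRS attributes: {', '.join(missing)}")
    return ".".join([prefix, *(attrs[name] for name in DRS_FIELDS)])


# Illustrative attribute values, not taken from a real file.
example_attrs = {
    "activity_id": "CMIP",
    "institution_id": "CSIRO",
    "source_id": "ACCESS-ESM1-5",
    "experiment_id": "historical",
    "variant_label": "r1i1p1f1",
    "region": "glb",
    "frequency": "mon",
    "variable_id": "tas",
    "branding_suffix": "tavg-h2m-hxy-u",
    "grid_label": "gn",
    "version": "v20250101",
}
instance_id = build_instance_id(example_attrs)
```

With the values above, `instance_id` is `CMIP7.CMIP.CSIRO.ACCESS-ESM1-5.historical.r1i1p1f1.glb.mon.tas.tavg-h2m-hxy-u.gn.v20250101`. Failing loudly on a missing component keeps incomplete identifiers out of an indexed column.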

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

Add database schema and adapter for CMIP7 datasets based on CMIP7
Global Attributes v1.0 specification (DOI: 10.5281/zenodo.17250297).

Changes:
- Add CMIP7Dataset model with core DRS attributes, parent info, and
  variable metadata
- Add tracking_id column to DatasetFile for CMIP7 file identifiers
- Create CMIP7DatasetAdapter with find_local_datasets() and instance_id
  construction following CMIP7 DRS format
- Add database migration for cmip7_dataset table with indexes
- Add unit tests for adapter and model

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 89.87342% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ages/climate-ref/src/climate_ref/datasets/cmip7.py | 80.00% | 6 Missing and 7 partials ⚠️ |
| ...ages/climate-ref/src/climate_ref/datasets/utils.py | 84.21% | 2 Missing and 1 partial ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...s/climate-ref/src/climate_ref/datasets/__init__.py | 96.87% <100.00%> (+0.15%) ⬆️ |
| ...ages/climate-ref/src/climate_ref/datasets/cmip6.py | 94.73% <100.00%> (+3.07%) ⬆️ |
| ...s/climate-ref/src/climate_ref/datasets/obs4mips.py | 83.33% <100.00%> (ø) |
| ...ages/climate-ref/src/climate_ref/models/dataset.py | 100.00% <100.00%> (ø) |
| packages/climate-ref/src/climate_ref/solver.py | 97.12% <100.00%> (-0.05%) ⬇️ |
| ...ages/climate-ref/src/climate_ref/datasets/utils.py | 88.88% <84.21%> (-11.12%) ⬇️ |
| ...ages/climate-ref/src/climate_ref/datasets/cmip7.py | 80.00% <80.00%> (ø) |

... and 1 file with indirect coverage changes


Copilot AI left a comment

Pull request overview

This pull request adds support for CMIP7 datasets to the climate-ref system, based on the CMIP7 Global Attributes v1.0 specification. The PR introduces a new database schema, an adapter implementation, and a unit test suite for handling CMIP7 climate model data.

Changes:

  • Added CMIP7Dataset model with core DRS attributes (activity_id, institution_id, source_id, experiment_id, variant_label, variable_id, grid_label, frequency, region, branding_suffix, version), mandatory attributes (mip_era, realm, nominal_resolution), parent information fields, and variable metadata
  • Introduced tracking_id column to DatasetFile table for CMIP7 file-level handle-based identifiers
  • Implemented CMIP7DatasetAdapter with file parsing, instance_id construction following CMIP7 DRS format, and comprehensive metadata handling
  • Created database migration to establish cmip7_dataset table with appropriate indexes and foreign key constraints
  • Registered CMIP7DatasetAdapter in the factory method for dataset type routing
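
The table and index layout described in the migration bullet can be illustrated with a small, self-contained sketch. It uses the standard-library `sqlite3` module rather than the Alembic migration the PR actually ships, so it only approximates the resulting schema. The parent `dataset` table, the column types, and the index names are assumptions; only the `cmip7_dataset` table name and the three indexed columns come from the PR description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    -- Hypothetical parent table; the real project schema may differ.
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL UNIQUE
    );

    -- Only a few of the CMIP7 columns are shown here.
    CREATE TABLE cmip7_dataset (
        id INTEGER PRIMARY KEY REFERENCES dataset (id),
        source_id TEXT NOT NULL,
        experiment_id TEXT NOT NULL,
        instance_id TEXT NOT NULL
    );

    -- Index names are illustrative.
    CREATE INDEX ix_cmip7_dataset_source_id ON cmip7_dataset (source_id);
    CREATE INDEX ix_cmip7_dataset_experiment_id ON cmip7_dataset (experiment_id);
    CREATE INDEX ix_cmip7_dataset_instance_id ON cmip7_dataset (instance_id);
    """
)

# The second column of each PRAGMA index_list row is the index name.
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list('cmip7_dataset')")
)
```

Indexing `source_id`, `experiment_id`, and `instance_id` targets the columns most likely to appear in lookups and filters, and the foreign key ties each CMIP7 row to a row in the parent table.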

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| packages/climate-ref/src/climate_ref/models/dataset.py | Adds CMIP7Dataset model class and tracking_id field to DatasetFile for CMIP7 file identifiers |
| packages/climate-ref/src/climate_ref/datasets/cmip7.py | Implements CMIP7DatasetAdapter with parsing, instance_id construction, and metadata handling functions |
| packages/climate-ref/src/climate_ref/datasets/__init__.py | Registers CMIP7DatasetAdapter in get_dataset_adapter factory method |
| packages/climate-ref/src/climate_ref/migrations/versions/2026-02-02T1645_c47703d514ba_add_cmip7_tables.py | Database migration creating cmip7_dataset table and adding tracking_id column |
| packages/climate-ref/tests/unit/datasets/test_cmip7.py | Comprehensive test suite covering adapter initialization, metadata structure, parsing, instance_id construction, and database operations |
| changelog/503.feature.md | Documents the new CMIP7 dataset support feature |


- Add CMIP7DatasetAdapter to ExecutionSolver.build_from_db
- Add CMIP7 to parametrized test_get_dataset_adapter test
- Extract parse_datetime and clean_branch_time to shared utils module
- Update docstring for clean_branch_time explaining EC-Earth3 suffixes
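
As a rough idea of what the two shared helpers might do, here is a scalar sketch. The function names match the commit message, but the signatures and behaviour are assumptions: the real helpers may well operate on whole pandas columns, and the suffix handled here is assumed to be a trailing `D` as written by EC-Earth3.

```python
import math
from datetime import datetime
from typing import Optional


def clean_branch_time(value: object) -> float:
    """Convert a branch time to float, tolerating a trailing 'D' suffix.

    Assumption: some files store the value as text such as "87658.0D".
    Values that cannot be parsed become NaN instead of raising.
    """
    text = str(value).strip()
    if text.endswith("D"):
        text = text[:-1]
    try:
        return float(text)
    except ValueError:
        return math.nan


def parse_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, returning None when it is missing or invalid."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


branch_time = clean_branch_time("87658.0D")
start_time = parse_datetime("2015-01-16 12:00:00")
```

Returning NaN or `None` rather than raising lets a single malformed attribute be recorded as missing without aborting ingestion of the rest of the dataset.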

Add two missing CMIP7 spec attributes:
- license_id (mandatory): creative commons license identifier
- external_variables (conditionally required): cell measure variable names

Updates the DB model, file parser, dataset adapter, and includes
an Alembic migration. Also improves the DRS comment to clarify
the omitted leading drs_specs/mip_era fixed values.
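
The difference between the mandatory and the conditionally required attribute could be handled along these lines. This is a hedged sketch: `read_extra_attributes` is a hypothetical helper, the input is assumed to be a plain dict of a file's global attributes, and the example values are illustrative.

```python
from typing import Any, Dict, Optional


def read_extra_attributes(global_attrs: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull license_id and external_variables out of a file's global attributes."""
    # license_id is mandatory, so its absence is treated as an error.
    license_id = global_attrs.get("license_id")
    if not license_id:
        raise ValueError("license_id is mandatory but was not found")

    # external_variables is only conditionally required, so a missing value
    # is stored as None instead of being rejected.
    external_variables = global_attrs.get("external_variables")

    return {
        "license_id": str(license_id),
        "external_variables": (
            str(external_variables) if external_variables else None
        ),
    }


# Illustrative attribute values.
with_measures = read_extra_attributes(
    {"license_id": "CC-BY-4.0", "external_variables": "areacella"}
)
without_measures = read_extra_attributes({"license_id": "CC-BY-4.0"})
```

Storing `None` for an absent `external_variables` implies a nullable column for that attribute, while `license_id` can be declared non-nullable.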
@lewisjared lewisjared merged commit 27bac07 into main Feb 5, 2026
15 of 16 checks passed
@lewisjared lewisjared deleted the cmip7-database-v2 branch February 5, 2026 13:21