@bencap
Collaborator

@bencap bencap commented Nov 24, 2025

This pull request implements a major migration of functional range data in MaveDB, moving from a legacy JSONB-based representation to a new normalized relational schema with explicit tables for ACMG classifications, functional classifications, and their associations to variants. The migration includes new Alembic migrations, a data migration script, and supporting code changes for model and enum usage.

The changes also include two related supporting changes: an improvement to the way we handle Pydantic forward references that rebuilds models dynamically upon the import of any Pydantic class and a model loading utility that simplifies route definitions when both JSON and Form data must be supported by routes with optional file uploads.

Key changes:

Database Schema Migration:

  • Introduced new tables: acmg_classifications, score_calibration_functional_classifications, and an association table for linking functional classifications to variants. The old functional_ranges JSONB column is renamed and then dropped after migration. ([1], [2])
  • Renamed columns and added new fields to support the new schema, including a class_ column and renaming classification to functional_classification in the relevant table. ([alembic/versions/0520dfa9f2db_rename_functional_ranges_to_functional_.pyR1-R45])
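
For reviewers who want a concrete picture of the new layout, here is a rough sketch using the standard-library `sqlite3` module. Only the two named tables and the `class_` and `functional_classification` columns come from this description; the association table name and every other column are assumptions for illustration, not the contents of the actual Alembic migrations.

```python
import sqlite3

# Table names acmg_classifications and
# score_calibration_functional_classifications come from the PR description.
# All other names and columns here are illustrative assumptions.
SCHEMA = """
CREATE TABLE acmg_classifications (
    id INTEGER PRIMARY KEY,
    criterion TEXT,
    evidence_strength TEXT
);
CREATE TABLE score_calibration_functional_classifications (
    id INTEGER PRIMARY KEY,
    calibration_id INTEGER NOT NULL,
    label TEXT NOT NULL,
    functional_classification TEXT NOT NULL,
    range_lower REAL,
    range_upper REAL,
    class_ TEXT,
    acmg_classification_id INTEGER REFERENCES acmg_classifications(id)
);
-- Hypothetical name for the association table.
CREATE TABLE functional_classification_variants (
    functional_classification_id INTEGER NOT NULL
        REFERENCES score_calibration_functional_classifications(id),
    variant_id INTEGER NOT NULL,
    PRIMARY KEY (functional_classification_id, variant_id)
);
"""


def build_demo_db():
    """Create an in-memory database with one classification and two variants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO acmg_classifications VALUES (1, 'PS3', 'moderate')")
    conn.execute(
        "INSERT INTO score_calibration_functional_classifications "
        "VALUES (1, 10, 'Loss of function', 'abnormal', NULL, -1.0, NULL, 1)"
    )
    conn.executemany(
        "INSERT INTO functional_classification_variants VALUES (?, ?)",
        [(1, 101), (1, 102)],
    )
    return conn


def variant_count(conn, classification_id):
    """Count the variants linked to one functional classification."""
    row = conn.execute(
        "SELECT COUNT(*) FROM functional_classification_variants "
        "WHERE functional_classification_id = ?",
        (classification_id,),
    ).fetchone()
    return row[0]
```

The point of the association table is that variant counts per classification become a plain query rather than something recomputed from JSON on each read.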

Data Migration Script:

  • Added alembic/manual_migrations/migrate_jsonb_ranges_to_table_rows.py, a comprehensive script to migrate existing JSONB functional ranges to the new tables, including logic to create ACMG classification records, functional classification rows, and variant associations. The script also supports verification and rollback of the migration. ([alembic/manual_migrations/migrate_jsonb_ranges_to_table_rows.pyR1-R374])
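
The shape of the migration logic can be sketched in plain Python. This is not the script itself: the legacy JSON keys (`label`, `classification`, `range`, `acmg_classification`), the row fields, and the lower-inclusive/upper-exclusive bound handling are all assumptions made to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionalClassificationRow:
    """Hypothetical stand-in for one row in the new classifications table."""
    calibration_id: int
    label: str
    functional_classification: str
    lower: Optional[float]
    upper: Optional[float]
    acmg_key: Optional[tuple] = None
    variant_ids: list = field(default_factory=list)


def migrate_calibration(calibration_id, functional_ranges, variant_scores, acmg_rows):
    """Turn one calibration's legacy JSON ranges into row objects.

    functional_ranges: list of dicts in the assumed legacy JSONB shape.
    variant_scores: mapping of variant id to score.
    acmg_rows: dict used as a get-or-create cache of ACMG records.
    """
    rows = []
    for fr in functional_ranges:
        lower, upper = fr.get("range", (None, None))
        acmg = fr.get("acmg_classification")
        acmg_key = None
        if acmg:
            # Reuse an existing ACMG record when the same pair was seen before.
            acmg_key = (acmg.get("criterion"), acmg.get("evidence_strength"))
            acmg_rows.setdefault(
                acmg_key,
                {"criterion": acmg_key[0], "evidence_strength": acmg_key[1]},
            )
        row = FunctionalClassificationRow(
            calibration_id, fr["label"], fr["classification"], lower, upper, acmg_key
        )
        # Assumed bound handling: lower inclusive, upper exclusive.
        for variant_id, score in variant_scores.items():
            if (lower is None or score >= lower) and (upper is None or score < upper):
                row.variant_ids.append(variant_id)
        rows.append(row)
    return rows


def verify(functional_ranges, rows):
    """Minimal verification: every legacy range produced exactly one row."""
    return len(functional_ranges) == len(rows)
```

A real verification step would presumably also compare variant associations; the count check here only illustrates where such a check would sit.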

Model and Enum Refactoring:

  • Refactored ACMG-related enums (ACMGCriterion, StrengthOfEvidenceProvided) out of src/mavedb/lib/acmg.py to their own modules, and updated imports to use the new locations. ([src/mavedb/lib/acmg.pyL1-R8])

Pydantic Model Circularity:

  • Ensured that model forward references are resolved upfront on module import by importing model_rebuild in src/mavedb/__init__.py. ([src/mavedb/__init__.pyR12-R14])
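
The core of a centralized rebuild is a walk over the subclass tree that calls `model_rebuild()` on each model. The sketch below shows that walk only. With Pydantic v2 the base would be `pydantic.BaseModel`, whose `model_rebuild()` classmethod re-resolves forward references; here a stand-in base class is used so the example runs without third-party dependencies, and all class names are hypothetical.

```python
from typing import Iterator


def iter_subclasses(base: type) -> Iterator[type]:
    """Yield every direct and indirect subclass of base exactly once."""
    seen = set()
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        yield cls
        stack.extend(cls.__subclasses__())


def rebuild_all(base: type) -> list:
    """Call model_rebuild() on each subclass of base; return the names rebuilt."""
    rebuilt = []
    for cls in iter_subclasses(base):
        cls.model_rebuild()
        rebuilt.append(cls.__name__)
    return sorted(rebuilt)


class FakeBaseModel:
    """Stand-in for pydantic.BaseModel; it only counts rebuild calls."""
    rebuild_calls = 0

    @classmethod
    def model_rebuild(cls):
        FakeBaseModel.rebuild_calls += 1


# Hypothetical view models, including one indirect subclass.
class ScoreCalibrationView(FakeBaseModel):
    pass


class SavedScoreCalibrationView(ScoreCalibrationView):
    pass


class VariantView(FakeBaseModel):
    pass
```

Running the walk once at import time, after all view model modules are loaded, is what would let individual files drop their own rebuild calls.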

Flexible Loader for multipart/form-data:

  • Added a library module flexible_module_loader.py with a generic dependency generator create_flexible_model_loader and convenience method json_or_form_loader. These dependency generators can be used to dynamically create parsers for routes that require support for JSON data and optional file uploads. ([src/mavedb/lib/flexible_calibration_loader.pyR185])
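
The general idea of a loader that accepts either a JSON body or a JSON string carried in a form field can be sketched without FastAPI. This is not the module's API: `make_json_or_form_parser`, `CalibrationPayload`, and the `calibration_json` field name are hypothetical, and a dataclass stands in for the Pydantic model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class CalibrationPayload:
    """Hypothetical request model used only for this sketch."""
    title: str
    notes: Optional[str] = None


def make_json_or_form_parser(model_cls: Type[T], form_field: str) -> Callable[..., T]:
    """Return a parser that builds model_cls from a JSON body or a form field."""

    def parse(json_body: Optional[str] = None, form: Optional[dict] = None) -> T:
        if json_body is not None:
            # Plain application/json request.
            raw = json_body
        elif form is not None and form_field in form:
            # multipart/form-data request: the model travels as a JSON string
            # in one field, next to any uploaded file.
            raw = form[form_field]
        else:
            raise ValueError(f"expected a JSON body or a '{form_field}' form field")
        return model_cls(**json.loads(raw))

    return parse
```

A route could then depend on one parser regardless of whether the client attached a file, which is the simplification the bullet above describes.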

@bencap bencap linked an issue Nov 24, 2025 that may be closed by this pull request
@bencap bencap requested a review from sallybg November 24, 2025 20:28
…cular dependencies

Implements a centralized strategy for rebuilding Pydantic models. Instead of maintaining model rebuilds in each file, we can now import circular dependencies in an `if TYPE_CHECKING:` block. The model rebuild module then handles rebuilds automatically, walking the view_model module and dynamically rebuilding our models based on their subclasses. This should make dependent Pydantic models substantially easier to add and maintain.
- Added new SQLAlchemy model `ScoreCalibrationFunctionalClassification` to represent functional classifications associated with score calibrations.
- Established relationships between `ScoreCalibration` and `ScoreCalibrationFunctionalClassification`.
- Created an association table for many-to-many relationships between functional classifications and variants.
- Updated view models to accommodate new functional classification structures, including validation for inclusive bounds.
- Enhanced tests to cover new functionality, including creation and validation of functional classifications.
- Refactored existing code to ensure compatibility with new models and relationships.
- Add a property `class_` to score calibration functional classifications. One of `range` or `class_` must be defined.
- Add validation logic to class-based score ranges
- Refactor lib code to support both range types
- Refactor tests to support both range types
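
A minimal sketch of the `range`/`class_` rule, using a dataclass in place of the actual view model. It assumes "one of" means exactly one, and the inclusive-bound field names and defaults are guesses rather than the real schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FunctionalClassification:
    """Hypothetical stand-in for the functional classification view model."""
    label: str
    range: Optional[Tuple[Optional[float], Optional[float]]] = None
    class_: Optional[str] = None
    inclusive_lower_bound: bool = True   # assumed default
    inclusive_upper_bound: bool = False  # assumed default

    def __post_init__(self):
        # Assumption: exactly one of the two definitions is allowed.
        if (self.range is None) == (self.class_ is None):
            raise ValueError("exactly one of `range` or `class_` must be defined")
        if self.range is not None:
            lower, upper = self.range
            if lower is not None and upper is not None and lower >= upper:
                raise ValueError("range lower bound must be below the upper bound")

    def contains(self, score: float) -> bool:
        """Score membership for range-based entries.

        Class-based entries return False here because their variants are
        assigned explicitly rather than by score.
        """
        if self.range is None:
            return False
        lower, upper = self.range
        if lower is not None:
            if score < lower or (score == lower and not self.inclusive_lower_bound):
                return False
        if upper is not None:
            if score > upper or (score == upper and not self.inclusive_upper_bound):
                return False
        return True
```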

TODO: Support for creating variant associations in class-based score ranges.
- Added router functionality for validation and standardization of class-based calibration files.
- Added lib functionality for creation/modification of class-based calibrations.
- Invoked lib functionality from routers to allow client creation/modification of class-based calibrations.
- Introduced a new CSV file `calibration_classes.csv` containing variant URNs and their corresponding class names.
- Implemented tests for creating and updating score calibrations using class-based classifications.
- Enhanced existing test suite with parameterized tests to validate score calibration creation and modification.
- Ensured that the response includes correct functional classifications and variant counts.
…_pro

- Allow class-based calibration to be defined via HGVS strings
- Introduced new test CSV files for calibration classes based on HGVS nucleotide, HGVS protein, and URN.
- Enhanced test coverage for score calibration creation and updating, including scenarios for decoding errors and validation errors.
- Refactored tests to utilize parameterization for different calibration class files.
- Added validation checks for index column selection in calibration dataframes.
- Improved error messages for missing or invalid calibration classes.
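
To illustrate the kind of checks described in these commits, here is a small sketch that reads a class-based calibration CSV, selects the index column, and groups variants by class. The column names (`variant_urn`, `hgvs_nt`, `hgvs_pro`, `class_name`) and the rule that exactly one index column must be present are assumptions; the actual implementation may differ.

```python
import csv
import io
from collections import defaultdict

# Assumed column names. The commits only say that files may be keyed by URN,
# HGVS nucleotide, or HGVS protein strings.
INDEX_COLUMNS = ("variant_urn", "hgvs_nt", "hgvs_pro")
CLASS_COLUMN = "class_name"


def load_calibration_classes(text, allowed_classes):
    """Parse CSV text into (index column name, {class name: [variant keys]})."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    present = [c for c in INDEX_COLUMNS if c in header]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one index column from {INDEX_COLUMNS}, found {present}"
        )
    if CLASS_COLUMN not in header:
        raise ValueError(f"missing required column {CLASS_COLUMN!r}")

    index_col = present[0]
    grouped = defaultdict(list)
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        cls = (row[CLASS_COLUMN] or "").strip()
        if cls not in allowed_classes:
            raise ValueError(f"line {line_no}: unknown calibration class {cls!r}")
        grouped[cls].append((row[index_col] or "").strip())
    return index_col, dict(grouped)
```

Reporting the line number with each unknown class is one way to get the clearer error messages mentioned above.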
@bencap bencap force-pushed the feature/bencap/538/categorical-calibrations branch from d524416 to 4962f4f Compare December 18, 2025 20:43
@jstone-dev
Collaborator

I ran into two unexpected (I think) validation errors.

For functional classes with evidence strength, I'm seeing this:
[screenshot: evidence-strength-validation]

When that error isn't present, I run into this:
[screenshot: functional-range-validation]


Development

Successfully merging this pull request may close these issues.

Calibrations without score ranges

3 participants