Privacy-safe data governance and analytics framework with PII detection, policy enforcement, audit logging, and verifiable data sharing

staticpayload/covenant.data

COVENANT.DATA

Privacy-safe data governance and analytics framework for real teams.

Documentation | Changelog | Security Policy | Contributing

What It Is

COVENANT.DATA is a Python framework that helps teams handle sensitive data with fewer leaks and fewer mistakes. It provides tools for:

  • Data contracts - Define what data you have, how sensitive it is, and how it may be used
  • PII detection - Automatically scan for personally identifiable information
  • Policy enforcement - Control who can access data and for what purposes
  • Redaction - Safely export data with sensitive fields removed
  • Privacy budgets - Track differential privacy spending
  • Audit logging - Tamper-evident logs of all sensitive operations
  • Lineage tracking - Full history of data transformations
  • Verifiable bundles - Share data with proof of what was removed
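To make the "privacy budgets" item concrete, here is a minimal sketch of a budget ledger. It assumes basic sequential composition, where the epsilon values of successive releases simply add up. `BudgetLedger` and `BudgetExceeded` are hypothetical names for illustration and are not the `covenant.privacy` API.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a release would spend more epsilon than remains."""


@dataclass
class BudgetLedger:
    # Hypothetical illustration only; not the covenant.privacy API.
    total_epsilon: float
    spent: list = field(default_factory=list)  # (label, epsilon) pairs

    @property
    def remaining(self) -> float:
        # Sequential composition: total spend is the sum of all epsilons.
        return self.total_epsilon - sum(eps for _, eps in self.spent)

    def spend(self, label: str, epsilon: float) -> float:
        """Record a release, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExceeded(
                f"{label}: needs {epsilon}, only {self.remaining} left"
            )
        self.spent.append((label, epsilon))
        return self.remaining


ledger = BudgetLedger(total_epsilon=1.0)
ledger.spend("mean_age", 0.25)
ledger.spend("count_by_region", 0.5)
print(f"Remaining epsilon: {ledger.remaining}")
```

A refused release leaves the ledger unchanged, so the record of past spending stays trustworthy even after a failed request.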

Core Guarantees

  1. No silent data loss - All operations are logged and verifiable
  2. No silent data leakage - Access requires explicit policy approval
  3. Auditability beats convenience - Every decision can be reviewed
  4. Security beats features - Safe defaults over powerful defaults
  5. Reproducibility beats speed - Deterministic outputs from given inputs

What COVENANT.DATA Does NOT Guarantee

  1. Legal compliance by itself - You must still review the legal requirements that apply to your data
  2. That user data is truthful - Garbage in, garbage out
  3. That PII detection is perfect without human review - Use detection as a tool, not a replacement for judgment
  4. That differential privacy settings are appropriate for every case - Privacy parameters require expertise

Quick Start

Installation

pip install covenant-data

Initialize a Project

covenant init my-project
cd my-project

Create a Data Contract

from covenant.schema.contract import Contract, ContractVersion, FieldTag
from covenant.schema import SensitivityLevel, PIICategory

contract = Contract(
    id="users-dataset",
    name="User Records",
    version=ContractVersion(major=1, minor=0, patch=0),
    schema={
        "fields": {
            "id": {"type": "string"},
            "name": {"type": "string"},
            "email": {"type": "string"},
            "age": {"type": "integer"},
        }
    },
    field_tags={
        "id": FieldTag(sensitivity=SensitivityLevel.PUBLIC),
        "name": FieldTag(
            sensitivity=SensitivityLevel.CONFIDENTIAL,
            pii_category=PIICategory.NAME,
            is_pii=True,
        ),
        "email": FieldTag(
            sensitivity=SensitivityLevel.RESTRICTED,
            pii_category=PIICategory.EMAIL,
            is_pii=True,
        ),
        "age": FieldTag(sensitivity=SensitivityLevel.INTERNAL),
    },
    allowed_purposes=["academic_research", "testing"],
    retention={},
    export_rules={},
)

print(f"Contract hash: {contract.hash()}")
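For intuition about what a stable contract hash involves, the sketch below hashes a canonical JSON encoding with only the standard library. The canonical form used here (sorted keys, compact separators, UTF-8, SHA-256) is an assumption for illustration; the encoding that `contract.hash()` actually uses may differ.

```python
import hashlib
import json


def canonical_hash(obj) -> str:
    """Hash a JSON-serializable object independently of key order."""
    # Assumption: canonical form = sorted keys + compact separators + UTF-8.
    encoded = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


a = {"id": "users-dataset", "version": "1.0.0"}
b = {"version": "1.0.0", "id": "users-dataset"}  # same content, other order
c = {"id": "users-dataset", "version": "1.0.1"}  # one value changed

print(canonical_hash(a) == canonical_hash(b))  # key order does not matter
print(canonical_hash(a) == canonical_hash(c))  # any content change does
```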

Scan for PII

from covenant.pii.detector import PIIDetector
from covenant.pii.ruleset import default_ruleset

data = [
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

detector = PIIDetector(ruleset=default_ruleset())
result = detector.scan_dataset("users", data)

print(f"Found {len(result.detections)} PII occurrences")
print(f"PII fields: {result.pii_fields()}")
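As a rough picture of what a rule-based scan does, the following standalone sketch flags email-like values with a single regular expression. It is not the `PIIDetector` implementation, and the pattern is a deliberately simple assumption rather than anything from the default ruleset.

```python
import re

# Simplified email pattern for illustration; real rulesets are broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_rows(rows):
    """Return (row_index, field, matched_text) for each email-like value."""
    hits = []
    for i, row in enumerate(rows):
        for field_name, value in row.items():
            if isinstance(value, str):
                for match in EMAIL_RE.finditer(value):
                    hits.append((i, field_name, match.group()))
    return hits


rows = [
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

hits = scan_rows(rows)
pii_fields = sorted({field_name for _, field_name, _ in hits})
print(f"Found {len(hits)} email-like values in fields: {pii_fields}")
```

Note that this sketch finds the emails but not the names, which illustrates why pattern-based detection still needs human review.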

Create a Redaction Plan

from covenant.redact.plan import create_redaction_plan, full_redaction_action

plan = create_redaction_plan(
    dataset_id="users",
    contract_id=contract.id,
    actions=[
        full_redaction_action("email", reason="PII"),
        full_redaction_action("name", reason="PII"),
    ],
)

print(f"Plan hash: {plan.hash()}")

Apply Redaction

from covenant.redact.engine import apply_redaction_plan

redacted, result = apply_redaction_plan(plan, data)

print(f"Original: {data}")
print(f"Redacted: {redacted}")
print(f"Hash change: {result.original_hash[:16]}... -> {result.redacted_hash[:16]}...")
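The idea behind full redaction with before/after hashes can be sketched without the library. In the sketch below, `redact_fields`, `digest`, and the `[REDACTED]` placeholder are hypothetical choices for illustration; the replacement value and hashing scheme that `apply_redaction_plan` uses are not specified here.

```python
import hashlib
import json

REDACTED = "[REDACTED]"  # assumed placeholder, not the library's value


def digest(rows) -> str:
    """SHA-256 over a key-sorted JSON encoding of the rows."""
    encoded = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def redact_fields(rows, fields):
    """Return a copy of rows with the named fields fully replaced."""
    fields = set(fields)
    return [
        {key: (REDACTED if key in fields else value) for key, value in row.items()}
        for row in rows
    ]


records = [
    {"id": "u1", "name": "Alice Smith", "email": "alice@example.com"},
    {"id": "u2", "name": "Bob Jones", "email": "bob@example.com"},
]

clean = redact_fields(records, ["name", "email"])
print(f"Redacted: {clean}")
print(f"Hash change: {digest(records)[:16]}... -> {digest(clean)[:16]}...")
```

Building a new list instead of mutating the input keeps the original available for comparison and makes the hash change easy to report.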

Evaluate Access Policies

from covenant.policy.engine import PolicyEngine, PolicyRequest
from covenant.core.context import Actor, Purpose

request = PolicyRequest(
    actor=Actor(identity="researcher1", roles=["researcher"]),
    purpose=Purpose.ACADEMIC_RESEARCH,
    resource_id="users",
    contract_id=contract.id,
    requested_fields=["id", "age"],  # No PII
)

engine = PolicyEngine()
decision = engine.evaluate(request)

print(f"Decision: {decision.decision}")
print(f"Reason: {decision.reason}")
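To illustrate the default-deny idea behind a decision like this, here is a minimal standalone evaluator. The rules it applies (purpose must be listed in the contract, unknown fields are refused, PII fields are refused) are assumptions made for this sketch and are not the actual `PolicyEngine` logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def evaluate(purpose, requested_fields, allowed_purposes, field_is_pii):
    """Default deny: allow only when every check passes explicitly."""
    if purpose not in allowed_purposes:
        return Decision(False, f"purpose {purpose!r} is not allowed")
    for name in requested_fields:
        if name not in field_is_pii:
            # Unknown fields are refused rather than silently passed through.
            return Decision(False, f"field {name!r} is not in the contract")
        if field_is_pii[name]:
            return Decision(False, f"field {name!r} is tagged as PII")
    return Decision(True, "purpose allowed and no PII requested")


allowed_purposes = ["academic_research", "testing"]
field_is_pii = {"id": False, "name": True, "email": True, "age": False}

ok = evaluate("academic_research", ["id", "age"], allowed_purposes, field_is_pii)
denied = evaluate("academic_research", ["id", "email"], allowed_purposes, field_is_pii)
print(ok)
print(denied)
```

Returning a reason with every decision, including denials, is what makes each outcome reviewable later.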

Create a Bundle

from covenant.bundles.bundle import make_bundle
from covenant.bundles.format import write_bundle

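# audit_log, lineage_graph, csv_data, and report are not defined in this
# snippet; they are assumed to come from earlier steps in your workflow.
bundle = make_bundle(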
    contract=contract,
    audit_log=audit_log,
    lineage=lineage_graph,
    artifacts=[
        ("users.csv", csv_data.encode(), "text/csv"),
        ("redaction_report.json", report.encode(), "application/json"),
    ],
    created_by="data-team",
)

write_bundle(bundle, "users-export.bundle")
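One way to think about a verifiable bundle is as a set of artifacts plus a manifest of their digests. The sketch below shows only that hashing idea with hypothetical helpers (`make_manifest`, `verify_manifest`); it does not reproduce the real bundle format and does not cover signatures.

```python
import hashlib


def make_manifest(artifacts):
    """Map each artifact name to its media type and SHA-256 digest.

    artifacts: list of (name, data_bytes, media_type) tuples.
    """
    return {
        name: {"media_type": media_type, "sha256": hashlib.sha256(data).hexdigest()}
        for name, data, media_type in artifacts
    }


def verify_manifest(manifest, artifacts):
    """True only if the artifacts match the manifest exactly."""
    return make_manifest(artifacts) == manifest


artifacts = [
    ("users.csv", b"id,age\nu1,34\n", "text/csv"),
    ("redaction_report.json", b'{"removed": ["name", "email"]}', "application/json"),
]
manifest = make_manifest(artifacts)

tampered = [("users.csv", b"id,age\nu1,35\n", "text/csv"), artifacts[1]]
print(verify_manifest(manifest, artifacts))  # unchanged artifacts
print(verify_manifest(manifest, tampered))   # one byte changed
```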

CLI Commands

# Initialize a project
covenant init

# Create a contract
covenant contract create --name "User Data" -o contract.json

# Validate a contract
covenant contract validate contract.json

# Scan for PII
covenant scan pii data.csv --output pii-report.json

# Create a redaction plan
covenant redact plan contract.json --fields email,name -o plan.json

# Apply redaction
covenant redact apply plan.json data.csv -o redacted.csv

# Check a policy
covenant policy check policy.json --actor user1 --purpose research

# Show audit log
covenant audit show audit.json

# Create a bundle
covenant bundle make --contract contract.json --data data.csv -o export.bundle

# Verify a bundle
covenant bundle verify export.bundle

# Verify storage
covenant verify store ./storage

Architecture

src/covenant/
├── core/       # Identity, hashing, canonical encoding
├── schema/     # Data contracts, validation, migration
├── pii/        # PII detection, rulesets
├── redact/     # Redaction plans, engine, reports
├── policy/     # Policy language, evaluation, proofs
├── audit/      # Hash-chained audit log
├── lineage/    # Data lineage graph, replay
├── privacy/    # Budget ledger, DP releases
├── bundles/    # Portable, verifiable bundles
├── storage/    # Content-addressed storage, verify/repair
├── server/     # Local web UI
├── cli/        # Command-line interface
└── viz/        # Report generation
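The "hash-chained audit log" in the tree above refers to a general technique in which each entry commits to the hash of the previous one, so editing any past entry breaks every later link. The sketch below shows the technique with the standard library only; the entry layout and function names are assumptions, not the `covenant.audit` format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(
        {"prev": prev_hash, "event": event}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an event, linking it to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"action": "scan", "dataset": "users"})
append(log, {"action": "redact", "dataset": "users"})
append(log, {"action": "export", "dataset": "users"})
print(verify(log))

log[1]["event"]["action"] = "noop"  # simulate tampering with history
print(verify(log))
```

A chain like this is tamper-evident rather than tamper-proof: it detects edits, but someone who can rewrite the whole log can also recompute the chain unless the latest hash is anchored somewhere else.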

Documentation

Examples

See the examples/ directory for complete examples:

  • ngo_case_intake/ - NGO case management with PII handling
  • school_research_release/ - Academic research data export
  • clinic_like_data_demo/ - Healthcare data with redaction
  • redaction_review_workflow/ - Review and approve redactions
  • policy_denial_and_explain/ - Understanding policy decisions
  • privacy_budget_release_report/ - DP releases with budget tracking
  • bundle_share_and_replay/ - Verifiable data sharing

Stability and Versioning

  • Version: 0.1.0 (Alpha)
  • APIs may change before 1.0.0
  • File formats are versioned with migration support
  • All changes are documented in CHANGELOG.md

Security Notes

  1. Default deny - Access is denied unless explicitly allowed
  2. Audit everything - All sensitive operations are logged
  3. Encrypt at rest - Storage supports optional encryption
  4. Verify imports - Bundles verify signatures on load
  5. No secrets in logs - Audit logs never contain raw sensitive data

Reporting Issues

Report security issues privately to security@covenant.data

Bugs and feature requests: https://github.com/covenant-data/covenant.data/issues

License

MIT License - see LICENSE for details.

Contributing

See CONTRIBUTING.md