A comprehensive machine learning provenance tracking system with blockchain integration for immutable, tamper-evident audit trails.
- Multi-Blockchain Support: IPFS, Ethereum, Bitcoin
- Merkle Tree Integration: Cryptographic verification of ML pipeline
- Immutable Provenance: Tamper-evident audit trails
- Auto Mode: Fully automated blockchain integration
- Developer Friendly: Easy setup and configuration
- Privacy-Preserving: Differential privacy support
- Comprehensive Tracking: Data, model, and training provenance
┌─────────────────────────────────────────────────────────────┐
│                    ML Training Pipeline                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │     Data     │  │    Model     │  │     Training     │   │
│  │  Provenance  │  │  Provenance  │  │    Provenance    │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      ProvenanceTracker                      │
│ ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐  │
│ │   Merkle Tree   │  │   Blockchain    │  │  Provenance  │  │
│ │   Generation    │  │   Integration   │  │     Data     │  │
│ └─────────────────┘  └─────────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      BlockchainManager                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │   Ethereum   │  │   Bitcoin    │  │       IPFS       │   │
│  │  Interface   │  │  Interface   │  │    Interface     │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
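To make the Merkle tree step in the diagram concrete, the sketch below computes a Merkle root over a few provenance records and shows that changing one record changes the root. It is an illustration only: `merkle_root` is a hypothetical helper rather than the project's API, SHA-256 from the standard library stands in for whichever hash the tracker is configured to use, and duplicating the last node on odd-sized levels is one common convention that may differ from the project's actual tree layout.

```python
import hashlib
from typing import List


def merkle_root(leaves: List[bytes]) -> str:
    """Return the hex Merkle root of the given leaf payloads (hypothetical helper)."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Hash each leaf payload to form the bottom level of the tree.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        # Assumption: duplicate the last node when a level has an odd count.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Illustrative records; real leaves would be derived from data, model, and config.
records = [b"dataset-v1", b"model-init", b"training-config"]
root_before = merkle_root(records)

# Changing any record changes the root, which is what makes tampering evident.
tampered = [b"dataset-v2", b"model-init", b"training-config"]
root_after = merkle_root(tampered)
print(root_before != root_after)
```

Publishing only the root keeps the on-chain payload small while still committing to every record underneath it.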
git clone <repository-url>
cd mnist_provenance
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Start local Geth node for Ethereum development
bash scripts/setup_local_geth.sh
# Or use IPFS only (default)
# No additional setup required
python3 scripts/demo_blockchain_provenance.py
python3 src/ml_provenance/training/train.py
- Developer Guide - Comprehensive guide for developers
- Blockchain Integration - Detailed blockchain documentation
- Architecture - System architecture overview
- Provenance Tracking - Provenance tracking concepts
- Model Training - Training and safety features
{
  "blockchain": {
    "enabled": true,
    "networks": ["ipfs", "ethereum"],
    "ipfs": {
      "enabled": true,
      "url": "http://localhost:5001"
    },
    "ethereum": {
      "enabled": true,
      "rpc_url": "http://127.0.0.1:8545",
      "private_key": null
    }
  }
}
- Never store private keys in configuration files
- Use environment variables for sensitive data:
    # Set environment variables
    export ETH_PRIVATE_KEY=your_ethereum_private_key_here
    export BTC_PRIVATE_KEY=your_bitcoin_private_key_here
    # Or use a .env file (copy from env.example)
    cp env.example .env
    # Edit .env with your actual private keys
- Add .env to .gitignore to prevent accidental commits
- Use testnet keys for development
- Rotate keys regularly in production
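The first two rules above can be enforced in code. The following is a minimal sketch, assuming the configuration layout shown earlier: it reads the Ethereum key from the `ETH_PRIVATE_KEY` environment variable and refuses a key that was written into the configuration. `resolve_private_key` is a hypothetical helper, not part of the project's API, and rejecting config-file keys outright is a design choice of this sketch.

```python
import os
from typing import Optional


def resolve_private_key(config: dict, env_var: str = "ETH_PRIVATE_KEY") -> Optional[str]:
    """Return the Ethereum key from the environment, never from the config (hypothetical helper)."""
    eth_cfg = config.get("blockchain", {}).get("ethereum", {})
    if eth_cfg.get("private_key"):
        # A key in the config is rejected so it cannot end up in version control.
        raise ValueError("private_key must not be stored in the configuration file")
    # None means the variable is unset; the caller decides how to handle that.
    return os.environ.get(env_var)


config = {
    "blockchain": {
        "ethereum": {"rpc_url": "http://127.0.0.1:8545", "private_key": None}
    }
}
key = resolve_private_key(config)
print("key found" if key else "ETH_PRIVATE_KEY is not set")
```

Returning `None` instead of raising when the variable is missing leaves room for an IPFS-only setup, which needs no key.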
config = {
    "epochs": 5,
    "batch_size": 64,
    "learning_rate": 0.001,
    "hash_algorithm": "blake3",
    "blockchain": {
        "networks": ["ipfs", "ethereum"],
        "ipfs": {"url": "http://localhost:5001"},
        "ethereum": {
            "rpc_url": "http://127.0.0.1:8545",
            "private_key": None  # Will use ETH_PRIVATE_KEY environment variable
        }
    }
}
from ml_provenance.provenance.tracker import ProvenanceTracker
import json
# Load configuration
with open('configs/blockchain_config.json', 'r') as f:
    config = json.load(f)
# Initialize tracker with blockchain support
provenance_tracker = ProvenanceTracker(config=config)
# Track data and model (train_data, test_data, and model come from your training code)
provenance_tracker.track_data(train_data, test_data)
provenance_tracker.track_model(model)
# Store pre-training hash on blockchain
before_transactions = provenance_tracker.store_merkle_on_blockchain_before_training(training_config)
# ... training process ...
# Store post-training hash on blockchain
after_transactions = provenance_tracker.store_merkle_on_blockchain_after_training(training_results)
# Verify blockchain provenance
verification_results = provenance_tracker.verify_blockchain_provenance()
# Get blockchain status
status = provenance_tracker.get_blockchain_status()
print(f"Blockchain enabled: {status['blockchain_enabled']}")
# Verify provenance chain
verification = provenance_tracker.verify_blockchain_provenance()
if verification['chain_integrity']:
    print("✅ Provenance chain integrity verified!")
- IPFS
  - Advantages: Decentralized storage, no fees, high availability
  - Setup: brew install ipfs && ipfs daemon
  - Use Case: Development and testing
- Ethereum
  - Advantages: Smart contracts, immutable blockchain, programmable verification
  - Setup: brew install ethereum && bash scripts/setup_local_geth.sh
  - Use Case: Production environments
- Bitcoin
  - Advantages: Maximum security, global consensus, long-term stability
  - Setup: brew install bitcoin && bitcoind
  - Use Case: High-security requirements
- Python 3.8+
- Git
- Homebrew (for macOS)
# Clone repository
git clone <repository-url>
cd mnist_provenance
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install development dependencies
pip install pytest black flake8
# Run tests
python -m pytest tests/
# Unit tests
python -m pytest tests/
# Integration tests
python scripts/demo_blockchain_provenance.py
# Full training test
python src/ml_provenance/training/train.py
- Import Errors: pip install gitpython web3 requests ipfshttpclient
- Geth Connection: Check if Geth is running with lsof -i :8545
- IPFS Connection: Start IPFS daemon with ipfs daemon
- Private Key Issues: Extract from Geth dev node keystore
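The two connection checks above can also be run from Python, which helps on systems without lsof. This is a small sketch that only tests whether a TCP port accepts connections; it does not confirm that the service behind the port is actually Geth or IPFS. `is_port_open` is a hypothetical helper, and the host and port values are the defaults from the sample configuration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Default local endpoints taken from the sample configuration.
services = {"Geth RPC": ("127.0.0.1", 8545), "IPFS API": ("127.0.0.1", 5001)}
for name, (host, port) in services.items():
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"{name} on {host}:{port} is {state}")
```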
import logging
logging.basicConfig(level=logging.DEBUG)
# Or in configuration
config["blockchain"]["debug"] = True
The system generates several output files:
- blockchain_report.json - Complete blockchain verification report
- provenance_report.json - Standard provenance report with blockchain info
- merkle_tree_*.json - Merkle tree structure files
- geth_dev.log - Geth development node logs
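After a run, a quick way to see which of these files were produced is to glob for them. The sketch below assumes the files land in a single directory, defaulting to the current one; the actual output location is not stated here, so adjust the path as needed. `find_outputs` is a hypothetical helper and inspects file names only, not their contents.

```python
import json
from pathlib import Path
from typing import Dict, List


def find_outputs(output_dir: str = ".") -> Dict[str, List[str]]:
    """Group the generated files by kind, listing those that exist (hypothetical helper)."""
    base = Path(output_dir)
    # File name patterns taken from the list of output files.
    patterns = {
        "blockchain_report": "blockchain_report.json",
        "provenance_report": "provenance_report.json",
        "merkle_trees": "merkle_tree_*.json",
        "geth_log": "geth_dev.log",
    }
    return {
        kind: sorted(p.name for p in base.glob(pattern))
        for kind, pattern in patterns.items()
    }


print(json.dumps(find_outputs("."), indent=2))
```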
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8
- Use type hints
- Add docstrings
- Write unit tests
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review the API documentation
- Check existing issues on GitHub
- Create a new issue with detailed information
- PyTorch for the deep learning framework
- Web3.py for Ethereum integration
- IPFS for decentralized storage
- Opacus for differential privacy
Made with ❤️ for secure and verifiable machine learning