Financial Data Pipeline Project

A Python project for fetching, cleaning, and normalizing financial time-series data: OHLCV (Open, High, Low, Close, Volume) price data and economic indicators.

Overview

This project provides a modular pipeline for collecting financial data from multiple sources, standardizing it to consistent formats, and applying reversible normalization techniques. It's designed for machine learning applications requiring clean, normalized time-series data.

Key Features

Multi-Source Data Fetching: Collect OHLCV data from Kraken (crypto) and Yahoo Finance (stocks/indices)
Economic Indicators: Fetch macroeconomic data from FRED (Federal Reserve) and BEA (Bureau of Economic Analysis)
Data Cleaning: Standardize time indices, handle missing data, and ensure data quality
Reversible Normalization: Implement RevIN (Reversible Instance Normalization) for ML preprocessing
Modular Architecture: Clean separation of concerns with dedicated modules for each data type

Project Structure

├── data/
│   ├── clean_indicators/     # Cleaned economic indicators (CSV)
│   ├── clean_ohlcv/          # Cleaned OHLCV data (CSV)
│   ├── normalized/           # Normalized data (CSV)
│   └── raw/                  # Raw downloaded data (CSV)
├── scripts/                  # Executable scripts for data operations
├── src/                      # Source code modules
│   ├── indicators/           # Economic indicators fetching & cleaning
│   ├── ohlcv/                # OHLCV data fetching (Kraken, yfinance)
│   ├── normalize/            # RevIN normalization transforms
│   ├── symbols/              # Symbol lists for different sources
│   └── models/               # (Reserved for ML models)
├── tests/                    # Unit tests
├── _BU/                      # Backup folders with timestamps
└── _Notes_MD/                # Documentation

Data Sources

OHLCV Data

Kraken: Cryptocurrency pairs (e.g., XBTUSD, ETHUSD)
Yahoo Finance: Stocks, ETFs, indices (e.g., AAPL, ^GSPC)
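This README does not show how the project talks to Kraken. As a minimal sketch, assuming it uses Kraken's public OHLC REST endpoint, each candle arrives as an array of time, open, high, low, close, vwap, volume, and count, with prices encoded as strings. The helper name parse_kraken_ohlc and the sample payload values are hypothetical.

```python
from datetime import datetime, timezone

# Field order of one candle in Kraken's public OHLC response.
KRAKEN_FIELDS = ("time", "open", "high", "low", "close", "vwap", "volume", "count")


def parse_kraken_ohlc(payload: dict) -> list[dict]:
    """Turn a decoded Kraken OHLC JSON payload into a list of OHLCV rows.

    Hypothetical helper: the project's own parser may differ.
    """
    if payload.get("error"):
        raise ValueError(f"Kraken returned errors: {payload['error']}")

    rows = []
    for key, candles in payload["result"].items():
        if key == "last":  # pagination cursor, not candle data
            continue
        for candle in candles:
            raw = dict(zip(KRAKEN_FIELDS, candle))
            rows.append({
                # Kraken timestamps are Unix seconds in UTC.
                "date": datetime.fromtimestamp(raw["time"], tz=timezone.utc).date(),
                "open": float(raw["open"]),
                "high": float(raw["high"]),
                "low": float(raw["low"]),
                "close": float(raw["close"]),
                "volume": float(raw["volume"]),
            })
    return rows


# Illustrative payload with made-up prices, shaped like a real response.
sample = {
    "error": [],
    "result": {
        "XXBTZUSD": [
            [1704067200, "100.0", "110.0", "95.0", "105.0", "102.5", "12.5", 42],
        ],
        "last": 1704067200,
    },
}
rows = parse_kraken_ohlc(sample)
```

Converting the string prices to floats at the parsing step keeps the later cleaning and normalization stages free of type checks.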

Economic Indicators

FRED (Federal Reserve): Unemployment, CPI, GDP growth, Treasury spreads, etc.
BEA (Bureau of Economic Analysis): GDP growth, national accounts data

Dependencies

pandas >= 2.0.0
yfinance >= 0.2.0
fredapi >= 0.5.0
beaapi >= 0.1.0
python-dotenv >= 1.0.0
requests >= 2.28.0
pytest >= 7.0.0

Setup

Clone the repository
Install dependencies: pip install -r requirements.txt
Copy .env.example to .env and set the API keys:
- FRED_API_KEY: Get from https://fred.stlouisfed.org/docs/api/api_key.html
- BEA_API_KEY: Get from https://apps.bea.gov/API/signup/
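One way to read these keys at startup is sketched below. It assumes python-dotenv (listed under Dependencies) loads the .env file into the process environment; the function name load_api_keys is hypothetical and not part of the project's documented API.

```python
import os
from typing import Mapping, Optional

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # fall back to variables already set in the environment
    load_dotenv = None

REQUIRED_KEYS = ("FRED_API_KEY", "BEA_API_KEY")


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the required API keys, failing early if any is missing.

    Hypothetical helper. Pass a mapping to bypass the real environment,
    which is convenient in tests.
    """
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # reads .env from the working directory, if present
        env = os.environ

    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Calling load_api_keys() once at program start surfaces a missing key immediately rather than partway through a download.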

Usage

Data Collection Scripts

get_rnd_data.py: Download random samples from all sources
get_clean_rnd_indicators.py: Fetch and clean economic indicators
normalize_clean_ohlcv.py: Apply normalization to OHLCV data

Core Modules

Indicators

from src.indicators import IndicatorFetcher
fetcher = IndicatorFetcher()
data = fetcher.get("UNRATE", "5y")  # Unemployment rate, 5 years

OHLCV

from src.ohlcv import KrakenOHLCV, YFinanceOHLCV
kraken = KrakenOHLCV()
data = kraken.get("XBTUSD", "2y")  # Bitcoin USD, 2 years

yahoo = YFinanceOHLCV()  # named "yahoo" to avoid shadowing the yfinance package
data = yahoo.get("AAPL", "3y")  # Apple stock, 3 years

Normalization

from src.normalize import RevinTransform
transformer = RevinTransform(num_features=4)  # OHLC features
# df: a cleaned OHLCV DataFrame, for example one loaded from data/clean_ohlcv/
normalized_df = transformer.fit_transform(df)
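
The internals of RevinTransform are not shown in this README. The sketch below illustrates the core idea of reversible instance normalization under simplifying assumptions: statistics are computed per feature over the time axis of one window, and the learnable affine parameters of the original RevIN method are left out. The class name SimpleRevIN and the sample values are hypothetical.

```python
import numpy as np


class SimpleRevIN:
    """Per-feature instance normalization that can be undone.

    Hypothetical stand-in for the project's RevinTransform; the learnable
    affine parameters of the original RevIN method are omitted.
    """

    def __init__(self, eps: float = 1e-5):
        self.eps = eps
        self.mean_ = None
        self.std_ = None

    def fit_transform(self, x: np.ndarray) -> np.ndarray:
        # x has shape (time_steps, num_features); statistics are taken
        # over the time axis, giving one mean and one std per feature.
        self.mean_ = x.mean(axis=0, keepdims=True)
        self.std_ = np.sqrt(x.var(axis=0, keepdims=True) + self.eps)
        return (x - self.mean_) / self.std_

    def inverse_transform(self, y: np.ndarray) -> np.ndarray:
        # Reverse the normalization using the stored statistics.
        if self.mean_ is None:
            raise RuntimeError("fit_transform must be called first")
        return y * self.std_ + self.mean_


# Four features standing in for open, high, low, close (made-up values).
window = np.array([
    [100.0, 110.0, 95.0, 105.0],
    [105.0, 112.0, 101.0, 108.0],
    [108.0, 115.0, 104.0, 111.0],
])
revin = SimpleRevIN()
normalized = revin.fit_transform(window)
restored = revin.inverse_transform(normalized)
```

Because the mean and standard deviation are stored on the object, model outputs in normalized space can be mapped back to price space with inverse_transform.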

Testing

Run tests with: pytest

Test coverage includes:

Data fetching functionality
Cleaning and standardization
Normalization transforms
API integrations

Data Format Standards

Time Index: Daily frequency, timezone-naive DatetimeIndex
Missing Data: Forward-filled or backward-filled where appropriate
File Naming: {symbol}_{date}.csv format
Normalization: RevIN applied per feature, with the mean and standard deviation preserved for reversal
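
The first three standards can be sketched with pandas as follows. This is an illustration rather than the project's cleaning code: the function names are hypothetical, the fill order (forward, then backward) is an assumption, "daily" is taken to mean calendar days, and the ISO date in the file name is an assumption because the README does not state the date format.

```python
import pandas as pd


def standardize_daily(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy on a daily, timezone-naive DatetimeIndex with gaps filled."""
    out = df.copy()
    idx = pd.to_datetime(out.index, utc=True)
    # Drop the timezone and the time of day so every row is a calendar date.
    out.index = idx.tz_localize(None).normalize()
    out = out[~out.index.duplicated(keep="last")].sort_index()
    # Reindex to every calendar day, then fill forward and finally backward.
    out = out.asfreq("D")
    return out.ffill().bfill()


def clean_filename(symbol: str, date: pd.Timestamp) -> str:
    # The README gives {symbol}_{date}.csv; the ISO date format is an assumption.
    return f"{symbol}_{date.strftime('%Y-%m-%d')}.csv"


# Made-up closing prices with a two-day gap between observations.
raw = pd.DataFrame(
    {"close": [100.0, 103.0]},
    index=["2024-01-01T00:00:00+00:00", "2024-01-04T00:00:00+00:00"],
)
clean = standardize_daily(raw)
```

For exchange-traded symbols, a business-day frequency ("B" instead of "D") would avoid filling weekends; which one the project uses is not stated here.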

Contributing

Follow the modular structure
Add tests for new functionality
Update documentation in _Notes_MD/
Ensure data quality standards are maintained

License

[Add license information here]

hypercoreiai/PRJ_master

Folders and files

Latest commit

History

Repository files navigation

Financial Data Pipeline Project

Overview

Key Features

Project Structure

Data Sources

OHLCV Data

Economic Indicators

Dependencies

Setup

Usage

Data Collection Scripts

Core Modules

Indicators

OHLCV

Normalization

Testing

Data Format Standards

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages