Financial Transaction Fraud Detection System

Machine Learning Model for Detecting Fraudulent Transactions (6.3 Million Records)

Overview

This project builds a machine learning-based fraud detection system using a Kaggle dataset of 6.3 million financial transactions, each labeled as fraudulent or legitimate.
The primary objective is to develop a high-precision fraud detection model that minimizes false alarms while still capturing a meaningful portion of real fraud cases.

Because fraud investigation teams have limited time and resources, reducing false positives is critical. This project focuses on precision-first optimization through hyperparameter tuning and threshold engineering.
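
To make the precision-first idea concrete, here is a minimal sketch of how precision and recall follow from a decision threshold applied to predicted fraud probabilities. The scores, labels, and the `precision_recall_at` helper are illustrative assumptions and are not taken from the project notebook.

```python
def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1          # fraud correctly flagged
        elif flagged and not is_fraud:
            fp += 1          # false alarm
        elif not flagged and is_fraud:
            fn += 1          # missed fraud
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, made up for illustration: 1 = fraud, 0 = legitimate.
scores = [0.9995, 0.9992, 0.97, 0.80, 0.60, 0.9991, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

low = precision_recall_at(scores, labels, 0.5)
high = precision_recall_at(scores, labels, 0.999)
print("threshold 0.5   ->", low)
print("threshold 0.999 ->", high)
```

On this toy data, raising the threshold trades recall for precision, which is the same trade-off the project makes at a much larger scale.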

Dataset Description

Source:
Kaggle – Fraud Detection Dataset

Dataset Size:

6.3 million transaction records
Highly imbalanced (fraud ≈ 0.13%)
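
As a back-of-the-envelope illustration of what this imbalance implies, the sketch below uses only the two rounded figures quoted above (6.3 million records, about 0.13% fraud). The resulting counts are therefore approximations, not values read from the dataset.

```python
total_transactions = 6_300_000   # "6.3 million" from the description (rounded)
fraud_rate = 0.0013              # "fraud ≈ 0.13%" from the description (approximate)

approx_fraud_cases = total_transactions * fraud_rate

# Accuracy of a trivial model that labels every transaction as legitimate.
always_legit_accuracy = 1 - fraud_rate

print(f"approx. fraud cases: {approx_fraud_cases:,.0f}")
print(f"always-legitimate accuracy: {always_legit_accuracy:.4f}")
```

A model that never flags anything would already score about 0.9987 accuracy under these figures, which is why precision and recall are the more informative metrics for this dataset.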

Features include:

step — time step (simulation hour)
amount — transaction amount
oldbalanceOrg / newbalanceOrig
oldbalanceDest / newbalanceDest
nameOrig / nameDest — anonymized IDs
isFraud — real fraud indicator
isFlaggedFraud — rule-based system flag

The dataset is simulated and covers general financial transaction fraud, such as fraudulent money transfers and cash-outs.

Project Goals

Handle extreme class imbalance in large-scale data
Train and evaluate multiple ML models
Improve precision while keeping recall usable
Reduce false positives by optimizing the decision threshold (final value: 0.999)
Analyze fraud behavior across millions of transactions
Produce a model that is practically deployable in real financial systems
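
The README does not describe how the threshold was searched, so the following is only one plausible sketch of threshold optimization: scan candidate thresholds in ascending order and keep the lowest one whose precision reaches a target, which preserves as much recall as possible. The `sweep_thresholds` helper, the candidate list, the precision target, and the toy data are all hypothetical.

```python
def sweep_thresholds(scores, labels, candidates, min_precision):
    """Return (threshold, precision, recall) for the lowest candidate threshold
    whose precision reaches min_precision, or None if none qualifies."""
    for threshold in sorted(candidates):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        if tp + fp == 0:
            continue  # nothing flagged, so precision is undefined
        precision = tp / (tp + fp)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision >= min_precision:
            return threshold, precision, recall
    return None


# Toy validation data, made up for illustration: 1 = fraud, 0 = legitimate.
scores = [0.9999, 0.9995, 0.995, 0.99, 0.95, 0.9, 0.7, 0.4]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

result = sweep_thresholds(scores, labels, [0.5, 0.9, 0.99, 0.999], min_precision=0.75)
print(result)
```

In practice such a search would be run on held-out validation data rather than the training set, so that the chosen threshold is not tuned to the data the model has already seen.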

Final Results (After Hyperparameter + Threshold Tuning)

High-Precision Settings

Optimal Threshold: 0.999
Accuracy: 0.9989
Precision: 0.5654
Recall: 0.5347
F1-Score: 0.5496
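
As a quick consistency check, the reported F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The variable names are illustrative.

```python
precision = 0.5654  # reported above
recall = 0.5347     # reported above

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.5496
```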

Before vs After Comparison

Metric	Before Tuning	After Tuning
Precision	0.0226	0.5654
Recall	0.9324	0.5347
F1-Score	≈ 0.044	0.5496

Precision Improvement: +2401.8% (relative to the pre-tuning value, roughly a 25-fold increase)
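
The improvement figure and the pre-tuning F1-score can both be derived from the numbers in the comparison table. The short sketch below shows the arithmetic; the variable names are illustrative.

```python
precision_before, precision_after = 0.0226, 0.5654  # from the table above
recall_before = 0.9324                               # from the table above

# Relative change in precision, expressed as a percentage.
relative_gain_pct = (precision_after - precision_before) / precision_before * 100

# F1 before tuning, derived from the reported precision and recall.
f1_before = 2 * precision_before * recall_before / (precision_before + recall_before)

print(f"relative precision gain: {relative_gain_pct:.1f}%")
print(f"F1 before tuning: {f1_before:.4f}")
```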

Interpretation of Results

Precision rose from 2.26% to 56.54%, so more than half of all alerts are now real fraud, compared with about one in 44 before tuning.
Recall dropped from 93.24% to 53.47%. This is a deliberate trade-off: the model misses nearly half of the fraud cases in exchange for far fewer false alerts.
The threshold of 0.999 flags only the most confident fraud cases.
The tuned model is now practically usable in real-world environments where false positives are costly.
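
To translate these metrics into investigator workload, the sketch below estimates how many true and false alerts are raised per 1,000 actual fraud cases, using only the reported precision and recall. The `alerts_per_1000_frauds` helper is hypothetical, and the figures are implied by the metrics rather than counted from the dataset.

```python
def alerts_per_1000_frauds(precision, recall):
    """Return (true_alerts, false_alerts) raised per 1,000 actual fraud cases."""
    true_alerts = 1000 * recall                # frauds that get flagged
    total_alerts = true_alerts / precision     # precision = true / total alerts
    return true_alerts, total_alerts - true_alerts


before = alerts_per_1000_frauds(precision=0.0226, recall=0.9324)
after = alerts_per_1000_frauds(precision=0.5654, recall=0.5347)

print(f"before tuning: {before[0]:.0f} true alerts, {before[1]:.0f} false alerts")
print(f"after tuning:  {after[0]:.0f} true alerts, {after[1]:.0f} false alerts")
```

Under these assumptions, the false-alert count per 1,000 fraud cases falls from roughly 40,000 to a little over 400, at the cost of catching about 535 of the 1,000 frauds instead of about 932.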

Repository: Vineetnaik1611/Financial-Fraud-Classification-Model
