This sophisticated invoice processing system, initially developed as a prototype for Brim’s Agentic AI Engineer technical challenge, leverages LangChain’s multi-agent workflow to automate extraction, validation, and purchase order (PO) matching. Designed to reduce manual processing time by over 75%, it ensures high accuracy through intelligent error handling and human-in-the-loop review processes. A standout feature is the implementation of Retrieval-Augmented Classification (RAC)—an adaptation of RAG—using FAISS with data/raw/test_samples/ (5 faulty PDFs) to minimize human intervention by classifying and resolving common errors autonomously.
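To make the RAC idea concrete, here is a minimal sketch of retrieving a known resolution for a new extraction error with LangChain's FAISS wrapper; the sample errors, resolutions, and the `resolve` helper are illustrative, not the project's actual rag_helper.py API:

```python
# Minimal sketch of Retrieval-Augmented Classification (RAC) over known-faulty samples.
# Assumes OPENAI_API_KEY is set; the error/resolution pairs are illustrative.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Descriptions of previously seen faulty invoices, paired with how each was resolved.
known_errors = [
    ("total does not equal sum of line items", "recompute total from line items"),
    ("missing invoice date", "fall back to PDF metadata creation date"),
    ("unreadable vendor name", "fuzzy-match vendor against vendor_data.csv"),
]

store = FAISS.from_texts(
    [desc for desc, _ in known_errors],
    OpenAIEmbeddings(),
    metadatas=[{"resolution": res} for _, res in known_errors],
)

def resolve(error_description: str) -> str:
    """Classify a new extraction error by its nearest known case and return that fix."""
    match = store.similarity_search(error_description, k=1)[0]
    return match.metadata["resolution"]

print(resolve("the grand total field disagrees with the item amounts"))
```

Because classification reduces to a nearest-neighbor lookup over curated failure cases, supporting a new error class is just a matter of adding another labeled sample.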
The project evolved in phases:
- Prototype (Streamlit Version): A lightweight, Streamlit-based solution for small-scale local enterprises, relying on local JSON storage (structured_invoices.json) for quick deployment and testing.
- Next.js Version: A robust iteration with a modern Next.js frontend, enhancing the UI with real-time WebSocket updates and maintaining JSON storage for simplicity.
- Scalable Version (feature/database-integration Branch): The current, production-ready state, integrating SQLite (invoices.db) for efficient metadata management and AWS S3 for scalable PDF storage. While PostgreSQL was considered for larger-scale needs, SQLite was chosen as sufficient for the target volume of up to 5,000 invoices/month.
This staged approach—starting small, iterating to a functional Next.js system, and scaling with cloud and database technologies—demonstrates a practical path from prototype to enterprise-ready solution.
| Variant | Purpose | Key Features |
|---|---|---|
| Streamlit | Prototyping | Simple UI, Python-based |
| Next.js | Production | WebSockets, Modern UI |
| AWS S3 + SQLite | Scalability | S3 storage, SQLite metadata |
- Processes PDFs from data/raw/invoices/ (35 invoices), stored in AWS S3
- Multi-agent system for extraction, validation, and PO matching
- RAG-based error handling with FAISS, using data/raw/test_samples/ (5 faulty PDFs)
- Asynchronous processing with SQLite-backed metadata (see the sketch after this list)
- Next.js dashboard with real-time WebSocket updates
- Interactive invoice review with S3-hosted PDF previews
- Comprehensive metrics visualization
- FastAPI backend with WebSocket support
- SQLite database (invoices.db) for structured data
- AWS S3 for PDF storage
- Scalable design supports thousands of invoices with minimal latency
- Fully containerized deployment
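As a concrete illustration of the asynchronous, SQLite-backed metadata handling referenced above, here is a minimal sketch using aiosqlite; the table schema is an assumption for illustration, not the actual invoices.db layout:

```python
# Sketch of asynchronous SQLite metadata writes; the schema below is assumed.
import asyncio
import aiosqlite

async def save_metadata(db_path: str, invoice_number: str, vendor: str, total: float) -> None:
    async with aiosqlite.connect(db_path) as db:
        await db.execute(
            "CREATE TABLE IF NOT EXISTS invoices "
            "(invoice_number TEXT PRIMARY KEY, vendor TEXT, total REAL)"
        )
        # INSERT OR IGNORE keeps re-processing the same PDF idempotent.
        await db.execute(
            "INSERT OR IGNORE INTO invoices VALUES (?, ?, ?)",
            (invoice_number, vendor, total),
        )
        await db.commit()

asyncio.run(save_metadata("invoices.db", "INV-001", "Acme Corp", 1250.00))
```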
- Overview
- Key Features
- Development Journey
- Migration Challenges & Solutions
- Architecture
- Setup Guide
- CI/CD & Docker Hub
- License
- Contributing
- Set up FastAPI backend and Next.js frontend
- Built core extraction and validation logic with Pydantic models
- Integrated FAISS-based RAG and OpenAI’s gpt-4o-mini for error handling
- Added PO matching with fuzzy logic (see the sketch after this list) and enhanced the frontend UI
- Fixed WebSocket connectivity, file uploads, and PDF viewing issues
- Stabilized backend and frontend compatibility (Node.js 20)
- Migrated from JSON to SQLite with migrate_json_to_db.py
- Integrated AWS S3 for PDF storage, optimizing WebSocket stability
- Refined documentation and recorded a demo video
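As referenced in the PO matching item above, fuzzy string similarity drives the match. A minimal sketch with rapidfuzz follows; the PO table and the 85-point threshold are illustrative, and the real po_matcher.py may differ:

```python
# Sketch of fuzzy PO matching; the PO data and the 85-point cutoff are illustrative.
from rapidfuzz import fuzz

purchase_orders = {
    "PO-2024-0001": "Acme Corp",
    "PO-2024-0002": "Globex Industries",
    "PO-2024-0003": "Initech LLC",
}

def match_po(vendor_name: str, threshold: int = 85) -> str | None:
    """Return the PO whose vendor best matches, or None if no score clears the threshold."""
    best_po, best_score = None, 0.0
    for po, vendor in purchase_orders.items():
        score = fuzz.token_sort_ratio(vendor_name.lower(), vendor.lower())
        if score > best_score:
            best_po, best_score = po, score
    return best_po if best_score >= threshold else None

print(match_po("Acme Corp."))  # score ≈ 95 → "PO-2024-0001"
```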
| Challenge | Solution |
|---|---|
| S3 Upload Errors | Removed 'ACL': 'public-read'; configured bucket policies for public access |
| WebSocket Instability | Added ConnectionManager with heartbeat checks and reconnection logic (sketch below) |
| Database Migration | Created migrate_json_to_db.py to prevent duplicates and ensure smooth transition |
| Anomalies Page Glitch | Updated anomalies.tsx to enforce numeric page values |
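The WebSocket fix follows the common FastAPI connection-manager pattern, sketched below; the 30-second heartbeat interval and the method names are assumptions rather than the exact api/app.py implementation:

```python
# Condensed FastAPI WebSocket connection manager with heartbeats; interval is illustrative.
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

class ConnectionManager:
    def __init__(self) -> None:
        self.active: list[WebSocket] = []

    async def connect(self, ws: WebSocket) -> None:
        await ws.accept()
        self.active.append(ws)

    def disconnect(self, ws: WebSocket) -> None:
        if ws in self.active:
            self.active.remove(ws)

    async def broadcast(self, message: dict) -> None:
        # Drop dead sockets instead of letting one failure break the broadcast loop.
        for ws in list(self.active):
            try:
                await ws.send_json(message)
            except Exception:
                self.disconnect(ws)

manager = ConnectionManager()

@app.websocket("/ws")
async def websocket_endpoint(ws: WebSocket):
    await manager.connect(ws)
    try:
        while True:
            # Periodic heartbeat keeps proxies from closing idle connections.
            await asyncio.sleep(30)
            await ws.send_json({"type": "ping"})
    except WebSocketDisconnect:
        manager.disconnect(ws)
```

Client-side reconnection logic in the Next.js frontend complements the server-side heartbeat.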
```
brim_invoice_nextjs/
├── Backend/Dockerfile
├── main.py
├── docker-compose.yml
├── README.md
├── requirements.txt
├── .gitignore
├── invoices.db
├── agents/
│ ├── base_agent.py
│ ├── extractor_agent.py
│ ├── fallback_agent.py
│ ├── human_review_agent.py
│ ├── matching_agent.py
│ ├── validator_agent.py
├── api/
│ ├── app.py
│ ├── review_api.py
├── config/
│ ├── logging_config.py
│ ├── monitoring.py
│ ├── settings.py
├── data/
│ ├── processed/
│ │ └── anomalies.json
│ ├── raw/
│ │ ├── invoices/
│ │ ├── test_invoice.txt
│ │ └── vendor_data.csv
│ ├── temp/
│ └── test_samples/
├── data_processing/
│ ├── anomaly_detection.py
│ ├── confidence_scoring.py
│ ├── document_parser.py
│ ├── ocr_helper.py
│ ├── po_matcher.py
│ ├── rag_helper.py
├── frontend-nextjs/
│ ├── Dockerfile
│ ├── next.config.ts
│ ├── package.json
│ ├── tailwind.config.ts
│ ├── tsconfig.json
│ ├── lib/
│ │ └── api.ts
│ └── src/
│ ├── pages/
│ │ ├── _app.tsx
│ │ ├── anomalies.tsx
│ │ ├── index.tsx
│ │ ├── invoices.tsx
│ │ ├── metrics.tsx
│ │ ├── review.tsx
│ │ └── upload.tsx
│ ├── components/
│ │ └── Layout.tsx
│ └── styles/
│ └── globals.css
├── models/
│   ├── invoice.py
│   └── validation_schema.py
└── workflows/
    └── orchestrator.py
```
```
+---------------------+
|     Next.js UI      |
|  - React, Next.js   |
|  - Tailwind CSS     |
+---------------------+
          ↓
+---------------------+
|   FastAPI Backend   |
|    - WebSocket      |
+---------------------+
          ↓
+---------------------+
| Multi-Agent Workflow|
|  - Extraction       |
|  - Validation       |
|  - PO Matching      |
|  - Human Review     |
|  - Fallback (FAISS) |
+---------------------+
          ↓
+----------------------+-----------------+
| SQLite (invoices.db) |  AWS S3 (PDFs)  |
+----------------------+-----------------+
```
```mermaid
flowchart TD
A[Next.js Frontend<br>Port: 3000] -->|PDF Upload & UI Events| B[FastAPI Backend<br>Port: 8000]
B -->|WebSocket Updates| A
B -->|Delegate Tasks| C[Multi-Agent Workflow]
C -->|Store Metadata| D[SQLite Database<br>invoices.db]
C -->|Store PDFs| E[AWS S3<br>PDF Storage]
subgraph Pages
A --> A1[Upload]
A --> A2[Invoices]
A --> A3[Review]
A --> A4[Metrics]
A --> A5[Anomalies]
end
subgraph Agents
C --> C1[Extraction<br>gpt-4o-mini]
C --> C2[Validation<br>Pydantic]
C --> C3[PO Matching<br>Fuzzy]
C --> C4[Human Review<br>Confidence < 0.9]
C --> C5[Fallback<br>FAISS RAG]
end
```
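To show how the pieces in the diagrams fit together, here is a highly simplified orchestration sketch with stubbed agents; the class and method names mirror the files in agents/ but are illustrative, while the confidence < 0.9 review gate comes straight from the diagram:

```python
# Simplified orchestration sketch with stub agents; the real agents/ classes are richer.
from dataclasses import dataclass

@dataclass
class ExtractionResult:
    fields: dict
    confidence: float

class ExtractorAgent:
    def extract(self, pdf_path: str) -> ExtractionResult:
        # The real agent calls gpt-4o-mini; stubbed here for illustration.
        return ExtractionResult({"vendor": "Acme Corp", "total": 1250.0}, confidence=0.95)

class ValidatorAgent:
    def validate(self, fields: dict) -> list[str]:
        # The real agent validates against Pydantic schemas; stubbed here.
        return [] if fields.get("total", 0) > 0 else ["non-positive total"]

class MatchingAgent:
    def match(self, fields: dict) -> str | None:
        # The real agent does fuzzy PO matching; stubbed here.
        return "PO-2024-0001" if fields.get("vendor") == "Acme Corp" else None

def process_invoice(pdf_path: str) -> dict:
    result = ExtractorAgent().extract(pdf_path)
    errors = ValidatorAgent().validate(result.fields)
    # Validation errors or low confidence route the invoice to human review.
    status = "needs_review" if (errors or result.confidence < 0.9) else "processed"
    return {"fields": result.fields, "po": MatchingAgent().match(result.fields), "status": status}

print(process_invoice("data/raw/invoices/sample.pdf"))
```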
- Docker & Docker Compose
- AWS account with S3 access
- OpenAI API key
Note: The setup instructions assume a Unix-like environment (e.g., Linux, macOS). For Windows, use WSL or adjust commands accordingly.
- Clone the Repository:
```bash
git clone -b feature/database-integration https://github.com/YanCotta/brim_invoice_nextjs.git brim_invoice_nextjs_feature
cd brim_invoice_nextjs_feature
git branch  # make sure you're on the right branch
```
- Set Up Environment Variables: Create a `.env` file with your credentials:
```bash
OPENAI_API_KEY=your_key
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
BUCKET_NAME=your_bucket_name
```
- Run the Application:
```bash
docker compose up -d
```
For the feature/database-integration branch, you can use pre-built Docker images from Docker Hub:
```bash
# Pull images from Docker Hub
docker pull yancotta/brim_invoice_nextjs_backend:feature-database-integration
docker pull yancotta/brim_invoice_nextjs_frontend:feature-database-integration
```
Backend Image:
- Name: `yancotta/brim_invoice_nextjs_backend:feature-database-integration`
- Description: Brim's Tech Test - Next.js Backend with SQLite and AWS S3 integration
- GitHub: https://github.com/YanCotta/brim_invoice_nextjs
- Size: 1.01 GB
- OS/Arch: linux/amd64
- Digest: bc0bfcdf4d1a
Frontend Image:
- Name: `yancotta/brim_invoice_nextjs_frontend:feature-database-integration`
- Description: Brim's Tech Test - Next.js Frontend with enhanced database features
- GitHub: https://github.com/YanCotta/brim_invoice_nextjs
- Size: 299.7 MB
- OS/Arch: linux/amd64
- Verify Installation:
- Visit http://localhost:3000 to confirm the frontend is running
- Access API docs at http://localhost:8000/docs
- Configure AWS S3:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```
- Disable "Block all public access" in bucket settings
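With this bucket policy in place, uploads need no per-object ACL, which is exactly the S3 fix noted in the challenges table. A minimal boto3 sketch (the helper name and the bucket/key values are placeholders):

```python
# Upload a PDF without a per-object ACL; public read comes from the bucket policy above.
import boto3

s3 = boto3.client("s3")  # credentials are picked up from the environment / .env

def upload_invoice(pdf_path: str, bucket: str, key: str) -> str:
    s3.upload_file(pdf_path, bucket, key)  # note: no ExtraArgs={'ACL': 'public-read'}
    # Virtual-hosted URL; regional buckets may need the region in the hostname.
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(upload_invoice("data/raw/invoices/sample.pdf", "your-bucket-name", "invoices/sample.pdf"))
```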
- Migration (Optional): Only needed when upgrading from JSON-based versions:
```bash
python scripts/migrate_json_to_db.py --json-path data/processed/invoices.json
sqlite3 invoices.db "SELECT COUNT(*) FROM invoices"
```
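For reference, the core of that migration fits in a few lines; this sketch assumes a simple JSON layout and table schema, whereas the real migrate_json_to_db.py handles more fields and edge cases:

```python
# Sketch of the JSON → SQLite migration; the assumed JSON/DB schema is illustrative.
import json
import sqlite3

def migrate(json_path: str, db_path: str = "invoices.db") -> int:
    with open(json_path) as f:
        invoices = json.load(f)  # assumed: a list of invoice dicts
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(invoice_number TEXT PRIMARY KEY, vendor TEXT, total REAL)"
    )
    # INSERT OR IGNORE skips rows that already exist, preventing duplicates on re-runs.
    conn.executemany(
        "INSERT OR IGNORE INTO invoices VALUES (:invoice_number, :vendor, :total)",
        invoices,
    )
    conn.commit()
    migrated = conn.total_changes
    conn.close()
    return migrated

print(f"migrated {migrate('data/processed/invoices.json')} rows")
```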
- Docker Configuration: Save as `docker-compose.yml`:
```yaml
version: '3.8'
services:
  backend:
    image: yancotta/brim_invoice_nextjs_backend:feature-database-integration
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - BUCKET_NAME=${BUCKET_NAME}
    volumes:
      - ./invoices.db:/app/invoices.db
      - ./data:/app/data
  frontend:
    image: yancotta/brim_invoice_nextjs_frontend:feature-database-integration
    ports:
      - "3000:3000"
    depends_on:
      - backend
```
The feature/database-integration branch uses GitHub Actions for CI/CD, building and pushing Docker images on every push:
- Backend: `yancotta/brim_invoice_nextjs_backend`
- Tag: feature-database-integration
- Size: 1.01 GB
- Updated: February 26, 2025
- Frontend: `yancotta/brim_invoice_nextjs_frontend`
- Tag: feature-database-integration
- Size: 299.7 MB
- Updated: February 26, 2025
Quick Start: Pull images directly from Docker Hub and run `docker compose up -d` to skip building locally.

The current deployment uses Docker Compose to orchestrate the FastAPI backend (with SQLite) and the Next.js frontend (with AWS S3). While robust for prototyping, integrating Kubernetes (K8s) would enable enterprise-grade scalability for handling 5,000+ invoices monthly.
Kubernetes would deploy the system as a cluster with:
- Backend Pods: Multiple replicas of `yancotta/brim_invoice_nextjs_backend:feature-database-integration`
- Frontend Pods: Instances of `yancotta/brim_invoice_nextjs_frontend:feature-database-integration`
- Storage: SQLite via PersistentVolumeClaims (PVCs) and AWS S3
- Services: Internal ClusterIP and external LoadBalancer/Ingress
The cluster can run locally or in the cloud:
- Local: Minikube (`minikube start`)
- Cloud: AWS EKS (`eksctl create cluster`)
Implementation steps:
- Create Deployments for backend/frontend
- Define Services for access
- Configure Secrets for API keys
- Set up a PVC for the database
Example manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: yancotta/brim_invoice_nextjs_backend:feature-database-integration
          ports:
            - containerPort: 8000
```
Deploy and expose the services:
```bash
kubectl apply -f k8s/
minikube service frontend-service --url
```
Scale the backend automatically under load:
```bash
kubectl autoscale deployment backend --min=2 --max=6 --cpu-percent=80
```
- Integrate Prometheus/Grafana
- Scalability: Auto-scaling pods
- Resilience: Self-healing system
- Portability: Multi-environment deployment
- Modularity: Compatible with existing architecture
Time constraints within the 10-day challenge made delivering a functional Docker Compose system the priority. The modular design ensures minimal refactoring for future Kubernetes adoption.
5-day implementation plan:
- Cluster setup
- Manifest development
- Testing
- Deployment validation
- Monitoring integration
This enhancement maintains compatibility with existing Docker Hub images while preparing for production scale.
Built with ❤️ using LangChain, OpenAI, SQLite, AWS S3, and more for Brim's Technical Challenge
This project is licensed under the MIT License.