Performance-focused engineer building production AI without breaking budgets
70% cost reduction • 200ms latency wins • Real-world impact
$ whoami
tarang@localhost:~$ cat /dev/brain
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ENGINEER_TYPE → Full-Stack + AI/ML Specialist
CURRENT_MISSION → RAG pipelines that ship to production
OBSESSION → "200ms latency" > "feels faster"
TECH_PHILOSOPHY → Boring tech that works > Shiny tech for imaginary scale
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
$ git log --oneline --recent-wins
✓ 70% cost saved (Gemini 2.0 Flash vs GPT-4)
✓ 60% API quota cut (PostgreSQL caching strategy)
✓ 200ms latency win (WebSocket over HTTP polling)
✓ <100ms retrieval (FAISS local vs cloud databases)
$ ls -la skills/
Languages: Python, TypeScript, JavaScript, SQL
Backend: Node.js, Express, Flask, FastAPI
Frontend: React, Tailwind CSS, Chrome APIs
AI/ML: Gemini, LangChain, FAISS, RAG Pipelines
Data: PostgreSQL, Vector DBs, BYTEA encryption
Architecture: Microservices, REST APIs, WebSockets, OAuth2
$ echo $MANTRA
"Measure twice, code once. Ship fast, optimize smart."
Python: Flask • FastAPI • AI/ML
TypeScript: Node.js • Express • Type Safety
React: Tailwind • Modern UIs
PostgreSQL: BYTEA • Vector DBs
Production applications with real users and measurable impact
🤖 Web QA Chatbot - Chrome Extension • RAG Pipeline • 70% Cost Saved
🎯 The Problem: Users spend hours reading docs. AI chat costs are insane with GPT-4.
💡 The Solution: Turn any webpage into a chatbot using RAG + Gemini 2.0 Flash
Cost Optimization: 70% savings (Gemini vs GPT-4)
Performance: <100ms retrieval (FAISS local storage)
Context Strategy: 1000-token chunks, 190-token overlap
Architecture: RAG + Vector DB + No cloud dependency
⚡ Why It Matters: No cloud DB = no latency tax = 100% uptime
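The context strategy above (1000-token chunks, 190-token overlap) boils down to a sliding window. A minimal sketch, not the extension's actual code: `chunk_tokens` is a hypothetical helper, and the list items stand in for whatever a real tokenizer would produce.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 1000, overlap: int = 190) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # 810 new tokens per chunk with the defaults
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Placeholder tokens (assumption): a real pipeline would tokenize page text.
tokens = [f"t{i}" for i in range(2500)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])           # [1000, 1000, 880]
print(chunks[0][-190:] == chunks[1][:190])  # True: neighbours share 190 tokens
```

The overlap costs some duplicate embedding work, but a sentence that straddles a chunk boundary still lands whole in at least one chunk.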
📧 MailWise - AI Email Manager • 60% API Quota Saved
🎯 The Problem: Gmail API quotas kill productivity tools. 400+ wasted requests/day/user.
💡 The Solution: PostgreSQL caching + Webhooks + AI classification
API Efficiency: 60% quota reduction via smart caching
Architecture: 13 REST endpoints + Gmail API integration
Real-time Updates: Webhooks (eliminated 60s polling)
Security: OAuth2 compliant authentication flow
⚡ Why It Matters: Staying under quota = app stays alive
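The caching idea above is a cache-aside lookup: read from the cache first, spend API quota only on a miss, and let a webhook invalidate stale entries. A minimal sketch under stated assumptions: `MetadataCache` is a hypothetical class, a dict stands in for the PostgreSQL table, and the 300-second TTL is a placeholder, not MailWise's real setting.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Cache-aside lookup: serve from the cache, call the API only on a miss."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 300.0):
        self._fetch = fetch      # hypothetical API call, e.g. a message metadata request
        self._ttl = ttl_seconds  # hypothetical freshness window
        self._rows: Dict[str, Tuple[float, dict]] = {}  # stand-in for a PostgreSQL table
        self.api_calls = 0

    def get(self, message_id: str) -> dict:
        now = time.monotonic()
        hit = self._rows.get(message_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # cache hit: no quota spent
        self.api_calls += 1      # cache miss: one API request
        value = self._fetch(message_id)
        self._rows[message_id] = (now, value)
        return value

    def invalidate(self, message_id: str) -> None:
        """Drop an entry, e.g. when a webhook reports that the message changed."""
        self._rows.pop(message_id, None)


cache = MetadataCache(lambda mid: {"id": mid, "subject": "hello"})
for _ in range(5):
    cache.get("msg-1")
print(cache.api_calls)  # 1: five reads, one API request
```

Pairing the TTL with webhook-driven `invalidate` calls is what lets the cache stay fresh without a polling loop.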
🧠 Mental Health AI Platform - HIPAA Compliant • Crisis Detection • 200ms Faster
🎯 The Problem: Mental health data is sensitive. Latency kills conversational UX.
💡 The Solution: Microservices + BYTEA encryption + WebSocket + Crisis scoring
Performance: 200ms latency reduction (WebSocket magic)
Compliance: HIPAA-ready (data encrypted at rest, ciphertext stored in PostgreSQL BYTEA columns)
Crisis Detection: 0-10 scoring with auto-escalation
Architecture: Microservices (TypeScript + Python)
Security: JWT rotation (15min expiry limits the window for a stolen token)
⚡ Why It Matters: Lives > uptime. Real-time matters in mental health.
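To show why a 15-minute expiry matters, here is a minimal sketch of a signed, short-lived token using only the standard library. It is JWT-like, not a spec-compliant JWT, and not the platform's real code: `issue_token`, `verify_token`, and `SECRET` are hypothetical, and a production service would use a maintained JWT library plus a refresh flow for rotation.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret-change-me"  # hypothetical key; load from a secret store in practice
ACCESS_TTL_SECONDS = 15 * 60       # the 15-minute expiry from the list above


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_token(user_id: str, now: Optional[float] = None) -> str:
    """Return 'payload.signature' where the payload carries an expiry time."""
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "exp": issued + ACCESS_TTL_SECONDS}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the signature is valid and the token is unexpired."""
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None                # not in 'payload.signature' form
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None                # tampered, or signed with another key
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None


token = issue_token("user-42", now=1000.0)
print(verify_token(token, now=1000.0 + 14 * 60))  # user-42
print(verify_token(token, now=1000.0 + 15 * 60))  # None: expired
```

A stolen token stops working within 15 minutes at most, which is the trade-off the short expiry buys.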
Real-time collaboration tools • Multi-agent RAG systems • Zero-knowledge architectures • Data visualization platforms
Building in public, learning in private, shipping in stealth
"Measure twice, code once. Ship fast, optimize smart. Sleep well at night."
const varshney = {
  philosophy: "Real metrics > Vague goals",
  tradeoffs: {
    "70% cost cut": "Gemini 2.0 Flash > GPT-4 (<2% quality drop, way cheaper)",
    "100% uptime": "FAISS local > Cloud DBs (no external dependencies)",
    "200ms faster": "WebSocket > HTTP polling (real-time wins)",
    "60% API saved": "PostgreSQL cache > 400 requests/day waste"
  },
  principles: [
    "📊 Quantify everything: '200ms' > 'feels faster'",
    "🛡️ Design for failure: rate limits, retries, circuit breakers",
    "⚡ Ship fast, then optimize the 20% that matters",
    "🔧 Boring tech > Shiny tech for problems you don't have"
  ],
  stack: {
    languages: ["Python", "TypeScript", "JavaScript", "SQL"],
    backend: ["Node.js", "Express", "Flask", "FastAPI"],
    frontend: ["React", "Tailwind CSS"],
    aiml: ["Gemini", "LangChain", "FAISS", "RAG"],
    data: ["PostgreSQL", "Vector DBs", "BYTEA encryption"]
  },
  mission: "Making AI accessible without selling kidneys for API credits",
  realTalk: "I choose solutions that ship, not solutions that impress VCs"
};

| 🎯 Principle | 💡 Example | 📊 Result |
|---|---|---|
| Quantify Trade-offs | Gemini vs GPT-4 benchmark | 70% cost cut, <2% quality drop |
| Fail Gracefully | JWT 15min expiry + rotation | Theft window minimized, UX intact |
| Optimize Impact | Cache email metadata in PG | 60% API quota saved |
| Choose Boring | PostgreSQL > MongoDB | Schema = predictability = sleep |
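
The "design for failure" principle (rate limits, retries, circuit breakers) is easiest to see in a retry helper. A minimal sketch, assuming exponential backoff with jitter: `retry_with_backoff` and its default values are hypothetical, not taken from any of the projects above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on any exception, doubling the delay each time, plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)          # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")


# Usage: a call that fails twice, then succeeds (sleep is stubbed out for the demo).
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("rate limited")
    return "ok"


print(retry_with_backoff(flaky, sleep=lambda seconds: None))  # ok
print(state["calls"])                                         # 3
```

Passing `sleep` in as a parameter keeps the helper testable without real waiting. A circuit breaker would sit one level above this and stop calling entirely after repeated failures.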
AI/ML Engineering • System Design • Cost Optimization • RAG Pipelines • Performance Tuning • Startup Ideas
