An API-first service that enriches text into segmented content with per-segment images and audio narration.
This project uses multiple Gemini 3 models across the pipeline:
| Use case | Model | Role |
|---|---|---|
| Text segmentation | gemini-3.0-flash (primary), gemini-2.5-flash-lite (fallback) | Splits content into logical segments with titles and bounds |
| Image generation | gemini-3-pro-image-preview | Native image output via ResponseModality: ["IMAGE"] |
| Narration scripts | gemini-3-pro-preview | Style-adapted narration for TTS (educational / financial / fictional) |
| Text-to-speech | gemini-2.5-pro-preview-tts | Audio output with configurable voice (e.g. Zephyr, Puck, Aoede) |
| Multi-modal input | Gemini Pro vision | Extract/summarize text from uploaded images and PDFs |
The Jobs Processor (worker) calls these models to segment input, generate per-segment narration, produce TTS audio, create image prompts, and generate images. See doc/GEMINI_INTEGRATION.md for details.
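As an illustration of the segmentation step only (the real client lives in internal/llm and is documented in doc/GEMINI_INTEGRATION.md), here is a minimal Go sketch against the public generateContent REST endpoint. The model name is taken from the table above, the prompt is invented for this example, and fallback handling, retries, and structured-output parsing are omitted:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// segmentText asks the segmentation model to split raw text into n titled
// segments. The endpoint and request shape follow the public generateContent
// REST API; the prompt here is illustrative, not the worker's actual one.
func segmentText(text string, n int) (string, error) {
	url := fmt.Sprintf(
		"https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent?key=%s",
		"gemini-3.0-flash", os.Getenv("GEMINI_API_KEY"),
	)
	prompt := fmt.Sprintf(
		"Split the following text into %d logical segments with titles and character bounds. Respond with JSON.\n\n%s",
		n, text,
	)
	body, err := json.Marshal(map[string]any{
		"contents": []map[string]any{
			{"parts": []map[string]string{{"text": prompt}}},
		},
	})
	if err != nil {
		return "", err
	}

	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Extract the first candidate's text from the response.
	var out struct {
		Candidates []struct {
			Content struct {
				Parts []struct {
					Text string `json:"text"`
				} `json:"parts"`
			} `json:"content"`
		} `json:"candidates"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Candidates) == 0 || len(out.Candidates[0].Content.Parts) == 0 {
		return "", fmt.Errorf("empty response from model")
	}
	return out.Candidates[0].Content.Parts[0].Text, nil
}

func main() {
	segments, err := segmentText("The solar system consists of the Sun and everything that orbits it...", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(segments)
}
```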
See doc/architecture.md for detailed system design, doc/requirements.md for functional requirements, doc/setup-and-development.md for setup, and doc/webhooks.md for webhook delivery.
API reference: openapi.yaml (OpenAPI 3.0)
- Docker & Docker Compose
- Go 1.24+ (for local development without Docker)
- Copy environment file:

  ```bash
  cp env.example .env
  ```

- Add your Gemini API key to .env:

  ```
  GEMINI_API_KEY=your-actual-api-key
  ```

- Start all services:

  ```bash
  docker-compose up -d
  ```

- Run migrations:

  ```bash
  docker-compose exec api ./stories-api migrate
  ```

- Access services:

  - API: http://localhost:8080
  - MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
  - Redpanda Console: http://localhost:19644
Create a test user and API key:

```bash
docker-compose exec postgres psql -U stories -d stories -c "
INSERT INTO users (id, email) VALUES (gen_random_uuid(), 'test@example.com') RETURNING id;
INSERT INTO api_keys (id, user_id, key_hash, status, quota_period, quota_chars)
VALUES (gen_random_uuid(), '<user_id_from_above>', crypt('test-key-123', gen_salt('bf')), 'active', 'monthly', 100000);
"
```

Test the API:

```bash
curl -X POST http://localhost:8080/v1/jobs \
  -H "Authorization: Bearer test-key-123" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The solar system consists of the Sun and everything that orbits it...",
    "type": "educational",
    "segments_count": 3,
    "audio_type": "free_speech",
    "webhook": {
      "url": "https://your-webhook-endpoint.com/callback"
    }
  }'
```

Project layout:

```
stories/
├── cmd/
│   ├── api/              # API server main
│   ├── worker/           # Worker service main
│   └── dispatcher/       # Webhook dispatcher main
├── internal/
│   ├── auth/             # Authentication & API key validation
│   ├── quota/            # Quota management
│   ├── jobs/             # Job management
│   ├── segments/         # Segment processing
│   ├── assets/           # Asset storage & retrieval
│   ├── kafka/            # Kafka producer/consumer
│   ├── storage/          # S3 storage interface
│   ├── llm/              # LLM client (Gemini)
│   └── markup/           # Output markup generation
├── migrations/           # Database migrations
├── compose.yaml          # Docker Compose for local dev
├── Dockerfile            # Multi-stage Docker build
└── README.md
```
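The internal packages are not documented here; as a rough orientation aid, the following is a hypothetical Go sketch of the kind of interface internal/llm could expose to the worker. All names and signatures are assumptions for illustration, not the actual package API:

```go
package llm

import "context"

// Segment is one titled slice of the input text.
// Field names are illustrative; see internal/segments for the real model.
type Segment struct {
	Title string
	Start int
	End   int
}

// Client abstracts the Gemini calls the worker needs. A concrete
// implementation would wrap the models listed in the table above.
type Client interface {
	// SegmentText splits raw text into n titled segments.
	SegmentText(ctx context.Context, text string, n int) ([]Segment, error)
	// NarrationScript writes a style-adapted narration script for one segment.
	NarrationScript(ctx context.Context, segment, style string) (string, error)
	// Synthesize turns a narration script into audio with the given voice.
	Synthesize(ctx context.Context, script, voice string) ([]byte, error)
	// GenerateImage produces an image for a per-segment prompt.
	GenerateImage(ctx context.Context, prompt string) ([]byte, error)
}
```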
- Install dependencies:

  ```bash
  go mod download
  ```

- Set up local Postgres, Kafka, and MinIO (update .env accordingly)

- Run migrations:

  ```bash
  go run ./cmd/api migrate
  ```

- Start services:

  ```bash
  # Terminal 1: API
  go run ./cmd/api

  # Terminal 2: Worker
  go run ./cmd/worker

  # Terminal 3: Dispatcher
  go run ./cmd/dispatcher
  ```
Run the tests:

```bash
go test ./...
```

Build:

```bash
# Build all binaries
make build

# Build a specific binary
go build -o bin/api ./cmd/api
go build -o bin/worker ./cmd/worker
go build -o bin/dispatcher ./cmd/dispatcher
```

Deploy the full stack (API, worker, dispatcher, PostgreSQL, Kafka, MinIO) with Docker Compose:
- Copy the environment file and set required variables:

  ```bash
  cp env.example .env
  # Edit .env: set DATABASE_URL, GEMINI_API_KEY, and any other production values
  ```

- Build and start all services:

  ```bash
  docker-compose up -d --build
  ```

- Run database migrations:

  ```bash
  docker-compose exec api ./stories-api migrate
  ```

- The API is available on port 8080. Use the same Create API Key and Test the API steps as in Quick Start, adjusting the host if not localhost.

To stop:

```bash
docker-compose down
```

Full specification: openapi.yaml (OpenAPI 3.0). Use it with Swagger UI, Redoc, or any OpenAPI tool.
Create a new enrichment job.
Request:
```json
{
  "text": "string (required, max 50k chars)",
  "type": "educational|financial|fictional (required)",
  "segments_count": "integer (required, 1-20)",
  "audio_type": "free_speech|podcast (required)",
  "webhook": {
    "url": "string (optional)",
    "secret": "string (optional)"
  }
}
```

Response (202 Accepted):

```json
{
  "job_id": "uuid",
  "status": "queued",
  "created_at": "timestamp"
}
```

Additional endpoints (see openapi.yaml for full request/response schemas):

- Get job status and results.
- List user's jobs (with pagination).
- Get asset metadata and download URL.
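Jobs that supply a webhook with a secret have their results delivered by the dispatcher; the actual delivery format and signing scheme are defined in doc/webhooks.md. As a hedged sketch only, assuming the dispatcher signs the raw request body with HMAC-SHA256 using that secret and sends the hex digest in an X-Webhook-Signature header (both are assumptions made for this example), a receiver could verify deliveries like this:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"log"
	"net/http"
)

// secret must match the webhook.secret supplied when the job was created.
var secret = []byte("my-webhook-secret")

// verifySignature recomputes the HMAC-SHA256 of the body and compares it,
// in constant time, to the hex digest sent by the dispatcher.
// The header name and hex encoding are assumptions; check doc/webhooks.md.
func verifySignature(body []byte, signature string) bool {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	expected := hex.EncodeToString(mac.Sum(nil))
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	http.HandleFunc("/callback", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if !verifySignature(body, r.Header.Get("X-Webhook-Signature")) {
			http.Error(w, "invalid signature", http.StatusUnauthorized)
			return
		}
		log.Printf("verified webhook delivery: %s", body)
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```

hmac.Equal performs a constant-time comparison, which avoids leaking signature bytes through timing differences.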
Proprietary - Gemini 3 Hackathon Project
