An automated, AI-powered scribe to generate narrative recaps of your TTRPG sessions.
Scribble is a tool to help provide AI-powered recaps of TTRPG sessions held over Discord. It is designed to run as a continuous service in a Docker container, watching for new audio recordings and processing them automatically.
It supports multiple LLM providers (Google (Gemini), OpenAI (ChatGPT)*, Anthropic (Claude), and Ollama) and includes a web dashboard for file management, system statistics, and session monitoring.
* While I can verify it works with Gemini, Anthropic, and Ollama, there is no free API tier for OpenAI, so that endpoint is untested for now. Big thanks to @SnoFox for helping test the Anthropic API endpoint.
The workflow is designed to be as automated as possible after the initial setup:
- Record Audio: You use Craig to record your session on Discord.
- Upload Audio: After the session, you download the multi-track FLAC `.zip` file from Craig and upload it via the Scribble Web UI (or drop it into the `Sessions` folder).
- Transcribe: Scribble detects the new file, unzips it, and uses whisperx to perform a time-accurate, speaker-separated transcription of each player's audio track.
- Summarize: The individual transcripts are merged into a single, time-sorted master transcript (see the sketch after this list). This transcript, along with a custom prompt, is sent to your configured LLM provider.
- Deliver: The AI's narrative recap is received, formatted, and posted to a Discord channel via a webhook.
- Track: Detailed metrics (tokens used, estimated cost, API latency) are logged to a database and visualized on the Statistics page.
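
To make the merge step concrete, here is a minimal sketch of combining per-speaker segments into a single time-sorted transcript. The segment structure, speaker names, and output format are hypothetical illustrations, not Scribble's actual internals.

```python
# Hypothetical sketch of the merge step: combine per-speaker transcript segments
# into one chronologically ordered master transcript. Data shapes are assumptions.
from typing import Dict, List

def merge_transcripts(per_speaker: Dict[str, List[dict]]) -> str:
    """Flatten each speaker's segments, sort by start time, and render one transcript."""
    merged = [
        (seg["start"], speaker, seg["text"])
        for speaker, segments in per_speaker.items()
        for seg in segments
    ]
    merged.sort(key=lambda item: item[0])  # chronological order across all speakers
    return "\n".join(f"[{start:7.1f}s] {speaker}: {text}" for start, speaker, text in merged)

example = {
    "Alice": [{"start": 12.4, "text": "I check the door for traps."}],
    "Bob":   [{"start": 15.9, "text": "I'm standing well back."}],
}
print(merge_transcripts(example))
```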
The process is currently CPU-only, as that is the hardware I'm limited to for development and testing. Contributions to add GPU support are welcome!
While Scribble could be installed on bare metal, the easiest way to use it is with the provided Docker image.
Before you begin, you will need:
- Craig: Invite the Craig bot to your Discord server.
- LLM Provider: An API key for Google Gemini, OpenAI, or Anthropic, or alternatively the URL of a local Ollama instance.
- Discord Webhook URL: For the Forum-style channel in Discord where you want recaps to be posted (Server Settings -> Integrations -> Webhooks -> New Webhook).
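
If you want to confirm the webhook before wiring it into Scribble, a quick test post is enough. Below is a minimal sketch using Python's `requests` library; note that executing a webhook against a Forum channel requires a `thread_name`, which titles the new forum post. The URL and message are placeholders.

```python
# Quick sanity check for a Discord webhook (placeholder URL and message).
# Forum-channel webhooks need a thread_name so Discord knows which post to create.
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/REPLACE_ME"

resp = requests.post(
    WEBHOOK_URL,
    json={
        "thread_name": "Webhook test",              # title of the new forum post
        "content": "Hello from the webhook test!",  # example message body
    },
    timeout=10,
)
resp.raise_for_status()
print("Webhook accepted the test message.")
```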
- Create a directory for your project on your host machine: `mkdir scribble-server`, then `cd scribble-server`.
- Inside that directory, create a `docker-compose.yml` file (see example below).
- Create the `app` directory that you referenced in the `volumes` section: `mkdir app`
- Start the container: `docker compose up -d`
- First-Run Setup:
  - The container will automatically create `app/Sessions` and `app/sample_prompt.txt`.
  - Edit `app/sample_prompt.txt` to define the instructions for the AI.
  - When you are satisfied, rename it to `prompt.txt`.
- Usage:
  - Access the Web UI at `http://your-server-ip:12345`.
  - Log in using the password you set in `WEB_PASSWORD`.
  - Upload your Craig `.zip` file via the Upload page.
  - Monitor progress on the Status page.

```yaml
services:
  scribble:
    image: goosews/scribble:latest
    container_name: scribble
    restart: unless-stopped
    ports:
      - "12345:12345" # Flask Web UI
    environment:
      # --- Required ---
      # Choose one provider: google, openai, anthropic, ollama
      LLM_PROVIDER: "google"
      LLM_API_KEY: "YOUR_API_KEY_HERE"
      LLM_MODEL: "gemini-2.5-flash"
      DISCORD_WEBHOOK: "YOUR_DISCORD_WEBHOOK_URL_HERE"

      # --- Optional: Web UI Security ---
      WEB_PASSWORD: "change_me"
      WEB_COOKIE_KEY: "random_secret_string" # For session security

      # --- Optional: Cost Tracking (Per Million Tokens) ---
      TOKEN_COST_INPUT: "0.075"  # Example cost per 1M input tokens
      TOKEN_COST_OUTPUT: "0.30"  # Example cost per 1M output tokens

      # --- Optional: Whisper Performance Tuning ---
      OUTPUT_VERBOSITY: "3"
      WHISPER_MODEL: "large-v3"
      WHISPER_THREADS: "24"
      WHISPER_BATCH_SIZE: "24"
      RESPAWN_TIME: "3600" # Check for new files every hour
    volumes:
      - ./app:/app
```
| Variable | Required | Default | Description |
|---|---|---|---|
| `DISCORD_WEBHOOK` | Yes | (not set) | The URL for the Discord webhook. |
| `PUID` / `PGID` | No | `0` | User/Group ID for file permissions. |
| `TZ` | No | `Etc/UTC` | Local timezone. |
| `RESPAWN_TIME` | No | `3600` | Wait time (seconds) between processing cycles. |
| `OUTPUT_VERBOSITY` | No | `3` | `1`: Errors, `2`: Warnings, `3`: Info, `4`: Verbose. |
| `KEEP_AUDIO` | No | `true` | Set to `false` to delete FLAC files after processing. |
| `SAVE_DB_SPACE` | No | `true` | Prevents data blobs and thought tokens from being stored in the API-calls DB, significantly reducing DB size. Set to `false` to keep this data. |
| `CUSTOM_SCRIPT` | No | (not set) | Names a custom script to run after the recap is posted to Discord, useful if you want to do something else with the recap file (such as posting it to a DocMost wiki). The script must be placed in the directory mounted as `/app`, and the value should be the script's base file name, including extension (e.g. `CUSTOM_SCRIPT: "docmost.sh"`). The path to the recap file is passed as positional parameter #1. See the sketch below. |
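
As an illustration of the `CUSTOM_SCRIPT` hook, here is a sketch of a post-processing script. The file name, destination directory, and the choice of Python are all assumptions; the only documented interface is that the recap file's path arrives as the first positional argument and that the script lives in the directory mounted as `/app`.

```python
#!/usr/bin/env python3
# Hypothetical CUSTOM_SCRIPT example (e.g. CUSTOM_SCRIPT: "archive_recap.py").
# Scribble passes the path of the finished recap file as the first argument.
import shutil
import sys
from pathlib import Path

def main() -> int:
    if len(sys.argv) < 2:
        print("usage: archive_recap.py <path-to-recap>", file=sys.stderr)
        return 1

    recap_path = Path(sys.argv[1])
    archive_dir = Path("/app/recaps_archive")  # assumed destination inside the mounted volume
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Keep a copy of every recap alongside the rest of the mounted app data.
    shutil.copy2(recap_path, archive_dir / recap_path.name)
    print(f"Archived {recap_path} to {archive_dir}")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```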
| Variable | Required | Default | Description |
|---|---|---|---|
| `LLM_PROVIDER` | Yes | (not set) | `google`, `openai`, `anthropic`, or `ollama`. |
| `LLM_API_KEY` | Yes | (not set) | API key (required for cloud providers). |
| `LLM_MODEL` | Yes | (not set) | Model name (e.g., `gpt-4o`, `claude-3-5-sonnet`, `gemini-2.5-pro`). |
| `OLLAMA_URL` | If Ollama | (not set) | Full URL to the Ollama instance (e.g., `http://192.168.1.50:11434`). |
| `TOKEN_COST_INPUT` | No | `0` | Cost in USD per 1 million input tokens (for stats). |
| `TOKEN_COST_OUTPUT` | No | `0` | Cost in USD per 1 million output tokens (for stats). |
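
To make the cost fields concrete: both rates are per one million tokens, so a rough per-recap estimate can be computed as below (the token counts are made-up example numbers).

```python
# Illustrative cost estimate using the per-million-token rates (example numbers only).
TOKEN_COST_INPUT = 0.075   # USD per 1M input tokens (as in the compose example)
TOKEN_COST_OUTPUT = 0.30   # USD per 1M output tokens

input_tokens = 85_000      # e.g. a long merged transcript plus the prompt
output_tokens = 2_500      # e.g. the generated recap

cost = (input_tokens / 1_000_000) * TOKEN_COST_INPUT \
     + (output_tokens / 1_000_000) * TOKEN_COST_OUTPUT
print(f"Estimated cost: ${cost:.4f}")  # Estimated cost: $0.0071
```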
| Variable | Required | Default | Description |
|---|---|---|---|
| `WEB_PASSWORD` | No | (random) | Password for the web interface. |
| `WEB_COOKIE_KEY` | No | (random) | Secret key for Flask sessions. |
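
If you prefer to pin `WEB_COOKIE_KEY` rather than rely on a random value at startup, any sufficiently random string works; for example, one generated with Python's standard `secrets` module:

```python
# One way to generate a value for WEB_COOKIE_KEY; paste the output into docker-compose.yml.
import secrets
print(secrets.token_hex(32))
```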
| Variable | Default | Description |
|---|---|---|
| `WHISPER_MODEL` | `large-v3` | The Whisper model to use. More on available models here. |
| `WHISPER_THREADS` | (all) | Number of CPU threads for whisperx. |
| `WHISPER_BATCH_SIZE` | `8` | Parallel processing batch size. |
| `WHISPER_BEAM_SIZE` | `5` | Beam search size (1-5). |
| `WHISPER_VAD_METHOD` | `pyannote` | `silero` is faster, `pyannote` is more accurate. |
| `WHISPER_COMPUTE_TYPE` | `int8` | Quantization type (`int8` recommended for CPU). |
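
For a sense of how these settings map onto whisperx itself, the sketch below follows whisperx's documented Python quickstart. It is an illustration only, not Scribble's actual transcription code, and argument names can differ between whisperx versions.

```python
# Rough illustration of how the tuning variables correspond to the whisperx Python API.
import whisperx

device = "cpu"
model = whisperx.load_model("large-v3", device, compute_type="int8")  # WHISPER_MODEL / WHISPER_COMPUTE_TYPE

audio = whisperx.load_audio("player1.flac")      # one speaker track from the Craig zip
result = model.transcribe(audio, batch_size=8)   # WHISPER_BATCH_SIZE

# Word-level alignment yields the time-accurate segments that get merged across speakers.
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)
print(aligned["segments"][0])
```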
- Multi-Provider Support: Switch easily between Gemini, OpenAI, Claude, or local Ollama models.
- Web Dashboard:
  - Upload: Drag-and-drop or URL upload for session zips.
  - Status: View progress bars, read logs, and download transcripts.
  - Statistics: Visualize token usage, costs, and API latency over time.
  - Prompt Editor: Edit the system prompt directly from the browser.
- Session Management:
  - Retry specific steps (Re-Transcribe, Re-Build Transcript, Re-Generate Recap) with a single click.
  - Automatic file cleanup (optional).
Processing is CPU-intensive. On a dual Intel E5-2670 system (24 threads), a 2.5-hour session with 6 speakers takes approximately 7 hours to fully transcribe using large-v3.
- Move from state-based processing to action-based processing
- Add GPU support for WhisperX
The original code for this project is licensed under the MIT License.
This project relies on whisperx, which is distributed under the BSD-2-Clause License.
