## Summary
Implement convenient logging helpers that make it easy to log LLM calls to Braintrust with the correct metadata structure. Focus on capturing all useful information (tokens, latency, model info) in a format that works well with Braintrust's UI.
## Context
When logging LLM interactions to Braintrust, the value is in capturing rich metadata:
- Model and provider information
- Token usage (input/output)
- Latency metrics
- Input/output in OpenAI message format (enables "Try prompt" button in UI)
- Finish reasons, temperature, and other parameters
Since LLM calls typically take 500ms-5s+, a synchronous logging call adds negligible overhead. This simplifies the implementation - no need for async batching or OpenTelemetry complexity.
## Proposed Solution

### Wrapper-Style Logging
Wrap LLM calls to automatically capture timing and structure:
```elixir
{:ok, response} = Braintrust.Log.with_llm_span(project_id, %{
  model: "gpt-4",
  provider: :openai,
  input: [%{role: "user", content: "Hello"}]
}, fn ->
  OpenAI.chat_completion(messages)
end)
# Automatically logs: input, output, latency_ms, and extracts usage if present
```
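A minimal sketch of what the wrapper could look like internally: it times the callback with `System.monotonic_time/1` and delegates to the `llm_call/2` helper proposed below. `extract_usage/1` is the response-parsing helper from the acceptance criteria and does not exist yet; nothing here is final API.

```elixir
defmodule Braintrust.Log do
  # Sketch only: time the callback, then log one event via llm_call/2.
  def with_llm_span(project_id, fields, fun) when is_function(fun, 0) do
    start = System.monotonic_time(:millisecond)
    result = fun.()
    latency_ms = System.monotonic_time(:millisecond) - start

    extra =
      case result do
        {:ok, response} -> %{output: response, usage: extract_usage(response)}
        {:error, reason} -> %{error: inspect(reason)}
      end

    llm_call(project_id, fields |> Map.put(:latency_ms, latency_ms) |> Map.merge(extra))
    result
  end
end
```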
### Direct Logging
For cases where the call has already been made:
```elixir
Braintrust.Log.llm_call(project_id, %{
  input: messages,
  output: response.content,
  model: "gpt-4",
  provider: :openai,
  usage: %{input_tokens: 50, output_tokens: 100},
  latency_ms: 1200,
  finish_reason: "stop",
  temperature: 0.7
})
```

### Metadata Captured
| Field | Type | Description |
|---|---|---|
| `input` | map | Messages in OpenAI format (enables "Try prompt" button) |
| `output` | any | Model response |
| `model` | string | Model name (`gpt-4`, `claude-3-opus`, etc.) |
| `provider` | atom/string | Provider (`openai`, `anthropic`, etc.) |
| `usage.input_tokens` | integer | Input/prompt token count |
| `usage.output_tokens` | integer | Output/completion token count |
| `latency_ms` | integer | Request duration in milliseconds |
| `finish_reason` | string | Why generation stopped (`stop`, `length`, `tool_calls`) |
| `temperature` | float | Temperature parameter used |
| `max_tokens` | integer | Max tokens parameter used |
| `error` | string | Error message if call failed |
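One way `llm_call/2` could shape these fields into an event, with token counts under `metrics` and model parameters under `metadata` (per the technical notes below). The exact metric names and the `Braintrust.API.insert_event/2` client call are assumptions for illustration, not confirmed Braintrust API.

```elixir
# Sketch: split the flat field map into a metrics/metadata event structure.
def llm_call(project_id, fields) do
  usage = fields[:usage] || %{}

  event = %{
    input: %{messages: fields[:input]},
    output: fields[:output],
    metrics: %{
      prompt_tokens: usage[:input_tokens],
      completion_tokens: usage[:output_tokens],
      latency_ms: fields[:latency_ms]
    },
    metadata:
      Map.take(fields, [:model, :provider, :finish_reason, :temperature, :max_tokens, :error])
  }

  # Placeholder for the eventual HTTP client call to Braintrust's insert endpoint.
  Braintrust.API.insert_event(project_id, event)
end
```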
## Acceptance Criteria
- [ ] `Braintrust.Log.with_llm_span/3` wrapper that captures timing and logs automatically
- [ ] `Braintrust.Log.llm_call/2` for direct logging with structured metadata
- [ ] Automatic extraction of usage data from common response formats (sketched below)
- [ ] Input formatted as OpenAI message list for UI compatibility
- [ ] All metadata fields stored appropriately (scores vs metrics vs metadata)
- [ ] Documentation with usage examples for OpenAI, Anthropic patterns
- [ ] Tests covering the logging helpers
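A sketch of the usage extraction referenced above, pattern-matching the decoded JSON shapes the OpenAI and Anthropic HTTP APIs return (string-keyed `"usage"` maps); the clauses and key names are assumptions about those providers' responses:

```elixir
# Sketch: normalize token usage from common provider response shapes.
# OpenAI chat completions: "usage" => %{"prompt_tokens" => _, "completion_tokens" => _}
defp extract_usage(%{"usage" => %{"prompt_tokens" => p, "completion_tokens" => c}}),
  do: %{input_tokens: p, output_tokens: c}

# Anthropic messages: "usage" => %{"input_tokens" => _, "output_tokens" => _}
defp extract_usage(%{"usage" => %{"input_tokens" => i, "output_tokens" => o}}),
  do: %{input_tokens: i, output_tokens: o}

# Unknown shape: log without usage metrics rather than guessing.
defp extract_usage(_), do: nil
```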
## Technical Notes
- No async batching needed: LLM latency (500ms-5s) dominates; sync logging adds negligible overhead
- OpenAI message format: structure input as `%{messages: [...]}` for Braintrust UI "Try prompt" support
- Metrics vs metadata: token counts go in `metrics` (summed during aggregation); model/provider go in `metadata`
- Scores: reserved for 0-1 normalized quality scores, not raw metrics
## Future Considerations
OpenTelemetry integration could be added later as an optional feature for users with existing OTEL infrastructure. This would provide helpers for the GenAI semantic conventions (`gen_ai.*` attributes) but let users own their exporter configuration.
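If that lands, a helper might look like the sketch below, which assumes the `opentelemetry_api` Hex package and uses `gen_ai.*` attribute names from the (still-evolving) OTel GenAI semantic conventions; `with_otel_llm_span/2` is a hypothetical name:

```elixir
require OpenTelemetry.Tracer, as: Tracer

# Hypothetical helper: wrap an LLM call in an OTel span with GenAI attributes.
def with_otel_llm_span(model, fun) do
  Tracer.with_span "llm.call" do
    {:ok, response} = fun.()

    Tracer.set_attributes(%{
      "gen_ai.system" => "openai",
      "gen_ai.request.model" => model,
      "gen_ai.usage.input_tokens" => get_in(response, ["usage", "prompt_tokens"]),
      "gen_ai.usage.output_tokens" => get_in(response, ["usage", "completion_tokens"])
    })

    {:ok, response}
  end
end
```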