Implement LLM logging helpers with GenAI metadata #24

@johnnyt

Description

Summary

Implement convenient logging helpers that make it easy to log LLM calls to Braintrust with the correct metadata structure. Focus on capturing all useful information (tokens, latency, model info) in a format that works well with Braintrust's UI.

Context

When logging LLM interactions to Braintrust, the value is in capturing rich metadata:

  • Model and provider information
  • Token usage (input/output)
  • Latency metrics
  • Input/output in OpenAI message format (enables "Try prompt" button in UI)
  • Finish reasons, temperature, and other parameters

Since LLM calls typically take 500 ms to 5 s or more, a synchronous logging call adds negligible relative overhead. This keeps the implementation simple: no async batching or OpenTelemetry machinery is needed.

Proposed Solution

Wrapper-Style Logging

Wrap LLM calls to automatically capture timing and structure:

{:ok, response} = Braintrust.Log.with_llm_span(project_id, %{
  model: "gpt-4",
  provider: :openai,
  input: [%{role: "user", content: "Hello"}]
}, fn ->
  OpenAI.chat_completion(messages)
end)
# Automatically logs: input, output, latency_ms, and extracts usage if present
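
One possible shape for the wrapper (a sketch only, not the final design; extract_usage/1 is a hypothetical helper, and error handling is open for discussion):

```elixir
# Sketch of a possible with_llm_span/3 implementation. Names other than
# with_llm_span/3 and llm_call/2 (e.g. extract_usage/1) are hypothetical.
def with_llm_span(project_id, metadata, fun) when is_function(fun, 0) do
  start = System.monotonic_time(:millisecond)
  result = fun.()
  latency_ms = System.monotonic_time(:millisecond) - start

  fields =
    case result do
      {:ok, response} ->
        Map.merge(metadata, %{
          output: response,
          latency_ms: latency_ms,
          usage: extract_usage(response)
        })

      {:error, reason} ->
        Map.merge(metadata, %{error: inspect(reason), latency_ms: latency_ms})
    end

  llm_call(project_id, fields)
  result
end
```

Returning the original result unchanged keeps the wrapper transparent to the caller's own pattern matching.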

Direct Logging

For cases where the call is already made:

Braintrust.Log.llm_call(project_id, %{
  input: messages,
  output: response.content,
  model: "gpt-4",
  provider: :openai,
  usage: %{input_tokens: 50, output_tokens: 100},
  latency_ms: 1200,
  finish_reason: "stop",
  temperature: 0.7
})
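
Because llm_call/2 takes latency_ms as plain data, a caller can measure it with monotonic time before logging. A sketch, assuming an OpenAI-style client whose response shape is an assumption here:

```elixir
# Measure latency around an existing client call (assumed API), then log
# directly. System.monotonic_time/1 avoids wall-clock skew during the call.
start = System.monotonic_time(:millisecond)
{:ok, response} = OpenAI.chat_completion(messages)
latency_ms = System.monotonic_time(:millisecond) - start

Braintrust.Log.llm_call(project_id, %{
  input: messages,
  output: response.content,
  model: "gpt-4",
  provider: :openai,
  usage: %{input_tokens: response.usage.prompt_tokens,
           output_tokens: response.usage.completion_tokens},
  latency_ms: latency_ms
})
```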

Metadata Captured

  Field                Type           Description
  -----                ----           -----------
  input                list of maps   Messages in OpenAI format (enables "Try prompt" button)
  output               any            Model response
  model                string         Model name ("gpt-4", "claude-3-opus", etc.)
  provider             atom/string    Provider (:openai, :anthropic, etc.)
  usage.input_tokens   integer        Input/prompt token count
  usage.output_tokens  integer        Output/completion token count
  latency_ms           integer        Request duration in milliseconds
  finish_reason        string         Why generation stopped ("stop", "length", "tool_calls")
  temperature          float          Temperature parameter used
  max_tokens           integer        Max tokens parameter used
  error                string         Error message if the call failed

Acceptance Criteria

  • Braintrust.Log.with_llm_span/3 wrapper that captures timing and logs automatically
  • Braintrust.Log.llm_call/2 for direct logging with structured metadata
  • Automatic extraction of usage data from common response formats
  • Input formatted as OpenAI message list for UI compatibility
  • All metadata fields stored appropriately (scores vs metrics vs metadata)
  • Documentation with usage examples for OpenAI and Anthropic response patterns
  • Tests covering the logging helpers
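
For the "automatic extraction of usage data" criterion, pattern matching on the two most common response shapes could look like this (a sketch; the key names follow OpenAI's prompt_tokens/completion_tokens and Anthropic's input_tokens/output_tokens conventions, and the helper name is hypothetical):

```elixir
# Hypothetical helper: normalize provider-specific usage maps into
# %{input_tokens: _, output_tokens: _}; return nil when unrecognized.
defp extract_usage(%{usage: %{prompt_tokens: p, completion_tokens: c}}),
  do: %{input_tokens: p, output_tokens: c}

defp extract_usage(%{usage: %{input_tokens: _, output_tokens: _} = usage}),
  do: Map.take(usage, [:input_tokens, :output_tokens])

defp extract_usage(_response), do: nil
```

Returning nil instead of raising keeps logging best-effort: an unrecognized response format should never break the wrapped LLM call.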

Technical Notes

  • No async batching needed: LLM latency (500ms-5s) dominates; sync logging adds negligible overhead
  • OpenAI message format: Structure input as %{messages: [...]} for Braintrust UI "Try prompt" support
  • Metrics vs Metadata: Token counts go in metrics (summed during aggregation), model/provider go in metadata
  • Scores: Reserved for 0-1 normalized quality scores, not raw metrics
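
The metrics/metadata split above could be centralized in one mapping function (a sketch; to_span/1 is a hypothetical internal helper):

```elixir
# Hypothetical mapping from llm_call/2 fields to Braintrust span sections.
defp to_span(fields) do
  usage = Map.get(fields, :usage) || %{}

  %{
    # OpenAI message list under :messages enables the "Try prompt" button
    input: %{messages: Map.get(fields, :input, [])},
    output: fields[:output],
    # numeric values that Braintrust can sum during aggregation
    metrics: Map.merge(usage, Map.take(fields, [:latency_ms])),
    # descriptive fields that should not be aggregated
    metadata: Map.take(fields, [:model, :provider, :finish_reason,
                                :temperature, :max_tokens, :error])
  }
end
```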

Future Considerations

OpenTelemetry integration could be added later as an optional feature for users with existing OTEL infrastructure. This would provide helpers for GenAI semantic conventions (gen_ai.* attributes) but let users own their exporter configuration.
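
If OTEL support lands later, the helper would mostly be a rename layer onto the GenAI semantic conventions. A sketch (the gen_ai.* attribute names come from the OTEL GenAI semconv; the function itself is hypothetical):

```elixir
# Hypothetical translation of llm_call/2 fields into OTel gen_ai.* attributes.
def to_otel_attributes(fields) do
  %{
    "gen_ai.request.model" => fields[:model],
    "gen_ai.system" => to_string(fields[:provider]),
    "gen_ai.usage.input_tokens" => get_in(fields, [:usage, :input_tokens]),
    "gen_ai.usage.output_tokens" => get_in(fields, [:usage, :output_tokens]),
    "gen_ai.response.finish_reasons" => List.wrap(fields[:finish_reason])
  }
end
```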

    Labels

    feature (New functionality), resources (API resource modules: Project, Experiment, Dataset, etc.)
