Implement LLM logging helpers with GenAI metadata #24

@johnnyt

Description

Summary

Implement convenient logging helpers that make it easy to log LLM calls to Braintrust with the correct metadata structure. Focus on capturing all useful information (tokens, latency, model info) in a format that works well with Braintrust's UI.

Context

When logging LLM interactions to Braintrust, the value is in capturing rich metadata:

  • Model and provider information
  • Token usage (input/output)
  • Latency metrics
  • Input/output in OpenAI message format (enables "Try prompt" button in UI)
  • Finish reasons, temperature, and other parameters

Since LLM calls typically take 500 ms to 5 s or more, a synchronous logging call adds negligible relative overhead. This keeps the implementation simple: no async batching or OpenTelemetry machinery is needed.

Proposed Solution

Wrapper-Style Logging

Wrap LLM calls to automatically capture timing and structure:

{:ok, response} = Braintrust.Log.with_llm_span(project_id, %{
  model: "gpt-4",
  provider: :openai,
  input: [%{role: "user", content: "Hello"}]
}, fn ->
  OpenAI.chat_completion(messages)
end)
# Automatically logs: input, output, latency_ms, and extracts usage if present
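
One possible shape for the wrapper (a sketch only, not the final design; extract_usage/1 is a hypothetical helper, and error handling is open for discussion):

```elixir
# Sketch of a possible with_llm_span/3 implementation. Names other than
# with_llm_span/3 and llm_call/2 (e.g. extract_usage/1) are hypothetical.
def with_llm_span(project_id, metadata, fun) when is_function(fun, 0) do
  start = System.monotonic_time(:millisecond)
  result = fun.()
  latency_ms = System.monotonic_time(:millisecond) - start

  fields =
    case result do
      {:ok, response} ->
        Map.merge(metadata, %{
          output: response,
          latency_ms: latency_ms,
          usage: extract_usage(response)
        })

      {:error, reason} ->
        Map.merge(metadata, %{error: inspect(reason), latency_ms: latency_ms})
    end

  llm_call(project_id, fields)
  result
end
```

Returning the original result unchanged keeps the wrapper transparent to the caller's own pattern matching.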

Direct Logging

For cases where the call is already made:

Braintrust.Log.llm_call(project_id, %{
  input: messages,
  output: response.content,
  model: "gpt-4",
  provider: :openai,
  usage: %{input_tokens: 50, output_tokens: 100},
  latency_ms: 1200,
  finish_reason: "stop",
  temperature: 0.7
})
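
Because llm_call/2 takes latency_ms as plain data, a caller can measure it with monotonic time before logging. A sketch, assuming an OpenAI-style client whose response shape is an assumption here:

```elixir
# Measure latency around an existing client call (assumed API), then log
# directly. System.monotonic_time/1 avoids wall-clock skew during the call.
start = System.monotonic_time(:millisecond)
{:ok, response} = OpenAI.chat_completion(messages)
latency_ms = System.monotonic_time(:millisecond) - start

Braintrust.Log.llm_call(project_id, %{
  input: messages,
  output: response.content,
  model: "gpt-4",
  provider: :openai,
  usage: %{input_tokens: response.usage.prompt_tokens,
           output_tokens: response.usage.completion_tokens},
  latency_ms: latency_ms
})
```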

Metadata Captured

  Field                Type           Description
  -----                ----           -----------
  input                list of maps   Messages in OpenAI format (enables "Try prompt" button)
  output               any            Model response
  model                string         Model name ("gpt-4", "claude-3-opus", etc.)
  provider             atom/string    Provider (:openai, :anthropic, etc.)
  usage.input_tokens   integer        Input/prompt token count
  usage.output_tokens  integer        Output/completion token count
  latency_ms           integer        Request duration in milliseconds
  finish_reason        string         Why generation stopped ("stop", "length", "tool_calls")
  temperature          float          Temperature parameter used
  max_tokens           integer        Max tokens parameter used
  error                string         Error message if the call failed

Acceptance Criteria

  • Braintrust.Log.with_llm_span/3 wrapper that captures timing and logs automatically
  • Braintrust.Log.llm_call/2 for direct logging with structured metadata
  • Automatic extraction of usage data from common response formats
  • Input formatted as OpenAI message list for UI compatibility
  • All metadata fields stored appropriately (scores vs metrics vs metadata)
  • Documentation with usage examples for OpenAI and Anthropic response patterns
  • Tests covering the logging helpers
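
For the "automatic extraction of usage data" criterion, pattern matching on the two most common response shapes could look like this (a sketch; the key names follow OpenAI's prompt_tokens/completion_tokens and Anthropic's input_tokens/output_tokens conventions, and the helper name is hypothetical):

```elixir
# Hypothetical helper: normalize provider-specific usage maps into
# %{input_tokens: _, output_tokens: _}; return nil when unrecognized.
defp extract_usage(%{usage: %{prompt_tokens: p, completion_tokens: c}}),
  do: %{input_tokens: p, output_tokens: c}

defp extract_usage(%{usage: %{input_tokens: _, output_tokens: _} = usage}),
  do: Map.take(usage, [:input_tokens, :output_tokens])

defp extract_usage(_response), do: nil
```

Returning nil instead of raising keeps logging best-effort: an unrecognized response format should never break the wrapped LLM call.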

Technical Notes

  • No async batching needed: LLM latency (500ms-5s) dominates; sync logging adds negligible overhead
  • OpenAI message format: Structure input as %{messages: [...]} for Braintrust UI "Try prompt" support
  • Metrics vs Metadata: Token counts go in metrics (summed during aggregation), model/provider go in metadata
  • Scores: Reserved for 0-1 normalized quality scores, not raw metrics
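
The metrics/metadata split above could be centralized in one mapping function (a sketch; to_span/1 is a hypothetical internal helper):

```elixir
# Hypothetical mapping from llm_call/2 fields to Braintrust span sections.
defp to_span(fields) do
  usage = Map.get(fields, :usage) || %{}

  %{
    # OpenAI message list under :messages enables the "Try prompt" button
    input: %{messages: Map.get(fields, :input, [])},
    output: fields[:output],
    # numeric values that Braintrust can sum during aggregation
    metrics: Map.merge(usage, Map.take(fields, [:latency_ms])),
    # descriptive fields that should not be aggregated
    metadata: Map.take(fields, [:model, :provider, :finish_reason,
                                :temperature, :max_tokens, :error])
  }
end
```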

Future Considerations

OpenTelemetry integration could be added later as an optional feature for users with existing OTEL infrastructure. This would provide helpers for GenAI semantic conventions (gen_ai.* attributes) but let users own their exporter configuration.
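
If OTEL support lands later, the helper would mostly be a rename layer onto the GenAI semantic conventions. A sketch (the gen_ai.* attribute names come from the OTEL GenAI semconv; the function itself is hypothetical):

```elixir
# Hypothetical translation of llm_call/2 fields into OTel gen_ai.* attributes.
def to_otel_attributes(fields) do
  %{
    "gen_ai.request.model" => fields[:model],
    "gen_ai.system" => to_string(fields[:provider]),
    "gen_ai.usage.input_tokens" => get_in(fields, [:usage, :input_tokens]),
    "gen_ai.usage.output_tokens" => get_in(fields, [:usage, :output_tokens]),
    "gen_ai.response.finish_reasons" => List.wrap(fields[:finish_reason])
  }
end
```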

    Labels

    feature (New functionality), resources (API resource modules: Project, Experiment, Dataset, etc.)
