
@stuartc (Member) commented Jan 23, 2026

Description

This PR adds Prometheus telemetry metrics to track sandbox project usage. The new SandboxPromExPlugin module emits counters for:

- sandbox_merged_event_count
  description: "Count of sandbox projects merged into targets."
- sandbox_deleted_event_count
  description: "Count of sandbox projects manually deleted."
- workflow_saved_event_count
  tags: [:is_sandbox]
  description: "Count of workflow saves, tagged by project type."
- provisioner_import_event_count
  tags: [:is_sandbox]
  description: "Count of provisioner imports, tagged by project type."

Closes #4101

Validation steps

  1. Run mix test: all tests pass, including the new telemetry tests.
  2. Check that the Prometheus metrics endpoint exposes the new sandbox counters.
  3. Create, merge, and delete a sandbox, and verify that the counters increment.

Additional notes for the reviewer

  1. There are no seed events for these metrics. My understanding is that the very first event will be dropped, but only once in the lifespan of the Prometheus series (when using GMP). This should be the lesser of two evils; the alternative is seeding, which adds an extra count every time the server starts.
  2. Made ObanManagerTest synchronous to fix a flaky test unrelated to this feature
  3. Removed unused socket variable from two workflow channel tests

AI Usage

  • I have used Claude Code

Pre-submission checklist

  • I have performed an AI review of my code
  • I have implemented and tested all related authorization policies
  • I have updated the changelog
  • I have ticked a box in "AI usage" in this PR

Add PromEx plugin to track sandbox-related metrics exposed to Prometheus:
- lightning.sandbox.created.count - sandbox provisioned
- lightning.sandbox.merged.count - sandbox merged into target
- lightning.sandbox.deleted.count - sandbox manually deleted
- lightning.workflow.saved.count (is_sandbox tag) - workflow saves by project type
- lightning.provisioner.import.count (is_sandbox tag) - imports by project type

@github-project-automation bot moved this to New Issues in v2 Jan 23, 2026

@codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.30%. Comparing base (b5d6c75) to head (9aa970b).

Files with missing lines                Patch %    Lines
lib/lightning/projects/sandboxes.ex     83.33%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4342      +/-   ##
==========================================
+ Coverage   89.21%   89.30%   +0.09%     
==========================================
  Files         425      426       +1     
  Lines       20011    20024      +13     
==========================================
+ Hits        17852    17882      +30     
+ Misses       2159     2142      -17     


@stuartc requested a review from @rorymckinley January 23, 2026 10:24
@stuartc self-assigned this Jan 23, 2026

@rorymckinley (Collaborator) left a comment

@stuartc Nice! Always excited to see metrics being added.

I disagree that not seeding the event is the lesser of the two evils, as I think its value add is very low compared to seeding. But I have already presented my best argument on this, so I'm not going to rehash it further.

Other than that, I have some small niggles about a couple of metrics that feel like they belong somewhere else, but nothing serious enough to block merging if you disagree.

|> get_assoc(:workflows)
|> Enum.each(&Workflows.publish_kafka_trigger_events/1)

Lightning.Projects.SandboxPromExPlugin.fire_provisioner_import_event(

@rorymckinley (Collaborator) commented:

@stuartc It feels a bit weird for SandboxPromExPlugin to be responsible for capturing provisioner import events that aren't related to sandboxes. It may be better to do one of the following:

  • Only fire the provisioner_import_event if the calling code knows that the updated_project is a sandbox
  • Have a ProvisionerPromExPlugin that fires an import event and differentiates between sandbox and non-sandbox.

Lightning.Repo.get(Lightning.Projects.Project, workflow.project_id)
|> Lightning.Projects.Project.sandbox?()

Lightning.Projects.SandboxPromExPlugin.fire_workflow_saved_event(

@rorymckinley (Collaborator) commented:

@stuartc same spidey-sense tingle as for fire_provisioner_import_event.

}
end

test "delete_sandbox does not emit telemetry event on unauthorized" do

@rorymckinley (Collaborator) commented:

@stuartc I'm guessing there is no robust way to trigger a deletion failure, to check that a failed deletion does not emit an event?


Development

Successfully merging this pull request may close these issues.

Add telemetry for Sandbox usage

3 participants