Skip to content

Conversation

@mostronator
Copy link
Collaborator

@mostronator mostronator commented Feb 11, 2026

Summary

Prevents duplicate dev fee payments caused by race conditions between timeouts and in-flight Lightning payments.

Closes #568

Problem

When the 50-second timeout expired during a dev fee payment, the code unconditionally reset dev_fee_paid = false and cleared the payment hash. But a timeout ≠ failure — the payment could still be in-flight or may have already succeeded on the LN network. This race condition leads to double payments.

Solution

On timeout, query the LN node for the actual payment status via TrackPaymentV2 before deciding:

LN Status Action
Succeeded Mark as paid in DB, do NOT reset
InFlight Skip reset, leave state intact
Failed Safe to reset and retry next cycle
Unknown Skip reset (err on side of caution)

Design principle: When in doubt, don't retry

A missed dev fee can be recovered manually; a duplicate payment is money lost.

Changes

src/lightning/mod.rs

  • New: LndConnector::check_payment_status() — queries LND via TrackPaymentV2 for current payment state

src/scheduler.rs

  • New: DevFeePaymentState enum (Succeeded, InFlight, Failed, Unknown)
  • New: check_dev_fee_payment_status() helper — decodes hash, queries LND (10s timeout), updates DB on success
  • Modified: Timeout handler now checks LN status instead of blindly resetting

docs/DEV_FEE_TIMEOUT_SAFETY.md

  • Full documentation of the problem, solution, and design decisions

Verification

  • cargo fmt — clean
  • cargo clippy --all-targets --all-features — clean
  • cargo test — 112 passed, 0 failed

Summary by CodeRabbit

  • New Features

    • Two-step developer-fee payment flow: invoices are resolved before sending, enabling safer recovery after crashes.
  • Bug Fixes

    • Timeouts now trigger a Lightning status check: succeeded = recorded (no retry), in-flight = left alone, failed = reset/retry, unknown = skip reset — reducing duplicate charges.
  • Documentation

    • Added DEV_FEE_TIMEOUT_SAFETY.md describing the new timeout-safety workflow and recovery behavior.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 11, 2026

Warning

Rate limit exceeded

@grunch has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 11 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds a two-phase dev-fee flow (resolve + send), a new LND payment status query, and integrates status-driven timeout handling in the scheduler to avoid unconditional resets that can cause duplicate Lightning payments.

Changes

Cohort / File(s) Summary
Documentation
docs/DEV_FEE_TIMEOUT_SAFETY.md
New design doc describing the two-phase dev-fee flow and status-driven timeout handling to prevent duplicate Lightning payments.
Lightning connector
src/lightning/mod.rs
Adds pub async fn check_payment_status(&mut self, payment_hash: &[u8]) -> Result<PaymentStatus, MostroError> that queries TrackPaymentV2 and maps stream status to a PaymentStatus.
Scheduler
src/scheduler.rs
Adds DevFeePaymentState enum and check_dev_fee_payment_status(); augments job_process_dev_fee_payment() to resolve invoices before send, store real payment hashes, and use LN status checks on timeouts to decide: mark paid, keep in-flight, reset on failure, or skip on unknown. Includes stale real-hash cleanup and logging.
App release / payment split
src/app/release.rs
Splits dev-fee payment into resolve_dev_fee_invoice(order) -> (payment_request, payment_hash) and send_dev_fee_payment(order, payment_request) so invoices are resolved/decoded before sending.
Database query
src/db.rs
Removes filter that excluded rows with NULL or PENDING-% dev_fee_payment_hash so unpaid-dev-fees query returns orders with pending/legacy markers for recovery handling.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler
    participant LndConnector
    participant LNNode as "LN Node"
    participant Database

    Scheduler->>Database: find_unpaid_dev_fees()
    Scheduler->>Scheduler: resolve_dev_fee_invoice(order) (20s)
    Scheduler->>Database: store dev_fee_payment_hash (real-hash)
    Scheduler->>LndConnector: send_dev_fee_payment(payment_request)
    LndConnector->>LNNode: SendPayment / TrackPaymentV2
    LNNode-->>LndConnector: Payment update (succeeded/in-flight/failed)
    LndConnector-->>Scheduler: payment result / hash

    alt Timeout during send or unresolved state
        Scheduler->>LndConnector: check_payment_status(payment_hash) (10s)
        LndConnector->>LNNode: TrackPaymentV2 query
        LNNode-->>LndConnector: status {Succeeded, InFlight, Failed, Unknown}
        LndConnector-->>Scheduler: PaymentStatus
        alt Succeeded
            Scheduler->>Database: mark order.dev_fee_paid = true
        else InFlight
            Scheduler->>Scheduler: leave pending (no reset)
        else Failed
            Scheduler->>Database: clear dev_fee_paid & dev_fee_payment_hash (reset)
        else Unknown
            Scheduler->>Scheduler: skip reset (leave for later)
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Catrya

Poem

🐰 I sniffed the hash beneath moonlight bright,
I asked the node if payment took flight,
No blind reset, no double-pay fray,
I kept the ledger tidy, hopped away,
My whiskers twitch — the fees behaved tonight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: checking LN payment status before resetting on dev fee timeout, which is the core fix for preventing duplicate payments.
Linked Issues check ✅ Passed The PR fully implements the solution specified in issue #568: detects LN payment status before mutating state on timeout, handles Succeeded/InFlight/Failed states correctly, and includes comprehensive documentation.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the dev fee timeout safety fix: Lightning status checking, scheduler logic, two-phase invoice resolution, and supporting documentation with no unrelated modifications.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/dev-fee-timeout-payment-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/scheduler.rs`:
- Around line 840-850: The guard that treats any order.dev_fee_payment_hash
beginning with "PENDING-" as untrackable makes the LN status check unreachable
in the timeout path; instead, update the logic in the scheduler where
DevFeePaymentState is determined so that if order.dev_fee_payment_hash starts
with "PENDING-" you attempt to resolve the real payment identifier (for example
by checking the associated outgoing payment record/ID created when the invoice
was sent or a separate field like dev_fee_payment_id) and only return
DevFeePaymentState::Unknown if no resolvable payment identifier exists; keep the
existing branches that map actual payment hashes to Succeeded/InFlight/Failed
once you have a resolvable ID, and ensure you do not short-circuit on the
"PENDING-" prefix alone.
- Around line 742-806: The PENDING-{id} placeholder allows a stale cleanup path
to reset orders without ever checking LN status; extract and persist the real
payment hash before sending so status checks can query LND: modify
send_dev_fee_payment to derive the invoice/payment hash (like send_payment uses
invoice.signable_hash() or payment_hash logic in lightning/mod.rs) and set
order.dev_fee_payment_hash to that real hash instead of "PENDING-{id}" prior to
calling the sender; ensure check_dev_fee_payment_status uses that stored hash
and update the stale-PENDING cleanup path to call check_dev_fee_payment_status
when encountering a PENDING marker or, better, to only treat entries as stale if
the real hash is absent after X time—this prevents blind resets and duplicate
payments.
🧹 Nitpick comments (1)
src/scheduler.rs (1)

865-875: New LND connection created on each status check call.

LndConnector::new() opens a fresh gRPC connection (TLS handshake, auth). Since this only runs on the exceptional timeout path, the overhead is acceptable for now. If the status check is later called more frequently, consider accepting an &mut LndConnector parameter instead.

When a dev fee payment times out, query the LN node for the actual
payment status via TrackPaymentV2 before deciding to reset:

- Succeeded: mark as paid, do NOT reset (prevents duplicate payment)
- InFlight: skip reset, payment may still complete
- Failed: safe to reset and retry next cycle
- Unknown: skip reset to err on the side of caution

Previously, a timeout would unconditionally reset dev_fee_paid=false
and clear the payment hash, creating a race condition where in-flight
payments could result in double payments.

- Add LndConnector::check_payment_status() for querying payment state
- Add DevFeePaymentState enum and check_dev_fee_payment_status() helper
- Documentation in docs/DEV_FEE_TIMEOUT_SAFETY.md

Closes #568
@mostronator mostronator force-pushed the fix/dev-fee-timeout-payment-check branch from af5da45 to efedc88 Compare February 12, 2026 12:55
Move DevFeePaymentState enum and check_dev_fee_payment_status function
above the #[cfg(test)] mod tests block to fix the
clippy::items_after_test_module error.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scheduler.rs (1)

570-641: ⚠️ Potential issue | 🟠 Major

Stale-PENDING cleanup resets orders without querying LN status, re-enabling the duplicate payment race.

After 5 minutes, stale PENDING markers are unconditionally cleared (Lines 616–617) without calling check_dev_fee_payment_status. This means a payment that is still in-flight (or even succeeded) on the LN network will be reset to unpaid, and the next scheduler cycle will initiate a duplicate payment — the exact scenario this PR aims to prevent.

Consider calling check_dev_fee_payment_status (or a similar LN query) here before resetting, consistent with the timeout handler's approach.

🛡️ Sketch of the fix
                     if !is_stale {
                         continue;
                     }

+                    // Before resetting, check if the payment actually succeeded on LN
+                    let ln_state = check_dev_fee_payment_status(&pending_order, pool).await;
+                    match ln_state {
+                        DevFeePaymentState::Succeeded => {
+                            info!("Stale PENDING order {} actually succeeded on LN, skipping reset", order_id);
+                            continue;
+                        }
+                        DevFeePaymentState::InFlight => {
+                            warn!("Stale PENDING order {} still in-flight, skipping reset", order_id);
+                            continue;
+                        }
+                        DevFeePaymentState::Failed | DevFeePaymentState::Unknown => {
+                            // Safe to reset
+                        }
+                    }
+
                     stale_count += 1;

Note: this fix alone won't help until the PENDING marker is replaced with a real payment hash (as noted above), since check_dev_fee_payment_status returns Unknown for PENDING markers. Both issues need to be addressed together.

🤖 Fix all issues with AI agents
In `@docs/DEV_FEE_TIMEOUT_SAFETY.md`:
- Around line 36-42: Update the documentation for the helper that extracts and
checks the payment hash to explicitly state that during the primary timeout path
the payment-hash field will contain a PENDING marker, which causes the LN query
to be bypassed and the helper to return the DevFeePaymentState.Unknown fallback
(i.e., no reset). Reference the "Extracts the payment hash from the order (skips
PENDING markers)" behavior and clarify how it interacts with the 10-second LND
query, the DevFeePaymentState enum, and the primary timeout path so readers
understand the current effective behavior versus the intended design.
🧹 Nitpick comments (2)
src/lightning/mod.rs (1)

265-300: gRPC stream error is silently discarded, producing a misleading error message.

At Line 287, if stream.message().await returns Err(e), the if let Ok(…) falls through to the else branch which reports "No payment status received". The actual gRPC error (e.g., NOT_FOUND for an unknown hash) is lost, making debugging harder. Consider matching both branches:

♻️ Proposed fix
-        // Get the first (current) status update
-        if let Ok(Some(payment)) = stream.message().await {
-            fedimint_tonic_lnd::lnrpc::payment::PaymentStatus::try_from(payment.status).map_err(
-                |_| {
-                    MostroInternalErr(ServiceError::LnPaymentError(
-                        "Unknown payment status".to_string(),
-                    ))
-                },
-            )
-        } else {
-            Err(MostroInternalErr(ServiceError::LnPaymentError(
-                "No payment status received".to_string(),
-            )))
-        }
+        // Get the first (current) status update
+        match stream.message().await {
+            Ok(Some(payment)) => {
+                fedimint_tonic_lnd::lnrpc::payment::PaymentStatus::try_from(payment.status)
+                    .map_err(|_| {
+                        MostroInternalErr(ServiceError::LnPaymentError(
+                            "Unknown payment status".to_string(),
+                        ))
+                    })
+            }
+            Ok(None) => Err(MostroInternalErr(ServiceError::LnPaymentError(
+                "No payment status received (stream ended)".to_string(),
+            ))),
+            Err(e) => Err(MostroInternalErr(ServiceError::LnPaymentError(
+                format!("Failed to get payment status: {}", e),
+            ))),
+        }
src/scheduler.rs (1)

922-932: New LND connection created on every status check.

check_dev_fee_payment_status creates a fresh LndConnector (and thus a new gRPC connection) on every call. Other scheduler jobs (e.g., job_cancel_orders at Line 327) create the client once and reuse it across iterations.

Consider accepting an &mut LndConnector parameter or creating the client once in the calling scope to avoid repeated connection overhead.

Split dev fee payment into resolve + send phases so the real LN payment
hash is stored in the database before the payment is dispatched. This
replaces the PENDING-{uuid}-{ts} placeholder marker approach.

With a real hash stored upfront:
- The timeout path can now query LND for actual payment status instead
  of returning Unknown for PENDING markers
- Crash recovery works via a new stale real-hash cleanup that checks
  LND status and resets failed payments for retry
- The find_unpaid_dev_fees query no longer needs the PENDING filter

The existing PENDING cleanup is kept for backward compatibility with
any legacy markers still in the database.
Reflect that the real payment hash is now stored before sending,
making the LN status check reachable on timeout. Document the
two-phase resolve/send flow, the stale real-hash cleanup for crash
recovery, and clarify that the PENDING- guard in
check_dev_fee_payment_status is now only for legacy backward
compatibility.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scheduler.rs (1)

688-724: ⚠️ Potential issue | 🟠 Major

Recovery guard blindly assumes "hash exists = payment succeeded" — should verify via LND.

The comment on line 700 says "payment already succeeded" but the hash is stored before payment dispatch (Step 2, line 759). If this guard ever fires (e.g., data inconsistency, manual DB intervention), it would mark the order as paid without verifying the actual LN payment status.

In the current flow this path appears unreachable (Step 2 atomically sets dev_fee_paid=true with the hash), but as a defensive measure the logic should be correct:

Proposed: check LND before assuming success
                     warn!(
                         "Order {} has payment hash '{}' but dev_fee_paid=false. Recovering from failed DB update.",
                         order_id,
                         payment_hash
                     );

-                    // Recovery: Mark as paid since hash exists (payment already succeeded)
-                    order.dev_fee_paid = true;
-                    match order.update(&pool).await {
-                        Ok(_) => {
-                            info!(
-                                "✅ Recovered order {} - marked as paid with existing hash",
-                                order_id
-                            );
+                    // Recovery: Check LN status before marking as paid
+                    match check_dev_fee_payment_status(&order, &pool).await {
+                        DevFeePaymentState::Succeeded => {
+                            info!("✅ Recovered order {} - confirmed paid via LN", order_id);
+                        }
+                        DevFeePaymentState::Failed => {
+                            info!("Order {} hash exists but payment failed, clearing for retry", order_id);
+                            order.dev_fee_paid = false;
+                            order.dev_fee_payment_hash = None;
+                            let _ = order.update(&pool).await;
+                        }
+                        _ => {
+                            warn!("Order {} has hash but status uncertain, skipping", order_id);
                         }
-                        ...
                     }
                     continue;
🧹 Nitpick comments (3)
src/app/release.rs (1)

570-611: resolve_dev_fee_invoice — clean two-phase separation.

The logic is sound and the doc comments are thorough. One subtle concern: resolv_ln_address (see src/lnurl.rs lines 71–123) returns Ok("".to_string()) for several failure modes (tag mismatch, amount out of range, callback failure) rather than an Err. If that happens, decode_invoice on line 602 will fail with InvoiceInvalidError, which is misleading — the root cause is LNURL resolution, not invoice decoding.

Consider adding an empty-string guard before decoding:

Proposed guard
+    if payment_request.is_empty() {
+        error!(
+            "Dev fee LNURL resolution returned empty invoice for order {} ({} sats)",
+            order.id, order.dev_fee
+        );
+        return Err(MostroInternalErr(ServiceError::LnAddressParseError));
+    }
+
     // Decode invoice and extract payment hash
     let invoice = decode_invoice(&payment_request)?;
src/scheduler.rs (2)

643-682: Stale real-hash cleanup logic is correct for crash recovery.

This correctly handles the scenario where the process crashes between storing the hash (step 2) and completing the payment. The LN status check prevents both false positives (resetting a successful payment) and false negatives (leaving a failed payment as "paid").

One efficiency concern: check_dev_fee_payment_status creates a new LndConnector per invocation (line 1000). If many orders need checking (stale real-hash loop + unpaid orders + timeout path), this could open many gRPC connections per scheduler cycle.

Consider passing a shared LndConnector
-async fn check_dev_fee_payment_status(
-    order: &Order,
-    pool: &sqlx::Pool<sqlx::Sqlite>,
-) -> DevFeePaymentState {
-    use crate::lightning::LndConnector;
+async fn check_dev_fee_payment_status(
+    order: &Order,
+    pool: &sqlx::Pool<sqlx::Sqlite>,
+    ln_client: &mut LndConnector,
+) -> DevFeePaymentState {
     use fedimint_tonic_lnd::lnrpc::payment::PaymentStatus;
     // ...
-    let mut ln_client = match LndConnector::new().await {
-        Ok(client) => client,
-        Err(e) => { /* ... */ }
-    };

967-1055: check_dev_fee_payment_status — correct status-driven logic with good fallbacks.

The function correctly maps LN payment states and conservatively returns Unknown for any error/timeout, aligning with the "don't retry" principle.

Minor note: the use imports on lines 971–972 and 987 are inside the function body. While Rust allows this, they'd be more discoverable at file level with the other imports.

check_dev_fee_payment_status was creating a new gRPC connection on every
invocation. Create the connector once in job_process_dev_fee_payment and
pass it through, matching the pattern already used by job_cancel_orders.
@grunch grunch merged commit 13aa8ce into main Feb 12, 2026
6 checks passed
@grunch grunch deleted the fix/dev-fee-timeout-payment-check branch February 12, 2026 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate Payment Risk in Dev Fee Timeout Handling

2 participants