fix: check LN payment status before resetting on dev fee timeout #582

mostronator · 2026-02-11T22:27:42Z

Summary

Prevents duplicate dev fee payments caused by race conditions between timeouts and in-flight Lightning payments.

Closes #568

Problem

When the 50-second timeout expired during a dev fee payment, the code unconditionally reset dev_fee_paid = false and cleared the payment hash. But a timeout ≠ failure — the payment could still be in-flight or may have already succeeded on the LN network. This race condition leads to double payments.

Solution

On timeout, query the LN node for the actual payment status via TrackPaymentV2 before deciding:

LN Status	Action
Succeeded	Mark as paid in DB, do NOT reset
InFlight	Skip reset, leave state intact
Failed	Safe to reset and retry next cycle
Unknown	Skip reset (err on side of caution)

Design principle: When in doubt, don't retry

A missed dev fee can be recovered manually; a duplicate payment is money lost.

Changes

`src/lightning/mod.rs`

New: LndConnector::check_payment_status() — queries LND via TrackPaymentV2 for current payment state

`src/scheduler.rs`

New: DevFeePaymentState enum (Succeeded, InFlight, Failed, Unknown)
New: check_dev_fee_payment_status() helper — decodes hash, queries LND (10s timeout), updates DB on success
Modified: Timeout handler now checks LN status instead of blindly resetting

`docs/DEV_FEE_TIMEOUT_SAFETY.md`

Full documentation of the problem, solution, and design decisions

Verification

✅ cargo fmt — clean
✅ cargo clippy --all-targets --all-features — clean
✅ cargo test — 112 passed, 0 failed

Summary by CodeRabbit

New Features
- Two-step developer-fee payment flow: invoices are resolved before sending, enabling safer recovery after crashes.
Bug Fixes
- Timeouts now trigger a Lightning status check: succeeded = recorded (no retry), in-flight = left alone, failed = reset/retry, unknown = skip reset — reducing duplicate charges.
Documentation
- Added DEV_FEE_TIMEOUT_SAFETY.md describing the new timeout-safety workflow and recovery behavior.

coderabbitai · 2026-02-11T22:28:32Z

Warning

Rate limit exceeded

@grunch has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 11 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds a two-phase dev-fee flow (resolve + send), a new LND payment status query, and integrates status-driven timeout handling in the scheduler to avoid unconditional resets that can cause duplicate Lightning payments.

Changes

Cohort / File(s)	Summary
Documentation `docs/DEV_FEE_TIMEOUT_SAFETY.md`	New design doc describing the two-phase dev-fee flow and status-driven timeout handling to prevent duplicate Lightning payments.
Lightning connector `src/lightning/mod.rs`	Adds `pub async fn check_payment_status(&mut self, payment_hash: &[u8]) -> Result<PaymentStatus, MostroError>` that queries TrackPaymentV2 and maps stream status to a `PaymentStatus`.
Scheduler `src/scheduler.rs`	Adds `DevFeePaymentState` enum and `check_dev_fee_payment_status()`; augments job_process_dev_fee_payment() to resolve invoices before send, store real payment hashes, and use LN status checks on timeouts to decide: mark paid, keep in-flight, reset on failure, or skip on unknown. Includes stale real-hash cleanup and logging.
App release / payment split `src/app/release.rs`	Splits dev-fee payment into `resolve_dev_fee_invoice(order) -> (payment_request, payment_hash)` and `send_dev_fee_payment(order, payment_request)` so invoices are resolved/decoded before sending.
Database query `src/db.rs`	Removes filter that excluded rows with NULL or `PENDING-%` dev_fee_payment_hash so unpaid-dev-fees query returns orders with pending/legacy markers for recovery handling.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler
    participant LndConnector
    participant LNNode as "LN Node"
    participant Database

    Scheduler->>Database: find_unpaid_dev_fees()
    Scheduler->>Scheduler: resolve_dev_fee_invoice(order) (20s)
    Scheduler->>Database: store dev_fee_payment_hash (real-hash)
    Scheduler->>LndConnector: send_dev_fee_payment(payment_request)
    LndConnector->>LNNode: SendPayment / TrackPaymentV2
    LNNode-->>LndConnector: Payment update (succeeded/in-flight/failed)
    LndConnector-->>Scheduler: payment result / hash

    alt Timeout during send or unresolved state
        Scheduler->>LndConnector: check_payment_status(payment_hash) (10s)
        LndConnector->>LNNode: TrackPaymentV2 query
        LNNode-->>LndConnector: status {Succeeded, InFlight, Failed, Unknown}
        LndConnector-->>Scheduler: PaymentStatus
        alt Succeeded
            Scheduler->>Database: mark order.dev_fee_paid = true
        else InFlight
            Scheduler->>Scheduler: leave pending (no reset)
        else Failed
            Scheduler->>Database: clear dev_fee_paid & dev_fee_payment_hash (reset)
        else Unknown
            Scheduler->>Scheduler: skip reset (leave for later)
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Change development fund to be paid by mostrod #561: Builds on the same dev-fee scheduler and payment/hash DB flow; closely related to the two-phase resolve/send changes.
fix: prevent duplicate dev fee payment on timeout #584: Also modifies dev-fee timeout/stale-cleanup logic to avoid unconditional resets; overlaps on timeout handling.
Development fund implementation third phase #558: Introduced earlier dev-fee payment sending and scheduler hooks that this change refactors and extends.

Suggested reviewers

Catrya

Poem

🐰 I sniffed the hash beneath moonlight bright,
I asked the node if payment took flight,
No blind reset, no double-pay fray,
I kept the ledger tidy, hopped away,
My whiskers twitch — the fees behaved tonight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main change: checking LN payment status before resetting on dev fee timeout, which is the core fix for preventing duplicate payments.
Linked Issues check	✅ Passed	The PR fully implements the solution specified in issue `#568`: detects LN payment status before mutating state on timeout, handles Succeeded/InFlight/Failed states correctly, and includes comprehensive documentation.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the dev fee timeout safety fix: Lightning status checking, scheduler logic, two-phase invoice resolution, and supporting documentation with no unrelated modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/dev-fee-timeout-payment-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/scheduler.rs`:
- Around line 840-850: The guard that treats any order.dev_fee_payment_hash
beginning with "PENDING-" as untrackable makes the LN status check unreachable
in the timeout path; instead, update the logic in the scheduler where
DevFeePaymentState is determined so that if order.dev_fee_payment_hash starts
with "PENDING-" you attempt to resolve the real payment identifier (for example
by checking the associated outgoing payment record/ID created when the invoice
was sent or a separate field like dev_fee_payment_id) and only return
DevFeePaymentState::Unknown if no resolvable payment identifier exists; keep the
existing branches that map actual payment hashes to Succeeded/InFlight/Failed
once you have a resolvable ID, and ensure you do not short-circuit on the
"PENDING-" prefix alone.
- Around line 742-806: The PENDING-{id} placeholder allows a stale cleanup path
to reset orders without ever checking LN status; extract and persist the real
payment hash before sending so status checks can query LND: modify
send_dev_fee_payment to derive the invoice/payment hash (like send_payment uses
invoice.signable_hash() or payment_hash logic in lightning/mod.rs) and set
order.dev_fee_payment_hash to that real hash instead of "PENDING-{id}" prior to
calling the sender; ensure check_dev_fee_payment_status uses that stored hash
and update the stale-PENDING cleanup path to call check_dev_fee_payment_status
when encountering a PENDING marker or, better, to only treat entries as stale if
the real hash is absent after X time—this prevents blind resets and duplicate
payments.

🧹 Nitpick comments (1)

src/scheduler.rs (1)

865-875: New LND connection created on each status check call.

LndConnector::new() opens a fresh gRPC connection (TLS handshake, auth). Since this only runs on the exceptional timeout path, the overhead is acceptable for now. If the status check is later called more frequently, consider accepting an &mut LndConnector parameter instead.

src/scheduler.rs

When a dev fee payment times out, query the LN node for the actual payment status via TrackPaymentV2 before deciding to reset: - Succeeded: mark as paid, do NOT reset (prevents duplicate payment) - InFlight: skip reset, payment may still complete - Failed: safe to reset and retry next cycle - Unknown: skip reset to err on the side of caution Previously, a timeout would unconditionally reset dev_fee_paid=false and clear the payment hash, creating a race condition where in-flight payments could result in double payments. - Add LndConnector::check_payment_status() for querying payment state - Add DevFeePaymentState enum and check_dev_fee_payment_status() helper - Documentation in docs/DEV_FEE_TIMEOUT_SAFETY.md Closes #568

Move DevFeePaymentState enum and check_dev_fee_payment_status function above the #[cfg(test)] mod tests block to fix the clippy::items_after_test_module error.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/scheduler.rs (1)
570-641: ⚠️ Potential issue | 🟠 Major

Stale-PENDING cleanup resets orders without querying LN status, re-enabling the duplicate payment race.

After 5 minutes, stale PENDING markers are unconditionally cleared (Lines 616–617) without calling check_dev_fee_payment_status. This means a payment that is still in-flight (or even succeeded) on the LN network will be reset to unpaid, and the next scheduler cycle will initiate a duplicate payment — the exact scenario this PR aims to prevent.

Consider calling check_dev_fee_payment_status (or a similar LN query) here before resetting, consistent with the timeout handler's approach.
🛡️ Sketch of the fix
                     if !is_stale {
                         continue;
                     }

+                    // Before resetting, check if the payment actually succeeded on LN
+                    let ln_state = check_dev_fee_payment_status(&pending_order, pool).await;
+                    match ln_state {
+                        DevFeePaymentState::Succeeded => {
+                            info!("Stale PENDING order {} actually succeeded on LN, skipping reset", order_id);
+                            continue;
+                        }
+                        DevFeePaymentState::InFlight => {
+                            warn!("Stale PENDING order {} still in-flight, skipping reset", order_id);
+                            continue;
+                        }
+                        DevFeePaymentState::Failed | DevFeePaymentState::Unknown => {
+                            // Safe to reset
+                        }
+                    }
+
                     stale_count += 1;
Note: this fix alone won't help until the PENDING marker is replaced with a real payment hash (as noted above), since check_dev_fee_payment_status returns Unknown for PENDING markers. Both issues need to be addressed together.

🤖 Fix all issues with AI agents

In `@docs/DEV_FEE_TIMEOUT_SAFETY.md`:
- Around line 36-42: Update the documentation for the helper that extracts and
checks the payment hash to explicitly state that during the primary timeout path
the payment-hash field will contain a PENDING marker, which causes the LN query
to be bypassed and the helper to return the DevFeePaymentState.Unknown fallback
(i.e., no reset). Reference the "Extracts the payment hash from the order (skips
PENDING markers)" behavior and clarify how it interacts with the 10-second LND
query, the DevFeePaymentState enum, and the primary timeout path so readers
understand the current effective behavior versus the intended design.

🧹 Nitpick comments (2)

src/lightning/mod.rs (1)

265-300: gRPC stream error is silently discarded, producing a misleading error message.

At Line 287, if stream.message().await returns Err(e), the if let Ok(…) falls through to the else branch which reports "No payment status received". The actual gRPC error (e.g., NOT_FOUND for an unknown hash) is lost, making debugging harder. Consider matching both branches:

♻️ Proposed fix

-        // Get the first (current) status update
-        if let Ok(Some(payment)) = stream.message().await {
-            fedimint_tonic_lnd::lnrpc::payment::PaymentStatus::try_from(payment.status).map_err(
-                |_| {
-                    MostroInternalErr(ServiceError::LnPaymentError(
-                        "Unknown payment status".to_string(),
-                    ))
-                },
-            )
-        } else {
-            Err(MostroInternalErr(ServiceError::LnPaymentError(
-                "No payment status received".to_string(),
-            )))
-        }
+        // Get the first (current) status update
+        match stream.message().await {
+            Ok(Some(payment)) => {
+                fedimint_tonic_lnd::lnrpc::payment::PaymentStatus::try_from(payment.status)
+                    .map_err(|_| {
+                        MostroInternalErr(ServiceError::LnPaymentError(
+                            "Unknown payment status".to_string(),
+                        ))
+                    })
+            }
+            Ok(None) => Err(MostroInternalErr(ServiceError::LnPaymentError(
+                "No payment status received (stream ended)".to_string(),
+            ))),
+            Err(e) => Err(MostroInternalErr(ServiceError::LnPaymentError(
+                format!("Failed to get payment status: {}", e),
+            ))),
+        }

src/scheduler.rs (1)

922-932: New LND connection created on every status check.

check_dev_fee_payment_status creates a fresh LndConnector (and thus a new gRPC connection) on every call. Other scheduler jobs (e.g., job_cancel_orders at Line 327) create the client once and reuse it across iterations.

Consider accepting an &mut LndConnector parameter or creating the client once in the calling scope to avoid repeated connection overhead.

docs/DEV_FEE_TIMEOUT_SAFETY.md

Split dev fee payment into resolve + send phases so the real LN payment hash is stored in the database before the payment is dispatched. This replaces the PENDING-{uuid}-{ts} placeholder marker approach. With a real hash stored upfront: - The timeout path can now query LND for actual payment status instead of returning Unknown for PENDING markers - Crash recovery works via a new stale real-hash cleanup that checks LND status and resets failed payments for retry - The find_unpaid_dev_fees query no longer needs the PENDING filter The existing PENDING cleanup is kept for backward compatibility with any legacy markers still in the database.

Reflect that the real payment hash is now stored before sending, making the LN status check reachable on timeout. Document the two-phase resolve/send flow, the stale real-hash cleanup for crash recovery, and clarify that the PENDING- guard in check_dev_fee_payment_status is now only for legacy backward compatibility.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/scheduler.rs (1)

688-724: ⚠️ Potential issue | 🟠 Major

Recovery guard blindly assumes "hash exists = payment succeeded" — should verify via LND.

The comment on line 700 says "payment already succeeded" but the hash is stored before payment dispatch (Step 2, line 759). If this guard ever fires (e.g., data inconsistency, manual DB intervention), it would mark the order as paid without verifying the actual LN payment status.

In the current flow this path appears unreachable (Step 2 atomically sets dev_fee_paid=true with the hash), but as a defensive measure the logic should be correct:

Proposed: check LND before assuming success

                     warn!(
                         "Order {} has payment hash '{}' but dev_fee_paid=false. Recovering from failed DB update.",
                         order_id,
                         payment_hash
                     );

-                    // Recovery: Mark as paid since hash exists (payment already succeeded)
-                    order.dev_fee_paid = true;
-                    match order.update(&pool).await {
-                        Ok(_) => {
-                            info!(
-                                "✅ Recovered order {} - marked as paid with existing hash",
-                                order_id
-                            );
+                    // Recovery: Check LN status before marking as paid
+                    match check_dev_fee_payment_status(&order, &pool).await {
+                        DevFeePaymentState::Succeeded => {
+                            info!("✅ Recovered order {} - confirmed paid via LN", order_id);
+                        }
+                        DevFeePaymentState::Failed => {
+                            info!("Order {} hash exists but payment failed, clearing for retry", order_id);
+                            order.dev_fee_paid = false;
+                            order.dev_fee_payment_hash = None;
+                            let _ = order.update(&pool).await;
+                        }
+                        _ => {
+                            warn!("Order {} has hash but status uncertain, skipping", order_id);
                         }
-                        ...
                     }
                     continue;

🧹 Nitpick comments (3)

src/app/release.rs (1)
570-611: resolve_dev_fee_invoice — clean two-phase separation.

The logic is sound and the doc comments are thorough. One subtle concern: resolv_ln_address (see src/lnurl.rs lines 71–123) returns Ok("".to_string()) for several failure modes (tag mismatch, amount out of range, callback failure) rather than an Err. If that happens, decode_invoice on line 602 will fail with InvoiceInvalidError, which is misleading — the root cause is LNURL resolution, not invoice decoding.

Consider adding an empty-string guard before decoding:
Proposed guard
+    if payment_request.is_empty() {
+        error!(
+            "Dev fee LNURL resolution returned empty invoice for order {} ({} sats)",
+            order.id, order.dev_fee
+        );
+        return Err(MostroInternalErr(ServiceError::LnAddressParseError));
+    }
+
     // Decode invoice and extract payment hash
     let invoice = decode_invoice(&payment_request)?;
src/scheduler.rs (2)
643-682: Stale real-hash cleanup logic is correct for crash recovery.

This correctly handles the scenario where the process crashes between storing the hash (step 2) and completing the payment. The LN status check prevents both false positives (resetting a successful payment) and false negatives (leaving a failed payment as "paid").

One efficiency concern: check_dev_fee_payment_status creates a new LndConnector per invocation (line 1000). If many orders need checking (stale real-hash loop + unpaid orders + timeout path), this could open many gRPC connections per scheduler cycle.
Consider passing a shared LndConnector
-async fn check_dev_fee_payment_status(
-    order: &Order,
-    pool: &sqlx::Pool<sqlx::Sqlite>,
-) -> DevFeePaymentState {
-    use crate::lightning::LndConnector;
+async fn check_dev_fee_payment_status(
+    order: &Order,
+    pool: &sqlx::Pool<sqlx::Sqlite>,
+    ln_client: &mut LndConnector,
+) -> DevFeePaymentState {
     use fedimint_tonic_lnd::lnrpc::payment::PaymentStatus;
     // ...
-    let mut ln_client = match LndConnector::new().await {
-        Ok(client) => client,
-        Err(e) => { /* ... */ }
-    };
967-1055: check_dev_fee_payment_status — correct status-driven logic with good fallbacks.

The function correctly maps LN payment states and conservatively returns Unknown for any error/timeout, aligning with the "don't retry" principle.

Minor note: the use imports on lines 971–972 and 987 are inside the function body. While Rust allows this, they'd be more discoverable at file level with the other imports.

… message

check_dev_fee_payment_status was creating a new gRPC connection on every invocation. Create the connector once in job_process_dev_fee_payment and pass it through, matching the pattern already used by job_cancel_orders.

coderabbitai bot reviewed Feb 11, 2026

View reviewed changes

src/scheduler.rs Show resolved Hide resolved

src/scheduler.rs Show resolved Hide resolved

mostronator force-pushed the fix/dev-fee-timeout-payment-check branch from af5da45 to efedc88 Compare February 12, 2026 12:55

fix: move items before test module to satisfy clippy lint

1ae150a

Move DevFeePaymentState enum and check_dev_fee_payment_status function above the #[cfg(test)] mod tests block to fix the clippy::items_after_test_module error.

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

docs/DEV_FEE_TIMEOUT_SAFETY.md Show resolved Hide resolved

grunch added 2 commits February 12, 2026 10:34

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

grunch added 4 commits February 12, 2026 10:46

gRPC stream error is silently discarded, producing a misleading error…

1597d0a

… message

Add an empty-string guard before decoding

da94768

fix: reuse single LndConnector in dev fee scheduler job

5be5f59

check_dev_fee_payment_status was creating a new gRPC connection on every invocation. Create the connector once in job_process_dev_fee_payment and pass it through, matching the pattern already used by job_cancel_orders.

fix format

7ed9c36

grunch merged commit 13aa8ce into main Feb 12, 2026
6 checks passed

grunch deleted the fix/dev-fee-timeout-payment-check branch February 12, 2026 14:02

coderabbitai bot mentioned this pull request Feb 12, 2026

fix: prevent redundant LND queries for confirmed dev fee payments #586

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: check LN payment status before resetting on dev fee timeout #582

fix: check LN payment status before resetting on dev fee timeout #582

mostronator commented Feb 11, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 11, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: check LN payment status before resetting on dev fee timeout #582

fix: check LN payment status before resetting on dev fee timeout #582

Conversation

mostronator commented Feb 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Design principle: When in doubt, don't retry

Changes

src/lightning/mod.rs

src/scheduler.rs

docs/DEV_FEE_TIMEOUT_SAFETY.md

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mostronator commented Feb 11, 2026 •

edited by coderabbitai bot

Loading

`src/lightning/mod.rs`

`src/scheduler.rs`

`docs/DEV_FEE_TIMEOUT_SAFETY.md`

coderabbitai bot commented Feb 11, 2026 •

edited

Loading