-
Notifications
You must be signed in to change notification settings - Fork 43
fix: check LN payment status before resetting on dev fee timeout #582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughAdds a two-phase dev-fee flow (resolve + send), a new LND payment status query, and integrates status-driven timeout handling in the scheduler to avoid unconditional resets that can cause duplicate Lightning payments. Changes
Sequence Diagram(s)sequenceDiagram
participant Scheduler
participant LndConnector
participant LNNode as "LN Node"
participant Database
Scheduler->>Database: find_unpaid_dev_fees()
Scheduler->>Scheduler: resolve_dev_fee_invoice(order) (20s)
Scheduler->>Database: store dev_fee_payment_hash (real-hash)
Scheduler->>LndConnector: send_dev_fee_payment(payment_request)
LndConnector->>LNNode: SendPayment / TrackPaymentV2
LNNode-->>LndConnector: Payment update (succeeded/in-flight/failed)
LndConnector-->>Scheduler: payment result / hash
alt Timeout during send or unresolved state
Scheduler->>LndConnector: check_payment_status(payment_hash) (10s)
LndConnector->>LNNode: TrackPaymentV2 query
LNNode-->>LndConnector: status {Succeeded, InFlight, Failed, Unknown}
LndConnector-->>Scheduler: PaymentStatus
alt Succeeded
Scheduler->>Database: mark order.dev_fee_paid = true
else InFlight
Scheduler->>Scheduler: leave pending (no reset)
else Failed
Scheduler->>Database: clear dev_fee_paid & dev_fee_payment_hash (reset)
else Unknown
Scheduler->>Scheduler: skip reset (leave for later)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/scheduler.rs`:
- Around line 840-850: The guard that treats any order.dev_fee_payment_hash
beginning with "PENDING-" as untrackable makes the LN status check unreachable
in the timeout path; instead, update the logic in the scheduler where
DevFeePaymentState is determined so that if order.dev_fee_payment_hash starts
with "PENDING-" you attempt to resolve the real payment identifier (for example
by checking the associated outgoing payment record/ID created when the invoice
was sent or a separate field like dev_fee_payment_id) and only return
DevFeePaymentState::Unknown if no resolvable payment identifier exists; keep the
existing branches that map actual payment hashes to Succeeded/InFlight/Failed
once you have a resolvable ID, and ensure you do not short-circuit on the
"PENDING-" prefix alone.
- Around line 742-806: The PENDING-{id} placeholder allows a stale cleanup path
to reset orders without ever checking LN status; extract and persist the real
payment hash before sending so status checks can query LND: modify
send_dev_fee_payment to derive the invoice/payment hash (like send_payment uses
invoice.signable_hash() or payment_hash logic in lightning/mod.rs) and set
order.dev_fee_payment_hash to that real hash instead of "PENDING-{id}" prior to
calling the sender; ensure check_dev_fee_payment_status uses that stored hash
and update the stale-PENDING cleanup path to call check_dev_fee_payment_status
when encountering a PENDING marker or, better, to only treat entries as stale if
the real hash is absent after X time—this prevents blind resets and duplicate
payments.
🧹 Nitpick comments (1)
src/scheduler.rs (1)
865-875: New LND connection created on each status check call.
LndConnector::new()opens a fresh gRPC connection (TLS handshake, auth). Since this only runs on the exceptional timeout path, the overhead is acceptable for now. If the status check is later called more frequently, consider accepting an&mut LndConnectorparameter instead.
When a dev fee payment times out, query the LN node for the actual payment status via TrackPaymentV2 before deciding to reset: - Succeeded: mark as paid, do NOT reset (prevents duplicate payment) - InFlight: skip reset, payment may still complete - Failed: safe to reset and retry next cycle - Unknown: skip reset to err on the side of caution Previously, a timeout would unconditionally reset dev_fee_paid=false and clear the payment hash, creating a race condition where in-flight payments could result in double payments. - Add LndConnector::check_payment_status() for querying payment state - Add DevFeePaymentState enum and check_dev_fee_payment_status() helper - Documentation in docs/DEV_FEE_TIMEOUT_SAFETY.md Closes #568
af5da45 to
efedc88
Compare
Move DevFeePaymentState enum and check_dev_fee_payment_status function above the #[cfg(test)] mod tests block to fix the clippy::items_after_test_module error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/scheduler.rs (1)
570-641:⚠️ Potential issue | 🟠 MajorStale-PENDING cleanup resets orders without querying LN status, re-enabling the duplicate payment race.
After 5 minutes, stale PENDING markers are unconditionally cleared (Lines 616–617) without calling
check_dev_fee_payment_status. This means a payment that is still in-flight (or even succeeded) on the LN network will be reset to unpaid, and the next scheduler cycle will initiate a duplicate payment — the exact scenario this PR aims to prevent.Consider calling
check_dev_fee_payment_status(or a similar LN query) here before resetting, consistent with the timeout handler's approach.🛡️ Sketch of the fix
if !is_stale { continue; } + // Before resetting, check if the payment actually succeeded on LN + let ln_state = check_dev_fee_payment_status(&pending_order, pool).await; + match ln_state { + DevFeePaymentState::Succeeded => { + info!("Stale PENDING order {} actually succeeded on LN, skipping reset", order_id); + continue; + } + DevFeePaymentState::InFlight => { + warn!("Stale PENDING order {} still in-flight, skipping reset", order_id); + continue; + } + DevFeePaymentState::Failed | DevFeePaymentState::Unknown => { + // Safe to reset + } + } + stale_count += 1;Note: this fix alone won't help until the PENDING marker is replaced with a real payment hash (as noted above), since
check_dev_fee_payment_statusreturnsUnknownfor PENDING markers. Both issues need to be addressed together.
🤖 Fix all issues with AI agents
In `@docs/DEV_FEE_TIMEOUT_SAFETY.md`:
- Around line 36-42: Update the documentation for the helper that extracts and
checks the payment hash to explicitly state that during the primary timeout path
the payment-hash field will contain a PENDING marker, which causes the LN query
to be bypassed and the helper to return the DevFeePaymentState.Unknown fallback
(i.e., no reset). Reference the "Extracts the payment hash from the order (skips
PENDING markers)" behavior and clarify how it interacts with the 10-second LND
query, the DevFeePaymentState enum, and the primary timeout path so readers
understand the current effective behavior versus the intended design.
🧹 Nitpick comments (2)
src/lightning/mod.rs (1)
265-300: gRPC stream error is silently discarded, producing a misleading error message.At Line 287, if
stream.message().awaitreturnsErr(e), theif let Ok(…)falls through to theelsebranch which reports"No payment status received". The actual gRPC error (e.g.,NOT_FOUNDfor an unknown hash) is lost, making debugging harder. Consider matching both branches:♻️ Proposed fix
- // Get the first (current) status update - if let Ok(Some(payment)) = stream.message().await { - fedimint_tonic_lnd::lnrpc::payment::PaymentStatus::try_from(payment.status).map_err( - |_| { - MostroInternalErr(ServiceError::LnPaymentError( - "Unknown payment status".to_string(), - )) - }, - ) - } else { - Err(MostroInternalErr(ServiceError::LnPaymentError( - "No payment status received".to_string(), - ))) - } + // Get the first (current) status update + match stream.message().await { + Ok(Some(payment)) => { + fedimint_tonic_lnd::lnrpc::payment::PaymentStatus::try_from(payment.status) + .map_err(|_| { + MostroInternalErr(ServiceError::LnPaymentError( + "Unknown payment status".to_string(), + )) + }) + } + Ok(None) => Err(MostroInternalErr(ServiceError::LnPaymentError( + "No payment status received (stream ended)".to_string(), + ))), + Err(e) => Err(MostroInternalErr(ServiceError::LnPaymentError( + format!("Failed to get payment status: {}", e), + ))), + }src/scheduler.rs (1)
922-932: New LND connection created on every status check.
check_dev_fee_payment_statuscreates a freshLndConnector(and thus a new gRPC connection) on every call. Other scheduler jobs (e.g.,job_cancel_ordersat Line 327) create the client once and reuse it across iterations.Consider accepting an
&mut LndConnectorparameter or creating the client once in the calling scope to avoid repeated connection overhead.
Split dev fee payment into resolve + send phases so the real LN payment
hash is stored in the database before the payment is dispatched. This
replaces the PENDING-{uuid}-{ts} placeholder marker approach.
With a real hash stored upfront:
- The timeout path can now query LND for actual payment status instead
of returning Unknown for PENDING markers
- Crash recovery works via a new stale real-hash cleanup that checks
LND status and resets failed payments for retry
- The find_unpaid_dev_fees query no longer needs the PENDING filter
The existing PENDING cleanup is kept for backward compatibility with
any legacy markers still in the database.
Reflect that the real payment hash is now stored before sending, making the LN status check reachable on timeout. Document the two-phase resolve/send flow, the stale real-hash cleanup for crash recovery, and clarify that the PENDING- guard in check_dev_fee_payment_status is now only for legacy backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/scheduler.rs (1)
688-724:⚠️ Potential issue | 🟠 MajorRecovery guard blindly assumes "hash exists = payment succeeded" — should verify via LND.
The comment on line 700 says "payment already succeeded" but the hash is stored before payment dispatch (Step 2, line 759). If this guard ever fires (e.g., data inconsistency, manual DB intervention), it would mark the order as paid without verifying the actual LN payment status.
In the current flow this path appears unreachable (Step 2 atomically sets
dev_fee_paid=truewith the hash), but as a defensive measure the logic should be correct:Proposed: check LND before assuming success
warn!( "Order {} has payment hash '{}' but dev_fee_paid=false. Recovering from failed DB update.", order_id, payment_hash ); - // Recovery: Mark as paid since hash exists (payment already succeeded) - order.dev_fee_paid = true; - match order.update(&pool).await { - Ok(_) => { - info!( - "✅ Recovered order {} - marked as paid with existing hash", - order_id - ); + // Recovery: Check LN status before marking as paid + match check_dev_fee_payment_status(&order, &pool).await { + DevFeePaymentState::Succeeded => { + info!("✅ Recovered order {} - confirmed paid via LN", order_id); + } + DevFeePaymentState::Failed => { + info!("Order {} hash exists but payment failed, clearing for retry", order_id); + order.dev_fee_paid = false; + order.dev_fee_payment_hash = None; + let _ = order.update(&pool).await; + } + _ => { + warn!("Order {} has hash but status uncertain, skipping", order_id); } - ... } continue;
🧹 Nitpick comments (3)
src/app/release.rs (1)
570-611:resolve_dev_fee_invoice— clean two-phase separation.The logic is sound and the doc comments are thorough. One subtle concern:
resolv_ln_address(seesrc/lnurl.rslines 71–123) returnsOk("".to_string())for several failure modes (tag mismatch, amount out of range, callback failure) rather than anErr. If that happens,decode_invoiceon line 602 will fail withInvoiceInvalidError, which is misleading — the root cause is LNURL resolution, not invoice decoding.Consider adding an empty-string guard before decoding:
Proposed guard
+ if payment_request.is_empty() { + error!( + "Dev fee LNURL resolution returned empty invoice for order {} ({} sats)", + order.id, order.dev_fee + ); + return Err(MostroInternalErr(ServiceError::LnAddressParseError)); + } + // Decode invoice and extract payment hash let invoice = decode_invoice(&payment_request)?;src/scheduler.rs (2)
643-682: Stale real-hash cleanup logic is correct for crash recovery.This correctly handles the scenario where the process crashes between storing the hash (step 2) and completing the payment. The LN status check prevents both false positives (resetting a successful payment) and false negatives (leaving a failed payment as "paid").
One efficiency concern:
check_dev_fee_payment_statuscreates a newLndConnectorper invocation (line 1000). If many orders need checking (stale real-hash loop + unpaid orders + timeout path), this could open many gRPC connections per scheduler cycle.Consider passing a shared LndConnector
-async fn check_dev_fee_payment_status( - order: &Order, - pool: &sqlx::Pool<sqlx::Sqlite>, -) -> DevFeePaymentState { - use crate::lightning::LndConnector; +async fn check_dev_fee_payment_status( + order: &Order, + pool: &sqlx::Pool<sqlx::Sqlite>, + ln_client: &mut LndConnector, +) -> DevFeePaymentState { use fedimint_tonic_lnd::lnrpc::payment::PaymentStatus; // ... - let mut ln_client = match LndConnector::new().await { - Ok(client) => client, - Err(e) => { /* ... */ } - };
967-1055:check_dev_fee_payment_status— correct status-driven logic with good fallbacks.The function correctly maps LN payment states and conservatively returns
Unknownfor any error/timeout, aligning with the "don't retry" principle.Minor note: the
useimports on lines 971–972 and 987 are inside the function body. While Rust allows this, they'd be more discoverable at file level with the other imports.
check_dev_fee_payment_status was creating a new gRPC connection on every invocation. Create the connector once in job_process_dev_fee_payment and pass it through, matching the pattern already used by job_cancel_orders.
Summary
Prevents duplicate dev fee payments caused by race conditions between timeouts and in-flight Lightning payments.
Closes #568
Problem
When the 50-second timeout expired during a dev fee payment, the code unconditionally reset
dev_fee_paid = falseand cleared the payment hash. But a timeout ≠ failure — the payment could still be in-flight or may have already succeeded on the LN network. This race condition leads to double payments.Solution
On timeout, query the LN node for the actual payment status via
TrackPaymentV2before deciding:Design principle: When in doubt, don't retry
A missed dev fee can be recovered manually; a duplicate payment is money lost.
Changes
src/lightning/mod.rsLndConnector::check_payment_status()— queries LND viaTrackPaymentV2for current payment statesrc/scheduler.rsDevFeePaymentStateenum (Succeeded, InFlight, Failed, Unknown)check_dev_fee_payment_status()helper — decodes hash, queries LND (10s timeout), updates DB on successdocs/DEV_FEE_TIMEOUT_SAFETY.mdVerification
cargo fmt— cleancargo clippy --all-targets --all-features— cleancargo test— 112 passed, 0 failedSummary by CodeRabbit
New Features
Bug Fixes
Documentation