Skip to content

Conversation

@valentinewallace
Copy link
Contributor

@valentinewallace valentinewallace commented Jan 6, 2026

Closes #4280, partially addresses #4286

Based on #4289

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Jan 6, 2026

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@codecov
Copy link

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 89.14141% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.02%. Comparing base (d555994) to head (94815bd).

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 89.29% 19 Missing and 10 partials ⚠️
lightning/src/ln/channel.rs 90.09% 9 Missing and 1 partial ⚠️
lightning/src/util/ser.rs 40.00% 0 Missing and 3 partials ⚠️
lightning/src/ln/functional_test_utils.rs 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4303      +/-   ##
==========================================
+ Coverage   86.01%   86.02%   +0.01%     
==========================================
  Files         156      156              
  Lines      102766   103006     +240     
  Branches   102766   103006     +240     
==========================================
+ Hits        88396    88616     +220     
- Misses      11864    11881      +17     
- Partials     2506     2509       +3     
Flag Coverage Δ
tests 86.02% <89.14%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 01aa4c0 to 4012864 Compare January 7, 2026 21:50
@valentinewallace valentinewallace self-assigned this Jan 8, 2026
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label Jan 8, 2026
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 4012864 to e2ea1ed Compare January 8, 2026 21:16
@valentinewallace valentinewallace marked this pull request as ready for review January 8, 2026 21:16
@valentinewallace valentinewallace removed the request for review from wpaulino January 8, 2026 21:16
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from e2ea1ed to c40741b Compare January 14, 2026 17:19
@valentinewallace
Copy link
Contributor Author

valentinewallace commented Jan 14, 2026

Pushed a rebase on main due to #4289 landing

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from c40741b to 3f0f811 Compare January 29, 2026 16:09
@valentinewallace valentinewallace changed the title Follow-ups to #4227 (Part 2) Prevent HTLC double-forwards, prune forwarded onions Jan 29, 2026
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 3f0f811 to fce5071 Compare January 29, 2026 16:27
@joostjager
Copy link
Contributor

Haven't reviewed yet, but the following question came up: in the previous pr and in this pr, double-forward bugs are fixed. Couldn't the fuzzer detect this?

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in this PR is quite complex, with multiple interacting state machines (inbound HTLC state, outbound HTLC state, holding cells, monitor updates) that need to stay consistent across restarts. The fact that multiple double-forward bugs were discovered during development suggests the state space is large enough that targeted unit tests may not provide sufficient coverage.

I keep wondering if the existing fuzzer can exercise these restart scenarios with in-flight HTLCs at various stages. The current strategy of adding a specific regression test might be too reactive?

WithOnion { update_add_htlc: msgs::UpdateAddHTLC },
/// This inbound HTLC is a forward that was irrevocably committed to the outbound edge, allowing
/// its onion to be pruned and no longer persisted.
Forwarded {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this commit could've been split in two. First introducing the enum with WithOnion and Legacy, and then adding Forwarded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would introduce a lot of intermediary changes in the channel.rs tests, where InboundHTLCStates are created. I'd like to use ::Forwarded for them by the end of the PR since the ::Legacy variant is removed in the 0.5 PR. Do you still want this?

// We don't want to return an HTLC as needing processing if it already has a resolution that's
// pending in the holding cell.
let htlc_resolution_in_holding_cell = |id: u64| -> bool {
self.context.holding_cell_htlc_updates.iter().any(|holding_cell_htlc| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The htlc_resolution_in_holding_cell closure iterates the holding cell for each HTLC. Is it worth putting it in a set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth an allocation, given the per-channel htlc limits, but no preference here.

Forwarded {
/// Useful if we need to fail or claim this HTLC backwards after restart, if it's missing in the
/// outbound edge.
hop_data: HTLCPreviousHopData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some data in here that is redundant with InboundHTLCOutput. Not sure if we care

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either. @TheBlueMatt any opinion? This is the simplest way so I might save the dedup for followup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yea, would still be quite nice to reduce size here but its alright to do it in a followup (just has to happen in 0.3!)

mem::take(&mut already_forwarded_htlcs).drain(..)
{
if hash != payment_hash {
already_forwarded_htlcs.push((hash, hop_data, outbound_amt_msat));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a different way to do this other than drain and push back again? Maybe filter or retain or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried retain at first, but with filter/retain we can't take-by-value and have to clone the hop_data 🤔

Forwarded {
/// Useful if we need to fail or claim this HTLC backwards after restart, if it's missing in the
/// outbound edge.
hop_data: HTLCPreviousHopData,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yea, would still be quite nice to reduce size here but its alright to do it in a followup (just has to happen in 0.3!)

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from fce5071 to 84093ca Compare February 2, 2026 19:58
@valentinewallace
Copy link
Contributor Author

Rebased since #4332 landed, haven't pushed updates addressing feedback yet

valentinewallace and others added 6 commits February 2, 2026 16:15
We recently added support for reconstructing
ChannelManager::decode_update_add_htlcs on startup, using data present in the
Channels. However, we failed to prune HTLCs from this rebuilt map if a given
inbound HTLC was already forwarded to the outbound edge and in the outbound
holding cell (this bug could've caused us to double-forward HTLCs, fortunately
it never shipped).

As part of fixing this bug, we clean up the overall pruning approach by:
1. If the Channel is open, then it is the source of truth for what HTLCs are
outbound+pending (including pending in the holding cell)
2. If the Channel is closed, then the corresponding ChannelMonitor is the
source of truth for what HTLCs are outbound+pending

Previously, we would only consider the monitor's pending HTLCs, which ignored
holding cell HTLCs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This cleanup falls out of the changes made in the previous commit. Separated
out here for reviewability.
We recently added support for reconstructing
ChannelManager::decode_update_add_htlcs on startup, using data present in the
Channels. However, we failed to prune HTLCs from this rebuilt map if a given
HTLC was already forwarded+removed from the outbound edge and resolved in the
inbound edge's holding cell.

Here we fix this bug that would have caused us to
double-forward inbound HTLC forwards, which fortunately was not shipped.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
In 0.3+, we are taking steps to remove the requirement of regularly persisting
the ChannelManager and instead rebuild the set of HTLC forwards (and the
manager generally) from Channel{Monitor} data.

We previously merged support for reconstructing the
ChannelManager::decode_update_add_htlcs map from channel data, using a new
HTLC onion field that will be present for inbound HTLCs received on 0.3+ only.

However, we now want to add support for pruning this field once it's no longer
needed so it doesn't get persisted every time the manager gets persisted. At
the same time, in a future LDK version we need to detect whether the field was
ever present to begin with to prevent upgrading with legacy HTLCs present.

We accomplish both by converting the plain update_add option that was
previously serialized to an enum that indicates whether the HTLC is from 0.2-
versus 0.3+-with-onion-pruned.

Actual pruning of the new update_add field is added in the next commit.
We store inbound committed HTLCs' onions in Channels for use in reconstructing
the pending HTLC set on ChannelManager read. If an HTLC has been forwarded to
the outbound edge, we no longer need to persist the inbound edge's onion and
can prune it here.
We recently merged (test-only, for now) support for the ChannelManager
reconstructing its set of pending HTLCs from Channel{Monitor} data, rather than
using its own persisted maps. But because we want test coverage of both the new
reconstruction codepaths as well as the old persisted map codepaths,
in tests we would decide between those two sets of codepaths randomly.

We now want to add some tests that require using the new codepaths, so here we
add an option to explicitly set whether to reconstruct or not rather than
choosing randomly.
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 84093ca to 94815bd Compare February 2, 2026 23:21
@valentinewallace
Copy link
Contributor Author

Addressed feedback: diff

@valentinewallace
Copy link
Contributor Author

The logic in this PR is quite complex, with multiple interacting state machines (inbound HTLC state, outbound HTLC state, holding cells, monitor updates) that need to stay consistent across restarts. The fact that multiple double-forward bugs were discovered during development suggests the state space is large enough that targeted unit tests may not provide sufficient coverage.

I keep wondering if the existing fuzzer can exercise these restart scenarios with in-flight HTLCs at various stages. The current strategy of adding a specific regression test might be too reactive?

I agree on the fuzzing, it's a little tricky to add so I'm still working on it and will probably save it for follow-up.

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 94815bd to 89600ea Compare February 3, 2026 17:41
@valentinewallace
Copy link
Contributor Author

Pushed a fix for CI, also split out another commit: diff

Cleans it up a bit in preparation for adding a new variant in the next commit.
In the last commit, we added support for pruning an inbound HTLC's persisted
onion once the HTLC has been irrevocably forwarded to the outbound edge.

Here, we add a check on startup that those inbound HTLCs were actually handled.
Specifically, we check that the inbound HTLC is either (a) currently present in
the outbound edge or (b) was removed via claim. If neither of those are true,
we infer that the HTLC was removed from the outbound edge via fail and fail the
inbound HTLC backwards.
In 0.3+, we are taking steps to remove the requirement of regularly persisting
the ChannelManager and instead rebuild the set of HTLC forwards (and the
manager generally) from Channel{Monitor} data.

We previously merged support for reconstructing the
ChannelManager::decode_update_add_htlcs map from channel data, using a new
HTLC onion field that will be present for inbound HTLCs received on 0.3+ only.
The plan is that in upcoming LDK versions, the manager will reconstruct this
map and the other forward/claimable/pending HTLC maps will automatically
repopulate themselves on the next call to process_pending_htlc_forwards.

As such, once we're in a future version that reconstructs the pending HTLC set,
we can stop persisting the legacy ChannelManager maps such as forward_htlcs,
pending_intercepted_htlcs since they will never be used.

For 0.3 to be compatible with this future version, in this commit we detect
that the manager was last written on a version of LDK that doesn't persist the
legacy maps. In that case, we don't try to read the old forwards map and run
the new reconstruction logic only.
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 89600ea to f0f269c Compare February 3, 2026 19:08
@valentinewallace
Copy link
Contributor Author

CI fix part 2: diff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

weekly goal Someone wants to land this this week

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

#4227 Followups

4 participants