
Conversation

@joostjager (Contributor) commented Feb 3, 2026

PoC branch to evaluate integrating DeferredChainMonitor (lightningdevkit/rust-lightning#4345) into ldk-node.

For local storage backends (SQLite, filesystem), this uses DeferredChainMonitor, which defers monitor operations until they are explicitly flushed.

VSS continues to use the regular ChainMonitor. While this isn't safe against force-closes, it avoids introducing potentially high-latency channel manager writes into the critical path. Currently this provides no practical benefit, since the background processor loop isn't sufficiently parallelized: payment latency wouldn't actually increase if we also used deferred writing for VSS. This is primarily a forward-looking optimization for when that parallelization is addressed.
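
To make the ordering concrete, here is a minimal sketch of the persistence round the deferred monitor enables; the function names are illustrative stand-ins, not the actual rust-lightning#4345 API:

```rust
// Hypothetical sketch of the flush ordering; these names are not real LDK APIs.
fn persist_round(
    persist_channel_manager: impl FnOnce() -> std::io::Result<()>,
    flush_deferred_monitor_writes: impl FnOnce() -> std::io::Result<()>,
) -> std::io::Result<()> {
    // Persist the ChannelManager first...
    persist_channel_manager()?;
    // ...then flush the queued watch_channel/update_channel operations,
    // so the monitor writes never land ahead of the manager write.
    flush_deferred_monitor_writes()?;
    Ok(())
}
```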

This change uses LDK's DeferredChainMonitor for local storage backends
(SQLite, filesystem) instead of the regular ChainMonitor. The deferred
variant queues watch_channel and update_channel operations for later
flushing, enabling safe persistence ordering where the ChannelManager
is persisted before the channel monitors. This ensures crash safety.

VSS storage backends continue to use the regular ChainMonitor since
VSS handles its own persistence ordering.

The implementation:
- Adds a ChainMonitor enum that wraps Regular and Deferred variants (sketched below)
- Implements all required traits (Watch, Listen, Confirm, AChainMonitor,
  BaseMessageHandler, SendOnlyMessageHandler, EventsProvider) for the enum
- Adds use_deferred_chain_monitor parameter to build_with_store_internal
- Updates VSS build methods to use regular ChainMonitor (false)
- Updates non-VSS build methods to use DeferredChainMonitor (true)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
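
A minimal sketch of the wrapper shape described in the list above, assuming both variants expose the same method surface; `MonitorOps` is a stand-in for the full trait set (Watch, Listen, Confirm, etc.) that the branch actually forwards:

```rust
// Stand-in trait; the real enum implements each LDK trait and forwards
// every method to the inner monitor in the same way.
trait MonitorOps {
    fn best_block_updated(&self);
}

enum ChainMonitor<R: MonitorOps, D: MonitorOps> {
    Regular(R),
    Deferred(D),
}

impl<R: MonitorOps, D: MonitorOps> MonitorOps for ChainMonitor<R, D> {
    fn best_block_updated(&self) {
        match self {
            ChainMonitor::Regular(m) => m.best_block_updated(),
            ChainMonitor::Deferred(m) => m.best_block_updated(),
        }
    }
}
```

An enum rather than a trait object keeps dispatch static, which presumably matters given how generic-heavy the concrete LDK monitor types are.
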
@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

@tnull (Collaborator) commented Feb 3, 2026

> VSS continues to use the regular ChainMonitor. While this isn't safe against force-closes, it avoids introducing potentially high-latency channel manager writes into the critical path.

Hmm, that's unfortunate. I imagine especially mobile and VSS-driven nodes would benefit the most from any change improving on the CM/CM inconsistency situation?

@TheBlueMatt (Contributor)

Yea, I think it's an open question what we should do. On the one hand, nodes with remote persistence are going to be the most impacted by the increase in sending latency (which is probably something where we're currently in an unacceptably bad state, given how single-threaded some of LDK's logic around the BP is!). OTOH, they are also somewhat more likely to hit the FC-due-to-out-of-sync issues because they have high-latency persistence.

I've mentioned this to Joost, but another option we have is to do the chanman and monitor writes at the same time but spawn them in order, which will at least give us likely protection. We should maybe discuss live which option we want to go with.
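
A rough sketch of the spawn-in-order idea, assuming a tokio runtime; `write_chanman` and `write_monitor` are hypothetical async persistence calls, not LDK APIs:

```rust
// Sketch only: both writes run concurrently, but the ChannelManager write
// is issued first, so it will *likely* complete first. This narrows the
// inconsistency window without eliminating it.
async fn persist_in_order<F1, F2>(write_chanman: F1, write_monitor: F2)
where
    F1: std::future::Future<Output = ()> + Send + 'static,
    F2: std::future::Future<Output = ()> + Send + 'static,
{
    let chanman = tokio::spawn(write_chanman);
    let monitor = tokio::spawn(write_monitor);
    let _ = chanman.await;
    let _ = monitor.await;
}
```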

In any case, since this is now using the async pipeline for monitor persistence anyway, we should probably switch to actual async persistence for monitors at the same time.

@joostjager (Contributor, Author)

Parallel writes started in order still don't fully close the gap though. We'd remain in "mostly works" territory where the race window is smaller but not eliminated.

As discussed offline, for high-latency backends, an option to avoid unnecessary round trips is batched writes. The batch doesn't need to be atomic (which would require all KV stores to support transactions), just ordered: write the chanman first, then the monitors, but send them together. This would fix the FC problem without being unnecessarily slow for remote storage.

The downside is extending the KVStore interface with a batch write method, but we could provide a blanket implementation for existing KV stores that just iterates through the writes sequentially. For VSS specifically we'd implement actual batch sending to get the latency benefit.
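
A sketch of what that extension could look like, using a simplified stand-in for the KVStore trait (the real LDK trait also takes namespace parameters, and `write_batch` is a hypothetical name):

```rust
use std::io;

trait KVStore {
    fn write(&self, key: &str, value: &[u8]) -> io::Result<()>;

    /// Ordered batch write: entries must be applied in order (chanman
    /// first, then monitors) but need not be atomic. The blanket default
    /// just iterates, so existing stores keep working unchanged; a VSS
    /// store would override this to send all entries in one round trip.
    fn write_batch(&self, entries: &[(&str, &[u8])]) -> io::Result<()> {
        for (key, value) in entries {
            self.write(key, value)?;
        }
        Ok(())
    }
}
```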
