Add API to load pre-built index in wrapped_async#749
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a synchronous API in wrapped_async for loading a pre-built in-memory index from files by constructing a DataProvider via the storage loading traits, then wrapping it in the sync DiskANNIndex wrapper.
Changes:
- Introduces
SyncLoadWith+create_data_provider()indiskann-providers/src/index/wrapped_async.rsto load an index from a path/config. - Adds
futures-executordependency to synchronouslyblock_onthe asyncLoadWithfuture. - Updates
Cargo.lockfor the new dependency.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| diskann-providers/src/index/wrapped_async.rs | Adds synchronous load API and data-provider construction for pre-built index loading. |
| diskann-providers/Cargo.toml | Adds futures-executor dependency. |
| Cargo.lock | Records dependency resolution changes for futures-executor. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let data_provider = DP::load_with(provider, &pq_context); | ||
|
|
||
| block_on(data_provider) | ||
| } |
There was a problem hiding this comment.
This introduces futures_executor::block_on + a new futures-executor dependency just to synchronously wait for DP::load_with. Since this module already depends on Tokio and constructs runtimes/handles, prefer using a Tokio runtime/handle to block_on the load (or otherwise avoid adding an extra executor dependency).
| prost = "0.14.1" | ||
| futures-util.workspace = true | ||
| futures-executor = "0.3" |
There was a problem hiding this comment.
New dependency is added with an inline version (futures-executor = "0.3") instead of using [workspace.dependencies] like most shared third-party crates in this repo. Consider adding it to the workspace dependency list and referencing it via workspace = true to keep versions consistent across crates.
| pub trait SyncLoadWith: Sized { | ||
|
|
||
| fn load_with(path: &str, index_config: IndexConfiguration) -> Self; | ||
| } | ||
|
|
||
| // Load static memory index from pre-built files synchronously | ||
| impl<DP> SyncLoadWith for DiskANNIndex<DP> | ||
| where | ||
| DP: DataProvider<InternalId = u32> + LoadWith<AsyncQuantLoadContext, Error = ANNError>, | ||
| { | ||
| fn load_with(path: &str, index_config: IndexConfiguration) -> Self { | ||
| let storage = FileStorageProvider; | ||
| let data_provider = create_data_provider(&storage, path, &index_config); | ||
|
|
||
| match data_provider { | ||
| Ok(dp) => DiskANNIndex::<DP>::new_with_current_thread_runtime(index_config.config, dp), | ||
| Err(e) => panic!("Failed to create data provider: {}", e), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| pub fn create_data_provider<'a, P, DP>( | ||
| provider: &P, | ||
| path: &'a str, | ||
| index_config: &'a IndexConfiguration, | ||
| ) -> ANNResult<DP> | ||
| where | ||
| P: StorageReadProvider, | ||
| DP: DataProvider + LoadWith<AsyncQuantLoadContext, Error = ANNError>, | ||
| { | ||
| let pq_context = AsyncQuantLoadContext { | ||
| metadata: AsyncIndexMetadata::new(path), | ||
| num_frozen_points: index_config.num_frozen_pts, | ||
| metric: index_config.dist_metric, | ||
| prefetch_lookahead: index_config.prefetch_lookahead.map(|x| x.get()), | ||
| is_disk_index: false, // only support in-memory index loading here | ||
| prefetch_cache_line_level: index_config.prefetch_cache_line_level, | ||
| }; | ||
|
|
||
| let data_provider = DP::load_with(provider, &pq_context); | ||
|
|
||
| block_on(data_provider) | ||
| } |
There was a problem hiding this comment.
The new synchronous load path (SyncLoadWith::load_with / create_data_provider) isn’t covered by tests. Given the number of moving parts (context creation, file layout, error propagation), please add a unit/integration test that round-trips an index to storage and loads it through this wrapper API (ideally using VirtualStorageProvider to avoid filesystem side effects).
|
|
||
| use crate::{model::IndexConfiguration, | ||
| storage::{StorageReadProvider, LoadWith, AsyncQuantLoadContext,AsyncIndexMetadata, file_storage_provider::FileStorageProvider}}; | ||
|
|
There was a problem hiding this comment.
The new use crate::{... storage::{...}} import block is not formatted consistently (extra blank line, missing spaces after commas, long single line). Please run rustfmt / reformat this use to match the rest of the crate’s import style for readability and to avoid CI format/lint churn.
| fn load_with(path: &str, index_config: IndexConfiguration) -> Self { | ||
| let storage = FileStorageProvider; | ||
| let data_provider = create_data_provider(&storage, path, &index_config); | ||
|
|
||
| match data_provider { | ||
| Ok(dp) => DiskANNIndex::<DP>::new_with_current_thread_runtime(index_config.config, dp), | ||
| Err(e) => panic!("Failed to create data provider: {}", e), | ||
| } |
There was a problem hiding this comment.
SyncLoadWith::load_with panics on data-provider load failure. Since this is a public API for loading an index, it should return a Result (e.g., ANNResult<Self>) and propagate the underlying error instead of crashing the process.
| let pq_context = AsyncQuantLoadContext { | ||
| metadata: AsyncIndexMetadata::new(path), | ||
| num_frozen_points: index_config.num_frozen_pts, | ||
| metric: index_config.dist_metric, | ||
| prefetch_lookahead: index_config.prefetch_lookahead.map(|x| x.get()), | ||
| is_disk_index: false, // only support in-memory index loading here | ||
| prefetch_cache_line_level: index_config.prefetch_cache_line_level, | ||
| }; |
There was a problem hiding this comment.
create_data_provider duplicates the existing storage::create_load_context() helper (diskann-providers/src/storage/index_storage.rs) that already builds an AsyncQuantLoadContext from IndexConfiguration. Reusing the shared helper would reduce drift and keep load semantics consistent across the crate.
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #749 +/- ##
==========================================
- Coverage 89.01% 88.97% -0.04%
==========================================
Files 428 428
Lines 78294 78321 +27
==========================================
- Hits 69691 69687 -4
- Misses 8603 8634 +31
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| Err(e) => Err(e), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
I feel like this would be an endless game of whack-a-mole, trying to introduce sync interfaces for all the async interfaces.
What if we did this instead:
Somewhere high in the stack of diskann-providers, we can make a function like block_on that can run any future. Then for these situations where we have an async function/trait that doesn't really need a multi-threaded runtime and is not performance sensitive, block_on can be used and we don't have to duplicate our interfaces.
Reference Issues/PRs
What does this implement/fix? Briefly explain your changes.
Any other comments?