
Conversation

@gopalrs (Contributor) commented Feb 10, 2026

Integrating filter search with diskann-benchmark.

- Refactored recall utilities in diskann-benchmark
- Updated tokio utilities
- Added attribute and format parser improvements in label-filter
- Updated ground_truth utilities in diskann-tools
@gopalrs requested review from a team and Copilot on February 10, 2026 04:44
@gopalrs changed the title from "Sync changes from CDB_DiskANN repo" to "[Draft, Not for review] Sync changes from CDB_DiskANN repo" on Feb 10, 2026

Copilot AI left a comment

Pull request overview

Syncs changes aimed at integrating label-filtered search into the benchmarking/tooling flow, primarily by enhancing ground-truth label evaluation and aligning benchmark recall logic with diskann-benchmark-core.

Changes:

  • Extend diskann-tools ground-truth label filtering to handle array-valued label structures and add diagnostics/logging.
  • Refactor diskann-benchmark recall utilities to delegate to diskann-benchmark-core.
  • Adjust diskann-label-filter JSON flattening and minor formatting/import changes; update diskann-tools dependencies.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Summary per file:

  • diskann-tools/src/utils/ground_truth.rs: Adds array-expansion support for label evaluation and extra tracing diagnostics during bitmap and ground-truth generation.
  • diskann-tools/Cargo.toml: Updates dependencies (adds serde_json, removes some unused deps) and adjusts dev-deps/features.
  • diskann-label-filter/src/utils/flatten_utils.rs: Modifies how arrays are flattened into pointer paths.
  • diskann-label-filter/src/parser/format.rs: Minor formatting-only change.
  • diskann-label-filter/src/attribute.rs: Adds an import (currently unused).
  • diskann-benchmark/src/utils/tokio.rs: Tweaks a doc comment and removes unit tests.
  • diskann-benchmark/src/utils/recall.rs: Replaces local recall/AP implementations with diskann-benchmark-core wrappers/types.


Comment on lines 1 to 8
[package]
name = "diskann-tools"
edition = "2021"
version.workspace = true
authors.workspace = true
description.workspace = true
documentation.workspace = true
license.workspace = true

Copilot AI Feb 10, 2026

This crate no longer declares license.workspace = true, while the rest of the workspace crates do. This makes the crate metadata inconsistent and can break workspace-wide tooling (e.g., publish checks) that expects license metadata to be inherited from the workspace. Re-add license.workspace = true under [package] for consistency.

results: &dyn benchmark_core::recall::Rows<T>,
groundtruth: &dyn benchmark_core::recall::Rows<T>,
recall_k: &[usize],
recall_n: &[usize],

Copilot AI Feb 10, 2026

compute_multiple_recalls changed from taking StridedView + an enhanced_metrics flag to taking &dyn benchmark_core::recall::Rows<T> and no longer accepts enhanced_metrics. There are existing call sites in diskann-benchmark/src/backend/exhaustive/* that still call compute_multiple_recalls(..., false) and pass StridedView conversions, which will not compile. Either update those call sites in this PR, or provide a backwards-compatible wrapper/overload that preserves the old signature and adapts into benchmark_core::recall::Rows.

Suggested change
-    recall_n: &[usize],
+    recall_n: &[usize],
+    _enhanced_metrics: bool,
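
If the wrapper route is taken, a minimal sketch is below. The name compute_multiple_recalls_compat and the Vec<RecallMetrics> return type are assumptions for illustration; substitute whatever compute_multiple_recalls actually returns.

// Hypothetical backwards-compatible shim; call sites would still need their
// StridedView values adapted into benchmark_core::recall::Rows<T>.
pub(crate) fn compute_multiple_recalls_compat<T>(
    results: &dyn benchmark_core::recall::Rows<T>,
    groundtruth: &dyn benchmark_core::recall::Rows<T>,
    recall_k: &[usize],
    recall_n: &[usize],
    _enhanced_metrics: bool, // accepted but ignored, so `(..., false)` call sites keep compiling
) -> Vec<benchmark_core::recall::RecallMetrics> {
    compute_multiple_recalls(results, groundtruth, recall_k, recall_n)
}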

average_precision: m.average_precision,
}
}
}

Copilot AI Feb 10, 2026

This module removed APMetrics and compute_average_precision, but there are existing call sites (e.g. diskann-benchmark/src/backend/index/range_search.rs and .../result.rs) that still reference utils::recall::compute_average_precision and utils::recall::APMetrics. As-is, the benchmark crate will not compile. Either restore these items as thin wrappers/type aliases around benchmark_core::recall equivalents, or update the call sites in this PR to use the new AveragePrecisionMetrics API.

Suggested change
 }
 }
+
+/// Backwards-compatible alias for the core average precision metrics.
+/// This keeps existing `utils::recall::APMetrics` call sites compiling.
+pub(crate) type APMetrics = benchmark_core::recall::AveragePrecisionMetrics;
+
+/// Re-export the core `compute_average_precision` function so existing
+/// `utils::recall::compute_average_precision` call sites continue to work.
+pub(crate) use benchmark_core::recall::compute_average_precision;

Comment on lines +153 to +162
// Print per-query match counts
for (i, bitmap) in query_bitmaps.iter().enumerate() {
    if i < 10 || bitmap.is_empty() {
        tracing::debug!(
            "Query {}: {} base vectors matched the filter",
            i,
            bitmap.len()
        );
    }
}

Copilot AI Feb 10, 2026

The per-query debug logging loop will emit a debug! line for every query with an empty bitmap (if i < 10 || bitmap.is_empty()), which can explode log volume when filters are selective or misconfigured (especially if debug logging is enabled). Consider limiting this to the first N queries only, or logging aggregate statistics plus a small sample of empty/non-empty queries.
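
For example, aggregate counts plus a bounded per-query sample, a sketch built only from the identifiers visible in the snippet above:

// Aggregate view: how many queries matched nothing at all.
let empty_count = query_bitmaps.iter().filter(|b| b.is_empty()).count();
tracing::debug!(
    "{} of {} query bitmaps are empty",
    empty_count,
    query_bitmaps.len()
);
// Bounded sample: per-query detail for the first 10 queries only.
for (i, bitmap) in query_bitmaps.iter().take(10).enumerate() {
    tracing::debug!("Query {}: {} base vectors matched the filter", i, bitmap.len());
}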

Comment on lines +32 to +78
/// Expands a JSON object with array-valued fields into multiple objects with scalar values.
/// For example: {"country": ["AU", "NZ"], "year": 2007}
/// becomes: [{"country": "AU", "year": 2007}, {"country": "NZ", "year": 2007}]
///
/// If multiple fields have arrays, all combinations are generated.
fn expand_array_fields(value: &Value) -> Vec<Value> {
    match value {
        Value::Object(map) => {
            // Start with a single empty object
            let mut results: Vec<Map<String, Value>> = vec![Map::new()];

            for (key, val) in map.iter() {
                if let Value::Array(arr) = val {
                    // Expand: for each existing result, create copies for each array element
                    let mut new_results: Vec<Map<String, Value>> = Vec::new();
                    for existing in results.iter() {
                        for item in arr.iter() {
                            let mut new_map: Map<String, Value> = existing.clone();
                            new_map.insert(key.clone(), item.clone());
                            new_results.push(new_map);
                        }
                    }
                    // If array is empty, keep existing results without this key
                    if !arr.is_empty() {
                        results = new_results;
                    }
                } else {
                    // Non-array field: add to all existing results
                    for existing in results.iter_mut() {
                        existing.insert(key.clone(), val.clone());
                    }
                }
            }

            results.into_iter().map(Value::Object).collect()
        }
        // If not an object, return as-is
        _ => vec![value.clone()],
    }
}

/// Evaluates a query expression against a label, expanding array fields first.
/// Returns true if any expanded variant matches the query.
fn eval_query_with_array_expansion(query_expr: &ASTExpr, label: &Value) -> bool {
    let expanded = expand_array_fields(label);
    expanded.iter().any(|item| eval_query_expr(query_expr, item))
}

Copilot AI Feb 10, 2026

expand_array_fields clones the full JSON object and generates a Cartesian product of array-valued fields, and it's invoked inside the inner loop over base_labels for every query. This can become extremely expensive (exponential in the number/size of array fields, multiplied by num_queries * num_base_labels). Consider avoiding full expansion/allocation by evaluating the predicate directly against arrays (e.g., treat equality as "any element matches" for array-valued fields), or precomputing any needed expansions once per base label rather than per query evaluation.
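
For the precompute option, a sketch follows; base_labels, query_exprs, and the loop shape are illustrative stand-ins, not the file's actual code:

// Expand each base label once, up front, instead of per (query, label) pair.
let expanded_labels: Vec<Vec<Value>> = base_labels.iter().map(expand_array_fields).collect();

for query_expr in &query_exprs {
    for variants in &expanded_labels {
        // Same "any expanded variant matches" semantics as
        // eval_query_with_array_expansion, without re-expanding per query.
        let matched = variants.iter().any(|item| eval_query_expr(query_expr, item));
        // ... record `matched` in this query's bitmap ...
    }
}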


use std::fmt::Display;
use std::hash::{Hash, Hasher};
use std::io::Write;

Copilot AI Feb 10, 2026

use std::io::Write; is unused in this module (no Write methods are called; the write_u8/write_i64/etc calls are on std::hash::Hasher). This will trigger unused_imports warnings under typical lint settings; remove the import.

Suggested change
-use std::io::Write;

Comment on lines 6 to 20
@@ -18,21 +18,3 @@ pub(crate) fn block_on<F: std::future::Future>(future: F) -> F::Output {
        .expect("current thread runtime initialization failed")
        .block_on(future)
}

Copilot AI Feb 10, 2026

The runtime helper previously had a unit test validating runtime(num_threads) creates the expected number of workers, but the entire test module was removed. If the old test was removed due to tokio runtime metrics being unstable/unsupported in this build, consider replacing it with a stable assertion (or moving coverage to an integration test) so runtime()/block_on() behavior remains exercised by CI.
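
A stable replacement could pin down block_on's observable behavior without touching tokio runtime metrics, for example (a sketch; the 21 * 2 payload is arbitrary):

#[cfg(test)]
mod tests {
    // Metrics-free smoke test: exercises runtime construction and block_on
    // by driving a trivial future to completion.
    #[test]
    fn block_on_drives_future_to_completion() {
        assert_eq!(super::block_on(async { 21 * 2 }), 42);
    }
}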

Comment on lines +323 to +331
for (query_idx, npq) in ground_truth.iter().enumerate() {
    let neighbors: Vec<_> = npq.iter().collect();
    let neighbor_count = neighbors.len();

    if query_idx < 10 {
        // Print top K IDs and distances for first 10 queries
        let top_ids: Vec<u32> = neighbors.iter().take(10).map(|n| n.id).collect();
        let top_dists: Vec<f32> = neighbors.iter().take(10).map(|n| n.distance).collect();
        tracing::debug!(

Copilot AI Feb 10, 2026

The added ground-truth debug section allocates and clones neighbor data for every query (let neighbors: Vec<_> = npq.iter().collect();) even though detailed logging is only done for the first 10 queries. For large benchmarks this adds avoidable overhead. Prefer using npq.iter().count()/npq.len() for counts and only collecting the top-N neighbors for the small set of queries you actually log.
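
A sketch of the cheaper shape, reusing only identifiers from the snippet above (and assuming neighbor_count is still wanted for every query, as in the original):

for (query_idx, npq) in ground_truth.iter().enumerate() {
    // Count without materializing the whole neighbor list.
    let neighbor_count = npq.iter().count(); // or npq.len(), if available

    if query_idx < 10 {
        // Only the first 10 queries pay for collecting the top 10.
        let top_ids: Vec<u32> = npq.iter().take(10).map(|n| n.id).collect();
        let top_dists: Vec<f32> = npq.iter().take(10).map(|n| n.distance).collect();
        tracing::debug!(
            "Query {}: {} neighbors, top ids {:?}, top dists {:?}",
            query_idx, neighbor_count, top_ids, top_dists
        );
    }
}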

 Value::Array(arr) => {
     for (i, item) in arr.iter().enumerate() {
-        flatten_recursive(item, stack.push(&i, separator), out, separator);
+        flatten_recursive(item, stack.push(&String::from(""), separator), out, separator);

Copilot AI Feb 10, 2026

Array flattening is now pushing an empty string segment for every array element, which drops the index from the generated path (e.g., "/tags/0" becomes something like "/tags/") and can also create duplicate keys. This breaks the JSON Pointer semantics documented in this file and will fail the existing tests that assert "/tags/0" and "/tags/1". Use the numeric index (i) when building JSON Pointer paths, or implement the existing include_array_indices config flag (currently unused) to conditionally include the index.

Suggested change
-flatten_recursive(item, stack.push(&String::from(""), separator), out, separator);
+let index = i.to_string();
+flatten_recursive(item, stack.push(&index, separator), out, separator);
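
If the flag route is preferred instead, a sketch of wiring the currently unused include_array_indices option; the config value and how it reaches this function are hypothetical:

// Hypothetical: honor the existing-but-unused include_array_indices config flag.
let segment = if config.include_array_indices {
    i.to_string() // JSON Pointer semantics: "/tags/0", "/tags/1", ...
} else {
    String::new() // index-less paths, accepting possible duplicate keys
};
flatten_recursive(item, stack.push(&segment, separator), out, separator);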
