feat: add return_fields parameter to search methods (#955) #1097

Open
lailoo wants to merge 1 commit into MemTensor:main from lailoo:feat/search-return-fields-955
lailoo commented Feb 13, 2026

Summary

Add optional return_fields parameter to search_by_embedding, search_by_keywords_like, search_by_keywords_tfidf, and search_by_fulltext methods across all graph DB backends (neo4j, neo4j_community, polardb).

Closes #955

Problem

Search methods return only {"id": ..., "score": ...}. Callers who need additional fields such as memory, status, or tags must make a separate get_node() call per result, which causes N+1 query overhead.

Before (N+1 pattern):

results = graph_db.search_by_embedding(vector=query_vec, top_k=10)
# results = [{"id": "...", "score": 0.95}, ...]
# Need N extra queries:
mem_data = [graph_db.get_node(item["id"]) for item in results]  # 10 extra DB calls!

Solution

After (single query):

results = graph_db.search_by_embedding(
    vector=query_vec, top_k=10,
    return_fields=["memory", "status", "tags"]
)
# results = [{"id": "...", "score": 0.95, "memory": "...", "status": "activated", "tags": [...]}, ...]
# No extra get_node() calls needed!

  • Default return_fields=None preserves full backward compatibility
  • For Neo4j: modifies Cypher RETURN clause to include requested node.<field>
  • For PolarDB: adds properties column to SQL SELECT and extracts fields from JSON
  • Field name validation via _validate_return_fields() to prevent query injection
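The validation step in the last bullet can be sketched as follows. The PR does not show the body of _validate_return_fields(), so the regex, the function name validate_return_fields, and the choice to raise ValueError (rather than silently drop bad names) are all assumptions made for illustration.

```python
from __future__ import annotations

import re

# Assumed rule: plain identifiers only (letters, digits, underscore, no leading digit).
_FIELD_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_return_fields(return_fields: list[str] | None) -> list[str]:
    """Return field names that are safe to concatenate into a query string.

    Hypothetical stand-in for the PR's _validate_return_fields() helper.
    """
    if not return_fields:
        return []
    safe = []
    for name in return_fields:
        # fullmatch() rejects partial matches and trailing newlines.
        if not isinstance(name, str) or not _FIELD_NAME_RE.fullmatch(name):
            raise ValueError(f"Invalid return field name: {name!r}")
        safe.append(name)
    return safe
```

Because field names are spliced into the query text rather than passed as parameters, rejecting anything that is not a bare identifier is what keeps a value like "memory} DETACH DELETE node //" out of the generated Cypher or SQL.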

Changes

File                          Change
base.py                       Updated docstring + added _validate_return_fields() helper
neo4j.py                      Added return_fields to search_by_embedding
neo4j_community.py            Added return_fields to search_by_embedding + _fetch_return_fields helper
polardb.py                    Added return_fields to all 4 search methods + _extract_fields_from_properties helper
test_search_return_fields.py  19 regression tests (Neo4j query/results, PolarDB helper, field validation, community DB)
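For the PolarDB path, the idea behind _extract_fields_from_properties can be sketched as below. This is not the PR's code: the function name, the handling of string versus already-decoded properties, and the decision to skip fields missing from the JSON (instead of returning None for them) are assumptions.

```python
from __future__ import annotations

import json
from typing import Any


def extract_fields_from_properties(
    properties: Any, return_fields: list[str]
) -> dict[str, Any]:
    """Pick the requested keys out of a node's properties payload.

    Hypothetical sketch: `properties` may arrive as a JSON string or as a
    dict, depending on how the driver decodes the column.
    """
    if isinstance(properties, (str, bytes)):
        try:
            properties = json.loads(properties)
        except ValueError:  # covers json.JSONDecodeError and bad byte encodings
            return {}
    if not isinstance(properties, dict):
        return {}
    # Assumption: fields absent from the properties are simply omitted.
    return {name: properties[name] for name in return_fields if name in properties}


# Usage: merge the extracted fields into the base search result.
row_properties = '{"memory": "I love Python programming", "status": "activated", "embedding": [0.1]}'
result = {"id": "node-1", "score": 0.9}
result.update(extract_fields_from_properties(row_properties, ["memory", "status", "tags"]))
```

Selecting the properties column once per row and filtering in Python keeps the SQL unchanged in shape no matter which fields the caller asks for.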

Real Environment Verification

Tested against a live Neo4j 5 instance (Docker) with real vector search:

On main branch (before fix) — ❌ FAIL:

FAILED: search_by_embedding does NOT have 'return_fields' parameter
FAILED: Results only contain {'score', 'id'}. Need N+1 get_node() calls.

Cypher: RETURN node.id AS id, score
Result: {"id": "test-955-...", "score": 0.999}

On feat/search-return-fields-955 branch (after fix) — ✅ PASS:

✅ PASS: search_by_embedding has 'return_fields' parameter
✅ PASS: Without return_fields, results correctly contain only {'id', 'score'}
✅ PASS: return_fields works! Got memory='I love Python programming', status='activated', tags=['python', 'coding']
✅ PASS: N+1 problem eliminated

Cypher: RETURN node.id AS id, score, node.memory AS memory, node.status AS status, node.tags AS tags
Result: {"id": "test-955-...", "score": 0.999, "memory": "I love Python programming", "status": "activated", "tags": ["python", "coding"]}
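The two Cypher RETURN clauses shown in the verification output could be produced by a small builder like the one below. It is a sketch under assumptions, not the code in neo4j.py: the name build_return_clause, the inline identifier check, and the rule that skips "id" and "score" to avoid duplicate aliases are all hypothetical.

```python
from __future__ import annotations

import re

# Assumed identifier rule, standing in for the PR's _validate_return_fields().
_FIELD_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_return_clause(return_fields: list[str] | None = None) -> str:
    """Build the RETURN clause for a vector search over nodes bound to `node`."""
    clause = "RETURN node.id AS id, score"
    for name in return_fields or []:
        if not _FIELD_NAME_RE.fullmatch(name):
            raise ValueError(f"Invalid return field name: {name!r}")
        if name in ("id", "score"):
            continue  # already part of the base clause (assumed behavior)
        clause += f", node.{name} AS {name}"
    return clause


print(build_return_clause())
print(build_return_clause(["memory", "status", "tags"]))
```

With return_fields=None the builder emits the same clause as the main branch, which matches the backward-compatibility claim in the description.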

Test Results

tests/graph_dbs/test_search_return_fields.py  19 passed

lailoo force-pushed the feat/search-return-fields-955 branch 3 times, most recently from e370e03 to b7d1d84 on February 13, 2026 18:46
Add optional return_fields parameter to search_by_embedding,
search_by_keywords_like, search_by_keywords_tfidf, and search_by_fulltext
methods across all graph DB backends (neo4j, neo4j_community, polardb).

When return_fields is specified (e.g., ['memory', 'status', 'tags']),
the requested fields are included in each result dict alongside 'id'
and 'score', eliminating the need for N+1 get_node() calls.

Default is None, preserving full backward compatibility.

Changes:
- base.py: Updated docstring for search_by_embedding
- neo4j.py: Added return_fields to search_by_embedding, modified
  Cypher RETURN clause and record construction
- neo4j_community.py: Added return_fields to search_by_embedding,
  added _fetch_return_fields helper for direct vec_db path
- polardb.py: Added return_fields to all 4 search methods, added
  _extract_fields_from_properties helper for JSON property extraction

Closes MemTensor#955

fix: add field name validation to prevent query injection in return_fields

- Add _validate_return_fields() to BaseGraphDB base class with regex validation
- Apply validation in neo4j.py, neo4j_community.py, polardb.py before field name concatenation
- Add return_fields parameter to base class abstract method signature
- Revert unrelated .get(node_id) change back to .get(node_id, None)
- Add TestFieldNameValidation and TestNeo4jCommunitySearchReturnFields test classes (7 new tests)

fix: resolve ruff lint and format issues for CI compliance
lailoo force-pushed the feat/search-return-fields-955 branch from e34122f to baadff4 on February 13, 2026 18:54
Development

Successfully merging this pull request may close these issues.

feat: The database search methods should support specifying fields to be returned.