LCORE-86: Prioritize BYOK content over built-in content #1208

Draft
are-ces wants to merge 1 commit into lightspeed-core:main from are-ces:chunk-prio

Conversation

@are-ces (Contributor) commented Feb 24, 2026

TL;DR

  • Dual RAG Strategies: Added configurable "Always RAG" (pre-query context injection) and "Tool RAG" (agentic RAG via tool calls). The two can be used independently or together.
  • Multi-Source Support: Query BYOK and Solr OKP simultaneously and merge results with BYOK prioritized first.
  • Chunk Prioritization: Implemented score multipliers per vector store for Always RAG BYOK to weight results from different sources.
  • Configuration Updates: Tool RAG defaults to enabled=True for backward compatibility. Both strategies independently configurable in lightspeed-stack.yaml.
  • Enrichment Script Improvements: Updated to automatically configure Solr in llama-stack and fix bugs in vector store configuration building.
  • Max-Chunk Knobs: Added constants defining max_chunks for Tool RAG, BYOK RAG (Always RAG), and Solr RAG (Always RAG).

Motivation

To prioritize BYOK content, a mechanism to tune chunk scoring per vector store is necessary. The alternative was to create a client-side tool to change the behavior of the RAG tool, but that is not optimal for two reasons:

  1. RAG as a tool alone is not the best strategy: it relies on the LLM deciding when to call RAG, but in practice most use cases require RAG on every query (e.g. in OLS).
  2. In Llama Stack 0.2.x the Agent could use a client-side function directly as a tool. In 0.3.x+ the Responses API returns a function tool call that must be intercepted by LCORE to then perform RAG, which increases complexity significantly.

Always RAG with LCORE-side chunk prioritization avoids both issues.

Description

Adds configurable RAG strategies and chunk prioritization. Control which documentation sources to search, how to search them, and which sources are more important.

How It Works

Always RAG retrieves chunks from configured sources (BYOK and/or Solr) and injects them into the query before sending to the LLM. The LLM always has documentation context without calling tools.
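The injection step can be sketched as a small pure function (an illustrative model only — the prompt format and field names here are assumptions, not the actual LCORE implementation):

```python
def inject_context(query: str, chunks: list[dict]) -> str:
    """Prepend retrieved chunks to the user query so the LLM always
    sees documentation context, without any tool call (illustrative)."""
    if not chunks:
        return query
    # Label each chunk with its source so the LLM can attribute answers.
    context = "\n\n".join(
        f"[{c.get('source', 'unknown')}] {c['content']}" for c in chunks
    )
    return (
        "Use the following documentation to answer.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```
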

Tool RAG lets the AI call the file_search tool during generation. This is the original behavior, enabled by default for backward compatibility.

Note: Both can be enabled simultaneously, but Tool RAG is non-deterministic - the LLM decides when to invoke file_search, so behavior varies between requests, and the combination with Always RAG has not been thoroughly tested.

Chunk Prioritization applies a score_multiplier per BYOK source. All sources are queried in parallel, chunk scores are multiplied by their source's weight, then merged and sorted. Top N chunks are selected across all sources. Solr chunks are appended after BYOK chunks without cross-source ranking (TBD, needs discussion / spike).
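The weight-and-merge step described above can be sketched as follows (a simplified synchronous model; the real code queries vector stores in parallel, and the `Chunk` shape and function name here are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    content: str
    source: str
    score: float


def merge_byok_chunks(
    results_by_source: dict[str, list[Chunk]],
    multipliers: dict[str, float],
    max_chunks: int = 10,  # mirrors BYOK_RAG_MAX_CHUNKS
) -> list[Chunk]:
    """Multiply each chunk's score by its source's score_multiplier,
    then merge all sources, sort by weighted score, and keep the top N."""
    weighted: list[Chunk] = []
    for source, chunks in results_by_source.items():
        mult = multipliers.get(source, 1.0)  # unlisted sources get baseline weight
        for c in chunks:
            weighted.append(Chunk(c.content, c.source, c.score * mult))
    weighted.sort(key=lambda c: c.score, reverse=True)
    return weighted[:max_chunks]
```

Per the description above, Solr chunks would then simply be appended after this merged BYOK list, without cross-source ranking.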

Configuration

In lightspeed-stack.yaml:

rag:
  always:
    solr:
      enabled: true     # Enable/disable Solr pre-query search
      offline: true     # true = Mimir URLs, false = public URLs
    byok:
      enabled: true     # Enable/disable BYOK pre-query search
  tool:
    byok: # for clarity, RAG as a tool is a llama-stack tool and does not support Solr
      enabled: true     # Enable/disable tool-based RAG (default: true)

byok_rag:
  - rag_id: source-a
    score_multiplier: 1.0    # Baseline
  - rag_id: source-b
    score_multiplier: 1.5    # 50% boost
    # ... other config (embedding_model, vector_db_id, db_path)

Chunk limits in src/constants.py:

BYOK_RAG_MAX_CHUNKS = 10   # Total chunks from all BYOK sources
SOLR_RAG_MAX_CHUNKS = 5    # Total chunks from Solr
TOOL_RAG_MAX_CHUNKS = 10   # Total chunks from RAG as a tool

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue # LCORE-86
  • Closes # LCORE-86

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Prerequisites: BYOK vector stores (FAISS) created with rag-content tool, OKP Solr instance, Llama Stack 0.4.3+, Lightspeed Stack Providers installed, OpenAI API key.

  1. Configure chunk limits in src/constants.py:

    BYOK_RAG_MAX_CHUNKS = 5    # Total chunks from all BYOK sources
    SOLR_RAG_MAX_CHUNKS = 5    # Total chunks from Solr
  2. Configure Lightspeed Stack - add BYOK RAG sources and RAG strategy to lightspeed-stack.yaml:

    # BYOK (Bring Your Own Knowledge) RAG configuration
    byok_rag:
      - rag_id: openshift-docs-part1
        rag_type: inline::faiss
        embedding_model: sentence-transformers/<path-to-embeddings-model>
        embedding_dimension: 768
        vector_db_id: vs_e4270ee9-0834-422b-83f3-3dca19e2454e
        db_path: <path-to-faiss-store>/os-start-ac/faiss_store.db
        score_multiplier: 1.0
      - rag_id: openshift-docs-part2
        rag_type: inline::faiss
        embedding_model: sentence-transformers/<path-to-embeddings-model>
        embedding_dimension: 768
        vector_db_id: vs_61f66acb-c014-4ffa-b4fb-228b49219ca9
        db_path: <path-to-faiss-store>/os-end-mc/faiss_store.db
        score_multiplier: 1.2
      - rag_id: graham-docs
        rag_type: inline::faiss
        embedding_model: sentence-transformers/all-mpnet-base-v2
        embedding_dimension: 768
        vector_db_id: vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2
        db_path: <path-to-faiss-store>/kv_store.db
        score_multiplier: 1.5
    
    # RAG configuration
    rag:
      always:
        solr:
          enabled: true
          offline: false
        byok:
          enabled: true
      tool:
        byok:
          enabled: false
  3. Run enrichment script - reads lightspeed-stack config and generates an enriched llama-stack run.yaml with BYOK vector stores and Solr provider registered:

    uv run python src/llama_stack_configuration.py \
      --input <path-to-llama-stack>/run_rag_minimal.yaml \
      --output <path-to-llama-stack>/run_enriched.yaml \
      --config lightspeed-stack.yaml
  4. Install Lightspeed Stack Providers:

    uv add git+https://github.com/lightspeed-core/lightspeed-providers.git
  5. Start Llama Stack:

    export EXTERNAL_PROVIDERS_DIR=<path-to-lightspeed-stack>/providers.d
    uv run llama stack run <path-to-llama-stack>/run_enriched.yaml
  6. Start Lightspeed Stack:

    uv run src/lightspeed_stack.py -c lightspeed-stack.yaml
  7. Test query — send to /v1/query or streaming endpoint:

    curl -X POST http://localhost:8080/v1/query \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini",
        "provider": "openai",
        "query": "What is the Admission plugin?",
        "system_prompt": "You are a helpful assistant"
      }'

Response:

{
  "conversation_id": "...",
  "response": "Admission plugins in Red Hat OpenShift Container Platform are [...]",
  "rag_chunks": [
    {
      "content": "...",
      "source": "openshift-docs-part1",
      "score": 1.0038,
      "attributes": {
        "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt",
        "title": "Admission plugins",
        "document_id": "file-7a70ef22c4a646f2a6f657c66961ba2c"
      }
    },
    {
      "content": "...",
      "source": "openshift-docs-part2",
      "score": 0.926,
      "attributes": {
        "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt",
        "title": "Overview of dynamic plugins",
        "document_id": "file-b266f575a95a4da19d7ba058fd980f00"
      }
    },
    {
      "content": "...",
      "source": "OKP Solr",
      "score": 63.996,
      "attributes": {
        "document_id": "/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index"
      }
    }
  ],
  "referenced_documents": [
    {
      "doc_title": "Admission plugins",
      "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt",
      "source": "openshift-docs-part1"
    },
    {
      "doc_title": "Overview of dynamic plugins",
      "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt",
      "source": "openshift-docs-part2"
    },
    {
      "doc_title": null,
      "doc_url": "https://mimir.corp.redhat.com/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index",
      "source": "OKP Solr"
    }
  ],
  "truncated": false,
  "input_tokens": 3736,
  "output_tokens": 448
}

- Add configurable RAG strategies: Always RAG (performed on each query against OKP Solr and/or BYOK) and Tool RAG; the two can be used independently or together
- Add chunk prioritization with score multipliers per vector store for Always RAG
- Add config knobs to select the RAG strategy
- Default Tool RAG to enabled=True for backward compatibility
- Update the lightspeed-stack configuration enrichment script to build the Solr section in llama-stack and fix bugs in building the vector stores
@are-ces are-ces marked this pull request as draft February 24, 2026 12:41
@coderabbitai
coderabbitai bot commented Feb 24, 2026

Review skipped: draft detected.

@are-ces (Contributor, Author) commented Feb 24, 2026

Chunks from Solr did not include title metadata, so I get null values for Solr chunk titles. @Anxhela21 you might know why; I might have missed something in the Solr configuration.

Printout of the Solr chunk structure returned by client.vector_io.query:

Chunk(
    chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
    content  = '# Chapter 9. Admission plugins\n\n\n\n\nAdmission plugins are used to help regulate how [..]',
    chunk_metadata = ChunkChunkMetadata(
        chunk_embedding_dimension = None,
        chunk_embedding_model     = None,
        chunk_id                  = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
        chunk_tokenizer           = None,
        chunk_window              = None,
        content_token_count       = None,
        created_timestamp         = None,
        document_id               = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index',
        metadata_token_count      = None,
        source                    = None,
        updated_timestamp         = None,
    ),
    embedding           = [],
    metadata            = {},
    embedding_model     = 'sentence-transformers/ibm-granite/granite-embedding-30m-english',
    embedding_dimension = 384,
)
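Until the Solr pipeline populates title metadata, a defensive fallback on the LCORE side could derive a display title from `document_id` (purely illustrative; the attribute names follow the printout above, and the helper name is hypothetical):

```python
def doc_title_or_fallback(attributes: dict) -> str:
    """Return the chunk's title, falling back to the last meaningful
    path segment of document_id when title metadata is missing."""
    title = attributes.get("title")
    if title:
        return title
    doc_id = attributes.get("document_id", "")
    # e.g. ".../html-single/architecture/index" -> "architecture"
    parts = [p for p in doc_id.strip("/").split("/") if p and p != "index"]
    return parts[-1] if parts else "Untitled"
```

This would at least avoid null `doc_title` values in `referenced_documents` while the root cause in the Solr configuration is investigated.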

@are-ces are-ces requested review from Anxhela21, asimurka and tisnik and removed request for Anxhela21 February 24, 2026 13:04