LCORE-86: Prioritize BYOK content over built-in content #1208

Draft
are-ces wants to merge 1 commit into lightspeed-core:main from are-ces:chunk-prio

Conversation

@are-ces (Contributor) commented Feb 24, 2026

TL;DR

  • Dual RAG Strategies: Added configurable "Always RAG" (pre-query context injection) and "Tool RAG" (agentic RAG via tool calls). The two can be used independently or together.
  • Multi-Source Support: Query BYOK and Solr OKP simultaneously and merge results with BYOK prioritized first.
  • Chunk Prioritization: Implemented score multipliers per vector store for Always RAG BYOK to weight results from different sources.
  • Configuration Updates: Tool RAG defaults to enabled=True for backward compatibility. Both strategies independently configurable in lightspeed-stack.yaml.
  • Enrichment Script Improvements: Updated to automatically configure Solr in llama-stack and fix bugs in vector store configuration building.
  • Max-Chunk Knobs: Added constants defining max_chunks for Tool RAG, BYOK RAG (Always RAG), and Solr RAG (Always RAG).

Motivation

To prioritize BYOK content, a mechanism to tune chunk scoring per vector store is necessary. The alternative was to create a client-side tool to change the behavior of the RAG tool, but that is not optimal for two reasons:

  1. RAG as a tool alone is not the best strategy: it relies on the LLM deciding when to call RAG, but in practice most use cases require RAG on every query (e.g. in OLS).
  2. In Llama Stack 0.2.x the Agent could use a client-side function directly as a tool. In 0.3.x+ the Responses API returns a function tool call that must be intercepted by LCORE to then perform RAG, which increases complexity significantly.

Always RAG with LCORE-side chunk prioritization avoids both issues.

Description

Adds configurable RAG strategies and chunk prioritization. Control which documentation sources to search, how to search them, and which sources are more important.

How It Works

Always RAG retrieves chunks from configured sources (BYOK and/or Solr) and injects them into the query before sending to the LLM. The LLM always has documentation context without calling tools.
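The injection step can be sketched as a small pure function (an illustrative model only — the prompt format and field names here are assumptions, not the actual LCORE implementation):

```python
def inject_context(query: str, chunks: list[dict]) -> str:
    """Prepend retrieved chunks to the user query so the LLM always
    sees documentation context, without any tool call (illustrative)."""
    if not chunks:
        return query
    # Label each chunk with its source so the LLM can attribute answers.
    context = "\n\n".join(
        f"[{c.get('source', 'unknown')}] {c['content']}" for c in chunks
    )
    return (
        "Use the following documentation to answer.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```
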

Tool RAG lets the AI call the file_search tool during generation. This is the original behavior, enabled by default for backward compatibility.

Note: Both can be enabled simultaneously, but Tool RAG is non-deterministic - the LLM decides when to invoke file_search, so behavior varies between requests, and the combination with Always RAG has not been thoroughly tested.

Chunk Prioritization applies a score_multiplier per BYOK source. All sources are queried in parallel, chunk scores are multiplied by their source's weight, then merged and sorted. Top N chunks are selected across all sources. Solr chunks are appended after BYOK chunks without cross-source ranking (TBD, needs discussion / spike).
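The weight-and-merge step described above can be sketched as follows (a simplified synchronous model; the real code queries vector stores in parallel, and the `Chunk` shape and function name here are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    content: str
    source: str
    score: float


def merge_byok_chunks(
    results_by_source: dict[str, list[Chunk]],
    multipliers: dict[str, float],
    max_chunks: int = 10,  # mirrors BYOK_RAG_MAX_CHUNKS
) -> list[Chunk]:
    """Multiply each chunk's score by its source's score_multiplier,
    then merge all sources, sort by weighted score, and keep the top N."""
    weighted: list[Chunk] = []
    for source, chunks in results_by_source.items():
        mult = multipliers.get(source, 1.0)  # unlisted sources get baseline weight
        for c in chunks:
            weighted.append(Chunk(c.content, c.source, c.score * mult))
    weighted.sort(key=lambda c: c.score, reverse=True)
    return weighted[:max_chunks]
```

Per the description above, Solr chunks would then simply be appended after this merged BYOK list, without cross-source ranking.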

Configuration

In lightspeed-stack.yaml:

rag:
  always:
    solr:
      enabled: true     # Enable/disable Solr pre-query search
      offline: true     # true = Mimir URLs, false = public URLs
    byok:
      enabled: true     # Enable/disable BYOK pre-query search
  tool:
    byok: # for clarity, RAG as a tool is a llama-stack tool and does not support Solr
      enabled: true     # Enable/disable tool-based RAG (default: true)

byok_rag:
  - rag_id: source-a
    score_multiplier: 1.0    # Baseline
  - rag_id: source-b
    score_multiplier: 1.5    # 50% boost
    # ... other config (embedding_model, vector_db_id, db_path)

Chunk limits in src/constants.py:

BYOK_RAG_MAX_CHUNKS = 10   # Total chunks from all BYOK sources
SOLR_RAG_MAX_CHUNKS = 5    # Total chunks from Solr
TOOL_RAG_MAX_CHUNKS = 10   # Total chunks from RAG as a tool

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue # LCORE-86
  • Closes # LCORE-86

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Prerequisites: BYOK vector stores (FAISS) created with rag-content tool, OKP Solr instance, Llama Stack 0.4.3+, Lightspeed Stack Providers installed, OpenAI API key.

  1. Configure chunk limits in src/constants.py:

    BYOK_RAG_MAX_CHUNKS = 5    # Total chunks from all BYOK sources
    SOLR_RAG_MAX_CHUNKS = 5    # Total chunks from Solr
  2. Configure Lightspeed Stack - add BYOK RAG sources and RAG strategy to lightspeed-stack.yaml:

    # BYOK (Bring Your Own Knowledge) RAG configuration
    byok_rag:
      - rag_id: openshift-docs-part1
        rag_type: inline::faiss
        embedding_model: sentence-transformers/<path-to-embeddings-model>
        embedding_dimension: 768
        vector_db_id: vs_e4270ee9-0834-422b-83f3-3dca19e2454e
        db_path: <path-to-faiss-store>/os-start-ac/faiss_store.db
        score_multiplier: 1.0
      - rag_id: openshift-docs-part2
        rag_type: inline::faiss
        embedding_model: sentence-transformers/<path-to-embeddings-model>
        embedding_dimension: 768
        vector_db_id: vs_61f66acb-c014-4ffa-b4fb-228b49219ca9
        db_path: <path-to-faiss-store>/os-end-mc/faiss_store.db
        score_multiplier: 1.2
      - rag_id: graham-docs
        rag_type: inline::faiss
        embedding_model: sentence-transformers/all-mpnet-base-v2
        embedding_dimension: 768
        vector_db_id: vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2
        db_path: <path-to-faiss-store>/kv_store.db
        score_multiplier: 1.5
    
    # RAG configuration
    rag:
      always:
        solr:
          enabled: true
          offline: false
        byok:
          enabled: true
      tool:
        byok:
          enabled: false
  3. Run enrichment script - reads lightspeed-stack config and generates an enriched llama-stack run.yaml with BYOK vector stores and Solr provider registered:

    uv run python src/llama_stack_configuration.py \
      --input <path-to-llama-stack>/run_rag_minimal.yaml \
      --output <path-to-llama-stack>/run_enriched.yaml \
      --config lightspeed-stack.yaml
  4. Install Lightspeed Stack Providers:

    uv add git+https://github.com/lightspeed-core/lightspeed-providers.git
  5. Start Llama Stack:

    export EXTERNAL_PROVIDERS_DIR=<path-to-lightspeed-stack>/providers.d
    uv run llama stack run <path-to-llama-stack>/run_enriched.yaml
  6. Start Lightspeed Stack:

    uv run src/lightspeed_stack.py -c lightspeed-stack.yaml
  7. Test query — send to /v1/query or streaming endpoint:

    curl -X POST http://localhost:8080/v1/query \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini",
        "provider": "openai",
        "query": "What is the Admission plugin?",
        "system_prompt": "You are a helpful assistant"
      }'

Response:

{
  "conversation_id": "...",
  "response": "Admission plugins in Red Hat OpenShift Container Platform are [...]",
  "rag_chunks": [
    {
      "content": "...",
      "source": "openshift-docs-part1",
      "score": 1.0038,
      "attributes": {
        "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt",
        "title": "Admission plugins",
        "document_id": "file-7a70ef22c4a646f2a6f657c66961ba2c"
      }
    },
    {
      "content": "...",
      "source": "openshift-docs-part2",
      "score": 0.926,
      "attributes": {
        "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt",
        "title": "Overview of dynamic plugins",
        "document_id": "file-b266f575a95a4da19d7ba058fd980f00"
      }
    },
    {
      "content": "...",
      "source": "OKP Solr",
      "score": 63.996,
      "attributes": {
        "document_id": "/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index"
      }
    }
  ],
  "referenced_documents": [
    {
      "doc_title": "Admission plugins",
      "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt",
      "source": "openshift-docs-part1"
    },
    {
      "doc_title": "Overview of dynamic plugins",
      "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt",
      "source": "openshift-docs-part2"
    },
    {
      "doc_title": null,
      "doc_url": "https://mimir.corp.redhat.com/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index",
      "source": "OKP Solr"
    }
  ],
  "truncated": false,
  "input_tokens": 3736,
  "output_tokens": 448
}

- Add configurable RAG strategies: Always RAG (performed on each query against OKP Solr and/or BYOK) and Tool RAG; the two can be used independently or together
- Add chunk prioritization with score multipliers per vector store for Always RAG
- Add config knobs to select the RAG strategy
- Default Tool RAG to enabled=True for backward compatibility
- Update the lightspeed-stack configuration enrichment script to build the Solr section in llama-stack and fix bugs in building the vector stores
@are-ces are-ces marked this pull request as draft February 24, 2026 12:41
@coderabbitai
coderabbitai bot commented Feb 24, 2026

Review skipped: draft detected.

@are-ces (Contributor, Author) commented Feb 24, 2026

Chunks from Solr did not include title metadata, so I get null values for Solr chunk titles. @Anxhela21 you might know why; I might have missed something in the Solr configuration.

Printout of the Solr chunk structure returned by client.vector_io.query:

Chunk(
    chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
    content  = '# Chapter 9. Admission plugins\n\n\n\n\nAdmission plugins are used to help regulate how [..]',
    chunk_metadata = ChunkChunkMetadata(
        chunk_embedding_dimension = None,
        chunk_embedding_model     = None,
        chunk_id                  = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
        chunk_tokenizer           = None,
        chunk_window              = None,
        content_token_count       = None,
        created_timestamp         = None,
        document_id               = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index',
        metadata_token_count      = None,
        source                    = None,
        updated_timestamp         = None,
    ),
    embedding           = [],
    metadata            = {},
    embedding_model     = 'sentence-transformers/ibm-granite/granite-embedding-30m-english',
    embedding_dimension = 384,
)
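Until the Solr pipeline populates title metadata, a defensive fallback on the LCORE side could derive a display title from `document_id` (purely illustrative; the attribute names follow the printout above, and the helper name is hypothetical):

```python
def doc_title_or_fallback(attributes: dict) -> str:
    """Return the chunk's title, falling back to the last meaningful
    path segment of document_id when title metadata is missing."""
    title = attributes.get("title")
    if title:
        return title
    doc_id = attributes.get("document_id", "")
    # e.g. ".../html-single/architecture/index" -> "architecture"
    parts = [p for p in doc_id.strip("/").split("/") if p and p != "index"]
    return parts[-1] if parts else "Untitled"
```

This would at least avoid null `doc_title` values in `referenced_documents` while the root cause in the Solr configuration is investigated.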

@are-ces are-ces requested review from Anxhela21, asimurka and tisnik and removed request for Anxhela21 February 24, 2026 13:04