Description
When configuring Ollama via Litellm, the Litellm docs recommend using ollama_chat "for better responses". This provider uses Ollama's /api/chat endpoint.
However, in ART's RULER integration, the judge model needs to produce JSON-structured outputs (e.g. matching a Pydantic schema / tool schema) so that RULER can parse correctness, reasoning, etc.
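To illustrate the failure mode, here is a minimal, hypothetical sketch of the kind of schema-validated parsing RULER performs on judge output (the field names below are placeholders for illustration, not ART's actual schema):

```python
# Hypothetical illustration only: RULER's real Pydantic schema differs.
from pydantic import BaseModel, ValidationError

class JudgeScore(BaseModel):
    trajectory_id: str
    score: float
    reasoning: str

# A structured reply (what the judge endpoint must return for parsing to work):
structured = '{"trajectory_id": "t1", "score": 0.8, "reasoning": "solved the task"}'
parsed = JudgeScore.model_validate_json(structured)
print(parsed.score)  # 0.8

# A free-form chat reply (what /api/chat tends to return without JSON mode):
free_form = "Sure! I'd give this trajectory a score of 0.8 because..."
try:
    JudgeScore.model_validate_json(free_form)
except ValidationError:
    print("schema validation failed")  # the kind of error the judge call hits
```

When the judge replies in prose instead of JSON, validation raises, and the RULER judge call fails.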
The problem:
Because /api/chat does not produce those structured outputs, I'm forced to fall back to the older /api/generate endpoint (ollama), even though:
- Litellm explicitly recommends ollama_chat over ollama.
- /api/chat is the more modern endpoint.
What I expect
Either:
- Official support / documentation for using Ollama's /api/chat (ollama_chat) as a RULER judge with JSON-schema / tool-call style responses; or
- Clear guidance that, for RULER's JSON-schema needs, we must currently use /api/generate and ollama.
What actually happens
Using ollama_chat:
- The /api/chat endpoint does not produce the JSON / tool-call schema RULER expects.
- Judge calls fail schema validation.
Using ollama:
- RULER works, but we lose the newer /api/chat behavior that Litellm recommends.
Request
Please:
- Document the supported Ollama configuration for RULER (which engine, which endpoint, any special settings).
- If possible, add direct support for ollama_chat + /api/chat that ensures JSON-mode / tool-call style output works with RULER's structured response expectations.
- Or provide example configs + templates that make Ollama's /api/chat usable with RULER.
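For concreteness, a plain-Python sketch of the config contrast being requested; the model tag "llama3" and the dict layout are placeholders, not ART's or Litellm's actual config format:

```python
# Placeholder config sketch contrasting the two Litellm provider prefixes
# for the same local Ollama model ("llama3" is an arbitrary example tag).
judge_configs = {
    # ollama/... -> /api/generate: the fallback that currently works with RULER
    "working_fallback": {"model": "ollama/llama3"},
    # ollama_chat/... -> /api/chat: the provider Litellm recommends, which
    # currently fails RULER's schema validation
    "recommended_but_broken": {"model": "ollama_chat/llama3"},
}

for name, cfg in judge_configs.items():
    print(name, "->", cfg["model"])
```

A documented, working equivalent of the "recommended_but_broken" entry is essentially what this issue is asking for.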