
Bug: Multi-byte UTF-8 characters corrupted in streaming responses causing LLM regeneration #3068

@PlaneInABottle

Description


When streaming responses from LLMs contain multi-byte UTF-8 characters (Turkish, emoji, CJK), they can be corrupted if a chunk boundary falls inside a character's multi-byte sequence. This causes the LLM to detect malformed JSON in structured output and restart generation, resulting in double-generated content within a single response.

Symptoms

  1. Malformed JSON in agent responses: Response contains nested/duplicated JSON structures
  2. LLM double-generation: The LLM starts generating, gets cut off mid-character, then restarts and generates again
  3. Affects specific languages: Turkish characters (ü, ö, ş, ç, ğ), emoji (🚀, 🎯), CJK (中, 日)
  4. Memory block errors: Downstream blocks fail with "field doesn't exist" because the response wasn't properly formatted

Example of corrupted response:

```json
{
  "model": "google/gemini-3-flash-preview",
  "content": "{\"response\": \"First part of response... Ö{\"response\": \"Complete response...\"}",
  "toolCalls": { "list": [] }
}
```

The first "response" field is cut off at a multi-byte character boundary, then the JSON restarts.

Root Cause

When HTTP streaming splits at byte boundaries within multi-byte UTF-8 sequences:

  • Example: Turkish "Ö" is 2 bytes in UTF-8 (0xC3 0x96)
  • If split between chunks: Chunk 1 gets 0xC3, Chunk 2 gets 0x96
  • TextDecoder.decode(chunk) without { stream: true } doesn't maintain state
  • Each chunk, decoded in isolation, yields a replacement character (U+FFFD) in place of the split character
  • Result: the JSON becomes malformed → the LLM detects the error → restarts generation (reproduced in the sketch below)
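A minimal, standalone TypeScript repro of the decoding failure (not Sim's actual code, just the two-byte "Ö" case from above):

```ts
// "Ö" encodes to two bytes (0xC3 0x96). Splitting them across chunks and
// decoding each chunk with a fresh, stateless decode() call yields U+FFFD
// replacement characters instead of "Ö".
const bytes = new TextEncoder().encode('Ö') // Uint8Array [0xC3, 0x96]
const chunk1 = bytes.slice(0, 1) // 0xC3 — incomplete lead byte
const chunk2 = bytes.slice(1) // 0x96 — stray continuation byte

const broken =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2)
console.log(broken) // "��" (U+FFFD U+FFFD), not "Ö"
```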

Impact

This affects:

  • All workflows using agents with structured output (responseFormat)
  • Any language with multi-byte UTF-8 characters
  • Streaming responses from any LLM provider (OpenAI, Anthropic, Google, OpenRouter, etc.)

Example Scenario

A support bot configured with structured output (strict JSON schema) receives a query in a language with multi-byte characters. The agent's response gets split during streaming:

  1. First chunk arrives with incomplete byte sequence of a multi-byte character
  2. TextDecoder without { stream: true } emits a replacement character (see the buggy read-loop sketch after this list)
  3. JSON becomes invalid
  4. LLM detects malformed JSON (syntax error)
  5. LLM regenerates response within same call
  6. Both partial and regenerated responses concatenated = corrupt output
  7. Downstream memory/processing blocks fail with "field doesn't exist" errors
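A hypothetical sketch of the failure pattern in a typical streaming read loop (readStreamBroken is an illustrative name, not Sim's actual implementation):

```ts
async function readStreamBroken(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader()
  let text = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // BUG: a fresh, stateless decode per chunk. Any multi-byte character
    // split across chunks becomes U+FFFD replacement characters.
    text += new TextDecoder().decode(value)
  }
  return text
}
```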

Workaround

Use TextDecoder.decode(value, { stream: true }) with a single TextDecoder instance created once, outside the streaming loop. This lets the decoder maintain internal state and buffer incomplete multi-byte sequences until the next chunk arrives.
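A minimal sketch of the workaround, assuming a standard ReadableStream consumer (readStreamFixed is an illustrative name):

```ts
async function readStreamFixed(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader()
  // One decoder for the whole stream, so it can buffer partial sequences.
  const decoder = new TextDecoder('utf-8')
  let text = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // { stream: true } keeps incomplete multi-byte sequences buffered
    // until the remaining bytes arrive in a later chunk.
    text += decoder.decode(value, { stream: true })
  }
  // Final call without { stream: true } flushes any buffered bytes
  // (emitting U+FFFD only if the stream genuinely ended mid-character).
  text += decoder.decode()
  return text
}
```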
