## Summary
- Context: `OpenAiRequestFactory` is responsible for building request payloads and managing prompt truncation for OpenAI-compatible providers.
- Bug: The `truncatePromptForCompletion` method calculates the token limit based on the most restrictive limit of the two configured models (`openaiModel` and `githubModelsChatModel`), regardless of which one will actually be used for the request.
- Actual vs. expected: The prompt is truncated to 7,000 tokens if either model is from the GPT-5 family, even if the primary provider supports 100,000 tokens. Expected behavior is to truncate based on the specific model used for the request.
- Impact: Users of high-context models (like `gpt-4o`) experience severe and unnecessary context truncation (losing up to 93,000 tokens of context) because the default configuration for the unused provider includes a GPT-5 model.
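The cross-model aggregation can be demonstrated with a minimal, self-contained sketch. The constants and the bodies of `canonicalModelName` and `isGpt5Family` below are hypothetical simplifications (the issue does not show their real implementations); only the aggregation logic mirrors the bug:

```java
public class TruncationBugSketch {
    // Assumed values: the issue cites a 7,000-token GPT-5 input limit
    // and a 100,000-token high-context limit
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;

    // Hypothetical: strip a provider prefix, e.g. "openai/gpt-5" -> "gpt-5"
    static String canonicalModelName(String modelId) {
        int slash = modelId.lastIndexOf('/');
        return slash >= 0 ? modelId.substring(slash + 1) : modelId;
    }

    // Hypothetical family check
    static boolean isGpt5Family(String modelId) {
        return canonicalModelName(modelId).startsWith("gpt-5");
    }

    // Mirrors the buggy logic: the limit depends on BOTH configured models,
    // not on the one actually used for the request
    static int buggyTokenLimit(String openaiModelId, String githubModelId) {
        boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
        boolean reasoningModel = gpt5Family
                || canonicalModelName(openaiModelId).startsWith("o")
                || canonicalModelName(githubModelId).startsWith("o");
        return reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }

    public static void main(String[] args) {
        // gpt-4o handles the request, yet the default GitHub Models entry
        // (openai/gpt-5) drags the limit down to 7,000 tokens
        System.out.println(buggyTokenLimit("gpt-4o", "openai/gpt-5")); // prints 7000
    }
}
```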
## Code with bug

```java
public String truncatePromptForCompletion(String prompt) {
    if (prompt == null || prompt.isEmpty()) {
        return prompt;
    }
    String openaiModelId = normalizedModelId(false);
    String githubModelId = normalizedModelId(true);
    // BUG 🔴 Aggregates the family check across BOTH configured models
    boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
    boolean reasoningModel = gpt5Family
            || canonicalModelName(openaiModelId).startsWith("o")
            || canonicalModelName(githubModelId).startsWith("o");
    // BUG 🔴 Uses the most restrictive limit if ANY reasoning/GPT-5 model is configured
    int tokenLimit = reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);
    if (truncatedPrompt.length() < prompt.length()) {
        // BUG 🔴 May show the GPT-5 notice even when a non-GPT-5 model is used
        String truncationNotice = gpt5Family ? TRUNCATION_NOTICE_GPT5 : TRUNCATION_NOTICE_GENERIC;
        return truncationNotice + truncatedPrompt;
    }
    return prompt;
}
```

## Evidence
- Reproduction test: A test case was created where `OPENAI_MODEL` was set to `gpt-4o` (high context) and `GITHUB_MODELS_CHAT_MODEL` was left at its default (`openai/gpt-5`, low context). Despite using `gpt-4o` for the completion, the prompt was truncated to 7,000 tokens and prepended with a GPT-5 truncation notice.
- Default values: `DEFAULT_GITHUB_MODELS_MODEL` is `openai/gpt-5`. This means that, by default, `gpt5Family` is always `true` in `truncatePromptForCompletion`, forcing a 7,000-token limit on all completion requests across the entire application unless both providers are explicitly reconfigured.
- Contrast with streaming: The `prepareStreamingRequest` method correctly derives `tokenLimit` from the provided `ApiProvider`, ensuring that truncation is scoped to the model actually being used.
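The truncation mechanics referenced above (`chunker.keepLastTokens` plus a prepended notice) can be sketched as follows. This stand-in treats whitespace-delimited words as tokens and uses an assumed notice string; the real chunker's tokenizer and the actual notice constants are not shown in the issue:

```java
import java.util.Arrays;

public class ChunkerSketch {
    // Assumed notice text; the project's TRUNCATION_NOTICE_* constants are not shown
    static final String TRUNCATION_NOTICE_GENERIC = "[Context truncated]\n";

    // Hypothetical stand-in for chunker.keepLastTokens:
    // whitespace-delimited words approximate real tokenizer tokens
    static String keepLastTokens(String prompt, int tokenLimit) {
        String[] tokens = prompt.trim().split("\\s+");
        if (tokens.length <= tokenLimit) {
            return prompt;
        }
        return String.join(" ",
                Arrays.copyOfRange(tokens, tokens.length - tokenLimit, tokens.length));
    }

    // Mirrors the notice-prepending branch of truncatePromptForCompletion
    static String truncateWithNotice(String prompt, int tokenLimit) {
        String truncated = keepLastTokens(prompt, tokenLimit);
        return truncated.length() < prompt.length()
                ? TRUNCATION_NOTICE_GENERIC + truncated
                : prompt;
    }

    public static void main(String[] args) {
        // Keeps only the LAST tokenLimit tokens, so the start of the prompt is lost
        System.out.println(truncateWithNotice("a b c d e f", 3)); // notice + "d e f"
    }
}
```

Because only the last `tokenLimit` tokens survive, an overly small limit silently discards the oldest context first, which is why the bug manifests as "forgotten" conversation history rather than an error.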
## Why has this bug gone undetected?
- Subtle truncation: Truncation often goes unnoticed in chat applications unless the input is exceptionally long or the user is specifically monitoring token usage or context retention.
- Confusing notice: The truncation notice `[Context truncated due to GPT-5 8K input limit]` might be dismissed by users as a general limitation of the "system", or they might assume the backend is using a different model than they expected.
- Mock environment: The use of hypothetical "gpt-5" models as defaults suggests this part of the codebase may be tested primarily against those models, where the bug's effect (restricting input to 7,000 tokens) matches the model's actual limit.
## Recommended fix

Refactor `truncatePromptForCompletion` to accept the `ApiProvider` as a parameter, or move the truncation logic into `buildCompletionRequest`, where the provider is already known.
```java
// Recommended fix in buildCompletionRequest 🟢
public ResponseCreateParams buildCompletionRequest(
        String prompt, double temperature, RateLimitService.ApiProvider provider) {
    boolean useGitHubModels = provider == RateLimitService.ApiProvider.GITHUB_MODELS;
    String modelId = normalizedModelId(useGitHubModels);
    // Truncate based on the resolved modelId 🟢
    int tokenLimit = (isGpt5Family(modelId) || canonicalModelName(modelId).startsWith("o"))
            ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);
    return buildResponseParams(truncatedPrompt, temperature, modelId);
}
```

## Related bugs
- In `buildResponseParams`, reasoning effort and `maxOutputTokens` are only applied if `gpt5Family` is true. However, "o" models (like `o1-preview`) are also classified as `reasoningModel` but do not get these settings applied, even though they support them.
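One possible shape for that related fix is to fold GPT-5 and "o"-prefixed models into a single reasoning-model predicate and key the reasoning settings off it, rather than off `gpt5Family` alone. The sketch below is illustrative only; the real `buildResponseParams` and `canonicalModelName` are not shown in the issue:

```java
public class ReasoningModelSketch {
    // Hypothetical: strip a provider prefix, e.g. "openai/gpt-5" -> "gpt-5"
    static String canonicalModelName(String modelId) {
        int slash = modelId.lastIndexOf('/');
        return slash >= 0 ? modelId.substring(slash + 1) : modelId;
    }

    // Proposed predicate: covers both the GPT-5 family and "o" models (o1, o3, ...),
    // so reasoning effort / maxOutputTokens would be applied to all of them
    static boolean isReasoningModel(String modelId) {
        String canonical = canonicalModelName(modelId);
        return canonical.startsWith("gpt-5") || canonical.startsWith("o");
    }

    public static void main(String[] args) {
        System.out.println(isReasoningModel("o1-preview"));   // true
        System.out.println(isReasoningModel("openai/gpt-5")); // true
        System.out.println(isReasoningModel("gpt-4o"));       // false
    }
}
```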