# Length and Context Window Auto-Continue Feature
## Overview
This feature automatically continues the agent's LLM loop when the previous stop reason was caused by a length or context window limit. This ensures that agents can complete their tasks even when responses are truncated by token limits or context window constraints.
## Problem Statement
When LLMs reach their token limits or context window boundaries, they may return finish reasons like:

- `"length"` - Response reached the maximum token limit
- `"context_length_exceeded"` - Context window was exceeded
- `"max_tokens"` - Maximum tokens were reached
Previously, the agent would stop execution when encountering these finish reasons, potentially leaving tasks incomplete.
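In the streamed response, such a stop surfaces as a finish chunk. The example below shows the assumed shape, inferred from the handler code under Implementation Details; it is illustrative only:

```python
# Example 'finish' chunk as it might appear in the response stream
# (shape inferred from the handler code shown later in this document):
chunk = {'type': 'finish', 'finish_reason': 'length'}
```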
## Solution
The agent now automatically continues the LLM loop when it encounters these specific finish reasons, similar to how it already handles `"tool_calls"` finish reasons.
## Implementation Details
### Modified Files
`backend/agentpress/thread_manager.py`:

- Updated the `auto_continue_wrapper()` function to detect length/context finish reasons
- Added logic to continue the loop for `"length"`, `"context_length_exceeded"`, and `"max_tokens"` finish reasons
- Updated documentation and logging messages
### Auto-Continue Logic
The auto-continue logic in `ThreadManager.auto_continue_wrapper()` now handles:
```python
if chunk.get('type') == 'finish':
    finish_reason = chunk.get('finish_reason')
    if finish_reason == 'tool_calls':
        # Existing behavior - continue for tool calls
        auto_continue = True
    elif finish_reason == 'xml_tool_limit_reached':
        # Don't continue for XML tool limits
        auto_continue = False
    elif finish_reason in ['length', 'context_length_exceeded', 'max_tokens']:
        # NEW: Continue for length/context issues
        auto_continue = True
```
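For orientation, here is a condensed sketch of how a wrapper like this can drive the loop. It is an illustration only: apart from the finish reasons and `native_max_auto_continues`, the names and structure are assumptions, not the actual `ThreadManager` implementation.

```python
# Hypothetical, condensed sketch of an auto-continue wrapper; not the
# actual ThreadManager code. run_llm_once is assumed to be an async
# generator factory that performs one LLM call and yields chunks.
async def auto_continue_wrapper(run_llm_once, native_max_auto_continues=25):
    continue_reasons = {'tool_calls', 'length', 'context_length_exceeded', 'max_tokens'}
    iterations = 0
    while True:
        auto_continue = False
        async for chunk in run_llm_once():
            if chunk.get('type') == 'finish' and chunk.get('finish_reason') in continue_reasons:
                auto_continue = True  # a qualifying finish reason: loop again
            yield chunk
        iterations += 1
        # Stop when the model finished normally or the iteration cap is hit.
        if not auto_continue or iterations >= native_max_auto_continues:
            break
```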
### Finish Reasons Handled
**Auto-Continue Enabled For:**

- `"tool_calls"` - Existing behavior for function/tool calls
- `"length"` - Response truncated due to token limit
- `"context_length_exceeded"` - Context window exceeded
- `"max_tokens"` - Maximum tokens reached
**Auto-Continue Disabled For:**

- `"stop"` - Normal completion
- `"xml_tool_limit_reached"` - XML tool limit reached
- `"agent_terminated"` - Agent terminated
- Any other finish reason
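This decision table can be captured in a small predicate. The helper below is hypothetical and for illustration; in the actual code the checks are inlined in `auto_continue_wrapper()`:

```python
# Hypothetical helper mirroring the table above; the real checks are
# inlined in auto_continue_wrapper().
CONTINUE_REASONS = {'tool_calls', 'length', 'context_length_exceeded', 'max_tokens'}

def should_auto_continue(finish_reason):
    """True only for finish reasons that should trigger another iteration."""
    return finish_reason in CONTINUE_REASONS
```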
## Configuration
The feature is controlled by the `native_max_auto_continues` parameter:
```python
response = await thread_manager.run_thread(
    # ... other parameters ...
    native_max_auto_continues=25,  # Enable auto-continue (0 = disabled)
    # ... other parameters ...
)
```
- `native_max_auto_continues > 0`: Enables auto-continue for both tool calls and length/context issues
- `native_max_auto_continues = 0`: Disables all auto-continue functionality
## Usage Examples
### Basic Usage
```python
# Enable auto-continue for up to 25 iterations
response = await thread_manager.run_thread(
    thread_id=thread_id,
    system_prompt=system_message,
    stream=stream,
    llm_model=model_name,
    native_max_auto_continues=25,  # Enable the feature
    # ... other parameters ...
)
```
### Disable Auto-Continue
```python
# Disable auto-continue completely
response = await thread_manager.run_thread(
    thread_id=thread_id,
    system_prompt=system_message,
    stream=stream,
    llm_model=model_name,
    native_max_auto_continues=0,  # Disable the feature
    # ... other parameters ...
)
```
## Benefits
- Improved Task Completion: Agents can now complete tasks even when responses are truncated
- Better User Experience: Users don't need to manually continue conversations when limits are hit
- Automatic Recovery: The agent automatically recovers from length/context limitations
- Backward Compatibility: Existing behavior for tool calls is preserved
## Limitations
- Maximum Iterations: Still limited by `native_max_auto_continues` to prevent infinite loops
- Context Compression: May trigger context compression if the conversation becomes very long
- Performance: Multiple iterations may increase response time and token usage
## Monitoring and Logging
The feature includes comprehensive logging:
```
INFO: Detected finish_reason='length' (length/context limit), auto-continuing (1/25)
INFO: Detected finish_reason='context_length_exceeded' (length/context limit), auto-continuing (2/25)
INFO: Detected finish_reason='max_tokens' (length/context limit), auto-continuing (3/25)
```
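Output like this can be produced by a log call along the following lines. This is a sketch whose message format is taken from the samples above; the function and variable names are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

# Sketch of the log call; the message format matches the samples above,
# while the function and parameter names are assumptions.
def log_auto_continue(finish_reason, count, max_continues):
    logger.info(
        f"Detected finish_reason='{finish_reason}' (length/context limit), "
        f"auto-continuing ({count}/{max_continues})"
    )
```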
## Testing
The feature can be tested by:

- Manual Testing: Trigger responses that hit token limits
- Unit Testing: Mock different finish reasons and verify behavior (a sketch follows this list)
- Integration Testing: Test with real LLM APIs that return length/context finish reasons
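A minimal unit-test sketch for the decision logic, reusing the hypothetical `should_auto_continue()` helper from the Finish Reasons section (inlined here so the test file runs standalone):

```python
import pytest

# Inlined copy of the hypothetical helper so this test file is standalone.
CONTINUE_REASONS = {'tool_calls', 'length', 'context_length_exceeded', 'max_tokens'}

def should_auto_continue(finish_reason):
    return finish_reason in CONTINUE_REASONS

@pytest.mark.parametrize("reason, expected", [
    ('tool_calls', True),
    ('length', True),
    ('context_length_exceeded', True),
    ('max_tokens', True),
    ('stop', False),
    ('xml_tool_limit_reached', False),
    ('agent_terminated', False),
])
def test_auto_continue_decision(reason, expected):
    assert should_auto_continue(reason) is expected
```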
## Future Enhancements
Potential improvements could include:
- Smart Context Management: Automatically compress context when approaching limits
- Adaptive Limits: Adjust max tokens based on conversation length
- User Feedback: Notify users when auto-continue is triggered
- Metrics: Track auto-continue usage and success rates
## Related Components
- `ContextManager`: Handles context compression and token management
- `ResponseProcessor`: Processes LLM responses and extracts finish reasons
- `ThreadManager`: Orchestrates the auto-continue logic
- Agent Run Loop: Main agent execution loop that uses this functionality