Length and Context Window Auto-Continue Feature

Overview

This feature automatically continues the agent's LLM loop when the previous response stopped because of a length or context window limit. This ensures that agents can complete their tasks even when individual responses are truncated by token limits or context window constraints.

Problem Statement

When LLMs reach their token limits or context window boundaries, they may return finish reasons like:

  • "length" - Response reached the maximum token limit
  • "context_length_exceeded" - Context window was exceeded
  • "max_tokens" - Maximum tokens were reached

Previously, the agent would stop execution when encountering these finish reasons, potentially leaving tasks incomplete.
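
For reference, such a stop surfaces in the streamed response as a finish chunk. A hypothetical example, matching the chunk shape used later in this document:

# Hypothetical finish chunk for a truncated response; exact payloads vary by provider
finish_chunk = {
    'type': 'finish',
    'finish_reason': 'length',  # or 'context_length_exceeded' / 'max_tokens'
}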

Solution

The agent now automatically continues the LLM loop when it encounters these specific finish reasons, just as it already does for the "tool_calls" finish reason.

Implementation Details

Modified Files

  1. backend/agentpress/thread_manager.py
    • Updated auto_continue_wrapper() function to detect length/context finish reasons
    • Added logic to continue the loop for "length", "context_length_exceeded", and "max_tokens" finish reasons
    • Updated documentation and logging messages

Auto-Continue Logic

The auto-continue logic in ThreadManager.auto_continue_wrapper() now handles finish reasons as follows:

if chunk.get('type') == 'finish':
    finish_reason = chunk.get('finish_reason')

    if finish_reason == 'tool_calls':
        # Existing behavior - continue for tool calls
        auto_continue = True
    elif finish_reason == 'xml_tool_limit_reached':
        # Don't continue for XML tool limits
        auto_continue = False
    elif finish_reason in ['length', 'context_length_exceeded', 'max_tokens']:
        # NEW: Continue for length/context issues
        auto_continue = True
    else:
        # Any other finish reason (e.g. 'stop', 'agent_terminated') ends the loop
        auto_continue = False
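
To make the control flow concrete, here is a minimal sketch of how such a wrapper can drive the loop. The run_llm_once helper is a hypothetical stand-in for a single LLM invocation; this is not the actual thread_manager.py implementation:

async def auto_continue_sketch(run_llm_once, native_max_auto_continues=25):
    """Illustrative auto-continue loop; helper names are hypothetical."""
    auto_continue_count = 0
    auto_continue = True
    while auto_continue:
        auto_continue = False  # only a qualifying finish chunk re-enables it
        async for chunk in run_llm_once():
            if chunk.get('type') == 'finish':
                finish_reason = chunk.get('finish_reason')
                if finish_reason in ('tool_calls', 'length',
                                     'context_length_exceeded', 'max_tokens'):
                    auto_continue = True
            yield chunk
        auto_continue_count += 1
        if auto_continue_count > native_max_auto_continues:
            break  # respect the iteration budget to avoid infinite loops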

Finish Reasons Handled

Auto-Continue Enabled For:

  • "tool_calls" - Existing behavior for function/tool calls
  • "length" - Response truncated due to token limit
  • "context_length_exceeded" - Context window exceeded
  • "max_tokens" - Maximum tokens reached

Auto-Continue Disabled For:

  • "stop" - Normal completion
  • "xml_tool_limit_reached" - XML tool limit reached
  • "agent_terminated" - Agent terminated
  • Any other finish reason
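
The two lists above boil down to a simple membership check. The helper below is a sketch of that decision table, not code from thread_manager.py:

# Finish reasons that re-enter the LLM loop (sketch of the tables above)
AUTO_CONTINUE_REASONS = {'tool_calls', 'length', 'context_length_exceeded', 'max_tokens'}

def should_auto_continue(finish_reason: str) -> bool:
    """True only for the finish reasons listed under 'Auto-Continue Enabled For'."""
    return finish_reason in AUTO_CONTINUE_REASONS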

Configuration

The feature is controlled by the native_max_auto_continues parameter:

response = await thread_manager.run_thread(
    # ... other parameters ...
    native_max_auto_continues=25,  # Enable auto-continue (0 = disabled)
    # ... other parameters ...
)

  • native_max_auto_continues > 0: Enables auto-continue for both tool calls and length/context issues
  • native_max_auto_continues = 0: Disables all auto-continue functionality

Usage Examples

Basic Usage

# Enable auto-continue for up to 25 iterations
response = await thread_manager.run_thread(
    thread_id=thread_id,
    system_prompt=system_message,
    stream=stream,
    llm_model=model_name,
    native_max_auto_continues=25,  # Enable the feature
    # ... other parameters ...
)

Disable Auto-Continue

# Disable auto-continue completely
response = await thread_manager.run_thread(
    thread_id=thread_id,
    system_prompt=system_message,
    stream=stream,
    llm_model=model_name,
    native_max_auto_continues=0,  # Disable the feature
    # ... other parameters ...
)

Benefits

  1. Improved Task Completion: Agents can now complete tasks even when responses are truncated
  2. Better User Experience: Users don't need to manually continue conversations when limits are hit
  3. Automatic Recovery: The agent automatically recovers from length/context limitations
  4. Backward Compatibility: Existing behavior for tool calls is preserved

Limitations

  1. Maximum Iterations: Still limited by native_max_auto_continues to prevent infinite loops
  2. Context Compression: Long auto-continued conversations may grow until context compression is triggered
  3. Performance: Multiple iterations may increase response time and token usage

Monitoring and Logging

The feature logs every auto-continue trigger:

INFO: Detected finish_reason='length' (length/context limit), auto-continuing (1/25)
INFO: Detected finish_reason='context_length_exceeded' (length/context limit), auto-continuing (2/25)
INFO: Detected finish_reason='max_tokens' (length/context limit), auto-continuing (3/25)
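
Log lines of this shape could be produced with a standard logging call along these lines; the helper and logger setup are illustrative, not the actual code:

import logging

logger = logging.getLogger(__name__)

def log_auto_continue(finish_reason: str, count: int, limit: int) -> None:
    # Illustrative helper matching the log format shown above
    logger.info(
        f"Detected finish_reason='{finish_reason}' (length/context limit), "
        f"auto-continuing ({count}/{limit})"
    )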

Testing

The feature can be tested by:

  1. Manual Testing: Trigger responses that hit token limits
  2. Unit Testing: Mock different finish reasons and verify behavior
  3. Integration Testing: Test with real LLM APIs that return length/context finish reasons
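
For the unit-testing approach, the decision logic can be exercised against mocked finish reasons. The should_auto_continue helper below mirrors the earlier sketch and is not part of the actual codebase:

import pytest

AUTO_CONTINUE_REASONS = {'tool_calls', 'length', 'context_length_exceeded', 'max_tokens'}

def should_auto_continue(finish_reason: str) -> bool:
    return finish_reason in AUTO_CONTINUE_REASONS

@pytest.mark.parametrize('reason, expected', [
    ('length', True),
    ('context_length_exceeded', True),
    ('max_tokens', True),
    ('tool_calls', True),
    ('stop', False),
    ('xml_tool_limit_reached', False),
    ('agent_terminated', False),
])
def test_should_auto_continue(reason: str, expected: bool) -> None:
    assert should_auto_continue(reason) is expected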

Future Enhancements

Potential improvements could include:

  1. Smart Context Management: Automatically compress context when approaching limits
  2. Adaptive Limits: Adjust max tokens based on conversation length
  3. User Feedback: Notify users when auto-continue is triggered
  4. Metrics: Track auto-continue usage and success rates
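
As a rough illustration of the first idea, compression could be gated by a token-count check before each continuation. Everything below, including the helper names, is hypothetical:

# Hypothetical sketch of "Smart Context Management"; not implemented today
def maybe_compress_context(messages, count_tokens, compress, context_limit,
                           threshold=0.8):
    """Compress the conversation once it nears the model's context limit."""
    if count_tokens(messages) >= threshold * context_limit:
        return compress(messages)  # e.g. summarize or drop older turns
    return messages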

Related Components

  • ContextManager: Handles context compression and token management
  • ResponseProcessor: Processes LLM responses and extracts finish reasons
  • ThreadManager: Orchestrates the auto-continue logic
  • Agent Run Loop: Main agent execution loop that uses this functionality