Error Handling Enhancement Changelog

Overview

Extended the existing AnthropicException - Overloaded error handling into general error detection with fallback strategies for multiple LLM providers.

Changes Made

1. Enhanced `services/llm.py`

Added:

detect_error_and_suggest_fallback() function (lines 102-175)
- Detects specific error types from different LLM providers
- Suggests appropriate fallback models based on current model and error type
- Returns tuple: (should_fallback, fallback_model, error_type)

Modified:

make_llm_api_call() function (lines 320-340)
- Enhanced retry logic to use new error detection function
- Better handling of fallback-eligible errors on final retry attempt
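
The retry behavior described above can be sketched roughly as follows. This is an illustration only, not the code in services/llm.py: `call_with_retries`, its parameters, and the choice to re-raise a fallback-eligible error unchanged on the final attempt (so that a caller can run the same detection and switch models) are all assumptions. Only the `(should_fallback, fallback_model, error_type)` tuple shape is taken from this changelog.

```python
import time
from typing import Callable, Optional, Tuple

# Shape of the detector's result, as described in this changelog.
Detection = Tuple[bool, Optional[str], str]


def call_with_retries(
    call: Callable[[str], str],
    model: str,
    detect: Callable[[Exception, str], Detection],
    max_retries: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    """Hypothetical retry loop; not the actual make_llm_api_call()."""
    for attempt in range(1, max_retries + 1):
        try:
            return call(model)
        except Exception as exc:
            if attempt < max_retries:
                # Not the last attempt yet: wait, then try the same model again.
                time.sleep(delay_seconds)
                continue
            should_fallback, _fallback_model, _error_type = detect(exc, model)
            if should_fallback:
                # Assumption: surface the original error untouched so the
                # caller can detect it and switch to the fallback model.
                raise
            raise RuntimeError(
                f"LLM call failed after {max_retries} attempts"
            ) from exc
    raise ValueError("max_retries must be at least 1")
```

In this reading, the retry loop never switches models itself. It only decides whether the final error should reach the caller in a form the fallback detection can still recognize.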

2. Updated `agentpress/thread_manager.py`

Modified:

Auto-continue wrapper exception handling (lines 479-495)
- Replaced hardcoded AnthropicException - Overloaded check
- Integrated detect_error_and_suggest_fallback() function
- Enhanced logging with specific error types
- Dynamic fallback model selection

Before:

if ("AnthropicException - Overloaded" in str(e)):
    logger.error(f"AnthropicException - Overloaded detected - Falling back to OpenRouter: {str(e)}", exc_info=True)
    llm_model = f"openrouter/{llm_model}"

After:

should_fallback, fallback_model, error_type = detect_error_and_suggest_fallback(e, llm_model)
if should_fallback:
    logger.error(f"{error_type} detected - Falling back to {fallback_model}: {str(e)}", exc_info=True)
    llm_model = fallback_model

3. Updated `agentpress/response_processor.py`

Modified:

Streaming response processing exception handling (lines 802-820)
- Replaced hardcoded AnthropicException - Overloaded check
- Integrated detect_error_and_suggest_fallback() function
- Enhanced error logging with specific error types
- Improved trace event naming

Before:

if (not "AnthropicException - Overloaded" in str(e)):
    # Handle non-Anthropic errors
else:
    logger.error(f"AnthropicException - Overloaded detected - Falling back to OpenRouter: {str(e)}", exc_info=True)

After:

should_fallback, fallback_model, error_type = detect_error_and_suggest_fallback(e, llm_model)
if not should_fallback:
    # Handle non-fallback errors
else:
    logger.error(f"{error_type} detected - Falling back to {fallback_model}: {str(e)}", exc_info=True)

4. Added Comprehensive Testing

Created:

tests/test_error_handling.py - Comprehensive test suite covering:
- All supported error types (15 test cases)
- Case insensitivity testing
- Model-specific fallback strategies
- Edge cases and error conditions

Test Coverage:

- Anthropic-specific errors (overloaded)
- OpenRouter-specific errors (connection, rate limit)
- OpenAI-specific errors (rate limit, connection, service unavailable)
- xAI-specific errors (rate limit, connection)
- Generic errors (connection, rate limit, service unavailable)
- Unknown error handling
- Case insensitivity validation
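
A table-driven check along these lines could exercise the coverage areas above. It is a sketch, not the contents of tests/test_error_handling.py: `find_failures`, `stub_detect`, and `CASES` are hypothetical names, and `stub_detect` is a stand-in that only recognizes the overloaded error so the example runs on its own.

```python
from typing import Callable, List, Optional, Tuple

Detection = Tuple[bool, Optional[str], str]
Detector = Callable[[Exception, str], Detection]
# (error message, current model, expected should_fallback)
Case = Tuple[str, str, bool]


def find_failures(detect: Detector, cases: List[Case]) -> List[str]:
    """Return a description of every case the detector gets wrong."""
    failures = []
    for message, model, expected in cases:
        # Re-run each case in upper and lower case to check case insensitivity.
        for variant in (message, message.upper(), message.lower()):
            should_fallback, fallback_model, _error_type = detect(
                Exception(variant), model
            )
            if should_fallback != expected:
                failures.append(f"{variant!r}: expected should_fallback={expected}")
            elif should_fallback and not fallback_model:
                failures.append(f"{variant!r}: fallback requested but no model given")
    return failures


def stub_detect(error: Exception, model: str) -> Detection:
    """Stand-in detector for this demo; the real one lives in services/llm.py."""
    if "overloaded" in str(error).lower():
        return True, f"openrouter/{model}", "overloaded"
    return False, None, "unknown"


CASES: List[Case] = [
    ("AnthropicException - Overloaded", "anthropic/claude-3-sonnet", True),
    ("Invalid API key", "gpt-4o", False),
]

print(find_failures(stub_detect, CASES))  # prints []
```

Passing the detector in as a parameter keeps the cases reusable: the same table can be run against the real `detect_error_and_suggest_fallback()` once it is importable.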

5. Documentation

Created:

docs/ERROR_HANDLING.md - Comprehensive documentation covering:
- System overview and architecture
- Supported error types and fallback strategies
- Implementation details and usage examples
- Testing procedures and benefits

Supported Error Types

Provider-Specific Errors

- Anthropic: AnthropicException - Overloaded
- OpenRouter: Connection/timeout, rate limit errors
- OpenAI: Rate limit, connection, service unavailable errors
- xAI: Rate limit, connection errors

Generic Error Patterns

- Connection/Timeout: "connection", "timeout"
- Rate Limiting: "rate limit", "quota"
- Service Issues: "service unavailable", "internal server error", "bad gateway"
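
The generic patterns listed above amount to case-insensitive substring matching on the error message. A minimal sketch, assuming that approach: the keywords come from this changelog, while `GENERIC_PATTERNS`, `classify_generic_error`, and the error-type labels are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Keywords taken from the list above; the labels on the left are illustrative.
GENERIC_PATTERNS: Dict[str, Tuple[str, ...]] = {
    "connection_timeout": ("connection", "timeout"),
    "rate_limit": ("rate limit", "quota"),
    "service_issue": ("service unavailable", "internal server error", "bad gateway"),
}


def classify_generic_error(error: Exception) -> Optional[str]:
    """Return the first matching error type, or None for unknown errors."""
    message = str(error).lower()  # lowercase once so matching ignores case
    for error_type, keywords in GENERIC_PATTERNS.items():
        if any(keyword in message for keyword in keywords):
            return error_type
    return None
```

Because dictionaries preserve insertion order, a message that matches several groups is reported under the first group listed. Whether the real implementation orders its checks this way is not stated here.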

Fallback Strategies

Hierarchical Approach

- Provider-Specific: Use the fallback model defined for the current provider
- OpenRouter Migration: Switch to the OpenRouter version of the model if not already using OpenRouter
- Model Family: Within OpenRouter, try a different model of the same family
- No Fallback: Return should_fallback = False if no appropriate fallback is found

Model Mapping Examples

- anthropic/claude-3-sonnet → openrouter/anthropic/claude-sonnet-4
- gpt-4o → openrouter/openai/gpt-4o
- xai/grok-4 → openrouter/x-ai/grok-4
- openrouter/anthropic/claude-3-sonnet → openrouter/anthropic/claude-sonnet-4 (for connection issues)
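
The hierarchy and the mappings above could be combined as in the sketch below. The four mappings are copied from this changelog; everything else is an assumption, including the names `PROVIDER_FALLBACKS`, `OPENROUTER_FAMILY_FALLBACKS`, and `pick_fallback`, the `"connection"` label, and the choice to apply the model-family step only to connection errors.

```python
from typing import Dict, Optional

# Step 1 data: mappings listed in this changelog.
PROVIDER_FALLBACKS: Dict[str, str] = {
    "anthropic/claude-3-sonnet": "openrouter/anthropic/claude-sonnet-4",
    "gpt-4o": "openrouter/openai/gpt-4o",
    "xai/grok-4": "openrouter/x-ai/grok-4",
}

# Step 3 data: same-family alternatives for models already on OpenRouter.
OPENROUTER_FAMILY_FALLBACKS: Dict[str, str] = {
    "openrouter/anthropic/claude-3-sonnet": "openrouter/anthropic/claude-sonnet-4",
}


def pick_fallback(model: str, error_type: str) -> Optional[str]:
    """Hypothetical helper: return a fallback model, or None if there is none."""
    # 1. Provider-specific mapping wins when one is defined.
    if model in PROVIDER_FALLBACKS:
        return PROVIDER_FALLBACKS[model]
    # 2. Otherwise migrate to OpenRouter if the model is not already there.
    if not model.startswith("openrouter/"):
        return f"openrouter/{model}"
    # 3. Already on OpenRouter: try a same-family model (assumed to apply to
    #    connection issues only, following the note on the last mapping).
    if error_type == "connection":
        return OPENROUTER_FAMILY_FALLBACKS.get(model)
    # 4. No fallback; a caller would report should_fallback = False.
    return None
```

For example, under these assumptions `pick_fallback("gpt-4o", "rate_limit")` returns `"openrouter/openai/gpt-4o"`, while an OpenRouter model with no family entry yields `None`.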

Benefits

- Improved Reliability: Automatic fallback to alternative models
- Better User Experience: Reduced downtime due to provider issues
- Comprehensive Coverage: Handles multiple error types from different providers
- Intelligent Fallbacks: Fallback suggestions based on the current model and error type
- Enhanced Logging: Specific error types for better monitoring
- Backward Compatibility: Maintains existing functionality while extending capabilities

Testing Results

All 15 test cases pass, covering:

✅ Anthropic overloaded errors
✅ OpenRouter connection and rate limit errors
✅ OpenAI rate limit, connection, and service errors
✅ xAI rate limit and connection errors
✅ Generic error patterns
✅ Case insensitivity
✅ Unknown error handling

Future Considerations

- Configurable Fallbacks: Allow user configuration of preferred fallback models
- Fallback Chains: Support multiple sequential fallback attempts
- Performance Tracking: Monitor fallback success rates and response times
- Health Monitoring: Proactive provider health assessment
- Cost Optimization: Consider pricing when suggesting fallbacks
