mirror of https://github.com/kortix-ai/suna.git
Error Handling Enhancement Changelog
Overview
Extended the existing `AnthropicException - Overloaded` error handling to support comprehensive error detection and fallback strategies for multiple LLM providers.
Changes Made
1. Enhanced services/llm.py
Added:
- `detect_error_and_suggest_fallback()` function (lines 102-175)
  - Detects specific error types from different LLM providers
  - Suggests appropriate fallback models based on current model and error type
  - Returns tuple: `(should_fallback, fallback_model, error_type)`
Modified:
- `make_llm_api_call()` function (lines 320-340)
  - Enhanced retry logic to use new error detection function
  - Better handling of fallback-eligible errors on final retry attempt
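The retry change described above can be pictured roughly as follows. This is a minimal, synchronous sketch under stated assumptions, not the code in `services/llm.py`. The names `call_with_fallback`, `call`, and `max_retries` are hypothetical. The detector is only assumed to return the documented `(should_fallback, fallback_model, error_type)` tuple. Backoff between attempts is omitted.

```python
import logging
from typing import Callable, Optional, Tuple, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")
# Assumed detector contract, mirroring the documented return tuple:
# (should_fallback, fallback_model, error_type)
Detector = Callable[[Exception, str], Tuple[bool, Optional[str], str]]


def call_with_fallback(
    call: Callable[[str], T],
    model: str,
    detect: Detector,
    max_retries: int = 3,
) -> T:
    """Hypothetical helper: retry call(model), then try one fallback model.

    The detector is consulted only after the final attempt fails.
    """
    for attempt in range(max_retries):
        try:
            return call(model)
        except Exception as exc:
            if attempt < max_retries - 1:
                continue  # a real implementation would back off here
            should_fallback, fallback_model, error_type = detect(exc, model)
            if should_fallback and fallback_model:
                logger.error("%s detected - falling back to %s", error_type, fallback_model)
                return call(fallback_model)
            raise  # not fallback-eligible: surface the original error
    raise ValueError("max_retries must be at least 1")
```

The sketch consults the detector only after the last attempt. This keeps transient failures on the original model and switches models only once retries are exhausted.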
2. Updated agentpress/thread_manager.py
Modified:
- Auto-continue wrapper exception handling (lines 479-495)
  - Replaced hardcoded `AnthropicException - Overloaded` check
  - Integrated `detect_error_and_suggest_fallback()` function
  - Enhanced logging with specific error types
  - Dynamic fallback model selection
Before:
```python
if ("AnthropicException - Overloaded" in str(e)):
    logger.error(f"AnthropicException - Overloaded detected - Falling back to OpenRouter: {str(e)}", exc_info=True)
    llm_model = f"openrouter/{llm_model}"
```
After:
```python
should_fallback, fallback_model, error_type = detect_error_and_suggest_fallback(e, llm_model)
if should_fallback:
    logger.error(f"{error_type} detected - Falling back to {fallback_model}: {str(e)}", exc_info=True)
    llm_model = fallback_model
```
3. Updated agentpress/response_processor.py
Modified:
- Streaming response processing exception handling (lines 802-820)
  - Replaced hardcoded `AnthropicException - Overloaded` check
  - Integrated `detect_error_and_suggest_fallback()` function
  - Enhanced error logging with specific error types
  - Improved trace event naming
Before:
```python
if (not "AnthropicException - Overloaded" in str(e)):
    # Handle non-Anthropic errors
    ...
else:
    logger.error(f"AnthropicException - Overloaded detected - Falling back to OpenRouter: {str(e)}", exc_info=True)
```
After:
```python
should_fallback, fallback_model, error_type = detect_error_and_suggest_fallback(e, llm_model)
if not should_fallback:
    # Handle non-fallback errors
    ...
else:
    logger.error(f"{error_type} detected - Falling back to {fallback_model}: {str(e)}", exc_info=True)
```
4. Added Comprehensive Testing
Created:
- `tests/test_error_handling.py`: comprehensive test suite covering:
  - All supported error types (15 test cases)
  - Case insensitivity testing
  - Model-specific fallback strategies
  - Edge cases and error conditions
Test Coverage:
- Anthropic-specific errors (overloaded)
- OpenRouter-specific errors (connection, rate limit)
- OpenAI-specific errors (rate limit, connection, service unavailable)
- xAI-specific errors (rate limit, connection)
- Generic errors (connection, rate limit, service unavailable)
- Unknown error handling
- Case insensitivity validation
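A table-driven layout keeps cases like these compact. The sketch below uses `unittest` subtests against a hypothetical stand-in detector, `stub_detect`, so that it runs on its own. The real `tests/test_error_handling.py` may be structured differently, and the error-type labels here are assumptions.

```python
import unittest


def stub_detect(exc: Exception, model: str):
    """Hypothetical stand-in for detect_error_and_suggest_fallback()."""
    message = str(exc).lower()  # lower-case once for case-insensitive matching
    if "overloaded" in message:
        return True, f"openrouter/{model}", "anthropic_overloaded"
    if "rate limit" in message:
        return True, f"openrouter/{model}", "rate_limit"
    return False, None, "unknown"


class DetectErrorTests(unittest.TestCase):
    # (message, model, expected should_fallback, expected error_type)
    CASES = [
        ("AnthropicException - Overloaded", "anthropic/claude-3-sonnet", True, "anthropic_overloaded"),
        ("ANTHROPICEXCEPTION - OVERLOADED", "anthropic/claude-3-sonnet", True, "anthropic_overloaded"),
        ("Rate limit exceeded", "gpt-4o", True, "rate_limit"),
        ("something unexpected", "gpt-4o", False, "unknown"),
    ]

    def test_cases(self):
        for message, model, should_fallback, error_type in self.CASES:
            # subTest reports each failing row separately without stopping the loop.
            with self.subTest(message=message):
                got = stub_detect(Exception(message), model)
                self.assertEqual(got[0], should_fallback)
                self.assertEqual(got[2], error_type)
                # A suggested fallback must differ from the failing model.
                if got[0]:
                    self.assertNotEqual(got[1], model)
```

With this layout, adding a case means adding one row to `CASES` rather than writing a new test method.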
5. Documentation
Created:
- `docs/ERROR_HANDLING.md`: comprehensive documentation covering:
  - System overview and architecture
  - Supported error types and fallback strategies
  - Implementation details and usage examples
  - Testing procedures and benefits
Supported Error Types
Provider-Specific Errors
- Anthropic: `AnthropicException - Overloaded`
- OpenRouter: Connection/timeout, rate limit errors
- OpenAI: Rate limit, connection, service unavailable errors
- xAI: Rate limit, connection errors
Generic Error Patterns
- Connection/Timeout: `"connection"`, `"timeout"`
- Rate Limiting: `"rate limit"`, `"quota"`
- Service Issues: `"service unavailable"`, `"internal server error"`, `"bad gateway"`
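Matching these patterns case-insensitively amounts to lower-casing the error message and checking substrings in a fixed order. The following is a minimal sketch that uses a hypothetical `classify_error` helper and hypothetical labels such as `rate_limit`. The real function also takes the current model into account and may use different labels or ordering.

```python
# Hypothetical labels; the substrings come from the pattern list above.
GENERIC_PATTERNS = [
    ("connection_error", ("connection", "timeout")),
    ("rate_limit", ("rate limit", "quota")),
    ("service_unavailable", ("service unavailable", "internal server error", "bad gateway")),
]


def classify_error(exc: Exception) -> str:
    """Return an error-type label for exc, or "unknown" (hypothetical helper)."""
    # Lower-case once so every comparison is case-insensitive.
    message = str(exc).lower()
    # Provider-specific check first: the Anthropic overload message.
    if "anthropicexception - overloaded" in message:
        return "anthropic_overloaded"
    # Then the generic groups, in list order; the first match wins.
    for label, needles in GENERIC_PATTERNS:
        if any(needle in message for needle in needles):
            return label
    return "unknown"
```

Because the first match wins, a message that contains both "connection" and "rate limit" is classified by whichever group is listed first.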
Fallback Strategies
Hierarchical Approach
1. Provider-Specific: Use provider-specific fallback models
2. OpenRouter Migration: Switch to the OpenRouter version of the model if not already using OpenRouter
3. Model Family: Within OpenRouter, try a different model from the same family
4. No Fallback: Return `should_fallback` as `False` if no appropriate fallback is found
Model Mapping Examples
- `anthropic/claude-3-sonnet` → `openrouter/anthropic/claude-sonnet-4`
- `gpt-4o` → `openrouter/openai/gpt-4o`
- `xai/grok-4` → `openrouter/x-ai/grok-4`
- `openrouter/anthropic/claude-3-sonnet` → `openrouter/anthropic/claude-sonnet-4` (for connection issues)
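The hierarchy and the mappings above can be expressed as an ordered list of strategies, where the first strategy that proposes a different model wins. This is a sketch under assumptions, not code from `services/llm.py`. The following details are inferred from the examples or invented for illustration:
- the lookup tables
- the rule that bare names like `gpt-4o` belong to OpenAI
- the `xai` → `x-ai` namespace rename
- the `connection_error` label

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy inspects (model, error_type) and returns a fallback model or None.
Strategy = Callable[[str, str], Optional[str]]

# Step 1 (hypothetical table): explicit provider-specific fallbacks.
PROVIDER_SPECIFIC = {"anthropic/claude-3-sonnet": "openrouter/anthropic/claude-sonnet-4"}
# Step 2 (assumed naming): provider prefix -> OpenRouter namespace.
OPENROUTER_NAMESPACE = {"anthropic": "anthropic", "openai": "openai", "xai": "x-ai"}
# Step 3 (hypothetical table): same-family alternatives inside OpenRouter.
OPENROUTER_FAMILY = {"openrouter/anthropic/claude-3-sonnet": "openrouter/anthropic/claude-sonnet-4"}


def provider_specific(model: str, error_type: str) -> Optional[str]:
    return PROVIDER_SPECIFIC.get(model)


def openrouter_migration(model: str, error_type: str) -> Optional[str]:
    if model.startswith("openrouter/"):
        return None  # already on OpenRouter; defer to the next strategy
    provider, sep, name = model.partition("/")
    if not sep:
        # Assumption: bare model names such as "gpt-4o" are OpenAI models.
        provider, name = "openai", model
    namespace = OPENROUTER_NAMESPACE.get(provider)
    return f"openrouter/{namespace}/{name}" if namespace else None


def model_family(model: str, error_type: str) -> Optional[str]:
    # Only switch models within OpenRouter for connection problems.
    if error_type != "connection_error":
        return None
    return OPENROUTER_FAMILY.get(model)


STRATEGIES: Sequence[Strategy] = (provider_specific, openrouter_migration, model_family)


def suggest_fallback(model: str, error_type: str) -> Tuple[bool, Optional[str]]:
    """Try each strategy in order; the first different model wins."""
    for strategy in STRATEGIES:
        candidate = strategy(model, error_type)
        if candidate and candidate != model:
            return True, candidate
    return False, None  # step 4: no fallback
```

Keeping each step as a separate function makes it straightforward to reorder steps or add new ones, such as the configurable fallbacks listed under Future Considerations.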
Benefits
- Improved Reliability: Automatic fallback to alternative models
- Better User Experience: Reduced downtime due to provider issues
- Comprehensive Coverage: Handles multiple error types from different providers
- Intelligent Fallbacks: Fallback suggestions take both the current model and the detected error type into account
- Enhanced Logging: Specific error types for better monitoring
- Backward Compatibility: Maintains existing functionality while extending capabilities
Testing Results
All 15 test cases pass successfully, covering:
- ✅ Anthropic overloaded errors
- ✅ OpenRouter connection and rate limit errors
- ✅ OpenAI rate limit, connection, and service errors
- ✅ xAI rate limit and connection errors
- ✅ Generic error patterns
- ✅ Case insensitivity
- ✅ Unknown error handling
Future Considerations
- Configurable Fallbacks: Allow user configuration of preferred fallback models
- Fallback Chains: Support multiple sequential fallback attempts
- Performance Tracking: Monitor fallback success rates and response times
- Health Monitoring: Proactive provider health assessment
- Cost Optimization: Consider pricing when suggesting fallbacks