Backend Architecture for Agent Workflow System
Overview
This document describes the scalable, modular, and extensible backend architecture for the agent workflow system, designed to work like Zapier but optimized for AI agent workflows.
Architecture Principles
- Scalability: Horizontal scaling through microservices and queue-based processing
- Modularity: Clear separation of concerns with pluggable components
- Extensibility: Easy to add new triggers, nodes, and integrations
- Reliability: Fault tolerance, retries, and graceful degradation
- Performance: Async processing, caching, and efficient resource usage
System Components
1. API Gateway Layer
- Load Balancer: Distributes traffic across API instances
- Authentication: JWT-based auth with role-based access control
- Rate Limiting: Per-user and per-IP rate limits
- Request Routing: Routes to appropriate services
2. Trigger System
Supported Trigger Types:

- Webhook Triggers
  - Unique URLs per workflow
  - HMAC signature validation (see the sketch after this list)
  - Custom header validation
  - Request/response transformation
- Schedule Triggers
  - Cron-based scheduling
  - Timezone support
  - Execution windows
  - Missed execution handling
- Event Triggers
  - Real-time event bus (Redis Pub/Sub)
  - Event filtering and routing
  - Event replay capability
- Polling Triggers
  - Configurable intervals
  - Change detection
  - Rate limiting
- Manual Triggers
  - UI-based execution
  - API-based execution
  - Bulk execution support
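
For illustration, a minimal sketch of the HMAC validation step using only the Python standard library; the header name and helper name are assumptions, not the actual implementation:

```python
import hashlib
import hmac


def verify_webhook_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare the HMAC-SHA256 digest of the raw request body against the
    signature supplied by the caller (e.g. in an X-Webhook-Signature header)."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected, signature_header)
```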
3. Workflow Engine
Core Components:

- Workflow Orchestrator
  - Manages workflow lifecycle
  - Handles execution flow
  - Manages dependencies
  - Error handling and retries
- Workflow Executor
  - Executes individual nodes
  - Manages parallel execution
  - Resource allocation
  - Performance monitoring
- State Manager
  - Distributed state management (Redis)
  - Execution context persistence
  - Checkpoint and recovery (sketched below)
  - Real-time status updates
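
A minimal sketch of how the State Manager's checkpoint/recovery could look with the redis-py client; the key layout and function names are illustrative assumptions:

```python
import json
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def save_checkpoint(execution_id: str, node_id: str, state: dict) -> None:
    """Persist per-node execution state so a crashed worker can resume."""
    r.hset(f"execution:{execution_id}", node_id, json.dumps(state))


def load_checkpoints(execution_id: str) -> dict:
    """Return {node_id: state} for every node that has already completed."""
    raw = r.hgetall(f"execution:{execution_id}")
    return {node_id: json.loads(blob) for node_id, blob in raw.items()}
```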
4. Node Types
- Agent Nodes: AI-powered processing with multiple models
- Tool Nodes: Integration with external services
- Transform Nodes: Data manipulation and formatting
- Condition Nodes: If/else and switch logic
- Loop Nodes: For/while iterations
- Parallel Nodes: Concurrent execution branches
- Webhook Nodes: HTTP requests to external services
- Delay Nodes: Time-based delays
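
One way the node types above could share a common interface is a small registry plus an async base class. This is a sketch under assumed names (Node, NODE_REGISTRY, register_node), not the actual implementation:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict

NODE_REGISTRY: Dict[str, Callable[..., "Node"]] = {}


def register_node(node_type: str):
    """Register a node class under its type name so the executor can
    instantiate nodes straight from the workflow definition JSON."""
    def wrap(cls):
        NODE_REGISTRY[node_type] = cls
        return cls
    return wrap


class Node(ABC):
    """Base class every node type (agent, tool, transform, ...) implements."""
    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    async def run(self, inputs: dict) -> Any: ...


@register_node("transform")
class TransformNode(Node):
    async def run(self, inputs: dict) -> Any:
        # rename/select fields according to a configured mapping
        mapping = self.config.get("mapping", {})
        return {out_key: inputs.get(in_key) for out_key, in_key in mapping.items()}


@register_node("delay")
class DelayNode(Node):
    async def run(self, inputs: dict) -> Any:
        await asyncio.sleep(self.config.get("seconds", 0))
        return inputs
```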
5. Data Flow
Trigger → Queue → Orchestrator → Executor → Node → Output
                       ↓             ↓        ↓
                 State Manager  Tool Service  Results
6. Storage Architecture
- PostgreSQL: Workflow definitions, configurations, audit logs
- Redis: Execution state, queues, caching, pub/sub
- S3/Blob Storage: Large files, logs, execution artifacts
- TimescaleDB: Time-series data, metrics, analytics
7. Queue System
- RabbitMQ: Task queuing, priority queues, dead letter queues
- Kafka: Event streaming, audit trail, real-time analytics
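
A hedged sketch of declaring the RabbitMQ task queue with priorities and a dead letter exchange using pika; queue and exchange names are illustrative, not the actual configuration:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# dead-letter exchange for executions that exhaust their retries
channel.exchange_declare(exchange="workflow.dlx", exchange_type="fanout")
channel.queue_declare(queue="workflow.dead_letter")
channel.queue_bind(queue="workflow.dead_letter", exchange="workflow.dlx")

# main task queue: priorities 0-9, failed messages routed to the DLX
channel.queue_declare(
    queue="workflow.executions",
    arguments={"x-max-priority": 10, "x-dead-letter-exchange": "workflow.dlx"},
)

channel.basic_publish(
    exchange="",
    routing_key="workflow.executions",
    body=b'{"workflow_id": "...", "trigger": "manual"}',
    properties=pika.BasicProperties(priority=5, delivery_mode=2),  # persistent message
)
```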
Execution Flow
1. Trigger Phase
1. Trigger fires (webhook, schedule, event, etc.)
2. Validate trigger configuration
3. Create ExecutionContext
4. Queue workflow for execution
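
The ExecutionContext created in step 3 might look like the following dataclass; the exact fields are assumptions beyond what the flow above requires:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExecutionContext:
    """Everything a worker needs to run one workflow execution."""
    workflow_id: str
    trigger_type: str                       # webhook / schedule / event / polling / manual
    trigger_payload: dict = field(default_factory=dict)
    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    enqueued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    variables: dict = field(default_factory=dict)   # data shared between nodes
```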
2. Orchestration Phase
1. Load workflow definition
2. Build execution graph
3. Determine execution order
4. Initialize state management
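
Determining execution order (step 3) amounts to a topological sort of the node graph; a compact sketch using Kahn's algorithm:

```python
from collections import defaultdict, deque


def topological_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Order nodes so every dependency runs before its dependents."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)
        indegree[downstream] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("workflow definition contains a cycle")
    return order
```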
3. Execution Phase
1. Execute nodes in topological order
2. Handle parallel branches
3. Manage data flow between nodes
4. Update execution state
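
Parallel branches can be executed level by level: every node whose dependencies are satisfied runs concurrently, and each output is merged back into the shared state. A sketch building on the Node interface above:

```python
import asyncio


async def execute_level(level: list[tuple[str, "Node"]], state: dict) -> None:
    """Run all (node_id, node) pairs in one level of the graph concurrently,
    then merge each node's output into the shared state keyed by node id."""
    ids = [node_id for node_id, _ in level]
    results = await asyncio.gather(*(node.run(state) for _, node in level))
    for node_id, result in zip(ids, results):
        state[node_id] = result
```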
4. Completion Phase
1. Aggregate results
2. Execute post-processing
3. Trigger downstream workflows
4. Clean up resources
Scalability Features
Horizontal Scaling
- Stateless API servers
- Distributed queue workers
- Shared state via Redis
- Database read replicas
Performance Optimization
- Connection pooling
- Result caching
- Batch processing
- Async I/O throughout
Resource Management
- Worker pool management
- Memory limits per execution
- CPU throttling
- Concurrent execution limits
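
Concurrent execution limits and per-execution timeouts can be enforced in the worker with a semaphore; the limit and timeout values below are assumed defaults:

```python
import asyncio

MAX_CONCURRENT_EXECUTIONS = 20            # assumed per-worker limit
_slots = asyncio.Semaphore(MAX_CONCURRENT_EXECUTIONS)


async def run_with_limit(execution_coro, timeout_s: float = 300.0):
    """Cap in-flight executions per worker and bound each one's runtime."""
    async with _slots:
        return await asyncio.wait_for(execution_coro, timeout=timeout_s)
```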
Security
Authentication & Authorization
- JWT tokens with refresh
- API key authentication
- OAuth2 integration
- Role-based permissions
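
A sketch of access-token validation assuming the PyJWT library and an HS256 shared secret; in practice the key would come from the secret manager and refresh handling would live elsewhere:

```python
import jwt  # PyJWT

SECRET_KEY = "replace-me"   # assumed: loaded from the secret manager in practice


def current_user(token: str) -> dict:
    """Decode and validate a JWT access token; raise if expired or tampered with."""
    try:
        claims = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.ExpiredSignatureError:
        raise PermissionError("token expired, use the refresh token")
    except jwt.InvalidTokenError as exc:
        raise PermissionError(f"invalid token: {exc}")
    return {"user_id": claims["sub"], "roles": claims.get("roles", [])}
```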
Data Security
- Encryption at rest
- TLS for all communications
- Secret management (Vault)
- Audit logging
Webhook Security
- HMAC signature validation
- IP whitelisting
- Rate limiting
- Request size limits
Monitoring & Observability
Metrics
- Prometheus metrics
- Custom business metrics
- Performance tracking
- Resource utilization
Logging
- Structured logging
- Centralized log aggregation
- Log levels and filtering
- Correlation IDs
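
Correlation IDs can be attached to every log line with a contextvar and a logging filter; the format string and logger name are illustrative:

```python
import logging
import uuid
from contextvars import ContextVar

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Stamp every log record with the correlation ID of the current execution."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s [%(correlation_id)s] %(message)s")
)
logging.getLogger("workflow_engine").addHandler(handler)

# at the start of each execution:
correlation_id.set(str(uuid.uuid4()))
```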
Tracing
- Distributed tracing (OpenTelemetry)
- LLM monitoring (Langfuse)
- Execution visualization
- Performance profiling
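
A minimal OpenTelemetry wrapper around node execution (exporter configuration omitted); the span and attribute names are assumptions:

```python
from opentelemetry import trace

tracer = trace.get_tracer("workflow_engine")


async def traced_node_run(node, inputs: dict):
    """Wrap each node execution in a span so a whole workflow shows up as one trace."""
    with tracer.start_as_current_span("node.run") as span:
        span.set_attribute("node.type", node.config.get("type", "unknown"))
        return await node.run(inputs)
```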
Alerting
- Error rate monitoring
- SLA tracking
- Resource alerts
- Custom alerts
Error Handling
Retry Strategies
- Exponential backoff
- Circuit breakers
- Dead letter queues
- Manual retry options
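
Exponential backoff with jitter might be applied around each node call before a failed message falls through to the dead letter queue; a sketch:

```python
import asyncio
import random


async def with_retries(fn, *args, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry an async call with exponential backoff and jitter; re-raise after
    the final attempt so the message can be dead-lettered."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn(*args)
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
```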
Failure Modes
- Node-level failures
- Workflow-level failures
- System-level failures
- Graceful degradation
API Endpoints
Workflow Management
POST /api/workflows # Create workflow
GET /api/workflows/:id # Get workflow
PUT /api/workflows/:id # Update workflow
DELETE /api/workflows/:id # Delete workflow
POST /api/workflows/:id/activate # Activate workflow
POST /api/workflows/:id/pause # Pause workflow
Execution Management
POST /api/workflows/:id/execute # Manual execution
GET /api/executions/:id # Get execution status
POST /api/executions/:id/cancel # Cancel execution
GET /api/executions/:id/logs # Get execution logs
Trigger Management
GET /api/workflows/:id/triggers # List triggers
POST /api/workflows/:id/triggers # Add trigger
PUT /api/triggers/:id # Update trigger
DELETE /api/triggers/:id # Remove trigger
Webhook Endpoints
POST /webhooks/:path # Webhook receiver
GET /api/webhooks # List webhooks
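
Assuming a FastAPI application, the manual execution endpoint could look roughly like this; load_workflow and enqueue_execution are hypothetical placeholders for the repository and queue layers:

```python
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, Field

router = APIRouter()


class ExecuteRequest(BaseModel):
    variables: dict = Field(default_factory=dict)   # initial input data for the workflow


@router.post("/api/workflows/{workflow_id}/execute")
async def execute_workflow(workflow_id: str, body: ExecuteRequest):
    workflow = await load_workflow(workflow_id)            # hypothetical repository call
    if workflow is None:
        raise HTTPException(status_code=404, detail="workflow not found")
    context = await enqueue_execution(workflow, body.variables)   # hypothetical queue call
    return {"execution_id": context.execution_id, "status": "queued"}
```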
Database Schema
Core Tables
-- Workflows table
CREATE TABLE workflows (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    project_id UUID,
    status VARCHAR(50),
    definition JSONB,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

-- Workflow executions
CREATE TABLE workflow_executions (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    status VARCHAR(50),
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    context JSONB,
    result JSONB,
    error TEXT
);

-- Triggers
CREATE TABLE triggers (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    type VARCHAR(50),
    config JSONB,
    is_active BOOLEAN
);

-- Webhook registrations
CREATE TABLE webhook_registrations (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    path VARCHAR(255) UNIQUE,
    secret VARCHAR(255),
    config JSONB
);
Deployment
Docker Compose (Development)
services:
  api:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - rabbitmq

  worker:
    build: .
    command: python -m workflow_engine.worker
    depends_on:
      - postgres
      - redis
      - rabbitmq

  scheduler:
    build: .
    command: python -m workflow_engine.scheduler
    depends_on:
      - postgres
      - redis
Kubernetes (Production)
- Deployment manifests for each service
- Horizontal Pod Autoscaling
- Service mesh (Istio)
- Persistent volume claims
Future Enhancements
- Workflow Versioning: Track and manage workflow versions
- A/B Testing: Test different workflow variations
- Workflow Templates: Pre-built workflow templates
- Advanced Analytics: Detailed execution analytics
- Multi-tenancy: Full isolation between projects
- Workflow Marketplace: Share and monetize workflows
- Visual Debugging: Step-through debugging
- Performance Optimization: ML-based optimization