mirror of https://github.com/kortix-ai/suna.git, commit 2ef82ea13e

# Backend Architecture for Agent Workflow System

## Overview

This document describes the scalable, modular, and extensible backend architecture for the agent workflow system, designed to work like Zapier but optimized for AI agent workflows.

## Architecture Principles

1. **Scalability**: Horizontal scaling through microservices and queue-based processing
2. **Modularity**: Clear separation of concerns with pluggable components
3. **Extensibility**: Easy to add new triggers, nodes, and integrations
4. **Reliability**: Fault tolerance, retries, and graceful degradation
5. **Performance**: Async processing, caching, and efficient resource usage

## System Components

### 1. API Gateway Layer

- **Load Balancer**: Distributes traffic across API instances
- **Authentication**: JWT-based auth with role-based access control
- **Rate Limiting**: Per-user and per-IP rate limits
- **Request Routing**: Routes requests to the appropriate services

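Per-user and per-IP limits are typically enforced with a token-bucket counter per key; a minimal in-process sketch (the class name and parameters are illustrative, not taken from this codebase):

```python
import time

class TokenBucket:
    """Illustrative rate limiter: refills `rate` tokens/second up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In production the bucket state would live in Redis so that all API instances share the same counters.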
### 2. Trigger System

#### Supported Trigger Types:

- **Webhook Triggers**
  - Unique URLs per workflow
  - HMAC signature validation
  - Custom header validation
  - Request/response transformation

- **Schedule Triggers**
  - Cron-based scheduling
  - Timezone support
  - Execution windows
  - Missed execution handling

- **Event Triggers**
  - Real-time event bus (Redis Pub/Sub)
  - Event filtering and routing
  - Event replay capability

- **Polling Triggers**
  - Configurable intervals
  - Change detection
  - Rate limiting

- **Manual Triggers**
  - UI-based execution
  - API-based execution
  - Bulk execution support

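The HMAC signature validation mentioned under webhook triggers can be sketched as follows (the function name and hex-digest encoding are assumptions; real providers vary in how they transmit the signature):

```python
import hashlib
import hmac

def verify_webhook_signature(secret: str, body: bytes, signature: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

The comparison must run over the raw request bytes, before any JSON parsing, or re-serialization differences will break verification.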
### 3. Workflow Engine

#### Core Components:

- **Workflow Orchestrator**
  - Manages workflow lifecycle
  - Handles execution flow
  - Manages dependencies
  - Error handling and retries

- **Workflow Executor**
  - Executes individual nodes
  - Manages parallel execution
  - Resource allocation
  - Performance monitoring

- **State Manager**
  - Distributed state management (Redis)
  - Execution context persistence
  - Checkpoint and recovery
  - Real-time status updates

### 4. Node Types

- **Agent Nodes**: AI-powered processing with multiple models
- **Tool Nodes**: Integration with external services
- **Transform Nodes**: Data manipulation and formatting
- **Condition Nodes**: If/else and switch logic
- **Loop Nodes**: For/while iterations
- **Parallel Nodes**: Concurrent execution branches
- **Webhook Nodes**: HTTP requests to external services
- **Delay Nodes**: Time-based delays

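A pluggable node system like the one above is often wired through a registry keyed by node type; a sketch of how that extensibility could look (the names are illustrative, not the actual engine API):

```python
# Maps a node type name (as stored in the workflow definition) to its implementation.
NODE_REGISTRY: dict[str, type] = {}

def register_node(node_type: str):
    """Class decorator that exposes a node implementation to the executor by type name."""
    def wrap(cls):
        NODE_REGISTRY[node_type] = cls
        return cls
    return wrap

@register_node("delay")
class DelayNode:
    """Illustrative delay node: holds the configured pause duration."""
    def __init__(self, config: dict):
        self.seconds = config.get("seconds", 0)
```

Adding a new node type then only requires defining a class and registering it; the executor looks implementations up by the `type` field of each node in the workflow definition.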
### 5. Data Flow

```
Trigger → Queue → Orchestrator → Executor → Node → Output
                       ↓             ↓         ↓
                 State Manager  Tool Service  Results
```

### 6. Storage Architecture

- **PostgreSQL**: Workflow definitions, configurations, audit logs
- **Redis**: Execution state, queues, caching, pub/sub
- **S3/Blob Storage**: Large files, logs, execution artifacts
- **TimescaleDB**: Time-series data, metrics, analytics

### 7. Queue System

- **RabbitMQ**: Task queuing, priority queues, dead letter queues
- **Kafka**: Event streaming, audit trail, real-time analytics

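RabbitMQ supplies the priority and dead-letter semantics in production; the priority-ordering behavior itself can be illustrated in-process with a heap (a teaching sketch, not a broker replacement):

```python
import heapq

class PriorityQueue:
    """Illustrative stand-in: lower number = higher priority, FIFO within a priority."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker preserves insertion order for equal priorities

    def put(self, priority: int, task):
        heapq.heappush(self._heap, (priority, self._seq, task))
        self._seq += 1

    def get(self):
        return heapq.heappop(self._heap)[2]
```

High-priority workflow executions jump ahead of bulk work, while equal-priority tasks still run in submission order.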
## Execution Flow

### 1. Trigger Phase
```
1. Trigger fires (webhook/schedule/event/etc.)
2. Validate trigger configuration
3. Create ExecutionContext
4. Queue workflow for execution
```

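The ExecutionContext created in step 3 might carry fields like these (the exact shape is an assumption, not the engine's actual dataclass):

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ExecutionContext:
    """Illustrative context handed from the trigger phase to the orchestrator."""
    workflow_id: str
    trigger_type: str
    payload: dict = field(default_factory=dict)  # trigger input, e.g. webhook body
    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Because the context is a plain serializable record, it can be queued, persisted by the State Manager, and replayed on retry.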
### 2. Orchestration Phase
```
1. Load workflow definition
2. Build execution graph
3. Determine execution order
4. Initialize state management
```

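Steps 2–3 amount to a topological sort of the node graph, which the standard library can express directly (a sketch of the idea, not the engine's actual implementation):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """dependencies maps each node id to the set of upstream nodes it needs.

    Returns node ids in a valid execution order; raises CycleError on cyclic graphs.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For parallel branches, `TopologicalSorter` can also be driven incrementally (`prepare()` / `get_ready()` / `done()`) so that all ready nodes are dispatched concurrently.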
### 3. Execution Phase
```
1. Execute nodes in topological order
2. Handle parallel branches
3. Manage data flow between nodes
4. Update execution state
```

### 4. Completion Phase
```
1. Aggregate results
2. Execute post-processing
3. Trigger downstream workflows
4. Clean up resources
```

## Scalability Features

### Horizontal Scaling
- Stateless API servers
- Distributed queue workers
- Shared state via Redis
- Database read replicas

### Performance Optimization
- Connection pooling
- Result caching
- Batch processing
- Async I/O throughout

### Resource Management
- Worker pool management
- Memory limits per execution
- CPU throttling
- Concurrent execution limits

## Security

### Authentication & Authorization
- JWT tokens with refresh
- API key authentication
- OAuth2 integration
- Role-based permissions

### Data Security
- Encryption at rest
- TLS for all communications
- Secret management (Vault)
- Audit logging

### Webhook Security
- HMAC signature validation
- IP whitelisting
- Rate limiting
- Request size limits

## Monitoring & Observability

### Metrics
- Prometheus metrics
- Custom business metrics
- Performance tracking
- Resource utilization

### Logging
- Structured logging
- Centralized log aggregation
- Log levels and filtering
- Correlation IDs

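Correlation IDs are easiest to exploit when every log line is structured; a minimal sketch (the field names are illustrative, and a real service would use a structured-logging library rather than hand-rolled JSON):

```python
import json
import logging

logger = logging.getLogger("workflow")

def format_event(message: str, correlation_id: str, **fields) -> str:
    """One JSON log line; the correlation id lets events be joined across services."""
    return json.dumps({"message": message, "correlation_id": correlation_id, **fields})

# e.g. logger.info(format_event("node_started", "req-42", node_id="n1"))
```

Passing the same correlation id from the trigger through the orchestrator, executor, and tool calls makes a single execution traceable end to end in the aggregated logs.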
### Tracing

- Distributed tracing (OpenTelemetry)
- LLM monitoring (Langfuse)
- Execution visualization
- Performance profiling

### Alerting
- Error rate monitoring
- SLA tracking
- Resource alerts
- Custom alerts

## Error Handling

### Retry Strategies
- Exponential backoff
- Circuit breakers
- Dead letter queues
- Manual retry options

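Exponential backoff is usually combined with jitter so that retries from many concurrent executions do not synchronize; a common "full jitter" formulation (the parameter defaults are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

After the retry budget is exhausted, the task would be routed to the dead letter queue for inspection and manual retry.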

### Failure Modes
- Node-level failures
- Workflow-level failures
- System-level failures
- Graceful degradation

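Circuit breakers, listed under retry strategies above, support graceful degradation by fast-rejecting calls to a dependency that keeps failing instead of queueing doomed retries; a minimal sketch (thresholds and state handling are illustrative):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; rejects calls for `cooldown` seconds."""
    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0  # half-open: let one attempt through
            return True
        return False

    def record(self, success: bool):
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

A tool node would check `allow()` before calling the external service and `record()` the outcome, so a flapping integration degrades one branch rather than the whole system.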
## API Endpoints

### Workflow Management
```
POST   /api/workflows                    # Create workflow
GET    /api/workflows/:id                # Get workflow
PUT    /api/workflows/:id                # Update workflow
DELETE /api/workflows/:id                # Delete workflow
POST   /api/workflows/:id/activate       # Activate workflow
POST   /api/workflows/:id/pause          # Pause workflow
```

### Execution Management
```
POST /api/workflows/:id/execute          # Manual execution
GET  /api/executions/:id                 # Get execution status
POST /api/executions/:id/cancel          # Cancel execution
GET  /api/executions/:id/logs            # Get execution logs
```

### Trigger Management
```
GET    /api/workflows/:id/triggers       # List triggers
POST   /api/workflows/:id/triggers       # Add trigger
PUT    /api/triggers/:id                 # Update trigger
DELETE /api/triggers/:id                 # Remove trigger
```

### Webhook Endpoints
```
POST /webhooks/:path                     # Webhook receiver
GET  /api/webhooks                       # List webhooks
```

## Database Schema

### Core Tables

```sql
-- Workflows table
CREATE TABLE workflows (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    project_id UUID,
    status VARCHAR(50),
    definition JSONB,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

-- Workflow executions
CREATE TABLE workflow_executions (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    status VARCHAR(50),
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    context JSONB,
    result JSONB,
    error TEXT
);

-- Triggers
CREATE TABLE triggers (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    type VARCHAR(50),
    config JSONB,
    is_active BOOLEAN
);

-- Webhook registrations
CREATE TABLE webhook_registrations (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    path VARCHAR(255) UNIQUE,
    secret VARCHAR(255),
    config JSONB
);
```

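The `path` column in `webhook_registrations` is declared UNIQUE; webhook paths should also be unguessable, e.g. derived from a cryptographically random token (the exact path format below is an assumption for illustration):

```python
import secrets

def generate_webhook_path(workflow_id: str) -> str:
    """Unguessable value for the `path` column; the UNIQUE constraint guards against collisions."""
    # token_urlsafe(12) yields 16 URL-safe characters (~96 bits of entropy).
    return f"wf-{workflow_id[:8]}-{secrets.token_urlsafe(12)}"
```

Randomized paths complement the HMAC check: the URL alone does not let an attacker inject events, and rotating the path revokes a leaked endpoint.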
## Deployment

### Docker Compose (Development)
```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - rabbitmq

  worker:
    build: .
    command: python -m workflow_engine.worker
    depends_on:
      - postgres
      - redis
      - rabbitmq

  scheduler:
    build: .
    command: python -m workflow_engine.scheduler
    depends_on:
      - postgres
      - redis
```

### Kubernetes (Production)

- Deployment manifests for each service
- Horizontal Pod Autoscaling
- Service mesh (Istio)
- Persistent volume claims

## Future Enhancements

1. **Workflow Versioning**: Track and manage workflow versions
2. **A/B Testing**: Test different workflow variations
3. **Workflow Templates**: Pre-built workflow templates
4. **Advanced Analytics**: Detailed execution analytics
5. **Multi-tenancy**: Full isolation between projects
6. **Workflow Marketplace**: Share and monetize workflows
7. **Visual Debugging**: Step-through debugging
8. **Performance Optimization**: ML-based optimization

---

# Workflow System Migration Guide

## Overview

This guide explains how to apply the database migrations for the new workflow system.

## Prerequisites

- Supabase CLI installed
- Access to your Supabase project
- Database backup (recommended)

## Migration Files

1. **`20250115000000_workflow_system.sql`** - Main migration that creates all workflow tables
2. **`20250115000001_workflow_system_rollback.sql`** - Rollback migration (if needed)

## Tables Created

The migration creates the following tables:

### Core Tables
- `workflows` - Workflow definitions and configurations
- `workflow_executions` - Execution history and status
- `triggers` - Trigger configurations
- `webhook_registrations` - Webhook endpoints
- `scheduled_jobs` - Cron-based schedules
- `workflow_templates` - Pre-built templates
- `workflow_execution_logs` - Detailed execution logs
- `workflow_variables` - Workflow-specific variables

### Enum Types
- `workflow_status` - draft, active, paused, disabled, archived
- `execution_status` - pending, running, completed, failed, cancelled, timeout
- `trigger_type` - webhook, schedule, event, polling, manual, workflow
- `node_type` - trigger, agent, tool, condition, loop, parallel, etc.
- `connection_type` - data, tool, processed_data, action, condition

## Applying the Migration

### Local Development

```bash
# Navigate to your backend directory
cd backend

# Apply the migration locally
supabase migration up

# Or apply a specific migration
supabase db push --file supabase/migrations/20250115000000_workflow_system.sql
```

### Production

```bash
# Link to your production project
supabase link --project-ref your-project-ref

# Apply migrations to production
supabase db push
```

## Verification

After applying the migration, verify the tables were created:

```sql
-- Check if tables exist
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_name LIKE 'workflow%';

-- Check enum types
SELECT typname
FROM pg_type
WHERE typname IN ('workflow_status', 'execution_status', 'trigger_type', 'node_type', 'connection_type');
```

## Row Level Security (RLS)

The migration includes RLS policies that ensure:

1. Users can only access workflows in their projects
2. The service role has full access for system operations
3. Workflow templates are publicly viewable
4. Execution logs are project-scoped

## Post-Migration Steps

### 1. Update Environment Variables

Add these to your `.env` file:

```env
# Workflow system configuration
WORKFLOW_EXECUTION_TIMEOUT=3600
WORKFLOW_MAX_RETRIES=3
WEBHOOK_BASE_URL=https://api.yourdomain.com
```

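These settings would typically be read once at startup, using the sample values above as defaults (the module shown is illustrative; the actual codebase may centralize config differently):

```python
import os

# Defaults mirror the sample .env values; int() fails fast on malformed input.
WORKFLOW_EXECUTION_TIMEOUT = int(os.environ.get("WORKFLOW_EXECUTION_TIMEOUT", "3600"))
WORKFLOW_MAX_RETRIES = int(os.environ.get("WORKFLOW_MAX_RETRIES", "3"))
WEBHOOK_BASE_URL = os.environ.get("WEBHOOK_BASE_URL", "http://localhost:8000")
```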
### 2. Initialize Workflow Engine

The workflow engine needs to be initialized on API startup:

```python
# In your api.py
from contextlib import asynccontextmanager

from fastapi import FastAPI

from workflow_engine.api import initialize as init_workflow_engine

@asynccontextmanager
async def lifespan(app: FastAPI):
    # ... existing initialization

    # Initialize workflow engine
    await init_workflow_engine()

    yield
    # ... cleanup
```

### 3. Add API Routes

Include the workflow API routes:

```python
# In your api.py
from workflow_engine import api as workflow_api

app.include_router(workflow_api.router, prefix="/api")
```

### 4. Update Docker Compose

Add the workflow worker service:

```yaml
# In docker-compose.yml
workflow-worker:
  build: .
  command: python -m workflow_engine.worker
  environment:
    - DATABASE_URL=${DATABASE_URL}
    - REDIS_URL=${REDIS_URL}
  depends_on:
    - postgres
    - redis
    - rabbitmq
```

## Rollback Instructions

If you need to roll back the migration:

```bash
# Apply the rollback migration
supabase db push --file supabase/migrations/20250115000001_workflow_system_rollback.sql
```

**Warning**: This will delete all workflow data. Make sure to back up any important data first.

## Troubleshooting

### Common Issues

1. **Foreign key constraints fail**
   - Ensure the `projects` and `project_members` tables exist
   - Check that `auth.users` is accessible

2. **Enum type already exists**
   - Drop the existing type: `DROP TYPE IF EXISTS workflow_status CASCADE;`

3. **Permission denied**
   - Ensure you're using the service role key for migrations

### Checking Migration Status

```sql
-- View applied migrations
SELECT * FROM supabase_migrations.schema_migrations;

-- Check for any failed migrations
SELECT * FROM supabase_migrations.schema_migrations WHERE success = false;
```

## Performance Considerations

The migration includes several indexes for optimal performance:

- Workflow lookups by `project_id`
- Execution queries by status and time
- Webhook path lookups
- Scheduled job next run times

Monitor query performance and add additional indexes as needed.

## Security Notes

1. All tables have RLS enabled
2. Webhook secrets are stored encrypted
3. Workflow variables can be marked as secrets
4. API authentication is required for all operations

## Next Steps

After successful migration:

1. Test workflow creation via the API
2. Verify webhook endpoints are accessible
3. Test scheduled job creation
4. Monitor execution performance
5. Set up cleanup jobs for old execution logs