Backend Architecture for Agent Workflow System

Overview

This document describes the backend architecture for the agent workflow system, a Zapier-style automation platform optimized for AI agent workflows. The design aims to be scalable, modular, and extensible.

Architecture Principles

  1. Scalability: Horizontal scaling through microservices and queue-based processing
  2. Modularity: Clear separation of concerns with pluggable components
  3. Extensibility: Easy to add new triggers, nodes, and integrations
  4. Reliability: Fault tolerance, retries, and graceful degradation
  5. Performance: Async processing, caching, and efficient resource usage

System Components

1. API Gateway Layer

  • Load Balancer: Distributes traffic across API instances
  • Authentication: JWT-based auth with role-based access control
  • Rate Limiting: Per-user and per-IP rate limits
  • Request Routing: Routes to appropriate services

2. Trigger System

Supported Trigger Types:

  • Webhook Triggers

    • Unique URLs per workflow
    • HMAC signature validation
    • Custom header validation
    • Request/response transformation
  • Schedule Triggers

    • Cron-based scheduling
    • Timezone support
    • Execution windows
    • Missed execution handling
  • Event Triggers

    • Real-time event bus (Redis Pub/Sub)
    • Event filtering and routing
    • Event replay capability
  • Polling Triggers

    • Configurable intervals
    • Change detection
    • Rate limiting
  • Manual Triggers

    • UI-based execution
    • API-based execution
    • Bulk execution support
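
As an illustration of the change detection listed under polling triggers, the sketch below fingerprints each polled payload and fires only when the fingerprint differs from the previous one. It assumes payloads are JSON-serializable; `PollingTrigger`, `fingerprint`, and the `fetch` callable are hypothetical names, not part of the documented system.

```python
import hashlib
import json
from typing import Any, Callable, Optional


def fingerprint(payload: Any) -> str:
    """Stable SHA-256 hash of a JSON-serializable payload (key order does not matter)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PollingTrigger:
    def __init__(self, fetch: Callable[[], Any]) -> None:
        self._fetch = fetch
        self._last: Optional[str] = None

    def poll(self) -> Optional[Any]:
        """Return the payload if it changed since the last poll, else None.

        The very first poll has nothing to compare against, so it counts as a change.
        """
        payload = self._fetch()
        current = fingerprint(payload)
        if current == self._last:
            return None
        self._last = current
        return payload
```

Storing only the fingerprint keeps the per-trigger state small; a real trigger would persist it between polls so that a restart does not cause a spurious firing.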

3. Workflow Engine

Core Components:

  • Workflow Orchestrator

    • Manages workflow lifecycle
    • Handles execution flow
    • Manages dependencies
    • Error handling and retries
  • Workflow Executor

    • Executes individual nodes
    • Manages parallel execution
    • Resource allocation
    • Performance monitoring
  • State Manager

    • Distributed state management (Redis)
    • Execution context persistence
    • Checkpoint and recovery
    • Real-time status updates
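
To make the checkpoint and recovery responsibility concrete, here is a small sketch of a state manager that records each completed node and reports which nodes remain after a restart. It is written against a plain dictionary so that it runs on its own; in the architecture described here that store would be Redis. The key layout `execution:<id>:state` and the method names are assumptions.

```python
import json
from typing import Any, Dict, List, MutableMapping, Optional


class StateManager:
    def __init__(self, store: Optional[MutableMapping[str, str]] = None) -> None:
        # Any string-to-string mapping works; a dict stands in for Redis here.
        self._store: MutableMapping[str, str] = {} if store is None else store

    @staticmethod
    def _key(execution_id: str) -> str:
        return f"execution:{execution_id}:state"  # hypothetical key layout

    def load(self, execution_id: str) -> Dict[str, Any]:
        raw = self._store.get(self._key(execution_id))
        return json.loads(raw) if raw else {"completed": {}}

    def checkpoint(self, execution_id: str, node_id: str, output: Any) -> None:
        """Persist the output of a finished node."""
        state = self.load(execution_id)
        state["completed"][node_id] = output
        self._store[self._key(execution_id)] = json.dumps(state)

    def pending(self, execution_id: str, ordered_nodes: List[str]) -> List[str]:
        """Nodes that still need to run, in their original order."""
        done = self.load(execution_id)["completed"]
        return [node for node in ordered_nodes if node not in done]
```

After a worker crash, a new worker can construct a `StateManager` over the same store and call `pending(...)` to resume from the last checkpoint instead of re-running finished nodes.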

4. Node Types

  • Agent Nodes: AI-powered processing with multiple models
  • Tool Nodes: Integration with external services
  • Transform Nodes: Data manipulation and formatting
  • Condition Nodes: If/else and switch logic
  • Loop Nodes: For/while iterations
  • Parallel Nodes: Concurrent execution branches
  • Webhook Nodes: HTTP requests to external services
  • Delay Nodes: Time-based delays
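
One way to keep node types pluggable, in line with the extensibility principle, is a registry that maps a type name to a handler. The sketch below is an assumption about how such a registry might look, not the system's actual interface; the `node_type` decorator, the handler signature, and the config keys (`rename`, `field`, `equals`, `then`, `else`) are all hypothetical.

```python
from typing import Any, Callable, Dict

NodeHandler = Callable[[Dict[str, Any], Dict[str, Any]], Any]
NODE_TYPES: Dict[str, NodeHandler] = {}


def node_type(name: str) -> Callable[[NodeHandler], NodeHandler]:
    """Register a handler under a node type name."""
    def register(fn: NodeHandler) -> NodeHandler:
        NODE_TYPES[name] = fn
        return fn
    return register


@node_type("transform")
def transform(config: Dict[str, Any], data: Dict[str, Any]) -> Dict[str, Any]:
    # Rename keys according to config["rename"], e.g. {"old": "new"}.
    rename = config.get("rename", {})
    return {rename.get(key, key): value for key, value in data.items()}


@node_type("condition")
def condition(config: Dict[str, Any], data: Dict[str, Any]) -> str:
    # Pick the next node by comparing one field with an expected value.
    matched = data.get(config["field"]) == config["equals"]
    return config["then"] if matched else config["else"]


def run_node(node: Dict[str, Any], data: Dict[str, Any]) -> Any:
    handler = NODE_TYPES.get(node["type"])
    if handler is None:
        raise ValueError(f"unknown node type: {node['type']}")
    return handler(node.get("config", {}), data)
```

Adding a new node type then means writing one decorated function; the executor does not need to change.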

5. Data Flow

Trigger → Queue → Orchestrator → Executor → Node → Output
                       ↓              ↓         ↓
                  State Manager   Tool Service  Results

6. Storage Architecture

  • PostgreSQL: Workflow definitions, configurations, audit logs
  • Redis: Execution state, queues, caching, pub/sub
  • S3/Blob Storage: Large files, logs, execution artifacts
  • TimescaleDB: Time-series data, metrics, analytics

7. Queue System

  • RabbitMQ: Task queuing, priority queues, dead letter queues
  • Kafka: Event streaming, audit trail, real-time analytics

Execution Flow

1. Trigger Phase

1. Trigger fires (webhook, schedule, event, polling, or manual)
2. Validate trigger configuration
3. Create ExecutionContext
4. Queue workflow for execution
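
The four trigger-phase steps can be sketched as follows. The fields of `ExecutionContext` are assumptions (the document names the type but not its shape), the trigger dictionary mirrors the columns of the `triggers` table, and an in-process `queue.Queue` stands in for the RabbitMQ task queue.

```python
import queue
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

VALID_TRIGGER_TYPES = {"webhook", "schedule", "event", "polling", "manual"}


@dataclass
class ExecutionContext:
    workflow_id: str
    trigger_type: str
    payload: Dict[str, Any]
    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def handle_trigger(
    trigger: Dict[str, Any],
    payload: Dict[str, Any],
    work_queue: "queue.Queue[ExecutionContext]",
) -> ExecutionContext:
    # Step 2: validate the trigger configuration before doing any work.
    if trigger.get("type") not in VALID_TRIGGER_TYPES:
        raise ValueError(f"unsupported trigger type: {trigger.get('type')!r}")
    if not trigger.get("is_active", False):
        raise ValueError("trigger is inactive")
    # Step 3: create the execution context.
    context = ExecutionContext(trigger["workflow_id"], trigger["type"], payload)
    # Step 4: queue the workflow for execution.
    work_queue.put(context)
    return context
```

Validation happens before the context is created, so an invalid or inactive trigger never reaches the queue.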

2. Orchestration Phase

1. Load workflow definition
2. Build execution graph
3. Determine execution order
4. Initialize state management

3. Execution Phase

1. Execute nodes in topological order
2. Handle parallel branches
3. Manage data flow between nodes
4. Update execution state
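
A minimal sketch of topological execution, assuming the workflow graph is given as a mapping from each node to the set of nodes it depends on. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the `execute_workflow` function and the `run` callback are hypothetical names. Each batch returned by `get_ready()` contains nodes whose dependencies are all complete, so those are the nodes a real executor could run in parallel; here they run sequentially for simplicity.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set


def execute_workflow(
    dependencies: Dict[str, Set[str]],
    run: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Run every node after its dependencies and return the outputs by node."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: Dict[str, Any] = {}
    while sorter.is_active():
        # Everything in this batch is independent of everything else in it.
        for node in sorter.get_ready():
            # Data flow: a node receives the outputs of its direct dependencies.
            inputs = {dep: results[dep] for dep in dependencies.get(node, ())}
            results[node] = run(node, inputs)
            sorter.done(node)  # unblocks nodes that were waiting on this one
    return results
```

For a graph such as `{"fetch": set(), "summarize": {"fetch"}, "classify": {"fetch"}, "notify": {"summarize", "classify"}}`, `fetch` runs first, `summarize` and `classify` become ready together, and `notify` runs last.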

4. Completion Phase

1. Aggregate results
2. Execute post-processing
3. Trigger downstream workflows
4. Clean up resources

Scalability Features

Horizontal Scaling

  • Stateless API servers
  • Distributed queue workers
  • Shared state via Redis
  • Database read replicas

Performance Optimization

  • Connection pooling
  • Result caching
  • Batch processing
  • Async I/O throughout

Resource Management

  • Worker pool management
  • Memory limits per execution
  • CPU throttling
  • Concurrent execution limits
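
Concurrent execution limits combine naturally with async I/O. The sketch below assumes an `asyncio`-based worker and caps the number of jobs in flight with `asyncio.Semaphore`; `run_limited` and its parameters are illustrative names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_limited(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    max_concurrent: int,
) -> List[T]:
    """Run all jobs, with at most `max_concurrent` executing at any moment."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here while the limit is reached
            return await job()

    # gather preserves the order of the input jobs in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

This limits concurrency within one worker process only. A limit across workers would need a shared counter or queue-level controls, which this sketch does not cover.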

Security

Authentication & Authorization

  • JWT tokens with refresh
  • API key authentication
  • OAuth2 integration
  • Role-based permissions

Data Security

  • Encryption at rest
  • TLS for all communications
  • Secret management (Vault)
  • Audit logging

Webhook Security

  • HMAC signature validation
  • IP allowlisting
  • Rate limiting
  • Request size limits
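
HMAC signature validation and the request size limit can be sketched with the standard `hmac` and `hashlib` modules. The document does not specify the hash function, the signature encoding, or the size limit, so HMAC-SHA256 with a hex digest and a 1 MB cap are assumptions here. The secret corresponds to the `secret` column of `webhook_registrations`.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(
    secret: str,
    body: bytes,
    signature: str,
    max_body_bytes: int = 1_000_000,  # assumed limit, not from the document
) -> bool:
    # Reject oversized requests before spending time on hashing.
    if len(body) > max_body_bytes:
        return False
    expected = sign(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The signature must be computed over the raw bytes exactly as received. Re-serializing parsed JSON before hashing can change whitespace or key order and cause valid requests to fail.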

Monitoring & Observability

Metrics

  • Prometheus metrics
  • Custom business metrics
  • Performance tracking
  • Resource utilization

Logging

  • Structured logging
  • Centralized log aggregation
  • Log levels and filtering
  • Correlation IDs

Tracing

  • Distributed tracing (OpenTelemetry)
  • LLM monitoring (Langfuse)
  • Execution visualization
  • Performance profiling

Alerting

  • Error rate monitoring
  • SLA tracking
  • Resource alerts
  • Custom alerts

Error Handling

Retry Strategies

  • Exponential backoff
  • Circuit breakers
  • Dead letter queues
  • Manual retry options
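
Exponential backoff can be sketched as below. The default attempt count, base delay, and cap are illustrative assumptions, and the random scaling ("full jitter") is an addition that the document does not mention; it spreads retries from many workers so they do not hit a recovering service at the same moment. `sleep` and `jitter` are injectable only to make the function easy to test.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call fn until it succeeds or max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: re-raise so the caller can route the task
                # to a dead letter queue or surface it for manual retry.
                raise
            # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * jitter())
    raise ValueError("max_attempts must be at least 1")
```

In practice the `except` clause would be narrowed to errors known to be transient, since retrying a validation error only delays the failure.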

Failure Modes

  • Node-level failures
  • Workflow-level failures
  • System-level failures
  • Graceful degradation

API Endpoints

Workflow Management

POST   /api/workflows                # Create workflow
GET    /api/workflows/:id            # Get workflow
PUT    /api/workflows/:id            # Update workflow
DELETE /api/workflows/:id            # Delete workflow
POST   /api/workflows/:id/activate   # Activate workflow
POST   /api/workflows/:id/pause      # Pause workflow

Execution Management

POST   /api/workflows/:id/execute    # Manual execution
GET    /api/executions/:id           # Get execution status
POST   /api/executions/:id/cancel    # Cancel execution
GET    /api/executions/:id/logs      # Get execution logs

Trigger Management

GET    /api/workflows/:id/triggers   # List triggers
POST   /api/workflows/:id/triggers   # Add trigger
PUT    /api/triggers/:id             # Update trigger
DELETE /api/triggers/:id             # Remove trigger

Webhook Endpoints

POST   /webhooks/:path               # Webhook receiver
GET    /api/webhooks                 # List webhooks

Database Schema

Core Tables

-- Workflows table
CREATE TABLE workflows (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    project_id UUID,
    status VARCHAR(50),
    definition JSONB,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

-- Workflow executions
CREATE TABLE workflow_executions (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    status VARCHAR(50),
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    context JSONB,
    result JSONB,
    error TEXT
);

-- Triggers
CREATE TABLE triggers (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    type VARCHAR(50),
    config JSONB,
    is_active BOOLEAN
);

-- Webhook registrations
CREATE TABLE webhook_registrations (
    id UUID PRIMARY KEY,
    workflow_id UUID,
    path VARCHAR(255) UNIQUE,
    secret VARCHAR(255),
    config JSONB
);

Deployment

Docker Compose (Development)

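# Fragment: the postgres, redis, and rabbitmq services referenced by depends_on are not shown here.
services: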
  api:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - rabbitmq
      
  worker:
    build: .
    command: python -m workflow_engine.worker
    depends_on:
      - postgres
      - redis
      - rabbitmq
      
  scheduler:
    build: .
    command: python -m workflow_engine.scheduler
    depends_on:
      - postgres
      - redis

Kubernetes (Production)

  • Deployment manifests for each service
  • Horizontal Pod Autoscaling
  • Service mesh (Istio)
  • Persistent volume claims

Future Enhancements

  1. Workflow Versioning: Track and manage workflow versions
  2. A/B Testing: Test different workflow variations
  3. Workflow Templates: Pre-built workflow templates
  4. Advanced Analytics: Detailed execution analytics
  5. Multi-tenancy: Full isolation between projects
  6. Workflow Marketplace: Share and monetize workflows
  7. Visual Debugging: Step-through debugging
  8. Performance Optimization: ML-based optimization