CLAUDE.md

This file provides guidance to Claude Code when working with code in this repository.

Development Commands

# Development
bun run dev              # Run Mastra dev server (mastra dev --dir src)

# Testing  
bun run test             # Run all tests (vitest run)
bun run test:watch       # Run tests in watch mode (vitest watch)
bun run test:coverage    # Run tests with coverage (vitest run --coverage)

# Testing specific files
bun test tests/workflows/integration/analyst-workflow.int.test.ts

# Evaluations
npm run eval             # Run all evaluations with Braintrust
npm run eval:file weather-agent.eval.ts  # Run specific eval file
npm run eval:watch       # Run evaluations in watch mode
npm run eval:dev         # Run evaluations in dev mode

# From root directory
bun run lint packages/ai      # Run Biome linter
bun run lint:fix packages/ai  # Fix linting issues
bun run format packages/ai    # Check formatting
bun run format:fix packages/ai # Fix formatting
bun run typecheck packages/ai # Run TypeScript type checking

Architecture Overview

This package implements AI agents and tools using the Mastra framework, integrated with observability through Braintrust. The codebase follows a modular pattern designed for building complex multi-agent workflows.

Folder Structure & Patterns

Source Code (src/)

Agents (src/agents/)

agents/
├── analyst-agent/
│   ├── analyst-agent.ts           # Agent definition
│   └── analyst-agent-instructions.ts # Instructions/prompts
└── think-and-prep-agent/
    ├── think-and-prep-agent.ts
    └── think-and-prep-instructions.ts

Pattern: Each agent gets its own folder with:

  • Main agent file (defines tools, model, memory, options)
  • Instructions file (contains system prompts and behavior definitions; see the sketch below)

Agents share these conventions:

  • Use Agent from Mastra with anthropicCachedModel
  • Share memory via getSharedMemory()
  • Use standard options: maxSteps: 18, temperature: 0, maxTokens: 10000
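
For orientation, the instructions file usually just exports the prompt consumed by the agent definition shown later under Key Development Patterns. The sketch below is illustrative only; the real files hold the full system prompts and may export a function or a constant:

// analyst-agent-instructions.ts - illustrative sketch, not the actual prompt
export const getInstructions = () =>
  `You are the analyst agent. <system prompt and behavior definitions go here>`;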

Steps (src/steps/)

steps/
├── analyst-step.ts
├── create-todos-step.ts
├── extract-values-search-step.ts
├── generate-chat-title-step.ts
├── get-chat-history.ts
└── think-and-prep-step.ts

Pattern: Steps orchestrate agent execution within workflows:

  • Use createStep() from Mastra
  • Define input/output schemas with Zod
  • Execute agents with proper context passing
  • Handle message history extraction and formatting
  • Wrap execution with wrapTraced() for observability
  • Pass data between steps through structured schemas

Tools (src/tools/)

tools/
├── communication-tools/     # Agent-to-agent communication
│   ├── done-tool.ts
│   ├── respond-without-asset-creation.ts
│   └── submit-thoughts-tool.ts
├── database-tools/          # Data access
│   └── find-required-text-values.ts
├── file-tools/             # File operations
│   ├── bash-tool.ts
│   ├── edit-file-tool.ts
│   ├── read-file-tool.ts
│   └── write-file-tool.ts
├── planning-thinking-tools/ # Strategic planning
│   ├── create-plan-investigative-tool.ts
│   ├── create-plan-straightforward-tool.ts
│   ├── review-plan-tool.ts
│   └── sequential-thinking-tool.ts
├── visualization-tools/     # Dashboard/metrics creation
│   ├── create-dashboards-file-tool.ts
│   ├── create-metrics-file-tool.ts
│   ├── modify-dashboards-file-tool.ts
│   └── modify-metrics-file-tool.ts
└── index.ts                # Tool exports

Pattern: Tools are grouped by functional category and follow these conventions:

  • Use createTool() from Mastra
  • Define input/output schemas with Zod
  • Wrap main execution with wrapTraced() for observability
  • Include detailed descriptions for agent understanding
  • Export via tools/index.ts for easy importing

Workflows (src/workflows/)

workflows/
└── analyst-workflow.ts     # Multi-step workflow definition

Pattern: Workflows orchestrate multiple steps and agents:

  • Use createWorkflow() from Mastra
  • Define input/output schemas with Zod
  • Chain steps with .parallel(), .then(), .branch() patterns
  • Include runtime context interfaces for type safety
  • Support conditional branching based on step outputs

Utils (src/utils/)

utils/
├── convertToCoreMessages.ts
├── shared-memory.ts
├── memory/
│   ├── agent-memory.ts
│   ├── message-history.ts    # Message passing between agents
│   ├── types.ts             # Message/step data types
│   └── index.ts
└── models/
    ├── ai-fallback.ts        # Fallback model wrapper with retry logic
    ├── anthropic.ts          # Basic Anthropic model wrapper
    ├── anthropic-cached.ts   # Anthropic with caching support
    ├── vertex.ts             # Google Vertex AI model wrapper
    ├── sonnet-4.ts           # Claude Sonnet 4 with fallback
    └── haiku-3-5.ts          # Claude Haiku 3.5 with fallback

Pattern: Utilities support core functionality:

  • Memory: Handles message history between agents in multi-step workflows
  • Models: Provides various AI model configurations with fallback support
  • Message History: Critical for multi-agent workflows - extracts and formats messages for passing between agents

Model Configuration Pattern

The models folder provides different AI model configurations with automatic fallback support:

  1. Base Model Wrappers (anthropic.ts, vertex.ts):
    • Wrap AI SDK models with Braintrust tracing
    • Handle authentication and configuration
    • Provide consistent interface for model usage

  2. Fallback Models (sonnet-4.ts, haiku-3-5.ts):
    • Use createFallback() to define multiple model providers
    • Automatically switch between providers on errors
    • Configure retry behavior and error handling
    • Example: Sonnet4 tries Vertex first, falls back to Anthropic

  3. Cached Model (anthropic-cached.ts):
    • Adds caching support to Anthropic models
    • Automatically adds cache_control to system messages
    • Includes connection pooling for better performance
    • Used by agents requiring prompt caching

Usage Example:

// For general use with fallback support
import { Sonnet4, Haiku35 } from '@buster/ai';

// For agents with complex prompts needing caching
import { anthropicCachedModel } from '@buster/ai';

// Direct model usage (no fallback)
import { anthropicModel, vertexModel } from '@buster/ai';
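
For a sense of how the fallback files are put together, a file like sonnet-4.ts is roughly a composition of the base wrappers via createFallback(). The sketch below is an assumption for illustration - the wrapper call signatures, model ids, and option names are hypothetical, and ai-fallback.ts defines what createFallback() actually accepts:

// sonnet-4.ts - illustrative sketch only
import { createFallback } from './ai-fallback';
import { anthropicModel } from './anthropic';
import { vertexModel } from './vertex';

export const Sonnet4 = createFallback({
  models: [
    vertexModel('claude-sonnet-4'),    // tried first (hypothetical model id)
    anthropicModel('claude-sonnet-4'), // used when the primary provider errors
  ],
  // retry and error-handling options omitted - see ai-fallback.ts for the real settings
});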

Testing Strategy (tests/)

Test Structure

tests/
├── agents/integration/          # End-to-end agent tests
├── steps/integration/           # Step execution tests
├── tools/
│   ├── integration/            # Tool + LLM integration tests
│   └── unit/                   # Pure function/schema tests
├── workflows/integration/       # Full workflow tests
├── globalSetup.ts
└── testSetup.ts

Testing Philosophy

Unit Tests (tests/tools/unit/):

  • Test data structures, schemas, and logic flows
  • Validate input/output schemas with Zod
  • Test error handling and edge cases
  • Mock external dependencies
  • DO NOT test LLM quality/performance
  • Focus on: "Does the function work correctly?"
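
A minimal sketch of a unit test in this spirit, exercising only a Zod schema with no LLM involved (the schema here is an illustrative stand-in for a real tool's input schema):

import { describe, expect, it } from 'vitest';
import { z } from 'zod';

// Stand-in schema; real unit tests import the schema exported by the tool under test
const inputSchema = z.object({
  query: z.string().describe('Search query'),
});

describe('tool input schema', () => {
  it('accepts valid input', () => {
    expect(inputSchema.safeParse({ query: 'revenue by month' }).success).toBe(true);
  });

  it('rejects missing fields', () => {
    expect(inputSchema.safeParse({}).success).toBe(false);
  });
});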

Integration Tests (tests/*/integration/):

  • Test agents/tools/steps with real LLM calls
  • Verify workflow execution and data flow
  • Test that agents can use tools successfully
  • Validate message passing between agents
  • DO NOT evaluate response quality
  • Focus on: "Does the system work end-to-end?"
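
A rough sketch of an integration test using the workflow-run pattern from this document. Import paths and the shape of the run result are assumptions that depend on the Mastra version; the point is that assertions stay structural and never judge response quality:

import { describe, expect, it } from 'vitest';
import { RuntimeContext } from '@mastra/core/runtime-context'; // path may differ by Mastra version
import { analystWorkflow } from '../../../src/workflows/analyst-workflow'; // illustrative path

describe('analyst workflow integration', () => {
  it('runs end-to-end with real LLM calls', async () => {
    const runtimeContext = new RuntimeContext();

    const result = await analystWorkflow.createRun().start({
      inputData: { prompt: 'How many customers signed up last month?' },
      runtimeContext,
    });

    // Structural assertion only - response quality belongs in evals
    expect(result).toBeDefined();
  });
});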

Evaluation Strategy (evals/)

Evaluation Structure

evals/
├── agents/
│   └── analyst-agent/
│       └── workflow-match.eval.ts
├── online-scorer/
│   └── todos.ts
├── steps/
│   └── todos/
│       ├── scorers.ts
│       └── todos-general-expected.eval.ts
└── workflows/
    ├── analyst-workflow-general.eval.ts
    └── analyst-workflow-redo.eval.private.ts

Evaluation Philosophy

Evaluations (.eval.ts files):

  • Use Braintrust for LLM performance evaluation
  • Test actual LLM response quality and correctness
  • Use LLM-as-Judge patterns for scoring
  • Include datasets for consistent evaluation
  • Focus on: "Does the LLM produce good results?"
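
A rough sketch of the shape of an .eval.ts file. The dataset, task, and scorer below are placeholders - the real eval files run the actual agents/workflows as the task and use the project-specific scorers (for example evals/steps/todos/scorers.ts):

import { Eval } from 'braintrust';
import { Factuality } from 'autoevals'; // an off-the-shelf LLM-as-Judge scorer

Eval('analyst-agent', {
  data: () => [
    { input: 'Summarize revenue for Q1', expected: 'A summary grounded in Q1 revenue data' },
  ],
  task: async (input) => {
    // Placeholder: a real eval would invoke an agent or workflow with `input`
    return `echo: ${input}`;
  },
  scores: [Factuality],
});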

Key Distinction:

  • Tests verify the system works (data flows, schemas, execution)
  • Evaluations verify the LLM produces quality outputs

Multi-Agent Workflow Patterns

Example: Analyst Workflow

The analyst workflow demonstrates the multi-agent pattern:

  1. Parallel Initial Steps: generateChatTitleStep, extractValuesSearchStep, createTodosStep
  2. Think and Prep Agent: Processes initial analysis
  3. Conditional Branching: Only runs analyst agent if needed
  4. Message History Passing: Critical for agent-to-agent communication

Message History Flow

// In think-and-prep-step.ts
conversationHistory = extractMessageHistory(step.response.messages);

// In analyst-step.ts  
const formattedMessages = formatMessagesForAnalyst(
  inputData.conversationHistory,
  initialPrompt
);

Key Pattern: Message history from one agent becomes input to the next agent, preserving conversation context and tool usage.

Key Development Patterns

Agent Definition Pattern

export const agentName = new Agent({
  name: 'Agent Name',
  instructions: getInstructions,
  model: Sonnet4,  // Can use Sonnet4, Haiku35, or anthropicCachedModel('model-id')
  tools: { tool1, tool2, tool3 },
  memory: getSharedMemory(),
  defaultGenerateOptions: DEFAULT_OPTIONS,
  defaultStreamOptions: DEFAULT_OPTIONS,
});

Tool Definition Pattern

const inputSchema = z.object({
  param: z.string().describe('Parameter description')
});

const outputSchema = z.object({
  result: z.string()
});

const executeFunction = wrapTraced(
  async (params) => {
    // Tool logic here
  },
  { name: 'tool-name' }
);

export const toolName = createTool({
  id: 'tool-id',
  description: 'Tool description for agent understanding',
  inputSchema,
  outputSchema,
  execute: executeFunction,
});

Step Definition Pattern

const inputSchema = z.object({
  // Input from previous steps
});

const outputSchema = z.object({
  // Output for next steps
});

const execution = async ({ inputData, getInitData, runtimeContext }) => {
  // Step logic with agent execution
  // Extract message history for multi-agent workflows
  // Return structured output
};

export const stepName = createStep({
  id: 'step-id',
  description: 'Step description',
  inputSchema,
  outputSchema,
  execute: execution,
});

Workflow Definition Pattern

const workflow = createWorkflow({
  id: 'workflow-id',
  inputSchema,
  outputSchema,
  steps: [step1, step2, step3],
})
  .parallel([step1, step2, step3])
  .then(step4)
  .branch([
    [condition, step5],
  ])
  .commit();

Message History

  • Critical for multi-agent workflows
  • Extracted via extractMessageHistory() from step responses
  • Formatted via formatMessagesForAnalyst() for agent consumption
  • Preserves tool calls and results between agents

Runtime Context

  • Passes workflow-specific data between steps
  • Type-safe with interfaces like AnalystRuntimeContext
  • Includes user/thread/organization identifiers
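
A sketch of what such an interface can look like; the field names below are assumptions based on the identifiers listed above, and the real AnalystRuntimeContext may differ:

// Illustrative only - see the actual AnalystRuntimeContext next to the workflow definition
interface AnalystRuntimeContext {
  userId: string;
  threadId: string;
  organizationId: string;
  messageId?: string; // set when conversation history should be persisted (see Conversation History Management)
}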

Best Practices

  1. Tool Organization: Group tools by functional category
  2. Schema Validation: Always use Zod schemas for input/output
  3. Observability: Wrap functions with wrapTraced() for monitoring
  4. Message Passing: Use structured message history for multi-agent workflows
  5. Testing Strategy: Unit tests for logic, integration tests for flow, evaluations for quality
  6. Memory Management: Use shared memory for conversation persistence
  7. Error Handling: Graceful handling with user-friendly error messages
  8. Type Safety: Leverage TypeScript with strict configuration

Environment Variables

Required environment variables:

  • BRAINTRUST_KEY: For observability and evaluations
  • ANTHROPIC_API_KEY: For Claude model access
  • Additional keys for specific tools (database connections, etc.)

Conversation History Management

Overview

The AI package supports multi-turn conversations by managing conversation history through the database. This enables workflows to maintain context across multiple interactions.

Key Components

Chat History Utilities (src/steps/get-chat-history.ts)

Provides functions for retrieving conversation history:

// Get all messages with metadata for a chat
getChatHistory(chatId: string): Promise<ChatHistoryResult[]>

// Get just the raw LLM messages for a chat
getRawLlmMessages(chatId: string): Promise<MessageHistory[]>

// Get raw LLM messages for a specific message ID
getRawLlmMessagesByMessageId(messageId: string): Promise<MessageHistory | null>

Database Integration

The chat history utilities use the @buster/database helpers for clean separation of concerns:

  • Database operations stay in the database package
  • Type validation and transformation happen in the AI package

Conversation History Flow

1. Initial Message with Database Save

// First run - with messageId for database persistence
const messageId = await createTestMessage(chatId, userId);
const runtimeContext = new RuntimeContext();
runtimeContext.set('messageId', messageId);

const result = await analystWorkflow.createRun().start({
  inputData: { prompt: "Initial question" },
  runtimeContext,
});

// Conversation history is automatically saved to database

2. Retrieving Conversation History

// Fetch the conversation history from the database
import { getRawLlmMessagesByMessageId } from '@buster/ai';

const conversationHistory = await getRawLlmMessagesByMessageId(messageId);
// Returns: CoreMessage[] or null

3. Follow-up with History

// Second run - with conversation history
const followUpResult = await analystWorkflow.createRun().start({
  inputData: {
    prompt: "Follow-up question",
    conversationHistory: conversationHistory as CoreMessage[],
  },
  runtimeContext,
});

Testing Conversation History

See tests/workflows/integration/analyst-workflow.int.test.ts for examples:

test('conversation history flow', async () => {
  // 1. Create initial message
  const { chatId, userId } = await createTestChat();
  const messageId = await createTestMessage(chatId, userId);
  
  // 2. Run workflow with messageId
  const runtimeContext = new RuntimeContext();
  runtimeContext.set('messageId', messageId);
  
  const firstRun = await workflow.createRun().start({
    inputData: { prompt: "First question" },
    runtimeContext,
  });
  
  // 3. Retrieve conversation history
  const history = await getRawLlmMessagesByMessageId(messageId);
  
  // 4. Run follow-up with history
  const secondRun = await workflow.createRun().start({
    inputData: {
      prompt: "Follow-up question",
      conversationHistory: history as CoreMessage[],
    },
    runtimeContext,
  });
});

Best Practices

  1. Use MessageId for Persistence: Always provide a messageId in runtime context when you want to save conversation history
  2. Type Safety: Cast retrieved history to CoreMessage[] after validation
  3. Handle Null Cases: Check if history exists before using it
  4. Test Both Paths: Test workflows both with and without conversation history
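
In practice, points 3 and 4 come together in a guard like the one below - a minimal sketch using the helpers shown above (the CoreMessage import path assumes the Vercel AI SDK):

import type { CoreMessage } from 'ai';
import { getRawLlmMessagesByMessageId } from '@buster/ai';

const messageId = 'message-id-from-the-initial-run'; // illustrative placeholder

const history = await getRawLlmMessagesByMessageId(messageId);

// Only include conversationHistory when a previous run actually saved messages
const inputData = history
  ? { prompt: 'Follow-up question', conversationHistory: history as CoreMessage[] }
  : { prompt: 'Follow-up question' };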