# CLAUDE.md

This file provides guidance to Claude Code when working with code in this repository.

## Development Commands

```bash
# Development
bun run dev              # Run Mastra dev server (mastra dev --dir src)

# Testing
bun run test             # Run all tests (vitest run)
bun run test:watch       # Run tests in watch mode (vitest watch)
bun run test:coverage    # Run tests with coverage (vitest run --coverage)

# Testing specific files (go through the package script; bare `bun test` uses Bun's built-in runner instead of Vitest)
bun run test src/agents/weather-agent.test.ts

# Run evaluation tests
bun run eval             # Run all evaluations with Braintrust
bun run eval:file weather-agent.eval.ts  # Run specific eval file
bun run eval:watch       # Run evaluations in watch mode
bun run eval:dev         # Run evaluations in dev mode

# From root directory
bun run lint packages/ai      # Run Biome linter
bun run lint:fix packages/ai  # Fix linting issues
bun run format packages/ai    # Check formatting
bun run format:fix packages/ai # Fix formatting
bun run typecheck packages/ai # Run TypeScript type checking
```

## Architecture Overview

This package implements AI agents and tools using the Mastra framework, with observability provided by Braintrust. The codebase follows a modular pattern designed for building complex multi-agent workflows.

## Folder Structure & Patterns

### Source Code (`src/`)

#### **Agents** (`src/agents/`)
```
agents/
├── analyst-agent/
│   ├── analyst-agent.ts           # Agent definition
│   └── analyst-agent-instructions.ts # Instructions/prompts
└── think-and-prep-agent/
    ├── think-and-prep-agent.ts
    └── think-and-prep-instructions.ts
```

**Pattern**: Each agent gets its own folder with:
- Main agent file (defines tools, model, memory, options)
- Instructions file (contains system prompts and behavior definitions)

Agents share these conventions:
- Built with `Agent` from Mastra, using `anthropicCachedModel` or one of the fallback models (`Sonnet4`, `Haiku35`)
- Shared memory via `getSharedMemory()`
- Standard options: `maxSteps: 18, temperature: 0, maxTokens: 10000`
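
The standard options above could be kept in one shared constant so every agent starts from the same values. This is a minimal sketch, assuming the constant is called `DEFAULT_OPTIONS` (the name used in the Agent Definition Pattern further down); the `AgentOptions` interface and the `withOverrides` helper are hypothetical and exist only for illustration.

```typescript
// Sketch only: the values come from the "Standard options" bullet above.
// AgentOptions and withOverrides are hypothetical names, not part of this package.
interface AgentOptions {
  maxSteps: number;
  temperature: number;
  maxTokens: number;
}

const DEFAULT_OPTIONS: Readonly<AgentOptions> = {
  maxSteps: 18,
  temperature: 0,
  maxTokens: 10000,
};

// Returns a new options object; the shared defaults are never mutated.
function withOverrides(overrides: Partial<AgentOptions> = {}): AgentOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```

An agent that needs a shorter loop could then pass `withOverrides({ maxSteps: 5 })` while leaving temperature and token limits at the shared values.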

#### **Steps** (`src/steps/`)
```
steps/
├── analyst-step.ts
├── create-todos-step.ts
├── extract-values-search-step.ts
├── generate-chat-title-step.ts
├── get-chat-history.ts
└── think-and-prep-step.ts
```

**Pattern**: Steps orchestrate agent execution within workflows:
- Use `createStep()` from Mastra
- Define input/output schemas with Zod
- Execute agents with proper context passing
- Handle message history extraction and formatting
- Wrap execution with `wrapTraced()` for observability
- Pass data between steps through structured schemas

#### **Tools** (`src/tools/`)
```
tools/
├── communication-tools/     # Agent-to-agent communication
│   ├── done-tool.ts
│   ├── respond-without-asset-creation.ts
│   └── submit-thoughts-tool.ts
├── database-tools/          # Data access
│   └── find-required-text-values.ts
├── file-tools/             # File operations
│   ├── bash-tool.ts
│   ├── edit-file-tool.ts
│   ├── read-file-tool.ts
│   └── write-file-tool.ts
├── planning-thinking-tools/ # Strategic planning
│   ├── create-plan-investigative-tool.ts
│   ├── create-plan-straightforward-tool.ts
│   ├── review-plan-tool.ts
│   └── sequential-thinking-tool.ts
├── visualization-tools/     # Dashboard/metrics creation
│   ├── create-dashboards-file-tool.ts
│   ├── create-metrics-file-tool.ts
│   ├── modify-dashboards-file-tool.ts
│   └── modify-metrics-file-tool.ts
└── index.ts                # Tool exports
```

**Pattern**: Tools are categorized by function:
- Use `createTool()` from Mastra
- Define input/output schemas with Zod
- Wrap main execution with `wrapTraced()` for observability
- Include detailed descriptions for agent understanding
- Export via `tools/index.ts` for easy importing

#### **Workflows** (`src/workflows/`)
```
workflows/
└── analyst-workflow.ts     # Multi-step workflow definition
```

**Pattern**: Workflows orchestrate multiple steps and agents:
- Use `createWorkflow()` from Mastra
- Define input/output schemas with Zod
- Chain steps with `.parallel()`, `.then()`, `.branch()` patterns
- Include runtime context interfaces for type safety
- Support conditional branching based on step outputs

#### **Utils** (`src/utils/`)
```
utils/
├── convertToCoreMessages.ts
├── shared-memory.ts
├── memory/
│   ├── agent-memory.ts
│   ├── message-history.ts    # Message passing between agents
│   ├── types.ts             # Message/step data types
│   └── index.ts
└── models/
    ├── ai-fallback.ts        # Fallback model wrapper with retry logic
    ├── anthropic.ts          # Basic Anthropic model wrapper
    ├── anthropic-cached.ts   # Anthropic with caching support
    ├── vertex.ts             # Google Vertex AI model wrapper
    ├── sonnet-4.ts           # Claude Sonnet 4 with fallback
    └── haiku-3-5.ts          # Claude Haiku 3.5 with fallback
```

**Pattern**: Utilities support core functionality:
- **Memory**: Handles message history between agents in multi-step workflows
- **Models**: Provides various AI model configurations with fallback support
- **Message History**: Critical for multi-agent workflows - extracts and formats messages for passing between agents

##### Model Configuration Pattern

The models folder provides different AI model configurations with automatic fallback support:

1. **Base Model Wrappers** (`anthropic.ts`, `vertex.ts`):
   - Wrap AI SDK models with Braintrust tracing
   - Handle authentication and configuration
   - Provide consistent interface for model usage

2. **Fallback Models** (`sonnet-4.ts`, `haiku-3-5.ts`):
   - Use `createFallback()` to define multiple model providers
   - Automatically switch between providers on errors
   - Configure retry behavior and error handling
   - Example: Sonnet4 tries Vertex first, falls back to Anthropic

3. **Cached Model** (`anthropic-cached.ts`):
   - Adds caching support to Anthropic models
   - Automatically adds cache_control to system messages
   - Includes connection pooling for better performance
   - Used by agents requiring prompt caching

**Usage Example**:
```typescript
// For general use with fallback support
import { Sonnet4, Haiku35 } from '@buster/ai';

// For agents with complex prompts needing caching
import { anthropicCachedModel } from '@buster/ai';

// Direct model usage (no fallback)
import { anthropicModel, vertexModel } from '@buster/ai';
```
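
The fallback behavior described for `sonnet-4.ts` and `haiku-3-5.ts` (try one provider, switch to the next on error) can be pictured with a small dependency-free sketch. This is not the `createFallback()` implementation, whose options and retry logic are not shown in this file; `Provider`, `ModelCall`, and `callWithFallback` are hypothetical names.

```typescript
// Hypothetical stand-in for a model call: takes a prompt, resolves to text.
type ModelCall = (prompt: string) => Promise<string>;

interface Provider {
  name: string;
  call: ModelCall;
}

// Try each provider in order and return the first successful result.
// If every provider fails, throw one error listing each failure.
async function callWithFallback(
  providers: Provider[],
  prompt: string
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

Ordering the array as Vertex first, Anthropic second would mirror the Sonnet4 example above; a real wrapper would likely also distinguish retryable errors from permanent ones.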

### Testing Strategy (`tests/`)

#### **Test Structure**
```
tests/
├── agents/integration/          # End-to-end agent tests
├── steps/integration/           # Step execution tests
├── tools/
│   ├── integration/            # Tool + LLM integration tests
│   └── unit/                   # Pure function/schema tests
├── workflows/integration/       # Full workflow tests
├── globalSetup.ts
└── testSetup.ts
```

#### **Testing Philosophy**

**Unit Tests** (`tests/tools/unit/`):
- Test data structures, schemas, and logic flows
- Validate input/output schemas with Zod
- Test error handling and edge cases
- Mock external dependencies
- **DO NOT** test LLM quality/performance
- Focus on: "Does the function work correctly?"

**Integration Tests** (`tests/*/integration/`):
- Test agents/tools/steps with real LLM calls
- Verify workflow execution and data flow
- Test that agents can use tools successfully
- Validate message passing between agents
- **DO NOT** evaluate response quality
- Focus on: "Does the system work end-to-end?"

### Evaluation Strategy (`evals/`)

#### **Evaluation Structure**
```
evals/
├── agents/
│   └── analyst-agent/
│       └── workflow-match.eval.ts
├── online-scorer/
│   └── todos.ts
├── steps/
│   └── todos/
│       ├── scorers.ts
│       └── todos-general-expected.eval.ts
└── workflows/
    ├── analyst-workflow-general.eval.ts
    └── analyst-workflow-redo.eval.private.ts
```

#### **Evaluation Philosophy**

**Evaluations** (`.eval.ts` files):
- Use Braintrust for LLM performance evaluation
- Test actual LLM response quality and correctness
- Use LLM-as-Judge patterns for scoring
- Include datasets for consistent evaluation
- Focus on: "Does the LLM produce good results?"

**Key Distinction**:
- **Tests** verify the system works (data flows, schemas, execution)
- **Evaluations** verify the LLM produces quality outputs
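
To make the distinction concrete, an evaluation scores output quality on a scale rather than asserting pass/fail. The sketch below is a deterministic scorer (not an LLM-as-Judge scorer) that returns a name and a score between 0 and 1; the `ScorerArgs` and `Score` shapes and the `todoRecall` function are assumptions for illustration, not the scorers in `evals/steps/todos/scorers.ts`.

```typescript
// Hypothetical shapes: what the step produced vs. what the dataset expects.
interface ScorerArgs {
  output: string[];   // todos produced by the step under evaluation
  expected: string[]; // reference todos from the dataset
}

interface Score {
  name: string;
  score: number; // between 0 and 1
}

// Fraction of expected todos that appear in the output,
// compared case-insensitively after trimming whitespace.
function todoRecall({ output, expected }: ScorerArgs): Score {
  if (expected.length === 0) {
    return { name: 'todo_recall', score: 1 };
  }
  const produced = new Set(output.map((t) => t.trim().toLowerCase()));
  const hits = expected.filter((t) => produced.has(t.trim().toLowerCase())).length;
  return { name: 'todo_recall', score: hits / expected.length };
}
```

A unit test, by contrast, would only assert that the step returns an array of strings matching its schema, without judging whether the todos are good.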

## Multi-Agent Workflow Patterns

### Example: Analyst Workflow

The analyst workflow demonstrates the multi-agent pattern:

1. **Parallel Initial Steps**: `generateChatTitleStep`, `extractValuesSearchStep`, `createTodosStep`
2. **Think and Prep Agent**: Processes initial analysis
3. **Conditional Branching**: Only runs analyst agent if needed
4. **Message History Passing**: Critical for agent-to-agent communication

#### Message History Flow

```typescript
// In think-and-prep-step.ts
conversationHistory = extractMessageHistory(step.response.messages);

// In analyst-step.ts
const formattedMessages = formatMessagesForAnalyst(
  inputData.conversationHistory,
  initialPrompt
);
```

**Key Pattern**: Message history from one agent becomes input to the next agent, preserving conversation context and tool usage.
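
The hand-off can be sketched without any framework code. The real `extractMessageHistory()` and `formatMessagesForAnalyst()` live in `src/utils/memory/` and their exact behavior is not shown here, so the functions below (`extractHistory`, `formatForNextAgent`) and the simplified `Message` type are hypothetical stand-ins that only illustrate the flow.

```typescript
// Simplified message shape for illustration; the package works with the AI SDK's CoreMessage.
type Role = 'system' | 'user' | 'assistant' | 'tool';

interface Message {
  role: Role;
  content: string;
}

// Stand-in for extractMessageHistory: assume system messages are dropped
// so the next agent can supply its own instructions.
function extractHistory(messages: Message[]): Message[] {
  return messages.filter((m) => m.role !== 'system');
}

// Stand-in for formatMessagesForAnalyst: reuse prior history when present,
// otherwise start a fresh conversation from the initial prompt.
function formatForNextAgent(history: Message[] | undefined, initialPrompt: string): Message[] {
  if (history && history.length > 0) {
    return [...history];
  }
  return [{ role: 'user', content: initialPrompt }];
}
```

Because tool calls and tool results stay in the history, the second agent sees what the first agent already looked up instead of repeating the work.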

## Key Development Patterns

### Agent Definition Pattern
```typescript
export const agentName = new Agent({
  name: 'Agent Name',
  instructions: getInstructions,
  model: Sonnet4,  // Can use Sonnet4, Haiku35, or anthropicCachedModel('model-id')
  tools: { tool1, tool2, tool3 },
  memory: getSharedMemory(),
  defaultGenerateOptions: DEFAULT_OPTIONS,
  defaultStreamOptions: DEFAULT_OPTIONS,
});
```

### Tool Definition Pattern
```typescript
const inputSchema = z.object({
  param: z.string().describe('Parameter description')
});

const outputSchema = z.object({
  result: z.string()
});

const executeFunction = wrapTraced(
  async (params) => {
    // Tool logic here
  },
  { name: 'tool-name' }
);

export const toolName = createTool({
  id: 'tool-id',
  description: 'Tool description for agent understanding',
  inputSchema,
  outputSchema,
  execute: executeFunction,
});
```

### Step Definition Pattern
```typescript
const inputSchema = z.object({
  // Input from previous steps
});

const outputSchema = z.object({
  // Output for next steps
});

const execution = async ({ inputData, getInitData, runtimeContext }) => {
  // Step logic with agent execution
  // Extract message history for multi-agent workflows
  // Return structured output
};

export const stepName = createStep({
  id: 'step-id',
  description: 'Step description',
  inputSchema,
  outputSchema,
  execute: execution,
});
```

### Workflow Definition Pattern
```typescript
const workflow = createWorkflow({
  id: 'workflow-id',
  inputSchema,
  outputSchema,
  steps: [step1, step2, step3],
})
  .parallel([step1, step2, step3])
  .then(step4)
  .branch([
    [condition, step5],
  ])
  .commit();
```
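
The `condition` passed to `.branch()` above is left undefined in the pattern. A minimal sketch of what it might look like follows, assuming the condition is an async predicate that receives the previous step's output as `inputData`; the `finished` field and the `ThinkAndPrepOutput` interface are hypothetical.

```typescript
// Hypothetical output shape of the step that runs before the branch.
interface ThinkAndPrepOutput {
  finished: boolean;
}

// Branch condition: run the analyst step only when the previous step
// has not already produced a final answer.
const shouldRunAnalyst = async (
  { inputData }: { inputData: ThinkAndPrepOutput }
): Promise<boolean> => inputData.finished === false;
```

Keeping the predicate as a named function makes it straightforward to unit test without running the workflow.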

### Message History
- Critical for multi-agent workflows
- Extracted via `extractMessageHistory()` from step responses
- Formatted via `formatMessagesForAnalyst()` for agent consumption
- Preserves tool calls and results between agents

### Runtime Context
- Passes workflow-specific data between steps
- Type-safe with interfaces like `AnalystRuntimeContext`
- Includes user/thread/organization identifiers

## Best Practices

1. **Tool Organization**: Group tools by functional category
2. **Schema Validation**: Always use Zod schemas for input/output
3. **Observability**: Wrap functions with `wrapTraced()` for monitoring
4. **Message Passing**: Use structured message history for multi-agent workflows
5. **Testing Strategy**: Unit tests for logic, integration tests for flow, evaluations for quality
6. **Memory Management**: Use shared memory for conversation persistence
7. **Error Handling**: Handle failures gracefully and return user-friendly error messages
8. **Type Safety**: Leverage TypeScript with strict configuration

## Environment Variables

Required environment variables:
- `BRAINTRUST_KEY`: For observability and evaluations
- `ANTHROPIC_API_KEY`: For Claude model access
- Additional keys for specific tools (database connections, etc.)
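
A startup check can fail fast when a required variable is missing. This is a sketch under the assumption that only the two named variables are mandatory; `REQUIRED_ENV_VARS` and `findMissingEnvVars` are hypothetical names, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// The two variables named above; tool-specific keys are intentionally left out.
const REQUIRED_ENV_VARS = ['BRAINTRUST_KEY', 'ANTHROPIC_API_KEY'] as const;

// Returns the names of required variables that are missing or empty
// in the given environment object.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}
```

Calling `findMissingEnvVars(process.env)` before running tests or evals and throwing when the result is non-empty gives a clearer error than a failed API call later on.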

## Conversation History Management

### Overview

The AI package supports multi-turn conversations by managing conversation history through the database. This enables workflows to maintain context across multiple interactions.

### Key Components

#### Chat History Utilities (`src/steps/get-chat-history.ts`)

Provides functions for retrieving conversation history:

```typescript
// Get all messages with metadata for a chat
getChatHistory(chatId: string): Promise<ChatHistoryResult[]>

// Get just the raw LLM messages for a chat
getRawLlmMessages(chatId: string): Promise<MessageHistory[]>

// Get raw LLM messages for a specific message ID
getRawLlmMessagesByMessageId(messageId: string): Promise<MessageHistory | null>
```

#### Database Integration

The chat history utilities use the `@buster/database` helpers for clean separation of concerns:
- Database operations stay in the database package
- Type validation and transformation happen in the AI package

### Conversation History Flow

#### 1. Initial Message with Database Save

```typescript
// First run - with messageId for database persistence
const messageId = await createTestMessage(chatId, userId);
const runtimeContext = new RuntimeContext();
runtimeContext.set('messageId', messageId);

const result = await analystWorkflow.createRun().start({
  inputData: { prompt: "Initial question" },
  runtimeContext,
});

// Conversation history is automatically saved to database
```

#### 2. Retrieving Conversation History

```typescript
// Fetch the conversation history from the database
import { getRawLlmMessagesByMessageId } from '@buster/ai';

const conversationHistory = await getRawLlmMessagesByMessageId(messageId);
// Returns: CoreMessage[] or null
```

#### 3. Follow-up with History

```typescript
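// Second run - with conversation history
// getRawLlmMessagesByMessageId can return null, so check before casting
if (!conversationHistory) {
  throw new Error(`No conversation history found for message ${messageId}`);
}

const followUpResult = await analystWorkflow.createRun().start({
  inputData: {
    prompt: "Follow-up question",
    conversationHistory: conversationHistory as CoreMessage[],
  },
  runtimeContext,
});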
```

### Testing Conversation History

See `tests/workflows/integration/analyst-workflow.int.test.ts` for examples:

```typescript
test('conversation history flow', async () => {
  // 1. Create initial message
  const { chatId, userId } = await createTestChat();
  const messageId = await createTestMessage(chatId, userId);

  // 2. Run workflow with messageId
  const runtimeContext = new RuntimeContext();
  runtimeContext.set('messageId', messageId);

  const firstRun = await analystWorkflow.createRun().start({
    inputData: { prompt: "First question" },
    runtimeContext,
  });

  // 3. Retrieve conversation history
  const history = await getRawLlmMessagesByMessageId(messageId);
  expect(history).not.toBeNull(); // history must exist before the cast below

  // 4. Run follow-up with history
  const secondRun = await analystWorkflow.createRun().start({
    inputData: {
      prompt: "Follow-up question",
      conversationHistory: history as CoreMessage[],
    },
    runtimeContext,
  });
});
```

### Best Practices

1. **Use MessageId for Persistence**: Always provide a `messageId` in runtime context when you want to save conversation history
2. **Type Safety**: Cast retrieved history to `CoreMessage[]` after validation
3. **Handle Null Cases**: Check if history exists before using it
4. **Test Both Paths**: Test workflows both with and without conversation history
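
Practices 2 to 4 can be combined in one small helper that decides whether history is attached to the workflow input. This is a sketch only: `buildWorkflowInput`, `WorkflowInput`, and the simplified `Message` type are hypothetical, and the assumption that `conversationHistory` is an optional input field comes from the examples above.

```typescript
// Simplified stand-in for the AI SDK's CoreMessage type.
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Assumed input shape, based on the workflow examples above.
interface WorkflowInput {
  prompt: string;
  conversationHistory?: Message[];
}

// Attach history only when it exists and is non-empty, so a null result
// from the database never reaches the workflow as a bad cast.
function buildWorkflowInput(prompt: string, history: Message[] | null): WorkflowInput {
  if (history === null || history.length === 0) {
    return { prompt };
  }
  return { prompt, conversationHistory: history };
}
```

Both paths are then easy to cover in a unit test: one call with `null` and one with a populated array.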