# CLAUDE.md

This file provides guidance to Claude Code when working with code in this repository.

## Development Commands

```bash
# Development
bun run dev                                 # Run Mastra dev server (mastra dev --dir src)

# Testing
bun run test                                # Run all tests (vitest run)
bun run test:watch                          # Run tests in watch mode (vitest watch)
bun run test:coverage                       # Run tests with coverage (vitest run --coverage)

# Testing specific files
bun test src/agents/weather-agent.test.ts

# Run evaluation tests
npm run eval                                # Run all evaluations with Braintrust
npm run eval:file weather-agent.eval.ts     # Run specific eval file
npm run eval:watch                          # Run evaluations in watch mode
npm run eval:dev                            # Run evaluations in dev mode

# From root directory
bun run lint packages/ai                    # Run Biome linter
bun run lint:fix packages/ai                # Fix linting issues
bun run format packages/ai                  # Check formatting
bun run format:fix packages/ai              # Fix formatting
bun run typecheck packages/ai               # Run TypeScript type checking
```

## Architecture Overview

This package implements AI agents and tools using the Mastra framework, integrated with observability through Braintrust. The codebase follows a modular pattern designed for building complex multi-agent workflows.

## Folder Structure & Patterns

### Source Code (`src/`)

#### **Agents** (`src/agents/`)

```
agents/
├── analyst-agent/
│   ├── analyst-agent.ts                # Agent definition
│   └── analyst-agent-instructions.ts   # Instructions/prompts
└── think-and-prep-agent/
    ├── think-and-prep-agent.ts
    └── think-and-prep-instructions.ts
```

**Pattern**: Each agent gets its own folder with:
- Main agent file (defines tools, model, memory, options)
- Instructions file (contains system prompts and behavior definitions)
- Uses `Agent` from Mastra with `anthropicCachedModel`
- Shared memory via `getSharedMemory()`
- Standard options: `maxSteps: 18, temperature: 0, maxTokens: 10000`

#### **Steps** (`src/steps/`)

```
steps/
├── analyst-step.ts
├── create-todos-step.ts
├── extract-values-search-step.ts
├── generate-chat-title-step.ts
├── get-chat-history.ts
└── think-and-prep-step.ts
```

**Pattern**: Steps orchestrate agent execution within workflows:
- Use `createStep()` from Mastra
- Define input/output schemas with Zod
- Execute agents with proper context passing
- Handle message history extraction and formatting
- Wrap execution with `wrapTraced()` for observability
- Pass data between steps through structured schemas

#### **Tools** (`src/tools/`)

```
tools/
├── communication-tools/        # Agent-to-agent communication
│   ├── done-tool.ts
│   ├── respond-without-asset-creation.ts
│   └── submit-thoughts-tool.ts
├── database-tools/             # Data access
│   └── find-required-text-values.ts
├── file-tools/                 # File operations
│   ├── bash-tool.ts
│   ├── edit-file-tool.ts
│   ├── read-file-tool.ts
│   └── write-file-tool.ts
├── planning-thinking-tools/    # Strategic planning
│   ├── create-plan-investigative-tool.ts
│   ├── create-plan-straightforward-tool.ts
│   ├── review-plan-tool.ts
│   └── sequential-thinking-tool.ts
├── visualization-tools/        # Dashboard/metrics creation
│   ├── create-dashboards-file-tool.ts
│   ├── create-metrics-file-tool.ts
│   ├── modify-dashboards-file-tool.ts
│   └── modify-metrics-file-tool.ts
└── index.ts                    # Tool exports
```

**Pattern**: Tools are categorized by function:
- Use `createTool()` from Mastra
- Define input/output schemas with Zod
- Wrap main execution with `wrapTraced()` for observability
- Include detailed descriptions for agent understanding
- Export via `tools/index.ts` for easy importing

#### **Workflows** (`src/workflows/`)

```
workflows/
└── analyst-workflow.ts    # Multi-step workflow definition
```

**Pattern**: Workflows orchestrate multiple steps and agents:
- Use `createWorkflow()` from Mastra
- Define input/output schemas with Zod
- Chain steps with `.parallel()`, `.then()`, `.branch()` patterns
- Include runtime context interfaces for type safety
- Support conditional branching based on step outputs

#### **Utils** (`src/utils/`)

```
utils/
├── convertToCoreMessages.ts
├── shared-memory.ts
├── memory/
│   ├── agent-memory.ts
│   ├── message-history.ts    # Message passing between agents
│   ├── types.ts              # Message/step data types
│   └── index.ts
└── models/
    ├── ai-fallback.ts        # Fallback model wrapper with retry logic
    ├── anthropic.ts          # Basic Anthropic model wrapper
    ├── anthropic-cached.ts   # Anthropic with caching support
    ├── vertex.ts             # Google Vertex AI model wrapper
    ├── sonnet-4.ts           # Claude Sonnet 4 with fallback
    └── haiku-3-5.ts          # Claude Haiku 3.5 with fallback
```

**Pattern**: Utilities support core functionality:
- **Memory**: Handles message history between agents in multi-step workflows
- **Models**: Provides various AI model configurations with fallback support
- **Message History**: Critical for multi-agent workflows - extracts and formats messages for passing between agents

##### Model Configuration Pattern

The models folder provides different AI model configurations with automatic fallback support:

1. **Base Model Wrappers** (`anthropic.ts`, `vertex.ts`):
   - Wrap AI SDK models with Braintrust tracing
   - Handle authentication and configuration
   - Provide consistent interface for model usage

2. **Fallback Models** (`sonnet-4.ts`, `haiku-3-5.ts`):
   - Use `createFallback()` to define multiple model providers
   - Automatically switch between providers on errors
   - Configure retry behavior and error handling
   - Example: Sonnet4 tries Vertex first, falls back to Anthropic

3. **Cached Model** (`anthropic-cached.ts`):
   - Adds caching support to Anthropic models
   - Automatically adds cache_control to system messages
   - Includes connection pooling for better performance
   - Used by agents requiring prompt caching

**Usage Example**:

```typescript
// For general use with fallback support
import { Sonnet4, Haiku35 } from '@buster/ai';

// For agents with complex prompts needing caching
import { anthropicCachedModel } from '@buster/ai';

// Direct model usage (no fallback)
import { anthropicModel, vertexModel } from '@buster/ai';
```

### Testing Strategy (`tests/`)

#### **Test Structure**

```
tests/
├── agents/integration/       # End-to-end agent tests
├── steps/integration/        # Step execution tests
├── tools/
│   ├── integration/          # Tool + LLM integration tests
│   └── unit/                 # Pure function/schema tests
├── workflows/integration/    # Full workflow tests
├── globalSetup.ts
└── testSetup.ts
```

#### **Testing Philosophy**

**Unit Tests** (`tests/tools/unit/`):
- Test data structures, schemas, and logic flows
- Validate input/output schemas with Zod
- Test error handling and edge cases
- Mock external dependencies
- **DO NOT** test LLM quality/performance
- Focus on: "Does the function work correctly?"

**Integration Tests** (`tests/*/integration/`):
- Test agents/tools/steps with real LLM calls
- Verify workflow execution and data flow
- Test that agents can use tools successfully
- Validate message passing between agents
- **DO NOT** evaluate response quality
- Focus on: "Does the system work end-to-end?"

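The unit-test guidance above (validate schemas, cover error handling, mock external dependencies) can be illustrated with a small framework-free sketch. Everything here is hypothetical: `parseSearchInput`, `runSearchTool`, and `mockLookup` are not part of this package, the hand-rolled validator stands in for a Zod schema, and the lookup is synchronous only to keep the example short (a real tool would be async and tested with vitest).

```typescript
// Hypothetical tool input; in the repository this would be a Zod schema.
interface SearchInput {
  query: string;
  limit: number;
}

type ToolResult = { values: string[] } | { error: string };

// Hand-rolled validation standing in for a schema parse.
function parseSearchInput(raw: unknown): SearchInput | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const query = record['query'];
  const limit = record['limit'];
  if (typeof query !== 'string' || query.length === 0) return null;
  if (typeof limit !== 'number' || !Number.isInteger(limit) || limit < 1) return null;
  return { query, limit };
}

// The external dependency is injected so a unit test can replace it.
type Lookup = (query: string, limit: number) => string[];

function runSearchTool(raw: unknown, lookup: Lookup): ToolResult {
  const input = parseSearchInput(raw);
  if (input === null) return { error: 'Invalid input' };
  try {
    return { values: lookup(input.query, input.limit) };
  } catch {
    // Surface a friendly message instead of leaking the underlying failure.
    return { error: 'Lookup failed' };
  }
}

// Mock that records its calls instead of touching a database.
const calls: Array<[string, number]> = [];
const mockLookup: Lookup = (query, limit) => {
  calls.push([query, limit]);
  return ['Acme Corp'];
};

const valid = runSearchTool({ query: 'acme', limit: 5 }, mockLookup);
const invalid = runSearchTool({ query: '', limit: 5 }, mockLookup);
```

None of these checks involve an LLM, which is the point of the unit layer: it answers whether the function behaves correctly, and leaves response quality to evaluations.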
### Evaluation Strategy (`evals/`)

#### **Evaluation Structure**

```
evals/
├── agents/
│   └── analyst-agent/
│       └── workflow-match.eval.ts
├── online-scorer/
│   └── todos.ts
├── steps/
│   └── todos/
│       ├── scorers.ts
│       └── todos-general-expected.eval.ts
└── workflows/
    ├── analyst-workflow-general.eval.ts
    └── analyst-workflow-redo.eval.private.ts
```

#### **Evaluation Philosophy**

**Evaluations** (`.eval.ts` files):
- Use Braintrust for LLM performance evaluation
- Test actual LLM response quality and correctness
- Use LLM-as-Judge patterns for scoring
- Include datasets for consistent evaluation
- Focus on: "Does the LLM produce good results?"

**Key Distinction**:
- **Tests** verify the system works (data flows, schemas, execution)
- **Evaluations** verify the LLM produces quality outputs

## Multi-Agent Workflow Patterns

### Example: Analyst Workflow

The analyst workflow demonstrates the multi-agent pattern:

1. **Parallel Initial Steps**: `generateChatTitleStep`, `extractValuesSearchStep`, `createTodosStep`
2. **Think and Prep Agent**: Processes initial analysis
3. **Conditional Branching**: Only runs analyst agent if needed
4. **Message History Passing**: Critical for agent-to-agent communication

#### Message History Flow

```typescript
// In think-and-prep-step.ts
conversationHistory = extractMessageHistory(step.response.messages);

// In analyst-step.ts
const formattedMessages = formatMessagesForAnalyst(
  inputData.conversationHistory,
  initialPrompt
);
```

**Key Pattern**: Message history from one agent becomes input to the next agent, preserving conversation context and tool usage.

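The hand-off described above can be sketched with plain data. This is an illustration only, not the repository's implementation: `SketchMessage`, `extractHistorySketch`, and `formatForNextAgentSketch` are hypothetical stand-ins for `CoreMessage`, `extractMessageHistory()`, and `formatMessagesForAnalyst()`, and dropping system messages during extraction is an assumption made for the example.

```typescript
// Hypothetical, simplified message shape; the real code uses CoreMessage.
type Role = 'system' | 'user' | 'assistant' | 'tool';

interface SketchMessage {
  role: Role;
  content: string;
}

// Stand-in for extractMessageHistory(): keep the conversation and tool
// results, but drop the previous agent's system prompt (assumed behavior).
function extractHistorySketch(messages: SketchMessage[]): SketchMessage[] {
  return messages.filter((message) => message.role !== 'system');
}

// Stand-in for formatMessagesForAnalyst(): reuse prior history when there
// is any, otherwise start a fresh conversation from the initial prompt.
function formatForNextAgentSketch(
  history: SketchMessage[] | undefined,
  initialPrompt: string,
): SketchMessage[] {
  if (history && history.length > 0) {
    return [...history];
  }
  return [{ role: 'user', content: initialPrompt }];
}

// Messages produced by the first agent, including a tool result.
const firstAgentMessages: SketchMessage[] = [
  { role: 'system', content: 'You are the think-and-prep agent.' },
  { role: 'user', content: 'How did revenue change last quarter?' },
  { role: 'assistant', content: 'Plan: query quarterly revenue.' },
  { role: 'tool', content: '{"status":"plan submitted"}' },
];

// What the second agent would receive as its starting conversation.
const handoff = formatForNextAgentSketch(
  extractHistorySketch(firstAgentMessages),
  'How did revenue change last quarter?',
);
```

The second agent starts from the user question, the first agent's reasoning, and the tool result, so it does not need to repeat work already done.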
## Key Development Patterns

### Agent Definition Pattern

```typescript
export const agentName = new Agent({
  name: 'Agent Name',
  instructions: getInstructions,
  model: Sonnet4, // Can use Sonnet4, Haiku35, or anthropicCachedModel('model-id')
  tools: { tool1, tool2, tool3 },
  memory: getSharedMemory(),
  defaultGenerateOptions: DEFAULT_OPTIONS,
  defaultStreamOptions: DEFAULT_OPTIONS,
});
```

### Tool Definition Pattern

```typescript
const inputSchema = z.object({
  param: z.string().describe('Parameter description'),
});

const outputSchema = z.object({
  result: z.string(),
});

const executeFunction = wrapTraced(
  async (params) => {
    // Tool logic here
  },
  { name: 'tool-name' }
);

export const toolName = createTool({
  id: 'tool-id',
  description: 'Tool description for agent understanding',
  inputSchema,
  outputSchema,
  execute: executeFunction,
});
```

### Step Definition Pattern

```typescript
const inputSchema = z.object({
  // Input from previous steps
});

const outputSchema = z.object({
  // Output for next steps
});

const execution = async ({ inputData, getInitData, runtimeContext }) => {
  // Step logic with agent execution
  // Extract message history for multi-agent workflows
  // Return structured output
};

export const stepName = createStep({
  id: 'step-id',
  description: 'Step description',
  inputSchema,
  outputSchema,
  execute: execution,
});
```

### Workflow Definition Pattern

```typescript
const workflow = createWorkflow({
  id: 'workflow-id',
  inputSchema,
  outputSchema,
  steps: [step1, step2, step3],
})
  .parallel([step1, step2, step3])
  .then(step4)
  .branch([
    [condition, step5],
  ])
  .commit();
```

### Message History

- Critical for multi-agent workflows
- Extracted via `extractMessageHistory()` from step responses
- Formatted via `formatMessagesForAnalyst()` for agent consumption
- Preserves tool calls and results between agents

### Runtime Context

- Passes workflow-specific data between steps
- Type-safe with interfaces like `AnalystRuntimeContext`
- Includes user/thread/organization identifiers

## Best Practices

1. **Tool Organization**: Group tools by functional category
2. **Schema Validation**: Always use Zod schemas for input/output
3. **Observability**: Wrap functions with `wrapTraced()` for monitoring
4. **Message Passing**: Use structured message history for multi-agent workflows
5. **Testing Strategy**: Unit tests for logic, integration tests for flow, evaluations for quality
6. **Memory Management**: Use shared memory for conversation persistence
7. **Error Handling**: Handle errors gracefully and return user-friendly error messages
8. **Type Safety**: Leverage TypeScript with strict configuration

## Environment Variables

Required environment variables:
- `BRAINTRUST_KEY`: For observability and evaluations
- `ANTHROPIC_API_KEY`: For Claude model access
- Additional keys for specific tools (database connections, etc.)

## Conversation History Management

### Overview

The AI package supports multi-turn conversations by managing conversation history through the database. This enables workflows to maintain context across multiple interactions.

### Key Components

#### Chat History Utilities (`src/steps/get-chat-history.ts`)

Provides functions for retrieving conversation history:

```typescript
// Get all messages with metadata for a chat
getChatHistory(chatId: string): Promise

// Get just the raw LLM messages for a chat
getRawLlmMessages(chatId: string): Promise

// Get raw LLM messages for a specific message ID
getRawLlmMessagesByMessageId(messageId: string): Promise
```

#### Database Integration

The chat history utilities use the `@buster/database` helpers for clean separation of concerns:
- Database operations stay in the database package
- Type validation and transformation happen in the AI package

### Conversation History Flow

#### 1. Initial Message with Database Save

```typescript
// First run - with messageId for database persistence
const messageId = await createTestMessage(chatId, userId);
const runtimeContext = new RuntimeContext();
runtimeContext.set('messageId', messageId);

const result = await analystWorkflow.createRun().start({
  inputData: { prompt: "Initial question" },
  runtimeContext,
});
// Conversation history is automatically saved to database
```

#### 2. Retrieving Conversation History

```typescript
// Fetch the conversation history from the database
import { getRawLlmMessagesByMessageId } from '@buster/ai';

const conversationHistory = await getRawLlmMessagesByMessageId(messageId);
// Returns: CoreMessage[] or null
```

#### 3. Follow-up with History

```typescript
// Second run - with conversation history
const followUpResult = await analystWorkflow.createRun().start({
  inputData: {
    prompt: "Follow-up question",
    conversationHistory: conversationHistory as CoreMessage[],
  },
  runtimeContext,
});
```

### Testing Conversation History

See `tests/workflows/integration/analyst-workflow.int.test.ts` for examples:

```typescript
test('conversation history flow', async () => {
  // 1. Create initial message
  const { chatId, userId } = await createTestChat();
  const messageId = await createTestMessage(chatId, userId);

  // 2. Run workflow with messageId
  const runtimeContext = new RuntimeContext();
  runtimeContext.set('messageId', messageId);

  const firstRun = await analystWorkflow.createRun().start({
    inputData: { prompt: "First question" },
    runtimeContext,
  });

  // 3. Retrieve conversation history (null if nothing was saved)
  const history = await getRawLlmMessagesByMessageId(messageId);
  expect(history).not.toBeNull();

  // 4. Run follow-up with history
  const secondRun = await analystWorkflow.createRun().start({
    inputData: {
      prompt: "Follow-up question",
      conversationHistory: history as CoreMessage[],
    },
    runtimeContext,
  });
});
```

### Best Practices

1. **Use MessageId for Persistence**: Always provide a `messageId` in runtime context when you want to save conversation history
2. **Type Safety**: Cast retrieved history to `CoreMessage[]` only after validating it
3. **Handle Null Cases**: Check that history exists before using it
4. **Test Both Paths**: Test workflows both with and without conversation history
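
The null-handling and validate-before-cast practices above might look like the following sketch. It assumes only that retrieved history is either `null` or an array of objects with a `role` and `content`; `MessageLike`, `toConversationHistory`, and `buildFollowUpInput` are hypothetical names introduced for illustration, and treating an empty array as "no history" is a design choice of this example rather than documented behavior.

```typescript
// Minimal stand-in for CoreMessage; only the fields checked here.
interface MessageLike {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: unknown;
}

const KNOWN_ROLES: string[] = ['system', 'user', 'assistant', 'tool'];

// Type guard: accepts only objects with a known role and a content field.
function isMessageLike(value: unknown): value is MessageLike {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const role = candidate['role'];
  return (
    typeof role === 'string' &&
    KNOWN_ROLES.indexOf(role) !== -1 &&
    'content' in candidate
  );
}

// Returns validated history, or undefined when nothing usable was stored.
function toConversationHistory(raw: unknown): MessageLike[] | undefined {
  if (!Array.isArray(raw) || raw.length === 0) return undefined;
  return raw.every(isMessageLike) ? (raw as MessageLike[]) : undefined;
}

// Builds workflow input for either path: with or without prior history.
function buildFollowUpInput(prompt: string, raw: unknown) {
  const conversationHistory = toConversationHistory(raw);
  return conversationHistory ? { prompt, conversationHistory } : { prompt };
}

// First turn: nothing stored yet, so no history is attached.
const firstTurn = buildFollowUpInput('Initial question', null);

// Follow-up: stored messages pass validation and are attached.
const followUp = buildFollowUpInput('Follow-up question', [
  { role: 'user', content: 'Initial question' },
  { role: 'assistant', content: 'Initial answer' },
]);
```

Routing both paths through one helper keeps the cast in a single place and makes the "with history" and "without history" cases easy to test side by side.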