# CLAUDE.md
This file provides guidance to Claude Code when working with code in this repository.
## Development Commands

```bash
# Development
bun run dev                # Run Mastra dev server (mastra dev --dir src)

# Testing
bun run test               # Run all tests (vitest run)
bun run test:watch         # Run tests in watch mode (vitest watch)
bun run test:coverage      # Run tests with coverage (vitest run --coverage)

# Testing specific files
bun test src/agents/weather-agent.test.ts

# Run evaluation tests
npm run eval                            # Run all evaluations with Braintrust
npm run eval:file weather-agent.eval.ts # Run specific eval file
npm run eval:watch                      # Run evaluations in watch mode
npm run eval:dev                        # Run evaluations in dev mode

# From root directory
bun run lint packages/ai       # Run Biome linter
bun run lint:fix packages/ai   # Fix linting issues
bun run format packages/ai     # Check formatting
bun run format:fix packages/ai # Fix formatting
bun run typecheck packages/ai  # Run TypeScript type checking
```
## Architecture Overview
This package implements AI agents and tools using the Mastra framework, integrated with observability through Braintrust. The codebase follows a modular pattern designed for building complex multi-agent workflows.
## Folder Structure & Patterns

### Source Code (`src/`)

#### Agents (`src/agents/`)

```
agents/
├── analyst-agent/
│   ├── analyst-agent.ts              # Agent definition
│   └── analyst-agent-instructions.ts # Instructions/prompts
└── think-and-prep-agent/
    ├── think-and-prep-agent.ts
    └── think-and-prep-instructions.ts
```

**Pattern**: Each agent gets its own folder with:
- Main agent file (defines tools, model, memory, options)
- Instructions file (contains system prompts and behavior definitions)
- Uses `Agent` from Mastra with `anthropicCachedModel`
- Shared memory via `getSharedMemory()`
- Standard options: `maxSteps: 18, temperature: 0, maxTokens: 10000`
#### Steps (`src/steps/`)

```
steps/
├── analyst-step.ts
├── create-todos-step.ts
├── extract-values-search-step.ts
├── generate-chat-title-step.ts
├── get-chat-history.ts
└── think-and-prep-step.ts
```

**Pattern**: Steps orchestrate agent execution within workflows:
- Use `createStep()` from Mastra
- Define input/output schemas with Zod
- Execute agents with proper context passing
- Handle message history extraction and formatting
- Wrap execution with `wrapTraced()` for observability
- Pass data between steps through structured schemas
#### Tools (`src/tools/`)

```
tools/
├── communication-tools/     # Agent-to-agent communication
│   ├── done-tool.ts
│   ├── respond-without-asset-creation.ts
│   └── submit-thoughts-tool.ts
├── database-tools/          # Data access
│   └── find-required-text-values.ts
├── file-tools/              # File operations
│   ├── bash-tool.ts
│   ├── edit-file-tool.ts
│   ├── read-file-tool.ts
│   └── write-file-tool.ts
├── planning-thinking-tools/ # Strategic planning
│   ├── create-plan-investigative-tool.ts
│   ├── create-plan-straightforward-tool.ts
│   ├── review-plan-tool.ts
│   └── sequential-thinking-tool.ts
├── visualization-tools/     # Dashboard/metrics creation
│   ├── create-dashboards-file-tool.ts
│   ├── create-metrics-file-tool.ts
│   ├── modify-dashboards-file-tool.ts
│   └── modify-metrics-file-tool.ts
└── index.ts                 # Tool exports
```

**Pattern**: Tools are categorized by function:
- Use `createTool()` from Mastra
- Define input/output schemas with Zod
- Wrap main execution with `wrapTraced()` for observability
- Include detailed descriptions for agent understanding
- Export via `tools/index.ts` for easy importing
#### Workflows (`src/workflows/`)

```
workflows/
└── analyst-workflow.ts # Multi-step workflow definition
```

**Pattern**: Workflows orchestrate multiple steps and agents:
- Use `createWorkflow()` from Mastra
- Define input/output schemas with Zod
- Chain steps with `.parallel()`, `.then()`, `.branch()` patterns
- Include runtime context interfaces for type safety
- Support conditional branching based on step outputs
#### Utils (`src/utils/`)

```
utils/
├── convertToCoreMessages.ts
├── shared-memory.ts
├── memory/
│   ├── agent-memory.ts
│   ├── message-history.ts # Message passing between agents
│   ├── types.ts           # Message/step data types
│   └── index.ts
└── models/
    ├── ai-fallback.ts      # Fallback model wrapper with retry logic
    ├── anthropic.ts        # Basic Anthropic model wrapper
    ├── anthropic-cached.ts # Anthropic with caching support
    ├── vertex.ts           # Google Vertex AI model wrapper
    ├── sonnet-4.ts         # Claude Sonnet 4 with fallback
    └── haiku-3-5.ts        # Claude Haiku 3.5 with fallback
```

**Pattern**: Utilities support core functionality:
- **Memory**: Handles message history between agents in multi-step workflows
- **Models**: Provides various AI model configurations with fallback support
- **Message History**: Critical for multi-agent workflows; extracts and formats messages for passing between agents
### Model Configuration Pattern

The models folder provides different AI model configurations with automatic fallback support:

- **Base Model Wrappers** (`anthropic.ts`, `vertex.ts`):
  - Wrap AI SDK models with Braintrust tracing
  - Handle authentication and configuration
  - Provide a consistent interface for model usage

- **Fallback Models** (`sonnet-4.ts`, `haiku-3-5.ts`):
  - Use `createFallback()` to define multiple model providers
  - Automatically switch between providers on errors
  - Configure retry behavior and error handling
  - Example: Sonnet4 tries Vertex first, falls back to Anthropic

- **Cached Model** (`anthropic-cached.ts`):
  - Adds caching support to Anthropic models
  - Automatically adds `cache_control` to system messages
  - Includes connection pooling for better performance
  - Used by agents requiring prompt caching
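The `cache_control` injection can be pictured with a small sketch. This is a hypothetical illustration, not the actual `anthropic-cached.ts` implementation, and the `providerOptions` shape shown here is an assumption:

```typescript
// Hypothetical sketch: mark system messages as cacheable before sending.
// The Message shape and providerOptions field are assumptions for
// illustration, not the real anthropic-cached.ts types.
type Message = {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: { anthropic?: { cacheControl?: { type: 'ephemeral' } } };
};

function addCacheControlToSystemMessages(messages: Message[]): Message[] {
  return messages.map((message): Message =>
    message.role === 'system'
      ? {
          ...message,
          providerOptions: {
            ...message.providerOptions,
            anthropic: { cacheControl: { type: 'ephemeral' } },
          },
        }
      : message // non-system messages pass through unchanged
  );
}
```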
**Usage Example**:

```typescript
// For general use with fallback support
import { Sonnet4, Haiku35 } from '@buster/ai';

// For agents with complex prompts needing caching
import { anthropicCachedModel } from '@buster/ai';

// Direct model usage (no fallback)
import { anthropicModel, vertexModel } from '@buster/ai';
```
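The fallback behavior described above can be sketched in plain TypeScript. This is a simplified illustration of the idea, not the real `createFallback()` from `ai-fallback.ts`:

```typescript
// Simplified sketch of provider fallback: try each provider in order,
// return the first success, rethrow the last error if all fail.
// Hypothetical illustration, not the real createFallback() implementation.
type Provider<T> = () => Promise<T>;

async function callWithFallback<T>(providers: Provider<T>[]): Promise<T> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider();
    } catch (error) {
      lastError = error; // provider failed; try the next one
    }
  }
  throw lastError;
}
```

In this sketch, the Sonnet4 configuration would correspond to passing the Vertex-backed provider first and the Anthropic-backed provider second.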
## Testing Strategy (`tests/`)

### Test Structure

```
tests/
├── agents/integration/    # End-to-end agent tests
├── steps/integration/     # Step execution tests
├── tools/
│   ├── integration/       # Tool + LLM integration tests
│   └── unit/              # Pure function/schema tests
├── workflows/integration/ # Full workflow tests
├── globalSetup.ts
└── testSetup.ts
```
### Testing Philosophy

**Unit Tests** (`tests/tools/unit/`):
- Test data structures, schemas, and logic flows
- Validate input/output schemas with Zod
- Test error handling and edge cases
- Mock external dependencies
- DO NOT test LLM quality/performance
- Focus on: "Does the function work correctly?"
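In that spirit, a unit test exercises pure logic with no LLM in the loop. A minimal sketch (the helper name and input shape here are hypothetical, not an actual tool in this package):

```typescript
// Hypothetical pure helper that a unit test would cover: validate a
// tool's raw input without any LLM or network call.
function parseToolInput(raw: unknown): { query: string } {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('input must be an object');
  }
  const query = (raw as Record<string, unknown>).query;
  if (typeof query !== 'string' || query.length === 0) {
    throw new Error('query must be a non-empty string');
  }
  return { query };
}
```

A unit test would assert the happy path and that edge cases (missing or empty `query`) throw, exactly the "does the function work correctly?" question.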
**Integration Tests** (`tests/*/integration/`):
- Test agents/tools/steps with real LLM calls
- Verify workflow execution and data flow
- Test that agents can use tools successfully
- Validate message passing between agents
- DO NOT evaluate response quality
- Focus on: "Does the system work end-to-end?"
## Evaluation Strategy (`evals/`)

### Evaluation Structure

```
evals/
├── agents/
│   └── analyst-agent/
│       └── workflow-match.eval.ts
├── online-scorer/
│   └── todos.ts
├── steps/
│   └── todos/
│       ├── scorers.ts
│       └── todos-general-expected.eval.ts
└── workflows/
    ├── analyst-workflow-general.eval.ts
    └── analyst-workflow-redo.eval.private.ts
```
### Evaluation Philosophy

**Evaluations** (`.eval.ts` files):
- Use Braintrust for LLM performance evaluation
- Test actual LLM response quality and correctness
- Use LLM-as-Judge patterns for scoring
- Include datasets for consistent evaluation
- Focus on: "Does the LLM produce good results?"

**Key Distinction**:
- **Tests** verify the system works (data flows, schemas, execution)
- **Evaluations** verify the LLM produces quality outputs
## Multi-Agent Workflow Patterns

### Example: Analyst Workflow

The analyst workflow demonstrates the multi-agent pattern:
- **Parallel Initial Steps**: `generateChatTitleStep`, `extractValuesSearchStep`, `createTodosStep`
- **Think and Prep Agent**: Processes the initial analysis
- **Conditional Branching**: Only runs the analyst agent if needed
- **Message History Passing**: Critical for agent-to-agent communication
### Message History Flow

```typescript
// In think-and-prep-step.ts
conversationHistory = extractMessageHistory(step.response.messages);

// In analyst-step.ts
const formattedMessages = formatMessagesForAnalyst(
  inputData.conversationHistory,
  initialPrompt
);
```
**Key Pattern**: Message history from one agent becomes input to the next agent, preserving conversation context and tool usage.
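As a rough sketch of that hand-off (the real `extractMessageHistory()` lives in `src/utils/memory/message-history.ts`; the message shape and filtering shown here are assumptions for illustration):

```typescript
// Hypothetical sketch of extracting history for the next agent: drop the
// first agent's system prompt, keep user/assistant/tool turns so tool
// calls and results survive the hand-off.
type HistoryMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: unknown;
};

function extractMessageHistory(messages: HistoryMessage[]): HistoryMessage[] {
  return messages.filter((message) => message.role !== 'system');
}
```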
## Key Development Patterns

### Agent Definition Pattern

```typescript
export const agentName = new Agent({
  name: 'Agent Name',
  instructions: getInstructions,
  model: Sonnet4, // Can use Sonnet4, Haiku35, or anthropicCachedModel('model-id')
  tools: { tool1, tool2, tool3 },
  memory: getSharedMemory(),
  defaultGenerateOptions: DEFAULT_OPTIONS,
  defaultStreamOptions: DEFAULT_OPTIONS,
});
```
### Tool Definition Pattern

```typescript
const inputSchema = z.object({
  param: z.string().describe('Parameter description'),
});

const outputSchema = z.object({
  result: z.string(),
});

const executeFunction = wrapTraced(
  async (params) => {
    // Tool logic here
  },
  { name: 'tool-name' }
);

export const toolName = createTool({
  id: 'tool-id',
  description: 'Tool description for agent understanding',
  inputSchema,
  outputSchema,
  execute: executeFunction,
});
```
### Step Definition Pattern

```typescript
const inputSchema = z.object({
  // Input from previous steps
});

const outputSchema = z.object({
  // Output for next steps
});

const execution = async ({ inputData, getInitData, runtimeContext }) => {
  // Step logic with agent execution
  // Extract message history for multi-agent workflows
  // Return structured output
};

export const stepName = createStep({
  id: 'step-id',
  description: 'Step description',
  inputSchema,
  outputSchema,
  execute: execution,
});
```
### Workflow Definition Pattern

```typescript
const workflow = createWorkflow({
  id: 'workflow-id',
  inputSchema,
  outputSchema,
  steps: [step1, step2, step3],
})
  .parallel([step1, step2, step3])
  .then(step4)
  .branch([
    [condition, step5],
  ])
  .commit();
```
### Message History

- Critical for multi-agent workflows
- Extracted via `extractMessageHistory()` from step responses
- Formatted via `formatMessagesForAnalyst()` for agent consumption
- Preserves tool calls and results between agents

### Runtime Context

- Passes workflow-specific data between steps
- Type-safe with interfaces like `AnalystRuntimeContext`
- Includes user/thread/organization identifiers
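A minimal sketch of what a typed runtime context provides (the field names below are assumptions for illustration; the real `RuntimeContext` comes from Mastra):

```typescript
// Hypothetical typed wrapper around a key/value runtime context.
// Field names are assumptions, not the actual AnalystRuntimeContext.
type AnalystRuntimeContext = {
  userId: string;
  threadId: string;
  organizationId: string;
  messageId?: string;
};

class TypedRuntimeContext<T extends Record<string, unknown>> {
  private store = new Map<string, unknown>();

  // Keys and values are checked against T at compile time.
  set<K extends keyof T & string>(key: K, value: T[K]): void {
    this.store.set(key, value);
  }

  get<K extends keyof T & string>(key: K): T[K] | undefined {
    return this.store.get(key) as T[K] | undefined;
  }
}
```

The payoff is that `ctx.set('userid', …)` (a typo) or setting `userId` to a number fails to compile, which is what "type-safe with interfaces" buys in the steps.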
## Best Practices

- **Tool Organization**: Group tools by functional category
- **Schema Validation**: Always use Zod schemas for input/output
- **Observability**: Wrap functions with `wrapTraced()` for monitoring
- **Message Passing**: Use structured message history for multi-agent workflows
- **Testing Strategy**: Unit tests for logic, integration tests for flow, evaluations for quality
- **Memory Management**: Use shared memory for conversation persistence
- **Error Handling**: Graceful handling with user-friendly error messages
- **Type Safety**: Leverage TypeScript with strict configuration
## Environment Variables

Required environment variables:
- `BRAINTRUST_KEY`: For observability and evaluations
- `ANTHROPIC_API_KEY`: For Claude model access
- Additional keys for specific tools (database connections, etc.)
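A fail-fast startup check for these variables might look like the following sketch (a hypothetical helper, not an existing utility in this package):

```typescript
// Hypothetical fail-fast check for required environment variables:
// throw a single error naming every missing variable.
function assertEnvVars(
  env: Record<string, string | undefined>,
  names: string[]
): string[] {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return names.map((name) => env[name] as string);
}

// Typically called once at startup, e.g.:
// assertEnvVars(process.env, ['BRAINTRUST_KEY', 'ANTHROPIC_API_KEY']);
```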
## Conversation History Management

### Overview

The AI package supports multi-turn conversations by managing conversation history through the database. This enables workflows to maintain context across multiple interactions.

### Key Components

#### Chat History Utilities (`src/steps/get-chat-history.ts`)

Provides functions for retrieving conversation history:

```typescript
// Get all messages with metadata for a chat
getChatHistory(chatId: string): Promise<ChatHistoryResult[]>

// Get just the raw LLM messages for a chat
getRawLlmMessages(chatId: string): Promise<MessageHistory[]>

// Get raw LLM messages for a specific message ID
getRawLlmMessagesByMessageId(messageId: string): Promise<MessageHistory | null>
```
#### Database Integration

The chat history utilities use the `@buster/database` helpers for a clean separation of concerns:
- Database operations stay in the database package
- Type validation and transformation happen in the AI package
### Conversation History Flow

**1. Initial Message with Database Save**

```typescript
// First run - with messageId for database persistence
const messageId = await createTestMessage(chatId, userId);
const runtimeContext = new RuntimeContext();
runtimeContext.set('messageId', messageId);

const result = await analystWorkflow.createRun().start({
  inputData: { prompt: 'Initial question' },
  runtimeContext,
});
// Conversation history is automatically saved to the database
```

**2. Retrieving Conversation History**

```typescript
// Fetch the conversation history from the database
import { getRawLlmMessagesByMessageId } from '@buster/ai';

const conversationHistory = await getRawLlmMessagesByMessageId(messageId);
// Returns: CoreMessage[] or null
```

**3. Follow-up with History**

```typescript
// Second run - with conversation history
const followUpResult = await analystWorkflow.createRun().start({
  inputData: {
    prompt: 'Follow-up question',
    conversationHistory: conversationHistory as CoreMessage[],
  },
  runtimeContext,
});
```
### Testing Conversation History

See `tests/workflows/integration/analyst-workflow.int.test.ts` for examples:

```typescript
test('conversation history flow', async () => {
  // 1. Create initial message
  const { chatId, userId } = await createTestChat();
  const messageId = await createTestMessage(chatId, userId);

  // 2. Run workflow with messageId
  const runtimeContext = new RuntimeContext();
  runtimeContext.set('messageId', messageId);
  const firstRun = await workflow.start({
    inputData: { prompt: 'First question' },
    runtimeContext,
  });

  // 3. Retrieve conversation history
  const history = await getRawLlmMessagesByMessageId(messageId);

  // 4. Run follow-up with history
  const secondRun = await workflow.start({
    inputData: {
      prompt: 'Follow-up question',
      conversationHistory: history as CoreMessage[],
    },
    runtimeContext,
  });
});
```
### Best Practices

- **Use messageId for Persistence**: Always provide a `messageId` in the runtime context when you want to save conversation history
- **Type Safety**: Cast retrieved history to `CoreMessage[]` after validation
- **Handle Null Cases**: Check if history exists before using it
- **Test Both Paths**: Test workflows both with and without conversation history
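The null-handling practice can be captured in a small helper; this is a hypothetical sketch, not an existing utility in the package:

```typescript
// Hypothetical helper: only attach conversationHistory when history was
// actually retrieved, so first runs and follow-ups share one code path.
type CoreMessage = { role: string; content: unknown };

function buildWorkflowInput(
  prompt: string,
  history: CoreMessage[] | null
): { prompt: string; conversationHistory?: CoreMessage[] } {
  return history && history.length > 0
    ? { prompt, conversationHistory: history }
    : { prompt };
}
```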