# CLAUDE.md
This file provides guidance to Claude Code when working with code in this repository.
## Development Commands
```bash
# Development
bun run dev # Run Mastra dev server (mastra dev --dir src)
# Testing
bun run test # Run all tests (vitest run)
bun run test:watch # Run tests in watch mode (vitest watch)
bun run test:coverage # Run tests with coverage (vitest run --coverage)
# Testing specific files
bun run test src/agents/weather-agent.test.ts # Goes through the vitest script; plain `bun test` would use Bun's own runner
# Run evaluation tests
npm run eval # Run all evaluations with Braintrust
npm run eval:file weather-agent.eval.ts # Run specific eval file
npm run eval:watch # Run evaluations in watch mode
npm run eval:dev # Run evaluations in dev mode
# From root directory
bun run lint packages/ai # Run Biome linter
bun run lint:fix packages/ai # Fix linting issues
bun run format packages/ai # Check formatting
bun run format:fix packages/ai # Fix formatting
bun run typecheck packages/ai # Run TypeScript type checking
```
## Architecture Overview
This package implements AI agents and tools with the Mastra framework and uses Braintrust for observability. The codebase is modular so that complex multi-agent workflows can be composed from agents, steps, tools, and workflows.
## Folder Structure & Patterns
### Source Code (`src/`)
#### **Agents** (`src/agents/`)
```
agents/
├── analyst-agent/
│   ├── analyst-agent.ts              # Agent definition
│   └── analyst-agent-instructions.ts # Instructions/prompts
└── think-and-prep-agent/
    ├── think-and-prep-agent.ts
    └── think-and-prep-instructions.ts
```
**Pattern**: Each agent gets its own folder containing:
- Main agent file (defines tools, model, memory, options)
- Instructions file (contains system prompts and behavior definitions)

Agents also share these conventions:
- Built with `Agent` from Mastra, using `anthropicCachedModel`
- Shared memory via `getSharedMemory()`
- Standard options: `maxSteps: 18, temperature: 0, maxTokens: 10000`
#### **Steps** (`src/steps/`)
```
steps/
├── analyst-step.ts
├── create-todos-step.ts
├── extract-values-search-step.ts
├── generate-chat-title-step.ts
├── get-chat-history.ts
└── think-and-prep-step.ts
```
**Pattern**: Steps orchestrate agent execution within workflows:
- Use `createStep()` from Mastra
- Define input/output schemas with Zod
- Execute agents with proper context passing
- Handle message history extraction and formatting
- Wrap execution with `wrapTraced()` for observability
- Pass data between steps through structured schemas
#### **Tools** (`src/tools/`)
```
tools/
├── communication-tools/          # Agent-to-agent communication
│   ├── done-tool.ts
│   ├── respond-without-asset-creation.ts
│   └── submit-thoughts-tool.ts
├── database-tools/               # Data access
│   └── find-required-text-values.ts
├── file-tools/                   # File operations
│   ├── bash-tool.ts
│   ├── edit-file-tool.ts
│   ├── read-file-tool.ts
│   └── write-file-tool.ts
├── planning-thinking-tools/      # Strategic planning
│   ├── create-plan-investigative-tool.ts
│   ├── create-plan-straightforward-tool.ts
│   ├── review-plan-tool.ts
│   └── sequential-thinking-tool.ts
├── visualization-tools/          # Dashboard/metrics creation
│   ├── create-dashboards-file-tool.ts
│   ├── create-metrics-file-tool.ts
│   ├── modify-dashboards-file-tool.ts
│   └── modify-metrics-file-tool.ts
└── index.ts                      # Tool exports
```
**Pattern**: Tools are categorized by function:
- Use `createTool()` from Mastra
- Define input/output schemas with Zod
- Wrap main execution with `wrapTraced()` for observability
- Include detailed descriptions for agent understanding
- Export via `tools/index.ts` for easy importing
#### **Workflows** (`src/workflows/`)
```
workflows/
└── analyst-workflow.ts # Multi-step workflow definition
```
**Pattern**: Workflows orchestrate multiple steps and agents:
- Use `createWorkflow()` from Mastra
- Define input/output schemas with Zod
- Chain steps with `.parallel()`, `.then()`, `.branch()` patterns
- Include runtime context interfaces for type safety
- Support conditional branching based on step outputs
#### **Utils** (`src/utils/`)
```
utils/
├── convertToCoreMessages.ts
├── shared-memory.ts
├── memory/
│   ├── agent-memory.ts
│   ├── message-history.ts   # Message passing between agents
│   ├── types.ts             # Message/step data types
│   └── index.ts
└── models/
    ├── ai-fallback.ts       # Fallback model wrapper with retry logic
    ├── anthropic.ts         # Basic Anthropic model wrapper
    ├── anthropic-cached.ts  # Anthropic with caching support
    ├── vertex.ts            # Google Vertex AI model wrapper
    ├── sonnet-4.ts          # Claude Sonnet 4 with fallback
    └── haiku-3-5.ts         # Claude Haiku 3.5 with fallback
```
**Pattern**: Utilities support core functionality:
- **Memory**: Handles message history between agents in multi-step workflows
- **Models**: Provides various AI model configurations with fallback support
- **Message History**: Extracts and formats messages for passing between agents; critical for multi-agent workflows
##### Model Configuration Pattern
The models folder provides different AI model configurations with automatic fallback support:
1. **Base Model Wrappers** (`anthropic.ts`, `vertex.ts`):
   - Wrap AI SDK models with Braintrust tracing
   - Handle authentication and configuration
   - Provide consistent interface for model usage
2. **Fallback Models** (`sonnet-4.ts`, `haiku-3-5.ts`):
   - Use `createFallback()` to define multiple model providers
   - Automatically switch between providers on errors
   - Configure retry behavior and error handling
   - Example: Sonnet4 tries Vertex first, falls back to Anthropic
3. **Cached Model** (`anthropic-cached.ts`):
   - Adds caching support to Anthropic models
   - Automatically adds `cache_control` to system messages
   - Includes connection pooling for better performance
   - Used by agents requiring prompt caching
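
To make the `cache_control` point concrete, here is a minimal, library-free sketch of marking a system prompt as cacheable in the shape the Anthropic Messages API expects. It is not the code in `anthropic-cached.ts`; `SystemBlock` and `withSystemCacheControl` are hypothetical names, and marking only the last system block is an assumption chosen for the illustration.

```typescript
// Shape of a system text block in an Anthropic Messages API request body.
type SystemBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};

// Hypothetical helper: mark the last system block as a cache breakpoint.
// The input is copied, so the caller's data is left untouched.
function withSystemCacheControl(system: string | SystemBlock[]): SystemBlock[] {
  const blocks: SystemBlock[] =
    typeof system === 'string'
      ? [{ type: 'text', text: system }]
      : system.map((block) => ({ ...block }));
  const last = blocks[blocks.length - 1];
  if (last) {
    last.cache_control = { type: 'ephemeral' };
  }
  return blocks;
}

// A plain string prompt becomes a single cacheable block.
const cachedBlocks = withSystemCacheControl('You are an analyst.');

// With several blocks, only the final one carries the marker.
const originalBlocks: SystemBlock[] = [
  { type: 'text', text: 'General rules' },
  { type: 'text', text: 'Dataset documentation' },
];
const cachedMultiBlocks = withSystemCacheControl(originalBlocks);
```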
**Usage Example**:
```typescript
// For general use with fallback support
import { Sonnet4, Haiku35 } from '@buster/ai';
// For agents with complex prompts needing caching
import { anthropicCachedModel } from '@buster/ai';
// Direct model usage (no fallback)
import { anthropicModel, vertexModel } from '@buster/ai';
```
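
The fallback behavior described for `sonnet-4.ts` and `haiku-3-5.ts` can be pictured with a small sketch. This is not the `createFallback()` implementation from `ai-fallback.ts`: `Provider`, `callWithFallback`, and the stub providers are hypothetical, and the stubs are synchronous functions rather than real model calls.

```typescript
// Hypothetical stand-in for a model provider; real providers are async model wrappers.
type Provider = {
  name: string;
  generate: (prompt: string) => string;
};

type FallbackResult = { provider: string; text: string; errors: string[] };

// Try each provider in order and return the first successful result,
// recording the error message of every provider that failed before it.
function callWithFallback(providers: Provider[], prompt: string): FallbackResult {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = provider.generate(prompt);
      return { provider: provider.name, text, errors };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Stub providers: the first always fails, the second succeeds.
const failingVertex: Provider = {
  name: 'vertex',
  generate: () => {
    throw new Error('rate limited');
  },
};
const workingAnthropic: Provider = {
  name: 'anthropic',
  generate: (prompt) => `echo: ${prompt}`,
};

const fallbackResult = callWithFallback([failingVertex, workingAnthropic], 'hello');
```

The order of the array is the priority order, which mirrors the "Vertex first, then Anthropic" example above. Retry counts and error filtering are left out of the sketch.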
### Testing Strategy (`tests/`)
#### **Test Structure**
```
tests/
├── agents/integration/     # End-to-end agent tests
├── steps/integration/      # Step execution tests
├── tools/
│   ├── integration/        # Tool + LLM integration tests
│   └── unit/               # Pure function/schema tests
├── workflows/integration/  # Full workflow tests
├── globalSetup.ts
└── testSetup.ts
```
#### **Testing Philosophy**
**Unit Tests** (`tests/tools/unit/`):
- Test data structures, schemas, and logic flows
- Validate input/output schemas with Zod
- Test error handling and edge cases
- Mock external dependencies
- **DO NOT** test LLM quality/performance
- Focus on: "Does the function work correctly?"
**Integration Tests** (`tests/*/integration/`):
- Test agents/tools/steps with real LLM calls
- Verify workflow execution and data flow
- Test that agents can use tools successfully
- Validate message passing between agents
- **DO NOT** evaluate response quality
- Focus on: "Does the system work end-to-end?"
### Evaluation Strategy (`evals/`)
#### **Evaluation Structure**
```
evals/
├── agents/
│   └── analyst-agent/
│       └── workflow-match.eval.ts
├── online-scorer/
│   └── todos.ts
├── steps/
│   └── todos/
│       ├── scorers.ts
│       └── todos-general-expected.eval.ts
└── workflows/
    ├── analyst-workflow-general.eval.ts
    └── analyst-workflow-redo.eval.private.ts
```
#### **Evaluation Philosophy**
**Evaluations** (`.eval.ts` files):
- Use Braintrust for LLM performance evaluation
- Test actual LLM response quality and correctness
- Use LLM-as-Judge patterns for scoring
- Include datasets for consistent evaluation
- Focus on: "Does the LLM produce good results?"
**Key Distinction**:
- **Tests** verify the system works (data flows, schemas, execution)
- **Evaluations** verify the LLM produces quality outputs
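
As a rough illustration of what a scorer looks like, the sketch below scores a list of generated todos against an expected list. It is a deterministic scorer, not an LLM-as-Judge, and it is not taken from `evals/steps/todos/scorers.ts`; the `ScorerArgs` and `Score` shapes and the name `todoCoverageScorer` are assumptions made for the example.

```typescript
// Hypothetical scorer input/output shapes; adjust to the scorer interface actually in use.
type ScorerArgs = { output: string[]; expected: string[] };
type Score = { name: string; score: number };

// Fraction of expected todo items that appear in the output (case-insensitive).
function todoCoverageScorer({ output, expected }: ScorerArgs): Score {
  if (expected.length === 0) {
    // Nothing was expected, so nothing can be missing.
    return { name: 'todo-coverage', score: 1 };
  }
  const produced = new Set(output.map((item) => item.trim().toLowerCase()));
  const matched = expected.filter((item) => produced.has(item.trim().toLowerCase()));
  return { name: 'todo-coverage', score: matched.length / expected.length };
}

// Two of the four expected items are present in the output.
const coverage = todoCoverageScorer({
  output: ['Identify metric', 'Pick time range'],
  expected: ['identify metric', 'pick time range', 'choose chart type', 'filter by region'],
});
```

A judge-based scorer would keep the same return shape but compute `score` from a model's assessment instead of a set comparison.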
## Multi-Agent Workflow Patterns
### Example: Analyst Workflow
The analyst workflow demonstrates the multi-agent pattern:
1. **Parallel Initial Steps**: `generateChatTitleStep`, `extractValuesSearchStep`, `createTodosStep`
2. **Think and Prep Agent**: Processes initial analysis
3. **Conditional Branching**: Only runs analyst agent if needed
4. **Message History Passing**: Critical for agent-to-agent communication
#### Message History Flow
```typescript
// In think-and-prep-step.ts
conversationHistory = extractMessageHistory(step.response.messages);
// In analyst-step.ts
const formattedMessages = formatMessagesForAnalyst(
  inputData.conversationHistory,
  initialPrompt
);
```
**Key Pattern**: Message history from one agent becomes input to the next agent, preserving conversation context and tool usage.
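
The hand-off can be sketched without any framework code. The functions below are hypothetical stand-ins, not the real `extractMessageHistory()` and `formatMessagesForAnalyst()` from `src/utils/memory/message-history.ts`, whose behavior may differ; `SketchMessage` is a simplified message shape assumed for the example.

```typescript
// Simplified message shape for illustration only.
type SketchMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
};

// Hypothetical stand-in for history extraction: keep everything except system prompts,
// so the next agent can supply its own instructions.
function extractHistorySketch(messages: SketchMessage[]): SketchMessage[] {
  return messages.filter((message) => message.role !== 'system');
}

// Hypothetical stand-in for formatting: reuse prior history when there is some,
// otherwise start a fresh conversation from the initial prompt.
function formatForNextAgentSketch(
  history: SketchMessage[] | undefined,
  initialPrompt: string
): SketchMessage[] {
  if (history && history.length > 0) {
    return history;
  }
  return [{ role: 'user', content: initialPrompt }];
}

// Messages produced by a first agent, including a tool call result.
const firstAgentMessages: SketchMessage[] = [
  { role: 'system', content: 'You are the think-and-prep agent.' },
  { role: 'user', content: 'What were sales last month?' },
  { role: 'assistant', content: 'Plan: query the sales dataset.' },
  { role: 'tool', content: '{"plan":"approved"}' },
];

const handoff = formatForNextAgentSketch(
  extractHistorySketch(firstAgentMessages),
  'What were sales last month?'
);
const freshStart = formatForNextAgentSketch(undefined, 'Hi');
```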
## Key Development Patterns
### Agent Definition Pattern
```typescript
export const agentName = new Agent({
  name: 'Agent Name',
  instructions: getInstructions,
  model: Sonnet4, // Can use Sonnet4, Haiku35, or anthropicCachedModel('model-id')
  tools: { tool1, tool2, tool3 },
  memory: getSharedMemory(),
  defaultGenerateOptions: DEFAULT_OPTIONS,
  defaultStreamOptions: DEFAULT_OPTIONS,
});
```
### Tool Definition Pattern
```typescript
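import { createTool } from '@mastra/core/tools';
import { wrapTraced } from 'braintrust';
import { z } from 'zod';
// Schema the agent must satisfy when calling the tool
const inputSchema = z.object({
  param: z.string().describe('Parameter description'),
});
const outputSchema = z.object({
  result: z.string(),
});
// wrapTraced records each call as a Braintrust span named 'tool-name'
const executeFunction = wrapTraced(
  async (params) => {
    // Tool logic here
  },
  { name: 'tool-name' }
);
export const toolName = createTool({
  id: 'tool-id',
  description: 'Tool description for agent understanding',
  inputSchema,
  outputSchema,
  execute: executeFunction,
});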
```
### Step Definition Pattern
```typescript
const inputSchema = z.object({
  // Input from previous steps
});
const outputSchema = z.object({
  // Output for next steps
});
const execution = async ({ inputData, getInitData, runtimeContext }) => {
  // Step logic with agent execution
  // Extract message history for multi-agent workflows
  // Return structured output
};
export const stepName = createStep({
  id: 'step-id',
  description: 'Step description',
  inputSchema,
  outputSchema,
  execute: execution,
});
```
### Workflow Definition Pattern
```typescript
const workflow = createWorkflow({
  id: 'workflow-id',
  inputSchema,
  outputSchema,
  steps: [step1, step2, step3],
})
  .parallel([step1, step2, step3])
  .then(step4)
  .branch([
    [condition, step5],
  ])
  .commit();
```
### Message History
- Critical for multi-agent workflows
- Extracted via `extractMessageHistory()` from step responses
- Formatted via `formatMessagesForAnalyst()` for agent consumption
- Preserves tool calls and results between agents
### Runtime Context
- Passes workflow-specific data between steps
- Type-safe with interfaces like `AnalystRuntimeContext`
- Includes user/thread/organization identifiers
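
The type-safety point can be illustrated with a small stand-in. `TypedContext` below is a hypothetical minimal class, not Mastra's `RuntimeContext`, and the keys in `SketchRuntimeContext` are assumptions; the real `AnalystRuntimeContext` may use different names.

```typescript
// Hypothetical context shape; the real AnalystRuntimeContext may have different keys.
interface SketchRuntimeContext {
  userId: string;
  threadId: string;
  organizationId: string;
  messageId?: string;
}

// Minimal typed key/value store: keys and value types are checked against T.
class TypedContext<T extends object> {
  private values = new Map<keyof T, T[keyof T]>();

  set<K extends keyof T>(key: K, value: T[K]): void {
    this.values.set(key, value);
  }

  get<K extends keyof T>(key: K): T[K] | undefined {
    return this.values.get(key) as T[K] | undefined;
  }
}

const sketchContext = new TypedContext<SketchRuntimeContext>();
sketchContext.set('userId', 'user-123');
sketchContext.set('messageId', 'message-456');
// sketchContext.set('userId', 42);      // would not compile: number is not a string
// sketchContext.set('unknownKey', 'x'); // would not compile: key is not in the interface

const sketchUserId = sketchContext.get('userId');
const sketchThreadId = sketchContext.get('threadId');
```

Because the interface drives both `set` and `get`, a misspelled key or a wrongly typed value is caught at compile time rather than inside a step.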
## Best Practices
1. **Tool Organization**: Group tools by functional category
2. **Schema Validation**: Always use Zod schemas for input/output
3. **Observability**: Wrap functions with `wrapTraced()` for monitoring
4. **Message Passing**: Use structured message history for multi-agent workflows
5. **Testing Strategy**: Unit tests for logic, integration tests for flow, evaluations for quality
6. **Memory Management**: Use shared memory for conversation persistence
7. **Error Handling**: Handle errors gracefully and return user-friendly error messages
8. **Type Safety**: Leverage TypeScript with strict configuration
## Environment Variables
Required environment variables:
- `BRAINTRUST_KEY`: For observability and evaluations
- `ANTHROPIC_API_KEY`: For Claude model access
- Additional keys for specific tools (database connections, etc.)
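
A startup check for the two named variables might look like the sketch below. `findMissingEnvVars` and `assertRequiredEnv` are hypothetical helpers, not part of this package, and tool-specific keys are deliberately left out because they are not listed here.

```typescript
// The two variables named in this section.
const REQUIRED_ENV_VARS = ['BRAINTRUST_KEY', 'ANTHROPIC_API_KEY'] as const;

// Any object with string values works, so process.env can be passed in directly.
type EnvSource = Record<string, string | undefined>;

// Return the names of required variables that are missing or blank.
function findMissingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Throw a single readable error listing every missing variable.
function assertRequiredEnv(env: EnvSource): void {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Example with only one of the two variables set.
const missingVars = findMissingEnvVars({ ANTHROPIC_API_KEY: 'example-key' });
```

In an entry point this would be called once as `assertRequiredEnv(process.env)` before any agent or evaluation runs.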
## Conversation History Management
### Overview
The AI package supports multi-turn conversations by managing conversation history through the database. This enables workflows to maintain context across multiple interactions.
### Key Components
#### Chat History Utilities (`src/steps/get-chat-history.ts`)
Provides functions for retrieving conversation history:
```typescript
// Get all messages with metadata for a chat
getChatHistory(chatId: string): Promise<ChatHistoryResult[]>
// Get just the raw LLM messages for a chat
getRawLlmMessages(chatId: string): Promise<MessageHistory[]>
// Get raw LLM messages for a specific message ID
getRawLlmMessagesByMessageId(messageId: string): Promise<MessageHistory | null>
```
#### Database Integration
The chat history utilities use the `@buster/database` helpers for clean separation of concerns:
- Database operations stay in the database package
- Type validation and transformation happen in the AI package
### Conversation History Flow
#### 1. Initial Message with Database Save
```typescript
// First run - with messageId for database persistence
const messageId = await createTestMessage(chatId, userId);
const runtimeContext = new RuntimeContext();
runtimeContext.set('messageId', messageId);
const result = await analystWorkflow.createRun().start({
  inputData: { prompt: "Initial question" },
  runtimeContext,
});
// Conversation history is automatically saved to database
```
#### 2. Retrieving Conversation History
```typescript
// Fetch the conversation history from the database
import { getRawLlmMessagesByMessageId } from '@buster/ai';
const conversationHistory = await getRawLlmMessagesByMessageId(messageId);
// Returns: MessageHistory | null
```
#### 3. Follow-up with History
```typescript
// Second run - with conversation history
const followUpResult = await analystWorkflow.createRun().start({
  inputData: {
    prompt: "Follow-up question",
    conversationHistory: conversationHistory as CoreMessage[],
  },
  runtimeContext,
});
```
### Testing Conversation History
See `tests/workflows/integration/analyst-workflow.int.test.ts` for examples:
```typescript
test('conversation history flow', async () => {
  // 1. Create initial message
  const { chatId, userId } = await createTestChat();
  const messageId = await createTestMessage(chatId, userId);
  // 2. Run workflow with messageId
  const runtimeContext = new RuntimeContext();
  runtimeContext.set('messageId', messageId);
  const firstRun = await analystWorkflow.createRun().start({
    inputData: { prompt: "First question" },
    runtimeContext,
  });
  // 3. Retrieve conversation history
  const history = await getRawLlmMessagesByMessageId(messageId);
  // 4. Run follow-up with history
  const secondRun = await analystWorkflow.createRun().start({
    inputData: {
      prompt: "Follow-up question",
      conversationHistory: history as CoreMessage[],
    },
    runtimeContext,
  });
});
```
### Best Practices
1. **Use MessageId for Persistence**: Always provide a `messageId` in runtime context when you want to save conversation history
2. **Type Safety**: Cast retrieved history to `CoreMessage[]` after validation
3. **Handle Null Cases**: Check if history exists before using it
4. **Test Both Paths**: Test workflows both with and without conversation history
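
Practices 2 and 3 can be combined in one small guard that validates whatever came back from the database before it is treated as history. This is a sketch under stated assumptions: `HistoryMessage`, `isHistory`, and `resolveHistory` are hypothetical names, and the check is far looser than a full `CoreMessage` schema.

```typescript
// Simplified message shape; the real code works with CoreMessage[].
type HistoryMessage = { role: string; content: unknown };

// Type guard: accept only arrays whose items have a string role and a content field.
function isHistory(value: unknown): value is HistoryMessage[] {
  return (
    Array.isArray(value) &&
    value.every(
      (item) =>
        typeof item === 'object' &&
        item !== null &&
        typeof (item as { role?: unknown }).role === 'string' &&
        'content' in item
    )
  );
}

// Return validated history, or an empty array when nothing usable was stored.
function resolveHistory(raw: unknown): HistoryMessage[] {
  return isHistory(raw) ? raw : [];
}

// null (no stored history) and malformed rows both fall back to an empty history.
const emptyHistory = resolveHistory(null);
const malformedHistory = resolveHistory([{ foo: 1 }]);
const validHistory = resolveHistory([{ role: 'user', content: 'First question' }]);
```

Falling back to an empty array lets the follow-up run proceed as a fresh conversation. Throwing instead is a reasonable alternative when missing history should be treated as a bug.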