buster/packages/data-source
dal 77ffeab37a
Fix Snowflake adapter tests for lowercase column names
The Snowflake adapter implementation transforms column names to lowercase
for consistency, but the tests were expecting uppercase column names.
This commit updates the tests to match the implementation:

- Update test expectations to use lowercase column names (id, name)
- Fix hasMoreRows assertions to match implementation logic (only true when rowCount > limit)
- Ensure all Snowflake-related tests pass with the current adapter behavior

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-23 10:09:51 -06:00
..
scripts Use tsx and .ts files for validation 2025-07-21 16:07:14 -06:00
src Fix Snowflake adapter tests for lowercase column names 2025-07-23 10:09:51 -06:00
.env.example Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
.gitignore Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
CLAUDE.md Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
README.md Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
biome.json Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
env.d.ts Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
package.json Use tsx and .ts files for validation 2025-07-21 16:07:14 -06:00
tsconfig.json fix all of the bugs 2025-07-12 22:14:08 -06:00
turbo.json Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00
vitest.config.ts Mastra braintrust (#391) 2025-07-02 14:33:40 -07:00

README.md

@buster/data-source

A TypeScript library for connecting to, querying, and introspecting multiple data source types including Snowflake, BigQuery, PostgreSQL, MySQL, SQL Server, Redshift, and Databricks.

Features

  • Multi-Database Support: Connect to 7+ different database types with a unified interface
  • Query Routing: Route queries to specific data sources or use intelligent defaults
  • Database Introspection: Discover database structure, tables, columns, and statistics
  • Type Safety: Full TypeScript support with comprehensive type definitions
  • Connection Management: Automatic connection pooling and lifecycle management
  • Error Handling: Graceful error handling with detailed error information

Supported Data Sources

  • Snowflake - Full introspection support with clustering information
  • PostgreSQL - Full introspection support
  • MySQL - Full introspection support
  • BigQuery - Basic support (introspection placeholder)
  • SQL Server - Basic support (introspection placeholder)
  • Redshift - Basic support (introspection placeholder)
  • Databricks - Basic support (introspection placeholder)

Installation

npm install @buster/data-source

Quick Start

Basic Usage

import { DataSource, DataSourceType } from '@buster/data-source';

// Configure your data sources
const dataSource = new DataSource({
  dataSources: [
    {
      name: 'snowflake-prod',
      type: DataSourceType.Snowflake,
      credentials: {
        type: DataSourceType.Snowflake,
        account_id: 'your-account',
        username: 'your-username',
        password: 'your-password',
        warehouse_id: 'your-warehouse',
        default_database: 'your-database',
      },
    },
    {
      name: 'postgres-dev',
      type: DataSourceType.PostgreSQL,
      credentials: {
        type: DataSourceType.PostgreSQL,
        host: 'localhost',
        port: 5432,
        database: 'dev_db',
        username: 'dev_user',
        password: 'dev_password',
      },
    },
  ],
  defaultDataSource: 'snowflake-prod',
});

// Execute queries
const result = await dataSource.execute({
  sql: 'SELECT * FROM users LIMIT 10',
  warehouse: 'snowflake-prod', // Optional: specify data source
});

console.log(result.rows);

Database Introspection

// Get all databases
const databases = await dataSource.getDatabases('snowflake-prod');
console.log('Databases:', databases.map(db => db.name));

// Get schemas in a database
const schemas = await dataSource.getSchemas('snowflake-prod', 'ANALYTICS_DB');
console.log('Schemas:', schemas.map(s => s.name));

// Get tables in a schema
const tables = await dataSource.getTables('snowflake-prod', 'ANALYTICS_DB', 'PUBLIC');
console.log('Tables:', tables.map(t => ({ name: t.name, type: t.type, rows: t.rowCount })));

// Get columns in a table
const columns = await dataSource.getColumns('snowflake-prod', 'ANALYTICS_DB', 'PUBLIC', 'USERS');
console.log('Columns:', columns.map(c => ({ name: c.name, type: c.dataType, nullable: c.isNullable })));

// Get table statistics (Snowflake)
const stats = await dataSource.getTableStatistics('ANALYTICS_DB', 'PUBLIC', 'USERS', 'snowflake-prod');
console.log('Table stats:', {
  rowCount: stats.rowCount,
  sizeBytes: stats.sizeBytes,
  columnStats: stats.columnStatistics.length,
});

// Get comprehensive introspection
const fullIntrospection = await dataSource.getFullIntrospection('snowflake-prod');
console.log('Full catalog:', {
  databases: fullIntrospection.databases.length,
  schemas: fullIntrospection.schemas.length,
  tables: fullIntrospection.tables.length,
  columns: fullIntrospection.columns.length,
});

Advanced Usage

// Direct introspector access
const introspector = await dataSource.introspect('snowflake-prod');
const databases = await introspector.getDatabases();

// Add data sources dynamically
await dataSource.addDataSource({
  name: 'mysql-analytics',
  type: DataSourceType.MySQL,
  credentials: {
    type: DataSourceType.MySQL,
    host: 'mysql.example.com',
    database: 'analytics',
    username: 'analyst',
    password: 'secret',
  },
});

// Test connections
const connectionStatus = await dataSource.testAllDataSources();
console.log('Connection status:', connectionStatus);

// Clean up
await dataSource.close();

Configuration

Data Source Configuration

interface DataSourceConfig {
  name: string;                    // Unique identifier
  type: DataSourceType;           // Database type
  credentials: Credentials;       // Type-specific credentials
  config?: Record<string, unknown>; // Additional options
}

Snowflake Credentials

interface SnowflakeCredentials {
  type: DataSourceType.Snowflake;
  account_id: string;             // Account identifier
  warehouse_id: string;           // Warehouse for compute
  username: string;
  password: string;
  role?: string;                  // Optional role
  default_database: string;
  default_schema?: string;
}

PostgreSQL Credentials

interface PostgreSQLCredentials {
  type: DataSourceType.PostgreSQL;
  host: string;
  port?: number;                  // Default: 5432
  database: string;
  username: string;
  password: string;
  schema?: string;                // Default schema
  ssl?: boolean | SSLConfig;      // SSL configuration
  connection_timeout?: number;    // Connection timeout in ms
}

Introspection Types

Database Structure

interface Database {
  name: string;
  owner?: string;
  comment?: string;
  created?: Date;
  lastModified?: Date;
  metadata?: Record<string, unknown>;
}

interface Schema {
  name: string;
  database: string;
  owner?: string;
  comment?: string;
  created?: Date;
  lastModified?: Date;
}

interface Table {
  name: string;
  schema: string;
  database: string;
  type: 'TABLE' | 'VIEW' | 'MATERIALIZED_VIEW' | 'EXTERNAL_TABLE' | 'TEMPORARY_TABLE';
  rowCount?: number;
  sizeBytes?: number;
  comment?: string;
  created?: Date;
  lastModified?: Date;
  clusteringKeys?: string[];      // Snowflake clustering keys
}

interface Column {
  name: string;
  table: string;
  schema: string;
  database: string;
  position: number;
  dataType: string;
  isNullable: boolean;
  defaultValue?: string;
  maxLength?: number;
  precision?: number;
  scale?: number;
  comment?: string;
  isPrimaryKey?: boolean;
  isForeignKey?: boolean;
}

Statistics

interface TableStatistics {
  table: string;
  schema: string;
  database: string;
  rowCount?: number;
  sizeBytes?: number;
  columnStatistics: ColumnStatistics[];
  clusteringInfo?: ClusteringInfo;  // Snowflake-specific
  lastUpdated?: Date;
}

interface ColumnStatistics {
  columnName: string;
  distinctCount?: number;
  nullCount?: number;
  minValue?: unknown;
  maxValue?: unknown;
  avgValue?: number;
  topValues?: Array<{ value: unknown; frequency: number }>;
}

Error Handling

// Query results include success status and error details
const result = await dataSource.execute({
  sql: 'SELECT * FROM non_existent_table',
});

if (!result.success) {
  console.error('Query failed:', result.error?.message);
  console.error('Error code:', result.error?.code);
}

Backward Compatibility

The package maintains backward compatibility with the previous QueryRouter class:

import { QueryRouter } from '@buster/data-source';

// This still works
const router = new QueryRouter({ dataSources: [...] });

Examples

See the examples directory for comprehensive usage examples:

Development

Running Tests

# Run all tests
npm test

# Run specific test suite
npm test -- tests/integration/adapters/snowflake.test.ts

# Type checking
npm run typecheck

Environment Variables for Testing

# PostgreSQL
TEST_POSTGRES_HOST=localhost
TEST_POSTGRES_PORT=5432
TEST_POSTGRES_DATABASE=test_db
TEST_POSTGRES_USERNAME=test_user
TEST_POSTGRES_PASSWORD=test_password

# Snowflake
TEST_SNOWFLAKE_ACCOUNT_ID=your_account
TEST_SNOWFLAKE_USERNAME=your_username
TEST_SNOWFLAKE_PASSWORD=your_password
TEST_SNOWFLAKE_WAREHOUSE_ID=your_warehouse
TEST_SNOWFLAKE_DATABASE=your_database

# MySQL
TEST_MYSQL_HOST=localhost
TEST_MYSQL_PORT=3306
TEST_MYSQL_DATABASE=test_db
TEST_MYSQL_USERNAME=test_user
TEST_MYSQL_PASSWORD=test_password

# BigQuery
TEST_BIGQUERY_PROJECT_ID=your_project
TEST_BIGQUERY_SERVICE_ACCOUNT_KEY=path/to/key.json

Architecture

@buster/data-source
├── src/
│   ├── adapters/           # Database-specific adapters
│   │   ├── base.ts        # Base adapter interface
│   │   ├── snowflake.ts   # Snowflake implementation
│   │   ├── postgresql.ts  # PostgreSQL implementation
│   │   └── ...
│   ├── introspection/     # Database introspection
│   │   ├── base.ts        # Base introspector interface
│   │   ├── snowflake.ts   # Snowflake introspection
│   │   └── ...
│   ├── types/             # Type definitions
│   │   ├── credentials.ts # Credential interfaces
│   │   ├── query.ts       # Query types
│   │   └── introspection.ts # Introspection types
│   ├── data-source.ts     # Main DataSource class
│   └── index.ts           # Public API exports
├── tests/                 # Test suites
└── examples/              # Usage examples

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE for details.

Scoped Full Introspection

You can now scope full introspection to specific databases, schemas, or tables:

// Get introspection for specific databases
const result = await dataSource.getFullIntrospection('myDataSource', {
  databases: ['sales_db', 'analytics_db']
});

// Get introspection for specific schemas
const result = await dataSource.getFullIntrospection('myDataSource', {
  schemas: ['public', 'reporting']
});

// Get introspection for specific tables
const result = await dataSource.getFullIntrospection('myDataSource', {
  tables: ['customers', 'orders', 'products']
});

// Combine filters - get specific tables from specific schemas
const result = await dataSource.getFullIntrospection('myDataSource', {
  schemas: ['public'],
  tables: ['customers', 'orders']
});

The scoping works hierarchically:

  • If databases is specified, only schemas, tables, columns, and views from those databases are included
  • If schemas is specified, only tables, columns, and views from those schemas are included
  • If tables is specified, only those specific tables and their columns are included
  • Filters can be combined for more precise scoping

This is particularly useful for large data sources where you only need to introspect a subset of the available objects.