# @buster/data-source

A TypeScript library for connecting to, querying, and introspecting multiple data source types, including Snowflake, BigQuery, PostgreSQL, MySQL, SQL Server, Redshift, and Databricks.
## Features
- Multi-Database Support: Connect to 7+ different database types with a unified interface
- Query Routing: Route queries to specific data sources or use intelligent defaults
- Database Introspection: Discover database structure, tables, columns, and statistics
- Type Safety: Full TypeScript support with comprehensive type definitions
- Connection Management: Automatic connection pooling and lifecycle management
- Error Handling: Graceful error handling with detailed error information
## Supported Data Sources
- Snowflake - Full introspection support with clustering information
- PostgreSQL - Full introspection support
- MySQL - Full introspection support
- BigQuery - Basic support (introspection placeholder)
- SQL Server - Basic support (introspection placeholder)
- Redshift - Basic support (introspection placeholder)
- Databricks - Basic support (introspection placeholder)
## Installation

```bash
npm install @buster/data-source
```
## Quick Start

### Basic Usage

```typescript
import { DataSource, DataSourceType } from '@buster/data-source';
// Configure your data sources
const dataSource = new DataSource({
dataSources: [
{
name: 'snowflake-prod',
type: DataSourceType.Snowflake,
credentials: {
type: DataSourceType.Snowflake,
account_id: 'your-account',
username: 'your-username',
password: 'your-password',
warehouse_id: 'your-warehouse',
default_database: 'your-database',
},
},
{
name: 'postgres-dev',
type: DataSourceType.PostgreSQL,
credentials: {
type: DataSourceType.PostgreSQL,
host: 'localhost',
port: 5432,
database: 'dev_db',
username: 'dev_user',
password: 'dev_password',
},
},
],
defaultDataSource: 'snowflake-prod',
});
// Execute queries
const result = await dataSource.execute({
sql: 'SELECT * FROM users LIMIT 10',
warehouse: 'snowflake-prod', // Optional: specify data source
});
console.log(result.rows);
```
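Omitting `warehouse` routes the query through the configured `defaultDataSource`. A minimal sketch, reusing the `dataSource` instance above:

```typescript
// No warehouse given: routed to 'snowflake-prod', the defaultDataSource
const defaultResult = await dataSource.execute({
  sql: 'SELECT COUNT(*) AS user_count FROM users',
});
console.log(defaultResult.rows);
```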
### Database Introspection

```typescript
// Get all databases
const databases = await dataSource.getDatabases('snowflake-prod');
console.log('Databases:', databases.map(db => db.name));
// Get schemas in a database
const schemas = await dataSource.getSchemas('snowflake-prod', 'ANALYTICS_DB');
console.log('Schemas:', schemas.map(s => s.name));
// Get tables in a schema
const tables = await dataSource.getTables('snowflake-prod', 'ANALYTICS_DB', 'PUBLIC');
console.log('Tables:', tables.map(t => ({ name: t.name, type: t.type, rows: t.rowCount })));
// Get columns in a table
const columns = await dataSource.getColumns('snowflake-prod', 'ANALYTICS_DB', 'PUBLIC', 'USERS');
console.log('Columns:', columns.map(c => ({ name: c.name, type: c.dataType, nullable: c.isNullable })));
// Get table statistics (Snowflake)
const stats = await dataSource.getTableStatistics('ANALYTICS_DB', 'PUBLIC', 'USERS', 'snowflake-prod');
console.log('Table stats:', {
rowCount: stats.rowCount,
sizeBytes: stats.sizeBytes,
columnStats: stats.columnStatistics.length,
});
// Get comprehensive introspection
const fullIntrospection = await dataSource.getFullIntrospection('snowflake-prod');
console.log('Full catalog:', {
databases: fullIntrospection.databases.length,
schemas: fullIntrospection.schemas.length,
tables: fullIntrospection.tables.length,
columns: fullIntrospection.columns.length,
});
```
### Advanced Usage

```typescript
// Direct introspector access
const introspector = await dataSource.introspect('snowflake-prod');
const databases = await introspector.getDatabases();
// Add data sources dynamically
await dataSource.addDataSource({
name: 'mysql-analytics',
type: DataSourceType.MySQL,
credentials: {
type: DataSourceType.MySQL,
host: 'mysql.example.com',
database: 'analytics',
username: 'analyst',
password: 'secret',
},
});
// Test connections
const connectionStatus = await dataSource.testAllDataSources();
console.log('Connection status:', connectionStatus);
// Clean up
await dataSource.close();
```
## Configuration

### Data Source Configuration

```typescript
interface DataSourceConfig {
name: string; // Unique identifier
type: DataSourceType; // Database type
credentials: Credentials; // Type-specific credentials
config?: Record<string, unknown>; // Additional options
}
```
### Snowflake Credentials

```typescript
interface SnowflakeCredentials {
type: DataSourceType.Snowflake;
account_id: string; // Account identifier
warehouse_id: string; // Warehouse for compute
username: string;
password: string;
role?: string; // Optional role
default_database: string;
default_schema?: string;
}
```
### PostgreSQL Credentials

```typescript
interface PostgreSQLCredentials {
type: DataSourceType.PostgreSQL;
host: string;
port?: number; // Default: 5432
database: string;
username: string;
password: string;
schema?: string; // Default schema
ssl?: boolean | SSLConfig; // SSL configuration
connection_timeout?: number; // Connection timeout in ms
}
```
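Beyond `ssl: true`, the `ssl` field accepts an options object. A minimal sketch for a development server with a self-signed certificate, assuming `SSLConfig` mirrors the standard node-postgres SSL options; note that `{ rejectUnauthorized: false }` disables certificate verification, so keep it out of production:

```typescript
import { DataSourceType } from '@buster/data-source';

// Hypothetical dev credentials; rejectUnauthorized: false skips certificate
// verification and should only be used against trusted development servers.
const devCredentials = {
  type: DataSourceType.PostgreSQL,
  host: 'localhost',
  port: 5432,
  database: 'dev_db',
  username: 'dev_user',
  password: 'dev_password',
  ssl: { rejectUnauthorized: false }, // assumed SSLConfig shape
};
```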
## Introspection Types

### Database Structure

```typescript
interface Database {
name: string;
owner?: string;
comment?: string;
created?: Date;
lastModified?: Date;
metadata?: Record<string, unknown>;
}
interface Schema {
name: string;
database: string;
owner?: string;
comment?: string;
created?: Date;
lastModified?: Date;
}
interface Table {
name: string;
schema: string;
database: string;
type: 'TABLE' | 'VIEW' | 'MATERIALIZED_VIEW' | 'EXTERNAL_TABLE' | 'TEMPORARY_TABLE';
rowCount?: number;
sizeBytes?: number;
comment?: string;
created?: Date;
lastModified?: Date;
clusteringKeys?: string[]; // Snowflake clustering keys
}
interface Column {
name: string;
table: string;
schema: string;
database: string;
position: number;
dataType: string;
isNullable: boolean;
defaultValue?: string;
maxLength?: number;
precision?: number;
scale?: number;
comment?: string;
isPrimaryKey?: boolean;
isForeignKey?: boolean;
}
```
### Statistics

```typescript
interface TableStatistics {
table: string;
schema: string;
database: string;
rowCount?: number;
sizeBytes?: number;
columnStatistics: ColumnStatistics[];
clusteringInfo?: ClusteringInfo; // Snowflake-specific
lastUpdated?: Date;
}
interface ColumnStatistics {
columnName: string;
distinctCount?: number;
nullCount?: number;
minValue?: unknown;
maxValue?: unknown;
avgValue?: number;
topValues?: Array<{ value: unknown; frequency: number }>;
}
```
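As a quick illustration of consuming these shapes, here is a sketch that flags mostly-NULL columns using the `getTableStatistics` call shown earlier (the 50% threshold is arbitrary):

```typescript
const stats = await dataSource.getTableStatistics(
  'ANALYTICS_DB',
  'PUBLIC',
  'USERS',
  'snowflake-prod'
);

// Flag columns whose null count is more than half the table's row count
for (const col of stats.columnStatistics) {
  if (stats.rowCount && col.nullCount && col.nullCount / stats.rowCount > 0.5) {
    console.warn(`${col.columnName} is more than 50% NULL`);
  }
}
```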
## Error Handling

```typescript
// Query results include success status and error details
const result = await dataSource.execute({
sql: 'SELECT * FROM non_existent_table',
});
if (!result.success) {
console.error('Query failed:', result.error?.message);
console.error('Error code:', result.error?.code);
}
```
## Backward Compatibility

The package maintains backward compatibility with the previous `QueryRouter` class:

```typescript
import { QueryRouter } from '@buster/data-source';

// This still works
const router = new QueryRouter({ dataSources: [...] });
```
## Examples

See the `examples` directory for comprehensive usage examples:
- Basic Usage - Simple query execution
- Introspection - Database discovery and cataloging
- Advanced Routing - Multi-database scenarios
## Development

### Running Tests

```bash
# Run all tests
npm test
# Run specific test suite
npm test -- tests/integration/adapters/snowflake.test.ts
# Type checking
npm run typecheck
```
### Environment Variables for Testing

```bash
# PostgreSQL
TEST_POSTGRES_HOST=localhost
TEST_POSTGRES_PORT=5432
TEST_POSTGRES_DATABASE=test_db
TEST_POSTGRES_USERNAME=test_user
TEST_POSTGRES_PASSWORD=test_password
# Snowflake
TEST_SNOWFLAKE_ACCOUNT_ID=your_account
TEST_SNOWFLAKE_USERNAME=your_username
TEST_SNOWFLAKE_PASSWORD=your_password
TEST_SNOWFLAKE_WAREHOUSE_ID=your_warehouse
TEST_SNOWFLAKE_DATABASE=your_database
# MySQL
TEST_MYSQL_HOST=localhost
TEST_MYSQL_PORT=3306
TEST_MYSQL_DATABASE=test_db
TEST_MYSQL_USERNAME=test_user
TEST_MYSQL_PASSWORD=test_password
# BigQuery
TEST_BIGQUERY_PROJECT_ID=your_project
TEST_BIGQUERY_SERVICE_ACCOUNT_KEY=path/to/key.json
```
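As a hypothetical sketch of wiring these variables into test credentials (the helper below is not part of the package, and it assumes the credential interfaces are exported from the public API):

```typescript
import { DataSourceType, type PostgreSQLCredentials } from '@buster/data-source';

// Hypothetical test helper; assumes the TEST_POSTGRES_* variables above
// are set in the environment, with fallbacks matching the sample values.
function postgresCredentialsFromEnv(): PostgreSQLCredentials {
  return {
    type: DataSourceType.PostgreSQL,
    host: process.env.TEST_POSTGRES_HOST ?? 'localhost',
    port: Number(process.env.TEST_POSTGRES_PORT ?? 5432),
    database: process.env.TEST_POSTGRES_DATABASE ?? 'test_db',
    username: process.env.TEST_POSTGRES_USERNAME ?? 'test_user',
    password: process.env.TEST_POSTGRES_PASSWORD ?? 'test_password',
  };
}
```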
## Architecture

```text
@buster/data-source
├── src/
│   ├── adapters/              # Database-specific adapters
│   │   ├── base.ts            # Base adapter interface
│   │   ├── snowflake.ts       # Snowflake implementation
│   │   ├── postgresql.ts      # PostgreSQL implementation
│   │   └── ...
│   ├── introspection/         # Database introspection
│   │   ├── base.ts            # Base introspector interface
│   │   ├── snowflake.ts       # Snowflake introspection
│   │   └── ...
│   ├── types/                 # Type definitions
│   │   ├── credentials.ts     # Credential interfaces
│   │   ├── query.ts           # Query types
│   │   └── introspection.ts   # Introspection types
│   ├── data-source.ts         # Main DataSource class
│   └── index.ts               # Public API exports
├── tests/                     # Test suites
└── examples/                  # Usage examples
```
## Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
## License
MIT License - see LICENSE for details.
## Scoped Full Introspection

You can now scope full introspection to specific databases, schemas, or tables:

```typescript
// Get introspection for specific databases
const byDatabase = await dataSource.getFullIntrospection('myDataSource', {
  databases: ['sales_db', 'analytics_db'],
});

// Get introspection for specific schemas
const bySchema = await dataSource.getFullIntrospection('myDataSource', {
  schemas: ['public', 'reporting'],
});

// Get introspection for specific tables
const byTable = await dataSource.getFullIntrospection('myDataSource', {
  tables: ['customers', 'orders', 'products'],
});

// Combine filters - get specific tables from specific schemas
const combined = await dataSource.getFullIntrospection('myDataSource', {
  schemas: ['public'],
  tables: ['customers', 'orders'],
});
```
The scoping works hierarchically:

- If `databases` is specified, only schemas, tables, columns, and views from those databases are included
- If `schemas` is specified, only tables, columns, and views from those schemas are included
- If `tables` is specified, only those specific tables and their columns are included
- Filters can be combined for more precise scoping
This is particularly useful for large data sources where you only need to introspect a subset of the available objects.