mirror of https://github.com/buster-so/buster.git
81 lines
3.0 KiB
Markdown
81 lines
3.0 KiB
Markdown
|
# SQL Analyzer Library
|
||
|
|
||
|
## Purpose
|
||
|
The SQL Analyzer library provides functionality to parse and analyze SQL queries, extracting tables, columns, joins, and CTEs with lineage tracing. It's designed for integration with a Tokio-based web server.
|
||
|
|
||
|
## Key Features
|
||
|
- Extracts tables with database/schema identifiers and aliases
|
||
|
- Links columns to their tables, deduplicating per table
|
||
|
- Identifies joins with lineage to base tables
|
||
|
- Recursively analyzes CTEs, tracking their lineage
|
||
|
- Flags vague references (unqualified columns or tables without schema)
|
||
|
- Provides non-blocking parsing for web server compatibility
|
||
|
|
||
|
## Internal Organization
|
||
|
|
||
|
### Module Structure
|
||
|
- **lib.rs**: Main entry point and public API
|
||
|
- **types.rs**: Data structures for tables, joins, CTEs, and query summaries
|
||
|
- **errors.rs**: Custom error types for SQL analysis
|
||
|
- **utils/mod.rs**: Core analysis logic using sqlparser
|
||
|
|
||
|
### Key Components
|
||
|
- **QuerySummary**: The main output structure containing tables, joins, and CTEs
|
||
|
- **TableInfo**: Information about tables including identifiers and columns
|
||
|
- **JoinInfo**: Describes joins between tables with conditions
|
||
|
- **CteSummary**: Contains a CTE's query summary and column mappings
|
||
|
- **SqlAnalyzerError**: Custom errors for parsing issues and vague references
|
||
|
- **QueryAnalyzer**: The worker class that implements the Visitor pattern
|
||
|
|
||
|
## Usage Patterns
|
||
|
|
||
|
### Basic Usage
|
||
|
```rust
|
||
|
use sql_analyzer::analyze_query;
|
||
|
|
||
|
let sql = "SELECT u.id FROM schema.users u JOIN schema.orders o ON u.id = o.user_id";
|
||
|
match analyze_query(sql.to_string()).await {
|
||
|
Ok(summary) => {
|
||
|
println!("Tables: {:?}", summary.tables);
|
||
|
println!("Joins: {:?}", summary.joins);
|
||
|
},
|
||
|
Err(e) => eprintln!("Error: {}", e),
|
||
|
}
|
||
|
```
|
||
|
|
||
|
### Working with CTEs
|
||
|
```rust
|
||
|
let sql = "WITH users_cte AS (SELECT id FROM schema.users)
|
||
|
SELECT o.* FROM schema.orders o JOIN users_cte ON o.user_id = users_cte.id";
|
||
|
let summary = analyze_query(sql.to_string()).await?;
|
||
|
|
||
|
// Access the CTE information
|
||
|
for cte in &summary.ctes {
|
||
|
println!("CTE: {}", cte.name);
|
||
|
println!("CTE base tables: {:?}", cte.summary.tables);
|
||
|
println!("Column mappings: {:?}", cte.column_mappings);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
## Dependencies
|
||
|
- **sqlparser**: SQL parsing functionality
|
||
|
- **tokio**: Async runtime and blocking task management
|
||
|
- **anyhow**: Error handling
|
||
|
- **serde**: Serialization for query summary structures
|
||
|
- **thiserror**: Custom error definitions
|
||
|
|
||
|
## Testing
|
||
|
- Test cases should cover various SQL constructs
|
||
|
- Ensure tests for error cases (vague references)
|
||
|
- Test CTE lineage tracing and complex join scenarios
|
||
|
|
||
|
## Code Navigation Tips
|
||
|
- Start in lib.rs for the public API
|
||
|
- The core analysis logic is in utils/mod.rs
|
||
|
- The QueryAnalyzer struct implements the sqlparser Visitor trait
|
||
|
- The parsing of object names and column references is key to understanding the lineage tracking
|
||
|
|
||
|
## Common Pitfalls
|
||
|
- Vague references in SQL will cause errors by design
|
||
|
- Complex subqueries may need special handling
|
||
|
- Non-standard SQL dialects might not parse correctly with the generic dialect
|