mirror of https://github.com/buster-so/buster.git
251 lines
11 KiB
Markdown
251 lines
11 KiB
Markdown
---
|
|
title: Semantic Model Definition
|
|
author: Gemini Assistant
|
|
date: 2024-07-26
|
|
status: Draft
|
|
parent_prd: semantic_layer_refactor_overview.md
|
|
ticket: N/A
|
|
---
|
|
|
|
# Semantic Model Definition
|
|
|
|
## Parent Project
|
|
|
|
This is a sub-PRD of the [Semantic Layer and Deployment Refactor](semantic_layer_refactor_overview.md) project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.
|
|
|
|
## Problem Statement
|
|
|
|
The current Rust structs in `api/libs/semantic_layer/src/models.rs` need to be the definitive representation of a data model as defined in user-created YAML files. These structs will be used by the CLI for parsing and by the API for request handling and persistence.
|
|
|
|
Current behavior:
|
|
- The `Model` struct in `api/libs/semantic_layer/src/models.rs` is a good starting point but lacks fields for `database` and `schema` which are crucial for deployment and configuration inheritance.
|
|
- The existing fields need to be reviewed to ensure they align with all attributes we want to support in the YAML model definitions (e.g., for entities/relationships, dimensions, measures, metrics, filters).
|
|
|
|
Expected behavior:
|
|
- The structs in `api/libs/semantic_layer/src/models.rs` (primarily `Model`, `Dimension`, `Measure`, `Relationship`, `Metric`, `Filter`) will comprehensively define the structure of a semantic model.
|
|
- The `Model` struct will include optional `database: Option<String>` and `schema: Option<String>` fields, with `#[serde(skip_serializing_if = "Option::is_none")]` to ensure they are not serialized if absent (useful for overrides).
|
|
- All fields will correctly use `Option<T>` where attributes are optional in the YAML.
|
|
- `serde` attributes (`rename`, `default`) will be used appropriately to match YAML conventions and handle missing fields gracefully.
|
|
|
|
## Goals
|
|
|
|
1. Define a comprehensive set of Rust structs in `api/libs/semantic_layer/src/models.rs` that accurately represent the YAML structure for semantic models.
|
|
2. Ensure the `Model` struct includes optional `database` and `schema` fields.
|
|
3. Verify that all optional fields in the YAML correspond to `Option<T>` in Rust and use `#[serde(default)]` or other appropriate `serde` attributes where necessary.
|
|
4. Ensure field names in Rust map correctly to YAML field names using `#[serde(rename = "...")]` if they differ (e.g., `type` vs `type_`).
|
|
|
|
## Non-Goals
|
|
|
|
1. Implementing the parsing logic itself (this PRD focuses on struct definition).
|
|
2. Defining how these models are stored in the database (covered in `prd_api_model_persistence.md`).
|
|
|
|
## Implementation Plan
|
|
|
|
### Phase 1: Define Core Model Structures
|
|
|
|
#### Technical Design
|
|
Review and update the following structs in `api/libs/semantic_layer/src/models.rs`:
|
|
|
|
```rust
|
|
// Path: api/libs/semantic_layer/src/models.rs
|
|
|
|
use serde::Deserialize;
|
|
|
|
// #[derive(Debug, Deserialize, PartialEq)] // Removed PartialEq for brevity in PRD, keep in real code
|
|
// pub struct SemanticLayerSpec { // Assuming top-level might be a Vec<Model> directly from file
|
|
// pub models: Vec<Model>,
|
|
// }
|
|
|
|
#[derive(Debug, Deserialize)] // PartialEq can be added back later if needed for tests here
|
|
pub struct Model {
|
|
pub name: String,
|
|
pub description: Option<String>,
|
|
|
|
// Added for deployment context, resolved by CLI
|
|
#[serde(skip_serializing_if = "Option::is_none")]
|
|
pub database: Option<String>,
|
|
#[serde(skip_serializing_if = "Option::is_none")]
|
|
pub schema: Option<String>,
|
|
|
|
#[serde(default)]
|
|
pub dimensions: Vec<Dimension>,
|
|
#[serde(default)]
|
|
pub measures: Vec<Measure>,
|
|
#[serde(default)]
|
|
pub metrics: Vec<Metric>,
|
|
#[serde(default)]
|
|
pub filters: Vec<Filter>,
|
|
#[serde(rename = "entities", default)] // Renamed from 'relationships' to match user YAML, default if empty
|
|
pub relationships: Vec<Relationship>,
|
|
}
|
|
|
|
#[derive(Debug, Deserialize)]
|
|
pub struct Dimension {
|
|
pub name: String,
|
|
pub description: Option<String>,
|
|
#[serde(rename = "type")]
|
|
pub type_: Option<String>,
|
|
#[serde(default)]
|
|
pub searchable: bool,
|
|
pub options: Option<Vec<String>>,
|
|
// Consider adding expr for derived dimensions if supported by backend processing
|
|
// pub expr: Option<String>,
|
|
}
|
|
|
|
#[derive(Debug, Deserialize)]
|
|
pub struct Measure {
|
|
pub name: String,
|
|
pub description: Option<String>,
|
|
#[serde(rename = "type")]
|
|
pub type_: Option<String>,
|
|
// Aggregation might be relevant, or handled by `expr`
|
|
// pub agg: Option<String>,
|
|
// pub expr: Option<String>,
|
|
}
|
|
|
|
#[derive(Debug, Deserialize)]
|
|
pub struct Metric {
|
|
pub name: String,
|
|
pub expr: String, // Expression is core to a metric
|
|
pub description: Option<String>,
|
|
#[serde(default)]
|
|
pub args: Vec<Argument>, // Changed to default empty vec
|
|
}
|
|
|
|
#[derive(Debug, Deserialize)]
|
|
pub struct Filter {
|
|
pub name: String,
|
|
pub expr: String, // Expression is core to a filter
|
|
pub description: Option<String>,
|
|
#[serde(default)]
|
|
pub args: Vec<Argument>, // Changed to default empty vec
|
|
}
|
|
|
|
#[derive(Debug, Deserialize)]
|
|
pub struct Argument {
|
|
pub name: String,
|
|
#[serde(rename = "type")]
|
|
pub type_: String,
|
|
pub description: Option<String>,
|
|
}
|
|
|
|
#[derive(Debug, Deserialize)]
|
|
pub struct Relationship { // This was 'Entity' in CLI, 'Relationship' in semantic_layer, aligns with user's yml 'entities'
|
|
pub name: String, // Name of the related entity/model
|
|
|
|
// Fields from existing semantic_layer::Relationship
|
|
pub primary_key: String, // Field in the current model
|
|
pub foreign_key: String, // Field in the related model ('name')
|
|
|
|
#[serde(rename = "type")]
|
|
pub type_: Option<String>, // e.g., LEFT, INNER, etc. (join type)
|
|
pub cardinality: Option<String>, // e.g., one-to-one, one-to-many
|
|
pub description: Option<String>,
|
|
|
|
// Fields from CLI Entity struct to consider merging/mapping
|
|
// pub ref_: Option<String>, // If 'name' refers to an alias and 'ref_' is actual model
|
|
// pub expr: String, // The join condition or foreign key expression if not simple keys
|
|
// pub entity_type: String, // 'foreign', 'derived' etc. Could map to relationship characteristics
|
|
// pub project_path: Option<String>, // For cross-project relationships, handled by CLI config resolution
|
|
}
|
|
|
|
```
|
|
|
|
**Key decisions for `Relationship` struct:**
|
|
- The `Relationship` struct aims to consolidate what was previously `Entity` in the CLI's model parsing and the existing `Relationship` in the semantic layer.
|
|
- `name`: Will refer to the target model name of the relationship.
|
|
- `primary_key`: Column in the *current* model used for the join.
|
|
- `foreign_key`: Column in the *target* model (specified by `name`) used for the join.
|
|
- `type_`: Join type (e.g., `LEFT`, `INNER`).
|
|
- `cardinality`: e.g., `one_to_one`, `one_to_many`.
|
|
- The `expr` field from the CLI's `Entity` (which represented the join condition, like `self.id = other.self_id`) is important. We need to decide if `primary_key` and `foreign_key` are sufficient, or if a more general `expr` is needed for complex joins. For now, sticking to `primary_key` and `foreign_key` for simplicity, assuming simple key-based joins. If complex joins are needed, `expr` can be added back, potentially replacing `primary_key`/`foreign_key` or coexisting.
|
|
- `project_path` from CLI's `Entity`: If a relationship `name` can be an alias, and `ref` points to the actual model, this is a pattern to consider. For now, `name` is assumed to be the actual target model name.
|
|
- `ref_` from CLI's `Entity`: If a relationship `name` can be an alias, and `ref` points to the actual model, this is a pattern to consider. For now, `name` is assumed to be the actual target model name.
|
|
|
|
#### Implementation Steps
|
|
1. [x] Add `database: Option<String>` and `schema: Option<String>` to the `Model` struct with `#[serde(skip_serializing_if = "Option::is_none")]`.
|
|
2. [x] Review and update `Dimension`, `Measure`, `Metric`, `Filter`, and `Argument` structs to ensure all necessary fields are present, optionality is correct (`Option<T>`), and `serde` attributes (`default`, `rename`) are used appropriately.
|
|
3. [x] Rename `Model.relationships` to `Model.entities` via `#[serde(rename = "entities")]` to match common YAML usage, while keeping the struct name `Relationship` internally if preferred, or renaming the struct to `Entity` as well for consistency.
|
|
4. [x] Define the `Relationship` (or `Entity`) struct to include `name`, `primary_key`, `foreign_key`, `type_` (join type), `cardinality`, and `description`. Clarify the meaning of `primary_key` and `foreign_key` in this context (current model vs. related model).
|
|
5. [ ] Ensure all structs derive `Debug` and `Deserialize`. `PartialEq` can be added for testing.
|
|
|
|
#### Tests
|
|
|
|
Unit tests in `api/libs/semantic_layer/src/models.rs` should verify deserialization of example YAML snippets into these structs, covering:
|
|
- All fields present.
|
|
- Optional fields missing (should default or be `None`).
|
|
- Renamed fields (e.g., `type` in YAML to `type_` in Rust).
|
|
- Defaulted collections (e.g., empty `dimensions` list if not in YAML).
|
|
|
|
```rust
|
|
// Example Test Snippet (conceptual)
|
|
#[cfg(test)]
|
|
mod tests {
|
|
use super::*;
|
|
use serde_yaml;
|
|
|
|
#[test]
|
|
fn test_deserialize_model_with_optional_db_schema() {
|
|
let yaml_content = r#"
|
|
models:
|
|
- name: my_model
|
|
database: prod_db
|
|
schema: prod_schema
|
|
dimensions: []
|
|
measures: []
|
|
entities: []
|
|
"#;
|
|
// Assuming we deserialize Vec<Model> if the top level is `models:`
|
|
// Or adjust if a single file represents a single Model or a SemanticLayerSpec struct.
|
|
// For now, let's assume a direct Model deserialization for simplicity of the test snippet's focus.
|
|
let model_yaml = r#"
|
|
name: my_model
|
|
database: prod_db
|
|
schema: prod_schema
|
|
description: A test model
|
|
dimensions:
|
|
- name: id
|
|
type: integer
|
|
- name: status
|
|
type: string
|
|
searchable: true
|
|
options: ["active", "inactive"]
|
|
entities:
|
|
- name: related_model
|
|
primary_key: id
|
|
foreign_key: my_model_id
|
|
type: LEFT
|
|
cardinality: one-to-many
|
|
metrics:
|
|
- name: total_revenue
|
|
expr: SUM(amount)
|
|
"#;
|
|
let parsed_model: Result<Model, _> = serde_yaml::from_str(model_yaml);
|
|
assert!(parsed_model.is_ok());
|
|
let model = parsed_model.unwrap();
|
|
assert_eq!(model.name, "my_model");
|
|
assert_eq!(model.database, Some("prod_db".to_string()));
|
|
assert_eq!(model.schema, Some("prod_schema".to_string()));
|
|
assert_eq!(model.dimensions.len(), 2);
|
|
assert_eq!(model.relationships.len(), 1);
|
|
assert_eq!(model.metrics.len(), 1);
|
|
}
|
|
|
|
// Add more tests for other structs and variations
|
|
}
|
|
```
|
|
|
|
#### Success Criteria
|
|
- [ ] All structs in `api/libs/semantic_layer/src/models.rs` are defined as per the technical design.
|
|
- [ ] Unit tests for deserialization pass, covering various YAML structures.
|
|
- [ ] Code is reviewed and approved.
|
|
|
|
## Dependencies on Other Components
|
|
- None for this specific PRD, as it's foundational.
|
|
|
|
## Security Considerations
|
|
- Not directly applicable at the struct definition level, but `serde` itself is a well-vetted library.
|
|
|
|
## References
|
|
- Existing `api/libs/semantic_layer/src/models.rs`
|
|
- Existing CLI model parsing in `cli/cli/src/commands/deploy.rs` (for field reference) |