mirror of https://github.com/buster-so/buster.git
title | author | date | status | parent_prd | ticket |
---|---|---|---|---|---|
Semantic Model Definition | Gemini Assistant | 2024-07-26 | Draft | semantic_layer_refactor_overview.md | N/A |
Semantic Model Definition
Parent Project
This is a sub-PRD of the Semantic Layer and Deployment Refactor project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.
Problem Statement
The current Rust structs in `api/libs/semantic_layer/src/models.rs` need to be the definitive representation of a data model as defined in user-created YAML files. These structs will be used by the CLI for parsing and by the API for request handling and persistence.
Current behavior:
- The `Model` struct in `api/libs/semantic_layer/src/models.rs` is a good starting point but lacks fields for `database` and `schema`, which are crucial for deployment and configuration inheritance.
- The existing fields need to be reviewed to ensure they align with all attributes we want to support in the YAML model definitions (e.g., for entities/relationships, dimensions, measures, metrics, filters).
Expected behavior:
- The structs in `api/libs/semantic_layer/src/models.rs` (primarily `Model`, `Dimension`, `Measure`, `Relationship`, `Metric`, `Filter`) will comprehensively define the structure of a semantic model.
- The `Model` struct will include optional `database: Option<String>` and `schema: Option<String>` fields, with `#[serde(skip_serializing_if = "Option::is_none")]` so that they are omitted from serialized output when absent (useful for overrides). This attribute only takes effect on structs that also derive `Serialize`.
- All fields will correctly use `Option<T>` where attributes are optional in the YAML.
- `serde` attributes (`rename`, `default`) will be used appropriately to match YAML conventions and handle missing fields gracefully.
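The "useful for overrides" point can be illustrated with a small resolution step. The sketch below is only an assumption about how inherited values might be resolved, not the project's implementation: `ProjectDefaults` and `resolve_target` are hypothetical names, and `Model` is trimmed to the fields that matter here.

```rust
/// Trimmed-down stand-in for the `Model` struct in this PRD
/// (only the fields needed for this illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Model {
    pub name: String,
    pub database: Option<String>,
    pub schema: Option<String>,
}

/// Hypothetical project-level defaults; not part of the PRD's struct set.
#[derive(Debug, Clone, PartialEq)]
pub struct ProjectDefaults {
    pub database: Option<String>,
    pub schema: Option<String>,
}

/// Returns the effective (database, schema) pair for a model.
/// A value set on the model wins; otherwise the project default is used.
/// If neither is set, the corresponding slot stays `None`.
pub fn resolve_target(
    model: &Model,
    defaults: &ProjectDefaults,
) -> (Option<String>, Option<String>) {
    let database = model.database.clone().or_else(|| defaults.database.clone());
    let schema = model.schema.clone().or_else(|| defaults.schema.clone());
    (database, schema)
}
```

With this shape, a model that sets only `schema` inherits `database` from the defaults, which is why both fields are modeled as `Option<String>` rather than required strings.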
Goals
- Define a comprehensive set of Rust structs in `api/libs/semantic_layer/src/models.rs` that accurately represent the YAML structure for semantic models.
- Ensure the `Model` struct includes optional `database` and `schema` fields.
- Verify that all optional fields in the YAML correspond to `Option<T>` in Rust and use `#[serde(default)]` or other appropriate `serde` attributes where necessary.
- Ensure field names in Rust map correctly to YAML field names using `#[serde(rename = "...")]` if they differ (e.g., `type` vs `type_`).
Non-Goals
- Implementing the parsing logic itself (this PRD focuses on struct definition).
- Defining how these models are stored in the database (covered in `prd_api_model_persistence.md`).
Implementation Plan
Phase 1: Define Core Model Structures
Technical Design
Review and update the following structs in `api/libs/semantic_layer/src/models.rs`:
// Path: api/libs/semantic_layer/src/models.rs
use serde::Deserialize;

// #[derive(Debug, Deserialize, PartialEq)] // Removed PartialEq for brevity in PRD, keep in real code
// pub struct SemanticLayerSpec { // Assuming top-level might be a Vec<Model> directly from file
//     pub models: Vec<Model>,
// }

#[derive(Debug, Deserialize)] // PartialEq can be added back later if needed for tests here
pub struct Model {
    pub name: String,
    pub description: Option<String>,
    // Added for deployment context, resolved by CLI.
    // Note: skip_serializing_if only takes effect once Serialize is also derived.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub database: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub schema: Option<String>,
    #[serde(default)]
    pub dimensions: Vec<Dimension>,
    #[serde(default)]
    pub measures: Vec<Measure>,
    #[serde(default)]
    pub metrics: Vec<Metric>,
    #[serde(default)]
    pub filters: Vec<Filter>,
    // The YAML key is 'entities'; the Rust field keeps the name 'relationships'.
    // Defaults to an empty Vec when the key is absent.
    #[serde(rename = "entities", default)]
    pub relationships: Vec<Relationship>,
}

#[derive(Debug, Deserialize)]
pub struct Dimension {
    pub name: String,
    pub description: Option<String>,
    #[serde(rename = "type")]
    pub type_: Option<String>,
    #[serde(default)]
    pub searchable: bool,
    pub options: Option<Vec<String>>,
    // Consider adding expr for derived dimensions if supported by backend processing
    // pub expr: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct Measure {
    pub name: String,
    pub description: Option<String>,
    #[serde(rename = "type")]
    pub type_: Option<String>,
    // Aggregation might be relevant, or handled by `expr`
    // pub agg: Option<String>,
    // pub expr: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct Metric {
    pub name: String,
    pub expr: String, // Expression is core to a metric
    pub description: Option<String>,
    #[serde(default)]
    pub args: Vec<Argument>, // Defaults to an empty Vec when absent
}

#[derive(Debug, Deserialize)]
pub struct Filter {
    pub name: String,
    pub expr: String, // Expression is core to a filter
    pub description: Option<String>,
    #[serde(default)]
    pub args: Vec<Argument>, // Defaults to an empty Vec when absent
}

#[derive(Debug, Deserialize)]
pub struct Argument {
    pub name: String,
    #[serde(rename = "type")]
    pub type_: String,
    pub description: Option<String>,
}

// This was 'Entity' in the CLI and 'Relationship' in semantic_layer; it maps to the user's YAML 'entities'.
#[derive(Debug, Deserialize)]
pub struct Relationship {
    pub name: String, // Name of the related entity/model
    // Fields from existing semantic_layer::Relationship
    pub primary_key: String, // Field in the current model
    pub foreign_key: String, // Field in the related model ('name')
    #[serde(rename = "type")]
    pub type_: Option<String>, // e.g., LEFT, INNER, etc. (join type)
    pub cardinality: Option<String>, // e.g., one-to-one, one-to-many
    pub description: Option<String>,
    // Fields from CLI Entity struct to consider merging/mapping
    // pub ref_: Option<String>, // If 'name' refers to an alias and 'ref_' is actual model
    // pub expr: String, // The join condition or foreign key expression if not simple keys
    // pub entity_type: String, // 'foreign', 'derived' etc. Could map to relationship characteristics
    // pub project_path: Option<String>, // For cross-project relationships, handled by CLI config resolution
}
Key decisions for the `Relationship` struct:
- The `Relationship` struct aims to consolidate what was previously `Entity` in the CLI's model parsing and the existing `Relationship` in the semantic layer.
- `name`: Will refer to the target model name of the relationship.
- `primary_key`: Column in the current model used for the join.
- `foreign_key`: Column in the target model (specified by `name`) used for the join.
- `type_`: Join type (e.g., `LEFT`, `INNER`).
- `cardinality`: e.g., `one-to-one`, `one-to-many`.
- The `expr` field from the CLI's `Entity` (which represented the join condition, like `self.id = other.self_id`) is important. We need to decide if `primary_key` and `foreign_key` are sufficient, or if a more general `expr` is needed for complex joins. For now, we stick to `primary_key` and `foreign_key` for simplicity, assuming simple key-based joins. If complex joins are needed, `expr` can be added back, potentially replacing `primary_key`/`foreign_key` or coexisting with them.
- `project_path` from CLI's `Entity`: Used for cross-project relationships. It is left out of this struct for now, since cross-project references are handled by CLI config resolution.
- `ref_` from CLI's `Entity`: If a relationship `name` can be an alias, and `ref_` points to the actual model, this is a pattern to consider. For now, `name` is assumed to be the actual target model name.
Implementation Steps
- Add `database: Option<String>` and `schema: Option<String>` to the `Model` struct with `#[serde(skip_serializing_if = "Option::is_none")]`.
- Review and update `Dimension`, `Measure`, `Metric`, `Filter`, and `Argument` structs to ensure all necessary fields are present, optionality is correct (`Option<T>`), and `serde` attributes (`default`, `rename`) are used appropriately.
- Map the `Model.relationships` field to the YAML key `entities` via `#[serde(rename = "entities")]` to match common YAML usage. The struct can keep the name `Relationship` internally if preferred, or be renamed to `Entity` as well for consistency.
- Define the `Relationship` (or `Entity`) struct to include `name`, `primary_key`, `foreign_key`, `type_` (join type), `cardinality`, and `description`. Clarify the meaning of `primary_key` and `foreign_key` in this context (current model vs. related model).
- Ensure all structs derive `Debug` and `Deserialize`. `PartialEq` can be added for testing.
Tests
Unit tests in `api/libs/semantic_layer/src/models.rs` should verify deserialization of example YAML snippets into these structs, covering:
- All fields present.
- Optional fields missing (should default or be `None`).
- Renamed fields (e.g., `type` in YAML to `type_` in Rust).
- Defaulted collections (e.g., empty `dimensions` list if not in YAML).
// Example Test Snippet (conceptual)
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_deserialize_model_with_optional_db_schema() {
        // A file whose top level is `models:` would deserialize into a wrapper such as
        // SemanticLayerSpec (or a Vec<Model>). It is kept here for reference only and
        // is not parsed by this test.
        let _yaml_content = r#"
models:
  - name: my_model
    database: prod_db
    schema: prod_schema
    dimensions: []
    measures: []
    entities: []
"#;

        // For now, assume a direct Model deserialization to keep the test snippet focused.
        let model_yaml = r#"
name: my_model
database: prod_db
schema: prod_schema
description: A test model
dimensions:
  - name: id
    type: integer
  - name: status
    type: string
    searchable: true
    options: ["active", "inactive"]
entities:
  - name: related_model
    primary_key: id
    foreign_key: my_model_id
    type: LEFT
    cardinality: one-to-many
metrics:
  - name: total_revenue
    expr: SUM(amount)
"#;

        let parsed_model: Result<Model, _> = serde_yaml::from_str(model_yaml);
        assert!(parsed_model.is_ok());
        let model = parsed_model.unwrap();

        assert_eq!(model.name, "my_model");
        assert_eq!(model.database, Some("prod_db".to_string()));
        assert_eq!(model.schema, Some("prod_schema".to_string()));
        assert_eq!(model.dimensions.len(), 2);
        assert_eq!(model.relationships.len(), 1);
        assert_eq!(model.metrics.len(), 1);
    }

    // Add more tests for other structs and variations
}
Success Criteria
- All structs in `api/libs/semantic_layer/src/models.rs` are defined as per the technical design.
- Unit tests for deserialization pass, covering various YAML structures.
- Code is reviewed and approved.
Dependencies on Other Components
- None for this specific PRD, as it's foundational.
Security Considerations
- Not directly applicable at the struct definition level, but `serde` itself is a well-vetted library.
References
- Existing `api/libs/semantic_layer/src/models.rs`
- Existing CLI model parsing in `cli/cli/src/commands/deploy.rs` (for field reference)