---
title: Semantic Model Definition
author: Gemini Assistant
date: 2024-07-26
status: Draft
parent_prd: semantic_layer_refactor_overview.md
ticket: N/A
---

# Semantic Model Definition

## Parent Project

This is a sub-PRD of the [Semantic Layer and Deployment Refactor](semantic_layer_refactor_overview.md) project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.

## Problem Statement

The Rust structs in `api/libs/semantic_layer/src/models.rs` must become the definitive representation of a data model as defined in user-created YAML files. The CLI will use these structs for parsing, and the API will use them for request handling and persistence.

Current behavior:
-   The `Model` struct in `api/libs/semantic_layer/src/models.rs` is a good starting point but lacks `database` and `schema` fields, which are crucial for deployment and configuration inheritance.
-   The existing fields need to be reviewed to ensure they align with all attributes we want to support in the YAML model definitions (e.g., for entities/relationships, dimensions, measures, metrics, filters).

Expected behavior:
-   The structs in `api/libs/semantic_layer/src/models.rs` (primarily `Model`, `Dimension`, `Measure`, `Relationship`, `Metric`, `Filter`) will comprehensively define the structure of a semantic model.
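-   The `Model` struct will include optional `database: Option<String>` and `schema: Option<String>` fields, annotated with `#[serde(skip_serializing_if = "Option::is_none")]` so that they are omitted from serialized output when absent (useful for overrides). This attribute only takes effect once the struct also derives `Serialize`; the technical design in this PRD derives only `Deserialize`.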
-   All fields will correctly use `Option<T>` where attributes are optional in the YAML.
-   `serde` attributes (`rename`, `default`) will be used appropriately to match YAML conventions and handle missing fields gracefully.
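
The override use case for the optional `database` and `schema` fields can be sketched as follows. This is a minimal sketch under stated assumptions, not the CLI's actual resolution logic: `ProjectDefaults`, `ModelTarget`, and `resolve_target` are hypothetical names introduced only for illustration, and the sketch uses only the standard library.

```rust
// Hypothetical sketch: none of these names exist in the codebase.

/// Assumed project-level fallback values.
pub struct ProjectDefaults {
    pub database: Option<String>,
    pub schema: Option<String>,
}

/// Only the `Model` fields that matter for resolution.
pub struct ModelTarget {
    pub database: Option<String>,
    pub schema: Option<String>,
}

/// Returns `(database, schema)`, preferring the model's own value and
/// falling back to the project default when the model leaves it unset.
pub fn resolve_target(
    model: &ModelTarget,
    defaults: &ProjectDefaults,
) -> (Option<String>, Option<String>) {
    let database = model.database.clone().or_else(|| defaults.database.clone());
    let schema = model.schema.clone().or_else(|| defaults.schema.clone());
    (database, schema)
}
```

Keeping both fields as `Option<String>` lets "not specified" stay distinguishable from an explicit value, which is what makes this kind of fallback possible.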

## Goals

1.  Define a comprehensive set of Rust structs in `api/libs/semantic_layer/src/models.rs` that accurately represent the YAML structure for semantic models.
2.  Ensure the `Model` struct includes optional `database` and `schema` fields.
3.  Verify that all optional fields in the YAML correspond to `Option<T>` in Rust and use `#[serde(default)]` or other appropriate `serde` attributes where necessary.
4.  Ensure field names in Rust map correctly to YAML field names using `#[serde(rename = "...")]` if they differ (e.g., `type` vs `type_`).

## Non-Goals

1.  Implementing the parsing logic itself (this PRD focuses on struct definition).
2.  Defining how these models are stored in the database (covered in `prd_api_model_persistence.md`).

## Implementation Plan

### Phase 1: Define Core Model Structures

#### Technical Design
Review and update the following structs in `api/libs/semantic_layer/src/models.rs`:

```rust
// Path: api/libs/semantic_layer/src/models.rs

use serde::Deserialize;

// #[derive(Debug, Deserialize, PartialEq)] // Removed PartialEq for brevity in PRD, keep in real code
// pub struct SemanticLayerSpec { // Assuming top-level might be a Vec<Model> directly from file
//     pub models: Vec<Model>,
// }

#[derive(Debug, Deserialize)] // PartialEq can be added back later if needed for tests here
pub struct Model {
    pub name: String,
    pub description: Option<String>,

    // Added for deployment context, resolved by CLI
    #[serde(skip_serializing_if = "Option::is_none")]
    pub database: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub schema: Option<String>,

    #[serde(default)]
    pub dimensions: Vec<Dimension>,
    #[serde(default)]
    pub measures: Vec<Measure>,
    #[serde(default)]
    pub metrics: Vec<Metric>,
    #[serde(default)]
    pub filters: Vec<Filter>,
    #[serde(rename = "entities", default)] // Renamed from 'relationships' to match user YAML, default if empty
    pub relationships: Vec<Relationship>,
}

#[derive(Debug, Deserialize)]
pub struct Dimension {
    pub name: String,
    pub description: Option<String>,
    #[serde(rename = "type")]
    pub type_: Option<String>,
    #[serde(default)]
    pub searchable: bool,
    pub options: Option<Vec<String>>,
    // Consider adding expr for derived dimensions if supported by backend processing
    // pub expr: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct Measure {
    pub name: String,
    pub description: Option<String>,
    #[serde(rename = "type")]
    pub type_: Option<String>,
    // Aggregation might be relevant, or handled by `expr`
    // pub agg: Option<String>,
    // pub expr: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct Metric {
    pub name: String,
    pub expr: String, // Expression is core to a metric
    pub description: Option<String>,
    #[serde(default)]
    pub args: Vec<Argument>, // Changed to default empty vec
}

#[derive(Debug, Deserialize)]
pub struct Filter {
    pub name: String,
    pub expr: String, // Expression is core to a filter
    pub description: Option<String>,
    #[serde(default)]
    pub args: Vec<Argument>, // Changed to default empty vec
}

#[derive(Debug, Deserialize)]
pub struct Argument {
    pub name: String,
    #[serde(rename = "type")]
    pub type_: String,
    pub description: Option<String>,
}

#[derive(Debug, Deserialize)]
pub struct Relationship { // This was 'Entity' in CLI, 'Relationship' in semantic_layer, aligns with user's yml 'entities'
    pub name: String, // Name of the related entity/model

    // Fields from existing semantic_layer::Relationship
    pub primary_key: String, // Field in the current model
    pub foreign_key: String, // Field in the related model ('name')

    #[serde(rename = "type")]
    pub type_: Option<String>, // e.g., LEFT, INNER, etc. (join type)
    pub cardinality: Option<String>, // e.g., one-to-one, one-to-many
    pub description: Option<String>,

    // Fields from CLI Entity struct to consider merging/mapping
    // pub ref_: Option<String>, // If 'name' refers to an alias and 'ref_' is actual model
    // pub expr: String, // The join condition or foreign key expression if not simple keys
    // pub entity_type: String, // 'foreign', 'derived' etc. Could map to relationship characteristics
    // pub project_path: Option<String>, // For cross-project relationships, handled by CLI config resolution
}

```

**Key decisions for `Relationship` struct:**
-   The `Relationship` struct aims to consolidate what was previously `Entity` in the CLI's model parsing and the existing `Relationship` in the semantic layer.
-   `name`: Will refer to the target model name of the relationship.
-   `primary_key`: Column in the *current* model used for the join.
-   `foreign_key`: Column in the *target* model (specified by `name`) used for the join.
-   `type_`: Join type (e.g., `LEFT`, `INNER`).
-   `cardinality`: e.g., `one-to-one`, `one-to-many`.
-   The `expr` field from the CLI's `Entity` (which represented the join condition, like `self.id = other.self_id`) is important. We need to decide if `primary_key` and `foreign_key` are sufficient, or if a more general `expr` is needed for complex joins. For now, sticking to `primary_key` and `foreign_key` for simplicity, assuming simple key-based joins. If complex joins are needed, `expr` can be added back, potentially replacing `primary_key`/`foreign_key` or coexisting.
-   `project_path` from CLI's `Entity`: This supports cross-project relationships and is handled by CLI config resolution, so it is not included in the struct for now.
-   `ref_` from CLI's `Entity`: If a relationship `name` can be an alias, and `ref` points to the actual model, this is a pattern to consider. For now, `name` is assumed to be the actual target model name.
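
To make the roles of `primary_key` and `foreign_key` concrete, here is a minimal sketch of how a simple key-based join could be rendered from a relationship. `join_clause` is a hypothetical helper, not part of the crate; the struct is a subset of the design above with serde attributes omitted, and falling back to `LEFT` when `type_` is absent is an assumption of this sketch only.

```rust
// Illustrative only: `join_clause` is a hypothetical helper.

/// Subset of the `Relationship` fields from the technical design.
pub struct Relationship {
    pub name: String,          // target model
    pub primary_key: String,   // column in the current model
    pub foreign_key: String,   // column in the target model
    pub type_: Option<String>, // join type, e.g. LEFT or INNER
}

/// Renders a simple key-based join from `current_model` to the target model.
/// The LEFT fallback for a missing join type is an assumption of this sketch.
pub fn join_clause(current_model: &str, rel: &Relationship) -> String {
    let join_type = rel.type_.as_deref().unwrap_or("LEFT");
    format!(
        "{} JOIN {} ON {}.{} = {}.{}",
        join_type, rel.name, current_model, rel.primary_key, rel.name, rel.foreign_key
    )
}
```

A join condition that cannot be written as one column equal to another is exactly the case where the more general `expr` field would be needed.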

#### Implementation Steps
1.  [x] Add `database: Option<String>` and `schema: Option<String>` to the `Model` struct with `#[serde(skip_serializing_if = "Option::is_none")]`.
2.  [x] Review and update `Dimension`, `Measure`, `Metric`, `Filter`, and `Argument` structs to ensure all necessary fields are present, optionality is correct (`Option<T>`), and `serde` attributes (`default`, `rename`) are used appropriately.
3.  [x] Map the Rust field `Model.relationships` to the YAML key `entities` via `#[serde(rename = "entities")]` to match common YAML usage. The struct can keep the name `Relationship` internally if preferred, or be renamed to `Entity` for consistency.
4.  [x] Define the `Relationship` (or `Entity`) struct to include `name`, `primary_key`, `foreign_key`, `type_` (join type), `cardinality`, and `description`. Clarify the meaning of `primary_key` and `foreign_key` in this context (current model vs. related model).
5.  [ ] Ensure all structs derive `Debug` and `Deserialize`. `PartialEq` can be added for testing.
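
As a rough illustration of why `PartialEq` helps in tests, the sketch below mirrors the `Argument` struct with `PartialEq` added so that whole values can be compared in a single assertion. Serde attributes are omitted so the sketch compiles with the standard library alone, and `arg` is a hypothetical test helper.

```rust
/// Mirror of `Argument` from the technical design, with `PartialEq` added.
#[derive(Debug, PartialEq)]
pub struct Argument {
    pub name: String,
    pub type_: String,
    pub description: Option<String>,
}

/// Hypothetical test helper: builds an `Argument` with no description.
pub fn arg(name: &str, type_: &str) -> Argument {
    Argument {
        name: name.to_string(),
        type_: type_.to_string(),
        description: None,
    }
}
```

With `PartialEq` derived, a test can write `assert_eq!(parsed, arg("start_date", "date"))` instead of asserting field by field, and `Debug` supplies the failure output.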

#### Tests

Unit tests in `api/libs/semantic_layer/src/models.rs` should verify deserialization of example YAML snippets into these structs, covering:
-   All fields present.
-   Optional fields missing (should default or be `None`).
-   Renamed fields (e.g., `type` in YAML to `type_` in Rust).
-   Defaulted collections (e.g., empty `dimensions` list if not in YAML).

```rust
// Example Test Snippet (conceptual)
#[cfg(test)]
mod tests {
    use super::*;
    use serde_yaml;

    #[test]
    fn test_deserialize_model_with_optional_db_schema() {
        // This test deserializes a single `Model` directly to keep the snippet focused.
        // If a file's top level turns out to be a `models:` list or a `SemanticLayerSpec`
        // wrapper, adjust the target type accordingly.
        let model_yaml = r#"
name: my_model
database: prod_db
schema: prod_schema
description: A test model
dimensions:
  - name: id
    type: integer
  - name: status
    type: string
    searchable: true
    options: ["active", "inactive"]
entities:
  - name: related_model
    primary_key: id
    foreign_key: my_model_id
    type: LEFT
    cardinality: one-to-many
metrics:
  - name: total_revenue
    expr: SUM(amount)
"#;
        let parsed_model: Result<Model, _> = serde_yaml::from_str(model_yaml);
        assert!(parsed_model.is_ok());
        let model = parsed_model.unwrap();
        assert_eq!(model.name, "my_model");
        assert_eq!(model.database, Some("prod_db".to_string()));
        assert_eq!(model.schema, Some("prod_schema".to_string()));
        assert_eq!(model.dimensions.len(), 2);
        assert_eq!(model.relationships.len(), 1);
        assert_eq!(model.metrics.len(), 1);
    }

    // Add more tests for other structs and variations
}
```
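
For the "defaulted" cases in the list above, `#[serde(default)]` fills a missing field with `Default::default()` for that field's type. The sketch below shows those fallback values using only the standard library; `missing_field_defaults` is a hypothetical name used for illustration.

```rust
/// Values that `#[serde(default)]` would substitute for a missing collection
/// field (such as `dimensions`) and a missing `bool` field (such as `searchable`).
pub fn missing_field_defaults() -> (Vec<String>, bool) {
    (Vec::default(), bool::default())
}
```

A missing collection therefore becomes an empty `Vec` and a missing `searchable` becomes `false`. `Option<T>` fields need no attribute: serde's derive treats a missing `Option` field as `None`.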

#### Success Criteria
- [ ] All structs in `api/libs/semantic_layer/src/models.rs` are defined as per the technical design.
- [ ] Unit tests for deserialization pass, covering various YAML structures.
- [ ] Code is reviewed and approved.

## Dependencies on Other Components
- None for this specific PRD, as it's foundational.

## Security Considerations
- Not directly applicable at the struct definition level, but `serde` itself is a well-vetted library.

## References
- Existing `api/libs/semantic_layer/src/models.rs`
- Existing CLI model parsing in `cli/cli/src/commands/deploy.rs` (for field reference)