---
title: API Request Handling for Model Deployment
author: Gemini Assistant
date: 2024-07-26
status: Draft
parent_prd: semantic_layer_refactor_overview.md
ticket: N/A
---

# API Request Handling for Model Deployment

## Parent Project

This is a sub-PRD of the [Semantic Layer and Deployment Refactor](semantic_layer_refactor_overview.md) project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.

## Problem Statement

The current `/deploy_datasets` API endpoint in `api/server/src/routes/rest/routes/datasets/deploy_datasets.rs` expects a request body (`Vec<DeployDatasetsRequest>`) that is structured differently from the new unified `semantic_layer::Model`. To align with the new approach, this endpoint needs to be updated to accept a payload consisting of `Vec<semantic_layer::Model>`.

Current behavior:

- The API endpoint `/deploy_datasets` takes `Json<Vec<DeployDatasetsRequest>>`.
- `DeployDatasetsRequest` has fields like `data_source_name`, `env`, `type_`, `name`, `model` (SQL model name), `schema`, `database`, `description`, `sql_definition`, `entity_relationships`, `columns`, `yml_file`.
- This structure requires the CLI to transform its parsed YAML (currently CLI-specific structs) into this `DeployDatasetsRequest` format.

Expected behavior:

- The API endpoint `/deploy_datasets` will be updated to accept `Json<Vec<semantic_layer::Model>>` (or a new wrapper struct if necessary, e.g., `BatchDeploySemanticModelsRequest { models: Vec<semantic_layer::Model>, global_env: Option<String> }`, but directly using `Vec<semantic_layer::Model>` is preferred if `env` can be handled or is implicit).
- The `semantic_layer::Model` (defined in `api/libs/semantic_layer/src/models.rs`) will be the primary data structure received.
- The existing logic within `handle_deploy_datasets` and `deploy_datasets_handler` will need to be refactored to work with these new input structs instead of the old `DeployDatasetsRequest`.
- Information like `data_source_name`, `schema`, and `database` will now primarily come from the fields within each `semantic_layer::Model` instance (which were resolved by the CLI).
- The `env` field, previously on `DeployDatasetsRequest`, needs consideration. If it's a global setting for the batch, it might need to be passed differently or inferred. For now, assume `env` might be associated with the `DataSource` in the database and resolved there, or becomes part of the `semantic_layer::Model` if it can vary per model in a batch.
- *Decision: The `env` is typically tied to a `DataSource` entry in the DB. The API should look up the `DataSource` using `name` (from `model.data_source_name`) and an `env` (e.g., hardcoded to "dev" or configurable). For simplicity, we can assume a default `env` like "dev" when looking up the data source if one is not provided explicitly with the model.* The `data_source_name` on the `semantic_layer::Model` will be the key.

## Goals

1. Update the signature of the `deploy_datasets` Axum handler function to accept `Json<Vec<semantic_layer::Model>>` or an equivalent new request struct.
2. Refactor the internal logic of `handle_deploy_datasets` and `deploy_datasets_handler` to process `semantic_layer::Model` objects.
3. Map fields from `semantic_layer::Model` (and its nested structs like `Dimension`, `Measure`, `Relationship`) to the corresponding database entities (`Dataset`, `DatasetColumn`).
4. Ensure existing functionality such as data source lookup, organization ID handling, and user permission checks is maintained.
5. Decide on and implement handling for the `env` parameter.

## Non-Goals

1. Implementing the type inference logic (covered in `prd_api_type_inference.md`).
2. Implementing the detailed persistence logic for all new semantic model parts such as metrics and filters (covered in `prd_api_model_persistence.md`). This PRD focuses on adapting to the new request shape for existing core entities (datasets, columns).
3.
Changing the response structure (`DeployDatasetsResponse`) significantly, though the source of its data will change.

## Implementation Plan

### Phase 1: Adapt API Endpoint and Core Logic

#### Technical Design

**1. Update Request Struct and Handler Signature:**

- The main entry point `deploy_datasets` in `deploy_datasets.rs` will change its `Json` extractor.

```rust
// In api/server/src/routes/rest/routes/datasets/deploy_datasets.rs
// ... other imports ...
use semantic_layer::models::Model as SemanticModel; // Alias for clarity

// Current request structs (DeployDatasetsRequest, DeployDatasetsColumnsRequest, etc.)
// will be REMOVED or deprecated.

// Updated Axum handler function signature
pub async fn deploy_datasets(
    Extension(user): Extension<User>, // auth extension; exact type as in the existing handler
    Json(requests): Json<Vec<SemanticModel>>, // <<<<< CHANGED HERE
) -> Result<ApiResponse<DeployDatasetsResponse>, (StatusCode, String)> {
    // ... organization_id and permission checks remain similar ...

    // Call handler function, passing Vec<SemanticModel>
    match handle_deploy_datasets(&user.id, requests).await { // <<<<< CHANGED HERE
        Ok(result) => Ok(ApiResponse::JsonData(result)),
        Err(e) => {
            tracing::error!("Error in deploy_datasets: {:?}", e);
            Err((StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))
        }
    }
}

async fn handle_deploy_datasets(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
) -> Result<DeployDatasetsResponse> {
    // The logic to produce DeployDatasetsResponse might need adjustment
    // based on how ValidationResult is now generated.
    let results = deploy_datasets_handler(user_id, models, false).await?;
    // ... existing summary logic ...
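    // (Illustrative addition, not from the current implementation: assuming
    // ValidationResult keeps its boolean `success` flag, the summary could be
    // derived from the per-model results along these lines:)
    // let success_count = results.iter().filter(|r| r.success).count();
    // let summary = format!("{}/{} models deployed successfully", success_count, results.len());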
    // Ok(DeployDatasetsResponse { results, summary })
}

// This is the core function to refactor
async fn deploy_datasets_handler(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
    _is_simple: bool, // This parameter might be obsolete
) -> Result<Vec<ValidationResult>> {
    // ValidationResult might need to refer to SemanticModel fields
    let organization_id = get_user_organization_id(user_id).await?;
    let mut conn = get_pg_pool().get().await?;

    let mut results: Vec<ValidationResult> = Vec::new();

    // Grouping by (data_source_name, env, database) might change slightly.
    // data_source_name and database now come from SemanticModel.
    // `env` needs to be resolved (e.g., assume "dev" for data source lookup).
    let default_env = "dev".to_string();

    // Temporary map of models keyed by their original data_source_name; SemanticModel
    // has it as Option<String>, but the CLI should have resolved it.
    let mut models_by_resolved_ds: HashMap<String, Vec<SemanticModel>> = HashMap::new();
    for model in models {
        if let Some(ds_name) = model.data_source_name.clone() {
            models_by_resolved_ds.entry(ds_name).or_default().push(model);
        } else {
            // This should ideally be caught by CLI validation
            let mut val_res = ValidationResult::new(
                model.name.clone(),
                "UNKNOWN_DS".to_string(),
                model.schema.clone().unwrap_or_default(),
            );
            val_res.add_error(ValidationError::internal_error(
                "DataSourceName missing on model".to_string(),
            ));
            results.push(val_res);
            continue;
        }
    }

    for (data_source_name, model_group) in models_by_resolved_ds {
        // Fetch DataSource using data_source_name and default_env
        let data_source = match data_sources::table
            .filter(data_sources::name.eq(&data_source_name))
            .filter(data_sources::env.eq(&default_env))
            .filter(data_sources::organization_id.eq(&organization_id))
            // ... other filters ...
            .first::<DataSource>(&mut conn)
            .await
        {
            Ok(ds) => ds,
            Err(_) => {
                /* ... handle error, push to results ...
                */
                continue;
            }
        };

        for semantic_model in model_group {
            // Now iterating over SemanticModel

            // Create a ValidationResult instance
            let mut validation_result = ValidationResult::new(
                semantic_model.name.clone(),
                data_source_name.clone(), // Use the resolved one
                semantic_model.schema.as_ref().cloned().unwrap_or_default(),
            );

            // --- Map SemanticModel to Dataset ---
            let dataset_id = Uuid::new_v4(); // Or fetch existing by semantic_model.name and data_source.id
            let now = Utc::now();

            let db_dataset = crate::database::models::Dataset {
                id: dataset_id,
                name: semantic_model.name.clone(),
                data_source_id: data_source.id,
                database_name: semantic_model.name.clone(), // Assuming model name is table/view name
                when_to_use: semantic_model.description.clone(),
                type_: DatasetType::View, // Default, or determine from SemanticModel if it has such a field
                // definition: semantic_model.sql_definition.clone(),
                // semantic_model doesn't have sql_definition directly. Where does this come from?
                // Perhaps from the model.model field, if that's a convention for the SQL block?
                // For now, leave it empty or decide on a source.
                definition: String::new(), // Placeholder
                schema: semantic_model.schema.as_ref().cloned().unwrap_or_else(|| {
                    validation_result.add_error(ValidationError::internal_error("Schema missing".to_string()));
                    String::new() // Default, though an error is recorded
                }),
                database_identifier: semantic_model.database.clone(), // This is Option<String>
                yml_file: None, // CLI used to send this; the API doesn't strictly need it if the model definition is complete
                // ... other fields like created_at, updated_at, user_id, organization_id ...
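                // (Illustrative only: the remaining columns follow the existing
                // `datasets` table model; `now` from above would feed the timestamps.)
                // created_at: now,
                // updated_at: now,
                // organization_id,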
            };
            // Add db_dataset to a list for bulk upsert

            // --- Map SemanticModel.dimensions and SemanticModel.measures to DatasetColumn ---
            let mut dataset_columns_to_upsert = Vec::new();
            for dim in &semantic_model.dimensions {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id, // Link to the dataset above
                    name: dim.name.clone(),
                    type_: dim.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()), // Type inference later if UNKNOWN
                    description: dim.description.clone(),
                    semantic_type: Some("dimension".to_string()),
                    dim_type: dim.type_.clone(), // Or specific mapping
                    // expr: dim.expr.clone(), // If Dimension has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            for measure in &semantic_model.measures {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id,
                    name: measure.name.clone(),
                    type_: measure.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()),
                    description: measure.description.clone(),
                    semantic_type: Some("measure".to_string()),
                    // agg: measure.agg.clone(), // If Measure has agg and DatasetColumn supports it
                    // expr: measure.expr.clone(), // If Measure has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            // Add dataset_columns_to_upsert to a list for bulk upsert, associated with dataset_id

            // --- Placeholder for Relationships, Metrics, Filters ---
            // semantic_model.relationships, semantic_model.metrics, and semantic_model.filters
            // will be handled in prd_api_model_persistence.md

            // After DB operations (mocked for now or simplified upsert):
            // validation_result.success = true; (if all good)
            results.push(validation_result);
        }
        // Perform bulk upserts for datasets and columns for this data_source group here
    }

    Ok(results)
}
```

*Self-correction: The `sql_definition` field was part of the old `DeployDatasetsRequest`. The `semantic_layer::Model` doesn't have a direct `sql_definition` field.
If the underlying SQL for a model/view is still needed at this stage, we need to clarify where it comes from. If the `semantic_layer::Model` is purely semantic, the API might not need the full SQL, or it might be embedded in a specific field within `semantic_layer::Model` (e.g., a `model_source: Option<String>` field or similar). For now, `Dataset.definition` is set to empty.* The `Model.model` field in the old CLI structure was sometimes used for this. We should clarify whether a field like `semantic_layer::Model.source_query: Option<String>` is needed.

**2. `ValidationResult` and `ValidationError`:**

- These structs in `deploy_datasets.rs` will likely remain but will be populated based on validating `SemanticModel` objects.
- `ValidationResult` references `model_name`, `data_source_name`, and `schema`. These are available from `SemanticModel`.

#### Implementation Steps

1. [ ] Modify the `deploy_datasets` Axum handler to accept `Json<Vec<semantic_layer::Model>>`.
2. Update `handle_deploy_datasets` and `deploy_datasets_handler` to take `Vec<semantic_layer::Model>` as input.
3. Refactor the grouping logic in `deploy_datasets_handler`:
   a. Models should be grouped by their `data_source_name` (present on `semantic_layer::Model`).
   b. Determine how `env` is handled for `DataSource` lookup (e.g., use a default like "dev").
4. Inside the loop for each `SemanticModel`:
   a. Adapt the creation of `ValidationResult` using fields from `SemanticModel`.
   b. Map `SemanticModel` fields (name, description, schema, database) to `crate::database::models::Dataset`.
      - Clarify the source for `Dataset.definition` (SQL query).
   c. Map `SemanticModel.dimensions` to `crate::database::models::DatasetColumn` (setting `semantic_type` to "dimension").
   d. Map `SemanticModel.measures` to `crate::database::models::DatasetColumn` (setting `semantic_type` to "measure").
   e. Ensure `dataset_id` linkage is correct.
5. Adapt the existing bulk upsert logic for `Dataset` and `DatasetColumn` to use the newly mapped objects.
6.
Ensure soft-delete logic for columns not present in the new request still functions correctly based on the incoming columns for a dataset.
7. Temporarily bypass or add placeholders for processing `relationships`, `metrics`, and `filters` from `SemanticModel` (to be fully addressed in `prd_api_model_persistence.md`).

#### Tests

- **Unit tests for `deploy_datasets_handler`:**
  - Mock database interactions.
  - Test with a valid `Vec<SemanticModel>`: ensure `Dataset` and `DatasetColumn` objects are correctly formed.
  - Test with models having a missing `data_source_name` or `schema` (should result in validation errors if the CLI didn't catch it, or be handled gracefully if the API expects them to be resolved).
  - Test data source lookup failure.
- **Integration tests (CLI calling the actual, modified API endpoint):**
  - Full flow: the CLI sends `Vec<semantic_layer::Model>`, the API receives and processes it, and basic data (datasets, columns) is stored.
  - Test with multiple models targeting the same and different data sources.

#### Success Criteria

- [ ] The API endpoint `/deploy_datasets` successfully accepts `Json<Vec<semantic_layer::Model>>`.
- [ ] Core logic in `deploy_datasets_handler` correctly processes `semantic_layer::Model` objects and maps them to `Dataset` and `DatasetColumn` database models.
- [ ] Existing data source lookup, permission checks, and basic upsert operations for datasets/columns function with the new input structure.
- [ ] The `env` for data source lookup is handled correctly.
- [ ] Unit and integration tests pass.

## Dependencies on Other Components

- **`prd_semantic_model_definition.md`**: The API must use the exact `semantic_layer::Model` struct defined there.
- **`prd_cli_deployment_logic.md`**: The CLI must send data in the format this API endpoint now expects.

## Security Considerations

- All incoming fields from `semantic_layer::Model` must be treated as untrusted input and validated/sanitized before database interaction, especially string fields used in queries or stored directly.
- Permissions (`is_user_workspace_admin_or_data_admin`) must continue to be robustly checked.

## References

- `api/server/src/routes/rest/routes/datasets/deploy_datasets.rs` (current implementation)
- `api/libs/semantic_layer/src/models.rs` (new request DTO)
- Axum documentation for JSON extraction