buster/api/prds/prd_api_request_handling.md

17 KiB

title author date status parent_prd ticket
API Request Handling for Model Deployment Gemini Assistant 2024-07-26 Draft semantic_layer_refactor_overview.md N/A

API Request Handling for Model Deployment

Parent Project

This is a sub-PRD of the Semantic Layer and Deployment Refactor project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.

Problem Statement

The current /deploy_datasets API endpoint in api/server/src/routes/rest/routes/datasets/deploy_datasets.rs expects a request body (Vec<DeployDatasetsRequest>) that is structured differently from the new unified semantic_layer::Model. To align with the new approach, this endpoint needs to be updated to accept a payload consisting of Vec<semantic_layer::Model>.

Current behavior:

  • The API endpoint /deploy_datasets takes Json(Vec<DeployDatasetsRequest>).
  • DeployDatasetsRequest has fields like data_source_name, env, type_, name, model (SQL model name), schema, database, description, sql_definition, entity_relationships, columns, yml_file.
  • This structure requires the CLI to transform its parsed YAML (currently CLI-specific structs) into this DeployDatasetsRequest format.

Expected behavior:

  • The API endpoint /deploy_datasets will be updated to accept Json(Vec<semantic_layer::Model>) (or a new wrapper struct if necessary, e.g., BatchDeploySemanticModelsRequest { models: Vec<semantic_layer::Model>, global_env: Option<String> }, but directly using Vec<semantic_layer::Model> is preferred if env can be handled or is implicit).
  • The semantic_layer::Model (defined in api/libs/semantic_layer/src/models.rs) will be the primary data structure received.
  • The existing logic within handle_deploy_datasets and deploy_datasets_handler will need to be refactored to work with these new input structs instead of the old DeployDatasetsRequest.
  • Information like data_source_name, schema, and database will now primarily come from the fields within each semantic_layer::Model instance (which were resolved by the CLI).
  • The env field, previously on DeployDatasetsRequest, needs consideration. If it's a global setting for the batch, it might need to be passed differently or inferred. For now, assume env might be associated with the DataSource in the database and resolved there, or becomes part of the semantic_layer::Model if it can vary per model in a batch.
    • Decision: The env is typically tied to a DataSource entry in the DB. The API should look up the DataSource using name (from model.data_source_name) and an env (e.g., hardcoded to "dev" or configurable). For simplicity, we can assume a default env like "dev" when looking up the data source if not provided explicitly with the model. The data_source_name on the semantic_layer::Model will be the key.

Goals

  1. Update the signature of the deploy_datasets Axum handler function to accept Json(Vec<semantic_layer::Model>) or an equivalent new request struct.
  2. Refactor the internal logic of handle_deploy_datasets and deploy_datasets_handler to process semantic_layer::Model objects.
  3. Map fields from semantic_layer::Model (and its nested structs like Dimension, Measure, Relationship) to the corresponding database entities (Dataset, DatasetColumn).
  4. Ensure existing functionalities like data source lookup, organization ID handling, and user permissions checks are maintained.
  5. Decide on and implement handling for the env parameter.

Non-Goals

  1. Implementing the type inference logic (covered in prd_api_type_inference.md).
  2. Implementing the detailed persistence logic for all new semantic model parts like metrics and filters (covered in prd_api_model_persistence.md). This PRD focuses on adapting to the new request shape for existing core entities (datasets, columns).
  3. Changing the response structure (DeployDatasetsResponse) significantly, though the source of its data will change.

Implementation Plan

Phase 1: Adapt API Endpoint and Core Logic

Technical Design

1. Update Request Struct and Handler Signature:

  • The main entry point deploy_datasets in deploy_datasets.rs will change its Json extractor.
// In api/server/src/routes/rest/routes/datasets/deploy_datasets.rs

// ... other imports ...
use semantic_layer::models::Model as SemanticModel; // Alias for clarity

// Current request structs (DeployDatasetsRequest, DeployDatasetsColumnsRequest, etc.) will be REMOVED or deprecated.

// Updated Axum handler function signature
pub async fn deploy_datasets(
    Extension(user): Extension<AuthenticatedUser>,
    Json(requests): Json<Vec<SemanticModel>>, // <<<<< CHANGED HERE
) -> Result<ApiResponse<DeployDatasetsResponse>, (StatusCode, String)> {
    // ... organization_id and permission checks remain similar ...

    // Call handler function, passing Vec<SemanticModel>
    match handle_deploy_datasets(&user.id, requests).await { // <<<<< CHANGED HERE
        Ok(result) => Ok(ApiResponse::JsonData(result)),
        Err(e) => {
            tracing::error!("Error in deploy_datasets: {:?}", e);
            Err((StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))
        }
    }
}

async fn handle_deploy_datasets(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
) -> Result<DeployDatasetsResponse> {
    // The logic to produce DeployDatasetsResponse might need adjustment
    // based on how ValidationResult is now generated.
    let results = deploy_datasets_handler(user_id, models, false).await?;
    // ... existing summary logic ...
    // Ok(DeployDatasetsResponse { results, summary })
}

// This is the core function to refactor
async fn deploy_datasets_handler(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
    _is_simple: bool, // This parameter might be obsolete
) -> Result<Vec<ValidationResult>> { // ValidationResult might need to refer to SemanticModel fields
    let organization_id = get_user_organization_id(user_id).await?;
    let mut conn = get_pg_pool().get().await?;
    let mut results: Vec<ValidationResult> = Vec::new();

    // Grouping by (data_source_name, env, database) might change slightly.
    // data_source_name and database now come from SemanticModel.
    // `env` needs to be resolved (e.g., assume "dev" for data source lookup).
    let default_env = "dev".to_string();

    // Temporary map for models by their original data_source_name, as SemanticModel has it as Option<String>
    // but CLI should have resolved it.
    let mut models_by_resolved_ds: HashMap<String, Vec<SemanticModel>> = HashMap::new();
    for model in models {
        if let Some(ds_name) = &model.data_source_name {
             models_by_resolved_ds.entry(ds_name.clone()).or_default().push(model);
        } else {
            // This should ideally be caught by CLI validation
            let mut val_res = ValidationResult::new(model.name.clone(), "UNKNOWN_DS".to_string(), model.schema.clone().unwrap_or_default());
            val_res.add_error(ValidationError::internal_error("DataSourceName missing on model".to_string()));
            results.push(val_res);
            continue;
        }
    }

    for (data_source_name, model_group) in models_by_resolved_ds {
        // Fetch DataSource using data_source_name and default_env
        let data_source = match data_sources::table
            .filter(data_sources::name.eq(&data_source_name))
            .filter(data_sources::env.eq(&default_env))
            .filter(data_sources::organization_id.eq(&organization_id))
            // ... other filters ...
            .first::<DataSource>(&mut conn).await {
                Ok(ds) => ds,
                Err(_) => { /* ... handle error, push to results ... */ continue; }
            };

        for semantic_model in model_group { // Now iterating over SemanticModel
            // Create a ValidationResult instance
            let mut validation_result = ValidationResult::new(
                semantic_model.name.clone(),
                data_source_name.clone(), // Use the resolved one
                semantic_model.schema.as_ref().cloned().unwrap_or_default(),
            );

            // --- Map SemanticModel to Dataset --- 
            let dataset_id = Uuid::new_v4(); // Or fetch existing by semantic_model.name and data_source.id
            let now = Utc::now();
            let db_dataset = crate::database::models::Dataset {
                id: dataset_id, 
                name: semantic_model.name.clone(),
                data_source_id: data_source.id,
                database_name: semantic_model.name.clone(), // Assuming model name is table/view name
                when_to_use: semantic_model.description.clone(),
                type_: DatasetType::View, // Default, or determine from SemanticModel if it has such a field
                // definition: semantic_model.sql_definition.clone(), // semantic_model doesn't have sql_definition directly. Where does this come from?
                                                              // Perhaps from model.model field if that's a convention for SQL block?
                                                              // For now, leave it empty or decide source.
                definition: String::new(), // Placeholder
                schema: semantic_model.schema.as_ref().cloned().unwrap_or_else(|| {
                    validation_result.add_error(ValidationError::internal_error("Schema missing".to_string()));
                    String::new() // Default, though error is added
                }),
                database_identifier: semantic_model.database.clone(), // This is Option<String>
                yml_file: None, // CLI used to send this, API doesn't strictly need it if model def is complete
                // ... other fields like created_at, updated_at, user_id, organization_id ...
            };
            // Add db_dataset to a list for bulk upsert

            // --- Map SemanticModel.dimensions and SemanticModel.measures to DatasetColumn --- 
            let mut dataset_columns_to_upsert = Vec::new();
            for dim in &semantic_model.dimensions {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id: dataset_id, // Link to above dataset
                    name: dim.name.clone(),
                    type_: dim.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()), // Type inference later if UNKNOWN
                    description: dim.description.clone(),
                    semantic_type: Some("dimension".to_string()),
                    dim_type: dim.type_.clone(), // Or specific mapping
                    // expr: dim.expr.clone(), // If Dimension has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            for measure in &semantic_model.measures {
                 dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id: dataset_id,
                    name: measure.name.clone(),
                    type_: measure.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()),
                    description: measure.description.clone(),
                    semantic_type: Some("measure".to_string()),
                    // agg: measure.agg.clone(), // If Measure has agg and DatasetColumn supports it
                    // expr: measure.expr.clone(), // If Measure has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            // Add dataset_columns_to_upsert to a list for bulk upsert, associated with dataset_id

            // --- Placeholder for Relationships, Metrics, Filters --- 
            // semantic_model.relationships, semantic_model.metrics, semantic_model.filters
            // will be handled in prd_api_model_persistence.md

            // After DB operations (mocked for now or simplified upsert):
            // validation_result.success = true; (if all good)
            results.push(validation_result);
        }
        // Perform bulk upserts for datasets and columns for this data_source group here
    }

    Ok(results)
}

Self-correction: The sql_definition was part of the old DeployDatasetsRequest. The semantic_layer::Model doesn't have a direct sql_definition field. If the underlying SQL for a model/view is still needed at this stage, we need to clarify where it comes from. If the semantic_layer::Model is purely semantic, the API might not need the full SQL, or it might be embedded in a specific field within semantic_layer::Model (e.g., a model_source: Option<String> field or similar). For now, Dataset.definition is set to empty. The Model.model field in the old CLI structure was sometimes used for this. We should clarify if a field like semantic_layer::Model.source_query: Option<String> is needed.

2. ValidationResult and ValidationError:

  • These structs in deploy_datasets.rs will likely remain but will be populated based on validating SemanticModel objects.
  • ValidationResult references model_name, data_source_name, schema. These are available from SemanticModel.

Implementation Steps

  1. Modify the deploy_datasets Axum handler to accept Json(Vec<semantic_layer::Model>).
  2. Update handle_deploy_datasets and deploy_datasets_handler to take Vec<semantic_layer::Model> as input.
  3. Refactor the grouping logic in deploy_datasets_handler: a. Models should be grouped by their data_source_name (present on semantic_layer::Model). b. Determine how env is handled for DataSource lookup (e.g., use a default like "dev").
  4. Inside the loop for each SemanticModel: a. Adapt the creation of ValidationResult using fields from SemanticModel. b. Map SemanticModel fields (name, description, schema, database) to crate::database::models::Dataset. - Clarify the source for Dataset.definition (SQL query). c. Map SemanticModel.dimensions to crate::database::models::DatasetColumn (setting semantic_type to "dimension"). d. Map SemanticModel.measures to crate::database::models::DatasetColumn (setting semantic_type to "measure"). e. Ensure dataset_id linkage is correct.
  5. Adapt the existing bulk upsert logic for Dataset and DatasetColumn to use the newly mapped objects.
  6. Ensure soft-delete logic for columns not present in the new request still functions correctly based on the incoming columns for a dataset.
  7. Temporarily bypass or add placeholders for processing relationships, metrics, and filters from SemanticModel (to be fully addressed in prd_api_model_persistence.md).

Tests

  • Unit Tests for deploy_datasets_handler:
    • Mock database interactions.
    • Test with a valid Vec<SemanticModel>: ensure Dataset and DatasetColumn objects are correctly formed.
    • Test with models having missing data_source_name or schema (should result in validation errors if CLI didn't catch it, or be handled gracefully if API expects them to be resolved).
    • Test data source lookup failure.
  • Integration Tests (CLI calling the actual, modified API endpoint):
    • Full flow: CLI sends Vec<SemanticModel>, API receives and processes it, basic data (datasets, columns) is stored.
    • Test with multiple models targeting the same and different data sources.

Success Criteria

  • API endpoint /deploy_datasets successfully accepts Json(Vec<semantic_layer::Model>).
  • Core logic in deploy_datasets_handler correctly processes semantic_layer::Model objects and maps them to Dataset and DatasetColumn database models.
  • Existing data source lookup, permission checks, and basic upsert operations for datasets/columns function with the new input structure.
  • The env for data source lookup is handled correctly.
  • Unit and integration tests pass.

Dependencies on Other Components

  • prd_semantic_model_definition.md: The API must use the exact semantic_layer::Model struct defined there.
  • prd_cli_deployment_logic.md: The CLI must send data in the format this API endpoint now expects.

Security Considerations

  • All incoming fields from semantic_layer::Model must be treated as untrusted input and validated/sanitized before database interaction, especially string fields used in queries or stored directly.
  • Permissions (is_user_workspace_admin_or_data_admin) must continue to be robustly checked.

References

  • api/server/src/routes/rest/routes/datasets/deploy_datasets.rs (current implementation)
  • api/libs/semantic_layer/src/models.rs (new request DTO)
  • Axum documentation for JSON extraction.

</rewritten_file>