---
title: API Request Handling for Model Deployment
author: Gemini Assistant
date: 2024-07-26
status: Draft
parent_prd: semantic_layer_refactor_overview.md
ticket: N/A
---
# API Request Handling for Model Deployment
## Parent Project
This is a sub-PRD of the [Semantic Layer and Deployment Refactor](semantic_layer_refactor_overview.md) project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.
## Problem Statement
The current `/deploy_datasets` API endpoint in `api/server/src/routes/rest/routes/datasets/deploy_datasets.rs` expects a request body (`Vec<DeployDatasetsRequest>`) that is structured differently from the new unified `semantic_layer::Model`. To align with the new approach, this endpoint needs to be updated to accept a payload consisting of `Vec<semantic_layer::Model>`.
Current behavior:
- The API endpoint `/deploy_datasets` takes `Json(Vec<DeployDatasetsRequest>)`.
- `DeployDatasetsRequest` has fields like `data_source_name`, `env`, `type_`, `name`, `model` (SQL model name), `schema`, `database`, `description`, `sql_definition`, `entity_relationships`, `columns`, `yml_file`.
- This structure requires the CLI to transform its parsed YAML (currently CLI-specific structs) into this `DeployDatasetsRequest` format.
Expected behavior:
- The API endpoint `/deploy_datasets` will be updated to accept `Json(Vec<semantic_layer::Model>)`. A wrapper struct (e.g., `BatchDeploySemanticModelsRequest { models: Vec<semantic_layer::Model>, global_env: Option<String> }`) remains a fallback, but accepting `Vec<semantic_layer::Model>` directly is preferred if `env` can be handled elsewhere or is implicit (see the sketch after this list).
- The `semantic_layer::Model` (defined in `api/libs/semantic_layer/src/models.rs`) will be the primary data structure received.
- The existing logic within `handle_deploy_datasets` and `deploy_datasets_handler` will need to be refactored to work with these new input structs instead of the old `DeployDatasetsRequest`.
- Information like `data_source_name`, `schema`, and `database` will now primarily come from the fields within each `semantic_layer::Model` instance (which were resolved by the CLI).
- The `env` field, previously on `DeployDatasetsRequest`, needs consideration. If it is a global setting for the batch, it may need to be passed separately or inferred; if it can vary per model, it would have to become part of `semantic_layer::Model`.
- *Decision: The `env` is typically tied to a `DataSource` entry in the DB. The API should look up the `DataSource` by `name` (from `model.data_source_name`) and an `env` that is either supplied explicitly or, for simplicity, defaults to "dev".* The `data_source_name` on each `semantic_layer::Model` will be the lookup key.
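
For illustration, a sketch of the two candidate request shapes. The wrapper name and fields come from the alternative floated above and are not final; it assumes `semantic_layer::Model` derives `serde::Deserialize`:

```rust
// Sketch only: the preferred payload is a bare JSON array of models,
// which Axum extracts directly with no wrapper struct:
//   POST /deploy_datasets  with body  [ { "name": "orders", ... }, ... ]
//   Json(requests): Json<Vec<semantic_layer::Model>>

// Fallback wrapper, only if a batch-level `env` turns out to be necessary
// (name and shape are illustrative, not final):
#[derive(serde::Deserialize)]
pub struct BatchDeploySemanticModelsRequest {
    pub models: Vec<semantic_layer::Model>,
    pub global_env: Option<String>,
}
```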
## Goals
1. Update the signature of the `deploy_datasets` Axum handler function to accept `Json(Vec<semantic_layer::Model>)` or an equivalent new request struct.
2. Refactor the internal logic of `handle_deploy_datasets` and `deploy_datasets_handler` to process `semantic_layer::Model` objects.
3. Map fields from `semantic_layer::Model` (and its nested structs like `Dimension`, `Measure`, `Relationship`) to the corresponding database entities (`Dataset`, `DatasetColumn`).
4. Ensure existing functionalities like data source lookup, organization ID handling, and user permissions checks are maintained.
5. Decide on and implement handling for the `env` parameter.
## Non-Goals
1. Implementing the type inference logic (covered in `prd_api_type_inference.md`).
2. Implementing the detailed persistence logic for all new semantic model parts like metrics and filters (covered in `prd_api_model_persistence.md`). This PRD focuses on adapting to the new request shape for existing core entities (datasets, columns).
3. Changing the response structure (`DeployDatasetsResponse`) significantly, though the source of its data will change.
## Implementation Plan
### Phase 1: Adapt API Endpoint and Core Logic
#### Technical Design
**1. Update Request Struct and Handler Signature:**
- The main entry point `deploy_datasets` in `deploy_datasets.rs` will change its `Json` extractor.
```rust
// In api/server/src/routes/rest/routes/datasets/deploy_datasets.rs
// ... other imports ...
use semantic_layer::models::Model as SemanticModel; // Alias for clarity

// The current request structs (DeployDatasetsRequest, DeployDatasetsColumnsRequest, etc.)
// will be REMOVED or deprecated.

// Updated Axum handler function signature
pub async fn deploy_datasets(
    Extension(user): Extension<AuthenticatedUser>,
    Json(requests): Json<Vec<SemanticModel>>, // <<<<< CHANGED HERE
) -> Result<ApiResponse<DeployDatasetsResponse>, (StatusCode, String)> {
    // ... organization_id and permission checks remain similar ...

    // Call handler function, passing Vec<SemanticModel>
    match handle_deploy_datasets(&user.id, requests).await { // <<<<< CHANGED HERE
        Ok(result) => Ok(ApiResponse::JsonData(result)),
        Err(e) => {
            tracing::error!("Error in deploy_datasets: {:?}", e);
            Err((StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))
        }
    }
}

async fn handle_deploy_datasets(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
) -> Result<DeployDatasetsResponse> {
    // The logic to produce DeployDatasetsResponse might need adjustment
    // based on how ValidationResult is now generated.
    let results = deploy_datasets_handler(user_id, models, false).await?;
    // ... existing summary logic ...
    // Ok(DeployDatasetsResponse { results, summary })
}

// This is the core function to refactor
async fn deploy_datasets_handler(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
    _is_simple: bool, // This parameter might be obsolete
) -> Result<Vec<ValidationResult>> { // ValidationResult might need to refer to SemanticModel fields
    let organization_id = get_user_organization_id(user_id).await?;
    let mut conn = get_pg_pool().get().await?;
    let mut results: Vec<ValidationResult> = Vec::new();

    // Grouping by (data_source_name, env, database) might change slightly.
    // data_source_name and database now come from SemanticModel.
    // `env` needs to be resolved (e.g., assume "dev" for data source lookup).
    let default_env = "dev".to_string();

    // Temporary map of models keyed by their resolved data_source_name. SemanticModel
    // carries it as Option<String>, but the CLI should have resolved it already.
    let mut models_by_resolved_ds: HashMap<String, Vec<SemanticModel>> = HashMap::new();
    for model in models {
        if let Some(ds_name) = &model.data_source_name {
            models_by_resolved_ds.entry(ds_name.clone()).or_default().push(model);
        } else {
            // This should ideally be caught by CLI validation
            let mut val_res = ValidationResult::new(
                model.name.clone(),
                "UNKNOWN_DS".to_string(),
                model.schema.clone().unwrap_or_default(),
            );
            val_res.add_error(ValidationError::internal_error(
                "DataSourceName missing on model".to_string(),
            ));
            results.push(val_res);
            continue;
        }
    }

    for (data_source_name, model_group) in models_by_resolved_ds {
        // Fetch DataSource using data_source_name and default_env
        let data_source = match data_sources::table
            .filter(data_sources::name.eq(&data_source_name))
            .filter(data_sources::env.eq(&default_env))
            .filter(data_sources::organization_id.eq(&organization_id))
            // ... other filters ...
            .first::<DataSource>(&mut conn)
            .await
        {
            Ok(ds) => ds,
            Err(_) => { /* ... handle error, push to results ... */ continue; }
        };

        for semantic_model in model_group { // Now iterating over SemanticModel
            // Create a ValidationResult instance
            let mut validation_result = ValidationResult::new(
                semantic_model.name.clone(),
                data_source_name.clone(), // Use the resolved one
                semantic_model.schema.as_ref().cloned().unwrap_or_default(),
            );

            // --- Map SemanticModel to Dataset ---
            let dataset_id = Uuid::new_v4(); // Or fetch existing by semantic_model.name and data_source.id
            let now = Utc::now();
            let db_dataset = crate::database::models::Dataset {
                id: dataset_id,
                name: semantic_model.name.clone(),
                data_source_id: data_source.id,
                database_name: semantic_model.name.clone(), // Assuming model name is table/view name
                when_to_use: semantic_model.description.clone(),
                type_: DatasetType::View, // Default, or determine from SemanticModel if it has such a field
                // definition: semantic_model.sql_definition.clone(),
                //   SemanticModel doesn't have sql_definition directly. Where does this come from?
                //   Perhaps from the old `model` field if that's a convention for the SQL block?
                //   For now, leave it empty until a source is decided.
                definition: String::new(), // Placeholder
                schema: semantic_model.schema.as_ref().cloned().unwrap_or_else(|| {
                    validation_result.add_error(ValidationError::internal_error("Schema missing".to_string()));
                    String::new() // Default, though error is added
                }),
                database_identifier: semantic_model.database.clone(), // This is Option<String>
                yml_file: None, // CLI used to send this; API doesn't strictly need it if the model definition is complete
                // ... other fields like created_at, updated_at, user_id, organization_id ...
            };
            // Add db_dataset to a list for bulk upsert

            // --- Map SemanticModel.dimensions and SemanticModel.measures to DatasetColumn ---
            let mut dataset_columns_to_upsert = Vec::new();
            for dim in &semantic_model.dimensions {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id, // Link to the dataset above
                    name: dim.name.clone(),
                    type_: dim.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()), // Type inference later if UNKNOWN
                    description: dim.description.clone(),
                    semantic_type: Some("dimension".to_string()),
                    dim_type: dim.type_.clone(), // Or specific mapping
                    // expr: dim.expr.clone(), // If Dimension has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            for measure in &semantic_model.measures {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id,
                    name: measure.name.clone(),
                    type_: measure.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()),
                    description: measure.description.clone(),
                    semantic_type: Some("measure".to_string()),
                    // agg: measure.agg.clone(), // If Measure has agg and DatasetColumn supports it
                    // expr: measure.expr.clone(), // If Measure has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            // Add dataset_columns_to_upsert to a list for bulk upsert, associated with dataset_id

            // --- Placeholder for Relationships, Metrics, Filters ---
            // semantic_model.relationships, semantic_model.metrics, and semantic_model.filters
            // will be handled in prd_api_model_persistence.md

            // After DB operations (mocked for now or simplified upsert):
            // validation_result.success = true; (if all good)
            results.push(validation_result);
        }
        // Perform bulk upserts for datasets and columns for this data_source group here
    }
    Ok(results)
}
```
*Self-correction: `sql_definition` was part of the old `DeployDatasetsRequest`, but `semantic_layer::Model` has no direct equivalent. If the underlying SQL for a model/view is still needed at this stage, its source must be clarified: if the model is purely semantic, the API may not need the full SQL at all, or it could be carried in a dedicated field (e.g., `model_source: Option<String>` or similar). For now, `Dataset.definition` is set to empty.* The `model` field in the old CLI structure was sometimes used for this; we should clarify whether a field like `semantic_layer::Model.source_query: Option<String>` is needed.
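If the SQL is to travel with the model, a minimal sketch of the hypothetical addition follows. The `source_query` name is the open suggestion above, not a settled part of the `semantic_layer` API:

```rust
// Hypothetical addition to api/libs/semantic_layer/src/models.rs, if the
// underlying SQL must travel with the model (field name is not final):
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct Model {
    pub name: String,
    // ... existing fields ...
    /// Optional SQL body for the model/view; would map to Dataset.definition.
    pub source_query: Option<String>,
}
```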
**2. `ValidationResult` and `ValidationError`:**
- These structs in `deploy_datasets.rs` will likely remain but will be populated based on validating `SemanticModel` objects.
- `ValidationResult` references `model_name`, `data_source_name`, and `schema`, all of which are available from `SemanticModel`.
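
For reference, a sketch of the shapes implied by that usage, reconstructed from the calls to `ValidationResult::new` and `add_error` in the handler sketch above; the actual definitions in `deploy_datasets.rs` should be checked and may differ:

```rust
// Reconstructed from usage in the handler sketch above; not the actual
// definitions in deploy_datasets.rs, which should be verified directly.
pub struct ValidationResult {
    pub model_name: String,
    pub data_source_name: String,
    pub schema: String,
    pub success: bool,
    pub errors: Vec<ValidationError>,
}

impl ValidationResult {
    pub fn new(model_name: String, data_source_name: String, schema: String) -> Self {
        Self { model_name, data_source_name, schema, success: false, errors: Vec::new() }
    }

    pub fn add_error(&mut self, error: ValidationError) {
        self.errors.push(error);
    }
}
```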
#### Implementation Steps
1. [ ] Modify the `deploy_datasets` Axum handler to accept `Json(Vec<semantic_layer::Model>)`.
2. [ ] Update `handle_deploy_datasets` and `deploy_datasets_handler` to take `Vec<semantic_layer::Model>` as input.
3. [ ] Refactor the grouping logic in `deploy_datasets_handler`:
    a. Group models by their `data_source_name` (present on `semantic_layer::Model`).
    b. Determine how `env` is handled for the `DataSource` lookup (e.g., use a default like "dev").
4. [ ] Inside the loop for each `SemanticModel`:
    a. Adapt the creation of `ValidationResult` using fields from `SemanticModel`.
    b. Map `SemanticModel` fields (name, description, schema, database) to `crate::database::models::Dataset`.
       - Clarify the source for `Dataset.definition` (the SQL query).
    c. Map `SemanticModel.dimensions` to `crate::database::models::DatasetColumn` (setting `semantic_type` to "dimension").
    d. Map `SemanticModel.measures` to `crate::database::models::DatasetColumn` (setting `semantic_type` to "measure").
    e. Ensure `dataset_id` linkage is correct.
5. [ ] Adapt the existing bulk upsert logic for `Dataset` and `DatasetColumn` to use the newly mapped objects.
6. [ ] Ensure the soft-delete logic for columns absent from the new request still functions correctly, based on the incoming columns for each dataset (see the sketch after this list).
7. [ ] Temporarily bypass or add placeholders for processing `relationships`, `metrics`, and `filters` from `SemanticModel` (to be fully addressed in `prd_api_model_persistence.md`).
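
A minimal sketch of the step 6 soft-delete pass, assuming a nullable `deleted_at` timestamp on `dataset_columns` and the `diesel_async` patterns used elsewhere in this file; the schema module and column names are assumptions to verify against the actual `schema.rs`:

```rust
use chrono::Utc;
use diesel::prelude::*;
use diesel_async::RunQueryDsl;

// Soft-delete columns that exist in the DB for this dataset but are absent
// from the incoming request. Assumes a nullable `deleted_at` column on
// `dataset_columns`; adjust names to the actual schema.
async fn soft_delete_missing_columns(
    conn: &mut diesel_async::AsyncPgConnection,
    dataset_id: uuid::Uuid,
    incoming_column_names: &[String],
) -> anyhow::Result<()> {
    diesel::update(
        dataset_columns::table
            .filter(dataset_columns::dataset_id.eq(dataset_id))
            .filter(dataset_columns::deleted_at.is_null())
            .filter(dataset_columns::name.ne_all(incoming_column_names)),
    )
    .set(dataset_columns::deleted_at.eq(Some(Utc::now())))
    .execute(conn)
    .await?;
    Ok(())
}
```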
#### Tests
- **Unit Tests for `deploy_datasets_handler`:**
  - Mock database interactions.
  - Test with a valid `Vec<SemanticModel>`: ensure `Dataset` and `DatasetColumn` objects are correctly formed.
  - Test with models missing `data_source_name` or `schema`: these should produce validation errors if the CLI did not catch them, or be handled gracefully if the API expects them to be pre-resolved (see the sketch after this list).
  - Test data source lookup failure.
- **Integration Tests (CLI calling the actual, modified API endpoint):**
  - Full flow: the CLI sends `Vec<SemanticModel>`, the API receives and processes it, and basic data (datasets, columns) is stored.
  - Test with multiple models targeting the same and different data sources.
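
As a concrete starting point, a sketch of the missing-`data_source_name` unit test. It assumes `SemanticModel` derives `Default` and that `get_user_organization_id`/`get_pg_pool` are usable against a test database or mocked, per the bullet above; adjust construction to the real struct:

```rust
#[tokio::test]
async fn model_without_data_source_name_yields_validation_error() {
    // Construction assumes SemanticModel derives Default; if it does not,
    // fill in the remaining required fields explicitly.
    let model = SemanticModel {
        name: "orders".to_string(),
        data_source_name: None, // the condition under test
        schema: Some("analytics".to_string()),
        ..Default::default()
    };

    let user_id = uuid::Uuid::new_v4();
    let results = deploy_datasets_handler(&user_id, vec![model], false)
        .await
        .expect("handler should not hard-fail on a per-model validation error");

    assert_eq!(results.len(), 1);
    assert!(!results[0].success);
}
```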
#### Success Criteria
- [ ] API endpoint `/deploy_datasets` successfully accepts `Json(Vec<semantic_layer::Model>)`.
- [ ] Core logic in `deploy_datasets_handler` correctly processes `semantic_layer::Model` objects and maps them to `Dataset` and `DatasetColumn` database models.
- [ ] Existing data source lookup, permission checks, and basic upsert operations for datasets/columns function with the new input structure.
- [ ] The `env` for data source lookup is handled correctly.
- [ ] Unit and integration tests pass.
## Dependencies on Other Components
- **`prd_semantic_model_definition.md`**: The API must use the exact `semantic_layer::Model` struct defined there.
- **`prd_cli_deployment_logic.md`**: The CLI must send data in the format this API endpoint now expects.
## Security Considerations
- All incoming fields from `semantic_layer::Model` must be treated as untrusted input and validated/sanitized before database interaction, especially string fields used in queries or stored directly (see the sketch after this list).
- Permissions (`is_user_workspace_admin_or_data_admin`) must continue to be robustly checked.
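
A sketch of the kind of up-front validation the first point implies; the specific rules here (non-empty name, identifier-safe characters, length cap) are illustrative, not settled policy:

```rust
// Illustrative validation pass over untrusted model fields before any DB
// interaction; the concrete rules and limits are placeholders.
fn validate_model_input(model: &SemanticModel) -> Result<(), ValidationError> {
    let is_safe_identifier = |s: &str| {
        !s.is_empty()
            && s.len() <= 255
            && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
    };

    if !is_safe_identifier(&model.name) {
        return Err(ValidationError::internal_error(format!(
            "Invalid model name: {:?}",
            model.name
        )));
    }
    if let Some(schema) = &model.schema {
        if !is_safe_identifier(schema) {
            return Err(ValidationError::internal_error(format!(
                "Invalid schema name: {:?}",
                schema
            )));
        }
    }
    Ok(())
}
```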
## References
- `api/server/src/routes/rest/routes/datasets/deploy_datasets.rs` (current implementation)
- `api/libs/semantic_layer/src/models.rs` (new request DTO)
- Axum documentation for JSON extraction.