| title | author | date | status | parent_prd | ticket |
|---|---|---|---|---|---|
| API Request Handling for Model Deployment | Gemini Assistant | 2024-07-26 | Draft | semantic_layer_refactor_overview.md | N/A |

# API Request Handling for Model Deployment
## Parent Project

This is a sub-PRD of the Semantic Layer and Deployment Refactor project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.
## Problem Statement

The current `/deploy_datasets` API endpoint in `api/server/src/routes/rest/routes/datasets/deploy_datasets.rs` expects a request body (`Vec<DeployDatasetsRequest>`) that is structured differently from the new unified `semantic_layer::Model`. To align with the new approach, this endpoint needs to be updated to accept a payload consisting of `Vec<semantic_layer::Model>`.

Current behavior:

- The API endpoint `/deploy_datasets` takes `Json(Vec<DeployDatasetsRequest>)`. `DeployDatasetsRequest` has fields like `data_source_name`, `env`, `type_`, `name`, `model` (SQL model name), `schema`, `database`, `description`, `sql_definition`, `entity_relationships`, `columns`, and `yml_file`.
- This structure requires the CLI to transform its parsed YAML (currently CLI-specific structs) into this `DeployDatasetsRequest` format.
Expected behavior:

- The API endpoint `/deploy_datasets` will be updated to accept `Json(Vec<semantic_layer::Model>)` (or a new wrapper struct if necessary, e.g., `BatchDeploySemanticModelsRequest { models: Vec<semantic_layer::Model>, global_env: Option<String> }`, but directly using `Vec<semantic_layer::Model>` is preferred if `env` can be handled or is implicit).
- The `semantic_layer::Model` (defined in `api/libs/semantic_layer/src/models.rs`) will be the primary data structure received.
- The existing logic within `handle_deploy_datasets` and `deploy_datasets_handler` will need to be refactored to work with these new input structs instead of the old `DeployDatasetsRequest`.
- Information like `data_source_name`, `schema`, and `database` will now primarily come from the fields within each `semantic_layer::Model` instance (which were resolved by the CLI).
- The `env` field, previously on `DeployDatasetsRequest`, needs consideration. If it's a global setting for the batch, it might need to be passed differently or inferred. For now, assume `env` might be associated with the `DataSource` in the database and resolved there, or becomes part of the `semantic_layer::Model` if it can vary per model in a batch.
  - Decision: The `env` is typically tied to a `DataSource` entry in the DB. The API should look up the `DataSource` using `name` (from `model.data_source_name`) and an `env` (e.g., hardcoded to "dev" or configurable). For simplicity, we can assume a default `env` like "dev" when looking up the data source if not provided explicitly with the model. The `data_source_name` on the `semantic_layer::Model` will be the key.
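For reference, the shape this PRD assumes for `semantic_layer::Model` can be sketched as follows. This is a hypothetical reconstruction based only on the fields referenced in this document (`name`, `data_source_name`, `schema`, `database`, `description`, `dimensions`, `measures`); the authoritative definition lives in `api/libs/semantic_layer/src/models.rs` and will include serde derives and the relationships/metrics/filters parts handled in `prd_api_model_persistence.md`.

```rust
// Hypothetical sketch of the semantic_layer::Model shape assumed by this PRD.
// Not the real struct — see api/libs/semantic_layer/src/models.rs.

#[derive(Debug, Clone, Default)]
pub struct Dimension {
    pub name: String,
    pub description: Option<String>,
    pub type_: Option<String>, // may be None; type inference fills it in later
    pub expr: Option<String>,
}

#[derive(Debug, Clone, Default)]
pub struct Measure {
    pub name: String,
    pub description: Option<String>,
    pub type_: Option<String>,
    pub agg: Option<String>,
    pub expr: Option<String>,
}

#[derive(Debug, Clone, Default)]
pub struct Model {
    pub name: String,
    pub description: Option<String>,
    // Resolved by the CLI before the request is sent:
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
    pub dimensions: Vec<Dimension>,
    pub measures: Vec<Measure>,
    // relationships / metrics / filters omitted here; they are covered
    // in prd_api_model_persistence.md.
}
```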
## Goals

- Update the signature of the `deploy_datasets` Axum handler function to accept `Json(Vec<semantic_layer::Model>)` or an equivalent new request struct.
- Refactor the internal logic of `handle_deploy_datasets` and `deploy_datasets_handler` to process `semantic_layer::Model` objects.
- Map fields from `semantic_layer::Model` (and its nested structs like `Dimension`, `Measure`, `Relationship`) to the corresponding database entities (`Dataset`, `DatasetColumn`).
- Ensure existing functionalities like data source lookup, organization ID handling, and user permissions checks are maintained.
- Decide on and implement handling for the `env` parameter.
## Non-Goals

- Implementing the type inference logic (covered in `prd_api_type_inference.md`).
- Implementing the detailed persistence logic for all new semantic model parts like metrics and filters (covered in `prd_api_model_persistence.md`). This PRD focuses on adapting to the new request shape for existing core entities (datasets, columns).
- Changing the response structure (`DeployDatasetsResponse`) significantly, though the source of its data will change.
## Implementation Plan

### Phase 1: Adapt API Endpoint and Core Logic

#### Technical Design

1. Update Request Struct and Handler Signature:
   - The main entry point `deploy_datasets` in `deploy_datasets.rs` will change its `Json` extractor.
```rust
// In api/server/src/routes/rest/routes/datasets/deploy_datasets.rs
// ... other imports ...
use semantic_layer::models::Model as SemanticModel; // Alias for clarity

// Current request structs (DeployDatasetsRequest, DeployDatasetsColumnsRequest, etc.)
// will be REMOVED or deprecated.

// Updated Axum handler function signature
pub async fn deploy_datasets(
    Extension(user): Extension<AuthenticatedUser>,
    Json(requests): Json<Vec<SemanticModel>>, // <<<<< CHANGED HERE
) -> Result<ApiResponse<DeployDatasetsResponse>, (StatusCode, String)> {
    // ... organization_id and permission checks remain similar ...

    // Call handler function, passing Vec<SemanticModel>
    match handle_deploy_datasets(&user.id, requests).await { // <<<<< CHANGED HERE
        Ok(result) => Ok(ApiResponse::JsonData(result)),
        Err(e) => {
            tracing::error!("Error in deploy_datasets: {:?}", e);
            Err((StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))
        }
    }
}

async fn handle_deploy_datasets(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
) -> Result<DeployDatasetsResponse> {
    // The logic to produce DeployDatasetsResponse might need adjustment
    // based on how ValidationResult is now generated.
    let results = deploy_datasets_handler(user_id, models, false).await?;
    // ... existing summary logic ...
    // Ok(DeployDatasetsResponse { results, summary })
}

// This is the core function to refactor
async fn deploy_datasets_handler(
    user_id: &Uuid,
    models: Vec<SemanticModel>, // <<<<< CHANGED HERE
    _is_simple: bool, // This parameter might be obsolete
) -> Result<Vec<ValidationResult>> { // ValidationResult might need to refer to SemanticModel fields
    let organization_id = get_user_organization_id(user_id).await?;
    let mut conn = get_pg_pool().get().await?;

    let mut results: Vec<ValidationResult> = Vec::new();

    // Grouping by (data_source_name, env, database) might change slightly.
    // data_source_name and database now come from SemanticModel.
    // `env` needs to be resolved (e.g., assume "dev" for data source lookup).
    let default_env = "dev".to_string();

    // Temporary map of models keyed by their resolved data_source_name;
    // SemanticModel has it as Option<String>, but the CLI should have resolved it.
    let mut models_by_resolved_ds: HashMap<String, Vec<SemanticModel>> = HashMap::new();
    for model in models {
        if let Some(ds_name) = &model.data_source_name {
            models_by_resolved_ds.entry(ds_name.clone()).or_default().push(model);
        } else {
            // This should ideally be caught by CLI validation
            let mut val_res = ValidationResult::new(
                model.name.clone(),
                "UNKNOWN_DS".to_string(),
                model.schema.clone().unwrap_or_default(),
            );
            val_res.add_error(ValidationError::internal_error(
                "DataSourceName missing on model".to_string(),
            ));
            results.push(val_res);
            continue;
        }
    }

    for (data_source_name, model_group) in models_by_resolved_ds {
        // Fetch DataSource using data_source_name and default_env
        let data_source = match data_sources::table
            .filter(data_sources::name.eq(&data_source_name))
            .filter(data_sources::env.eq(&default_env))
            .filter(data_sources::organization_id.eq(&organization_id))
            // ... other filters ...
            .first::<DataSource>(&mut conn)
            .await
        {
            Ok(ds) => ds,
            Err(_) => { /* ... handle error, push to results ... */ continue; }
        };

        for semantic_model in model_group { // Now iterating over SemanticModel
            // Create a ValidationResult instance
            let mut validation_result = ValidationResult::new(
                semantic_model.name.clone(),
                data_source_name.clone(), // Use the resolved one
                semantic_model.schema.as_ref().cloned().unwrap_or_default(),
            );

            // --- Map SemanticModel to Dataset ---
            let dataset_id = Uuid::new_v4(); // Or fetch existing by semantic_model.name and data_source.id
            let now = Utc::now();
            let db_dataset = crate::database::models::Dataset {
                id: dataset_id,
                name: semantic_model.name.clone(),
                data_source_id: data_source.id,
                database_name: semantic_model.name.clone(), // Assuming model name is table/view name
                when_to_use: semantic_model.description.clone(),
                type_: DatasetType::View, // Default, or determine from SemanticModel if it has such a field
                // definition: semantic_model.sql_definition.clone(),
                //   semantic_model doesn't have sql_definition directly. Where does this come from?
                //   Perhaps from the model.model field if that's a convention for the SQL block?
                //   For now, leave it empty or decide the source.
                definition: String::new(), // Placeholder
                schema: semantic_model.schema.as_ref().cloned().unwrap_or_else(|| {
                    validation_result.add_error(ValidationError::internal_error("Schema missing".to_string()));
                    String::new() // Default, though error is added
                }),
                database_identifier: semantic_model.database.clone(), // This is Option<String>
                yml_file: None, // CLI used to send this; API doesn't strictly need it if the model definition is complete
                // ... other fields like created_at, updated_at, user_id, organization_id ...
            };
            // Add db_dataset to a list for bulk upsert

            // --- Map SemanticModel.dimensions and SemanticModel.measures to DatasetColumn ---
            let mut dataset_columns_to_upsert = Vec::new();
            for dim in &semantic_model.dimensions {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id, // Link to the dataset above
                    name: dim.name.clone(),
                    type_: dim.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()), // Type inference later if UNKNOWN
                    description: dim.description.clone(),
                    semantic_type: Some("dimension".to_string()),
                    dim_type: dim.type_.clone(), // Or specific mapping
                    // expr: dim.expr.clone(), // If Dimension has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            for measure in &semantic_model.measures {
                dataset_columns_to_upsert.push(crate::database::models::DatasetColumn {
                    id: Uuid::new_v4(),
                    dataset_id,
                    name: measure.name.clone(),
                    type_: measure.type_.clone().unwrap_or_else(|| "UNKNOWN".to_string()),
                    description: measure.description.clone(),
                    semantic_type: Some("measure".to_string()),
                    // agg: measure.agg.clone(), // If Measure has agg and DatasetColumn supports it
                    // expr: measure.expr.clone(), // If Measure has expr and DatasetColumn supports it
                    // ... other fields ...
                });
            }
            // Add dataset_columns_to_upsert to a list for bulk upsert, associated with dataset_id

            // --- Placeholder for Relationships, Metrics, Filters ---
            // semantic_model.relationships, semantic_model.metrics, semantic_model.filters
            // will be handled in prd_api_model_persistence.md

            // After DB operations (mocked for now or simplified upsert):
            // validation_result.success = true; (if all good)
            results.push(validation_result);
        }

        // Perform bulk upserts for datasets and columns for this data_source group here
    }

    Ok(results)
}
```
Self-correction: The `sql_definition` field was part of the old `DeployDatasetsRequest`. The `semantic_layer::Model` doesn't have a direct `sql_definition` field. If the underlying SQL for a model/view is still needed at this stage, we need to clarify where it comes from. If the `semantic_layer::Model` is purely semantic, the API might not need the full SQL, or it might be embedded in a specific field within `semantic_layer::Model` (e.g., a `model_source: Option<String>` field or similar). For now, `Dataset.definition` is set to empty. The `Model.model` field in the old CLI structure was sometimes used for this. We should clarify whether a field like `semantic_layer::Model.source_query: Option<String>` is needed.
2. `ValidationResult` and `ValidationError`:
   - These structs in `deploy_datasets.rs` will likely remain but will be populated based on validating `SemanticModel` objects. `ValidationResult` references `model_name`, `data_source_name`, and `schema`; these are all available from `SemanticModel`.
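A minimal shape for these two structs, inferred only from how the design sketch above calls them (`ValidationResult::new(...)`, `add_error`, `ValidationError::internal_error`), might look like the following. The real definitions in `deploy_datasets.rs` may carry additional fields.

```rust
// Hypothetical minimal shape of ValidationResult / ValidationError, inferred
// from the call sites in the design sketch. Not the real definitions.

#[derive(Debug, Clone)]
pub struct ValidationError {
    pub message: String,
}

impl ValidationError {
    pub fn internal_error(message: String) -> Self {
        ValidationError { message }
    }
}

#[derive(Debug, Clone)]
pub struct ValidationResult {
    pub model_name: String,
    pub data_source_name: String,
    pub schema: String,
    pub success: bool,
    pub errors: Vec<ValidationError>,
}

impl ValidationResult {
    pub fn new(model_name: String, data_source_name: String, schema: String) -> Self {
        ValidationResult {
            model_name,
            data_source_name,
            schema,
            success: false, // flipped to true only after a successful upsert
            errors: Vec::new(),
        }
    }

    pub fn add_error(&mut self, error: ValidationError) {
        self.errors.push(error);
    }
}
```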
#### Implementation Steps

1. Modify the `deploy_datasets` Axum handler to accept `Json(Vec<semantic_layer::Model>)`.
2. Update `handle_deploy_datasets` and `deploy_datasets_handler` to take `Vec<semantic_layer::Model>` as input.
3. Refactor the grouping logic in `deploy_datasets_handler`:
   a. Models should be grouped by their `data_source_name` (present on `semantic_layer::Model`).
   b. Determine how `env` is handled for the `DataSource` lookup (e.g., use a default like "dev").
4. Inside the loop for each `SemanticModel`:
   a. Adapt the creation of `ValidationResult` using fields from `SemanticModel`.
   b. Map `SemanticModel` fields (name, description, schema, database) to `crate::database::models::Dataset`. Clarify the source for `Dataset.definition` (SQL query).
   c. Map `SemanticModel.dimensions` to `crate::database::models::DatasetColumn` (setting `semantic_type` to "dimension").
   d. Map `SemanticModel.measures` to `crate::database::models::DatasetColumn` (setting `semantic_type` to "measure").
   e. Ensure `dataset_id` linkage is correct.
5. Adapt the existing bulk upsert logic for `Dataset` and `DatasetColumn` to use the newly mapped objects.
6. Ensure soft-delete logic for columns not present in the new request still functions correctly based on the incoming columns for a dataset.
7. Temporarily bypass or add placeholders for processing `relationships`, `metrics`, and `filters` from `SemanticModel` (to be fully addressed in `prd_api_model_persistence.md`).
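The grouping step (3a) can be isolated from the database work and sketched as a pure function, which also makes it unit-testable on its own. `Model` here is a pared-down stand-in for `semantic_layer::Model`, not the real struct:

```rust
use std::collections::HashMap;

// Illustrative sketch of step 3a: partition incoming models by their resolved
// data_source_name, collecting models that are missing one so they can be
// reported as validation errors downstream.

#[derive(Debug, Clone)]
pub struct Model {
    pub name: String,
    pub data_source_name: Option<String>,
}

pub fn group_by_data_source(
    models: Vec<Model>,
) -> (HashMap<String, Vec<Model>>, Vec<Model>) {
    let mut grouped: HashMap<String, Vec<Model>> = HashMap::new();
    let mut missing_ds: Vec<Model> = Vec::new();
    for model in models {
        match model.data_source_name.clone() {
            Some(ds_name) => grouped.entry(ds_name).or_default().push(model),
            None => missing_ds.push(model), // becomes a ValidationResult error
        }
    }
    (grouped, missing_ds)
}
```

Keeping the partition separate from the per-group `DataSource` lookup means the lookup (and its `env` default) only runs once per data source, as in the design sketch above.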
#### Tests

- Unit tests for `deploy_datasets_handler`:
  - Mock database interactions.
  - Test with a valid `Vec<SemanticModel>`: ensure `Dataset` and `DatasetColumn` objects are correctly formed.
  - Test with models having a missing `data_source_name` or `schema` (should result in validation errors if the CLI didn't catch it, or be handled gracefully if the API expects them to be resolved).
  - Test data source lookup failure.
- Integration tests (CLI calling the actual, modified API endpoint):
  - Full flow: CLI sends `Vec<SemanticModel>`, API receives and processes it, and basic data (datasets, columns) is stored.
  - Test with multiple models targeting the same and different data sources.
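The "missing `data_source_name` or `schema`" unit test can be sketched against a small validation helper rather than the full handler. Both the stand-in `Model` struct and `validate_model` are illustrative; the real test would call `deploy_datasets_handler` with mocked DB interactions and inspect the returned `ValidationResult`s:

```rust
// Illustrative sketch: a per-model validation helper that surfaces the errors
// a handler would attach. Stand-in types, not the real ones.

#[derive(Debug, Default)]
pub struct Model {
    pub name: String,
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
}

/// Returns the validation errors a handler would record for one model.
pub fn validate_model(model: &Model) -> Vec<String> {
    let mut errors = Vec::new();
    if model.data_source_name.is_none() {
        errors.push(format!("DataSourceName missing on model '{}'", model.name));
    }
    if model.schema.is_none() {
        errors.push(format!("Schema missing on model '{}'", model.name));
    }
    errors
}
```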
## Success Criteria

- The API endpoint `/deploy_datasets` successfully accepts `Json(Vec<semantic_layer::Model>)`.
- Core logic in `deploy_datasets_handler` correctly processes `semantic_layer::Model` objects and maps them to `Dataset` and `DatasetColumn` database models.
- Existing data source lookup, permission checks, and basic upsert operations for datasets/columns function with the new input structure.
- The `env` for data source lookup is handled correctly.
- Unit and integration tests pass.
## Dependencies on Other Components

- `prd_semantic_model_definition.md`: The API must use the exact `semantic_layer::Model` struct defined there.
- `prd_cli_deployment_logic.md`: The CLI must send data in the format this API endpoint now expects.
## Security Considerations

- All incoming fields from `semantic_layer::Model` must be treated as untrusted input and validated/sanitized before database interaction, especially string fields used in queries or stored directly.
- Permissions (`is_user_workspace_admin_or_data_admin`) must continue to be robustly checked.
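One way to treat identifier-like string fields (model name, schema, database) as untrusted is a whitelist check before they reach the database layer. The allowed-character rule below is an assumption for illustration; real rules should match the target warehouse's identifier grammar, and value parameters should still go through the query builder's parameter binding:

```rust
// Illustrative sketch: whitelist validation for identifier-like fields.
// The charset and length rules here are assumptions, not the project's policy.

pub fn is_safe_identifier(s: &str) -> bool {
    !s.is_empty()
        && s.len() <= 255
        && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !s.chars().next().unwrap().is_ascii_digit()
}
```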
References
api/server/src/routes/rest/routes/datasets/deploy_datasets.rs
(current implementation)api/libs/semantic_layer/src/models.rs
(new request DTO)- Axum documentation for JSON extraction.