title | author | date | status | parent_prd | ticket |
---|---|---|---|---|---|
CLI Configuration and Model Discovery | Gemini Assistant | 2024-07-26 | Draft | semantic_layer_refactor_overview.md | N/A |
CLI Configuration and Model Discovery
Parent Project
This is a sub-PRD of the Semantic Layer and Deployment Refactor project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.
Problem Statement
The current CLI (`deploy` command) has limited support for discovering model files and managing configuration. `buster.yml` is typically expected in the current directory or a specified path, and its structure is flat. This doesn't cater well to monorepos or projects where models are organized into subdirectories, each potentially with slightly different default configuration (such as schema or database).
Behavior of `deploy` command pathing:
- The `deploy` command, when invoked, should look for `buster.yml` only in the effective target directory (the one provided as an argument, or the current working directory if no argument is given).
- It should not search parent directories for `buster.yml`.
- Once `buster.yml` is found (or defaults are used if it is not found), model file discovery (e.g., `*.yml`) proceeds based on the paths specified in `buster.yml` (the new `projects` field, or the existing `model_paths` field) or on the directory containing `buster.yml` itself (for backward compatibility or simple cases). Model search should be recursive within these paths.
Current behavior:
- `BusterConfig` in `cli/cli/src/utils/config.rs` is flat, primarily supporting global `data_source_name`, `schema`, `database`, `exclude_tags`, and `exclude_files`.
- Model discovery generally starts from the current path or a specified path, with no structured way to define multiple distinct "project" sources of models within one `buster.yml`.
- The `deploy` command may behave ambiguously regarding upward lookup of `buster.yml` in the directory tree.
Expected behavior:
- The `deploy` command will look for `buster.yml` strictly in the target directory (current or specified) and will not traverse upwards.
- `BusterConfig` will be extended with an optional `projects` field (`Option<Vec<ProjectConfig>>`).
- Each `ProjectConfig` will define a `path` (relative to the `buster.yml` location) where its models are located, and can optionally specify its own `data_source_name`, `schema`, and `database`, which override the global settings in `buster.yml` for models within that project path.
- The CLI's model discovery logic will iterate through these `projects` (if defined), otherwise use `model_paths` (if defined and `projects` is not), otherwise the `buster.yml` directory, to find model files (`.yml` files excluding `buster.yml`).
- Configuration (`database`, `schema`, `data_source_name`) for a deployed model will be resolved with the following precedence: model file > `ProjectConfig` > global `BusterConfig`.
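The per-field precedence rule can be sketched with `Option::or_else` chains. This is a minimal, std-only sketch, not the CLI's implementation: `ConfigLayer` and `resolve_layers` are hypothetical names, and `ConfigLayer` stands in for the inheritable fields of the model file, `ProjectConfig`, and the global `BusterConfig` alike.

```rust
/// Hypothetical trimmed-down view of one configuration layer (model file,
/// ProjectConfig, or global BusterConfig); only the inheritable fields are kept.
#[derive(Debug, Clone, Default)]
pub struct ConfigLayer {
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
}

/// Resolves each field independently: model file first, then the project
/// (if the model was discovered under one), then the global buster.yml values.
pub fn resolve_layers(
    model: &ConfigLayer,
    project: Option<&ConfigLayer>,
    global: &ConfigLayer,
) -> ConfigLayer {
    // Returns the first `Some` among the three layers for a single field.
    fn pick(
        model: &Option<String>,
        project: Option<&Option<String>>,
        global: &Option<String>,
    ) -> Option<String> {
        model
            .clone()
            .or_else(|| project.and_then(|p| p.clone()))
            .or_else(|| global.clone())
    }

    ConfigLayer {
        data_source_name: pick(
            &model.data_source_name,
            project.map(|p| &p.data_source_name),
            &global.data_source_name,
        ),
        schema: pick(&model.schema, project.map(|p| &p.schema), &global.schema),
        database: pick(&model.database, project.map(|p| &p.database), &global.database),
    }
}
```

Resolving each field independently lets a project override only `schema` while still inheriting `data_source_name` and `database` from the global config.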
Goals
- Modify the `deploy` command to load `buster.yml` only from the current/specified directory, never from parent directories; subdirectories are searched only when discovering models.
- Extend `BusterConfig` in `cli/cli/src/utils/config.rs` to include `projects: Option<Vec<ProjectConfig>>`.
- Define a `ProjectConfig` struct with `path: String` and optional `data_source_name`, `schema`, and `database` fields.
- Update the model file discovery logic in `cli/cli/src/commands/deploy.rs` to:
  a. Honor the `projects` structure in `buster.yml` if present.
  b. If `projects` is not present, fall back to the existing `model_paths` logic, or to searching the directory of `buster.yml` (or the current/specified path if there is no `buster.yml`).
  c. Recursively search for `.yml` model files within the determined project/model paths.
- Implement the configuration inheritance logic (model file > `ProjectConfig` > global `BusterConfig`) when preparing models for deployment.
Non-Goals
- Changing the YAML parsing for individual model files (covered in `prd_semantic_model_definition.md` and `prd_cli_deployment_logic.md`).
- Altering the `exclude_tags` or `exclude_files` functionality at the global level (though they would apply to all discovered models).
Implementation Plan
Phase 1: Update BusterConfig and Discovery Logic
Technical Design
1. `BusterConfig` and `ProjectConfig` structs (`cli/cli/src/utils/config.rs`):
```rust
// In cli/cli/src/utils/config.rs (or equivalent)
use serde::Deserialize;
use std::path::{Path, PathBuf};

#[derive(Debug, Deserialize, Clone, Default)] // Default lets a missing buster.yml fall back to an empty config
pub struct BusterConfig {
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
    pub exclude_tags: Option<Vec<String>>,
    pub exclude_files: Option<Vec<PathBuf>>, // Assuming PathBuf is more appropriate here
    pub model_paths: Option<Vec<String>>,     // Existing field for model paths
    pub projects: Option<Vec<ProjectConfig>>, // New field for projects
}

#[derive(Debug, Deserialize, Clone)]
pub struct ProjectConfig {
    pub name: Option<String>, // Optional project name, used for logging/identification
    pub path: String,         // Path relative to the buster.yml location
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
}

impl BusterConfig {
    /// Loads BusterConfig from `dir/buster.yml`.
    /// Only `dir` itself is checked; parent directories are never searched.
    pub fn load_from_dir(dir: &Path) -> anyhow::Result<Option<Self>> {
        let config_path = dir.join("buster.yml");
        if config_path.is_file() {
            let content = std::fs::read_to_string(&config_path)?;
            let config: BusterConfig = serde_yaml::from_str(&content)?;
            Ok(Some(config))
        } else {
            Ok(None)
        }
    }

    /// Resolves the effective model search paths, in order of precedence:
    /// `projects`, then `model_paths`, then the buster.yml directory itself.
    /// Returns (path, Option<&ProjectConfig>) pairs; each path is joined onto
    /// `buster_yml_dir`, so it is absolute only if `buster_yml_dir` is.
    pub fn resolve_effective_model_paths(
        &self,
        buster_yml_dir: &Path,
    ) -> Vec<(PathBuf, Option<&ProjectConfig>)> {
        let mut effective_paths = Vec::new();
        if let Some(projects) = &self.projects {
            for project_config in projects {
                let project_path = buster_yml_dir.join(&project_config.path);
                effective_paths.push((project_path, Some(project_config)));
            }
        } else if let Some(model_paths) = &self.model_paths {
            for model_path_str in model_paths {
                let model_path = buster_yml_dir.join(model_path_str);
                effective_paths.push((model_path, None));
            }
        } else {
            // Default to the directory containing buster.yml if no projects or model_paths are specified
            effective_paths.push((buster_yml_dir.to_path_buf(), None));
        }
        effective_paths
    }
}
```
2. Model discovery in `deploy.rs`:
- The `deploy` function will first determine the `base_dir` (current or specified path).
- It will call `BusterConfig::load_from_dir(&base_dir)` to get the config. If no `buster.yml` is found, it proceeds with a default (empty) config.
- Use `config.resolve_effective_model_paths(&base_dir)` to get the search paths and their associated project configs.
- For each path returned:
  - Recursively find all `*.yml` files (excluding `buster.yml`).
  - Keep track of the `Option<ProjectConfig>` associated with models found under each path for later config resolution.
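```rust
// Conceptual logic in cli/cli/src/commands/deploy.rs
use std::path::{Path, PathBuf};

use anyhow::Result;
use walkdir::WalkDir; // Assumption: the `walkdir` crate handles recursive traversal

use crate::utils::config::{BusterConfig, ProjectConfig}; // Assuming this path

/// True for `*.yml` files other than `buster.yml` itself.
fn is_model_file(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "yml")
        && path.file_name().map_or(false, |name| name != "buster.yml")
}

async fn deploy(path_arg: Option<&str>, /* ... other args ... */) -> Result<()> {
    // Target directory: the path argument if given, otherwise the current directory.
    let base_dir = match path_arg {
        Some(path) => PathBuf::from(path),
        None => std::env::current_dir()?,
    };

    // Load buster.yml strictly from base_dir (no upward search); fall back to defaults.
    let buster_config = BusterConfig::load_from_dir(&base_dir)?.unwrap_or_default();

    // Each model file is paired with the ProjectConfig (if any) it was found under.
    let mut all_model_files_with_context: Vec<(PathBuf, Option<ProjectConfig>)> = Vec::new();
    for (search_path, project_config_opt) in buster_config.resolve_effective_model_paths(&base_dir) {
        if search_path.is_dir() {
            // Unreadable entries are skipped here; real code may want to report them.
            for entry in WalkDir::new(&search_path).into_iter().filter_map(|e| e.ok()) {
                if entry.file_type().is_file() && is_model_file(entry.path()) {
                    all_model_files_with_context.push((entry.into_path(), project_config_opt.cloned()));
                }
            }
        } else if search_path.is_file() && is_model_file(&search_path) {
            all_model_files_with_context.push((search_path, project_config_opt.cloned()));
        }
    }

    // ... rest of the deployment logic uses all_model_files_with_context;
    // each element carries its ProjectConfig for resolving database/schema.
    Ok(())
}
```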
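If pulling in a directory-walking crate such as `walkdir` is undesirable, the recursive `*.yml` search can be written with the standard library alone. This is a minimal sketch and `find_model_files` is a hypothetical name. It uses an explicit stack instead of recursion, and because `DirEntry::file_type` does not follow symlinks, symlinked files and directories are skipped entirely (which also rules out symlink cycles).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical std-only discovery: recursively collects every `*.yml` file
/// under `dir`, skipping any file named `buster.yml`.
pub fn find_model_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Directories still to visit; an explicit stack avoids deep recursion.
    let mut stack = vec![dir.to_path_buf()];
    while let Some(current) = stack.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            // file_type() does not follow symlinks, so symlinks match neither branch.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                stack.push(path);
            } else if file_type.is_file()
                && path.extension().map_or(false, |ext| ext == "yml")
                && path.file_name().map_or(false, |name| name != "buster.yml")
            {
                found.push(path);
            }
        }
    }
    // read_dir order is platform-dependent; sort for deterministic output.
    found.sort();
    Ok(found)
}
```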
Implementation Steps
- Define the `ProjectConfig` struct in `cli/cli/src/utils/config.rs`.
- Add `projects: Option<Vec<ProjectConfig>>` to the `BusterConfig` struct.
- Update `BusterConfig::load_from_dir` (or ensure the existing loader does the same) so that it only looks in the provided directory.
- Implement the `BusterConfig::resolve_effective_model_paths(&self, buster_yml_dir: &Path)` method.
- In `cli/cli/src/commands/deploy.rs`:
  a. Modify `deploy` to determine `base_dir` (current or specified path).
  b. Load `BusterConfig` strictly from `base_dir`.
  c. Use `resolve_effective_model_paths` to get the search locations.
  d. Implement a recursive search for `.yml` files (excluding `buster.yml`) in these locations, associating found files with their `Option<ProjectConfig>`.
- Ensure the exclusion logic (`exclude_files`, `exclude_tags`) is still applied correctly to the discovered files.
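This PRD does not specify how `exclude_files` entries are matched. As an assumption only, the sketch below treats them as plain paths relative to the `buster.yml` directory (no glob support) and drops any discovered file equal to or nested under an entry; `apply_exclude_files` is a hypothetical helper. Because `Path::starts_with` compares whole components, an entry `models/legacy` excludes `models/legacy/b.yml` but not `models/legacy_v2/c.yml`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical filter for discovered model files.
/// ASSUMPTION: `exclude_files` entries are plain paths relative to the
/// buster.yml directory; a file is excluded if it equals an entry or lies under it.
pub fn apply_exclude_files(
    files: Vec<PathBuf>,
    buster_yml_dir: &Path,
    exclude_files: &[PathBuf],
) -> Vec<PathBuf> {
    files
        .into_iter()
        .filter(|file| {
            // Compare relative to buster.yml so entries match how they are written in YAML.
            let rel = file.strip_prefix(buster_yml_dir).unwrap_or(file.as_path());
            // starts_with is component-wise, not a raw string prefix check.
            !exclude_files.iter().any(|excluded| rel.starts_with(excluded))
        })
        .collect()
}
```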
Tests
- Unit tests for `BusterConfig`:
  - Test that `BusterConfig::load_from_dir` correctly loads the config or returns `None`.
  - Test `resolve_effective_model_paths`:
    - With `projects` defined.
    - With `model_paths` defined (and `projects` undefined).
    - With neither defined (should default to `buster_yml_dir`).
    - Paths are correctly resolved relative to `buster_yml_dir`.
- Integration-like tests for `deploy` discovery (mocking the file system or using temp dirs):
  - `buster.yml` in the current dir, `projects` pointing to subdirs -> models found correctly.
  - `buster.yml` in the current dir, no `projects`, no `model_paths` -> models in the current dir found.
  - `deploy` called with a path argument -> `buster.yml` loaded from that path.
  - `deploy` command does NOT find `buster.yml` in parent directories.
  - `exclude_files` patterns correctly filter results from `projects` paths.
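The parent-directory test case needs nothing more than a temporary directory tree. A minimal sketch: `locate_buster_yml` and `no_parent_lookup_scenario` are hypothetical helpers that mirror the lookup rule of `BusterConfig::load_from_dir` without the YAML parsing, so no `serde_yaml` is needed. In the real suite the same assertions would target `load_from_dir` itself, typically inside a `#[test]` function.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the lookup rule of `load_from_dir`:
/// only `dir/buster.yml` is considered; parent directories are never inspected.
pub fn locate_buster_yml(dir: &Path) -> Option<PathBuf> {
    let candidate = dir.join("buster.yml");
    candidate.is_file().then_some(candidate)
}

/// Builds `<tmp>/parent/buster.yml` plus an empty `<tmp>/parent/child/`, then
/// checks that a lookup from `child` does NOT see the parent's config while a
/// lookup from `parent` does.
pub fn no_parent_lookup_scenario() -> io::Result<bool> {
    let root = std::env::temp_dir().join(format!("buster_lookup_{}", std::process::id()));
    let parent = root.join("parent");
    let child = parent.join("child");
    fs::create_dir_all(&child)?;
    fs::write(parent.join("buster.yml"), "data_source_name: demo\n")?;

    let ok = locate_buster_yml(&child).is_none() && locate_buster_yml(&parent).is_some();

    fs::remove_dir_all(&root)?;
    Ok(ok)
}
```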
Success Criteria
- `BusterConfig` and `ProjectConfig` are correctly defined and can be deserialized from YAML.
- The `deploy` command loads `buster.yml` only from the specified/current directory.
- Model discovery correctly uses `projects`, then `model_paths`, then the `buster.yml` directory, and finds `.yml` files recursively within these.
- Discovered model files are correctly associated with their `ProjectConfig` (if any) for later steps.
- All tests pass.
Dependencies on Other Components
- Relies on `prd_semantic_model_definition.md` for the structure of the model files being discovered, but primarily focuses on finding them and their configuration context.
Security Considerations
- Path resolution from `buster.yml` (for `projects.path` or `model_paths`) must be handled carefully so that paths are treated as relative to `buster.yml` and cannot traverse to unintended locations (e.g., `../../../../../etc/passwd`). `Path::join` alone does not guarantee this: joining an absolute path replaces the base entirely, and `..` components are kept rather than normalized. Entries should therefore be validated (for example, rejecting absolute paths and `..` components) before they are joined, especially since they can be arbitrary strings.
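One way to enforce this is to validate each entry's components before joining it onto the `buster.yml` directory. A minimal sketch; `resolve_project_path` is a hypothetical helper, and returning `String` errors is a simplification (the CLI would likely use `anyhow`).

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical validation for `projects[].path` / `model_paths` entries.
/// Rejects absolute paths (which `Path::join` would use in place of the base)
/// and any `..` component, then joins the entry onto the buster.yml directory.
pub fn resolve_project_path(buster_yml_dir: &Path, entry: &str) -> Result<PathBuf, String> {
    let rel = Path::new(entry);
    for component in rel.components() {
        match component {
            // Ordinary names and a leading "./" are allowed.
            Component::Normal(_) | Component::CurDir => {}
            Component::ParentDir => {
                return Err(format!("path '{entry}' must not contain '..'"));
            }
            // A root ("/") or a Windows prefix ("C:") makes the path absolute.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("path '{entry}' must be relative to buster.yml"));
            }
        }
    }
    Ok(buster_yml_dir.join(rel))
}
```

Rejecting every `..` is stricter than necessary (`models/../shared` stays inside the repository), but it avoids canonicalizing paths that may not exist yet. Symlinks inside the repository can still point elsewhere; `std::fs::canonicalize` followed by a `starts_with` check against the canonicalized `buster.yml` directory would catch that case.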
References
- Existing `cli/cli/src/utils/config.rs`.
- Existing `cli/cli/src/commands/deploy.rs` model discovery logic.