---
title: CLI Configuration and Model Discovery
author: Gemini Assistant
date: 2024-07-26
status: Draft
parent_prd: semantic_layer_refactor_overview.md
ticket: N/A
---

# CLI Configuration and Model Discovery

## Parent Project

This is a sub-PRD of the [Semantic Layer and Deployment Refactor](semantic_layer_refactor_overview.md) project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.

## Problem Statement

The current CLI (`deploy` command) has limited support for discovering model files and managing configuration. `buster.yml` is expected in the current directory or at a specified path, and its structure is flat. This does not serve monorepos or projects where models are organized into sub-directories, each potentially with slightly different default configuration (such as schema or database).

Behavior of `deploy` command pathing:
- When invoked, the `deploy` command should look for `buster.yml` only in the effective target directory (the directory passed as an argument, or the current working directory if no argument is given).
- It should *not* search parent directories for `buster.yml`.
- Once `buster.yml` is found (or defaults are used because it was not found), model file discovery (e.g., `*.yml`) proceeds from the paths specified in `buster.yml` (the new `projects` field) or from the directory containing `buster.yml` itself (for backward compatibility or simple cases). Model search is recursive *within* these model paths.

Current behavior:
- `BusterConfig` in `cli/cli/src/utils/config.rs` is flat, primarily supporting global `data_source_name`, `schema`, `database`, `exclude_tags`, and `exclude_files`.
- Model discovery generally starts from the current path or a specified path, with no structured way to define multiple distinct "project" sources of models within one `buster.yml`.
- It is ambiguous whether the `deploy` command looks for `buster.yml` in parent directories.

Expected behavior:
- The `deploy` command will strictly look for `buster.yml` in the target directory (current or specified) and will *not* traverse upwards.
- `BusterConfig` will be extended to include an optional `projects: Vec<ProjectConfig>` field.
- Each `ProjectConfig` will define a `path` (relative to the `buster.yml` location) where its models are located, and can optionally specify its own `data_source_name`, `schema`, and `database`, which override the global settings in `buster.yml` for models within that project path.
- The CLI's model discovery logic will iterate through `projects` if defined; otherwise it will use `model_paths` if defined; otherwise it will search the `buster.yml` directory. In each case it looks for model files (`.yml` files other than `buster.yml`).
- Configuration (`database`, `schema`, `data_source_name`) for a deployed model will be resolved with the following precedence, highest first: Model File -> ProjectConfig -> Global BusterConfig.

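The precedence rule in the last bullet can be sketched as follows. This is a minimal illustration, not the CLI's implementation: the `Overrides` struct and the `resolve` and `first_set` functions are hypothetical names, and the sketch assumes every layer exposes the three settings as `Option<String>`.

```rust
/// Hypothetical container for the three overridable settings at one layer
/// (model file, project, or global). Not an existing CLI type.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Overrides {
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
}

/// Resolve each setting with precedence: model file, then project, then global.
/// A setting stays `None` only if no layer provides it.
pub fn resolve(model: &Overrides, project: Option<&Overrides>, global: &Overrides) -> Overrides {
    // Treat a missing project as a layer that sets nothing.
    let empty = Overrides::default();
    let project = project.unwrap_or(&empty);
    Overrides {
        data_source_name: first_set(&[
            &model.data_source_name,
            &project.data_source_name,
            &global.data_source_name,
        ]),
        schema: first_set(&[&model.schema, &project.schema, &global.schema]),
        database: first_set(&[&model.database, &project.database, &global.database]),
    }
}

/// Return the first `Some` value, scanning from highest to lowest precedence.
fn first_set(layers: &[&Option<String>]) -> Option<String> {
    layers.iter().find_map(|value| (*value).clone())
}
```

Each field is resolved independently, so a model file that sets only `database` still inherits `schema` from its project and `data_source_name` from the global config.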
## Goals

1. Modify the `deploy` command to look for `buster.yml` only in the current/specified directory, never in parent directories. Subdirectories are searched only when looking for model files.
2. Extend `BusterConfig` in `cli/cli/src/utils/config.rs` to include `projects: Option<Vec<ProjectConfig>>`.
3. Define a `ProjectConfig` struct with `path: String` and optional `data_source_name`, `schema`, `database` fields.
4. Update the model file discovery logic in `cli/cli/src/commands/deploy.rs` to:
    a. Honor the `projects` structure in `buster.yml` if present.
    b. If `projects` is not present, fall back to the existing `model_paths` logic or to searching the directory of `buster.yml` (or the current/specified path if there is no `buster.yml`).
    c. Recursively search for `.yml` model files within the determined project/model paths.
5. Implement the configuration inheritance logic (Model File > ProjectConfig > Global BusterConfig) when preparing models for deployment.

## Non-Goals

1. Changing the YAML parsing for individual model files (covered in `prd_semantic_model_definition.md` and `prd_cli_deployment_logic.md`).
2. Altering the `exclude_tags` or `exclude_files` functionality at the global level (though they would apply to all discovered models).

## Implementation Plan

### Phase 1: Update BusterConfig and Discovery Logic

#### Technical Design

**1. `BusterConfig` and `ProjectConfig` structs (`cli/cli/src/utils/config.rs`):**

```rust
// In cli/cli/src/utils/config.rs (or equivalent)
use serde::Deserialize;
use std::path::{Path, PathBuf};

#[derive(Debug, Deserialize, Clone, Default)] // Default is needed for the no-buster.yml fallback
pub struct BusterConfig {
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
    pub exclude_tags: Option<Vec<String>>,
    pub exclude_files: Option<Vec<PathBuf>>, // Assuming PathBuf is more appropriate here
    pub model_paths: Option<Vec<String>>,     // Existing field for model paths
    pub projects: Option<Vec<ProjectConfig>>, // New field for projects
}

#[derive(Debug, Deserialize, Clone)]
pub struct ProjectConfig {
    pub name: Option<String>, // Optional name for the project (for logging/identification)
    pub path: String,         // Path relative to buster.yml location
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
}

impl BusterConfig {
    // Helper function to load BusterConfig from a directory.
    // This only looks for buster.yml directly inside `dir`; it never checks parent directories.
    pub fn load_from_dir(dir: &Path) -> Result<Option<Self>, anyhow::Error> {
        let config_path = dir.join("buster.yml");
        if config_path.exists() {
            let content = std::fs::read_to_string(&config_path)?;
            let config: BusterConfig = serde_yaml::from_str(&content)?;
            Ok(Some(config))
        } else {
            Ok(None)
        }
    }

    // Resolves the effective model search paths.
    // Returns (PathBuf, Option<&ProjectConfig>) pairs. Each path is `buster_yml_dir` joined
    // with the configured path, so it is absolute only when `buster_yml_dir` is absolute.
    // Precedence: `projects`, then `model_paths`, then `buster_yml_dir` itself.
    pub fn resolve_effective_model_paths(&self, buster_yml_dir: &Path) -> Vec<(PathBuf, Option<&ProjectConfig>)> {
        let mut effective_paths = Vec::new();

        if let Some(projects) = &self.projects {
            for project_config in projects {
                let project_path = buster_yml_dir.join(&project_config.path);
                effective_paths.push((project_path, Some(project_config)));
            }
        } else if let Some(model_paths) = &self.model_paths {
            for model_path_str in model_paths {
                let model_path = buster_yml_dir.join(model_path_str);
                effective_paths.push((model_path, None));
            }
        } else {
            // Default to the directory containing buster.yml if no projects or model_paths specified
            effective_paths.push((buster_yml_dir.to_path_buf(), None));
        }
        effective_paths
    }
}
```

**2. Model Discovery in `deploy.rs`:**
- The `deploy` function will first determine the `base_dir` (current or specified path).
- It will call `BusterConfig::load_from_dir(&base_dir)` to get the config. If no `buster.yml` is found, it proceeds with a default/empty config.
- Use `config.resolve_effective_model_paths(&base_dir)` to get search paths and associated project configs.
- For each path returned:
    - Recursively find all `*.yml` files (excluding `buster.yml`).
    - Keep track of the `Option<ProjectConfig>` associated with models found under each path for later config resolution.

```rust
// Conceptual logic in cli/cli/src/commands/deploy.rs
// ... imports ...
use anyhow::Result; // Assuming anyhow, matching the error type of load_from_dir
use std::path::PathBuf;
use crate::utils::config::{BusterConfig, ProjectConfig}; // Assuming this path

async fn deploy(path_arg: Option<&str>, /* ... other args ... */) -> Result<()> {
    // Only query the current directory when no path argument was given.
    let base_dir = match path_arg {
        Some(path) => PathBuf::from(path),
        None => std::env::current_dir()?,
    };

    // Load buster.yml strictly from base_dir; fall back to an empty config if absent
    let buster_config = BusterConfig::load_from_dir(&base_dir)?.unwrap_or_default();

    // Each entry pairs a model file with the project it was found under (if any)
    let mut all_model_files_with_context: Vec<(PathBuf, Option<ProjectConfig>)> = Vec::new();

    let effective_search_paths = buster_config.resolve_effective_model_paths(&base_dir);

    for (search_path, project_config_opt) in effective_search_paths {
        if search_path.is_dir() {
            // WalkDir or similar to find *.yml files recursively
            // For each found yml_file_path:
            //   all_model_files_with_context.push((yml_file_path, project_config_opt.cloned()));
        } else if search_path.is_file() && search_path.extension().map_or(false, |ext| ext == "yml") {
            // A model path may also point directly at a single model file
            // all_model_files_with_context.push((search_path, project_config_opt.cloned()));
        }
    }

    // ... rest of the deployment logic will use all_model_files_with_context ...
    // Each element now carries its potential ProjectConfig for resolving DB/schema

    Ok(())
}
```

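The recursive search left as comments in the `deploy` sketch could be filled in along these lines. This is a minimal sketch using only the standard library; `find_model_files` and `is_model_file` are hypothetical names, and the actual implementation may well prefer a crate such as `walkdir`. It assumes model files use the `.yml` extension only.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect `.yml` files under `dir`, skipping any file named `buster.yml`.
/// Hypothetical helper for illustration only.
pub fn find_model_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            // `DirEntry::file_type` does not follow symlinks, so symlinked
            // directories are not descended into.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file() && is_model_file(&path) {
                found.push(path);
            }
        }
    }
    // Sort for deterministic output, since `read_dir` order is platform dependent.
    found.sort();
    Ok(found)
}

/// True for `*.yml` files other than `buster.yml`.
fn is_model_file(path: &Path) -> bool {
    let is_yml = path.extension().map_or(false, |ext| ext == "yml");
    let is_config = path.file_name().map_or(false, |name| name == "buster.yml");
    is_yml && !is_config
}
```

In `deploy`, each returned path would be pushed together with `project_config_opt.cloned()` so the project context travels with the file.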
#### Implementation Steps
1. [ ] Define `ProjectConfig` struct in `cli/cli/src/utils/config.rs`.
2. [ ] Add `projects: Option<Vec<ProjectConfig>>` to `BusterConfig` struct.
3. [ ] Update `BusterConfig::load_from_dir` (or the existing loader) so that it looks only in the provided directory.
4. [ ] Implement `BusterConfig::resolve_effective_model_paths(&self, buster_yml_dir: &Path)` method.
5. [ ] In `cli/cli/src/commands/deploy.rs`:
    a. Modify `deploy` to determine `base_dir` (current or specified path).
    b. Load `BusterConfig` strictly from `base_dir`.
    c. Use `resolve_effective_model_paths` to get search locations.
    d. Implement recursive search for `.yml` files (excluding `buster.yml`) in these locations, associating found files with their `Option<ProjectConfig>`.
6. [ ] Ensure exclusion logic (`exclude_files`, `exclude_tags`) is still applied correctly to the discovered files.

#### Tests

- **Unit Tests for `BusterConfig`:**
    - Test `BusterConfig::load_from_dir` correctly loads or returns `None`.
    - Test `resolve_effective_model_paths`:
        - With `projects` defined.
        - With `model_paths` defined (and `projects` undefined).
        - With neither defined (should default to `buster_yml_dir`).
        - Paths are correctly resolved against `buster_yml_dir`.
- **Integration-like Tests for `deploy` discovery (mocking file system or using temp dirs):**
    - `buster.yml` in current dir, `projects` point to subdirs -> models found correctly.
    - `buster.yml` in current dir, no `projects`, no `model_paths` -> models in current dir found.
    - `deploy` called with path argument -> `buster.yml` loaded from that path.
    - `deploy` command does NOT find `buster.yml` in parent directories.
    - Ensure `exclude_files` patterns correctly filter results from `projects` paths.

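The "does NOT find `buster.yml` in parent directories" case could be sketched with a temporary directory as shown here. `locate_config` is a hypothetical, std-only stand-in for the lookup step of `BusterConfig::load_from_dir`, used so the example does not depend on `serde_yaml` or `anyhow`; the directory name is arbitrary.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical std-only stand-in for the lookup step of `load_from_dir`:
/// returns the path to `buster.yml` only if it sits directly inside `dir`.
pub fn locate_config(dir: &Path) -> Option<PathBuf> {
    let candidate = dir.join("buster.yml");
    // No loop over `dir.ancestors()`: parent directories are never consulted.
    candidate.is_file().then_some(candidate)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;

    #[test]
    fn does_not_find_config_in_parent_directory() {
        // Layout: <root>/buster.yml and an empty <root>/models/ directory.
        let root = std::env::temp_dir().join(format!("buster_lookup_{}", std::process::id()));
        fs::create_dir_all(root.join("models")).unwrap();
        fs::write(root.join("buster.yml"), "schema: public\n").unwrap();

        assert_eq!(locate_config(&root), Some(root.join("buster.yml")));
        // Deploying from <root>/models must not pick up <root>/buster.yml.
        assert_eq!(locate_config(&root.join("models")), None);

        fs::remove_dir_all(&root).unwrap();
    }
}
```

The same fixture layout can be reused for the other discovery cases by adding model files under `models/`.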
#### Success Criteria
- [ ] `BusterConfig` and `ProjectConfig` are correctly defined and can be deserialized from YAML.
- [ ] `deploy` command loads `buster.yml` only from the specified/current directory.
- [ ] Model discovery correctly uses `projects`, then `model_paths`, then `buster.yml` directory, and finds `.yml` files recursively within these.
- [ ] Discovered model files are correctly associated with their `ProjectConfig` (if any) for later steps.
- [ ] All tests pass.

## Dependencies on Other Components

- Relies on `prd_semantic_model_definition.md` for the structure of model files being discovered, but primarily focuses on *finding* them and their *configuration context*.

## Security Considerations

- Paths read from `buster.yml` (`projects.path` or `model_paths`) must be resolved relative to the `buster.yml` directory and must not be able to reach unintended locations (e.g., `../../../../../etc/passwd`). `Path::join` alone does not guarantee this: it keeps `..` components as they are, and joining an absolute path replaces the base entirely. Because these paths are arbitrary strings, input validation (for example, rejecting absolute paths and `..` components) should be considered before they are used.

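One possible form of that validation is sketched below. `join_confined` is a hypothetical helper, not an existing CLI function, and the check is purely lexical: it assumes that rejecting every `..` component is acceptable (even ones that would stay inside the base directory) and it does not resolve symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Join `relative` onto `base` only if it cannot escape `base` lexically.
/// Hypothetical helper; returns `None` for absolute paths or any `..` component.
pub fn join_confined(base: &Path, relative: &str) -> Option<PathBuf> {
    let mut joined = base.to_path_buf();
    for component in Path::new(relative).components() {
        match component {
            // Plain path segments are appended as-is.
            Component::Normal(part) => joined.push(part),
            // `.` is harmless and simply skipped.
            Component::CurDir => {}
            // `..`, a root (`/`), or a Windows prefix (`C:`) could leave `base`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(joined)
}
```

`resolve_effective_model_paths` could call such a helper instead of `Path::join` and report an error for any project whose path is rejected. Guarding against symlinks that point outside the base directory would need an additional check on the canonicalized path.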
## References

- Existing `cli/cli/src/utils/config.rs`
- Existing `cli/cli/src/commands/deploy.rs` model discovery logic.