
title: CLI Configuration and Model Discovery
author: Gemini Assistant
date: 2024-07-26
status: Draft
parent_prd: semantic_layer_refactor_overview.md
ticket: N/A

CLI Configuration and Model Discovery

Parent Project

This is a sub-PRD of the Semantic Layer and Deployment Refactor project. Please refer to the parent PRD for the overall project context, goals, and implementation plan.

Problem Statement

The current CLI deploy command has limited support for discovering model files and managing configuration. It expects buster.yml in the current directory or a specified path, and the file's structure is flat. This does not suit monorepos or projects that organize models into subdirectories, where each subdirectory may need different defaults (such as schema or database).

Required path behavior of the deploy command:

  • The deploy command, when invoked, should look for buster.yml only in the effective target directory (either the one provided as an argument or the current working directory if no argument is given).
  • It should not search in parent directories for buster.yml.
  • Once buster.yml is found (or defaults are used if it is not), model file discovery (*.yml files) proceeds from the paths specified in buster.yml. These are the new projects field or the existing model_paths field. If neither is set, discovery uses the directory containing buster.yml itself, for backward compatibility and simple cases. The model search is recursive within each of these paths.

Current behavior:

  • BusterConfig in cli/cli/src/utils/config.rs is flat. It primarily supports global data_source_name, schema, database, exclude_tags, and exclude_files settings, plus an optional model_paths list.
  • Model discovery is generally from the current path or a specified path, without a structured way to define multiple distinct "project" sources of models within one buster.yml.
  • It is not clearly defined whether the deploy command searches parent directories for buster.yml, so which configuration file applies can be ambiguous.

Expected behavior:

  • The deploy command will strictly look for buster.yml in the target directory (current or specified) and not traverse upwards.
  • BusterConfig will be extended to include an optional projects: Vec<ProjectConfig> field.
  • Each ProjectConfig will define a path (relative to the buster.yml location) where its models are located, and can optionally specify its own data_source_name, schema, and database that override the global settings in buster.yml for models within that project path.
  • The CLI's model discovery logic will iterate through these projects if projects is defined. Otherwise it will use model_paths if that is defined, and otherwise the directory containing buster.yml. Within these locations it finds model files (.yml files, excluding buster.yml).
  • Configuration (database, schema, data_source_name) for a deployed model will be resolved with the following precedence, highest first: Model File > ProjectConfig > Global BusterConfig.
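The precedence rule can be applied field by field with Option::or_else. This is a minimal sketch. It assumes that each level's database, schema, and data_source_name settings have already been read into a common shape. Overrides, with_fallback, and resolve_overrides are hypothetical names, not existing CLI types.

```rust
/// Hypothetical: optional settings available at one configuration level
/// (model file, ProjectConfig, or global BusterConfig).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Overrides {
    pub data_source_name: Option<String>,
    pub database: Option<String>,
    pub schema: Option<String>,
}

impl Overrides {
    /// Keeps every field already set on `self` and fills the rest from `fallback`.
    pub fn with_fallback(self, fallback: &Overrides) -> Overrides {
        Overrides {
            data_source_name: self
                .data_source_name
                .or_else(|| fallback.data_source_name.clone()),
            database: self.database.or_else(|| fallback.database.clone()),
            schema: self.schema.or_else(|| fallback.schema.clone()),
        }
    }
}

/// Applies Model File > ProjectConfig > Global BusterConfig, field by field.
/// `project` is `None` for models not discovered under a project.
pub fn resolve_overrides(
    model: &Overrides,
    project: Option<&Overrides>,
    global: &Overrides,
) -> Overrides {
    let merged = match project {
        Some(project) => model.clone().with_fallback(project),
        None => model.clone(),
    };
    merged.with_fallback(global)
}
```

Each field is resolved independently. A project can therefore override only schema while still inheriting database and data_source_name from the global config.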

Goals

  1. Modify the deploy command to look for buster.yml only in the current/specified directory, never in parent directories. Subdirectories are searched only for model files, not for buster.yml.
  2. Extend BusterConfig in cli/cli/src/utils/config.rs to include projects: Option<Vec<ProjectConfig>>.
  3. Define a ProjectConfig struct with path: String and optional data_source_name, schema, database fields.
  4. Update the model file discovery logic in cli/cli/src/commands/deploy.rs to:
    a. Honor the projects structure in buster.yml if present.
    b. If projects is not present, fall back to the existing model_paths logic. If model_paths is also absent, search the directory of buster.yml (or the current/specified path if there is no buster.yml).
    c. Recursively search for .yml model files within the determined project/model paths.
  5. Implement the configuration inheritance logic (Model File > ProjectConfig > Global BusterConfig) when preparing models for deployment.

Non-Goals

  1. Changing the YAML parsing for individual model files (covered in prd_semantic_model_definition.md and prd_cli_deployment_logic.md).
  2. Altering the exclude_tags or exclude_files functionality at the global level. These settings continue to apply to all discovered models, including those found through projects.

Implementation Plan

Phase 1: Update BusterConfig and Discovery Logic

Technical Design

1. BusterConfig and ProjectConfig structs (cli/cli/src/utils/config.rs):

// In cli/cli/src/utils/config.rs (or equivalent)
use serde::Deserialize;
use std::path::{Path, PathBuf};

#[derive(Debug, Deserialize, Clone, Default)] // Default lets deploy fall back to an empty config
pub struct BusterConfig {
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
    pub exclude_tags: Option<Vec<String>>,
    pub exclude_files: Option<Vec<String>>,   // Exclusion patterns; String since entries are patterns, not real paths
    pub model_paths: Option<Vec<String>>,     // Existing field for model paths
    pub projects: Option<Vec<ProjectConfig>>, // New field for projects
}

#[derive(Debug, Deserialize, Clone)]
pub struct ProjectConfig {
    pub name: Option<String>, // Optional name for the project, for logging/identification
    pub path: String,         // Path relative to the buster.yml directory
    pub data_source_name: Option<String>,
    pub schema: Option<String>,
    pub database: Option<String>,
}

impl BusterConfig {
    /// Loads buster.yml from `dir` only; parent directories are never searched.
    /// Returns `Ok(None)` when `dir/buster.yml` does not exist.
    pub fn load_from_dir(dir: &Path) -> Result<Option<Self>, anyhow::Error> {
        let config_path = dir.join("buster.yml");
        if !config_path.is_file() {
            return Ok(None);
        }
        let content = std::fs::read_to_string(&config_path)?;
        let config: BusterConfig = serde_yaml::from_str(&content)?;
        Ok(Some(config))
    }

    /// Resolves the locations to search for models, in priority order:
    /// `projects`, else `model_paths`, else the buster.yml directory itself.
    /// Each entry is joined onto `buster_yml_dir`, so results are absolute only
    /// if `buster_yml_dir` is absolute. Entries from `projects` carry their
    /// ProjectConfig for later configuration resolution.
    pub fn resolve_effective_model_paths(
        &self,
        buster_yml_dir: &Path,
    ) -> Vec<(PathBuf, Option<&ProjectConfig>)> {
        if let Some(projects) = &self.projects {
            projects
                .iter()
                .map(|project| (buster_yml_dir.join(&project.path), Some(project)))
                .collect()
        } else if let Some(model_paths) = &self.model_paths {
            model_paths
                .iter()
                .map(|model_path| (buster_yml_dir.join(model_path), None))
                .collect()
        } else {
            // Neither projects nor model_paths: search the buster.yml directory.
            vec![(buster_yml_dir.to_path_buf(), None)]
        }
    }
}

2. Model Discovery in deploy.rs:

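  • The deploy function will first determine the base_dir. This is the specified path, resolved against the current working directory if it is relative. If no path is given, it is the current working directory.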
  • It will call BusterConfig::load_from_dir(&base_dir) to get the config. If no buster.yml is found, it proceeds with default/empty config.
  • Use config.resolve_effective_model_paths(&base_dir) to get search paths and associated project configs.
  • For each path returned:
    • Recursively find all *.yml files (excluding buster.yml).
    • Keep track of the Option<ProjectConfig> associated with models found under each path for later config resolution.
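// Conceptual logic in cli/cli/src/commands/deploy.rs
use std::path::PathBuf;

use anyhow::Result;
use walkdir::WalkDir; // Assumes the walkdir crate is available as a dependency

use crate::utils::config::{BusterConfig, ProjectConfig}; // Assuming this path

async fn deploy(path_arg: Option<&str>, /* ... other args ... */) -> Result<()> {
    let current_dir = std::env::current_dir()?;
    // Resolve a relative path argument against the CWD so every derived path is
    // absolute (Path::join returns an absolute argument unchanged).
    let base_dir = path_arg
        .map(|p| current_dir.join(p))
        .unwrap_or_else(|| current_dir.clone());

    // Load buster.yml strictly from base_dir; fall back to an empty config.
    let buster_config = BusterConfig::load_from_dir(&base_dir)?.unwrap_or_default();

    // Each element: (model file, ProjectConfig it was discovered under, if any)
    let mut all_model_files_with_context: Vec<(PathBuf, Option<ProjectConfig>)> = Vec::new();

    let effective_search_paths = buster_config.resolve_effective_model_paths(&base_dir);

    for (search_path, project_config_opt) in effective_search_paths {
        if search_path.is_dir() {
            // Recursive walk; entries that cannot be read are skipped here.
            for entry in WalkDir::new(&search_path).into_iter().filter_map(|e| e.ok()) {
                let path = entry.path();
                let is_yml = path.extension().map_or(false, |ext| ext == "yml");
                if entry.file_type().is_file() && is_yml && entry.file_name() != "buster.yml" {
                    all_model_files_with_context.push((path.to_path_buf(), project_config_opt.cloned()));
                }
            }
        } else if search_path.is_file() && search_path.extension().map_or(false, |ext| ext == "yml") {
            all_model_files_with_context.push((search_path, project_config_opt.cloned()));
        }
    }

    // ... rest of the deployment logic will use all_model_files_with_context ...
    // Each element now carries its potential ProjectConfig for resolving DB/schema

    Ok(())
}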
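If adding the walkdir crate is not wanted, the recursive search in the loop above can be written with std::fs alone. This is a minimal sketch. find_model_files is a hypothetical helper. Skipping symbolic links and sorting the results are assumptions about the desired behavior, not decisions recorded in this PRD.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collects `*.yml` files under `dir`,
/// skipping any file named `buster.yml`. Uses an explicit stack instead of
/// recursion, does not follow symbolic links (so link cycles cannot loop),
/// and sorts the result so discovery order is deterministic.
pub fn find_model_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            // DirEntry::file_type does not traverse symlinks.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file()
                && path.extension().map_or(false, |ext| ext == "yml")
                && entry.file_name() != "buster.yml"
            {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

In deploy, each returned path would be paired with project_config_opt.cloned() for the search path it came from.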

Implementation Steps

  1. Define ProjectConfig struct in cli/cli/src/utils/config.rs.
  2. Add projects: Option<Vec<ProjectConfig>> to BusterConfig struct.
  3. Update BusterConfig::load_from_dir (or confirm that the existing loader already behaves this way) so that it only looks in the provided directory.
  4. Implement BusterConfig::resolve_effective_model_paths(&self, buster_yml_dir: &Path) method.
  5. In cli/cli/src/commands/deploy.rs:
    a. Modify deploy to determine base_dir (current or specified path).
    b. Load BusterConfig strictly from base_dir.
    c. Use resolve_effective_model_paths to get search locations.
    d. Implement a recursive search for .yml files (excluding buster.yml) in these locations, associating each found file with its Option<ProjectConfig>.
  6. Ensure exclusion logic (exclude_files, exclude_tags) is still applied correctly to the discovered files.

Tests

  • Unit Tests for BusterConfig:
    • Test BusterConfig::load_from_dir correctly loads or returns None.
    • Test resolve_effective_model_paths:
      • With projects defined.
      • With model_paths defined (and projects undefined).
      • With neither defined (should default to buster_yml_dir).
      • Paths are correctly made absolute from buster_yml_dir.
  • Integration-like Tests for deploy discovery (mocking file system or using temp dirs):
    • buster.yml in current dir, projects point to subdirs -> models found correctly.
    • buster.yml in current dir, no projects, no model_paths -> models in current dir found.
    • deploy called with path argument -> buster.yml loaded from that path.
    • deploy command does NOT find buster.yml in parent directories.
    • Ensure exclude_files patterns correctly filter results from projects paths.
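The parent-directory case can be exercised against a real temporary directory tree. This is a minimal std-only sketch. buster_yml_in and parent_buster_yml_is_ignored are hypothetical names, and the lookup only mirrors what load_from_dir is specified to do. In the CLI's own test suite, a #[test] function using a temp-dir crate such as tempfile would be more idiomatic than building paths under std::env::temp_dir by hand.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the lookup inside `BusterConfig::load_from_dir`:
/// only `dir/buster.yml` is considered; parent directories are never checked.
pub fn buster_yml_in(dir: &Path) -> Option<PathBuf> {
    let candidate = dir.join("buster.yml");
    candidate.is_file().then_some(candidate)
}

/// Creates `<tmp>/buster_lookup_<pid>/buster.yml` plus an empty `models/`
/// subdirectory, checks that the config is found from the parent but not
/// from the child, and then removes the directory tree.
pub fn parent_buster_yml_is_ignored() -> std::io::Result<bool> {
    let parent = std::env::temp_dir().join(format!("buster_lookup_{}", std::process::id()));
    let child = parent.join("models");
    let _ = std::fs::remove_dir_all(&parent); // leftover from an earlier run, if any
    std::fs::create_dir_all(&child)?;
    std::fs::write(parent.join("buster.yml"), "data_source_name: demo\n")?;
    let found_in_parent = buster_yml_in(&parent) == Some(parent.join("buster.yml"));
    let found_in_child = buster_yml_in(&child).is_some();
    std::fs::remove_dir_all(&parent)?;
    Ok(found_in_parent && !found_in_child)
}
```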

Success Criteria

  • BusterConfig and ProjectConfig are correctly defined and can be deserialized from YAML.
  • deploy command loads buster.yml only from the specified/current directory.
  • Model discovery correctly uses projects, then model_paths, then buster.yml directory, and finds .yml files recursively within these.
  • Discovered model files are correctly associated with their ProjectConfig (if any) for later steps.
  • All tests pass.

Dependencies on Other Components

  • Relies on prd_semantic_model_definition.md for the structure of model files being discovered, but primarily focuses on finding them and their configuration context.

Security Considerations

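  • Path resolution from buster.yml (for projects.path or model_paths) must be handled carefully. Paths must be treated as relative to the buster.yml directory and must not reach unintended locations (e.g., ../../../../../etc/passwd). Path::join alone does not provide this guarantee. If its argument is absolute, the result is that argument unchanged, and .. components are kept rather than normalized away. Configured paths should therefore be validated, for example by rejecting absolute paths and .. components. Another option is to canonicalize the joined path and check that it still starts with the canonicalized buster.yml directory.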
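One way to enforce this is a lexical check before joining. This is a minimal sketch, and join_within is a hypothetical helper name. It rejects absolute paths and every .. component, even ones that would stay inside the directory (such as a/../b). That is stricter than necessary but simple to reason about.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: joins a `projects.path` or `model_paths` entry onto
/// the buster.yml directory, rejecting values that could escape it.
/// Absolute paths are rejected because `Path::join` would use them as-is.
/// This is a purely lexical check; symlinks inside the tree are not resolved.
pub fn join_within(base: &Path, relative: &str) -> Result<PathBuf, String> {
    let rel = Path::new(relative);
    for component in rel.components() {
        match component {
            Component::Normal(_) | Component::CurDir => {}
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
                return Err(format!(
                    "path {:?} must stay within {}",
                    relative,
                    base.display()
                ));
            }
        }
    }
    Ok(base.join(rel))
}
```

An alternative is to canonicalize both the buster.yml directory and the joined path with std::fs::canonicalize and compare them with Path::starts_with. That also resolves symlinks, but it requires both paths to exist on disk.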

References

  • Existing cli/cli/src/utils/config.rs
  • Existing cli/cli/src/commands/deploy.rs model discovery logic.