- Enhance message streaming with more precise content and tool call handling
- Add logic to only send and store meaningful assistant messages and tool calls
- Prevent sending empty or redundant messages during stream processing
- Improve tool call and content update tracking in agent stream method
- Optimize message inclusion in recursive thread generation
- Add support for initial non-tool response and subsequent tool-enabled processing
- Implement `PendingToolCall` to manage incremental tool call updates
- Update `Delta` and related types to support more flexible tool call streaming
- Modify agent stream processing to handle tool calls with improved state management
- Add robust handling for tool call deltas and function call arguments
- Move hardcoded agent prompt to a constant `AGENT_PROMPT` in the `agent_thread.rs` file
- Update the prompt's date from January 27, 2025 to February 7, 2025
- Simplify thread initialization by referencing the new constant
- Maintain existing prompt structure and guidelines
- Implement a maximum recursion depth of 30 for agent thread processing
- Add recursion depth tracking to prevent potential infinite loops
- Provide user-friendly message when maximum recursion depth is reached
- Update debug logging to include current recursion depth
- Modify both synchronous and streaming thread processing methods
- Refactor agent stream processing to use a recursive approach for handling tool calls
- Enhance tool execution with more robust error handling and result tracking
- Improve stream chunk processing with detailed state management
- Add support for recursive thread generation based on tool call results
- Implement cloning for LiteLLMClient to support stream processing tasks
- Update search data catalog and file search tools to use consistent JSON response structure
- Enforce strict response format with mandatory "results" key and comprehensive field requirements
- Add explicit guidelines for result composition and handling of empty result sets
- Improve response predictability and parsing reliability for AI-powered search tools
- Add `stream: Some(false)` to file search and data catalog tools
- Make `json_schema` optional in `ResponseFormat` serialization
- Enhance logging in search file tool with debug and warning messages
- Improve error context when parsing LLM JSON responses
- Wrap agent tools in an Arc for safe concurrent access
- Modify tool addition methods to work with Arc-wrapped HashMap
- Ensure thread-safe tool registration and retrieval
- Update stream processing to use Arc-cloned tools reference
- Add comprehensive debug logging for agent thread processing
- Modify agent thread to use user ID for streaming subscription
- Update stream processing to include detailed tool call and content logging
- Improve error handling and visibility in stream processing
- Add user context to agent thread initialization
- Remove redundant tool execution handling in stream processing
- Add `ValueToolExecutor` to convert tool outputs to `serde_json::Value`
- Introduce `IntoValueTool` trait for easy value type conversion
- Update agent tool addition methods to use new value conversion mechanism
- Simplify tool registration by automatically converting tool outputs
- Remove previous manual boxing and type conversion logic
- Add comprehensive dataset search functionality using LLM for intelligent dataset matching
- Implement search across datasets with relevance ranking based on YML content
- Create structured search result output with dataset metadata
- Add robust error handling, logging, and parsing for search operations
- Include test coverage for search result validation
- Enhance tool with flexible query parameter support and detailed response messages
- Add comprehensive file search functionality using LLM for intelligent file matching
- Implement search across metric and dashboard files with relevance ranking
- Create structured search result output with file metadata
- Add robust error handling and logging for search operations
- Include test coverage for search result parsing
- Implement robust line-based content modification for metric and dashboard files
- Add comprehensive error handling and validation for file modifications
- Improve modification tracking with detailed modification results
- Optimize file processing with batch insertion and validation
- Add extensive test coverage for modification and validation logic
- Add generic Output type to ToolExecutor trait
- Update file tools to use strongly-typed output structs
- Modify agent and tool implementations to support generic output
- Improve error handling and result reporting in file-related tools
- Add more detailed status messages for file operations
- Rename file type modules from `metric_file` and `dashboard_file` to `metric_yml` and `dashboard_yml`
- Modify metric files migration to use a verification enum instead of boolean
- Update messages_to_files junction table with UUID primary key and additional timestamp columns
- Adjust database models to support new file and message structures
- Refactor file creation utility to use new model structures
- Extend Dataset model and schema to include optional database_identifier field
- Update dataset creation and deployment routes to handle new database_identifier parameter
- Modify dataset DDL generation to use database_identifier for schema resolution when available
- Add support for parsing Snowflake timestamp structs with epoch and fraction fields
- Implement handling of Snowflake timestamp logical types (with and without timezone)
- Enhance JSON value processing to detect and convert Snowflake timestamp objects
- Add error handling and logging for timestamp parsing
* xAxisTickinerval
* fix(visualization): Add x-axis time unit configuration for bar and combo charts
- Extend chart configuration to support optional x-axis time unit
- Update modify visualization agent to dynamically set x-axis time interval
- Modify bar line and combo chart prompts to include x_axis_time_unit parameter
* only use valid time units
* refactor(visualization): Simplify x-axis time unit configuration
- Modify modify_visualization_agent to extract and remove x-axis time unit more efficiently
- Update global styling result structure for x-axis time interval
- Adjust format_label_prompt comment to clarify date format default behavior
---------
Co-authored-by: Nate Kelley <nate@buster.so>
- Restructure deploy_datasets function to separate concerns
- Improve column and relationship validation in dataset deployment
- Enhance error handling and validation result generation
- Add support for more comprehensive column and relationship checks
- Refactor validation logic to handle multiple error types
- Add support for capturing source type (table, view, materialized view)
- Improve column metadata queries for Postgres, MySQL, BigQuery, and Snowflake
- Include more comprehensive column information during dataset import
- Extend DatasetColumnRecord to include source_type field
- Add comprehensive column validation before dataset deployment
- Validate existence of all required columns in source database
- Simplify column type and nullability retrieval
- Enhance error reporting for missing columns
- Update deployment logic to use pre-validated column information
- Add detailed validation error logging in CLI
- Improve type compatibility checks in dataset validation
- Modify deployment process to handle and report validation errors more comprehensively
- Add Hash derive for Verification enum
- Update API and CLI to support more informative validation results
- Introduce helper functions for processing string and JSON values
- Implement case-insensitive string and JSON value transformations
- Add robust timestamp parsing with error handling
- Enhance Snowflake query result processing with consistent data normalization
- Refactor stored values processing in dataset deployment to use background task
- Add `StoredValueColumn` struct to encapsulate column processing details
- Implement `process_stored_values_background` for parallel and resilient value storage
- Add logging for successful and failed stored value processing
- Update CLI to handle optional SQL definitions and improve file processing
* fix: add stored values support for dataset columns
This commit introduces stored values functionality for dataset columns, including:
- Adding a `stored_values` flag to column deployment requests
- Implementing a mechanism to store column values during dataset deployment
- Updating data analyst and SQL generation agents to leverage stored values
- Creating a new utility module for stored values search and management
* refactor(stored_values): improve stored values implementation and schema management
This commit enhances the stored values functionality with several key improvements:
- Update schema and table creation to use organization ID as schema name
- Modify stored values storage to include column ID
- Improve value extraction and embedding generation process
- Remove unnecessary distance calculation in search results
- Clean up unused values_engine module
- Added implementation for creating metric files with database insertion
- Introduced `create_metric_file()` and `create_dashboard_file()` functions
- Updated `CreateFilesTool` to handle different file types
- Added `data_metadata` field to `MetricFile` struct
- Implemented basic file type validation and creation logic
- Added robust error handling for JSON parameter parsing in `CreateFilesTool`
- Updated file type description to clarify naming conventions
- Prepared for file creation logic implementation
- Moved file-related types to a new `file_types` module
- Removed redundant `types.rs` file
- Removed individual tool files for bulk file modifications, file creation, file opening, data catalog search, file search, and sending to user
- Created a new `file_tools` module in `tools/mod.rs` to centralize file-related tool implementations
- Commented out individual tool module imports in preparation for future implementation
- Simplified the tools module structure for better organization and maintainability
- Added environment variable-based LLM client initialization in `Agent::new()`
- Introduced `add_tool()` and `add_tools()` methods for more flexible tool registration
- Implemented new `get_name()` method for `ToolExecutor` trait
- Added comprehensive test cases for Agent with and without tools
- Updated `AgentThread` with a convenient constructor method
- Temporarily commented out unused tool modules
- Added debug print in LiteLLM client for response logging
- Renamed existing messages table to `messages_deprecated`
- Created new `messages` table with updated schema and additional indexes
- Updated Diesel schema to reflect new table structure and relationships
- Added new foreign key constraints for threads and users
- Prepared for migration of existing message data
- Created `messages_to_files` junction table to link messages with dashboard and metric files
- Added foreign key constraints and indexes for efficient file-message relationships
- Updated Diesel schema to include new `messages_to_files`, `dashboard_files`, and `metric_files` tables
- Removed unnecessary timestamp triggers from migration files
- Added support for retrieving and passing data source type during SQL agent generation
- Updated SQL generation prompts to include data source type in system messages
- Modified `generate_sql_agent` function to fetch and utilize data source type information
- Improved SQL generation context by dynamically incorporating data source type details
- Renamed `search_datasets.rs` to `search_data_catalog.rs`
- Updated `mod.rs` to reflect the new module and tool name
- Removed the placeholder `SearchDatasetsTool` implementation
- Prepared for future implementation of data catalog search functionality
- Relocated `ToolExecutor` trait from `agent/types.rs` to `tools/mod.rs`
- Updated import paths in `agent/agent.rs` to reflect new trait location
- Simplified `types.rs` by removing unused imports and trait definition
- Prepared for more comprehensive tool management in the tools module
- Renamed `Thread` to `AgentThread` for clarity
- Modified `ToolExecutor` to return `serde_json::Value` instead of `String`
- Updated message processing to handle new message structures
- Improved content handling in streaming and tool call scenarios
- Simplified message content extraction and serialization
- Uncommented and activated module implementations for `LiteLLMClient` and `Agent`
- Updated test cases to use new message and tool call structures
- Simplified test assertions for chat completion and tool call scenarios
- Restored module exports in `mod.rs` files
- Commented out imports and module exports for `LiteLLMClient` and `Agent`
- Removed active module implementations in `mod.rs` files
- Simplified type definitions and test cases
- Prepared for potential future reimplementation or restructuring
- Simplified message and tool structures in `types.rs`
- Updated `agent.rs` to use new type structures
- Enhanced error handling in agent tests with detailed error printing
- Updated `.cursorrules` with debugging recommendation
- Removed redundant fields and improved type flexibility
- Streamlined serialization and deserialization of AI client types
- Updated `types.rs` to support more complex message and response structures
- Added support for multi-content messages with `Content` struct
- Introduced optional fields and improved serialization handling
- Enhanced type flexibility for AI client responses
- Updated test cases to reflect new type structures
- Added `Default` implementation for `LiteLLMClient`
- Improved support for tool calls and streaming responses
- Updated `.env.example` with new LLM configuration variables
- Modified `LiteLLMClient` to support API key and base URL from environment variables
- Enhanced client initialization to use env vars with optional parameter overrides
- Added comprehensive test case for environment variable configuration
- Updated test cases to use `mockito::Server::new_async()`
- Added request body validation in test mocks
- Improved test coverage for chat completion and streaming scenarios
- Updated .cursorrules with detailed testing guidelines
- Added dev dependencies for Mockito and Tokio async testing
- Updated .cursorrules with async test requirement
- Expanded utils modules with new serde_helpers and tools
- Added new AI client module for LiteLLM
* feat: enhance deploy command with skip_dbt option
- Updated the Deploy command to accept a `skip_dbt` boolean argument, allowing users to bypass the dbt run during deployment.
- Refactored the deploy function to conditionally execute the dbt command based on the `skip_dbt` flag, improving deployment flexibility.
* Refactor query engine and CLI commands for improved functionality and error handling
- Updated `get_bigquery_columns` and `get_snowflake_columns` functions to enhance column name handling and ensure proper error reporting.
- Modified `get_snowflake_client` to accept a database ID for better connection management.
- Enhanced the `deploy` command in the CLI to include additional parameters (`path`, `data_source_name`, `schema`, `env`) for more flexible deployments.
- Improved error handling and reporting in the `deploy` function, including detailed summaries of deployment errors and successful file processing.
- Updated `get_model_files` to accept a directory path and added checks for file existence, enhancing robustness.
- Adjusted model file structures to include schema information and refined the upload process to handle optional parameters more effectively.
These changes collectively improve the usability and reliability of the query engine and deployment process.
* Update dataset DDL generation to include optional YML file content
- Modified `generate_sql_agent` to append optional YML file content to dataset DDL
- Ensures more comprehensive dataset representation during SQL agent generation
- Handles cases where YML file might be present or absent gracefully
- Added `html-escape` crate to `Cargo.toml` for HTML escaping.
- Updated email template processing to escape HTML in message and button text, preventing potential XSS vulnerabilities.
- Modified test cases to include HTML content in email parameters, ensuring proper handling and escaping.
This change improves security by sanitizing user input in email communications.
* fix permission check on post_dataset rest
* refactor: enhance dataset overview access lineage and permission checks
- Updated the `get_dataset_overview` function to conditionally add default access lineage based on user roles and existing access paths.
- Simplified the logic for adding user roles to the lineage, ensuring clarity and maintainability.
- Improved handling for the `RestrictedQuerier` role to include checks for existing access before adding default lineage, enhancing permission accuracy.
- Streamlined code by removing redundant checks and consolidating role handling, optimizing overall readability.
* feat: Enhance permission group handling and data retrieval
- Introduced a new `PermissionGroupInfo` struct to encapsulate detailed information about permission groups, including user and dataset counts.
- Updated the `get_permission_group` and `list_permission_groups` functions to improve data retrieval and error handling.
- Refactored SQL queries in `list_permission_groups` to include additional joins for counting users and datasets associated with permission groups, enhancing the overall functionality and clarity of the API.
- Streamlined code for better readability and maintainability, ensuring consistent handling of user and permission group data.
* refactor: Improve dataset access handling and permission checks
- Enhanced the `get_restricted_user_datasets` and `get_restricted_user_datasets_with_metadata` functions to include additional permission checks for dataset groups and permission groups.
- Consolidated SQL queries to ensure proper filtering of deleted records and improved clarity in dataset retrieval logic.
- Introduced new joins and filters to handle dataset group permissions, ensuring accurate access control for users.
- Streamlined code for better readability and maintainability, enhancing overall functionality in dataset access management.
* fix: Update SQL migration and seed data for user attributes
- Modified the SQL migration to specify the schema for the `users` table, ensuring clarity in the update statement.
- Adjusted the seed data for `users_to_organizations` to change the `organization_id` from 'public' to 'none', reflecting a more accurate state for user roles and organization associations.
- Ensured consistency in the formatting of SQL insert statements for better readability.
* fix: Prevent users from updating their own profiles
- Added a check in the `update_user_handler` to prevent users from updating their own information, returning an error if they attempt to do so.
- This change enhances security by ensuring that users cannot modify their own records, which could lead to unauthorized changes.
* refactor: Simplify dashboard permission queries by removing team-based joins
- Removed left joins with `teams_to_users` table in dashboard permission queries
- Simplified permission checks to only filter by direct user ID
- Updated queries in `get_user_dashboard_permission`, `get_bulk_user_dashboard_permission`, and `list_dashboards_handler`
- Streamlined SQL query logic for more direct and efficient permission checks
- Enhanced the `get_restricted_user_datasets` and `get_restricted_user_datasets_with_metadata` functions to include additional permission checks for dataset groups and permission groups.
- Consolidated SQL queries to ensure proper filtering of deleted records and improved clarity in dataset retrieval logic.
- Introduced new joins and filters to handle dataset group permissions, ensuring accurate access control for users.
- Streamlined code for better readability and maintainability, enhancing overall functionality in dataset access management.
- Introduced a new `PermissionGroupInfo` struct to encapsulate detailed information about permission groups, including user and dataset counts.
- Updated the `get_permission_group` and `list_permission_groups` functions to improve data retrieval and error handling.
- Refactored SQL queries in `list_permission_groups` to include additional joins for counting users and datasets associated with permission groups, enhancing the overall functionality and clarity of the API.
- Streamlined code for better readability and maintainability, ensuring consistent handling of user and permission group data.
- Updated the `get_dataset_overview` function to conditionally add default access lineage based on user roles and existing access paths.
- Simplified the logic for adding user roles to the lineage, ensuring clarity and maintainability.
- Improved handling for the `RestrictedQuerier` role to include checks for existing access before adding default lineage, enhancing permission accuracy.
- Streamlined code by removing redundant checks and consolidating role handling, optimizing overall readability.