some sql prompt fixes

dal 2025-05-01 09:14:26 -06:00
parent c3055a9424
commit 567bf667b4
GPG Key ID: 16F4B0E1E9F61122
3 changed files with 31 additions and 12 deletions

@@ -240,6 +240,10 @@ To conclude your worklow, you use the `finish_and_respond` tool to send a final
## SQL Best Practices and Constraints** (when creating new metrics)
- USE POSTGRESQL SYNTAX
- **Keep Queries Simple**: Strive for simplicity and clarity in your SQL. Adhere as closely as possible to the user's direct request without overcomplicating the logic or making unnecessary assumptions.
- **Default Time Range**: If the user does not specify a time range for analysis, **default to the last 12 months** from {TODAYS_DATE}. Clearly state this assumption if making it.
- **Avoid Bold Assumptions**: Do not make complex or bold assumptions about the user's intent or the underlying data. If the request is highly ambiguous beyond a reasonable time frame assumption, indicate this limitation in your final response.
- **Prioritize Defined Metrics**: Before constructing complex custom SQL, check if pre-defined metrics or columns exist in the provided data context that already represent the concept the user is asking for. Prefer using these established definitions.
- **Date/Time Functions**:
- **`DATE_TRUNC`**: Prefer `DATE_TRUNC('day', column)`, `DATE_TRUNC('week', column)`, `DATE_TRUNC('month', column)`, etc., for grouping time series data. Note that `'week'` starts on Monday.
- **`EXTRACT`**:
@@ -253,7 +257,8 @@ To conclude your worklow, you use the `finish_and_respond` tool to send a final
- **`GROUP BY` Clause**: Include all non-aggregated `SELECT` columns. Using explicit names is clearer than ordinal positions (`GROUP BY 1, 2`).
- **`HAVING` Clause**: Use `HAVING` to filter *after* aggregation (e.g., `HAVING COUNT(*) > 10`). Use `WHERE` to filter *before* aggregation for efficiency.
- **Window Functions**: Consider window functions (`OVER (...)`) for calculations relative to the current row (e.g., ranking, running totals) as an alternative/complement to `GROUP BY`.
- **Constraints**: Only join tables with explicit entity relationships.
- **Constraints**:
- **Strict JOINs**: Only join tables where relationships are explicitly defined via `relationships` or `entities` keys in the provided data context/metadata. **Do not join tables without a pre-defined relationship.**
- **SQL Requirements**:
- Use database-qualified schema-qualified table names (`<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>`).
- Use fully qualified column names with table aliases (e.g., `<table_alias>.<column>`).
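
A query that follows the rules above might look like the sketch below. It is illustrative only: the table `analytics.public.orders`, its columns, and the constant name `EXAMPLE_MONTHLY_SALES_SQL` are hypothetical and do not come from the prompt or the codebase. It is stored as a Rust raw string, in the same style the repository uses for `METRIC_YML_SCHEMA`.

```rust
/// Hypothetical example query. The database, schema, table, and column
/// names are made up for illustration and are not part of the original prompt.
pub const EXAMPLE_MONTHLY_SALES_SQL: &str = r#"
SELECT
    DATE_TRUNC('month', o.order_date) AS order_month,
    SUM(o.total_amount) AS total_sales
FROM analytics.public.orders AS o
-- No time range was requested, so default to the last 12 months.
WHERE o.order_date >= CURRENT_DATE - INTERVAL '12 months'
-- Explicit expression instead of an ordinal position such as GROUP BY 1.
GROUP BY DATE_TRUNC('month', o.order_date)
-- HAVING filters after aggregation; WHERE above filters before it.
HAVING SUM(o.total_amount) > 0
ORDER BY order_month
"#;
```

The query uses PostgreSQL syntax, a database- and schema-qualified table name, alias-qualified columns, and a single table, so no join relationship is needed.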

@@ -127,8 +127,9 @@ You have access to a set of tools to perform actions and deliver results. Adhere
5. **Handling Missing Data**: If, after searching, the data required for the core analysis is not available, use the `finish_and_respond` tool to inform the user that the task cannot be completed due to missing data. Do not ask the user to provide data.
6. **Handling Unsupported or Vague Requests**:
- **Unsupported:** If the request is partially or fully unsupported (e.g., asks for unsupported analysis types, actions like emailing, or impossible chart annotations), create a plan for the supported parts only. Note the unsupported elements in the plan's notes section. Explain these limitations clearly in the final `finish_and_respond` message. If the entire request is unsupported, use `finish_and_respond` directly to explain why.
- **Ambiguous:** If the user's request is ambiguous but potentially fulfillable (e.g., uses terms like "top," "best"), **do not ask clarifying questions.** Make reasonable assumptions based on standard business logic or common data practices, state these assumptions clearly in your plan, and proceed. If the request is too vague to make any reasonable assumption, use the `finish_and_respond` tool to indicate that it cannot be fulfilled due to insufficient information.
7. **Stating Assumptions for Ambiguous Requests**: If the user's request contains vague or ambiguous terms (e.g., "top," "best," "significant"), interpret them using standard business logic or common data practices and explicitly state the assumption in your plan and final response. For example, if the user asks for the "top customers," you can assume it refers to customers with the highest total sales and note this in your plan.
- **Ambiguous:** If the user's request is ambiguous but potentially fulfillable (e.g., uses terms like "top," "best"), **do not ask clarifying questions.** Make reasonable assumptions based on standard business logic or common data practices, state these assumptions clearly in your plan, and proceed. **Avoid bold or complex assumptions.** If a time range is not specified, **default to the last 12 months** from {TODAYS_DATE} and state this assumption. If the request is too vague to make any reasonable assumption even with these guidelines, use the `finish_and_respond` tool to indicate that it cannot be fulfilled due to insufficient information.
- **Prioritize Defined Metrics**: When deciding on calculations or metrics for the plan, check if pre-defined metrics/columns exist in the data context that match the user's request. Prefer using these before defining complex custom calculations.
7. **Stating Assumptions for Ambiguous Requests**: If the user's request contains vague or ambiguous terms (e.g., "top," "best," "significant"), interpret them using standard business logic or common data practices and explicitly state the assumption in your plan and final response. For example, if the user asks for the "top customers," you can assume it refers to customers with the highest total sales and note this in your plan. **Keep assumptions simple and direct.**
## Capabilities
@@ -274,6 +275,7 @@ To determine whether to use a Straightforward Plan or an Investigative Plan, con
- **Read-Only**: You cannot write to databases.
- **Chart Types**: Only the following chart types are supported: table, line, bar, combo, pie/donut, number cards, scatter plot. Other chart types are not supported.
- **Query Simplicity**: Plans should aim for the simplest SQL queries that directly address the user's request. Avoid overly complex logic or unnecessary transformations.
- **Python**: You cannot write Python or perform advanced analyses like forecasting, modeling, etc.
- **Annotating Visualizations**: You cannot highlight or flag specific elements (lines, bars, cells) within visualizations. You can only control a general color theme.
- **Metric Descriptions/Commentary**: Individual metrics cannot include additional descriptions, assumptions, or commentary.
@@ -281,7 +283,7 @@ To determine whether to use a Straightforward Plan or an Investigative Plan, con
- **External Actions**: You cannot perform actions outside the platform like sending emails, exporting files, scheduling reports, or integrating with other apps. (Keywords: "email," "write," "update database," "schedule," "export," "share," "add user").
- **Platform/Web App Actions**: You cannot manage users, share content directly, or organize assets into folders/collections. These are user actions within the platform.
- **Data Focus**: Your tasks are limited to data analysis and visualization within the available datasets.
- **Dataset Joins**: You can only join datasets where relationships are explicitly defined in the metadata.
- **Dataset Joins**: You can only join datasets where relationships are explicitly defined in the metadata (e.g., via `relationships` or `entities` keys). **Plans must not propose joins between tables lacking a defined relationship.**
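
The join rule could also be enforced in code rather than only in the prompt. The following is a minimal sketch under stated assumptions: `RelationshipIndex`, `can_join`, and the flat list of table pairs are hypothetical simplifications, and the real `relationships` / `entities` metadata is likely richer than this.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of the relationships declared in dataset
/// metadata. Each stored pair names two tables with an explicit relationship.
pub struct RelationshipIndex {
    pairs: HashSet<(String, String)>,
}

impl RelationshipIndex {
    /// Builds the index from explicitly defined table pairs.
    pub fn new(defined: &[(&str, &str)]) -> Self {
        let pairs = defined.iter().map(|&(a, b)| ordered(a, b)).collect();
        Self { pairs }
    }

    /// Returns true only when a relationship between the two tables was
    /// explicitly defined, regardless of the order they are given in.
    pub fn can_join(&self, left: &str, right: &str) -> bool {
        self.pairs.contains(&ordered(left, right))
    }
}

/// Normalizes a pair so that (a, b) and (b, a) map to the same key.
fn ordered(a: &str, b: &str) -> (String, String) {
    if a <= b {
        (a.to_string(), b.to_string())
    } else {
        (b.to_string(), a.to_string())
    }
}
```

With an index built from `[("orders", "customers")]`, `can_join("customers", "orders")` is true and `can_join("orders", "products")` is false, so a plan proposing the second join would be rejected.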
### Building Good Visualizations

@@ -152,7 +152,17 @@ pub const METRIC_YML_SCHEMA: &str = r##"
# `name`: Human-readable title (e.g., Total Sales).
# - RULE: Should NOT contain underscores (`_`). Use spaces instead.
# `description`: Detailed explanation of the metric.
# `timeFrame`: Human-readable time period covered by the query (e.g., Last 30 days).
# `timeFrame`: Human-readable time period covered by the query, similar to a filter in a BI tool.
# RULE: Must accurately reflect the date/time filter used in the `sql` field. Do not misrepresent the time range.
# Examples:
# - Relative Dates: "Last 7 days", "Last 30 days", "Last Quarter", "Last Year", "Year to Date"
# - Fixed Dates: "June 1, 2025 - June 3, 2025", "2024", "Q2 2024"
# - Comparisons: Use the format "Comparison - [Period 1] vs [Period 2]". Examples:
# - "Comparison - Last 30 days vs Previous 30 days"
# - "Comparison - This Quarter vs Last Quarter"
# - "Comparison - 2024 vs 2023"
# - "Comparison - Q2 2024 vs Q2 2023"
# RULE: Follow general quoting rules. Should not contain ':'.
# `sql`: The SQL query for the metric.
# - RULE: MUST use the pipe `|` block scalar style to preserve formatting and newlines.
# - Example:
@@ -203,14 +213,16 @@ properties:
required: true
type: string
description: |
Human-readable time period covered by the SQL query.
Human-readable time period covered by the SQL query, similar to a filter in a BI tool.
RULE: Must accurately reflect the date/time filter used in the `sql` field. Do not misrepresent the time range.
- If the SQL uses fixed dates (e.g., `BETWEEN '2025-06-01' AND '2025-06-03'`), use specific dates: "June 1, 2025 - June 3, 2025".
- If the SQL uses dynamic relative dates (e.g., `created_at >= NOW() - INTERVAL '3 days'`), use relative terms: "Last 3 days".
- For comparisons between two periods, use the format "Comparison - [Period 1] vs [Period 2]". Examples:
- "Comparison - This Week vs Last Week"
- "Comparison - Q3 2024 vs Q3 2023"
- "Comparison - June 1, 2025 vs August 1, 2025"
Examples:
- Relative Dates: "Last 7 days", "Last 30 days", "Last Quarter", "Last Year", "Year to Date"
- Fixed Dates: "June 1, 2025 - June 3, 2025", "2024", "Q2 2024"
- Comparisons: Use the format "Comparison - [Period 1] vs [Period 2]". Examples:
- "Comparison - Last 30 days vs Previous 30 days"
- "Comparison - This Quarter vs Last Quarter"
- "Comparison - 2024 vs 2023"
- "Comparison - Q2 2024 vs Q2 2023"
RULE: Follow general quoting rules. Should not contain ':'.
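
Two of the `timeFrame` rules above are mechanical enough to check in code. The sketch below assumes a hypothetical helper named `validate_time_frame`, which is not part of this commit. It covers only the "no ':'" rule and the comparison format; it cannot verify that the label matches the date filter in the `sql` field.

```rust
/// Hypothetical validator for a `timeFrame` label. It checks two rules:
/// the label must not contain ':' and a comparison must follow
/// "Comparison - [Period 1] vs [Period 2]".
pub fn validate_time_frame(label: &str) -> Result<(), String> {
    let label = label.trim();

    if label.contains(':') {
        return Err("timeFrame should not contain ':'".to_string());
    }

    // Labels that do not start with "Comparison" are accepted as-is.
    let Some(rest) = label.strip_prefix("Comparison") else {
        return Ok(());
    };

    // A comparison needs the " - " separator followed by two periods.
    let periods = rest
        .strip_prefix(" - ")
        .ok_or_else(|| "comparison must start with \"Comparison - \"".to_string())?;

    match periods.split_once(" vs ") {
        Some((first, second)) if !first.trim().is_empty() && !second.trim().is_empty() => Ok(()),
        _ => Err("comparison must name two periods separated by \" vs \"".to_string()),
    }
}
```

For example, `"Last 30 days"` and `"Comparison - Q2 2024 vs Q2 2023"` pass, while `"Comparison: 2024 vs 2023"` and `"Comparison - 2024"` are rejected.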
# SQL QUERY