Merge pull request #1028 from buster-so/dallin-bus-1859-generate-series-instruction-to-fill-missing-dates-needs-to

Dallin bus 1859 generate series instruction to fill missing dates needs to be dialect specific
This commit is contained in:
dal 2025-09-22 09:53:45 -06:00 committed by GitHub
commit b6ad7a927b
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
4 changed files with 69 additions and 87 deletions

View File

@ -371,32 +371,8 @@ You operate in a loop to complete tasks:
- Generate SQL queries using only native SQL constructs, such as CURRENT_DATE, that can be directly executed in a SQL environment without requiring prepared statements, parameterized queries, or string formatting like {{variable}}.
- You are not able to build interactive dashboards and metrics that allow users to change the filters, you can only build static dashboards and metrics.
- Consider potential data duplication and apply deduplication techniques (e.g., `DISTINCT`, `GROUP BY`) where necessary.
- Fill Missing Values: For metrics, especially in time series, fill potentially missing values (NULLs) using `COALESCE(<column>, 0)` to default them to zero, ensuring continuous data unless the user specifically requests otherwise.
- Handle Missing Time Periods: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists for certain periods. This is critical for avoiding confusing gaps in charts and tables.
- **Generate Complete Date Ranges**: Use `generate_series()` to create a complete series of dates/periods, then LEFT JOIN with your actual data:
```sql
WITH date_series AS (
SELECT generate_series(
DATE_TRUNC('month', CURRENT_DATE - INTERVAL '11 months'),
DATE_TRUNC('month', CURRENT_DATE),
INTERVAL '1 month'
)::date AS period_start
)
SELECT
ds.period_start,
COALESCE(SUM(t.amount), 0) AS total_amount
FROM date_series ds
LEFT JOIN database.schema.transactions t ON DATE_TRUNC('month', t.date) = ds.period_start
GROUP BY ds.period_start
ORDER BY ds.period_start;
```
- **Common Time Period Patterns**:
- Daily: `generate_series(start_date, end_date, INTERVAL '1 day')`
- Weekly: `generate_series(DATE_TRUNC('week', start_date), DATE_TRUNC('week', end_date), INTERVAL '1 week')`
- Monthly: `generate_series(DATE_TRUNC('month', start_date), DATE_TRUNC('month', end_date), INTERVAL '1 month')`
- Quarterly: `generate_series(DATE_TRUNC('quarter', start_date), DATE_TRUNC('quarter', end_date), INTERVAL '3 months')`
- **Always use LEFT JOIN**: Join the generated date series with your data tables, not the other way around, to preserve all time periods.
- **Default Missing Values**: Use `COALESCE()` or `ISNULL()` to convert NULLs to appropriate defaults (usually 0 for counts/sums, but consider the context).
- Fill Missing Values: For metrics, especially in time series, fill potentially missing values (NULLs) using appropriate null-handling functions to default them to zero, ensuring continuous data unless the user specifically requests otherwise.
- Handle Missing Time Periods: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists for certain periods. This is critical for avoiding confusing gaps in charts and tables. Refer to the SQL dialect-specific guidance for the appropriate method to generate complete date ranges for your database.
</sql_best_practices>
<visualization_and_charting_guidelines>

View File

@ -18,7 +18,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **Aggregations**: Use \`FILTER (WHERE condition)\` for conditional aggregations instead of \`CASE WHEN\`
- **JSON Operations**: PostgreSQL excels at JSON - use \`->\`, \`->>\`, \`@>\`, \`?\` operators for JSON queries
- **Window Functions**: Powerful for analytics - \`LAG()\`, \`LEAD()\`, \`FIRST_VALUE()\`, \`PERCENT_RANK()\`
- **Date Spine Example**: For complete time series, use \`generate_series()\`:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists. Use \`generate_series()\` to create complete date ranges:
- Daily: \`generate_series(start_date, end_date, INTERVAL '1 day')\`
- Weekly: \`generate_series(DATE_TRUNC('week', start_date), DATE_TRUNC('week', end_date), INTERVAL '1 week')\`
- Monthly: \`generate_series(DATE_TRUNC('month', start_date), DATE_TRUNC('month', end_date), INTERVAL '1 month')\`
- Quarterly: \`generate_series(DATE_TRUNC('quarter', start_date), DATE_TRUNC('quarter', end_date), INTERVAL '3 months')\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT generate_series(
@ -51,7 +57,13 @@ export const SQL_DIALECT_GUIDANCE = {
- For large tables, use \`SAMPLE\` for quick data exploration: \`SELECT * FROM table SAMPLE (10 PERCENT)\`
- **String Functions**: \`ILIKE\` for case-insensitive, \`REGEXP_LIKE()\` for patterns, \`SPLIT_PART()\` for string splitting
- **Semi-Structured Data**: Excellent JSON/XML support with \`VARIANT\` type, \`PARSE_JSON()\`, \`FLATTEN()\`
- **Date Spine Example**: For complete time series, use table functions:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use \`TABLE(GENERATOR())\` to create complete date ranges:
- Daily: \`DATEADD('day', seq, start_date) FROM TABLE(GENERATOR(ROWCOUNT => days_needed))\`
- Weekly: \`DATEADD('week', seq, DATE_TRUNC('WEEK', start_date)) FROM TABLE(GENERATOR(ROWCOUNT => weeks_needed))\`
- Monthly: \`DATEADD('month', seq, DATE_TRUNC('MONTH', start_date)) FROM TABLE(GENERATOR(ROWCOUNT => months_needed))\`
- Quarterly: \`DATEADD('quarter', seq, DATE_TRUNC('QUARTER', start_date)) FROM TABLE(GENERATOR(ROWCOUNT => quarters_needed))\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`IFNULL()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT DATEADD('month', ROW_NUMBER() OVER (ORDER BY NULL) - 1,
@ -86,7 +98,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **Array/Struct Operations**: \`UNNEST()\` for array expansion, \`ARRAY_AGG()\` for aggregation into arrays
- **Geographic Functions**: \`ST_GEOGPOINT()\`, \`ST_DISTANCE()\`, \`ST_WITHIN()\` for spatial analysis
- **Cost Optimization**: Use \`LIMIT\` early, partition pruning, and clustering for cost control
- **Date Spine Example**: For complete time series, use \`GENERATE_DATE_ARRAY()\`:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use \`GENERATE_DATE_ARRAY()\` or \`GENERATE_TIMESTAMP_ARRAY()\` to create complete date ranges:
- Daily: \`GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 DAY)\`
- Weekly: \`GENERATE_DATE_ARRAY(DATE_TRUNC(start_date, WEEK), DATE_TRUNC(end_date, WEEK), INTERVAL 1 WEEK)\`
- Monthly: \`GENERATE_DATE_ARRAY(DATE_TRUNC(start_date, MONTH), DATE_TRUNC(end_date, MONTH), INTERVAL 1 MONTH)\`
- Quarterly: \`GENERATE_DATE_ARRAY(DATE_TRUNC(start_date, QUARTER), DATE_TRUNC(end_date, QUARTER), INTERVAL 1 QUARTER)\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`IFNULL()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT period_date
@ -124,7 +142,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **Column Compression**: Automatically chooses compression encodings, but can specify manually
- **Vacuum Operations**: Regular \`VACUUM\` and \`ANALYZE\` for optimal performance
- **Concurrency**: Use workload management (WLM) to optimize query concurrency
- **Date Spine Example**: For complete time series, use row numbers with dateadd:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use row numbers with \`DATEADD\` to create complete date ranges:
- Daily: Generate rows then \`DATEADD(day, row_num - 1, start_date)\`
- Weekly: Generate rows then \`DATEADD(week, row_num - 1, DATE_TRUNC('week', start_date))\`
- Monthly: Generate rows then \`DATEADD(month, row_num - 1, DATE_TRUNC('month', start_date))\`
- Quarterly: Generate rows then \`DATEADD(quarter, row_num - 1, DATE_TRUNC('quarter', start_date))\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`NVL()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT DATEADD(month, row_number() OVER (ORDER BY 1) - 1,
@ -155,7 +179,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **Current Date/Time**: \`CURDATE()\`, \`NOW()\`, \`CURRENT_TIMESTAMP\`.
- **String Functions**: \`REGEXP\` for pattern matching, \`SUBSTRING_INDEX()\` for splitting, \`CONCAT()\` for concatenation
- **JSON Support**: \`JSON_EXTRACT()\`, \`JSON_UNQUOTE()\`, \`JSON_ARRAY()\` for JSON operations (MySQL 5.7+)
- **Date Spine Example**: For complete time series, use recursive CTE or number series:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use recursive CTE to create complete date ranges:
- Daily: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 1 DAY)\`
- Weekly: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 1 WEEK)\`
- Monthly: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 1 MONTH)\`
- Quarterly: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 3 MONTH)\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`IFNULL()\` to default missing values to 0
\`\`\`sql
WITH RECURSIVE date_spine AS (
SELECT DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 11 MONTH), '%Y-%m-01') AS period_date
@ -186,7 +216,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **Current Date/Time**: \`CURDATE()\`, \`NOW()\`, \`CURRENT_TIMESTAMP\`.
- **Performance Tips**: Same as MySQL plus MariaDB-specific optimizations like improved optimizer
- **String Functions**: Enhanced regex support compared to MySQL, \`CONCAT_WS()\` for separator-based concatenation
- **Date Spine Example**: For complete time series, use recursive CTE or number series:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use recursive CTE to create complete date ranges:
- Daily: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 1 DAY)\`
- Weekly: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 1 WEEK)\`
- Monthly: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 1 MONTH)\`
- Quarterly: Recursive CTE with \`DATE_ADD(period_date, INTERVAL 3 MONTH)\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`IFNULL()\` to default missing values to 0
\`\`\`sql
WITH RECURSIVE date_spine AS (
SELECT DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 11 MONTH), '%Y-%m-01') AS period_date
@ -215,7 +251,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **Window Functions**: Advanced analytics with \`OVER()\` clause, \`LAG()\`, \`LEAD()\`, \`ROW_NUMBER()\`
- **JSON Support**: \`OPENJSON()\`, \`JSON_VALUE()\`, \`JSON_QUERY()\` for JSON operations (SQL Server 2016+)
- **Memory-Optimized Tables**: In-memory OLTP for high-performance scenarios
- **Date Spine Example**: For complete time series, use recursive CTE or VALUES table:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use recursive CTE or VALUES table to create complete date ranges:
- Daily: VALUES table with \`DATEADD(day, n.num, start_date)\`
- Weekly: VALUES table with \`DATEADD(week, n.num, DATE_TRUNC('week', start_date))\`
- Monthly: VALUES table with \`DATEADD(month, n.num, DATE_TRUNC('month', start_date))\`
- Quarterly: VALUES table with \`DATEADD(quarter, n.num, DATE_TRUNC('quarter', start_date))\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`ISNULL()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT DATEADD(month, n.num,
@ -246,7 +288,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **String Functions**: \`regexp_extract()\`, \`split()\`, \`concat_ws()\` for string manipulation
- **Delta Lake Features**: Time travel with \`@v123\` or \`TIMESTAMP AS OF\`, ACID transactions
- **ML Integration**: Built-in MLflow integration, \`ML.PREDICT()\` for model inference
- **Date Spine Example**: For complete time series, use sequence function:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented. Use \`sequence()\` function to create complete date ranges:
- Daily: \`sequence(to_date(start_date), to_date(end_date), interval 1 day)\`
- Weekly: \`sequence(date_trunc('WEEK', start_date), date_trunc('WEEK', end_date), interval 1 week)\`
- Monthly: \`sequence(date_trunc('MONTH', start_date), date_trunc('MONTH', end_date), interval 1 month)\`
- Quarterly: \`sequence(date_trunc('QUARTER', start_date), date_trunc('QUARTER', end_date), interval 3 month)\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` or \`NVL()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT add_months(DATE_TRUNC('MONTH', add_months(current_date(), -11)), n) AS period_date
@ -271,7 +319,13 @@ export const SQL_DIALECT_GUIDANCE = {
- **\`EXTRACT\`**: \`EXTRACT(DOW FROM column)\` (0=Sun), \`EXTRACT(ISODOW FROM column)\` (1=Mon), \`EXTRACT(WEEK FROM column)\`, \`EXTRACT(EPOCH FROM column)\` (Unix timestamp).
- **Intervals**: Use \`INTERVAL '1 day'\`, \`INTERVAL '1 month'\`, etc.
- **Current Date/Time**: \`CURRENT_DATE\`, \`CURRENT_TIMESTAMP\`, \`NOW()\`.
- **Date Spine Example**: For complete time series, use \`generate_series()\`:
- **Handling Missing Time Periods**: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists. Use \`generate_series()\` to create complete date ranges:
- Daily: \`generate_series(start_date, end_date, INTERVAL '1 day')\`
- Weekly: \`generate_series(DATE_TRUNC('week', start_date), DATE_TRUNC('week', end_date), INTERVAL '1 week')\`
- Monthly: \`generate_series(DATE_TRUNC('month', start_date), DATE_TRUNC('month', end_date), INTERVAL '1 month')\`
- Quarterly: \`generate_series(DATE_TRUNC('quarter', start_date), DATE_TRUNC('quarter', end_date), INTERVAL '3 months')\`
- Always LEFT JOIN the date spine with your data tables
- Use \`COALESCE()\` to default missing values to 0
\`\`\`sql
WITH date_spine AS (
SELECT generate_series(

View File

@ -560,32 +560,8 @@ If all true → proceed to submit prep for Asset Creation with `submitThoughts`.
- Avoid division by zero errors by using NULLIF() or CASE statements (e.g., `SELECT amount / NULLIF(quantity, 0)` or `CASE WHEN quantity = 0 THEN NULL ELSE amount / quantity END`).
- Generate SQL queries using only native SQL constructs, such as CURRENT_DATE, that can be directly executed in a SQL environment without requiring prepared statements, parameterized queries, or string formatting like {{variable}}.
- Consider potential data duplication and apply deduplication techniques (e.g., `DISTINCT`, `GROUP BY`) where necessary.
- Fill Missing Values: For metrics, especially in time series, fill potentially missing values (NULLs) using `COALESCE(<column>, 0)` to default them to zero, ensuring continuous data unless the user specifically requests otherwise.
- Handle Missing Time Periods: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists for certain periods. This is critical for avoiding confusing gaps in charts and tables.
- **Generate Complete Date Ranges**: Use `generate_series()` to create a complete series of dates/periods, then LEFT JOIN with your actual data:
```sql
WITH date_series AS (
SELECT generate_series(
DATE_TRUNC('month', CURRENT_DATE - INTERVAL '11 months'),
DATE_TRUNC('month', CURRENT_DATE),
INTERVAL '1 month'
)::date AS period_start
)
SELECT
ds.period_start,
COALESCE(SUM(t.amount), 0) AS total_amount
FROM date_series ds
LEFT JOIN database.schema.transactions t ON DATE_TRUNC('month', t.date) = ds.period_start
GROUP BY ds.period_start
ORDER BY ds.period_start;
```
- **Common Time Period Patterns**:
- Daily: `generate_series(start_date, end_date, INTERVAL '1 day')`
- Weekly: `generate_series(DATE_TRUNC('week', start_date), DATE_TRUNC('week', end_date), INTERVAL '1 week')`
- Monthly: `generate_series(DATE_TRUNC('month', start_date), DATE_TRUNC('month', end_date), INTERVAL '1 month')`
- Quarterly: `generate_series(DATE_TRUNC('quarter', start_date), DATE_TRUNC('quarter', end_date), INTERVAL '3 months')`
- **Always use LEFT JOIN**: Join the generated date series with your data tables, not the other way around, to preserve all time periods.
- **Default Missing Values**: Use `COALESCE()` or `ISNULL()` to convert NULLs to appropriate defaults (usually 0 for counts/sums, but consider the context).
- Fill Missing Values: For metrics, especially in time series, fill potentially missing values (NULLs) using appropriate null-handling functions to default them to zero, ensuring continuous data unless the user specifically requests otherwise.
- Handle Missing Time Periods: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists for certain periods. This is critical for avoiding confusing gaps in charts and tables. Refer to the SQL dialect-specific guidance for the appropriate method to generate complete date ranges for your database.
</sql_best_practices>

View File

@ -482,32 +482,8 @@ When in doubt, be more thorough rather than less. Reports are the default becaus
- Avoid division by zero errors by using NULLIF() or CASE statements (e.g., `SELECT amount / NULLIF(quantity, 0)` or `CASE WHEN quantity = 0 THEN NULL ELSE amount / quantity END`).
- Generate SQL queries using only native SQL constructs, such as CURRENT_DATE, that can be directly executed in a SQL environment without requiring prepared statements, parameterized queries, or string formatting like {{variable}}.
- Consider potential data duplication and apply deduplication techniques (e.g., `DISTINCT`, `GROUP BY`) where necessary.
- Fill Missing Values: For metrics, especially in time series, fill potentially missing values (NULLs) using `COALESCE(<column>, 0)` to default them to zero, ensuring continuous data unless the user specifically requests otherwise.
- Handle Missing Time Periods: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists for certain periods. This is critical for avoiding confusing gaps in charts and tables.
- **Generate Complete Date Ranges**: Use `generate_series()` to create a complete series of dates/periods, then LEFT JOIN with your actual data:
```sql
WITH date_series AS (
SELECT generate_series(
DATE_TRUNC('month', CURRENT_DATE - INTERVAL '11 months'),
DATE_TRUNC('month', CURRENT_DATE),
INTERVAL '1 month'
)::date AS period_start
)
SELECT
ds.period_start,
COALESCE(SUM(t.amount), 0) AS total_amount
FROM date_series ds
LEFT JOIN database.schema.transactions t ON DATE_TRUNC('month', t.date) = ds.period_start
GROUP BY ds.period_start
ORDER BY ds.period_start;
```
- **Common Time Period Patterns**:
- Daily: `generate_series(start_date, end_date, INTERVAL '1 day')`
- Weekly: `generate_series(DATE_TRUNC('week', start_date), DATE_TRUNC('week', end_date), INTERVAL '1 week')`
- Monthly: `generate_series(DATE_TRUNC('month', start_date), DATE_TRUNC('month', end_date), INTERVAL '1 month')`
- Quarterly: `generate_series(DATE_TRUNC('quarter', start_date), DATE_TRUNC('quarter', end_date), INTERVAL '3 months')`
- **Always use LEFT JOIN**: Join the generated date series with your data tables, not the other way around, to preserve all time periods.
- **Default Missing Values**: Use `COALESCE()` or `ISNULL()` to convert NULLs to appropriate defaults (usually 0 for counts/sums, but consider the context).
- Fill Missing Values: For metrics, especially in time series, fill potentially missing values (NULLs) using appropriate null-handling functions to default them to zero, ensuring continuous data unless the user specifically requests otherwise.
- Handle Missing Time Periods: When creating time series visualizations, ensure ALL requested time periods are represented, even when no underlying data exists for certain periods. This is critical for avoiding confusing gaps in charts and tables. Refer to the SQL dialect-specific guidance for the appropriate method to generate complete date ranges for your database.
</sql_best_practices>
<dashboard_rules>