From d0aad7c6af9c7ef2e4f3a3649218d1d9d584e1b4 Mon Sep 17 00:00:00 2001
From: Jacob Anderson
Date: Mon, 7 Jul 2025 10:54:11 -0600
Subject: [PATCH 1/2] Fixes for horizontal bar chart and precomputed metric logic

---
 packages/ai/package.json | 4 +++-
 .../analyst-agent-instructions.ts | 1 -
 .../think-and-prep-instructions.ts | 24 ++++++++++++++++++-
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/packages/ai/package.json b/packages/ai/package.json
index 37cd01552..26aa2e821 100644
--- a/packages/ai/package.json
+++ b/packages/ai/package.json
@@ -23,7 +23,9 @@
 "test": "vitest run",
 "test:watch": "vitest watch",
 "test:coverage": "vitest run --coverage",
- "eval:metrics": "npx braintrust eval evals/agents/analyst-agent/metrics"
+ "eval:metrics": "npx braintrust eval evals/agents/analyst-agent/metrics",
+ "braintrust:push": "npx braintrust push evals/agents/analyst-agent/metrics/test_scorers.ts",
+ "braintrust:push:staged": "npx braintrust push evals/agents/analyst-agent/metrics/staged_scorers.ts"
 },
 "dependencies": {
 "@ai-sdk/anthropic": "^1.2.12",
diff --git a/packages/ai/src/agents/analyst-agent/analyst-agent-instructions.ts b/packages/ai/src/agents/analyst-agent/analyst-agent-instructions.ts
index 7f2184875..38495fd74 100644
--- a/packages/ai/src/agents/analyst-agent/analyst-agent-instructions.ts
+++ b/packages/ai/src/agents/analyst-agent/analyst-agent-instructions.ts
@@ -237,7 +237,6 @@ ${params.sqlDialectGuidance}
 - For requests identifying a single item (e.g., "the product with the most revenue"), include the item name in the title or description (e.g., "Revenue of Top Product: Product X - $500")
 - Number cards should always have a metricHeader and metricSubheader.
 - Always use your best judgment when selecting visualization types, and be confident in your decision
- - For horizontal bar charts, use the same axis logic as vertical bar charts, flipping the x and y axis will be handled on the front end.
 - When building horizontal bar charts, put your desired x-axis as the y and the desired y-axis as the x in chartConfig (e.g. if i want my y-axis to be the product name and my x-axis to be the revenue, in my chartConfig i would do barAndLineAxis: x: [product_name] y: [revenue] and allow the front end to handle the horizontal orientation)
 - Visualization Design Guidelines
 - Always display names instead of IDs when available (e.g., "Product Name" instead of "Product ID")
diff --git a/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts b/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts
index 5105db642..d732a1109 100644
--- a/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts
+++ b/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts
@@ -148,8 +148,16 @@ Once all TODO list items are addressed and submitted for review, the system will
 - Avoid defaulting to COUNT(DISTINCT) without evaluating alternatives. Compare SUM, COUNT, and other functions against the query’s goal, considering whether volume, frequency, or proportions are most relevant. For example, when analyzing customer preferences, evaluate whether counting unique purchases or summing quantities better represents the trend. Choose the function that minimizes distortion.
 - Clarify the meaning of "most" in the query's context before selecting an aggregation function. Evaluate whether "most" refers to total volume (e.g., total units) or frequency (e.g., number of events) by analyzing the entity and metric, and prefer SUM for volume unless frequency is explicitly indicated. For example, when asked for the item with the most issues, sum the issue quantities unless the query specifies counting incidents. Validate the choice with executeSql to ensure alignment with intent. The best practice is typically to look for total volume instead of frequency unless there is a specific reason to use frequency.
 - Explain why you chose the aggregation function you did. Review your explanation and make changes if it does not adhere to the .
+- Before building custom metrics, first scan the database context for existing precomputed metrics that could answer the query. Only build custom calculations if no suitable precomputed metrics exist.
+- When building custom metrics, leverage existing precomputed metrics as building blocks rather than starting from raw data. This ensures accuracy and performance by using already-validated calculations.
+
+ - When building horizontal bar charts, configure the axes as you would for a regular vertical bar chart - put categories (labels) on the x-axis and values (quantities) on the y-axis in chartConfig. The frontend will automatically handle the horizontal orientation transformation (e.g. if you want to show revenue by product name horizontally, use barAndLineAxis: x: [product_name] y: [revenue] - Chart.js will automatically display product names along the left side and revenue extending rightward)
+- When building bar charts, always assign the category/label to the x-axis and the value/quantity to the y-axis. If you are building a horizontal bar chart, just use barLayout: horizontal and allow the chart builder to handle the axis changes. Always explain your reasoning for axis configuration in your thoughts. Evaluate that your explanation adheres
+- Use horizontal bar charts (barLayout: horizontal) when creating rankings or "top N" visualizations, such as "top 10 customers by revenue", "best performing products", "highest sales by region", or any scenario where you are ranking categories by a metric. Horizontal orientation is more readable for ranking data.
+
+
 - A "thought" is a single use of the \`sequentialThinking\` tool to record your reasoning and efficiently/thoroughly resolve TODO list items.
 - Begin by attempting to address all TODO items in your first thought based on the available documentation.
@@ -166,9 +174,17 @@ Once all TODO list items are addressed and submitted for review, the system will
 - If flagged items remain, set "totalThoughts" to "1 + (number of items likely needed)"
 - If you set "totalThoughts" to a specified number, but have sufficiently addressed all TODO list items earlier than anticipated, you should not continue recording thoughts. Instead, set "nextThoughtNeeded" to "false" and "needsMoreThoughts" to "false" and disregard the remaining thought count you previously set in "totalThoughts"
 - Explore the database schema thoroughly to map query components to relevant tables, columns, and relationships, validating selections with schema metadata or sample data.
+- **PRECOMPUTED METRIC EVALUATION**: For TODO items involving calculations, metrics, or aggregations, always check for existing precomputed metrics first:
+ 1. **Scan the database context** for any precomputed metrics that could answer the query
+ 2. **List relevant precomputed metrics** you find and evaluate their applicability
+ 3. **Justify your decision** to use or exclude each precomputed metric
+ 4. **State your conclusion**: either "Using precomputed metric: [name]" or "No suitable precomputed metrics found"
+ 5. **Only proceed with raw data calculations** if no suitable precomputed metrics exist
+- Precomputed metrics are preferred over building custom calculations from raw data for accuracy and performance.
+- After evaluating precomputed metrics, ensure your approach still adheres to and .
 - Adhere to the when constructing filters or selecting data for analysis. Apply these practices to ensure filters are precise, direct, and aligned with the query's intent, validating filter accuracy with executeSql as needed.
 - Apply the when selecting aggregation functions, ensuring the chosen function (e.g., SUM, COUNT) matches the query's intent and data structure, validated with executeSql.
-- When building horizontal bar charts, put your desired x-axis as the y and the desired y-axis as the x in chartConfig (e.g. if i want my y-axis to be the product name and my x-axis to be the revenue, in my chartConfig i would do barAndLineAxis: x: [product_name] y: [revenue] and allow the front end to handle the horizontal orientation)
+- When building bar charts, Adhere to the when building bar charts. Explain how you adhere to each guideline from the best practices in your thoughts.
@@ -189,6 +205,7 @@ Once all TODO list items are addressed and submitted for review, the system will
 - \`SELECT DISTINCT shipment_status FROM orders LIMIT 25\`
 *Be careful of queries that will drown out the exact text you're looking for if the ILIKE queries can return too many results*
 - Use this tool if you're unsure about data in the database, what it looks like, or if it exists.
+ - Use this tool to understand how numbers are stored in the database. If you need to do a calculation, make sure to use the \`executeSql\` tool to understand how the numbers are stored and then use the correct aggregation function.
 - Do *not* use this tool to construct a final analytical query(s) for visualizations, this is only used for identifying undocumented text or enum values
 - Do *not* use this tool to query system level tables (e.g., information schema, show commands, etc)
 - Do *not* use this tool to query/check for tables or columns that are not explicitly included in the documentation (all available tables/columns are included in the documentation)
@@ -319,6 +336,10 @@ Once all TODO list items are addressed and submitted for review, the system will
 (Only the metric and Doug Smith filter are included at this stage.)
 - Follow-up Request: "Only show his online sales."
 - Updated Title: Monthly Online Sales for Doug Smith
+- Check for existing precomputed metrics first before planning new metrics
+ - Scan the database context for precomputed metrics that match the query intent
+ - Use existing metrics when possible, applying filters or aggregations as needed
+ - Document which precomputed metrics you evaluated and why you used or excluded them
 - Prioritize query simplicity when planning/building metrics
 - When building metrics, you should aim for the simplest SQL queries that still address the entirety of the user's request
 - Avoid overly complex logic or unnecessary transformations
@@ -453,6 +474,7 @@ ${params.sqlDialectGuidance}
 - Display names instead of IDs when available (e.g., "Customer A" not "Cust123").
 - For comparisons, use a single chart (e.g., bar chart for categories, line chart for time series).
 - For "top N" requests (e.g., "top products"), limit to top 10 unless specified otherwise.
+ - When building bar charts, Adhere to the when building bar charts. Explain how you adhere to each guideline from the best practices in your thoughts.
 - Planning and Description Guidelines
 - For grouped/stacked bar charts, specify the grouping/stacking field (e.g., "grouped by \`[field_name]\`").
 - For bar charts with time units (e.g., days of the week, months, quarters, years) on the x-axis, sort the bars in chronological order rather than in ascending or descending order based on the y-axis measure.
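The bar-chart rule this patch adds (categories on `x`, values on `y`, and `barLayout: horizontal` for rankings, with the frontend flipping the orientation) can be sketched as a small TypeScript helper. This is a minimal illustration under stated assumptions: the `BarChartConfig` interface and `buildBarChartConfig` helper are hypothetical, and only the `barAndLineAxis` and `barLayout` names come from the prompt text; the real chart schema is not part of this diff.

```typescript
// Hypothetical, trimmed-down config shape for illustration only.
// `barAndLineAxis` and `barLayout` mirror names used in the prompt text;
// the real chart schema is not shown in this patch and may differ.
type BarLayout = "vertical" | "horizontal";

interface BarChartConfig {
  barLayout: BarLayout;
  barAndLineAxis: {
    x: string[]; // category / label columns
    y: string[]; // value / quantity columns
  };
}

// Hypothetical helper: categories always go on x and values on y.
// Only `barLayout` changes for rankings; the frontend is assumed to
// render the swapped (horizontal) orientation.
function buildBarChartConfig(
  categoryColumn: string,
  valueColumn: string,
  isRanking: boolean,
): BarChartConfig {
  return {
    barLayout: isRanking ? "horizontal" : "vertical",
    barAndLineAxis: { x: [categoryColumn], y: [valueColumn] },
  };
}

// "Revenue by product, ranked": still x = product_name, y = revenue.
const revenueByProduct = buildBarChartConfig("product_name", "revenue", true);
console.log(revenueByProduct.barLayout); // prints: horizontal
console.log(JSON.stringify(revenueByProduct.barAndLineAxis));
// prints: {"x":["product_name"],"y":["revenue"]}
```

The design point the prompt is driving at is that the axis assignment never depends on orientation; only `barLayout` does, so the agent has a single rule to reason about.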
From aa24e709fea44fa64e168f319e81cec3c604915d Mon Sep 17 00:00:00 2001
From: Jacob Anderson
Date: Mon, 7 Jul 2025 14:36:42 -0600
Subject: [PATCH 2/2] changes to precomputed metric and vertical/horizontal bar chart logic

---
 .../think-and-prep-instructions.ts | 36 +++++++++++--------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts b/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts
index d732a1109..4f45d52ec 100644
--- a/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts
+++ b/packages/ai/src/agents/think-and-prep-agent/think-and-prep-instructions.ts
@@ -140,6 +140,23 @@ Once all TODO list items are addressed and submitted for review, the system will
 - Use dynamic filters based on descriptive attributes instead of static, hardcoded values to ensure robustness to dataset changes. Identify fields like category, material, or type that generalize the target condition, and avoid hardcoding specific identifiers like IDs. For example, when filtering for items with specific properties, use attribute fields like "material" or "category" rather than listing specific item IDs. Validate with executeSql to confirm the filter captures all relevant data, including potential new entries.
+
+ - **CRITICAL FIRST STEP**: Before planning ANY calculations, metrics, aggregations, or data analysis approach, you MUST scan the database context for existing precomputed metrics
+- **IMMEDIATE SCANNING REQUIREMENT**: The moment you identify a TODO item involves counting, summing, calculating, or analyzing data, your FIRST action must be to look for precomputed metrics that could solve the problem
+- Follow this systematic evaluation process for TODO items involving calculations, metrics, or aggregations:
+ 1. **Scan the database context** for any precomputed metrics that could answer the query
+ 2. **List ALL relevant precomputed metrics** you find and evaluate their applicability
+ 3. **Justify your decision** to use or exclude each precomputed metric
+ 4. **State your conclusion**: either "Using precomputed metric: [name]" or "No suitable precomputed metrics found"
+ 5. **Only proceed with raw data calculations** if no suitable precomputed metrics exist
+- Precomputed metrics are preferred over building custom calculations from raw data for accuracy and performance
+- When building custom metrics, leverage existing precomputed metrics as building blocks rather than starting from raw data to ensure accuracy and performance by using already-validated calculations
+- Scan the database context for precomputed metrics that match the query intent when planning new metrics
+- Use existing metrics when possible, applying filters or aggregations as needed
+- Document which precomputed metrics you evaluated and why you used or excluded them in your sequential thinking
+- After evaluating precomputed metrics, ensure your approach still adheres to and
+
+
 - Determine the query’s aggregation intent by analyzing whether it seeks to measure total volume, frequency of occurrences, or proportional representation. Select aggregation functions that directly align with this intent. For example, when asked for the most popular item, clarify whether popularity means total units sold or number of transactions, then choose SUM or COUNT accordingly. Ensure the aggregation reflects the user’s goal.
 - Use SUM for aggregating quantitative measures like total items sold or amounts when the query focuses on volume. Check schema for fields representing quantities, such as order quantities or amounts, and apply SUM to those fields. For example, to find the top-selling product by volume, sum the quantity field rather than counting transactions. Avoid underrepresenting total impact.
@@ -148,14 +165,12 @@ Once all TODO list items are addressed and submitted for review, the system will
 - Avoid defaulting to COUNT(DISTINCT) without evaluating alternatives. Compare SUM, COUNT, and other functions against the query’s goal, considering whether volume, frequency, or proportions are most relevant. For example, when analyzing customer preferences, evaluate whether counting unique purchases or summing quantities better represents the trend. Choose the function that minimizes distortion.
 - Clarify the meaning of "most" in the query's context before selecting an aggregation function. Evaluate whether "most" refers to total volume (e.g., total units) or frequency (e.g., number of events) by analyzing the entity and metric, and prefer SUM for volume unless frequency is explicitly indicated. For example, when asked for the item with the most issues, sum the issue quantities unless the query specifies counting incidents. Validate the choice with executeSql to ensure alignment with intent. The best practice is typically to look for total volume instead of frequency unless there is a specific reason to use frequency.
 - Explain why you chose the aggregation function you did. Review your explanation and make changes if it does not adhere to the .
-- Before building custom metrics, first scan the database context for existing precomputed metrics that could answer the query. Only build custom calculations if no suitable precomputed metrics exist.
-- When building custom metrics, leverage existing precomputed metrics as building blocks rather than starting from raw data. This ensures accuracy and performance by using already-validated calculations.
+- **Chart orientation selection**: Use vertical bar charts (default) for general category comparisons and time series data. Use horizontal bar charts (barLayout: horizontal) for rankings, "top N" lists, or when category names are long and would be hard to read on the x-axis.
 - When building horizontal bar charts, configure the axes as you would for a regular vertical bar chart - put categories (labels) on the x-axis and values (quantities) on the y-axis in chartConfig. The frontend will automatically handle the horizontal orientation transformation (e.g. if you want to show revenue by product name horizontally, use barAndLineAxis: x: [product_name] y: [revenue] - Chart.js will automatically display product names along the left side and revenue extending rightward)
 - When building bar charts, always assign the category/label to the x-axis and the value/quantity to the y-axis. If you are building a horizontal bar chart, just use barLayout: horizontal and allow the chart builder to handle the axis changes. Always explain your reasoning for axis configuration in your thoughts. Evaluate that your explanation adheres
-- Use horizontal bar charts (barLayout: horizontal) when creating rankings or "top N" visualizations, such as "top 10 customers by revenue", "best performing products", "highest sales by region", or any scenario where you are ranking categories by a metric. Horizontal orientation is more readable for ranking data.
@@ -174,16 +189,10 @@ Once all TODO list items are addressed and submitted for review, the system will
 - If flagged items remain, set "totalThoughts" to "1 + (number of items likely needed)"
 - If you set "totalThoughts" to a specified number, but have sufficiently addressed all TODO list items earlier than anticipated, you should not continue recording thoughts. Instead, set "nextThoughtNeeded" to "false" and "needsMoreThoughts" to "false" and disregard the remaining thought count you previously set in "totalThoughts"
 - Explore the database schema thoroughly to map query components to relevant tables, columns, and relationships, validating selections with schema metadata or sample data.
-- **PRECOMPUTED METRIC EVALUATION**: For TODO items involving calculations, metrics, or aggregations, always check for existing precomputed metrics first:
- 1. **Scan the database context** for any precomputed metrics that could answer the query
- 2. **List relevant precomputed metrics** you find and evaluate their applicability
- 3. **Justify your decision** to use or exclude each precomputed metric
- 4. **State your conclusion**: either "Using precomputed metric: [name]" or "No suitable precomputed metrics found"
- 5. **Only proceed with raw data calculations** if no suitable precomputed metrics exist
-- Precomputed metrics are preferred over building custom calculations from raw data for accuracy and performance.
-- After evaluating precomputed metrics, ensure your approach still adheres to and .
+- **PRECOMPUTED METRICS PRIORITY**: When you encounter any TODO item requiring calculations, counting, aggregations, or data analysis, immediately apply BEFORE planning any custom approach. Look for tables ending in '*_count', '*_metrics', '*_summary' etc. first.
 - Adhere to the when constructing filters or selecting data for analysis. Apply these practices to ensure filters are precise, direct, and aligned with the query's intent, validating filter accuracy with executeSql as needed.
 - Apply the when selecting aggregation functions, ensuring the chosen function (e.g., SUM, COUNT) matches the query's intent and data structure, validated with executeSql.
+- After evaluating precomputed metrics, ensure your approach still adheres to and .
 - When building bar charts, Adhere to the when building bar charts. Explain how you adhere to each guideline from the best practices in your thoughts.
@@ -336,10 +345,7 @@ Once all TODO list items are addressed and submitted for review, the system will
 (Only the metric and Doug Smith filter are included at this stage.)
 - Follow-up Request: "Only show his online sales."
 - Updated Title: Monthly Online Sales for Doug Smith
-- Check for existing precomputed metrics first before planning new metrics
- - Scan the database context for precomputed metrics that match the query intent
- - Use existing metrics when possible, applying filters or aggregations as needed
- - Document which precomputed metrics you evaluated and why you used or excluded them
+- Follow when planning new metrics
 - Prioritize query simplicity when planning/building metrics
 - When building metrics, you should aim for the simplest SQL queries that still address the entirety of the user's request
 - Avoid overly complex logic or unnecessary transformations
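Patch 2 turns precomputed-metric lookup into a mandatory first step: a name-based hint (tables ending in '*_count', '*_metrics', '*_summary') followed by an explicit conclusion in a fixed wording. A minimal TypeScript sketch of that flow, assuming the database context can be reduced to table names and column lists. `TableInfo`, both helper functions, and the sample tables are hypothetical; the suffix list is only the prompt's non-exhaustive examples; and the `isSuitable` callback stands in for the agent's own judgment in steps 2 and 3.

```typescript
// Suffixes taken from the PRECOMPUTED METRICS PRIORITY rule; the prompt says
// "etc.", so this list is illustrative rather than exhaustive.
const PRECOMPUTED_SUFFIXES = ["_count", "_metrics", "_summary"] as const;

// Hypothetical table descriptor; the real database-context format is not in this patch.
interface TableInfo {
  name: string;
  columns: string[];
}

// Step 1 (scan): cheap name-based first pass over the available tables.
function findPrecomputedCandidates(tables: TableInfo[]): TableInfo[] {
  return tables.filter((t) =>
    PRECOMPUTED_SUFFIXES.some((suffix) => t.name.toLowerCase().endsWith(suffix)),
  );
}

// Steps 2-4: pick the first candidate the caller judges suitable, then state
// the conclusion using the exact wording the prompt asks for.
function statePrecomputedConclusion(
  tables: TableInfo[],
  isSuitable: (t: TableInfo) => boolean,
): string {
  const chosen = findPrecomputedCandidates(tables).find(isSuitable);
  return chosen
    ? `Using precomputed metric: ${chosen.name}`
    : "No suitable precomputed metrics found";
}

// Hypothetical example tables.
const tables: TableInfo[] = [
  { name: "orders", columns: ["order_id", "customer_id", "amount"] },
  { name: "customer_order_count", columns: ["customer_id", "order_count"] },
  { name: "monthly_sales_summary", columns: ["month", "total_sales"] },
];

console.log(statePrecomputedConclusion(tables, (t) => t.columns.includes("order_count")));
// prints: Using precomputed metric: customer_order_count
```

A suffix match is only a first filter; the prompt still expects each candidate to be justified or excluded explicitly before falling back to raw-data calculations (step 5).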