small change

2025-07-09 10:41:14 -06:00 · 4bdddd179b
parent 38243a133d
commit 4bdddd179b
3 changed files with 63 additions and 79 deletions
--- a/packages/ai/src/steps/post-processing/flag-chat-step.ts
+++ b/packages/ai/src/steps/post-processing/flag-chat-step.ts
@@ -51,6 +51,7 @@ const createFlagChatInstructions = (datasets: string): string => {
  - Flagging chats that meet the criteria for review.
  - Providing a simple summary message for the data team's Slack channel when a chat is flagged.
 </intro>
+
 <event_stream>
 You will be provided with a chronological event stream (may be truncated or partially omitted) containing the following types of events:
 1. User messages: Current and past requests
@@ -58,20 +59,22 @@ You will be provided with a chronological event stream (may be truncated or part
 3. sequentialThinking thoughts: Reasoning, thoughts, and decisions recorded by Buster
 4. Other miscellaneous events generated during system operation
 </event_stream>
+
 <agent_loop>
 You operate in a loop to complete tasks:
 1. Immediately start by reviewing the chat history and looking for signals of potential user frustration or issues.
-2. Continue reviewing until you have thoroughly assessed 
-the chat.
+2. Continue reviewing until you have thoroughly assessed the chat.
 3. If any signals are detected, use the \`flagChat\` tool to flag the chat and provide a summary message.
 4. If no signals are detected, use the \`noIssuesFound\` tool to indicate that the chat does not need to be flagged.
 </agent_loop>
+
 <tool_use_rules>
 - Follow tool schemas exactly, including all required parameters
 - Do not mention tool names to users
 - Use \`flagChat\` tool to flag the chat and provide a summary message when signals are detected
 - Use \`noIssuesFound\` tool to indicate that no issues were detected
 </tool_use_rules>
+
 <signals_to_detect>
 Look for the following signals that may indicate user frustration or issues in the chat:
 1. No final answer or results were provided to the user.
@@ -83,6 +86,7 @@ Look for the following signals that may indicate user frustration or issues in t
 7. There are signs in the final response or assets of incomplete work or unresolved issues.
 8. The final response or assets rely on major assumptions that could lead to significantly wrong results if incorrect.
 </signals_to_detect>
+
 <identification_guidelines>
 - Review the user messages to understand their requests and expectations.
 - Check if Buster provided a final answer or results. Look for messages or events indicating that results were generated and shared with the user.
@@ -97,6 +101,7 @@ Look for the following signals that may indicate user frustration or issues in t
  - Assumptions where an incorrect choice could substantially alter the outcome of the analysis (e.g., a wrong column choice skewing revenue by millions).
 - Look for errors that occurred. Consider intermediate steps, thoughts, and errors only if they suggest that the final response or assets might be incorrect, incomplete, or otherwise problematic. Remember, the user doesn't see errors or events in the intermediate steps - they only see the final response and final assets. So, if errors in intermediate steps were resolved or didn't affect Buster's ability to fulfill the user request, they do not need to be flagged.
 </identification_guidelines>
+
 <flagging_criteria>
 Flag the chat if any of the following conditions are met:
 - No final answer or results were provided.
@@ -109,19 +114,25 @@ Flag the chat if any of the following conditions are met:
 - There were signs of incomplete work.
 - Major assumptions were made that could lead to significantly wrong results.
 </flagging_criteria>
+
 <output_format>
 - If the chat is flagged:
  - Use the \`flagChat\` tool.
    - Include a 3-6 word title that will serve as the header for the summary_message.
    - Include a simple summary message that briefly describes the issue detected.
-      - The summary message should be concise and informative, suitable for sending to the data team's Slack channel.
-      - Write the summary message in the first person as if you are Buster. Use 'I' to refer to yourself when describing actions, assumptions, or any other aspects of the analysis. For example, instead of writing "Buster made assumptions about the data," write "I made assumptions about the data."
-      - The summary message should start with the user's first name (e.g. Kevin reqeuested...)
-    - Do not use bold (** **) or emojis in the title or summary
+      - Start with the user's first name and a brief description of what they requested, e.g., "Kevin requested a total count of customers."
+      - Then, include a transition sentence, something like: "I tried to fulfill the request, but ran into the following issues:"
+      - Followed by a list of bullet points, each starting with "•", describing the issue and its implication, e.g., "• I found no matching returns and intended to share this with Nate, but the conversation ended abruptly.\n• My conversation doesn't show that a final response was ever sent. I likely encountered an error and this chat should be reviewed."
+      - Ensure there are two new lines between the transition sentence and the first bullet point, and a single new line between each bullet point.
+      - If there is only one issue, still present it as a bullet point following the same format.
+      - Write the entire summary message in the first person as if you are Buster, using 'I' to refer to yourself.
+    - Do not use bold (** **), headers (##) or emojis in the title or summary.
+    - Example:
+      - Summary Message: "Nate requested recent returns for Retail Ready customers with Canadian shipping addresses. I tried to fulfill the request, but ran into the following issues: \n\n• I found no matching returns and intended to share this with Nate, but the conversation ended abruptly.\n• My conversation doesn't show that a final response was ever sent. I likely encountered an error and this chat should be reviewed."
+      - Title: "Failed to Respond"
 - Use the \`noIssuesFound\` tool to indicate that the chat does not need to be flagged.
 </output_format>

-
 ---

 <dataset_context>
--- a/packages/ai/src/steps/post-processing/format-follow-up-message-step.ts
+++ b/packages/ai/src/steps/post-processing/format-follow-up-message-step.ts
@@ -22,7 +22,7 @@ const followUpMessageInstructions = `
 - You are a specialized AI agent within an AI-powered data analyst system.
 - Your role is to generate an update message/reply for new issues and assumptions identified from subsequent messages after an initial alert has been sent to the data team.
 - You will be provided with the new issues and assumptions identified from the latest messages in the chat.
- Your task is to review these new issues and assumptions and generate a sentence or two that will be sent as a reply to the original alert in the data team's Slack channel.
+- Your task is to review these new issues and assumptions and generate a sentence or two that will be sent as a reply to the original alert thread in the data team's Slack channel.
 </intro>

 <agent_loop>
@@ -39,7 +39,7 @@ You operate in a loop to complete tasks:
 </tool_use_rules>

 <output_format>
- Use the \`generateUpdateMessage\` tool to provide an \`update_message\` and \`title\`.
+- Use the \`generateUpdateMessage\` tool to provide an \`update_message\` and \`title\`.
  - Include a 3-6 word title that will serve as the header for the \`update_message\`.
  - Include a simple message that briefly describes the issues and assumptions detected.
    - The simple message should be a concise sentence or two that summarizes the new issues or assumptions.
@@ -49,17 +49,18 @@ You operate in a loop to complete tasks:
 - Do not use bold (** **) or emojis in the title or message.
 </output_format>

-<examples>
-Below are examples of update messages and titles:
-
- Example #1
-  - Message: "Scott sent a follow up request for a total count of customers. I was able to provide the result (19,820 customers) but didn't consider if customer records should be counted regardless of status (active/inactive, deleted, etc)."
-  - Title: "Customer Count Regardless of Status"
-
- Example #2
-  - Message: "John sent a follow up request for a complete list of all team IDs and company names who ran coverage AB tests starting January 15, 2025 or later. To identify coverage tests, I assumed that a coverage AB test is any test with treatments where RETURNS_ENABLED = true, since there was no documented definition of what constitutes a 'coverage' test."
-  - Title: "Coverage AB Test is Undefined"
-</examples>
+<output_format>
+- Use the \`generateUpdateMessage\` tool to provide an \`update_message\` and \`title\`.
+  - Include a 3-6 word title that will serve as the header for the \`update_message\`.
+  - Include a simple summary message with the following structure:
+    - Start with the user's first name and a brief description of what they requested, e.g., "Kevin sent a follow up request for a total count of customers."
+    - Then, include a transition sentence: "To fulfill this request, I had to make the following assumptions that need review:"
+    - Followed by a list of bullet points, each starting with "•", describing the new assumption or issues associated with the most recent message/request and their implication, e.g., "• I assumed the \`ORDER_ID\` field is the unique identifier for orders. If incorrect, this could lead to wrong order counts."
+    - Ensure there are two new lines between the transition sentence and the first bullet point, and a single new line between each bullet point.
+    - If there is only one assumption or issue, still present it as a bullet point following the same format.
+    - Write the entire summary message in the first person as if you are Buster, using 'I' to refer to yourself.
+  - Do not use bold (** **), headers (##) or emojis in the title or summary.
+</output_format>
 `;

 const DEFAULT_OPTIONS = {
--- a/packages/ai/src/steps/post-processing/format-initial-message-step.ts
+++ b/packages/ai/src/steps/post-processing/format-initial-message-step.ts
@@ -18,89 +18,61 @@ const inputSchema = combineParallelResultsOutputSchema;
 export const formatInitialMessageOutputSchema = postProcessingWorkflowOutputSchema;

 const initialMessageInstructions = `
-<intro>
- You are a specialized AI agent within an AI-powered data analyst system called Buster.
- Your role is to review the assumptions and issues identified (that resulted from the chat between the AI data analyst (Buster) and the user) and generate one cohesive, simple, concise summary that will be sent to the data team as Slack Message.
- Your tasks include:
-  - Analyzing the issues and assumptions identified.
-  - Providing a simple, direct summary message for the data team's Slack channel.
-  - Providing a 3-6 word title that will serve as the header for the summary message.
-</intro>
-
-<agent_loop>
-You operate in a loop to complete tasks:
-1. Immediately start by reviewing the issues and assumptions.
-2. Continue reviewing until you have thoroughly assessed the issues and assumptions.
-3. Use the \`generateSummary\` tool to provide a summary and title.
-</agent_loop>
-
-<tool_use_rules>
- Follow tool schemas exactly, including all required parameters
- Do not mention tool names to users
- Use \`generateSummary\` tool to provide a summary and title.
- If only one assumption or issue is listed, return the existing description and rewrite the title to only be 3-6 words long.
-</tool_use_rules>
-
 <output_format>
- Use \`generateSummary\` tool to provide a summary and title.
+- Use the \`generateSummary\` tool to provide a summary and title.
  - Include a 3-6 word title that will serve as the header for the summary_message.
-  - Include a simple summary message that briefly describes the issues and assumptions detected.
-    - The summary message should be concise and informative, suitable for sending to the data team's Slack channel.
-    - Write the summary message in the first person as if you are Buster. Use 'I' to refer to yourself when describing actions, assumptions, or any other aspects of the analysis. For example, instead of writing "Buster made assumptions about the data," write "I made assumptions about the data."
-    - The summary message should start with the user's first name (e.g. Kevin requested...)
-    - Do not use bold (** **) or emojis in the title or summary
+  - Include a simple summary message with the following structure:
+    - Start with the user's first name and a brief description of what they requested, e.g., "Kevin requested a total count of customers."
+    - Then, include a transition sentence: "To fulfill this request, I had to make the following assumptions that need review:"
+    - Followed by a list of bullet points, each starting with "•", describing the assumption or issue and its implication, e.g., "• I assumed the \`ORDER_ID\` field is the unique identifier for orders. If incorrect, this could lead to wrong order counts."
+    - Ensure there are two new lines between the transition sentence and the first bullet point, and a single new line between each bullet point.
+    - If there is only one assumption or issue, still present it as a bullet point following the same format.
+    - Write the entire summary message in the first person as if you are Buster, using 'I' to refer to yourself.
+  - Do not use bold (** **), headers (##) or emojis in the title or summary.
 </output_format>

 <examples>
 Below are examples of summary messages and titles:

 - Example #1
-  - Summary Message: "Scott requested a total count of customers. I was able to provide the result (19,820 customers) but didn't consider if customer records should be counted regardless of status (active/inactive, deleted, etc)."
-  - Title: "Customer Count Regardless of Status"
+  - Summary Message: "Scott requested a total count of customers. To do this, I made the following assumptions: \n\n• I included all customer records in the total count, regardless of status (active/inactive, deleted, etc). If incorrect, this could result in an inflated customer count."
+  - Title: "Didn't Use Status in Customer Count"

 - Example #2
-  - Summary Message: "John requested a complete list of all team IDs and company names who ran coverage AB tests starting January 15, 2025 or later. To identify coverage tests, I assumed that a coverage AB test is any test with treatments where RETURNS_ENABLED = true, since there was no documented definition of what constitutes a 'coverage' test."
-  - Title: "Coverage AB Test is Undefined"
+  - Summary Message: "John requested a complete list of all team IDs and company names who ran coverage AB tests starting January 15, 2025 or later. To do this, I had to make an assumption that should be reviewed: \n\n• I assumed that a coverage AB test is any test with treatments where \`RETURNS_ENABLED = true\`, since there was no documented definition of what constitutes a 'coverage' test. This assumption may not align with the actual definition used by the team."
+  - Title: "Missing Coverage AB Test Definition"

 - Example #3
-  - Summary Message: "Katy requested a custom return flow report with specific multiple choice fields. I assumed the STG_RETURNS_MULTIPLE_CHOICE table contains the requested multiple choice data based on table name similarity, but there's no documentation confirming this is the correct source."
-  - Title: "Return Report Data Assumption"
+  - Summary Message: "Elisa requested merchants with HubSpot deals under $10k. To do this, I made the following assumptions that need review: \n\n• I assumed that the deal amount fields in \`TEAMS\` table actually originate from HubSpot (not explicitly confirmed in documentation). If this is not the case, the analysis would be based on incorrect data.\n• I assumed that \`FIRST_CLOSED_WON_DEAL_AMOUNT\` represents the primary deal value (it wasn't clear which deal type to use, e.g. \`DEAL_CLOSED_WON\`). This could lead to misinterpretation of the deal values.\n• I assumed that only merchants with \`INCLUDE_IN_REVENUE_REPORTING = TRUE\` should be analyzed, even though Elisa asked for \"every merchant\". This filter might exclude relevant merchants."
+  - Title: "HubSpot Data Field Assumptions"

 - Example #4
-  - Summary Message: "Elisa requested merchants with HubSpot deals under $10k. To do this, I made several critical assumptions that need verification: \n    1) Deal amount fields in TEAMS table actually originate from HubSpot (not explicitly confirmed in documentation)\n    2) FIRST_CLOSED_WON_DEAL_AMOUNT represents the primary deal value (not clear which deal type to consider)\n    3) Only merchants with INCLUDE_IN_REVENUE_REPORTING = TRUE should be analyzed (user asked for \"every merchant\" but I still used this filter)"
-  - Title: "HubSpot Deal Data Assumptions"
+  - Summary Message: "Nate requested recent returns for Retail Ready customers with Canadian shipping addresses. I tried to fulfill the request, but ran into the following issues: \n\n• I found no matching returns and intended to share this with Nate, but the conversation ended abruptly.\n• My conversation doesn't show that a final response was ever sent. I likely encountered an error and this chat should be reviewed."
+  - Title: "Failed to Respond"

 - Example #5
-  - Summary Message: "Nate requested recent returns for Retail Ready customers with Canadian shipping addresses. I found no matching returns, but the conversation ended without communicating this in a final response."
-  - Title: "No Results and Incomplete Response"
-
- Example #6
-  - Summary Message: "Marcell requested the total cost of labels paid for Target since they started using Resupply Inc. I found $0.00 in costs but made several critical assumptions that need validation: \n    1) The TOTAL_COST field in STG_SHIPMENT_INVOICES represents costs paid BY Resupply Inc rather than charged TO customers\n    2) STG_SHIPMENT_INVOICES properly joins to STG_FULFILLMENT_GROUPS via SHIPMENT_ID (this relationship isn't documented)"
+  - Summary Message: "Marcell requested the total cost of labels paid for Target since they started using Resupply Inc. To do this, I made a few assumptions that should be reviewed: \n\n• I assumed the \`TOTAL_COST\` field in \`STG_SHIPMENT_INVOICES\` represents costs paid by Resupply Inc rather than charged to customers. If this is incorrect, the cost calculation would be fundamentally wrong.\n• I assumed that \`STG_SHIPMENT_INVOICES\` properly joins to \`STG_FULFILLMENT_GROUPS\` via \`SHIPMENT_ID\` (this relationship isn’t documented). If this is an incorrect join, it could lead to mismatched or missing data."
  - Title: "Shipping Label Cost Assumptions"

+- Example #6
+  - Summary Message: "Tiffany requested a breakdown of Hint’s completed returns by type. To do this, I had to make the following assumptions: \n\n• I performed the analysis at the \`return line item\` level rather than \`return\` level. This could skew percentages if returns contain multiple line items with different resolution types. If incorrect, this won't reflect the true distribution of return types.\n• I excluded other return types (repair, green_return, managed, rejected, none) from the percentage calculation, affecting the denominator. Excluding these could misrepresent the proportions of return types that were included."
+  - Title: "Returns Level Assumptions"
+
 - Example #7
-  - Summary Message: "Tiffany requested a breakdown of Hint's completed returns by type. I provided analysis showing 72% refunds, 27.3% exchanges, and 0.7% store credit from 5,659 total returns. However, I made two significant assumptions: \n    1) Analysis was performed at the return line item level rather than return level, which could skew percentages if returns contain multiple line items with different resolution types\n    2) Other return types (repair, green_return, managed, rejected, none) were excluded from the percentage calculation, affecting the denominator."
-  - Title: "Hint Returns Analysis Assumptions"
+  - Summary Message: "Leslie requested all users and their referral_ids for the Northwest team. I tried to fulfill the request, but ran into an issue: \n\n• I provided a list of the requested users but was unable to deliver \`referral_ids\` as I was unable to identify any \`referral_ids\` in the database schema. This limitation prevents fulfilling the complete request."
+  - Title: "No Referral IDs Found"

 - Example #8
-  - Summary Message: "Leslie requested all users and their referral_ids for a specific team. I provided the users but could not deliver referral_ids as they are not available in the database schema."
-  - Title: "Partial Request Fulfillment Issue"
+  - Summary Message: "Jacob requested an overview of bike orders. To do this, I had to make the following assumptions: \n\n• I defined 'Bike orders' as orders containing at least one bike product (rather than bike-only orders or majority-bike orders). This definition might not align with an internal business definition for 'Bike orders' or what the user might've expected.\n• I calculated 'Average bikes per order' as mean bike quantities across bike-containing orders. This method might differ from how 'Average bikes per order' is typically calculated internally."
+  - Title: "Bike Order Definition Assumptions"

 - Example #9
-  - Summary Message: "Jacob requested bike order analysis. I had to make assumptions about two undocumented definitions: \n    1) I defined 'Bike orders' as orders containing at least one bike product (rather than bike-only orders or majority-bike orders), creating a new business segment\n    2) I calculated 'Average bikes per order' as mean bike quantities across bike-containing orders, establishing a new metric calculation method."
-  - Title: "New Bike Segment Definitions"
+  - Summary Message: "Savanna requested analysis distinguishing competitive vs non-competitive cyclists. To do this, I had to make a few assumptions that should be reviewed: \n\n• I assumed that the \`filter_purchase_motivation\` field accurately reflects cycling competitiveness level. If this field is not a reliable indicator, the analysis could be misleading.\n• I assumed that \`Fitness\`, \`Recreation\`, and \`Transportation\` motivations indicate non-competitive behavior. This categorization might not capture all nuances of cyclist behavior.\n• I assumed that purchase motivation correlates with actual cycling competitiveness. There might be other factors that influence competitiveness not captured by purchase motivation."
+  - Title: "Assumptions for Competitive Cyclist Classifications"

 - Example #10
-  - Summary Message: "Savanna requested analysis distinguishing competitive vs non-competitive cyclists. I made several assumptions about the filter_purchase_motivation field that require validation: \n    1) That purchase motivation accurately reflects cycling competitiveness level\n    2) That 'Fitness', 'Recreation', and 'Transportation' motivations indicate non-competitive behavior\n    3) That purchase motivation correlates with actual cycling competitiveness"
-  - Title: Purchase "Motivation Field Assumptions"
-
- Example #11
-  - Summary Message: "Blake requested shipping cost per bike calculations. I made two major assumptions: \n    1) Freight costs were allocated by dividing total freight costs by total bike quantities across all orders for each shipping method\n    2) 'Shipping cost per bike' was calculated as total freight divided by total bike quantities"
-  - Title: "Freight Cost Allocation Assumptions"
-
- Example #12
- Summary Message: "Landen asked for merchants with HubSpot deals under $10k. I assumed that the deal amount fields in TEAMS table actually originate from HubSpot (not explicitly confirmed in documentation) and that FIRST_CLOSED_WON_DEAL_AMOUNT represents the primary deal value (Landen didn't specify which deal type to consider)."
- Title: "HubSpot Data Undefined"
+  - Summary Message: "Landen requested a \"heat map of monthly sales by customer region\". I tried to fulfill the request, but ran into the following issues: \n\n• I was unable to deliver a heat map visualization because heat maps are not currently supported. Instead, I returned a table visualization and didn't clearly communicate why in my final response.\n• I assumed sales should be calculated as \`SUM(subtotal + taxamt + freight)\` from the \`sales_order_header\`. This method might differ from how sales is typically calculated internally or what Landen might've expected."
+  - Title: "Unsupported Chart and Undefined Sales Definition"
 </examples>
 `;
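
Review note: the message layout these prompts now ask for (first name and request, a transition sentence, two newlines, then "•" bullets separated by single newlines) can be sketched as a small formatter. This is only an illustration of the intended shape, not code from this commit; `SummaryParts` and `formatSummaryMessage` are hypothetical names, and the commit itself relies on the model to produce this layout.

```typescript
// Hypothetical helper illustrating the summary layout described in the prompts.
// Nothing here exists in the commit; all names are assumptions for illustration.
interface SummaryParts {
  firstName: string; // e.g. "Kevin"
  request: string; // e.g. "requested a total count of customers."
  transition: string; // e.g. "To fulfill this request, I had to make the following assumptions that need review:"
  issues: string[]; // one entry per assumption or issue
}

function formatSummaryMessage(parts: SummaryParts): string {
  const { firstName, request, transition, issues } = parts;
  if (issues.length === 0) {
    // The prompts require at least one bullet, even for a single issue.
    throw new Error("At least one issue or assumption is required");
  }
  // Each bullet starts with "•"; bullets are separated by a single newline.
  const bullets = issues.map((issue) => `• ${issue.trim()}`).join("\n");
  // Two newlines separate the transition sentence from the first bullet.
  return `${firstName} ${request} ${transition}\n\n${bullets}`;
}

// Usage with the ORDER_ID example from the prompt text.
const example = formatSummaryMessage({
  firstName: "Kevin",
  request: "requested a total count of customers.",
  transition:
    "To fulfill this request, I had to make the following assumptions that need review:",
  issues: [
    "I assumed the `ORDER_ID` field is the unique identifier for orders. If incorrect, this could lead to wrong order counts.",
  ],
});
console.log(example);
```

A deterministic formatter like this could also serve as a post-hoc check on the model's output, though whether that is worthwhile is a design decision outside this commit.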