mirror of https://github.com/buster-so/buster.git
changes to the search data catalog
parent 65ce4769b4
commit b47a03423d
@@ -44,14 +44,19 @@ struct RawLLMResponse {
 }
 
 const CATALOG_SEARCH_PROMPT: &str = r#"
-You are a dataset search assistant. You have access to a collection of datasets with their YML content.
-Your task is to identify all relevant datasets based on the following search request:
+You are a dataset search assistant tasked with finding highly relevant datasets that SPECIFICALLY match the user's requirements.
+Your task is to identify only the most relevant datasets based on the following search request:
 
 {queries_joined_with_newlines}
 
-Consider all queries collectively to determine relevance. These queries describe different aspects of the problem or question that needs to be answered.
+Evaluation Criteria:
+1. Direct Relevance: The dataset must directly address the core aspects of the search query
+2. Schema Alignment: The dataset's structure should contain fields that match the required information
+3. Data Coverage: The dataset should cover the specific domain or business context mentioned
+4. Recency & Quality: Prefer datasets with complete metadata and documentation
+
 The YML content contains important information about the dataset including its schema, description, and other metadata.
-Use this information to determine if the dataset would be relevant to answering the queries.
+Only include datasets that meet AT LEAST 3 of the above criteria with high confidence.
 
 IMPORTANT: You must return your response in this exact JSON format:
 {
@@ -67,11 +72,12 @@ Available datasets:
 {datasets_array_as_json}
 
 Requirements:
-1. Return all relevant datasets (no limit)
+1. Return ONLY datasets that are highly relevant (meeting 3+ criteria)
 2. Order results from most to least relevant
 3. ALWAYS include the "results" key in your response, even if the array is empty
 4. Each result MUST ONLY include the "id" field containing the UUID string
-5. If no datasets are relevant, return {"results": []}
+5. If no datasets meet the relevance criteria, return {"results": []}
+6. Exclude datasets that only tangentially relate to the query
 "#;
 
 pub struct SearchDataCatalogTool {
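The prompt in this diff carries two placeholders, {queries_joined_with_newlines} and {datasets_array_as_json}. The code that fills them is not part of the diff, so the following is only a minimal sketch of one way it could be done, assuming plain string substitution. PROMPT_TEMPLATE is a shortened, hypothetical stand-in for CATALOG_SEARCH_PROMPT, and build_prompt is an illustrative helper, not a function from the repository.

```rust
// Hypothetical, shortened stand-in for CATALOG_SEARCH_PROMPT. Only the two
// placeholder names are taken from the diff; the rest is abbreviated.
const PROMPT_TEMPLATE: &str = r#"
Your task is to identify only the most relevant datasets based on the following search request:

{queries_joined_with_newlines}

Available datasets:
{datasets_array_as_json}
"#;

/// Fill both placeholders using plain string replacement.
///
/// Assumption: `datasets_json` is an already-serialized JSON array.
/// Caveat: the replacements run in sequence, so a query that itself contains
/// the text `{datasets_array_as_json}` would also be substituted.
fn build_prompt(queries: &[&str], datasets_json: &str) -> String {
    PROMPT_TEMPLATE
        .replace("{queries_joined_with_newlines}", &queries.join("\n"))
        .replace("{datasets_array_as_json}", datasets_json)
}

fn main() {
    // Illustrative inputs only; the id and YML content are made up.
    let queries = ["monthly revenue by region", "customer churn in 2024"];
    let datasets =
        r#"[{"id": "00000000-0000-0000-0000-000000000001", "yml": "name: revenue"}]"#;

    let prompt = build_prompt(&queries, datasets);
    println!("{}", prompt);
}
```

Plain replacement, rather than format!, fits here because the real prompt contains literal JSON braces such as {"results": []}, which format! would require escaping.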
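The revised prompt asks the model to include a dataset only when it meets at least 3 of the 4 evaluation criteria. In the tool that judgment is made by the LLM, not by code, so the sketch below only spells out the threshold rule as arithmetic. CriteriaCheck and its methods are hypothetical names introduced for illustration.

```rust
/// The four evaluation criteria named in the revised prompt.
/// Hypothetical type: the real tool leaves this judgment to the LLM.
#[derive(Debug, Clone, Copy)]
struct CriteriaCheck {
    direct_relevance: bool,
    schema_alignment: bool,
    data_coverage: bool,
    recency_and_quality: bool,
}

impl CriteriaCheck {
    /// Number of criteria that are satisfied.
    fn met(&self) -> usize {
        [
            self.direct_relevance,
            self.schema_alignment,
            self.data_coverage,
            self.recency_and_quality,
        ]
        .iter()
        .filter(|&&satisfied| satisfied)
        .count()
    }

    /// The prompt's inclusion rule: at least 3 of the 4 criteria.
    fn is_highly_relevant(&self) -> bool {
        self.met() >= 3
    }
}

fn main() {
    // Meets three criteria, so it would be included.
    let strong = CriteriaCheck {
        direct_relevance: true,
        schema_alignment: true,
        data_coverage: true,
        recency_and_quality: false,
    };
    // Meets only two, so it counts as tangential and would be excluded.
    let tangential = CriteriaCheck {
        direct_relevance: true,
        schema_alignment: false,
        data_coverage: false,
        recency_and_quality: true,
    };

    println!("strong: {}", strong.is_highly_relevant());
    println!("tangential: {}", tangential.is_highly_relevant());
}
```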