mirror of https://github.com/buster-so/buster.git
changes to the search data catalog
parent 65ce4769b4
commit b47a03423d
@@ -44,14 +44,19 @@ struct RawLLMResponse {
}

const CATALOG_SEARCH_PROMPT: &str = r#"
You are a dataset search assistant. You have access to a collection of datasets with their YML content.
Your task is to identify all relevant datasets based on the following search request:
You are a dataset search assistant tasked with finding highly relevant datasets that SPECIFICALLY match the user's requirements.
Your task is to identify only the most relevant datasets based on the following search request:

{queries_joined_with_newlines}

Consider all queries collectively to determine relevance. These queries describe different aspects of the problem or question that needs to be answered.
Evaluation Criteria:
1. Direct Relevance: The dataset must directly address the core aspects of the search query
2. Schema Alignment: The dataset's structure should contain fields that match the required information
3. Data Coverage: The dataset should cover the specific domain or business context mentioned
4. Recency & Quality: Prefer datasets with complete metadata and documentation

The YML content contains important information about the dataset including its schema, description, and other metadata.
Use this information to determine if the dataset would be relevant to answering the queries.
Only include datasets that meet AT LEAST 3 of the above criteria with high confidence.

IMPORTANT: You must return your response in this exact JSON format:
{
@@ -67,11 +72,12 @@ Available datasets:
{datasets_array_as_json}

Requirements:
1. Return all relevant datasets (no limit)
1. Return ONLY datasets that are highly relevant (meeting 3+ criteria)
2. Order results from most to least relevant
3. ALWAYS include the "results" key in your response, even if the array is empty
4. Each result MUST ONLY include the "id" field containing the UUID string
5. If no datasets are relevant, return {"results": []}
5. If no datasets meet the relevance criteria, return {"results": []}
6. Exclude datasets that only tangentially relate to the query
"#;

pub struct SearchDataCatalogTool {
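
The prompt in this diff carries two placeholders, `{queries_joined_with_newlines}` and `{datasets_array_as_json}`. The code that fills them is not part of the diff, so the following is only a minimal sketch of one way it could be done with plain string replacement. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, the template is a shortened stand-in for `CATALOG_SEARCH_PROMPT`, and the dataset JSON is assumed to be serialized already.

```rust
// Hypothetical, shortened stand-in for CATALOG_SEARCH_PROMPT; only the two
// placeholder names and the "Available datasets:" label come from the diff.
const PROMPT_TEMPLATE: &str = "Search request:\n{queries_joined_with_newlines}\n\nAvailable datasets:\n{datasets_array_as_json}\n";

/// Hypothetical helper: fills both placeholders via plain string replacement.
/// Assumes `datasets_json` is an already-serialized JSON array.
fn build_prompt(template: &str, queries: &[&str], datasets_json: &str) -> String {
    template
        // One query per line, as the placeholder name suggests.
        .replace("{queries_joined_with_newlines}", &queries.join("\n"))
        .replace("{datasets_array_as_json}", datasets_json)
}

fn main() {
    let queries = ["monthly revenue by region", "customer churn"];
    // Placeholder dataset entry with an all-zero UUID, for illustration only.
    let datasets_json = r#"[{"id":"00000000-0000-0000-0000-000000000000"}]"#;
    let prompt = build_prompt(PROMPT_TEMPLATE, &queries, datasets_json);
    println!("{prompt}");
}
```

Because the replacements are chained, a query that itself contains the literal text `{datasets_array_as_json}` would also be substituted. A real implementation may want to guard against that or use a templating approach instead.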