26 KiB
Search REST Endpoint
Overview
This PRD outlines the implementation of a new REST endpoint for search functionality. The endpoint will allow users to search for various asset types using a query string and provide options to filter by asset type and limit the number of results. This will complement the existing WebSocket-based search functionality while following our REST API standards.
Problem Statement
Currently, search functionality is only available through WebSocket connections. To provide a more consistent API experience and follow REST best practices, we need to create a dedicated REST endpoint for search operations that:
- Accepts search queries similar to the WebSocket implementation
- Allows filtering by asset types using an array format instead of individual exclude flags
- Returns a consistent response format that matches our existing search results
- Handles empty queries by returning the most recent records for specified asset types
- Properly handles permissions to ensure users only see assets they have access to
Goals
- Create a new search REST endpoint that provides feature parity with the WebSocket implementation
- Implement a new search library to centralize search logic and reduce code duplication
- Ensure proper error handling and consistent response formats
- Follow best practices for REST API design and handler implementation
- Ensure optimal performance for search operations
Technical Design
New Search Library Structure
We'll create a new library to centralize search functionality:
libs/
└── search/
├── Cargo.toml
└── src/
├── lib.rs
├── types.rs
└── search.rs
Cargo.toml
[package]
name = "search"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = { workspace = true }
chrono = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tokio = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
sqlx = { workspace = true }
tokio-stream = { workspace = true }
database = { path = "../database" }
Types (types.rs)
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
#[derive(Serialize, Debug)]
pub struct MessageSearchResult {
pub id: Uuid,
#[serde(rename = "name")]
pub title: String,
pub summary_question: String,
pub updated_at: DateTime<Utc>,
pub highlights: Vec<String>,
pub score: f64,
#[serde(rename = "type")]
pub type_: SearchObjectType,
}
#[derive(Serialize, Debug)]
pub struct GenericSearchResult {
pub id: Uuid,
pub name: String,
pub updated_at: DateTime<Utc>,
pub highlights: Vec<String>,
pub score: f64,
#[serde(rename = "type")]
pub type_: SearchObjectType,
}
#[derive(Serialize, Debug)]
#[serde(untagged)]
pub enum SearchObject {
Message(MessageSearchResult),
Collection(GenericSearchResult),
Dashboard(GenericSearchResult),
DataSource(GenericSearchResult),
Dataset(GenericSearchResult),
PermissionGroup(GenericSearchResult),
Team(GenericSearchResult),
Term(GenericSearchResult),
}
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(rename_all = "snake_case")]
pub enum SearchObjectType {
Thread,
Collection,
Dashboard,
DataSource,
Dataset,
PermissionGroup,
Team,
Term,
}
impl ToString for SearchObjectType {
fn to_string(&self) -> String {
match self {
SearchObjectType::Thread => "thread".to_string(),
SearchObjectType::Collection => "collection".to_string(),
SearchObjectType::Dashboard => "dashboard".to_string(),
SearchObjectType::DataSource => "data_source".to_string(),
SearchObjectType::Dataset => "dataset".to_string(),
SearchObjectType::PermissionGroup => "permission_group".to_string(),
SearchObjectType::Team => "team".to_string(),
SearchObjectType::Term => "term".to_string(),
}
}
}
#[derive(Debug)]
pub struct SearchOptions {
pub num_results: i64,
pub asset_types: Vec<SearchObjectType>,
}
impl Default for SearchOptions {
fn default() -> Self {
SearchOptions {
num_results: 10,
asset_types: vec![],
}
}
}
impl SearchOptions {
pub fn new() -> Self {
Self::default()
}
pub fn with_custom_options(num_results: i64, asset_types: Vec<SearchObjectType>) -> Self {
SearchOptions {
num_results,
asset_types,
}
}
pub fn asset_types_to_string(&self) -> String {
if self.asset_types.is_empty() {
// If no asset types specified, include all types
return "'thread', 'collection', 'dashboard', 'data_source', 'dataset', 'permission_group', 'team', 'term'".to_string();
}
self.asset_types
.iter()
.map(|t| format!("'{}'", t.to_string()))
.collect::<Vec<_>>()
.join(",")
}
}
Search Implementation (search.rs)
use anyhow::{anyhow, Result};
use chrono::Utc;
use sqlx::Row;
use tokio_stream::StreamExt;
use uuid::Uuid;
use database::pool::get_sqlx_pool;
use crate::types::{
GenericSearchResult, MessageSearchResult, SearchObject, SearchObjectType, SearchOptions,
};
pub async fn search(
user_id: Uuid,
organization_id: Uuid,
query: String,
options: SearchOptions,
) -> Result<Vec<SearchObject>> {
if query.is_empty() {
return list_recent_assets(user_id, options).await;
}
let mut conn = get_sqlx_pool().acquire().await?;
let search_terms: Vec<String> = query
.split_whitespace()
.map(|term| sanitize_search_term(term.to_lowercase()))
.collect();
let query = format!(
r#"
SELECT DISTINCT ON (asset_search.content, asset_search.asset_type)
asset_search.asset_id,
asset_search.content,
asset_search.updated_at,
asset_search.asset_type,
pgroonga_score(asset_search.tableoid, asset_search.ctid) AS rank
FROM
asset_search
INNER JOIN
asset_permissions
ON
asset_search.asset_id = asset_permissions.asset_id
WHERE
asset_search.asset_type IN ({})
AND asset_search.content &@~ '{}'
AND (asset_permissions.identity_id = '{}' OR asset_permissions.identity_id = '{}')
AND asset_permissions.deleted_at IS NULL
AND asset_search.deleted_at IS NULL
ORDER BY asset_search.content, asset_search.asset_type, rank DESC
LIMIT {};
"#,
options.asset_types_to_string(),
search_terms
.iter()
.map(|term| term.replace('\'', "''"))
.collect::<Vec<_>>()
.join(" OR "),
user_id,
organization_id,
options.num_results
);
let mut results = sqlx::raw_sql(&query).fetch(&mut *conn);
let mut results_vec = Vec::new();
while let Some(row) = results.try_next().await? {
let content: String = match row.try_get("content") {
Ok(content) => content,
Err(e) => return Err(anyhow!("Error getting content: {:?}", e)),
};
// Skip empty content
if content.trim().is_empty() {
continue;
}
let id: Uuid = match row.try_get("asset_id") {
Ok(id) => id,
Err(e) => return Err(anyhow!("Error getting asset_id: {:?}", e)),
};
let updated_at = match row.try_get("updated_at") {
Ok(updated_at) => updated_at,
Err(e) => return Err(anyhow!("Error getting updated_at: {:?}", e)),
};
let asset_type: String = match row.try_get("asset_type") {
Ok(asset_type) => asset_type,
Err(e) => return Err(anyhow!("Error getting asset_type: {:?}", e)),
};
let rank: f64 = match row.try_get("rank") {
Ok(rank) => rank,
Err(_) => 0.0,
};
let highlights = find_highlights(&content, &search_terms);
let search_object = match asset_type.as_str() {
"thread" => {
let content_json: serde_json::Value =
serde_json::from_str(&content).unwrap_or_default();
let title = content_json["title"]
.as_str()
.unwrap_or("Untitled Thread")
.to_string();
let summary_question = content_json["summary_question"]
.as_str()
.unwrap_or("")
.to_string();
SearchObject::Message(MessageSearchResult {
id,
title,
summary_question,
updated_at,
highlights,
score: rank,
type_: SearchObjectType::Thread,
})
}
"collection" => SearchObject::Collection(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::Collection,
}),
"dashboard" => SearchObject::Dashboard(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::Dashboard,
}),
"data_source" => SearchObject::DataSource(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::DataSource,
}),
"dataset" => SearchObject::Dataset(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::Dataset,
}),
"permission_group" => SearchObject::PermissionGroup(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::PermissionGroup,
}),
"team" => SearchObject::Team(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::Team,
}),
"term" => SearchObject::Term(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights,
score: rank,
type_: SearchObjectType::Term,
}),
_ => continue,
};
results_vec.push(search_object);
}
// Only filter when we have a query
if !query.is_empty() {
results_vec = results_vec
.into_iter()
.filter(|result| match result {
SearchObject::Message(message) => !message.highlights.is_empty(),
SearchObject::Collection(collection) => !collection.highlights.is_empty(),
SearchObject::Dashboard(dashboard) => !dashboard.highlights.is_empty(),
SearchObject::DataSource(data_source) => !data_source.highlights.is_empty(),
SearchObject::Dataset(dataset) => !dataset.highlights.is_empty(),
SearchObject::PermissionGroup(permission_group) => {
!permission_group.highlights.is_empty()
}
SearchObject::Team(team) => !team.highlights.is_empty(),
SearchObject::Term(term) => !term.highlights.is_empty(),
})
.collect();
}
Ok(results_vec)
}
pub async fn list_recent_assets(
user_id: Uuid,
options: SearchOptions,
) -> Result<Vec<SearchObject>> {
let mut conn = get_sqlx_pool().acquire().await?;
// Default to 50 results if not specified for empty query listing
let num_results = if options.num_results <= 0 { 50 } else { options.num_results };
let query = format!(
r#"
WITH distinct_assets AS (
SELECT DISTINCT ON (content, asset_type)
asset_search.asset_id,
asset_search.content,
asset_search.updated_at,
asset_search.asset_type
FROM
asset_search
INNER JOIN
asset_permissions
ON
asset_search.asset_id = asset_permissions.asset_id
WHERE
asset_search.asset_type IN ({})
AND (asset_permissions.identity_id = '{}')
AND asset_search.deleted_at IS NULL
AND asset_permissions.deleted_at IS NULL
)
SELECT *
FROM distinct_assets
ORDER BY updated_at DESC
LIMIT {};
"#,
options.asset_types_to_string(),
user_id,
num_results
);
let mut results = sqlx::raw_sql(&query).fetch(&mut *conn);
let mut results_vec = Vec::new();
while let Some(row) = results.try_next().await? {
let id: Uuid = match row.try_get("asset_id") {
Ok(id) => id,
Err(e) => return Err(anyhow!("Error getting asset_id: {:?}", e)),
};
let content: String = match row.try_get("content") {
Ok(content) => content,
Err(e) => return Err(anyhow!("Error getting content: {:?}", e)),
};
// Skip empty content
if content.trim().is_empty() {
continue;
}
let updated_at = match row.try_get("updated_at") {
Ok(updated_at) => updated_at,
Err(e) => return Err(anyhow!("Error getting updated_at: {:?}", e)),
};
let asset_type: String = match row.try_get("asset_type") {
Ok(asset_type) => asset_type,
Err(e) => return Err(anyhow!("Error getting asset_type: {:?}", e)),
};
let search_object = match asset_type.as_str() {
"thread" => {
let content_json: serde_json::Value =
serde_json::from_str(&content).unwrap_or_default();
let title = content_json["title"]
.as_str()
.unwrap_or("Untitled Thread")
.to_string();
let summary_question = content_json["summary_question"]
.as_str()
.unwrap_or("")
.to_string();
SearchObject::Message(MessageSearchResult {
id,
title,
summary_question,
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::Thread,
})
}
"collection" => SearchObject::Collection(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::Collection,
}),
"dashboard" => SearchObject::Dashboard(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::Dashboard,
}),
"data_source" => SearchObject::DataSource(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::DataSource,
}),
"dataset" => SearchObject::Dataset(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::Dataset,
}),
"permission_group" => SearchObject::PermissionGroup(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::PermissionGroup,
}),
"team" => SearchObject::Team(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::Team,
}),
"term" => SearchObject::Term(GenericSearchResult {
id,
name: extract_name_from_content(&content),
updated_at,
highlights: vec![],
score: 0.0,
type_: SearchObjectType::Term,
}),
_ => continue,
};
results_vec.push(search_object);
}
Ok(results_vec)
}
fn extract_name_from_content(content: &str) -> String {
match serde_json::from_str::<serde_json::Value>(content) {
Ok(json) => {
if let Some(name) = json["name"].as_str() {
return name.to_string();
}
if let Some(title) = json["title"].as_str() {
return title.to_string();
}
"Untitled".to_string()
}
Err(_) => "Untitled".to_string(),
}
}
fn find_highlights(content: &str, search_terms: &[String]) -> Vec<String> {
let mut highlights = Vec::new();
// Try to parse the content as JSON first
if let Ok(json) = serde_json::from_str::<serde_json::Value>(content) {
// Convert the JSON back to a string for highlighting
let content_str = json.to_string().to_lowercase();
for term in search_terms {
if content_str.contains(term) {
// Here you would extract context around the match
let term_start = content_str.find(term).unwrap_or(0);
let context_start = term_start.saturating_sub(20);
let context_end = (term_start + term.len() + 20).min(content_str.len());
let highlight = content_str[context_start..context_end].to_string();
highlights.push(highlight);
}
}
} else {
// If not JSON, treat as plain text
let content_lower = content.to_lowercase();
for term in search_terms {
if content_lower.contains(term) {
let term_start = content_lower.find(term).unwrap_or(0);
let context_start = term_start.saturating_sub(20);
let context_end = (term_start + term.len() + 20).min(content_lower.len());
let highlight = content_lower[context_start..context_end].to_string();
highlights.push(highlight);
}
}
}
highlights
}
fn sanitize_search_term(term: String) -> String {
// Remove special characters that might interfere with the search
let term = term.replace(['(', ')', '[', ']', '{', '}', '\\', '*', '+', '.', '?', '^', '$', '|'], "");
// If the term is now empty, use a default that will match nothing
if term.is_empty() {
return "NOMATCHPOSSIBLE".to_string();
}
term
}
Library Entry Point (lib.rs)
pub mod types;
pub mod search;
pub use types::{
SearchObject, SearchObjectType, SearchOptions,
MessageSearchResult, GenericSearchResult
};
pub use search::{search, list_recent_assets};
Handler Implementation
Create a new handler for the search functionality in libs/handlers/src/search/search_handler.rs
:
use anyhow::Result;
use uuid::Uuid;
use search::{SearchObject, SearchObjectType, SearchOptions, search as search_lib};
use crate::users::user_utils::get_user_organization_id;
pub async fn search_handler(
user_id: Uuid,
query: String,
num_results: Option<i64>,
asset_types: Option<Vec<SearchObjectType>>,
) -> Result<Vec<SearchObject>> {
let num_results = num_results.unwrap_or(50);
let asset_types = asset_types.unwrap_or_else(Vec::new);
let options = SearchOptions::with_custom_options(num_results, asset_types);
let user_organization_id = get_user_organization_id(&user_id).await?;
let results = search_lib(user_id, user_organization_id, query, options).await?;
Ok(results)
}
Update libs/handlers/src/search/mod.rs
:
mod search_handler;
pub use search_handler::search_handler;
REST Endpoint Implementation
Create a new REST endpoint at api/src/routes/rest/routes/search/mod.rs
:
use axum::{
routing::get,
Router,
};
mod search;
pub fn router() -> Router {
Router::new()
.route("/", get(search::search))
}
Create the handler at api/src/routes/rest/routes/search/search.rs
:
use axum::{
extract::Query,
Json,
};
use serde::Deserialize;
use uuid::Uuid;
use handlers::search::search_handler;
use middleware::AuthenticatedUser;
use search::{SearchObject, SearchObjectType};
use crate::{
routes::rest::ApiResponse,
utils::error_handling::AppError,
};
#[derive(Debug, Deserialize)]
pub struct SearchQuery {
pub query: String,
pub num_results: Option<i64>,
pub asset_types: Option<Vec<SearchObjectType>>,
}
pub async fn search(
user: AuthenticatedUser,
Query(params): Query<SearchQuery>,
) -> Result<Json<ApiResponse<Vec<SearchObject>>>, AppError> {
let results = search_handler(
user.id,
params.query,
params.num_results,
params.asset_types,
)
.await
.map_err(AppError::internal_server_error)?;
Ok(Json(ApiResponse::success(results)))
}
Adding to Router
Update the main REST router in api/src/routes/rest/routes/mod.rs
to include the search router:
// Existing imports
use search::router as search_router;
pub fn router() -> Router {
Router::new()
// Existing routes
.nest("/search", search_router())
// Other routes
}
Implementation Plan
The implementation will be divided into the following phases:
Phase 1: Create Search Library
- Create the new search library structure in
libs/search/
- Implement the search functionality in the library
- Write unit tests for the library functions
- Test the library with existing WebSocket search functionality
Phase 2: Implement Search Handler
- Create the search handler in
libs/handlers/src/search/
- Write unit tests for the handler
- Ensure proper error handling and response formatting
Phase 3: Implement REST Endpoint
- Create the REST endpoint in
api/src/routes/rest/routes/search/
- Add the endpoint to the main REST router
- Write integration tests for the endpoint
- Test the endpoint with various search parameters
Phase 4: Documentation and Testing
- Document the new endpoint in API documentation
- Write comprehensive tests for edge cases
- Perform performance testing
- Deploy to staging for QA testing
Testing Strategy
Unit Tests
-
Search Library Tests
- Test search with empty query returns most recent items
- Test search with query returns matching items
- Test search with different asset type filters
- Test search with null/empty asset types array
- Test search with various num_results values
- Test search with special characters and SQL injection attempts
- Test error handling for database connection failures
-
Handler Tests
- Test handler correctly processes parameters
- Test handler handles errors appropriately
- Test handler properly applies permissions
- Test handler returns correct result format
Integration Tests
-
REST Endpoint Tests
- Test endpoint returns 200 for valid requests
- Test endpoint returns appropriate errors for invalid requests
- Test authentication requirements
- Test with various query parameters
- Test performance with large result sets
- Test concurrency with multiple simultaneous requests
-
End-to-End Tests
- Test searches across all asset types
- Test results match expected format and content
- Test pagination behavior with num_results
- Test error scenarios return appropriate status codes and messages
Performance Tests
-
Test endpoint response time with:
- Large number of assets
- Complex queries
- Multiple concurrent requests
- Various combinations of asset types
-
Ensure search performance meets requirements:
- Queries complete in < 1 second for typical loads
- System handles concurrent search requests without degradation
Acceptance Criteria
- The REST endpoint successfully processes search requests with the specified parameters
- Empty queries return the most recent assets of specified types (or all types if none specified)
- Results are properly filtered by user permissions
- Response format matches the specified SearchObject structure
- All unit and integration tests pass
- Performance meets specified requirements
- Documentation is complete and accurate
Appendix
Related Documentation
- @[api/documentation/rest.mdc] - REST API guidelines
- @[api/documentation/handlers.mdc] - Handler implementation guidelines
- @[api/documentation/libs.mdc] - Library structure guidelines
Dependencies
- Existing search engine utilities
- Permissions system
- WebSocket search implementation (as reference)