Remove summary from web search

wip
This commit is contained in:
marko-kraemer 2025-04-25 23:13:47 +01:00
parent 0a97b43beb
commit f48439eade
8 changed files with 123 additions and 156 deletions

@@ -109,24 +109,36 @@ You have the ability to execute operations using both Python and CLI tools:
## 3.2 CLI OPERATIONS BEST PRACTICES
- Use terminal commands for system operations, file manipulations, and quick tasks
- For command execution, you have two approaches:
1. Regular Commands (non-blocking):
* Use for quick operations, package installation, and system tasks
* Commands run directly without TMUX
* Example: `<execute-command>ls -l</execute-command>`
1. Synchronous Commands (blocking):
* Use for quick operations that complete within 60 seconds
* Commands run directly and wait for completion
* Example: `<execute-command session_name="default">ls -l</execute-command>`
* IMPORTANT: Do not use for long-running operations as they will timeout after 60 seconds
2. TMUX Commands (for blocking/long-running operations):
* Use TMUX for any command that might block or timeout
* Create a new TMUX session for each blocking operation
* Use proper TMUX commands for session management
* Example: `<execute-command>tmux new-session -d -s mysession "cd /workspace && npm run dev"</execute-command>`
2. Asynchronous Commands (non-blocking):
* Use run_async="true" for any command that might take longer than 60 seconds
* Commands run in background and return immediately
* Example: `<execute-command session_name="dev" run_async="true">npm run dev</execute-command>`
* Common use cases:
- Development servers (Next.js, React, etc.)
- Build processes
- Long-running data processing
- Background services
- TMUX Session Management:
* For any command that might block or timeout, wrap it in a TMUX session
* Use `tmux new-session -d -s session_name "command"` to start
* Use `tmux list-sessions` to check status
* Use `tmux capture-pane -pt session_name` to get output
* Use `tmux kill-session -t session_name` to stop
* Always clean up TMUX sessions when done
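The session lifecycle above can be captured in a small helper that only formats the tmux invocations; the helper name and its action keys are illustrative, not part of the tool API:

```python
def tmux_cmd(action: str, session: str, command: str = "") -> str:
    """Format the tmux session-management commands listed above (sketch)."""
    commands = {
        "start": f'tmux new-session -d -s {session} "{command}"',
        "status": "tmux list-sessions",
        "output": f"tmux capture-pane -pt {session}",  # -p prints the pane, -t targets the session
        "stop": f"tmux kill-session -t {session}",
    }
    return commands[action]
```

For example, `tmux_cmd("start", "dev", "cd /workspace && npm run dev")` reproduces the development-server example above.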
- Session Management:
* Each command must specify a session_name
* Use consistent session names for related commands
* Different sessions are isolated from each other
* Example: Use "build" session for build commands, "dev" for development servers
* Sessions maintain state between commands
- Command Execution Guidelines:
* For commands that might take longer than 60 seconds, ALWAYS use run_async="true"
* Do not rely on increasing timeout for long-running commands
* Use proper session names for organization
* Chain commands with && for sequential execution
* Use | for piping output between commands
* Redirect output to files for long-running processes
- Avoid commands requiring confirmation; actively use -y or -f flags for automatic confirmation
- Avoid commands with excessive output; save to files when necessary
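The synchronous/asynchronous split described above can be sketched with the standard library; this illustrates the blocking-versus-background semantics, not the `execute-command` tool itself:

```python
import subprocess

def run_sync(command: str, timeout: int = 60) -> subprocess.CompletedProcess:
    """Blocking: waits for the command, raising TimeoutExpired past the limit."""
    return subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)

def run_background(command: str) -> subprocess.Popen:
    """Non-blocking: returns immediately while the command keeps running."""
    return subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            text=True)
```

A dev server would go through `run_background` (or a tmux session), while `ls -l` fits `run_sync` comfortably inside the 60-second window.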
@@ -288,6 +300,40 @@ You have the ability to execute operations using both Python and CLI tools:
5. If results are unclear, create additional verification steps
## 4.4 WEB SEARCH & CONTENT EXTRACTION
- Research Best Practices:
1. ALWAYS use a multi-source approach for thorough research:
* Start with web-search to find relevant URLs and sources
* Use scrape-webpage on URLs from web-search results to get detailed content
* Utilize data providers for real-time, accurate data when available
* Only use browser tools when scrape-webpage fails or interaction is needed
2. Data Provider Priority:
* ALWAYS check if a data provider exists for your research topic
* Use data providers as the primary source when available
* Data providers offer real-time, accurate data for:
- LinkedIn data
- Twitter data
- Zillow data
- Amazon data
- Yahoo Finance data
- Active Jobs data
* Only fall back to web search when no data provider is available
3. Research Workflow:
a. First check for relevant data providers
b. If no data provider exists:
- Use web-search to find relevant URLs
- Use scrape-webpage on URLs from web-search results
- Only if scrape-webpage fails or if the page requires interaction:
* Use direct browser tools (browser_navigate_to, browser_go_back, browser_wait, browser_click_element, browser_input_text, browser_send_keys, browser_switch_tab, browser_close_tab, browser_scroll_down, browser_scroll_up, browser_scroll_to_text, browser_get_dropdown_options, browser_select_dropdown_option, browser_drag_drop, browser_click_coordinates etc.)
* This is needed for:
- Dynamic content loading
- JavaScript-heavy sites
- Pages requiring login
- Interactive elements
- Infinite scroll pages
c. Cross-reference information from multiple sources
d. Verify data accuracy and freshness
e. Document sources and timestamps
- Web Search Best Practices:
1. Use specific, targeted search queries to obtain the most relevant results
2. Include key terms and contextual information in search queries
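The provider-first workflow in 4.4 can be sketched as a dispatcher; every callable here is a stand-in for the real tools, and the fallback order mirrors steps a–b above:

```python
def research(topic: str, data_providers: dict, web_search, scrape_webpage, browse):
    """Sketch of the provider-first research workflow (all callables are stubs)."""
    # a. Prefer a data provider when one covers the topic
    provider = data_providers.get(topic)
    if provider is not None:
        return provider(topic)
    # b. Otherwise search, then scrape each returned URL
    pages = []
    for url in web_search(topic):
        page = scrape_webpage(url)
        # Fall back to direct browser tools only when scraping fails
        pages.append(page if page is not None else browse(url))
    return pages
```

Cross-referencing, freshness checks, and source documentation (steps c–e) would wrap around this dispatch rather than replace it.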
@@ -307,7 +353,7 @@ You have the ability to execute operations using both Python and CLI tools:
* Interactive elements
* Infinite scroll pages
4. DO NOT use browser tools directly unless scrape-webpage fails or interaction is required
5. Maintain this strict workflow order: web-search → scrape-webpage → browser tools (if needed)
5. Maintain this strict workflow order: web-search → scrape-webpage → direct browser tools (if needed)
6. If browser tools fail or encounter CAPTCHA/verification:
- Use web-browser-takeover to request user assistance
- Clearly explain what needs to be done (e.g., solve CAPTCHA)
@@ -328,12 +374,6 @@ You have the ability to execute operations using both Python and CLI tools:
4. Provide timestamp context when sharing web search information
5. Specify date ranges when searching for time-sensitive topics
- Search Result Analysis:
1. Compare multiple sources for fact verification
2. Evaluate source credibility based on domain, publication type
3. Extract key information from search result summaries
4. Deeply analyze content from high-relevance results
5. Synthesize information from multiple search results
- Results Limitations:
1. Acknowledge when content is not accessible or behind paywalls

@@ -67,35 +67,6 @@ class SandboxFilesTool(SandboxToolsBase):
print(f"Error getting workspace state: {str(e)}")
return {}
async def _ensure_sandbox(self) -> Sandbox:
"""Ensure we have a valid sandbox instance, retrieving it from the project if needed."""
if self._sandbox is None:
try:
# Get database client
client = await self.thread_manager.db.client
# Get project data
project = await client.table('projects').select('*').eq('project_id', self.project_id).execute()
if not project.data or len(project.data) == 0:
raise ValueError(f"Project {self.project_id} not found")
project_data = project.data[0]
sandbox_info = project_data.get('sandbox', {})
if not sandbox_info.get('id'):
raise ValueError(f"No sandbox found for project {self.project_id}")
# Store sandbox info
self._sandbox_id = sandbox_info['id']
self._sandbox_pass = sandbox_info.get('pass')
self._sandbox_url = sandbox_info.get('sandbox_url')
# Get or start the sandbox
self._sandbox = await get_or_start_sandbox(self._sandbox_id)
except Exception as e:
logger.error(f"Error retrieving sandbox for project {self.project_id}: {str(e)}", exc_info=True)
raise e
def _get_preview_url(self, file_path: str) -> Optional[str]:
"""Get the preview URL for a file if it's an HTML file."""

@@ -41,11 +41,11 @@ class WebSearchTool(Tool):
"type": "string",
"description": "The search query to find relevant web pages. Be specific and include key terms to improve search accuracy. For best results, use natural language questions or keyword combinations that precisely describe what you're looking for."
},
"summary": {
"type": "boolean",
"description": "Whether to include a summary of each search result. Summaries provide key context about each page without requiring full content extraction. Set to true to get concise descriptions of each result.",
"default": True
},
# "summary": {
# "type": "boolean",
# "description": "Whether to include a summary of each search result. Summaries provide key context about each page without requiring full content extraction. Set to true to get concise descriptions of each result.",
# "default": True
# },
"num_results": {
"type": "integer",
"description": "The number of search results to return. Increase for more comprehensive research or decrease for focused, high-relevance results.",
@@ -60,7 +60,7 @@ class WebSearchTool(Tool):
tag_name="web-search",
mappings=[
{"param_name": "query", "node_type": "attribute", "path": "."},
{"param_name": "summary", "node_type": "attribute", "path": "."},
# {"param_name": "summary", "node_type": "attribute", "path": "."},
{"param_name": "num_results", "node_type": "attribute", "path": "."}
],
example='''
@@ -71,21 +71,18 @@ class WebSearchTool(Tool):
The tool returns information including:
- Titles of relevant web pages
- URLs for accessing the pages
- Summaries of page content (if summary=true)
- Published dates (when available)
-->
<!-- Simple search example -->
<web-search
query="current weather in New York City"
summary="true"
num_results="20">
</web-search>
<!-- Another search example -->
<web-search
query="healthy breakfast recipes"
summary="true"
num_results="20">
</web-search>
'''
@@ -93,7 +90,7 @@ class WebSearchTool(Tool):
async def web_search(
self,
query: str,
summary: bool = True,
# summary: bool = True,
num_results: int = 20
) -> ToolResult:
"""
@@ -140,13 +137,13 @@ class WebSearchTool(Tool):
"url": result.get("url", ""),
}
if summary:
# Prefer full content; fall back to description
formatted_result["snippet"] = (
result.get("content") or
result.get("description") or
""
)
# if summary:
# # Prefer full content; fall back to description
# formatted_result["snippet"] = (
# result.get("content") or
# result.get("description") or
# ""
# )
formatted_results.append(formatted_result)
@@ -207,7 +204,7 @@ class WebSearchTool(Tool):
<!-- 1. First search for relevant content -->
<web-search
query="latest AI research papers"
summary="true"
# summary="true"
num_results="5">
</web-search>
@@ -311,7 +308,7 @@ if __name__ == "__main__":
search_tool = WebSearchTool()
result = await search_tool.web_search(
query="rubber gym mats best prices comparison",
summary=True,
# summary=True,
num_results=20
)
print(result)
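The snippet fallback that this commit comments out leaned on Python's `or` chaining, which yields the first truthy operand; a minimal illustration of that pattern:

```python
def pick_snippet(result: dict) -> str:
    # Prefer full content; fall back to description; default to empty string
    return result.get("content") or result.get("description") or ""
```

An empty string or missing key falls through to the next candidate, which is why the original preferred `content` without risking a blank snippet.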

@@ -107,46 +107,6 @@ async def log_requests_middleware(request: Request, call_next):
logger.error(f"Request failed: {method} {path} | Error: {str(e)} | Time: {process_time:.2f}s")
raise
# @app.middleware("http")
# async def throw_error_middleware(request: Request, call_next):
# client_ip = request.client.host
# if client_ip != "109.49.168.102":
# logger.warning(f"Request blocked from IP {client_ip} to {request.method} {request.url.path}")
# return JSONResponse(
# status_code=403,
# content={"error": "Request blocked", "message": "Test DDoS protection"}
# )
# return await call_next(request)
# @app.middleware("http")
# async def rate_limit_middleware(request: Request, call_next):
# global ip_tracker
# client_ip = request.client.host
# # Clean up old entries (older than 5 minutes)
# current_time = time.time()
# ip_tracker = OrderedDict((ip, ts) for ip, ts in ip_tracker.items()
# if current_time - ts < 300)
# # Check if IP is already tracked
# if client_ip in ip_tracker:
# ip_tracker[client_ip] = current_time
# return await call_next(request)
# # Check if we've hit the limit
# if len(ip_tracker) >= MAX_CONCURRENT_IPS:
# logger.warning(f"Rate limit exceeded. Current IPs: {len(ip_tracker)}")
# return JSONResponse(
# status_code=429,
# content={"error": "Too many concurrent connections",
# "message": "Maximum number of concurrent connections reached"}
# )
# # Add new IP
# ip_tracker[client_ip] = current_time
# logger.info(f"New connection from IP {client_ip}. Total connections: {len(ip_tracker)}")
# return await call_next(request)
# Define allowed origins based on environment
allowed_origins = ["https://www.suna.so", "https://suna.so", "https://staging.suna.so", "http://localhost:3000"]
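The commented-out limiter's bookkeeping can be exercised without FastAPI; `admit` is a hypothetical name for the per-request decision the middleware would make, with the same 5-minute window and concurrent-IP cap:

```python
from collections import OrderedDict
from typing import Optional
import time

def admit(ip: str, tracker: OrderedDict, max_ips: int,
          window: float = 300.0, now: Optional[float] = None) -> bool:
    """Sketch of the concurrent-IP rate limit from the disabled middleware."""
    current = time.time() if now is None else now
    # Drop entries older than the window (5 minutes in the original)
    for stale in [k for k, ts in tracker.items() if current - ts >= window]:
        del tracker[stale]
    if ip in tracker:
        tracker[ip] = current  # refresh an already-admitted IP
        return True
    if len(tracker) >= max_ips:
        return False           # at capacity: reject new IPs
    tracker[ip] = current
    return True
```

Known IPs always pass and refresh their timestamp; only new IPs are rejected once `max_ips` distinct addresses are active inside the window.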

@@ -16,7 +16,7 @@ Usage:
import os
from enum import Enum
from typing import Dict, Any, Optional, get_type_hints
from typing import Dict, Any, Optional, get_type_hints, Union
from dotenv import load_dotenv
import logging
@@ -40,11 +40,13 @@ class Configuration:
ENV_MODE: EnvMode = EnvMode.LOCAL
# LLM API keys
ANTHROPIC_API_KEY: str = None
OPENAI_API_KEY: Optional[str] = None
ANTHROPIC_API_KEY: Optional[str] = None
GROQ_API_KEY: Optional[str] = None
OPENROUTER_API_KEY: Optional[str] = None
OPENROUTER_API_BASE: str = "https://openrouter.ai/api/v1"
OPENROUTER_API_BASE: Optional[str] = "https://openrouter.ai/api/v1"
OR_SITE_URL: Optional[str] = None
OR_APP_NAME: Optional[str] = "Suna.so"
# AWS Bedrock credentials
AWS_ACCESS_KEY_ID: Optional[str] = None
@@ -52,38 +54,35 @@ class Configuration:
AWS_REGION_NAME: Optional[str] = None
# Model configuration
MODEL_TO_USE: str = "anthropic/claude-3-7-sonnet-latest"
MODEL_TO_USE: Optional[str] = "anthropic/claude-3-7-sonnet-latest"
# Supabase configuration
SUPABASE_URL: Optional[str] = None
SUPABASE_ANON_KEY: Optional[str] = None
SUPABASE_SERVICE_ROLE_KEY: Optional[str] = None
SUPABASE_URL: str
SUPABASE_ANON_KEY: str
SUPABASE_SERVICE_ROLE_KEY: str
# Redis configuration
REDIS_HOST: Optional[str] = None
REDIS_HOST: str
REDIS_PORT: int = 6379
REDIS_PASSWORD: Optional[str] = None
REDIS_PASSWORD: str
REDIS_SSL: bool = True
# Daytona sandbox configuration
DAYTONA_API_KEY: Optional[str] = None
DAYTONA_SERVER_URL: Optional[str] = None
DAYTONA_TARGET: Optional[str] = None
DAYTONA_API_KEY: str
DAYTONA_SERVER_URL: str
DAYTONA_TARGET: str
# Search and other API keys
TAVILY_API_KEY: Optional[str] = None
RAPID_API_KEY: Optional[str] = None
TAVILY_API_KEY: str
RAPID_API_KEY: str
CLOUDFLARE_API_TOKEN: Optional[str] = None
FIRECRAWL_API_KEY: Optional[str] = None
FIRECRAWL_API_KEY: str
# Stripe configuration
STRIPE_SECRET_KEY: Optional[str] = None
STRIPE_DEFAULT_PLAN_ID: Optional[str] = None
STRIPE_DEFAULT_TRIAL_DAYS: int = 14
# Open Router configuration
OR_SITE_URL: Optional[str] = None
OR_APP_NAME: Optional[str] = "Suna.so"
def __init__(self):
"""Initialize configuration by loading from environment variables."""
@@ -130,28 +129,24 @@ class Configuration:
setattr(self, key, env_val)
def _validate(self):
"""Validate configuration based on environment mode."""
# Keys required in all environments
required_keys = []
"""Validate configuration based on type hints."""
# Get all configuration fields and their type hints
type_hints = get_type_hints(self.__class__)
# Add keys required in non-local environments
if self.ENV_MODE != EnvMode.LOCAL:
required_keys.extend([
"SUPABASE_URL",
"SUPABASE_SERVICE_ROLE_KEY"
])
# Find missing required fields
missing_fields = []
for field, field_type in type_hints.items():
# Check if the field is Optional
is_optional = hasattr(field_type, "__origin__") and field_type.__origin__ is Union and type(None) in field_type.__args__
# Additional keys required in production
if self.ENV_MODE == EnvMode.PRODUCTION:
required_keys.extend([
"REDIS_HOST",
"REDIS_PASSWORD"
])
# If not optional and value is None, add to missing fields
if not is_optional and getattr(self, field) is None:
missing_fields.append(field)
# Validate required keys
for key in required_keys:
if not getattr(self, key):
logger.warning(f"Required configuration {key} is missing for {self.ENV_MODE.value} environment")
if missing_fields:
error_msg = f"Missing required configuration fields: {', '.join(missing_fields)}"
logger.error(error_msg)
raise ValueError(error_msg)
def get(self, key: str, default: Any = None) -> Any:
"""Get a configuration value with an optional default."""

@@ -1252,11 +1252,15 @@ export default function ThreadPage({ params }: { params: Promise<ThreadParams> }
/>
<div className="flex flex-1 items-center justify-center p-4">
<div className="flex w-full max-w-md flex-col items-center gap-4 rounded-lg border bg-card p-6 text-center">
<h2 className="text-lg font-semibold text-destructive">Error</h2>
<p className="text-sm text-muted-foreground">{error}</p>
<Button variant="outline" onClick={() => router.push(`/projects/${project?.id || ''}`)}>
Back to Project
</Button>
<div className="rounded-full bg-destructive/10 p-3">
<AlertTriangle className="h-6 w-6 text-destructive" />
</div>
<h2 className="text-lg font-semibold text-destructive">Thread Not Found</h2>
<p className="text-sm text-muted-foreground">
{error.includes('JSON object requested, multiple (or no) rows returned')
? 'This thread either does not exist or you do not have access to it.'
: error}
</p>
</div>
</div>
</div>

@@ -28,7 +28,7 @@ export default function DashboardLayout({
const router = useRouter()
useEffect(() => {
setShowPricingAlert(false)
setShowPricingAlert(true)
setShowMaintenanceAlert(false)
}, [])

@@ -11,8 +11,8 @@ export const createClient = () => {
supabaseUrl = `http://${supabaseUrl}`;
}
console.log('Supabase URL:', supabaseUrl);
console.log('Supabase Anon Key:', supabaseAnonKey);
// console.log('Supabase URL:', supabaseUrl);
// console.log('Supabase Anon Key:', supabaseAnonKey);
return createBrowserClient(
supabaseUrl,