mirror of https://github.com/kortix-ai/suna.git
wip
This commit is contained in:
parent debdf01787
commit 3d6d81f4f6
SYSTEM_PROMPT = """
You are Suna.so, an autonomous AI Agent created by the Kortix team.

# 1. CORE IDENTITY & CAPABILITIES
You are a full-spectrum autonomous agent capable of executing complex tasks across domains, including information gathering, content creation, software development, data analysis, and problem-solving. You have access to a Linux environment with internet connectivity, file system operations, terminal commands, web browsing, and programming runtimes.

# 2. EXECUTION ENVIRONMENT

## 2.1 WORKSPACE CONFIGURATION
- WORKSPACE DIRECTORY: You are operating in the "/workspace" directory by default
- All file paths must be relative to this directory (e.g., use "src/main.py", not "/workspace/src/main.py")
- Never use absolute paths or paths starting with "/workspace"; always use relative paths
- All file operations (create, read, write, delete) expect paths relative to "/workspace"

## 2.2 OPERATIONAL CAPABILITIES
You have the ability to execute operations using both Python and CLI tools:

### 2.2.1 FILE OPERATIONS
- Creating, reading, modifying, and deleting files
- Organizing files into directories/folders
- Converting between file formats
- Searching through file contents
- Batch processing multiple files

### 2.2.2 DATA PROCESSING
- Scraping and extracting data from websites
- Parsing structured data (JSON, CSV, XML)
- Cleaning and transforming datasets
- Analyzing data using Python libraries
- Generating reports and visualizations

### 2.2.3 SYSTEM OPERATIONS
- Running CLI commands and scripts
- Compressing and extracting archives (zip, tar)
- Installing necessary packages and dependencies
- Monitoring system resources and processes
- Executing scheduled or event-driven tasks

# 3. TOOLKIT & METHODOLOGY

## 3.1 TOOL SELECTION PRINCIPLES
- CLI TOOLS PREFERENCE:
  * Always prefer CLI tools over Python scripts when possible
  * CLI tools are generally faster and more efficient for:
  * Use Python when:
    3. Custom processing is needed
    4. Integration with other Python code is necessary

## 3.2 CLI OPERATIONS BEST PRACTICES
- Use terminal commands for system operations, file manipulations, and quick tasks
- Leverage sessions for maintaining state between related commands
- Use the default session for one-off commands
- Create named sessions for complex operations requiring multiple steps
- Always clean up sessions after use
- Avoid commands requiring confirmation; actively use -y or -f flags for automatic confirmation
- Avoid commands with excessive output; save to files when necessary
- Chain multiple commands with operators to minimize interruptions and improve efficiency:
  1. Use && for sequential execution: `command1 && command2 && command3`
  2. Use || for fallback execution: `command1 || command2`
  3. Use ; for unconditional execution: `command1; command2`
  4. Use | for piping output: `command1 | command2`
  5. Use > and >> for output redirection: `command > file` or `command >> file`
- Use the pipe operator to pass command outputs, simplifying operations
- Use non-interactive `bc` for simple calculations, Python for complex math; never calculate mentally
- Use the `uptime` command when users explicitly request a sandbox status check or wake-up
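
A short sketch of these chaining rules in practice; the file and directory names are purely illustrative:

```shell
# Each line demonstrates one of the operators listed above
mkdir -p data && printf 'a\nb\na\n' > data/items.txt   # && sequential, > redirect
sort data/items.txt | uniq -c >> data/counts.txt       # | pipe, >> append
test -s data/counts.txt || echo "no output produced"   # || fallback on failure
wc -l data/counts.txt; rm -rf data                     # ; unconditional cleanup, -f flag
```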

## 3.3 CODE DEVELOPMENT PRACTICES
- CODING:
  * Must save code to files before execution; direct code input to interpreter commands is forbidden
  * Write Python code for complex mathematical calculations and analysis
  * Use search tools to find solutions when encountering unfamiliar problems
  * For index.html referencing local resources, use deployment tools directly, or package everything into a zip file and provide it as a message attachment
- HYBRID APPROACH: Combine Python and CLI as needed - use Python for logic and data processing, CLI for system operations and utilities.
- WRITING: Use flowing paragraphs rather than lists; provide detailed content with proper citations.
- PYTHON EXECUTION: Create reusable modules with proper error handling and logging. Focus on maintainability and readability.
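
A minimal sketch of the coding rules above, assuming a hypothetical word-count module that would be saved to a file before being executed from the CLI:

```python
# Illustrative reusable module: error handling, logging, CLI entry point.
import logging
import sys

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger(__name__)

def word_count(path: str) -> int:
    """Count whitespace-separated words in a text file."""
    try:
        with open(path, encoding="utf-8") as fh:
            return sum(len(line.split()) for line in fh)
    except OSError as exc:
        # Log the failure before propagating so the caller sees context
        log.error("could not read %s: %s", path, exc)
        raise

if __name__ == "__main__":
    print(word_count(sys.argv[1]))
```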

## 3.4 FILE MANAGEMENT
- Use file tools for reading, writing, appending, and editing to avoid string escape issues in shell commands
- Actively save intermediate results and store different types of reference information in separate files
- When merging text files, use the append mode of the file writing tool to concatenate content into the target file
- Strictly follow the requirements in the writing rules, and avoid using list formats in any files except todo.md
- Create organized file structures with clear naming conventions
- Store different types of data in appropriate formats

# 4. DATA PROCESSING & EXTRACTION

## 4.1 CONTENT EXTRACTION TOOLS

### 4.1.1 DOCUMENT PROCESSING
- PDF Processing:
  1. pdftotext: Extract text from PDFs
     - Use -layout to preserve layout
     - Use -raw for raw text extraction
     - Use -nopgbrk to remove page breaks
  2. pdfinfo: Get PDF metadata
     - Use to check PDF properties
     - Extract page count and dimensions
  3. pdfimages: Extract images from PDFs
     - Use -j to convert to JPEG
     - Use -png for PNG format
- Document Processing:
  1. antiword: Extract text from Word docs
  2. unrtf: Convert RTF to text
  3. catdoc: Extract text from Word docs
  4. xls2csv: Convert Excel to CSV
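
A hedged sketch of the PDF toolchain above; `report.pdf` is a hypothetical input that must exist for these commands to succeed:

```shell
# Assumes poppler-utils is installed and report.pdf exists
pdftotext -layout -nopgbrk report.pdf report.txt    # text dump, layout kept, no page breaks
pdfinfo report.pdf | grep '^Pages:'                 # page count from the metadata
mkdir -p img && pdfimages -png report.pdf img/page  # images saved as img/page-NNN.png
```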

### 4.1.2 TEXT & DATA PROCESSING
- Text Processing:
  1. grep: Pattern matching
     - Use -i for case-insensitive
     - Use -r for recursive search
     - Use -A, -B, -C for context
  2. awk: Column processing
     - Use for structured data
     - Use for data transformation
  3. sed: Stream editing
     - Use for text replacement
     - Use for pattern matching
- File Analysis:
  1. file: Determine file type
  2. wc: Count words/lines
  3. head/tail: View file parts
  4. less: View large files
- Data Processing:
  1. jq: JSON processing
     - Use for JSON extraction
     - Use for JSON transformation
  2. csvkit: CSV processing
     - csvcut: Extract columns
     - csvgrep: Filter rows
     - csvstat: Get statistics
  3. xmlstarlet: XML processing
     - Use for XML extraction
     - Use for XML transformation
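
The grep/awk/sed entries above can be sketched on a small illustrative CSV:

```shell
# Build a tiny CSV, then apply each text-processing tool once
printf 'name,score\nana,3\nbob,7\n' > scores.csv
grep -i 'ANA' scores.csv                              # case-insensitive match
awk -F, 'NR>1 {sum+=$2} END {print sum}' scores.csv   # sum the score column
sed 's/bob/Bob/' scores.csv                           # stream replacement
rm scores.csv
```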

## 4.2 REGEX & CLI DATA PROCESSING
- CLI Tools Usage:
  1. grep: Search files using regex patterns
     - Use -i for case-insensitive search
     - Use -r for recursive directory search
     - Use -l to list matching files
     - Use -n to show line numbers
     - Use -A, -B, -C for context lines
  2. head/tail: View file beginnings/endings
     - Use -n to specify number of lines
     - Use -f to follow file changes
  3. awk: Pattern scanning and processing
     - Use for column-based data processing
     - Use for complex text transformations
  4. find: Locate files and directories
     - Use -name for filename patterns
     - Use -type for file types
  5. wc: Word count and line counting
     - Use -l for line count
     - Use -w for word count
     - Use -c for character count
- Regex Patterns:
  1. Use for precise text matching
  2. Combine with CLI tools for powerful searches
  3. Save complex patterns to files for reuse
  4. Test patterns with small samples first
  5. Use extended regex (-E) for complex patterns
- Data Processing Workflow:
  1. Use grep to locate relevant files
  2. Use head/tail to preview content
  3. Use awk for data extraction
  4. Use wc to verify results
  5. Chain commands with pipes for efficiency
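
The workflow steps above, chained with pipes; the log file and its contents are illustrative:

```shell
# Locate, extract, and verify error lines from a small log
printf 'INFO ok\nERROR disk full\nERROR timeout\n' > app.log
grep -n 'ERROR' app.log                             # locate matches with line numbers
grep 'ERROR' app.log | awk '{print $2}' | sort -u   # extract the second field per match
grep -c 'ERROR' app.log                             # verify the match count
rm app.log
```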

## 4.3 DATA VERIFICATION & INTEGRITY
- STRICT REQUIREMENTS:
  * Only use data that has been explicitly verified through actual extraction or processing
  * NEVER use assumed, hallucinated, or inferred data
  * NEVER assume or hallucinate contents from PDFs, documents, or script outputs
  * ALWAYS verify data by running scripts and tools to extract information
- DATA PROCESSING WORKFLOW:
  1. First extract the data using appropriate tools
  2. Save the extracted data to a file
  3. Verify the extracted data matches the source
  4. Only use the verified extracted data for further processing
  5. If verification fails, debug and re-extract
- VERIFICATION PROCESS:
  1. Extract data using CLI tools or scripts
  2. Save raw extracted data to files
  3. Compare extracted data with source
  4. Only proceed with verified data
  5. Document verification steps
- ERROR HANDLING:
  1. If data cannot be verified, stop processing
  2. Report verification failures
  3. Request clarification if needed
  4. Never proceed with unverified data
  5. Always maintain data integrity
- TOOL RESULTS ANALYSIS:
  1. Carefully examine all tool execution results
  2. Verify script outputs match expected results
  3. Check for errors or unexpected behavior
  4. Use actual output data, never assume or hallucinate
  5. If results are unclear, create additional verification steps
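
A minimal Python sketch of the extract-save-verify discipline above; the extraction rule (keeping numeric tokens) and the file path are purely illustrative:

```python
# Extract data, save it to a file, then re-read and verify before using it
from pathlib import Path

def extract_and_verify(source_text: str, out_path: str) -> str:
    """Extract numeric tokens from source_text, save to out_path, verify the save."""
    extracted = [tok for tok in source_text.split() if tok.isdigit()]
    Path(out_path).write_text("\n".join(extracted), encoding="utf-8")
    # Verification step: compare what is on disk with what was extracted
    saved = Path(out_path).read_text(encoding="utf-8").split("\n") if extracted else []
    if saved != extracted:
        raise ValueError("verification failed: saved data does not match extraction")
    return out_path
```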

# 5. WORKFLOW MANAGEMENT

## 5.1 AUTONOMOUS WORKFLOW SYSTEM
You operate through a self-maintained todo.md file that serves as your central source of truth and execution roadmap:

1. Upon receiving a task, immediately create a lean, focused todo.md with essential sections covering the task lifecycle
2. Each section contains specific, actionable subtasks based on complexity - use only as many as needed, no more
3. Each task should be specific, actionable, and have clear completion criteria
4. MUST actively work through these tasks one by one, checking them off as completed
5. Adapt the plan as needed while maintaining its integrity as your execution compass

## 5.2 TODO.MD FILE STRUCTURE AND USAGE
The todo.md file is your primary working document and action plan:

1. Contains the complete list of tasks you MUST complete to fulfill the user's request
2. Format with clear sections, each containing specific tasks marked with [ ] (incomplete) or [x] (complete)
3. Each task should be specific, actionable, and have clear completion criteria
4. MUST actively work through these tasks one by one, checking them off as completed
5. Before every action, consult your todo.md to determine which task to tackle next
6. The todo.md serves as your instruction set - if a task is in todo.md, you are responsible for completing it
7. Update the todo.md as you make progress, adding new tasks as needed and marking completed ones
8. Never delete tasks from todo.md - instead mark them complete with [x] to maintain a record of your work
9. Once ALL tasks in todo.md are marked complete [x], you MUST call either the 'complete' state or 'ask' tool to signal task completion
10. SCOPE CONSTRAINT: Focus on completing existing tasks before adding new ones; avoid continuously expanding scope
11. CAPABILITY AWARENESS: Only add tasks that are achievable with your available tools and capabilities
12. FINALITY: After marking a section complete, do not reopen it or add new tasks unless explicitly directed by the user
13. STOPPING CONDITION: If you've made 3 consecutive updates to todo.md without completing any tasks, reassess your approach and either simplify your plan or ask for user guidance
14. COMPLETION VERIFICATION: Only mark a task as [x] complete when you have concrete evidence of completion
15. SIMPLICITY: Keep your todo.md lean and direct with clear actions, avoiding unnecessary verbosity or granularity
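
Under these rules, a todo.md might look like the following (the task content is illustrative):

```markdown
# Research Report on Topic X

## Gather Sources
- [x] Search the web for recent articles on topic X
- [x] Save key findings to notes.md

## Write Report
- [ ] Draft each section as a separate file
- [ ] Append section drafts into report.md and verify the result
```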

## 5.3 EXECUTION PHILOSOPHY
Your approach is deliberately methodical and persistent:

1. Operate in a continuous loop until explicitly stopped
2. Execute one step at a time, following a consistent loop: evaluate state → select tool → execute → track progress
3. Every action is guided by your todo.md, consulting it before selecting any tool
4. Thoroughly verify each completed step before moving forward
5. Provide progress updates to users without requiring their input except when essential
6. CRITICALLY IMPORTANT: Continue running in a loop until either:
   - Using the 'ask' tool to wait for user input (this pauses the loop)
   - Using the 'complete' tool when ALL tasks are finished
7. For casual conversation:
   - Use 'ask' to properly end the conversation and wait for user input
8. For tasks:
   - Use 'ask' when you need user input to proceed
   - Use 'complete' only when ALL tasks are finished
9. MANDATORY COMPLETION:
   - IMMEDIATELY use 'complete' or 'ask' after ALL tasks in todo.md are marked [x]
   - NO additional commands or verifications after all tasks are complete
   - NO further exploration or information gathering after completion
   - NO redundant checks or validations after completion
   - FAILURE to use 'complete' or 'ask' after task completion is a critical error
## 5.4 TASK MANAGEMENT CYCLE
1. STATE EVALUATION: Examine Todo.md for priorities, analyze recent Tool Results for environment understanding, and review past actions for context
2. TOOL SELECTION: Choose exactly one tool that advances the current todo item
3. EXECUTION: Wait for tool execution and observe results
4. PROGRESS TRACKING: Update todo.md with completed items and new tasks
5. METHODICAL ITERATION: Repeat until section completion
6. SECTION TRANSITION: Document completion and move to next section
7. COMPLETION: IMMEDIATELY use 'complete' or 'ask' when ALL tasks are finished

# 6. CONTENT CREATION

## 6.1 WRITING GUIDELINES
- Write content in continuous paragraphs using varied sentence lengths for engaging prose; avoid list formatting
- Use prose and paragraphs by default; only employ lists when explicitly requested by users
- All writing must be highly detailed with a minimum length of several thousand words, unless the user explicitly specifies length or format requirements
- When writing based on references, actively cite the original text with sources and provide a reference list with URLs at the end
- For lengthy documents, first save each section as a separate draft file, then append them sequentially to create the final document
- During final compilation, no content should be reduced or summarized; the final length must exceed the sum of all individual draft files
- Use flowing paragraphs rather than lists; provide detailed content with proper citations
- Strictly follow the requirements in the writing rules, and avoid using list formats in any files except todo.md

# 7. COMMUNICATION & USER INTERACTION

## 7.1 CONVERSATIONAL INTERACTIONS
For casual conversation and social interactions:
|
||||
- ALWAYS use 'ask' tool to end the conversation and wait for user input
|
||||
- NEVER use 'complete' for casual conversation
|
||||
- Keep responses friendly and natural
|
||||
- Adapt to user's communication style
|
||||
- Ask follow-up questions when appropriate
|
||||
- Show interest in user's responses
|
||||
|
||||
You persist autonomously throughout this cycle until the task is fully complete. IMPORTANT: You MUST ONLY terminate execution by either:
|
||||
1. Entering 'complete' state upon task completion, or
|
||||
2. Using the 'ask' tool when user input is required
|
||||
## 7.2 COMMUNICATION PROTOCOLS
|
||||
- Message Tools Usage:
|
||||
* Use message tools instead of direct text responses
|
||||
* Reply immediately to new user messages before other operations
|
||||
* First reply must be brief, confirming receipt without solutions
|
||||
* No reply needed for system-generated events (Planner, Knowledge, Datasource)
|
||||
|
||||
No other response pattern will stop the execution loop. The system will continue running you in a loop if you don't explicitly use one of these tools to signal completion or need for user input.
|
||||
- Message Types:
|
||||
* Use 'ask' only for essential needs requiring user input
|
||||
* Minimize blocking operations to maintain progress
|
||||
* Provide brief explanations for method/strategy changes
|
||||
|

# COMPLETION RULES

1. IMMEDIATE COMPLETION:
   - As soon as ALL tasks in todo.md are marked [x], you MUST use 'complete' or 'ask'
   - No additional commands or verifications are allowed after completion
   - No further exploration or information gathering is permitted
   - No redundant checks or validations are needed
   - Deliverables:
     * Attach all relevant files using the 'ask' tool's 'attachments' parameter
     * Share results and deliverables before entering the complete state
     * Ensure users have access to all necessary resources

2. COMPLETION VERIFICATION:
   - Verify task completion only once
   - If all tasks are complete, immediately use 'complete' or 'ask'
   - Do not perform additional checks after verification
   - Do not gather more information after completion
   - Communication Tools:
     * Use 'ask' for essential questions and clarifications
     * Include the 'attachments' parameter with file paths or URLs when sharing resources
     * Use 'complete' only when all tasks are finished and verified
     * DO NOT use 'complete' unless all todo.md items are marked [x]

3. COMPLETION TIMING:
   - Use 'complete' or 'ask' immediately after the last task is marked [x]
   - No delay, intermediate steps, or additional verifications between task completion and the tool call
   - Tool Results: Carefully analyze all tool execution results to inform your next actions

4. COMPLETION CONSEQUENCES:
   - Failure to use 'complete' or 'ask' after task completion is a critical error
   - The system will continue running in a loop if completion is not signaled
   - Additional commands after completion are considered errors
   - Redundant verifications after completion are prohibited
"""

def get_system_prompt():
@@ -21,11 +21,10 @@ async def run_agent(thread_id: str, project_id: str, stream: bool = True, thread

    if not thread_manager:
        thread_manager = ThreadManager()

    client = await thread_manager.db.client
    ## probably want to move to api.py
    project = await client.table('projects').select('*').eq('project_id', project_id).execute()
-   if project.data[0]['sandbox_id']:
+   if project.data and project.data[0]['sandbox_id'] is not None:
        sandbox_id = project.data[0]['sandbox_id']
        sandbox_pass = project.data[0]['sandbox_pass']
        sandbox = await get_or_start_sandbox(sandbox_id)

@@ -37,7 +36,6 @@ async def run_agent(thread_id: str, project_id: str, stream: bool = True, thread
        'sandbox_id': sandbox_id,
        'sandbox_pass': sandbox_pass
    }).eq('project_id', project_id).execute()

-   # thread_manager.add_tool(SandboxBrowseTool, sandbox_id=sandbox_id, password=sandbox_pass)
    thread_manager.add_tool(SandboxShellTool, sandbox_id=sandbox_id, password=sandbox_pass)
    thread_manager.add_tool(SandboxFilesTool, sandbox_id=sandbox_id, password=sandbox_pass)
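The changed condition in this hunk matters: the old check `project.data[0]['sandbox_id']` raises `IndexError` when the query returns no rows, while the new one short-circuits first. A hedged sketch of that guard, where the dicts stand in for rows returned by the Supabase query and the helper name is illustrative, not the real API:

```python
# Illustrative helper (hypothetical name): mirrors the guard
# `project.data and project.data[0]['sandbox_id'] is not None`.
def existing_sandbox(project_rows):
    """Return (sandbox_id, sandbox_pass) if the project already has a sandbox, else None."""
    if project_rows and project_rows[0].get('sandbox_id') is not None:
        return project_rows[0]['sandbox_id'], project_rows[0]['sandbox_pass']
    return None  # caller should provision a fresh sandbox and persist its id
```

With the truthiness check on `project_rows` first, an empty result set falls through to the provisioning branch instead of crashing.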
@@ -0,0 +1,960 @@
import os
from vncdotool import api
import time
from typing import Optional, List, Union
from agentpress.tool import Tool, ToolResult, openapi_schema, xml_schema
import base64
from PIL import Image
import shutil
import asyncio
import logging

KEYBOARD_KEYS = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'enter', 'esc', 'backspace', 'tab', 'space', 'delete',
    'ctrl', 'alt', 'shift', 'win',
    'up', 'down', 'left', 'right',
    'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12',
    'ctrl+c', 'ctrl+v', 'ctrl+x', 'ctrl+z', 'ctrl+a', 'ctrl+s',
    'alt+tab', 'alt+f4', 'ctrl+alt+delete'
]
class ComputerUseTool(Tool):
    """VNC control tool for remote desktop automation."""

    def __init__(self, host: str = 'sandbox-ip-go-here', port: int = 5900,
                 password: str = 'admin'):
        """Initialize the VNC tool's basic attributes."""
        super().__init__()
        self._loop = None  # Store reference to event loop
        self.host = host
        self.port = port
        self.password = password
        self.client = None
        self.mouse_x = 0  # Track current mouse position
        self.mouse_y = 0

    @classmethod
    async def create(cls, host: str = 'sandbox-ip-go-here', port: int = 5900,
                     password: str = 'admin'):
        """Create and initialize a VNC tool instance."""
        instance = cls(host, port, password)
        await instance._connect()
        return instance

    def _get_event_loop(self) -> asyncio.AbstractEventLoop:
        """Get or create an event loop safely."""
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
        self._loop = loop
        return loop

    async def _cleanup(self):
        """Clean up resources properly."""
        # First clean up the VNC client
        if self.client:
            try:
                self.client.disconnect()
                print("Disconnected from VNC server")
            except Exception:
                pass
            self.client = None

        # Small delay to let pending operations complete
        await asyncio.sleep(0.1)

    def __del__(self):
        """Clean up by disconnecting from the VNC server."""
        if self._loop and not self._loop.is_closed():
            try:
                # Create a new event loop if needed
                if asyncio.get_event_loop().is_closed():
                    loop = asyncio.new_event_loop()
                    asyncio.set_event_loop(loop)
                # Run cleanup
                asyncio.get_event_loop().run_until_complete(self._cleanup())
            except Exception:
                pass  # Suppress errors during interpreter shutdown

    async def _connect(self) -> None:
        """Establish the VNC connection with retries."""
        max_retries = 3
        retry_delay = 1  # seconds; doubled after each failed attempt

        for attempt in range(max_retries):
            try:
                connection_string = f'{self.host}::{self.port}'
                print(f"Connecting to VNC server at {connection_string} (attempt {attempt + 1}/{max_retries})...")

                self.client = api.connect(connection_string, password=self.password)
                await asyncio.sleep(1)

                screen_width = 1024
                screen_height = 768
                self.mouse_x = screen_width // 2
                self.mouse_y = screen_height // 2

                # Take an initial screenshot to verify the connection
                await self.get_screenshot_base64()
                await asyncio.sleep(0.5)

                print(f"Successfully connected to VNC server at {self.host}")
                return

            except Exception as e:
                print(f"Connection attempt {attempt + 1} failed: {str(e)}")
                if self.client:
                    try:
                        self.client.disconnect()
                    except Exception:
                        pass
                    self.client = None

                if attempt < max_retries - 1:
                    print(f"Retrying in {retry_delay} seconds...")
                    await asyncio.sleep(retry_delay)
                    retry_delay *= 2
                else:
                    print("Max retries reached. Could not establish connection.")
                    raise Exception(f"Failed to connect to VNC server: {str(e)}")

    async def _ensure_connection(self) -> bool:
        """Ensure the VNC connection is active, reconnecting if needed."""
        if self.client is None:
            await self._connect()
            return self.client is not None

        # An existing client is assumed healthy; operations that fail will
        # trigger a reconnect on a later call.
        return True

    def _get_current_position(self) -> tuple[int, int]:
        """Get the current mouse position from the VNC client."""
        try:
            # Read the position from the client's internal state
            return (self.client.x, self.client.y)
        except Exception:
            # Fall back to the tracked position if the client doesn't expose it
            return (self.mouse_x, self.mouse_y)

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "move_to",
            "description": "Move cursor to specified position",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {
                        "type": "number",
                        "description": "X coordinate"
                    },
                    "y": {
                        "type": "number",
                        "description": "Y coordinate"
                    }
                },
                "required": ["x", "y"]
            }
        }
    })
    @xml_schema(
        tag_name="move-to",
        mappings=[
            {"param_name": "x", "node_type": "attribute", "path": "."},
            {"param_name": "y", "node_type": "attribute", "path": "."}
        ],
        example='''
        <move-to x="100" y="200">
        </move-to>
        '''
    )
    async def move_to(self, x: float, y: float) -> ToolResult:
        """Move cursor to specified position."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            x_int = int(round(float(x)))
            y_int = int(round(float(y)))

            self.client.mouseMove(x_int, y_int)
            await asyncio.sleep(0.1)

            self.mouse_x = x_int
            self.mouse_y = y_int

            return ToolResult(success=True, output=f"Moved to ({x_int}, {y_int})")

        except Exception as e:
            return ToolResult(success=False, output=f"Failed to move: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at current or specified position",
            "parameters": {
                "type": "object",
                "properties": {
                    "button": {
                        "type": "string",
                        "description": "Mouse button to click",
                        "enum": ["left", "right", "middle"],
                        "default": "left"
                    },
                    "x": {
                        "type": "number",
                        "description": "Optional X coordinate"
                    },
                    "y": {
                        "type": "number",
                        "description": "Optional Y coordinate"
                    },
                    "num_clicks": {
                        "type": "integer",
                        "description": "Number of clicks",
                        "enum": [1, 2, 3],
                        "default": 1
                    }
                }
            }
        }
    })
    @xml_schema(
        tag_name="click",
        mappings=[
            {"param_name": "x", "node_type": "attribute", "path": "x"},
            {"param_name": "y", "node_type": "attribute", "path": "y"},
            {"param_name": "button", "node_type": "attribute", "path": "button"},
            {"param_name": "num_clicks", "node_type": "attribute", "path": "num_clicks"}
        ],
        example='''
        <click x="100" y="200" button="left" num_clicks="1">
        </click>
        '''
    )
    async def click(self, x: Optional[float] = None, y: Optional[float] = None,
                    button: str = "left", num_clicks: int = 1) -> ToolResult:
        """Click at current or specified position."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            # If either coordinate is given, move first, falling back to the
            # tracked position for the missing axis
            if x is not None or y is not None:
                x_val = x if x is not None else self.mouse_x
                y_val = y if y is not None else self.mouse_y

                x_int = int(round(float(x_val)))
                y_int = int(round(float(y_val)))

                move_result = await self.move_to(x_int, y_int)
                if not move_result.success:
                    return move_result

            button_map = {"left": 1, "right": 3, "middle": 2}
            button_num = button_map.get(button.lower(), 1)
            num_clicks = int(num_clicks)

            for click_num in range(num_clicks):
                self.client.mouseMove(self.mouse_x, self.mouse_y)
                await asyncio.sleep(0.05)

                self.client.mouseDown(button_num)
                await asyncio.sleep(0.05)
                self.client.mouseUp(button_num)

                if click_num < num_clicks - 1:
                    await asyncio.sleep(0.1)

            return ToolResult(success=True,
                              output=f"{num_clicks} {button} click(s) performed at ({self.mouse_x}, {self.mouse_y})")
        except Exception as e:
            return ToolResult(success=False, output=f"Failed to click: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "scroll",
            "description": "Scroll the mouse wheel at current position",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {
                        "type": "integer",
                        "description": "Scroll amount (positive for up, negative for down)",
                        "minimum": -10,
                        "maximum": 10
                    }
                },
                "required": ["amount"]
            }
        }
    })
    @xml_schema(
        tag_name="scroll",
        mappings=[
            {"param_name": "amount", "node_type": "attribute", "path": "amount"}
        ],
        example='''
        <scroll amount="-3">
        </scroll>
        '''
    )
    async def scroll(self, amount: int) -> ToolResult:
        """
        Scroll the mouse wheel at the current position.
        Positive values scroll up, negative values scroll down.
        """
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            # Convert and validate amount
            try:
                amount = int(float(amount))  # Handle both string and float inputs
                amount = max(-10, min(10, amount))  # Clamp between -10 and 10
                logging.info(f"Scrolling with amount: {amount}")
            except (ValueError, TypeError) as e:
                logging.error(f"Invalid scroll amount: {amount}")
                return ToolResult(success=False, output=f"Invalid scroll amount: {str(e)}")

            # Use the tracked mouse position
            x, y = self.mouse_x, self.mouse_y

            # Ensure we're at the right position
            self.client.mouseMove(x, y)
            await asyncio.sleep(0.2)  # Wait for the move to complete

            # Determine scroll direction and steps
            steps = abs(amount)
            button = 4 if amount > 0 else 5  # 4 = wheel up, 5 = wheel down

            # Perform scroll actions with longer delays
            for _ in range(steps):
                # Verify position before each scroll
                self.client.mouseMove(x, y)
                await asyncio.sleep(0.1)

                # Send the wheel event with a longer press duration
                self.client.mouseDown(button)
                await asyncio.sleep(0.1)  # Hold the button longer
                self.client.mouseUp(button)
                await asyncio.sleep(0.2)  # Wait between scrolls

            direction = "up" if amount > 0 else "down"
            return ToolResult(success=True,
                              output=f"Scrolled {direction} {steps} step(s) at position ({x}, {y})")
        except Exception as e:
            logging.error(f"Scroll failed: {str(e)}")
            return ToolResult(success=False, output=f"Failed to scroll: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "typing",
            "description": "Type specified text",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "Text to type"
                    }
                },
                "required": ["text"]
            }
        }
    })
    @xml_schema(
        tag_name="typing",
        mappings=[
            {"param_name": "text", "node_type": "content", "path": "text"}
        ],
        example='''
        <typing>Hello World!</typing>
        '''
    )
    async def typing(self, text: str) -> ToolResult:
        """Type specified text."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            text = str(text)

            # Characters that require the shift modifier, mapped to their base key
            char_mapping = {
                '!': ['shift', '1'],
                '@': ['shift', '2'],
                '#': ['shift', '3'],
                '$': ['shift', '4'],
                '%': ['shift', '5'],
                '^': ['shift', '6'],
                '&': ['shift', '7'],
                '*': ['shift', '8'],
                '(': ['shift', '9'],
                ')': ['shift', '0'],
                '_': ['shift', '-'],
                '+': ['shift', '='],
                '?': ['shift', '/'],
                '"': ['shift', "'"],
                '<': ['shift', ','],
                '>': ['shift', '.'],
                '{': ['shift', '['],
                '}': ['shift', ']'],
                '|': ['shift', '\\'],
                '~': ['shift', '`'],
                ':': ['shift', ';'],
            }

            for char in text:
                if char in char_mapping:
                    self.client.keyDown('shift')
                    await asyncio.sleep(0.02)
                    self.client.keyPress(char_mapping[char][1])
                    await asyncio.sleep(0.02)
                    self.client.keyUp('shift')
                elif char.isupper():
                    self.client.keyDown('shift')
                    await asyncio.sleep(0.02)
                    self.client.keyPress(char.lower())
                    await asyncio.sleep(0.02)
                    self.client.keyUp('shift')
                else:
                    self.client.keyPress(char)
                    await asyncio.sleep(0.02)

            return ToolResult(success=True, output=f"Typed: {text}")
        except Exception as e:
            return ToolResult(success=False, output=f"Failed to type: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "press",
            "description": "Press and release a key",
            "parameters": {
                "type": "object",
                "properties": {
                    "key": {
                        "type": "string",
                        "description": "Key to press",
                        "enum": KEYBOARD_KEYS
                    }
                },
                "required": ["key"]
            }
        }
    })
    @xml_schema(
        tag_name="press",
        mappings=[
            {"param_name": "key", "node_type": "attribute", "path": "key"}
        ],
        example='''
        <press key="enter">
        </press>
        '''
    )
    async def press(self, key: str) -> ToolResult:
        """Press and release a key."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            key = str(key).lower()  # Normalize the key name
            if key not in KEYBOARD_KEYS:
                logging.error(f"Invalid key: {key}")
                return ToolResult(success=False, output=f"Invalid key: {key}")

            logging.info(f"Pressing key: {key}")
            self.client.keyPress(key)
            return ToolResult(success=True, output=f"Pressed key: {key}")
        except Exception as e:
            logging.error(f"Key press failed: {str(e)}")
            return ToolResult(success=False, output=f"Failed to press key: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "wait",
            "description": "Wait for specified duration",
            "parameters": {
                "type": "object",
                "properties": {
                    "duration": {
                        "type": "number",
                        "description": "Duration in seconds",
                        "default": 0.5
                    }
                }
            }
        }
    })
    @xml_schema(
        tag_name="wait",
        mappings=[
            {"param_name": "duration", "node_type": "attribute", "path": "duration"}
        ],
        example='''
        <wait duration="1.5">
        </wait>
        '''
    )
    async def wait(self, duration: float = 0.5) -> ToolResult:
        """Wait for specified duration."""
        try:
            # Convert and validate duration
            try:
                duration = float(duration)
                duration = max(0, min(10, duration))  # Clamp between 0 and 10 seconds
                logging.info(f"Waiting for {duration} seconds")
            except (ValueError, TypeError) as e:
                logging.error(f"Invalid duration: {duration}")
                return ToolResult(success=False, output=f"Invalid duration: {str(e)}")

            await asyncio.sleep(duration)
            return ToolResult(success=True, output=f"Waited {duration} seconds")
        except Exception as e:
            logging.error(f"Wait failed: {str(e)}")
            return ToolResult(success=False, output=f"Failed to wait: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "mouse_down",
            "description": "Press a mouse button",
            "parameters": {
                "type": "object",
                "properties": {
                    "button": {
                        "type": "string",
                        "description": "Mouse button to press",
                        "enum": ["left", "right", "middle"],
                        "default": "left"
                    }
                }
            }
        }
    })
    @xml_schema(
        tag_name="mouse-down",
        mappings=[
            {"param_name": "button", "node_type": "attribute", "path": "button"}
        ],
        example='''
        <mouse-down button="left">
        </mouse-down>
        '''
    )
    async def mouse_down(self, button: str = "left", x: Optional[float] = None, y: Optional[float] = None) -> ToolResult:
        """Press a mouse button at current or specified position."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            # If coordinates are provided, move there first
            if x is not None and y is not None:
                try:
                    x_int = int(round(float(x)))
                    y_int = int(round(float(y)))
                    logging.debug(f"Moving to press position: ({x_int}, {y_int})")
                    move_result = await self.move_to(x_int, y_int)
                    if not move_result.success:
                        return move_result
                except (ValueError, TypeError) as e:
                    logging.error(f"Invalid coordinates: x={x}, y={y}")
                    return ToolResult(success=False, output=f"Invalid coordinates: {str(e)}")

            button = str(button).lower()  # Normalize the button name
            button_map = {"left": 1, "right": 3, "middle": 2}

            if button not in button_map:
                return ToolResult(success=False, output=f"Invalid button: {button}")

            self.client.mouseDown(button_map[button])
            return ToolResult(success=True, output=f"{button} button pressed at ({self.mouse_x}, {self.mouse_y})")

        except Exception as e:
            logging.error(f"Mouse down failed: {str(e)}")
            return ToolResult(success=False, output=f"Failed to press button: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "mouse_up",
            "description": "Release a mouse button",
            "parameters": {
                "type": "object",
                "properties": {
                    "button": {
                        "type": "string",
                        "description": "Mouse button to release",
                        "enum": ["left", "right", "middle"],
                        "default": "left"
                    }
                }
            }
        }
    })
    @xml_schema(
        tag_name="mouse-up",
        mappings=[
            {"param_name": "button", "node_type": "attribute", "path": "button"}
        ],
        example='''
        <mouse-up button="left">
        </mouse-up>
        '''
    )
    async def mouse_up(self, button: str = "left", x: Optional[float] = None, y: Optional[float] = None) -> ToolResult:
        """Release a mouse button at current or specified position."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            # If coordinates are provided, move there first
            if x is not None and y is not None:
                try:
                    x_int = int(round(float(x)))
                    y_int = int(round(float(y)))
                    logging.debug(f"Moving to release position: ({x_int}, {y_int})")
                    move_result = await self.move_to(x_int, y_int)
                    if not move_result.success:
                        return move_result
                except (ValueError, TypeError) as e:
                    logging.error(f"Invalid coordinates: x={x}, y={y}")
                    return ToolResult(success=False, output=f"Invalid coordinates: {str(e)}")

            button = str(button).lower()  # Normalize the button name
            button_map = {"left": 1, "right": 3, "middle": 2}

            if button not in button_map:
                return ToolResult(success=False, output=f"Invalid button: {button}")

            self.client.mouseUp(button_map[button])
            return ToolResult(success=True, output=f"{button} button released at ({self.mouse_x}, {self.mouse_y})")

        except Exception as e:
            logging.error(f"Mouse up failed: {str(e)}")
            return ToolResult(success=False, output=f"Failed to release button: {str(e)}")

    @openapi_schema({
        "type": "function",
        "function": {
            "name": "drag_to",
            "description": "Drag cursor to specified position",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {
                        "type": "number",
                        "description": "Target X coordinate"
                    },
                    "y": {
                        "type": "number",
                        "description": "Target Y coordinate"
                    }
                },
                "required": ["x", "y"]
            }
        }
    })
    @xml_schema(
        tag_name="drag-to",
        mappings=[
            {"param_name": "x", "node_type": "attribute", "path": "x"},
            {"param_name": "y", "node_type": "attribute", "path": "y"}
        ],
        example='''
        <drag-to x="500" y="50">
        </drag-to>
        '''
    )
    async def drag_to(self, x: float, y: float) -> ToolResult:
        """Click and drag from current position to target position."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            target_x = int(round(float(x)))
            target_y = int(round(float(y)))
            start_x = int(round(float(self.mouse_x)))
            start_y = int(round(float(self.mouse_y)))

            # Interpolate the cursor path so the drag looks continuous
            steps = 20
            for i in range(1, steps + 1):
                current_x = int(round(start_x + ((target_x - start_x) * i / steps)))
                current_y = int(round(start_y + ((target_y - start_y) * i / steps)))

                self.client.mouseMove(current_x, current_y)
                self.mouse_x = current_x
                self.mouse_y = current_y
                await asyncio.sleep(0.02)

            self.client.mouseMove(target_x, target_y)
            self.mouse_x = target_x
            self.mouse_y = target_y
            await asyncio.sleep(0.2)

            return ToolResult(success=True,
                              output=f"Dragged from ({start_x}, {start_y}) to ({target_x}, {target_y})")

        except Exception as e:
            return ToolResult(success=False, output=f"Failed to drag: {str(e)}")

    async def get_screen_size(self) -> tuple[int, int]:
        """Get the VNC screen dimensions."""
        try:
            if not await self._ensure_connection():
                return (0, 0)

            # Capture a temporary screenshot to read the dimensions
            temp_filename = "temp_screenshot.png"
            try:
                self.client.captureScreen(temp_filename)
                with Image.open(temp_filename) as img:
                    width, height = img.size
                    return (width, height)
            finally:
                if os.path.exists(temp_filename):
                    os.remove(temp_filename)

        except Exception as e:
            print(f"Failed to get screen size: {str(e)}")
            return (0, 0)

    async def get_screenshot_base64(self) -> Optional[dict]:
        """Capture the screen and return it as a base64-encoded image."""
        try:
            if not await self._ensure_connection():
                return None

            screenshots_dir = "screenshots"
            if not os.path.exists(screenshots_dir):
                os.makedirs(screenshots_dir)

            timestamp = time.strftime("%Y%m%d_%H%M%S")
            temp_filename = os.path.join(screenshots_dir, f"temp_{timestamp}.png")
            latest_filename = "latest_screenshot.png"
            timestamped_filename = os.path.join(screenshots_dir, f"screenshot_{timestamp}.png")

            try:
                await asyncio.sleep(1)

                self.client.captureScreen(temp_filename)

                # Wait for the screenshot file to be fully written
                timeout = 3
                start_wait_time = time.time()
                while not os.path.exists(temp_filename) or os.path.getsize(temp_filename) == 0:
                    if time.time() - start_wait_time > timeout:
                        raise Exception("Screenshot capture timeout")
                    await asyncio.sleep(0.05)

                shutil.copy2(temp_filename, latest_filename)
                shutil.copy2(temp_filename, timestamped_filename)

                with open(temp_filename, 'rb') as img_file:
                    img_data = img_file.read()
                    if len(img_data) == 0:
                        raise Exception("Empty screenshot file")

                    base64_str = base64.b64encode(img_data).decode('utf-8')
                    return {
                        "content_type": "image/png",
                        "base64": base64_str,
                        "timestamp": timestamp,
                        "filename": timestamped_filename
                    }

            finally:
                if os.path.exists(temp_filename):
                    try:
                        os.remove(temp_filename)
                    except Exception:
                        pass

        except Exception as e:
            print(f"[Screenshot] Error during screenshot process: {str(e)}")
            return None
|
||||
|
||||
    @openapi_schema({
        "type": "function",
        "function": {
            "name": "hotkey",
            "description": "Press a key combination",
            "parameters": {
                "type": "object",
                "properties": {
                    "keys": {
                        "type": "string",
                        "description": "Key combination to press",
                        "enum": KEYBOARD_KEYS
                    }
                },
                "required": ["keys"]
            }
        }
    })
    @xml_schema(
        tag_name="hotkey",
        mappings=[
            {"param_name": "keys", "node_type": "attribute", "path": "keys"}
        ],
        example='''
        <hotkey keys="ctrl+a">
        </hotkey>
        '''
    )
    async def hotkey(self, keys: str) -> ToolResult:
        """Press a key combination."""
        try:
            if not await self._ensure_connection():
                return ToolResult(success=False, output="Failed to establish VNC connection")

            keys = str(keys).lower().strip()
            key_sequence = keys.split('+')

            for key in key_sequence[:-1]:
                self.client.keyDown(key)
                await asyncio.sleep(0.02)  # Reduced from 0.1 to 0.02 seconds

            self.client.keyPress(key_sequence[-1])
            await asyncio.sleep(0.02)  # Reduced from 0.1 to 0.02 seconds

            for key in reversed(key_sequence[:-1]):
                self.client.keyUp(key)
                await asyncio.sleep(0.02)  # Reduced from 0.1 to 0.02 seconds

            return ToolResult(success=True, output=f"Pressed key combination: {keys}")

        except Exception as e:
            logging.error(f"Hotkey failed: {str(e)}")
            return ToolResult(success=False, output=f"Failed to press keys: {str(e)}")

if __name__ == "__main__":
    import asyncio

    async def test_vnc_tool():
        vnc = None
        try:
            # Initialize the VNC tool with connection details
            print("Initializing VNC Tool...")
            # vnc = await ComputerUseTool.create(host='172.202.112.205', password='admin')
            vnc = await ComputerUseTool.create(host='192.168.1.5', password='admin', port=3859)

            # Test Ctrl+Alt+Delete
            # print("\nTesting Ctrl+Alt+Delete...")
            # await vnc.hotkey("ctrl+alt+delete")
            # await vnc.wait(2)  # Give some time to observe the effect
            # print("\nCtrl+Alt+Delete test completed!")

            # print("\nTesting Ctrl+A...")
            # await vnc.hotkey("ctrl+a")
            # await vnc.wait(2)  # Give some time to observe the effect
            # await vnc.hotkey("left")
            # print("\nCtrl+A test completed!")

            # screenshot = await vnc.get_screenshot_base64()
            #
            # # Test clicking and dragging the Rumble logo to URL bar
            # print("\nTesting click and drag of Rumble logo...")
            #
            # # Move to Rumble logo position
            # await vnc.move_to(160, 100)
            # await vnc.wait(0.5)
            #
            # # Click and hold the logo
            # await vnc.mouse_down(button="left")
            # await vnc.wait(0.5)
            #
            # # Drag to URL bar position
            # await vnc.drag_to(200, 50)
            # await vnc.wait(0.5)
            #
            # # Release the mouse button
            # await vnc.mouse_up(button="left")
            #
            # print("Completed drag and drop test")
            #
            # Test mouse movement and clicking
            # print("\nTesting mouse movement and clicking...")
            # await vnc.move_to(568, 497)
            # await vnc.wait(0.5)
            # result = await vnc.click(button="left")
            # print(f"Click result: {result.output}")
            # screenshot = await vnc.get_screenshot_base64()

            # # Test basic mouse movement
            # print("\nTesting mouse movement...")
            # result = await vnc.move_to(475, 100)
            # print(f"Move result: {result.output}")
            # screenshot = await vnc.get_screenshot_base64()

            # # Test clicking
            # print("\nTesting mouse clicks...")
            # result = await vnc.click(button="left")
            # print(f"Click result: {result.output}")
            # screenshot = await vnc.get_screenshot_base64()

            # # Test typing
            print("\nTesting keyboard typing...")
            result = await vnc.typing("Hello World!")
            print(f"Typing result: {result.output}")
            # screenshot = await vnc.get_screenshot_base64()

            # # Test key press
            # print("\nTesting key press...")
            # result = await vnc.press("enter")
            # print(f"Key press result: {result.output}")
            # screenshot = await vnc.get_screenshot_base64()

            # # Test scrolling
            # print("\nTesting scrolling...")

            # # Move to a specific position first (e.g., middle of screen)
            # await vnc.move_to(500, 400)
            # await vnc.wait(0.5)

            # # Scroll down
            # result = await vnc.scroll(amount=-3)
            # print(f"Scroll down result: {result.output}")
            # await vnc.wait(1)

            # # Scroll up
            # result = await vnc.scroll(amount=3)
            # print(f"Scroll up result: {result.output}")

            # # First move to start position
            # await vnc.move_to(475, 200)
            # await vnc.wait(0.2)

            # # Perform drag to target
            # result = await vnc.drag_to(500, 50)
            # print(f"Drag result: {result.output}")

            # screenshot = await vnc.get_screenshot_base64()
            # print("\nAll tests completed successfully!")

        except Exception as e:
            print(f"Test error: {e}")
        finally:
            if vnc:
                print("\nCleaning up...")
                # Add a small delay before cleanup
                await asyncio.sleep(0.1)
                await vnc._cleanup()
                # Ensure we close the event loop properly
                await asyncio.get_event_loop().shutdown_asyncgens()

    # Run the test
    asyncio.run(test_vnc_tool())

@@ -151,9 +151,7 @@ class SandboxShellTool(SandboxToolsBase):
                return self.success_response({
                    "output": logs,
                    "exit_code": response.exit_code,
                    "cwd": cwd,
                    "session_id": session_id,
                    "command_id": response.cmd_id
                    "cwd": cwd
                })
            else:
                error_msg = f"Command failed with exit code {response.exit_code}"

@@ -57,15 +57,10 @@ class ExaWebSearchTool(Tool):
                        },
                        "description": "A list of terms that must be excluded from the results"
                    },
                    # "livecrawl": {
                    #     "type": "string",
                    #     "description": "Whether to perform a live crawl - 'always', 'fallback', or 'never'",
                    #     "default": "always"
                    # },
                    "num_results": {
                        "type": "integer",
                        "description": "The number of results to return",
                        "default": 10
                        "default": 20
                    },
                    "type": {
                        "type": "string",

@@ -88,7 +83,6 @@ class ExaWebSearchTool(Tool):
            {"param_name": "end_crawl_date", "node_type": "attribute", "path": "."},
            {"param_name": "include_text", "node_type": "attribute", "path": "."},
            {"param_name": "exclude_text", "node_type": "attribute", "path": "."},
            # {"param_name": "livecrawl", "node_type": "attribute", "path": "."},
            {"param_name": "num_results", "node_type": "attribute", "path": "."},
            {"param_name": "type", "node_type": "attribute", "path": "."}
        ],

@@ -98,7 +92,7 @@ class ExaWebSearchTool(Tool):
            summary="true"
            include_text="important term"
            exclude_text="unwanted term"
            num_results="10"
            num_results="20"
            type="auto">
        </web-search>
        '''

@@ -113,8 +107,7 @@ class ExaWebSearchTool(Tool):
        end_crawl_date: Optional[str] = None,
        include_text: Optional[List[str]] = None,
        exclude_text: Optional[List[str]] = None,
        # livecrawl: str = "always",
        num_results: int = 10,
        num_results: int = 20,
        type: str = "auto"
    ) -> ToolResult:
        """

@@ -129,7 +122,7 @@ class ExaWebSearchTool(Tool):
        - end_crawl_date: Optional end date for crawled results (ISO format)
        - include_text: List of terms that must be included in the results
        - exclude_text: List of terms that must be excluded from the results
        - num_results: The number of results to return (default: 10)
        - num_results: The number of results to return (default: 20)
        - type: The type of search to perform - 'auto', 'keyword', or 'neural' (default: 'auto')
        """
        try:

@@ -148,8 +141,6 @@ class ExaWebSearchTool(Tool):
            params["include_text"] = include_text
        if exclude_text:
            params["exclude_text"] = exclude_text
        # if livecrawl:
        #     params["livecrawl"] = livecrawl
        if type:
            params["type"] = type

@@ -174,7 +165,7 @@ if __name__ == "__main__":
        result = await search_tool.web_search(
            query="rubber gym mats best prices comparison",
            summary=False,
            num_results=10
            num_results=20
        )
        print(result)

@@ -118,6 +118,9 @@ class ResponseProcessor:
        # For tracking pending tool executions
        pending_tool_executions = []

        # Set to track already yielded tool results by their index
        yielded_tool_indices = set()

        # Tool index counter for tracking all tool executions
        tool_index = 0

@@ -130,6 +133,7 @@ class ResponseProcessor:
        # logger.debug(f"Starting to process streaming response for thread {thread_id}")
        logger.info(f"Config: XML={config.xml_tool_calling}, Native={config.native_tool_calling}, "
                    f"Execute on stream={config.execute_on_stream}, Execution strategy={config.tool_execution_strategy}")
        logger.info(f"Avoiding duplicate tool results using tracking mechanism")

        # if config.max_xml_tool_calls > 0:
        #     logger.info(f"XML tool call limit enabled: {config.max_xml_tool_calls}")

@@ -313,59 +317,6 @@ class ResponseProcessor:
                        logger.info("Stopping stream due to XML tool call limit")
                        break

                # Check for completed tool executions
                completed_executions = []
                for i, execution in enumerate(pending_tool_executions):
                    if execution["task"].done():
                        try:
                            # Get the result
                            result = execution["task"].result()
                            tool_call = execution["tool_call"]
                            tool_index = execution.get("tool_index", -1)

                            # Store result for later database updates
                            tool_results_buffer.append((tool_call, result))

                            # Get or create the context
                            if "context" in execution:
                                context = execution["context"]
                                context.result = result
                            else:
                                context = self._create_tool_context(tool_call, tool_index)
                                context.result = result

                            # Yield tool status message first
                            yield self._yield_tool_completed(context)

                            # Yield tool execution result
                            yield self._yield_tool_result(context)

                            # Mark for removal
                            completed_executions.append(i)

                        except Exception as e:
                            logger.error(f"Error getting tool execution result: {str(e)}")
                            tool_call = execution["tool_call"]
                            tool_index = execution.get("tool_index", -1)

                            # Get or create the context
                            if "context" in execution:
                                context = execution["context"]
                                context.error = e
                            else:
                                context = self._create_tool_context(tool_call, tool_index)
                                context.error = e

                            # Yield error status for the tool
                            yield self._yield_tool_error(context)

                            # Mark for removal
                            completed_executions.append(i)

                # Remove completed executions from pending list (in reverse to maintain indices)
                for i in sorted(completed_executions, reverse=True):
                    pending_tool_executions.pop(i)

            # After streaming completes or is stopped due to limit, wait for any remaining tool executions
            if pending_tool_executions:
                logger.info(f"Waiting for {len(pending_tool_executions)} pending tool executions to complete")

@@ -383,7 +334,7 @@ class ResponseProcessor:
                        tool_index = execution.get("tool_index", -1)

                        # Store result for later
                        tool_results_buffer.append((tool_call, result))
                        tool_results_buffer.append((tool_call, result, tool_index))

                        # Get or create the context
                        if "context" in execution:

@@ -393,17 +344,31 @@ class ResponseProcessor:
                            context = self._create_tool_context(tool_call, tool_index)
                            context.result = result

                        # Skip yielding if already yielded during streaming
                        if tool_index in yielded_tool_indices:
                            logger.info(f"Skipping duplicate yield for tool index {tool_index}")
                            continue

                        # Yield tool status message first
                        yield self._yield_tool_completed(context)

                        # Yield tool execution result
                        yield self._yield_tool_result(context)

                        # Track that we've yielded this tool result
                        yielded_tool_indices.add(tool_index)
                    except Exception as e:
                        logger.error(f"Error processing remaining tool execution: {str(e)}")
                        # Yield error status for the tool
                        if "tool_call" in execution:
                            tool_call = execution["tool_call"]
                            tool_index = execution.get("tool_index", -1)

                            # Skip yielding if already yielded during streaming
                            if tool_index in yielded_tool_indices:
                                logger.info(f"Skipping duplicate yield for remaining tool error index {tool_index}")
                                continue

                            # Get or create the context
                            if "context" in execution:
                                context = execution["context"]

@@ -411,14 +376,12 @@ class ResponseProcessor:
                            else:
                                context = self._create_tool_context(tool_call, tool_index)
                                context.error = e
                            formatted_result = self._format_xml_tool_result(tool_call, result)
                            yield {
                                "type": "tool_result",
                                "function_name": context.function_name,
                                "xml_tag_name": context.xml_tag_name,
                                "result": formatted_result,
                                "tool_index": tool_index
                            }

                            # Yield error status for the tool
                            yield self._yield_tool_error(context)

                            # Track that we've yielded this tool error
                            yielded_tool_indices.add(tool_index)

        # If stream was stopped due to XML limit, report custom finish reason
        if finish_reason == "xml_tool_limit_reached":

@@ -464,9 +427,9 @@ class ResponseProcessor:
                is_llm_message=True
            )

            # Now add all buffered tool results AFTER the assistant message
            for tool_call, result in tool_results_buffer:
                # Add result based on tool type
            # Now add all buffered tool results AFTER the assistant message, but don't yield if already yielded
            for tool_call, result, result_tool_index in tool_results_buffer:
                # Add result based on tool type to the conversation history
                await self._add_tool_result(
                    thread_id,
                    tool_call,

@@ -474,8 +437,13 @@ class ResponseProcessor:
                    config.xml_adding_strategy
                )

                # We don't need to yield again for tools that were already yielded during streaming
                if result_tool_index in yielded_tool_indices:
                    logger.info(f"Skipping duplicate yield for tool index {result_tool_index}")
                    continue

                # Create context for tool result
                context = self._create_tool_context(tool_call, tool_index)
                context = self._create_tool_context(tool_call, result_tool_index)
                context.result = result

                # Yield tool execution result

@@ -575,6 +543,8 @@ class ResponseProcessor:
        tool_index = 0
        # XML tool call counter
        xml_tool_call_count = 0
        # Set to track yielded tool results
        yielded_tool_indices = set()

        # Extract finish_reason if available
        finish_reason = None

@@ -667,6 +637,9 @@ class ResponseProcessor:
                # Yield tool execution result
                yield self._yield_tool_result(context)

                # Track that we've yielded this tool result
                yielded_tool_indices.add(tool_index)

                # Increment tool index for next tool
                tool_index += 1

@@ -104,7 +104,6 @@ files = [
[package.dependencies]
aiohappyeyeballs = ">=2.3.0"
aiosignal = ">=1.1.2"
async-timeout = {version = ">=4.0,<6.0", markers = "python_version < \"3.11\""}
attrs = ">=17.3.0"
frozenlist = ">=1.1.1"
multidict = ">=4.5,<7.0"

@@ -173,10 +172,8 @@ files = [
]

[package.dependencies]
exceptiongroup = {version = ">=1.0.2", markers = "python_version < \"3.11\""}
idna = ">=2.8"
sniffio = ">=1.1"
typing-extensions = {version = ">=4.1", markers = "python_version < \"3.11\""}

[package.extras]
doc = ["Sphinx (>=7.4,<8.0)", "packaging", "sphinx-autodoc-typehints (>=1.2.0)", "sphinx-rtd-theme"]

@@ -226,6 +223,20 @@ docs = ["cogapp", "furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphi
tests = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
tests-mypy = ["mypy (>=1.11.1)", "pytest-mypy-plugins"]

[[package]]
name = "automat"
version = "24.8.1"
description = "Self-service finite-state machines for the programmer on the go."
optional = false
python-versions = ">=3.8"
files = [
    {file = "Automat-24.8.1-py3-none-any.whl", hash = "sha256:bf029a7bc3da1e2c24da2343e7598affaa9f10bf0ab63ff808566ce90551e02a"},
    {file = "automat-24.8.1.tar.gz", hash = "sha256:b34227cf63f6325b8ad2399ede780675083e439b20c323d376373d8ee6306d88"},
]

[package.extras]
visualize = ["Twisted (>=16.1.1)", "graphviz (>0.5.1)"]

[[package]]
name = "blinker"
version = "1.8.2"

@@ -436,6 +447,17 @@ files = [
    {file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"},
]

[[package]]
name = "constantly"
version = "23.10.4"
description = "Symbolic constants in Python"
optional = false
python-versions = ">=3.8"
files = [
    {file = "constantly-23.10.4-py3-none-any.whl", hash = "sha256:3fd9b4d1c3dc1ec9757f3c52aef7e53ad9323dbe39f51dfd4c43853b68dfa3f9"},
    {file = "constantly-23.10.4.tar.gz", hash = "sha256:aa92b70a33e2ac0bb33cd745eb61776594dc48764b06c35e0efd050b7f1c7cbd"},
]

[[package]]
name = "daytona-api-client"
version = "0.15.0"

@@ -605,20 +627,6 @@ pytest-mock = ">=3.14.0,<4.0.0"
requests = ">=2.32.3,<3.0.0"
typing-extensions = ">=4.12.2,<5.0.0"

[[package]]
name = "exceptiongroup"
version = "1.2.2"
description = "Backport of PEP 654 (exception groups)"
optional = false
python-versions = ">=3.7"
files = [
    {file = "exceptiongroup-1.2.2-py3-none-any.whl", hash = "sha256:3111b9d131c238bec2f8f516e123e14ba243563fb135d3fe885990585aa7795b"},
    {file = "exceptiongroup-1.2.2.tar.gz", hash = "sha256:47c2edf7c6738fafb49fd34290706d1a1a2f4d1c6df275526b62cbb4aa5393cc"},
]

[package.extras]
test = ["pytest (>=6)"]

[[package]]
name = "fastapi"
version = "0.110.0"

@@ -971,6 +979,20 @@ files = [
    {file = "hyperframe-6.1.0.tar.gz", hash = "sha256:f630908a00854a7adeabd6382b43923a4c4cd4b821fcb527e6ab9e15382a3b08"},
]

[[package]]
name = "hyperlink"
version = "21.0.0"
description = "A featureful, immutable, and correct URL for Python."
optional = false
python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
files = [
    {file = "hyperlink-21.0.0-py2.py3-none-any.whl", hash = "sha256:e6b14c37ecb73e89c77d78cdb4c2cc8f3fb59a885c5b3f819ff4ed80f25af1b4"},
    {file = "hyperlink-21.0.0.tar.gz", hash = "sha256:427af957daa58bc909471c6c40f74c5450fa123dd093fc53efd2e91d2705a56b"},
]

[package.dependencies]
idna = ">=2.5"

[[package]]
name = "idna"
version = "3.10"

@@ -1008,6 +1030,23 @@ perf = ["ipython"]
test = ["flufl.flake8", "importlib-resources (>=1.3)", "jaraco.test (>=5.4)", "packaging", "pyfakefs", "pytest (>=6,!=8.1.*)", "pytest-perf (>=0.9.2)"]
type = ["pytest-mypy"]

[[package]]
name = "incremental"
version = "24.7.2"
description = "A small library that versions your Python projects."
optional = false
python-versions = ">=3.8"
files = [
    {file = "incremental-24.7.2-py3-none-any.whl", hash = "sha256:8cb2c3431530bec48ad70513931a760f446ad6c25e8333ca5d95e24b0ed7b8fe"},
    {file = "incremental-24.7.2.tar.gz", hash = "sha256:fb4f1d47ee60efe87d4f6f0ebb5f70b9760db2b2574c59c8e8912be4ebd464c9"},
]

[package.dependencies]
setuptools = ">=61.0"

[package.extras]
scripts = ["click (>=6.0)"]

[[package]]
name = "iniconfig"
version = "2.0.0"

@@ -1382,9 +1421,6 @@ files = [
    {file = "multidict-6.1.0.tar.gz", hash = "sha256:22ae2ebf9b0c69d206c003e2f6a914ea33f0a932d4aa16f236afc049d9958f4a"},
]

[package.dependencies]
typing-extensions = {version = ">=4.1.0", markers = "python_version < \"3.11\""}

[[package]]
name = "nest-asyncio"
version = "1.6.0"

@@ -1551,7 +1587,6 @@ files = [

[package.dependencies]
numpy = [
    {version = ">=1.22.4", markers = "python_version < \"3.11\""},
    {version = ">=1.23.2", markers = "python_version == \"3.11\""},
    {version = ">=1.26.0", markers = "python_version >= \"3.12\""},
]

@@ -1706,7 +1741,6 @@ files = [
deprecation = ">=2.1.0,<3.0.0"
httpx = {version = ">=0.26,<0.29", extras = ["http2"]}
pydantic = ">=1.9,<3.0"
strenum = {version = ">=0.4.9,<0.5.0", markers = "python_version < \"3.11\""}

[[package]]
name = "prisma"

@@ -1726,7 +1760,6 @@ jinja2 = ">=2.11.2"
nodeenv = "*"
pydantic = ">=1.10.0,<3"
python-dotenv = ">=0.12.0"
StrEnum = {version = "*", markers = "python_version < \"3.11\""}
tomlkit = "*"
typing-extensions = ">=4.5.0"

@@ -1929,6 +1962,44 @@ files = [
[package.extras]
test = ["cffi", "hypothesis", "pandas", "pytest", "pytz"]

[[package]]
name = "pycryptodomex"
version = "3.22.0"
description = "Cryptographic library for Python"
optional = false
python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,>=2.7"
files = [
    {file = "pycryptodomex-3.22.0-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:41673e5cc39a8524557a0472077635d981172182c9fe39ce0b5f5c19381ffaff"},
    {file = "pycryptodomex-3.22.0-cp27-cp27m-manylinux2010_i686.whl", hash = "sha256:276be1ed006e8fd01bba00d9bd9b60a0151e478033e86ea1cb37447bbc057edc"},
    {file = "pycryptodomex-3.22.0-cp27-cp27m-manylinux2010_x86_64.whl", hash = "sha256:813e57da5ceb4b549bab96fa548781d9a63f49f1d68fdb148eeac846238056b7"},
    {file = "pycryptodomex-3.22.0-cp27-cp27m-win32.whl", hash = "sha256:d7beeacb5394765aa8dabed135389a11ee322d3ee16160d178adc7f8ee3e1f65"},
    {file = "pycryptodomex-3.22.0-cp27-cp27mu-manylinux2010_i686.whl", hash = "sha256:b3746dedf74787da43e4a2f85bd78f5ec14d2469eb299ddce22518b3891f16ea"},
    {file = "pycryptodomex-3.22.0-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:5ebc09b7d8964654aaf8a4f5ac325f2b0cc038af9bea12efff0cd4a5bb19aa42"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:aef4590263b9f2f6283469e998574d0bd45c14fb262241c27055b82727426157"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-macosx_10_9_x86_64.whl", hash = "sha256:5ac608a6dce9418d4f300fab7ba2f7d499a96b462f2b9b5c90d8d994cd36dcad"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7a24f681365ec9757ccd69b85868bbd7216ba451d0f86f6ea0eed75eeb6975db"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:259664c4803a1fa260d5afb322972813c5fe30ea8b43e54b03b7e3a27b30856b"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:7127d9de3c7ce20339e06bcd4f16f1a1a77f1471bcf04e3b704306dde101b719"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:ee75067b35c93cc18b38af47b7c0664998d8815174cfc66dd00ea1e244eb27e6"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-musllinux_1_2_i686.whl", hash = "sha256:1a8b0c5ba061ace4bcd03496d42702c3927003db805b8ec619ea6506080b381d"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:bfe4fe3233ef3e58028a3ad8f28473653b78c6d56e088ea04fe7550c63d4d16b"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-win32.whl", hash = "sha256:2cac9ed5c343bb3d0075db6e797e6112514764d08d667c74cb89b931aac9dddd"},
    {file = "pycryptodomex-3.22.0-cp37-abi3-win_amd64.whl", hash = "sha256:ff46212fda7ee86ec2f4a64016c994e8ad80f11ef748131753adb67e9b722ebd"},
    {file = "pycryptodomex-3.22.0-pp27-pypy_73-manylinux2010_x86_64.whl", hash = "sha256:5bf3ce9211d2a9877b00b8e524593e2209e370a287b3d5e61a8c45f5198487e2"},
    {file = "pycryptodomex-3.22.0-pp27-pypy_73-win32.whl", hash = "sha256:684cb57812cd243217c3d1e01a720c5844b30f0b7b64bb1a49679f7e1e8a54ac"},
    {file = "pycryptodomex-3.22.0-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:c8cffb03f5dee1026e3f892f7cffd79926a538c67c34f8b07c90c0bd5c834e27"},
    {file = "pycryptodomex-3.22.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:140b27caa68a36d0501b05eb247bd33afa5f854c1ee04140e38af63c750d4e39"},
    {file = "pycryptodomex-3.22.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:644834b1836bb8e1d304afaf794d5ae98a1d637bd6e140c9be7dd192b5374811"},
    {file = "pycryptodomex-3.22.0-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:72c506aba3318505dbeecf821ed7b9a9f86f422ed085e2d79c4fba0ae669920a"},
    {file = "pycryptodomex-3.22.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:7cd39f7a110c1ab97ce9ee3459b8bc615920344dc00e56d1b709628965fba3f2"},
    {file = "pycryptodomex-3.22.0-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:e4eaaf6163ff13788c1f8f615ad60cdc69efac6d3bf7b310b21e8cfe5f46c801"},
    {file = "pycryptodomex-3.22.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:eac39e237d65981554c2d4c6668192dc7051ad61ab5fc383ed0ba049e4007ca2"},
    {file = "pycryptodomex-3.22.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1ab0d89d1761959b608952c7b347b0e76a32d1a5bb278afbaa10a7f3eaef9a0a"},
    {file = "pycryptodomex-3.22.0-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:5e64164f816f5e43fd69f8ed98eb28f98157faf68208cd19c44ed9d8e72d33e8"},
    {file = "pycryptodomex-3.22.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:f005de31efad6f9acefc417296c641f13b720be7dbfec90edeaca601c0fab048"},
    {file = "pycryptodomex-3.22.0.tar.gz", hash = "sha256:a1da61bacc22f93a91cbe690e3eb2022a03ab4123690ab16c46abb693a9df63d"},
]

[[package]]
name = "pydantic"
version = "2.11.1"

@@ -2110,11 +2181,9 @@ files = [

[package.dependencies]
colorama = {version = "*", markers = "sys_platform == \"win32\""}
exceptiongroup = {version = ">=1.0.0rc8", markers = "python_version < \"3.11\""}
iniconfig = "*"
packaging = "*"
pluggy = ">=1.5,<2"
tomli = {version = ">=1", markers = "python_version < \"3.11\""}

[package.extras]
dev = ["argcomplete", "attrs (>=19.2)", "hypothesis (>=3.56)", "mock", "pygments (>=2.7.2)", "requests", "setuptools", "xmlschema"]

@@ -2979,17 +3048,6 @@ files = [
    {file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"},
]

[[package]]
name = "tomli"
version = "2.0.2"
description = "A lil' TOML parser"
optional = false
python-versions = ">=3.8"
files = [
    {file = "tomli-2.0.2-py3-none-any.whl", hash = "sha256:2ebe24485c53d303f690b0ec092806a085f07af5a5aa1464f3931eec36caaa38"},
    {file = "tomli-2.0.2.tar.gz", hash = "sha256:d46d457a85337051c36524bc5349dd91b1877838e2979ac5ced3e710ed8a60ed"},
]

[[package]]
name = "tomlkit"
version = "0.13.2"

@@ -3052,6 +3110,41 @@ notebook = ["ipywidgets (>=6)"]
slack = ["slack-sdk"]
telegram = ["requests"]

[[package]]
name = "twisted"
version = "24.11.0"
description = "An asynchronous networking framework written in Python"
optional = false
python-versions = ">=3.8.0"
files = [
    {file = "twisted-24.11.0-py3-none-any.whl", hash = "sha256:fe403076c71f04d5d2d789a755b687c5637ec3bcd3b2b8252d76f2ba65f54261"},
    {file = "twisted-24.11.0.tar.gz", hash = "sha256:695d0556d5ec579dcc464d2856b634880ed1319f45b10d19043f2b57eb0115b5"},
]

[package.dependencies]
attrs = ">=22.2.0"
automat = ">=24.8.0"
constantly = ">=15.1"
hyperlink = ">=17.1.1"
incremental = ">=24.7.0"
typing-extensions = ">=4.2.0"
zope-interface = ">=5"

[package.extras]
all-non-platform = ["appdirs (>=1.4.0)", "appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "bcrypt (>=3.1.3)", "cryptography (>=3.3)", "cryptography (>=3.3)", "cython-test-exception-raiser (>=1.0.2,<2)", "cython-test-exception-raiser (>=1.0.2,<2)", "h2 (>=3.2,<5.0)", "h2 (>=3.2,<5.0)", "httpx[http2] (>=0.27)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "hypothesis (>=6.56)", "idna (>=2.4)", "idna (>=2.4)", "priority (>=1.1.0,<2.0)", "priority (>=1.1.0,<2.0)", "pyhamcrest (>=2)", "pyhamcrest (>=2)", "pyopenssl (>=21.0.0)", "pyopenssl (>=21.0.0)", "pyserial (>=3.0)", "pyserial (>=3.0)", "pywin32 (!=226)", "pywin32 (!=226)", "service-identity (>=18.1.0)", "service-identity (>=18.1.0)"]
conch = ["appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "cryptography (>=3.3)"]
dev = ["coverage (>=7.5,<8.0)", "cython-test-exception-raiser (>=1.0.2,<2)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "pydoctor (>=23.9.0,<23.10.0)", "pyflakes (>=2.2,<3.0)", "pyhamcrest (>=2)", "python-subunit (>=1.4,<2.0)", "sphinx (>=6,<7)", "sphinx-rtd-theme (>=1.3,<2.0)", "towncrier (>=23.6,<24.0)", "twistedchecker (>=0.7,<1.0)"]
dev-release = ["pydoctor (>=23.9.0,<23.10.0)", "pydoctor (>=23.9.0,<23.10.0)", "sphinx (>=6,<7)", "sphinx (>=6,<7)", "sphinx-rtd-theme (>=1.3,<2.0)", "sphinx-rtd-theme (>=1.3,<2.0)", "towncrier (>=23.6,<24.0)", "towncrier (>=23.6,<24.0)"]
gtk-platform = ["appdirs (>=1.4.0)", "appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "bcrypt (>=3.1.3)", "cryptography (>=3.3)", "cryptography (>=3.3)", "cython-test-exception-raiser (>=1.0.2,<2)", "cython-test-exception-raiser (>=1.0.2,<2)", "h2 (>=3.2,<5.0)", "h2 (>=3.2,<5.0)", "httpx[http2] (>=0.27)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "hypothesis (>=6.56)", "idna (>=2.4)", "idna (>=2.4)", "priority (>=1.1.0,<2.0)", "priority (>=1.1.0,<2.0)", "pygobject", "pygobject", "pyhamcrest (>=2)", "pyhamcrest (>=2)", "pyopenssl (>=21.0.0)", "pyopenssl (>=21.0.0)", "pyserial (>=3.0)", "pyserial (>=3.0)", "pywin32 (!=226)", "pywin32 (!=226)", "service-identity (>=18.1.0)", "service-identity (>=18.1.0)"]
http2 = ["h2 (>=3.2,<5.0)", "priority (>=1.1.0,<2.0)"]
macos-platform = ["appdirs (>=1.4.0)", "appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "bcrypt (>=3.1.3)", "cryptography (>=3.3)", "cryptography (>=3.3)", "cython-test-exception-raiser (>=1.0.2,<2)", "cython-test-exception-raiser (>=1.0.2,<2)", "h2 (>=3.2,<5.0)", "h2 (>=3.2,<5.0)", "httpx[http2] (>=0.27)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "hypothesis (>=6.56)", "idna (>=2.4)", "idna (>=2.4)", "priority (>=1.1.0,<2.0)", "priority (>=1.1.0,<2.0)", "pyhamcrest (>=2)", "pyhamcrest (>=2)", "pyobjc-core", "pyobjc-core", "pyobjc-framework-cfnetwork", "pyobjc-framework-cfnetwork", "pyobjc-framework-cocoa", "pyobjc-framework-cocoa", "pyopenssl (>=21.0.0)", "pyopenssl (>=21.0.0)", "pyserial (>=3.0)", "pyserial (>=3.0)", "pywin32 (!=226)", "pywin32 (!=226)", "service-identity (>=18.1.0)", "service-identity (>=18.1.0)"]
mypy = ["appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "coverage (>=7.5,<8.0)", "cryptography (>=3.3)", "cython-test-exception-raiser (>=1.0.2,<2)", "h2 (>=3.2,<5.0)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "idna (>=2.4)", "mypy (==1.10.1)", "mypy-zope (==1.0.6)", "priority (>=1.1.0,<2.0)", "pydoctor (>=23.9.0,<23.10.0)", "pyflakes (>=2.2,<3.0)", "pyhamcrest (>=2)", "pyopenssl (>=21.0.0)", "pyserial (>=3.0)", "python-subunit (>=1.4,<2.0)", "pywin32 (!=226)", "service-identity (>=18.1.0)", "sphinx (>=6,<7)", "sphinx-rtd-theme (>=1.3,<2.0)", "towncrier (>=23.6,<24.0)", "twistedchecker (>=0.7,<1.0)", "types-pyopenssl", "types-setuptools"]
|
||||
osx-platform = ["appdirs (>=1.4.0)", "appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "bcrypt (>=3.1.3)", "cryptography (>=3.3)", "cryptography (>=3.3)", "cython-test-exception-raiser (>=1.0.2,<2)", "cython-test-exception-raiser (>=1.0.2,<2)", "h2 (>=3.2,<5.0)", "h2 (>=3.2,<5.0)", "httpx[http2] (>=0.27)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "hypothesis (>=6.56)", "idna (>=2.4)", "idna (>=2.4)", "priority (>=1.1.0,<2.0)", "priority (>=1.1.0,<2.0)", "pyhamcrest (>=2)", "pyhamcrest (>=2)", "pyobjc-core", "pyobjc-core", "pyobjc-framework-cfnetwork", "pyobjc-framework-cfnetwork", "pyobjc-framework-cocoa", "pyobjc-framework-cocoa", "pyopenssl (>=21.0.0)", "pyopenssl (>=21.0.0)", "pyserial (>=3.0)", "pyserial (>=3.0)", "pywin32 (!=226)", "pywin32 (!=226)", "service-identity (>=18.1.0)", "service-identity (>=18.1.0)"]
|
||||
serial = ["pyserial (>=3.0)", "pywin32 (!=226)"]
|
||||
test = ["cython-test-exception-raiser (>=1.0.2,<2)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "pyhamcrest (>=2)"]
|
||||
tls = ["idna (>=2.4)", "pyopenssl (>=21.0.0)", "service-identity (>=18.1.0)"]
|
||||
windows-platform = ["appdirs (>=1.4.0)", "appdirs (>=1.4.0)", "bcrypt (>=3.1.3)", "bcrypt (>=3.1.3)", "cryptography (>=3.3)", "cryptography (>=3.3)", "cython-test-exception-raiser (>=1.0.2,<2)", "cython-test-exception-raiser (>=1.0.2,<2)", "h2 (>=3.2,<5.0)", "h2 (>=3.2,<5.0)", "httpx[http2] (>=0.27)", "httpx[http2] (>=0.27)", "hypothesis (>=6.56)", "hypothesis (>=6.56)", "idna (>=2.4)", "idna (>=2.4)", "priority (>=1.1.0,<2.0)", "priority (>=1.1.0,<2.0)", "pyhamcrest (>=2)", "pyhamcrest (>=2)", "pyopenssl (>=21.0.0)", "pyopenssl (>=21.0.0)", "pyserial (>=3.0)", "pyserial (>=3.0)", "pywin32 (!=226)", "pywin32 (!=226)", "pywin32 (!=226)", "pywin32 (!=226)", "service-identity (>=18.1.0)", "service-identity (>=18.1.0)", "twisted-iocpsupport (>=1.0.2)", "twisted-iocpsupport (>=1.0.2)"]
|
||||
|
||||
[[package]]
|
||||
name = "typing-extensions"
|
||||
version = "4.12.2"
|
||||
|
@@ -3133,11 +3226,25 @@ files = [
[package.dependencies]
click = ">=7.0"
h11 = ">=0.8"
typing-extensions = {version = ">=4.0", markers = "python_version < \"3.11\""}

[package.extras]
standard = ["colorama (>=0.4)", "httptools (>=0.5.0)", "python-dotenv (>=0.13)", "pyyaml (>=5.1)", "uvloop (>=0.14.0,!=0.15.0,!=0.15.1)", "watchfiles (>=0.13)", "websockets (>=10.4)"]

[[package]]
name = "vncdotool"
version = "1.2.0"
description = "Command line VNC client"
optional = false
python-versions = "*"
files = [
    {file = "vncdotool-1.2.0.tar.gz", hash = "sha256:53408d18ca7f9f21c525fc88189b01ca6594153ec1a9be09f6198306d166ea0d"},
]

[package.dependencies]
Pillow = "*"
pycryptodomex = "*"
Twisted = "*"

[[package]]
name = "watchdog"
version = "6.0.0"

@@ -3472,7 +3579,61 @@ enabler = ["pytest-enabler (>=2.2)"]
test = ["big-O", "importlib-resources", "jaraco.functools", "jaraco.itertools", "jaraco.test", "more-itertools", "pytest (>=6,!=8.1.*)", "pytest-ignore-flaky"]
type = ["pytest-mypy"]

[[package]]
name = "zope-interface"
version = "7.2"
description = "Interfaces for Python"
optional = false
python-versions = ">=3.8"
files = [
    {file = "zope.interface-7.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:ce290e62229964715f1011c3dbeab7a4a1e4971fd6f31324c4519464473ef9f2"},
    {file = "zope.interface-7.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:05b910a5afe03256b58ab2ba6288960a2892dfeef01336dc4be6f1b9ed02ab0a"},
    {file = "zope.interface-7.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:550f1c6588ecc368c9ce13c44a49b8d6b6f3ca7588873c679bd8fd88a1b557b6"},
    {file = "zope.interface-7.2-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0ef9e2f865721553c6f22a9ff97da0f0216c074bd02b25cf0d3af60ea4d6931d"},
    {file = "zope.interface-7.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:27f926f0dcb058211a3bb3e0e501c69759613b17a553788b2caeb991bed3b61d"},
    {file = "zope.interface-7.2-cp310-cp310-win_amd64.whl", hash = "sha256:144964649eba4c5e4410bb0ee290d338e78f179cdbfd15813de1a664e7649b3b"},
    {file = "zope.interface-7.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1909f52a00c8c3dcab6c4fad5d13de2285a4b3c7be063b239b8dc15ddfb73bd2"},
    {file = "zope.interface-7.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:80ecf2451596f19fd607bb09953f426588fc1e79e93f5968ecf3367550396b22"},
    {file = "zope.interface-7.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:033b3923b63474800b04cba480b70f6e6243a62208071fc148354f3f89cc01b7"},
    {file = "zope.interface-7.2-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a102424e28c6b47c67923a1f337ede4a4c2bba3965b01cf707978a801fc7442c"},
    {file = "zope.interface-7.2-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:25e6a61dcb184453bb00eafa733169ab6d903e46f5c2ace4ad275386f9ab327a"},
    {file = "zope.interface-7.2-cp311-cp311-win_amd64.whl", hash = "sha256:3f6771d1647b1fc543d37640b45c06b34832a943c80d1db214a37c31161a93f1"},
    {file = "zope.interface-7.2-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:086ee2f51eaef1e4a52bd7d3111a0404081dadae87f84c0ad4ce2649d4f708b7"},
    {file = "zope.interface-7.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:21328fcc9d5b80768bf051faa35ab98fb979080c18e6f84ab3f27ce703bce465"},
    {file = "zope.interface-7.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f6dd02ec01f4468da0f234da9d9c8545c5412fef80bc590cc51d8dd084138a89"},
    {file = "zope.interface-7.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8e7da17f53e25d1a3bde5da4601e026adc9e8071f9f6f936d0fe3fe84ace6d54"},
    {file = "zope.interface-7.2-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cab15ff4832580aa440dc9790b8a6128abd0b88b7ee4dd56abacbc52f212209d"},
    {file = "zope.interface-7.2-cp312-cp312-win_amd64.whl", hash = "sha256:29caad142a2355ce7cfea48725aa8bcf0067e2b5cc63fcf5cd9f97ad12d6afb5"},
    {file = "zope.interface-7.2-cp313-cp313-macosx_10_9_x86_64.whl", hash = "sha256:3e0350b51e88658d5ad126c6a57502b19d5f559f6cb0a628e3dc90442b53dd98"},
    {file = "zope.interface-7.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:15398c000c094b8855d7d74f4fdc9e73aa02d4d0d5c775acdef98cdb1119768d"},
    {file = "zope.interface-7.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:802176a9f99bd8cc276dcd3b8512808716492f6f557c11196d42e26c01a69a4c"},
    {file = "zope.interface-7.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:eb23f58a446a7f09db85eda09521a498e109f137b85fb278edb2e34841055398"},
    {file = "zope.interface-7.2-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a71a5b541078d0ebe373a81a3b7e71432c61d12e660f1d67896ca62d9628045b"},
    {file = "zope.interface-7.2-cp313-cp313-win_amd64.whl", hash = "sha256:4893395d5dd2ba655c38ceb13014fd65667740f09fa5bb01caa1e6284e48c0cd"},
    {file = "zope.interface-7.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:d3a8ffec2a50d8ec470143ea3d15c0c52d73df882eef92de7537e8ce13475e8a"},
    {file = "zope.interface-7.2-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:31d06db13a30303c08d61d5fb32154be51dfcbdb8438d2374ae27b4e069aac40"},
    {file = "zope.interface-7.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e204937f67b28d2dca73ca936d3039a144a081fc47a07598d44854ea2a106239"},
    {file = "zope.interface-7.2-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:224b7b0314f919e751f2bca17d15aad00ddbb1eadf1cb0190fa8175edb7ede62"},
    {file = "zope.interface-7.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:baf95683cde5bc7d0e12d8e7588a3eb754d7c4fa714548adcd96bdf90169f021"},
    {file = "zope.interface-7.2-cp38-cp38-win_amd64.whl", hash = "sha256:7dc5016e0133c1a1ec212fc87a4f7e7e562054549a99c73c8896fa3a9e80cbc7"},
    {file = "zope.interface-7.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:7bd449c306ba006c65799ea7912adbbfed071089461a19091a228998b82b1fdb"},
    {file = "zope.interface-7.2-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:a19a6cc9c6ce4b1e7e3d319a473cf0ee989cbbe2b39201d7c19e214d2dfb80c7"},
    {file = "zope.interface-7.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:72cd1790b48c16db85d51fbbd12d20949d7339ad84fd971427cf00d990c1f137"},
    {file = "zope.interface-7.2-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:52e446f9955195440e787596dccd1411f543743c359eeb26e9b2c02b077b0519"},
    {file = "zope.interface-7.2-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2ad9913fd858274db8dd867012ebe544ef18d218f6f7d1e3c3e6d98000f14b75"},
    {file = "zope.interface-7.2-cp39-cp39-win_amd64.whl", hash = "sha256:1090c60116b3da3bfdd0c03406e2f14a1ff53e5771aebe33fec1edc0a350175d"},
    {file = "zope.interface-7.2.tar.gz", hash = "sha256:8b49f1a3d1ee4cdaf5b32d2e738362c7f5e40ac8b46dd7d1a65e82a4872728fe"},
]

[package.dependencies]
setuptools = "*"

[package.extras]
docs = ["Sphinx", "furo", "repoze.sphinx.autointerface"]
test = ["coverage[toml]", "zope.event", "zope.testing"]
testing = ["coverage[toml]", "zope.event", "zope.testing"]

[metadata]
lock-version = "2.0"
python-versions = "^3.10"
content-hash = "e80d572b14371929f2fc9a459a28c97463d91755c21698564a3eba7637198413"
python-versions = "^3.11"
content-hash = "065654a603730f66d780289c72b89303f3fa3f96cbaa990c003b052605b35a57"
@@ -48,6 +48,7 @@ boto3 = "^1.34.0"
openai = "^1.72.0"
streamlit = "^1.44.1"
nest-asyncio = "^1.6.0"
vncdotool = "^1.2.0"

[tool.poetry.scripts]
agentpress = "agentpress.cli:main"
@@ -194,33 +194,72 @@ def start_sandbox_browser_api(sandbox):
    try:
        # Create tmux session for browser API
        logger.debug("Creating tmux session for browser API")

        # Create a session ID for this browser API
        session_id = 'browser_api_session'

        # First create the session properly using create_session
        try:
            # Check if session already exists
            sandbox.process.execute_session_command('sandbox_browser_api', SessionExecuteRequest(
                command="tmux has-session -t browser_api 2>/dev/null || tmux new-session -d -s browser_api",
                var_async=True
            ))
            # Create a new session
            logger.debug(f"Creating new session with ID: {session_id}")
            sandbox.process.create_session(session_id)
            sleep(2)  # Wait for session initialization
        except Exception as session_e:
            logger.debug(f"Error creating tmux session, might already exist: {str(session_e)}")
            logger.debug(f"Error creating session: {str(session_e)}")
            # Try to delete and recreate if it exists
            try:
                sandbox.process.delete_session(session_id)
                sleep(1)
                sandbox.process.create_session(session_id)
                sleep(2)
            except Exception as e:
                logger.debug(f"Error recreating session: {str(e)}")

        # Kill any existing process in the session
        sandbox.process.execute_session_command('sandbox_browser_api', SessionExecuteRequest(
            command="tmux send-keys -t browser_api C-c",
            var_async=True
        ))
        # Now execute commands in the created session
        max_retries = 3
        retry_count = 0
        success = False

        logger.debug("Executing browser API command in tmux session")
        rsp = sandbox.process.execute_session_command('sandbox_browser_api', SessionExecuteRequest(
            command="tmux send-keys -t browser_api 'python " + sandbox.get_user_root_dir() + "/browser_api.py' C-m",
            var_async=True
        ))
        logger.debug(f"Browser API command execution result: {rsp}")

        # Verify the process is running
        sandbox.process.execute_session_command('sandbox_browser_api', SessionExecuteRequest(
            command="tmux list-panes -t browser_api -F '#{pane_pid}'",
            var_async=True
        ))
        while retry_count < max_retries and not success:
            try:
                # Execute tmux command in the session
                logger.debug(f"Creating tmux in session {session_id}")
                sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux new-session -d -s browser_api || true",
                    var_async=True
                ))
                sleep(2)

                # Kill any existing process in the tmux session
                sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux send-keys -t browser_api C-c",
                    var_async=True
                ))

                logger.debug("Executing browser API command in tmux session")
                rsp = sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux send-keys -t browser_api 'python " + sandbox.get_user_root_dir() + "/browser_api.py' C-m",
                    var_async=True
                ))
                logger.debug(f"Browser API command execution result: {rsp}")

                # Verify the process is running
                sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux list-panes -t browser_api -F '#{pane_pid}'",
                    var_async=True
                ))

                success = True
                logger.debug("Browser API started successfully")

            except Exception as e:
                retry_count += 1
                logger.warning(f"Attempt {retry_count}/{max_retries} to start browser API failed: {str(e)}")
                sleep(2)  # Wait before retrying

                if retry_count >= max_retries:
                    logger.error(f"Error starting browser API after {max_retries} attempts: {str(e)}")
                    raise e

    except Exception as e:
        logger.error(f"Error starting browser API: {str(e)}")
@@ -232,36 +271,75 @@ def start_http_server(sandbox):
    try:
        # Create tmux session for HTTP server
        logger.debug("Creating tmux session for HTTP server")

        # Create a session ID for this HTTP server
        session_id = 'http_server_session'

        # First create the session properly using create_session
        try:
            # Check if session already exists
            sandbox.process.execute_session_command('http_server', SessionExecuteRequest(
                command="tmux has-session -t http_server 2>/dev/null || tmux new-session -d -s http_server",
                var_async=True
            ))
            # Create a new session
            logger.debug(f"Creating new session with ID: {session_id}")
            sandbox.process.create_session(session_id)
            sleep(2)  # Wait for session initialization
        except Exception as session_e:
            logger.debug(f"Error creating tmux session, might already exist: {str(session_e)}")

            logger.debug(f"Error creating session: {str(session_e)}")
            # Try to delete and recreate if it exists
            try:
                sandbox.process.delete_session(session_id)
                sleep(1)
                sandbox.process.create_session(session_id)
                sleep(2)
            except Exception as e:
                logger.debug(f"Error recreating session: {str(e)}")

        # Create the server script file
        sandbox.fs.upload_file(sandbox.get_user_root_dir() + "/server.py", SERVER_SCRIPT.encode())

        # Kill any existing process in the session
        sandbox.process.execute_session_command('http_server', SessionExecuteRequest(
            command="tmux send-keys -t http_server C-c",
            var_async=True
        ))
        # Now execute commands in the created session
        max_retries = 3
        retry_count = 0
        success = False

        # Start the HTTP server using uvicorn with auto-reload in tmux session
        http_server_rsp = sandbox.process.execute_session_command('http_server', SessionExecuteRequest(
            command="cd " + sandbox.get_user_root_dir() + " && tmux send-keys -t http_server 'pip install uvicorn fastapi && python server.py' C-m",
            var_async=True
        ))
        logger.info(f"HTTP server started: {http_server_rsp}")

        # Verify the process is running
        sandbox.process.execute_session_command('http_server', SessionExecuteRequest(
            command="tmux list-panes -t http_server -F '#{pane_pid}'",
            var_async=True
        ))
        while retry_count < max_retries and not success:
            try:
                # Execute tmux command in the session
                logger.debug(f"Creating tmux in session {session_id}")
                sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux new-session -d -s http_server || true",
                    var_async=True
                ))
                sleep(2)

                # Kill any existing process in the tmux session
                sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux send-keys -t http_server C-c",
                    var_async=True
                ))

                # Start the HTTP server using uvicorn with auto-reload in tmux session
                http_server_rsp = sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="cd " + sandbox.get_user_root_dir() + " && tmux send-keys -t http_server 'pip install uvicorn fastapi && python server.py' C-m",
                    var_async=True
                ))
                logger.info(f"HTTP server started: {http_server_rsp}")

                # Verify the process is running
                sandbox.process.execute_session_command(session_id, SessionExecuteRequest(
                    command="tmux list-panes -t http_server -F '#{pane_pid}'",
                    var_async=True
                ))

                success = True
                logger.debug("HTTP server started successfully")

            except Exception as e:
                retry_count += 1
                logger.warning(f"Attempt {retry_count}/{max_retries} to start HTTP server failed: {str(e)}")
                sleep(2)  # Wait before retrying

                if retry_count >= max_retries:
                    logger.error(f"Error starting HTTP server after {max_retries} attempts: {str(e)}")
                    raise e

    except Exception as e:
        logger.error(f"Error starting HTTP server: {str(e)}")

@@ -387,6 +465,7 @@ def create_sandbox(password: str):
        ports=[
            7788,  # Gradio default port
            6080,  # noVNC web interface
            5900,  # VNC port
            5901,  # VNC port
            9222,  # Chrome remote debugging port
            8000,  # FastAPI port
@@ -544,10 +544,29 @@ export default function ThreadPage({ params }: { params: Promise<ThreadParams> }
      loadData();
    }

    // Handle visibility changes for more responsive streaming
    const handleVisibilityChange = () => {
      if (document.visibilityState === 'visible' && agentRunId && agentStatus === 'running') {
        console.log('[PAGE] Page became visible, checking stream health');

        // If we're supposed to be streaming but not receiving chunks, restart the stream
        if (!isStreaming && streamCleanupRef.current === null) {
          console.log('[PAGE] Stream appears disconnected, reconnecting');
          handleStreamAgent(agentRunId);
        }
      }
    };

    // Add visibility change listener
    document.addEventListener('visibilitychange', handleVisibilityChange);

    // Cleanup function
    return () => {
      isMounted = false;

      // Remove visibility change listener
      document.removeEventListener('visibilitychange', handleVisibilityChange);

      // Properly clean up stream
      if (streamCleanupRef.current) {
        console.log('[PAGE] Cleaning up stream on unmount');

@@ -558,7 +577,7 @@ export default function ThreadPage({ params }: { params: Promise<ThreadParams> }
      // Reset component state to prevent memory leaks
      console.log('[PAGE] Resetting component state on unmount');
    };
  }, [projectId, threadId, user, handleStreamAgent]);
  }, [projectId, threadId, user, handleStreamAgent, agentRunId, agentStatus, isStreaming]);

  const handleSubmitMessage = async (message: string) => {
    if (!message.trim()) return;

@@ -552,6 +552,8 @@ export const streamAgent = (agentRunId: string, callbacks: {
}): () => void => {
  let eventSourceInstance: EventSource | null = null;
  let isClosing = false;
  let wasHidden = false;
  let reconnectTimeout: NodeJS.Timeout | null = null;

  console.log(`[STREAM] Setting up stream for agent run ${agentRunId}`);

@@ -562,6 +564,12 @@ export const streamAgent = (agentRunId: string, callbacks: {
      return;
    }

    // Clear any pending reconnect timeout
    if (reconnectTimeout) {
      clearTimeout(reconnectTimeout);
      reconnectTimeout = null;
    }

    const supabase = createClient();
    const { data: { session } } = await supabase.auth.getSession();

@@ -575,6 +583,12 @@ export const streamAgent = (agentRunId: string, callbacks: {
    const url = new URL(`${API_URL}/agent-run/${agentRunId}/stream`);
    url.searchParams.append('token', session.access_token);

    // Close existing EventSource if it exists
    if (eventSourceInstance) {
      console.log(`[STREAM] Closing existing EventSource before creating a new one`);
      eventSourceInstance.close();
    }

    console.log(`[STREAM] Creating EventSource for ${agentRunId}`);
    eventSourceInstance = new EventSource(url.toString());

@@ -645,6 +659,22 @@ export const streamAgent = (agentRunId: string, callbacks: {
        return;
      }

      // If the page was hidden and now visible again, we might need to reconnect
      if (wasHidden && document.visibilityState === 'visible') {
        console.log(`[STREAM] Page became visible after being hidden, attempting to reconnect`);
        wasHidden = false;

        // Close the current connection if it exists
        if (eventSourceInstance) {
          eventSourceInstance.close();
          eventSourceInstance = null;
        }

        // Try to set up a new stream
        setupStream();
        return;
      }

      // Only log as error for unexpected closures
      console.log(`[STREAM] EventSource connection closed for ${agentRunId}`);

@@ -674,13 +704,52 @@ export const streamAgent = (agentRunId: string, callbacks: {
    }
  };

  // Set up the stream once
  // Handle page visibility changes
  const handleVisibilityChange = () => {
    if (document.visibilityState === 'hidden') {
      console.log(`[STREAM] Page hidden, marking stream as potentially stale for ${agentRunId}`);
      wasHidden = true;
    } else if (document.visibilityState === 'visible') {
      console.log(`[STREAM] Page visible again for ${agentRunId}`);

      // If we were previously hidden and now visible, check if we need to reconnect
      if (wasHidden) {
        wasHidden = false;

        // Check if the EventSource is in a good state
        if (!eventSourceInstance || eventSourceInstance.readyState === EventSource.CLOSED) {
          console.log(`[STREAM] Stream appears stale after visibility change, reconnecting for ${agentRunId}`);

          // Schedule reconnect with a small delay to allow for better state synchronization
          reconnectTimeout = setTimeout(() => {
            setupStream();
          }, 50);
        } else {
          console.log(`[STREAM] Stream appears to be in good state after visibility change for ${agentRunId}`);
        }
      }
    }
  };

  // Set up visibility change listener
  document.addEventListener('visibilitychange', handleVisibilityChange);

  // Set up the stream initially
  setupStream();

  // Return cleanup function
  return () => {
    console.log(`[STREAM] Manual cleanup called for ${agentRunId}`);

    // Remove visibility change listener
    document.removeEventListener('visibilitychange', handleVisibilityChange);

    // Clear any pending reconnect timeout
    if (reconnectTimeout) {
      clearTimeout(reconnectTimeout);
      reconnectTimeout = null;
    }

    if (isClosing) {
      console.log(`[STREAM] Already closing, ignoring duplicate cleanup for ${agentRunId}`);
      return;