suna/backend/docs/browser_state_image_display.md

2.5 KiB

Browser State Image Display Enhancement

Overview

The BrowserToolView component has been enhanced to display browser state images from the image_url field in addition to the existing screenshot_base64 field. This allows for displaying previously captured browser screenshots that have been uploaded to cloud storage.

Implementation Details

Backend Behavior

The backend browser tool (sb_browser_tool.py) already handles image uploads:

  1. When a browser action is executed, a screenshot is captured as base64
  2. The screenshot is uploaded to Supabase storage using upload_base64_image()
  3. The public URL is stored in the image_url field
  4. The screenshot_base64 field is removed to reduce message size
  5. The browser state message is saved with the image_url field

Frontend Changes

The BrowserToolView component now:

  1. Extracts both screenshot_base64 and image_url from browser state messages
  2. Prioritizes image_url over screenshot_base64 when both are available
  3. Displays images from either source seamlessly

Code Changes

// Extract both fields from browser state
const browserStateContent = safeJsonParse<{ 
  screenshot_base64?: string;
  image_url?: string;
}>(browserStateMessage.content, {});

screenshotBase64 = browserStateContent?.screenshot_base64 || null;
screenshotUrl = browserStateContent?.image_url || null;

Display Logic

The component uses a helper function to render screenshots:

const renderScreenshot = () => {
  if (screenshotUrl) {
    // Prefer uploaded URL
    return <img src={screenshotUrl} alt="Browser Screenshot" />;
  } else if (screenshotBase64) {
    // Fallback to base64
    return <img src={`data:image/jpeg;base64,${screenshotBase64}`} alt="Browser Screenshot" />;
  }
  return null;
};

Benefits

  1. Reduced Memory Usage: Base64 images are removed after upload, reducing message size
  2. Better Performance: Loading images from URLs is more efficient than inline base64
  3. Persistence: Images remain accessible even after the base64 data is removed
  4. Caching: Browser can cache images loaded from URLs

Usage

No changes are required in how the browser tools are used. The enhancement is transparent to the user and agent. Browser screenshots will automatically be uploaded and displayed from their cloud URLs.

Future Enhancements

  1. Add image preloading for better performance
  2. Implement image caching strategy
  3. Add fallback handling if image URL becomes unavailable
  4. Support for multiple screenshots per browser state