mirror of https://github.com/kortix-ai/suna.git
73 lines
2.5 KiB
Markdown
73 lines
2.5 KiB
Markdown
|
# Browser State Image Display Enhancement
|
||
|
|
||
|
## Overview
|
||
|
|
||
|
The BrowserToolView component has been enhanced to display browser state images from the `image_url` field in addition to the existing `screenshot_base64` field. This allows for displaying previously captured browser screenshots that have been uploaded to cloud storage.
|
||
|
|
||
|
## Implementation Details
|
||
|
|
||
|
### Backend Behavior
|
||
|
|
||
|
The backend browser tool (`sb_browser_tool.py`) already handles image uploads:
|
||
|
|
||
|
1. When a browser action is executed, a screenshot is captured as base64
|
||
|
2. The screenshot is uploaded to Supabase storage using `upload_base64_image()`
|
||
|
3. The public URL is stored in the `image_url` field
|
||
|
4. The `screenshot_base64` field is removed to reduce message size
|
||
|
5. The browser state message is saved with the `image_url` field
|
||
|
|
||
|
### Frontend Changes
|
||
|
|
||
|
The `BrowserToolView` component now:
|
||
|
|
||
|
1. Extracts both `screenshot_base64` and `image_url` from browser state messages
|
||
|
2. Prioritizes `image_url` over `screenshot_base64` when both are available
|
||
|
3. Displays images from either source seamlessly
|
||
|
|
||
|
### Code Changes
|
||
|
|
||
|
```typescript
|
||
|
// Extract both fields from browser state
|
||
|
const browserStateContent = safeJsonParse<{
|
||
|
screenshot_base64?: string;
|
||
|
image_url?: string;
|
||
|
}>(browserStateMessage.content, {});
|
||
|
|
||
|
screenshotBase64 = browserStateContent?.screenshot_base64 || null;
|
||
|
screenshotUrl = browserStateContent?.image_url || null;
|
||
|
```
|
||
|
|
||
|
### Display Logic
|
||
|
|
||
|
The component uses a helper function to render screenshots:
|
||
|
|
||
|
```typescript
|
||
|
const renderScreenshot = () => {
|
||
|
if (screenshotUrl) {
|
||
|
// Prefer uploaded URL
|
||
|
return <img src={screenshotUrl} alt="Browser Screenshot" />;
|
||
|
} else if (screenshotBase64) {
|
||
|
// Fallback to base64
|
||
|
return <img src={`data:image/jpeg;base64,${screenshotBase64}`} alt="Browser Screenshot" />;
|
||
|
}
|
||
|
return null;
|
||
|
};
|
||
|
```
|
||
|
|
||
|
## Benefits
|
||
|
|
||
|
1. **Reduced Memory Usage**: Base64 images are removed after upload, reducing message size
|
||
|
2. **Better Performance**: Loading images from URLs is more efficient than inline base64
|
||
|
3. **Persistence**: Images remain accessible even after the base64 data is removed
|
||
|
4. **Caching**: Browser can cache images loaded from URLs
|
||
|
|
||
|
## Usage
|
||
|
|
||
|
No changes are required in how the browser tools are used. The enhancement is transparent to the user and agent. Browser screenshots will automatically be uploaded and displayed from their cloud URLs.
|
||
|
|
||
|
## Future Enhancements
|
||
|
|
||
|
1. Add image preloading for better performance
|
||
|
2. Implement image caching strategy
|
||
|
3. Add fallback handling if image URL becomes unavailable
|
||
|
4. Support for multiple screenshots per browser state
|