mirror of https://github.com/buster-so/buster.git
Web Tools Package
A TypeScript package for web scraping and company research using Firecrawl's deep research API.
Features
- Company Research: Research companies using their website URL to extract business information
- Deep Research: Uses Firecrawl's AI-powered deep research to gather comprehensive insights
- Polling System: Automatically polls job status with exponential backoff
- Type Safety: Full TypeScript support with proper interfaces
- Error Handling: Comprehensive error handling with custom error types
- Testing: Complete test suite with unit and integration tests
Installation
bun install
Environment Variables
Set your Firecrawl API key:
export FIRECRAWL_API_KEY="fc-your-api-key-here"
Or create a .env file:
FIRECRAWL_API_KEY=fc-your-api-key-here
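A missing key is easier to diagnose when it is checked once at startup than when it surfaces later as a failed API call. The helper below is a minimal sketch of that check; `getFirecrawlApiKey` and the `Env` type are hypothetical names used for illustration and are not part of this package's documented API.

```typescript
// Hypothetical helper for illustration; not part of the package's documented API.
type Env = Record<string, string | undefined>;

/** Returns the trimmed API key, or throws if it is missing or blank. */
function getFirecrawlApiKey(env: Env): string {
  const key = env.FIRECRAWL_API_KEY?.trim();
  if (!key) {
    throw new Error('FIRECRAWL_API_KEY is not set');
  }
  return key;
}

// Usage (Node or Bun): const apiKey = getFirecrawlApiKey(process.env);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the check easy to unit test.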
Usage
Basic Company Research
import { researchCompany } from 'web-tools';
const result = await researchCompany('https://buster.so');
console.log(result.company); // "Buster"
console.log(result.industry); // "Technology"
console.log(result.businessModel); // "SaaS platform"
console.log(result.services); // ["Analytics", "Data Platform"]
console.log(result.description); // Full markdown description
console.log(result.keyInsights); // Key insights array
Research with Options
import { researchCompany } from 'web-tools';
const result = await researchCompany('https://example.com', {
  includeFinancials: true,
  includeNews: true,
  focusAreas: ['technology', 'business-model'],
  maxWaitTime: 300000, // 5 minutes
  pollingInterval: 5000, // 5 seconds
});
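The feature list mentions polling with exponential backoff, while the options expose `maxWaitTime` and `pollingInterval`. How the package combines them is not documented here, so the following is only a sketch of one plausible scheme: it assumes `pollingInterval` is the initial delay, that the delay doubles after each poll up to a cap, and that polling stops once the wait budget is used up. `backoffDelays`, `pollUntilDone`, and the `maxDelay` cap are hypothetical names.

```typescript
// Illustrative sketch only; names, defaults, and the doubling rule are assumptions.

/** Delays (ms) between polls: start at `initial`, double each time, cap at
 *  `maxDelay`, and stop before the total would exceed `maxWait`. */
function backoffDelays(initial: number, maxDelay: number, maxWait: number): number[] {
  if (initial <= 0) {
    throw new Error('initial delay must be positive');
  }
  const delays: number[] = [];
  let delay = initial;
  let elapsed = 0;
  while (elapsed + delay <= maxWait) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * 2, maxDelay);
  }
  return delays;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Calls `check` until it returns a non-null value or the wait budget runs out. */
async function pollUntilDone<T>(
  check: () => Promise<T | null>,
  initial = 5000,
  maxDelay = 30000,
  maxWait = 300000,
): Promise<T> {
  const first = await check();
  if (first !== null) return first;
  for (const delay of backoffDelays(initial, maxDelay, maxWait)) {
    await sleep(delay);
    const result = await check();
    if (result !== null) return result;
  }
  throw new Error(`Polling timed out after ${maxWait}ms`);
}
```

Keeping the delay schedule in a pure function separates the timing policy from the I/O, so the schedule can be tested without waiting on real timers.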
Using Individual Services
import { FirecrawlService } from 'web-tools';
const firecrawl = new FirecrawlService();
// Start a deep research job
const jobId = await firecrawl.startDeepResearch('Research about AI startups', {
  maxDepth: 3,
  timeLimit: 180,
  maxUrls: 10,
});
// Check job status
const status = await firecrawl.getJobStatus(jobId);
// Simple URL scraping
const content = await firecrawl.scrapeUrl('https://example.com', {
  formats: ['markdown', 'html'],
  onlyMainContent: true,
});
API Reference
researchCompany(url, options?)
Research a company using their website URL.
Parameters:
- url (string): The company's website URL
- options (CompanyResearchOptions): Optional configuration
Returns: Promise<CompanyResearch>
CompanyResearchOptions
interface CompanyResearchOptions {
  maxWaitTime?: number; // Maximum polling time (default: 300000ms)
  pollingInterval?: number; // Polling interval (default: 5000ms)
  includeFinancials?: boolean; // Include financial info (default: false)
  includeNews?: boolean; // Include recent news (default: false)
  focusAreas?: string[]; // Focus on specific areas
}
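The documented defaults can be applied in one place before any work starts. The sketch below shows one way to do that; `resolveOptions` and `ResolvedOptions` are hypothetical names, and the empty-array default for `focusAreas` is an assumption, since the interface does not state one.

```typescript
// Sketch of default resolution; resolveOptions is a hypothetical helper.
interface CompanyResearchOptions {
  maxWaitTime?: number;
  pollingInterval?: number;
  includeFinancials?: boolean;
  includeNews?: boolean;
  focusAreas?: string[];
}

type ResolvedOptions = Required<CompanyResearchOptions>;

/** Fills in the documented defaults for any option the caller left out. */
function resolveOptions(options: CompanyResearchOptions = {}): ResolvedOptions {
  return {
    maxWaitTime: options.maxWaitTime ?? 300000,
    pollingInterval: options.pollingInterval ?? 5000,
    includeFinancials: options.includeFinancials ?? false,
    includeNews: options.includeNews ?? false,
    focusAreas: options.focusAreas ?? [], // assumed default: no specific focus
  };
}
```

Using `??` per field, rather than spreading the caller's object over a defaults object, means an explicitly `undefined` field still falls back to its default.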
CompanyResearch
interface CompanyResearch {
  company: string; // Company name
  industry: string; // Primary industry
  businessModel: string; // How they make money
  services: string[]; // Products/services offered
  description: string; // 2-4 paragraph description in markdown
  keyInsights: string[]; // Key insights for new employees
  url: string; // Original URL researched
  researchedAt: Date; // When research was conducted
  rawData?: unknown; // Raw research data from Firecrawl
}
FirecrawlService
class FirecrawlService {
  constructor(config?: FirecrawlConfig);
  // Start deep research job
  startDeepResearch(query: string, options?: DeepResearchOptions): Promise<string>;
  // Check job status
  getJobStatus(jobId: string): Promise<JobStatusResponse>;
  // Scrape single URL
  scrapeUrl(url: string, options?: ScrapeOptions): Promise<unknown>;
  // Validate URL accessibility
  validateUrl(url: string): Promise<boolean>;
}
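`validateUrl` checks accessibility, which implies a network request. A cheap syntactic check beforehand can reject obviously bad input without one. The function below is a sketch of such a pre-check using the standard `URL` constructor; `isHttpUrl` is a hypothetical name, and this is not a description of how `validateUrl` itself is implemented.

```typescript
// Hypothetical pre-check; not the package's validateUrl implementation.

/** True if `input` parses as an absolute http(s) URL. */
function isHttpUrl(input: string): boolean {
  try {
    const { protocol } = new URL(input);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // The URL constructor throws on input it cannot parse.
    return false;
  }
}
```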
Error Handling
The package uses a custom CompanyResearchError class:
// Assumes CompanyResearchError is exported from the package entry point
import { researchCompany, CompanyResearchError } from 'web-tools';

try {
  const result = await researchCompany('https://example.com');
} catch (error) {
  if (error instanceof CompanyResearchError) {
    console.log('Error code:', error.code); // 'TIMEOUT' | 'API_ERROR' | 'PARSE_ERROR' | 'INVALID_URL'
    console.log('Error message:', error.message);
    console.log('Error details:', error.details);
  }
}
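The example above relies on three properties: `code`, `message`, and `details`. A class with that shape could look roughly like the following. This is a hypothetical reconstruction for illustration, not the package's source; the `isRetryable` helper and its choice of which codes are worth retrying are also assumptions.

```typescript
// Hypothetical reconstruction for illustration; the real class may differ.
type CompanyResearchErrorCode = 'TIMEOUT' | 'API_ERROR' | 'PARSE_ERROR' | 'INVALID_URL';

class CompanyResearchError extends Error {
  readonly code: CompanyResearchErrorCode;
  readonly details?: unknown;

  constructor(message: string, code: CompanyResearchErrorCode, details?: unknown) {
    super(message);
    this.name = 'CompanyResearchError';
    this.code = code;
    this.details = details;
    // Keeps `instanceof` working when compiled to older (ES5) targets.
    Object.setPrototypeOf(this, CompanyResearchError.prototype);
  }
}

/** Assumed policy: transient failures may be retried, bad input may not. */
function isRetryable(error: unknown): boolean {
  if (!(error instanceof CompanyResearchError)) return false;
  switch (error.code) {
    case 'TIMEOUT':
    case 'API_ERROR':
      return true;
    case 'PARSE_ERROR':
    case 'INVALID_URL':
      return false;
  }
}
```

Typing `code` as a union of string literals lets the compiler check that a `switch` over it handles every case.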
Testing
Run all tests:
bun run test
Run tests with UI:
bun run test:ui
Run tests with coverage:
bun run test:coverage
Unit Tests
Located in tests/unit/. These test individual components with mocked dependencies.
Integration Tests
Located in tests/integration/. These test the full flow with real API calls (requires a valid FIRECRAWL_API_KEY).
Development
Build the package:
bun run build
Run in development mode:
bun run dev
Architecture
src/
├── index.ts # Main exports
├── services/
│ └── firecrawl.ts # Firecrawl service wrapper
├── deep-research/
│ ├── types.ts # TypeScript interfaces
│ └── company-research.ts # Main research logic
└── utils/
    └── polling.ts # Polling utilities
Contributing
- Follow TypeScript best practices
- No any types; use proper interfaces or unknown with type guards
- No console.log statements in production code
- Write tests for new features
- Use meaningful commit messages
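The guideline on avoiding `any` in favor of `unknown` with type guards applies naturally to values such as the `Promise<unknown>` returned by `scrapeUrl`. The sketch below shows the pattern; the `MarkdownResult` shape is an assumption made for illustration and should be checked against the actual response before relying on it.

```typescript
// Assumed shape for illustration; verify against the real response.
interface MarkdownResult {
  markdown: string;
}

/** Type guard: narrows an unknown value to MarkdownResult. */
function isMarkdownResult(value: unknown): value is MarkdownResult {
  return (
    typeof value === 'object' &&
    value !== null &&
    'markdown' in value &&
    typeof (value as { markdown: unknown }).markdown === 'string'
  );
}

/** Returns the markdown text if present, otherwise null. */
function extractMarkdown(value: unknown): string | null {
  return isMarkdownResult(value) ? value.markdown : null;
}
```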
License
This package is part of the Buster project.