# Spider Engine (spider_rs)

## Installation

- Already installed: `pip show spider_rs` reports version 0.0.57
- Location: `/home/nexus/.local/lib/python3.10/site-packages`
## ⚠️ CRITICAL: Correct Usage

WRONG (sync, broken):

```python
from spider_rs import Website

site = Website('https://example.com')
site.crawl()  # Does NOT work properly
```

CORRECT (async, works):

```python
from spider_rs import crawl
import asyncio

async def fetch():
    result = await crawl('https://example.com')
    return result

site = asyncio.run(fetch())
```
## Key Functions

| Function | Description |
|---|---|
| `crawl(url)` | Async function; crawls the site and returns an `NWebsite` |
| `crawl_smart(url)` | Smart-mode crawling (async) |
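
The examples in this document only exercise `crawl`. A minimal sketch of calling `crawl_smart`, assuming it takes a URL and returns the same `NWebsite` shape as `crawl` (not verified here):

```python
from spider_rs import crawl_smart
import asyncio

async def smart_fetch(url):
    # Assumption: crawl_smart has the same signature and return type
    # as crawl (an NWebsite with .pages and .links).
    result = await crawl_smart(url)
    return result

site = asyncio.run(smart_fetch('https://example.com'))
print(len(site.pages), len(site.links))
```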
## NWebsite Object (Return Value)

The `crawl()` function returns an `NWebsite` object with:

- `.pages` - list of crawled pages
- `.links` - list of discovered links
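
A short sketch of walking the return value. The `.pages` and `.links` attributes come from the description above; the per-page `.url` attribute used below is an assumption, not confirmed by this document:

```python
from spider_rs import crawl
import asyncio

async def inspect(url):
    result = await crawl(url)
    print(f"{len(result.pages)} pages, {len(result.links)} links")
    # Assumption: each entry in result.pages exposes a .url attribute.
    for page in result.pages[:5]:
        print(page.url)

asyncio.run(inspect('https://example.com'))
```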
## Async Patterns

### Single URL

```python
from spider_rs import crawl
import asyncio

async def scrape_site(url):
    result = await crawl(url)
    return result.pages, result.links

pages, links = asyncio.run(scrape_site('https://example.com'))
```
### Parallel Crawling (Recommended)

```python
from spider_rs import crawl
import asyncio

async def crawl_multiple(urls):
    results = await asyncio.gather(*[crawl(u) for u in urls])
    return results

urls = [
    'https://example.com',
    'https://httpbin.org',
    'https://example.org',
]
results = asyncio.run(crawl_multiple(urls))
```
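
For large URL lists, an unbounded `gather` launches every crawl at once. A sketch of capping concurrency with `asyncio.Semaphore`; the limit of 5 is an illustrative choice, not a benchmarked value:

```python
from spider_rs import crawl
import asyncio

async def crawl_bounded(urls, limit=5):
    # limit=5 is illustrative, not a tested recommendation.
    sem = asyncio.Semaphore(limit)

    async def guarded(u):
        async with sem:
            return await crawl(u)

    return await asyncio.gather(*[guarded(u) for u in urls])

results = asyncio.run(crawl_bounded(['https://example.com', 'https://example.org']))
```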
## Performance

| Pattern | 5 Sites | Notes |
|---|---|---|
| Sequential | 5.2s | One at a time |
| Parallel (`asyncio.gather`) | 2.3s | ~2.3x faster |
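
A sketch of how such a comparison could be measured; the URL list below is illustrative and is not the set behind the table above:

```python
from spider_rs import crawl
import asyncio
import time

URLS = ['https://example.com', 'https://example.org', 'https://httpbin.org']

async def sequential():
    # Crawl one site at a time.
    return [await crawl(u) for u in URLS]

async def parallel():
    # Crawl all sites concurrently.
    return await asyncio.gather(*[crawl(u) for u in URLS])

for name, fn in [('sequential', sequential), ('parallel', parallel)]:
    start = time.perf_counter()
    asyncio.run(fn())
    print(f"{name}: {time.perf_counter() - start:.1f}s")
```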
## Integration with FastMCP

FastMCP tools are already async, so integration is natural:

```python
import json

@mcp.tool()
async def web_crawl(url: str) -> str:
    from spider_rs import crawl
    result = await crawl(url)
    return json.dumps({
        'pages': len(result.pages),
        'links': len(result.links),
    })
```
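
A hedged extension of the tool above that adds a timeout and reports failures to the client; the 30-second limit and the error payload shape are illustrative choices, not a spider_rs or FastMCP convention:

```python
import asyncio
import json

@mcp.tool()
async def web_crawl_safe(url: str, timeout_s: float = 30.0) -> str:
    # timeout_s and the error payload below are illustrative choices.
    from spider_rs import crawl
    try:
        result = await asyncio.wait_for(crawl(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        return json.dumps({'error': f'crawl timed out after {timeout_s}s'})
    return json.dumps({
        'pages': len(result.pages),
        'links': len(result.links),
    })
```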
## Rust Crate Info

- Source: `/home/nexus/.cache/puccinialin/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/spider-2.27.8`
- High-performance, async Rust implementation
- Python bindings via PyO3