# Spider Engine (spider_rs)

## Installation

- Already installed: `pip show spider_rs` reports version 0.0.57
- Location: `/home/nexus/.local/lib/python3.10/site-packages`
## ⚠️ CRITICAL: Correct Usage

WRONG (sync, broken):

```python
from spider_rs import Website

site = Website('https://example.com')
site.crawl()  # Does NOT work properly
```

CORRECT (async, works):

```python
from spider_rs import crawl
import asyncio

async def fetch():
    result = await crawl('https://example.com')
    return result

site = asyncio.run(fetch())
```
## Key Functions

| Function | Description |
|---|---|
| `crawl(url)` | Async function; crawls the site and returns an `NWebsite` |
| `crawl_smart(url)` | Smart-mode crawling (async) |
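
The examples in this document only exercise `crawl`. A minimal sketch of calling `crawl_smart`, assuming it takes a URL and returns the same `NWebsite` shape as `crawl` (not verified here):

```python
from spider_rs import crawl_smart
import asyncio

async def smart_fetch(url):
    # Assumption: crawl_smart has the same signature and return type
    # as crawl (an NWebsite with .pages and .links).
    result = await crawl_smart(url)
    return result

site = asyncio.run(smart_fetch('https://example.com'))
print(len(site.pages), len(site.links))
```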
## NWebsite Object (Return Value)

The `crawl()` function returns an `NWebsite` object with:

- `.pages` - list of crawled pages
- `.links` - list of discovered links
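
A short sketch of walking the return value. The `.pages` and `.links` attributes come from the description above; the per-page `.url` attribute used below is an assumption, not confirmed by this document:

```python
from spider_rs import crawl
import asyncio

async def inspect(url):
    result = await crawl(url)
    print(f"{len(result.pages)} pages, {len(result.links)} links")
    # Assumption: each entry in result.pages exposes a .url attribute.
    for page in result.pages[:5]:
        print(page.url)

asyncio.run(inspect('https://example.com'))
```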
## Async Patterns

### Single URL

```python
from spider_rs import crawl
import asyncio

async def scrape_site(url):
    result = await crawl(url)
    return result.pages, result.links

pages, links = asyncio.run(scrape_site('https://example.com'))
```
### Parallel Crawling (Recommended)

```python
from spider_rs import crawl
import asyncio

async def crawl_multiple(urls):
    results = await asyncio.gather(*[crawl(u) for u in urls])
    return results

urls = [
    'https://example.com',
    'https://httpbin.org',
    'https://example.org',
]
results = asyncio.run(crawl_multiple(urls))
```
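
For large URL lists, an unbounded `gather` launches every crawl at once. A sketch of capping concurrency with `asyncio.Semaphore`; the limit of 5 is an illustrative choice, not a benchmarked value:

```python
from spider_rs import crawl
import asyncio

async def crawl_bounded(urls, limit=5):
    # limit=5 is illustrative, not a tested recommendation.
    sem = asyncio.Semaphore(limit)

    async def guarded(u):
        async with sem:
            return await crawl(u)

    return await asyncio.gather(*[guarded(u) for u in urls])

results = asyncio.run(crawl_bounded(['https://example.com', 'https://example.org']))
```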
## Performance

| Pattern | 5 Sites | Notes |
|---|---|---|
| Sequential | 5.2s | One at a time |
| Parallel (`asyncio.gather`) | 2.3s | ~2.3x faster |
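
A sketch of how such a comparison could be measured; the URL list below is illustrative and is not the set behind the table above:

```python
from spider_rs import crawl
import asyncio
import time

URLS = ['https://example.com', 'https://example.org', 'https://httpbin.org']

async def sequential():
    # Crawl one site at a time.
    return [await crawl(u) for u in URLS]

async def parallel():
    # Crawl all sites concurrently.
    return await asyncio.gather(*[crawl(u) for u in URLS])

for name, fn in [('sequential', sequential), ('parallel', parallel)]:
    start = time.perf_counter()
    asyncio.run(fn())
    print(f"{name}: {time.perf_counter() - start:.1f}s")
```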
## Integration with FastMCP

FastMCP tools are already async, so integration is natural:

```python
import json

@mcp.tool()
async def web_crawl(url: str) -> str:
    from spider_rs import crawl
    result = await crawl(url)
    return json.dumps({
        'pages': len(result.pages),
        'links': len(result.links),
    })
```
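
A hedged extension of the tool above that adds a timeout and reports failures to the client; the 30-second limit and the error payload shape are illustrative choices, not a spider_rs or FastMCP convention:

```python
import asyncio
import json

@mcp.tool()
async def web_crawl_safe(url: str, timeout_s: float = 30.0) -> str:
    # timeout_s and the error payload below are illustrative choices.
    from spider_rs import crawl
    try:
        result = await asyncio.wait_for(crawl(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        return json.dumps({'error': f'crawl timed out after {timeout_s}s'})
    return json.dumps({
        'pages': len(result.pages),
        'links': len(result.links),
    })
```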
## Rust Crate Info

- Source: `/home/nexus/.cache/puccinialin/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/spider-2.27.8`
- High-performance, async Rust implementation
- Python bindings via PyO3