Spider Engine Reference

Spider Engine (spider_rs)

Installation

Already installed. Running pip show spider_rs reports version 0.0.57.

Location: /home/nexus/.local/lib/python3.10/site-packages

⚠️ CRITICAL: Correct Usage

WRONG (sync, broken):

from spider_rs import Website
site = Website('https://example.com')
site.crawl()  # Does NOT work properly when called synchronously

CORRECT (async, works):

from spider_rs import crawl
import asyncio

async def fetch():
    result = await crawl('https://example.com')
    return result

site = asyncio.run(fetch())

Key Functions

Function          Description
crawl(url)        Async function - crawl and return NWebsite
crawl_smart(url)  Smart mode crawling (async)

NWebsite Object (Return Value)

The crawl() function returns an NWebsite object with:

  • .pages - List of crawled pages
  • .links - List of discovered links
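
A minimal sketch of reading those two attributes. It relies only on what is stated above (that .pages and .links are list-like) and assumes nothing about the fields of an individual page. The helper name summarize_result and the stand-in object are hypothetical, used so the example runs without a network.

```python
from types import SimpleNamespace


def summarize_result(result):
    """Return basic counts for an object exposing .pages and .links.

    Only the two attributes named above are used; nothing is assumed
    about the fields of an individual page.
    """
    pages = list(result.pages)
    links = list(result.links)
    return {
        "page_count": len(pages),
        "link_count": len(links),
        # Links are converted to str so de-duplication works for any link type.
        "unique_links": sorted({str(link) for link in links}),
    }


# Hypothetical stand-in for an NWebsite result, so the sketch runs offline.
fake = SimpleNamespace(
    pages=["<page 1>", "<page 2>"],
    links=[
        "https://example.com/a",
        "https://example.com/a",
        "https://example.com/b",
    ],
)
summary = summarize_result(fake)
print(summary["page_count"], summary["link_count"], summary["unique_links"])
```

With a real crawl, the same helper would be called on the value returned by await crawl(url).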

Async Patterns

Single URL

from spider_rs import crawl
import asyncio

async def scrape_site(url):
    result = await crawl(url)
    return result.pages, result.links

pages, links = asyncio.run(scrape_site('https://example.com'))

Multiple URLs (Parallel)

from spider_rs import crawl
import asyncio

async def crawl_multiple(urls):
    results = await asyncio.gather(*[crawl(u) for u in urls])
    return results

urls = [
    'https://example.com',
    'https://httpbin.org',
    'https://example.org'
]
results = asyncio.run(crawl_multiple(urls))
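
For longer URL lists, an unbounded asyncio.gather starts every crawl at once, and a single failure raises out of the whole call. The sketch below is one possible variation, not part of spider_rs: it caps concurrency with asyncio.Semaphore and collects failures as values. The names crawl_bounded, max_concurrent and fake_crawl are hypothetical; in real use, spider_rs.crawl would be passed as crawl_fn.

```python
import asyncio


async def crawl_bounded(urls, crawl_fn, max_concurrent=3):
    """Crawl urls in parallel, at most max_concurrent at a time.

    crawl_fn is any coroutine function taking a URL (e.g. spider_rs.crawl).
    Returns a dict mapping each URL to its result or to the exception raised.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(url):
        async with semaphore:
            return await crawl_fn(url)

    # return_exceptions=True returns failures as values instead of
    # raising the first one out of gather().
    results = await asyncio.gather(
        *(one(u) for u in urls), return_exceptions=True
    )
    return dict(zip(urls, results))


async def fake_crawl(url):
    # Hypothetical stand-in for spider_rs.crawl so the sketch runs offline.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"cannot crawl {url}")
    return f"result for {url}"


outcome = asyncio.run(
    crawl_bounded(["https://example.com", "https://bad.example"], fake_crawl)
)
failed = [u for u, r in outcome.items() if isinstance(r, Exception)]
print(failed)
```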

Performance

Pattern                    5 Sites  Notes
Sequential                 5.2s     One at a time
Parallel (asyncio.gather)  2.3s     2.3x faster
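
A comparison of this kind can be set up with a small timing harness. The sketch below does not reproduce the figures above: it replaces the crawl with a hypothetical fake_crawl that only sleeps, so it shows the shape of the measurement rather than real crawl times. To measure actual sites, fake_crawl would be swapped for spider_rs.crawl.

```python
import asyncio
import time


async def fake_crawl(url, delay=0.05):
    # Hypothetical stand-in for spider_rs.crawl: simulates I/O wait only.
    await asyncio.sleep(delay)
    return url


async def timed(coro):
    """Await coro and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def sequential(urls):
    # One crawl at a time: total time is roughly the sum of the delays.
    return [await fake_crawl(u) for u in urls]


async def parallel(urls):
    # All crawls overlap: total time is roughly the longest single delay.
    return await asyncio.gather(*(fake_crawl(u) for u in urls))


async def compare(urls):
    _, seq_time = await timed(sequential(urls))
    _, par_time = await timed(parallel(urls))
    return seq_time, par_time


urls = [f"https://example.com/{i}" for i in range(5)]
seq_time, par_time = asyncio.run(compare(urls))
print(f"sequential {seq_time:.2f}s, parallel {par_time:.2f}s")
```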

Integration with FastMCP

FastMCP tools are already async, so integration is natural:

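import json

# `mcp` is assumed to be an existing FastMCP server instance.
@mcp.tool()
async def web_crawl(url: str) -> str:
    """Crawl a URL and return page and link counts as JSON."""
    from spider_rs import crawl
    result = await crawl(url)
    return json.dumps({
        'pages': len(result.pages),
        'links': len(result.links)
    })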
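
A tool like this may be easier to consume if a failed crawl comes back as JSON rather than as an unhandled exception. The sketch below shows one way to do that under stated assumptions: crawl_to_json, the "ok" and "error" keys, and fake_crawl are all hypothetical and not part of spider_rs or FastMCP. Inside a tool, spider_rs.crawl would be passed as crawl_fn.

```python
import asyncio
import json
from types import SimpleNamespace


async def crawl_to_json(url, crawl_fn):
    """Run crawl_fn(url) and return a JSON string, including on failure."""
    try:
        result = await crawl_fn(url)
    except Exception as exc:  # report any crawl failure to the caller as data
        return json.dumps({"url": url, "ok": False, "error": str(exc)})
    return json.dumps({
        "url": url,
        "ok": True,
        "pages": len(result.pages),
        "links": len(result.links),
    })


async def fake_crawl(url):
    # Hypothetical stand-in for spider_rs.crawl so the sketch runs offline.
    if not url.startswith("https://"):
        raise ValueError("unsupported URL")
    return SimpleNamespace(pages=["p1"], links=["l1", "l2"])


ok = json.loads(asyncio.run(crawl_to_json("https://example.com", fake_crawl)))
bad = json.loads(asyncio.run(crawl_to_json("ftp://example.com", fake_crawl)))
print(ok, bad)
```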

Rust Crate Info

  • Source: /home/nexus/.cache/puccinialin/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/spider-2.27.8
  • High performance, async Rust implementation
  • Python bindings via PyO3
Updated: 2026-01-08T12:46:37