Corpus Environment - Nexus Knowledge Base

Environment: Corpus

Ports: 6650 (vault) / 6651 (operational)
Location: /opt/mcp-servers/corpus/mcp_corpus_server.py
Version: 2.1.0
Locker: l_252a
Prefix: corp:
Status: ✅ WORKING

Purpose

Document storage, PDF ingestion, and content extraction with Quadfecta indexing. Corpus takes documents, extracts their text (via PyMuPDF or Docling), structures it by pages or chapters, and stores it for AI search and recall. Documents support hierarchical parent-child relationships (maximum depth 3).
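The depth limit and the delete rule (see the delete tool below) can be modelled with a small in-memory store. This is a hypothetical sketch, not the server's code: the class and method names are invented, and the assumption that force=True cascades to descendants (rather than orphaning them) is not confirmed by the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_DEPTH = 3  # parent -> child -> grandchild, per the documented limit


@dataclass
class Doc:
    id: str
    title: str
    parent_id: Optional[str] = None


class HierarchyStore:
    """Hypothetical in-memory stand-in for the Corpus document store."""

    def __init__(self) -> None:
        self.docs: Dict[str, Doc] = {}

    def depth(self, doc_id: str) -> int:
        # Level of a document: 1 for a root, 2 for its child, 3 for a grandchild.
        level = 1
        current = self.docs[doc_id]
        while current.parent_id is not None:
            level += 1
            current = self.docs[current.parent_id]
        return level

    def children(self, doc_id: str) -> List[Doc]:
        return [d for d in self.docs.values() if d.parent_id == doc_id]

    def create(self, doc_id: str, title: str, parent_id: Optional[str] = None) -> Doc:
        if parent_id is not None:
            if parent_id not in self.docs:
                raise KeyError(f"unknown parent {parent_id}")
            # A child of a level-3 document would be level 4, so reject it.
            if self.depth(parent_id) >= MAX_DEPTH:
                raise ValueError("maximum hierarchy depth (3) exceeded")
        doc = Doc(doc_id, title, parent_id)
        self.docs[doc_id] = doc
        return doc

    def delete(self, doc_id: str, force: bool = False) -> None:
        kids = self.children(doc_id)
        if kids and not force:
            raise ValueError(f"{doc_id} has {len(kids)} children; pass force=True")
        # Assumption: force=True removes all descendants before the parent.
        for kid in kids:
            self.delete(kid.id, force=True)
        del self.docs[doc_id]
```

Because the depth check runs before insertion, a tree can never grow past three levels, which is also what bounds recursion in the tree and delete operations.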

Port Configuration

Vault (6650): Password required - stores documents with full indexing
Operational (6651): No password required - read replica for search
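The split between the two ports comes down to whether a password is sent. The sketch below is hypothetical: it only builds connection settings, the names ClientConfig, get_vault_config, and get_operational_config are invented for illustration, and the host default of localhost is an assumption. It does not reproduce the server's actual get_operational_client() or the client library it uses.

```python
from dataclasses import dataclass
from typing import Optional

VAULT_PORT = 6650        # password required; stores documents with full indexing
OPERATIONAL_PORT = 6651  # no password; read replica for search


@dataclass(frozen=True)
class ClientConfig:
    host: str
    port: int
    password: Optional[str]
    read_only: bool


def get_vault_config(password: str, host: str = "localhost") -> ClientConfig:
    # The vault rejects unauthenticated clients, so fail early without a password.
    if not password:
        raise ValueError("vault (6650) requires a password")
    return ClientConfig(host, VAULT_PORT, password, read_only=False)


def get_operational_config(host: str = "localhost") -> ClientConfig:
    # Never send a password to the operational replica (password=None).
    return ClientConfig(host, OPERATIONAL_PORT, password=None, read_only=True)
```

Writes go through the vault configuration; search traffic can use the operational one without touching credentials.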

Stable ID Format

c_XXXX (4 alphanumeric chars)
Key Format: corp:{user}:{timestamp_id}
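A minimal sketch of generating and validating stable IDs and building storage keys. Two details are assumptions not stated in the source: the alphabet is taken to be lowercase letters plus digits (matching IDs seen elsewhere on this page, such as o_cq0c), and timestamp_id is taken to be a millisecond Unix timestamp. The function names are hypothetical.

```python
import re
import secrets
import string
import time
from typing import Optional

# Assumption: lowercase letters + digits. 36**4 = 1,679,616 possible IDs.
ALPHABET = string.ascii_lowercase + string.digits
STABLE_ID_RE = re.compile(r"c_[a-z0-9]{4}")


def new_stable_id() -> str:
    # secrets.choice gives unpredictable IDs, unlike a sequential counter.
    return "c_" + "".join(secrets.choice(ALPHABET) for _ in range(4))


def is_stable_id(value: str) -> bool:
    # fullmatch rejects extra characters, including a trailing newline.
    return STABLE_ID_RE.fullmatch(value) is not None


def make_key(user: str, timestamp_id: Optional[str] = None) -> str:
    if timestamp_id is None:
        timestamp_id = str(int(time.time() * 1000))  # assumption: ms timestamp
    return f"corp:{user}:{timestamp_id}"
```

A caller would still need to check a freshly generated ID against existing documents, since random 4-character IDs can collide.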

Tools (11 total)

Tool	Parameters	Description
create	title (required), content (required), category, parent_id, cdn_urls, track_refs, tags	Create document with hierarchy support
get	id (required)	Get document by ID (supports c_XXXX stable IDs)
update	id (required), title, content, category, cdn_urls, track_refs, tags	Update document fields
delete	id (required), force (bool)	Delete document (fails if it has children unless force=true)
list	limit (default 20), category, parent_id, query	List documents with filters and Quadfecta scoring
search	query (required), limit (default 10), category	Quadfecta search (keyword + vector + graph + temporal)
tree	root_id, max_depth (default 3)	Show hierarchical tree structure
convert	source (required), format, save, title, category	Convert PDF to markdown via Docling
extract	filepath (required), pages, extract_images	Fast text/image extraction via PyMuPDF
categories	action (list/add), name, description	Manage document categories
ingest	source (required), category (required), title, chunk_by, cdn_source, extract_images, tags	Full ingestion workflow: extract → chunk → store → index
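The required parameters in the table can be checked client-side before a request is sent. This sketch wraps a call in the MCP JSON-RPC "tools/call" envelope; the registered tool names (plain "search" here, rather than a prefixed form like "corpus.search") are an assumption, and build_tool_call is a hypothetical helper, not part of the server.

```python
import json
from typing import Any, Dict, Tuple

# Required parameters per tool, transcribed from the table above.
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "create": ("title", "content"),
    "get": ("id",),
    "update": ("id",),
    "delete": ("id",),
    "list": (),
    "search": ("query",),
    "tree": (),
    "convert": ("source",),
    "extract": ("filepath",),
    "categories": (),
    "ingest": ("source", "category"),
}


def build_tool_call(tool: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Validate required arguments, then serialise an MCP tools/call request."""
    if tool not in REQUIRED:
        raise ValueError(f"unknown tool: {tool}")
    missing = [p for p in REQUIRED[tool] if p not in arguments]
    if missing:
        raise ValueError(f"{tool} missing required: {', '.join(missing)}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)
```

For example, build_tool_call("ingest", {"source": "manual.pdf"}) raises because category is required for ingest.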

Ingestion Workflow

PDF file → ingest(source, category)
    ↓
PyMuPDF extraction (text + images)
    ↓
Chunking (by page, chapter, section, or none)
    ↓
Parent document + child chunks created
    ↓
Quadfecta indexing (keyword, vector, graph, temporal)
    ↓
Ready for corpus.search/get
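The chunking step of the workflow above can be sketched as turning a list of page texts into one parent record plus child records. This is a hypothetical illustration: the Record shape, the parent holding the full joined text, and skipping blank pages are assumptions, and only the page and none modes are shown (chapter and section splitting are left out).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    title: str
    content: str
    category: str
    parent_title: Optional[str] = None  # None marks the parent document


def chunk_pages(title: str, category: str, pages: List[str],
                chunk_by: str = "page") -> List[Record]:
    """Return a parent record followed by one child record per non-blank page."""
    # Assumption: the parent stores the full text of all pages.
    parent = Record(title, "\n\n".join(pages), category)
    if chunk_by == "none":
        return [parent]
    if chunk_by != "page":
        raise NotImplementedError("chapter/section splitting is not sketched here")
    children = [
        Record(f"{title} - page {i}", text, category, parent_title=title)
        for i, text in enumerate(pages, start=1)
        if text.strip()  # skip blank pages but keep original page numbers
    ]
    return [parent] + children
```

With PyMuPDF installed, the pages list could come from [page.get_text() for page in pymupdf.open(path)]; each returned record would then be stored and passed to Quadfecta indexing.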

Architecture

Stable ID System: c_XXXX prefix (4 alphanumeric chars)
Hierarchy: Max 3 levels deep (parent → child → grandchild)
Extractors: PyMuPDF (fast), Docling (high-quality markdown)
Graph Integration: FalkorDB via graph_helper
Quadfecta Scoring: keyword + vector + graph + temporal layers
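One way the four Quadfecta layers could be combined is a weighted sum of per-layer scores in [0, 1]. Everything numeric here is an assumption: the weights, the 30-day half-life for temporal decay, and the simple term-overlap keyword score are illustrative stand-ins, and the graph score is taken as a precomputed value rather than a FalkorDB query.

```python
import math
from typing import Dict, Sequence

# Hypothetical weights; the real Quadfecta weighting is not documented here.
WEIGHTS: Dict[str, float] = {"keyword": 0.3, "vector": 0.4, "graph": 0.2, "temporal": 0.1}
HALF_LIFE_DAYS = 30.0  # assumption


def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear in the document text.
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(text.lower().split())) / len(terms)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def temporal_score(age_days: float) -> float:
    # Exponential decay: 1.0 for a new document, 0.5 after one half-life.
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def quadfecta(query: str, text: str, q_vec: Sequence[float], d_vec: Sequence[float],
              graph_score: float, age_days: float) -> float:
    layers = {
        "keyword": keyword_score(query, text),
        "vector": max(0.0, cosine(q_vec, d_vec)),
        "graph": graph_score,  # assumed already normalised to [0, 1]
        "temporal": temporal_score(age_days),
    }
    return sum(WEIGHTS[name] * value for name, value in layers.items())
```

Keeping each layer in [0, 1] and the weights summing to 1 keeps the combined score in [0, 1], so results from different queries stay comparable.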

Categories (16 default)

book, manual, research, report, blog, conversation, contract, invoice, proposal, documentation, analysis, notes, presentation, template, archive, other
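The categories tool (list/add) can be modelled as a registry seeded with the 16 defaults. This is a hypothetical sketch: CategoryRegistry and its methods are invented names, and normalising names to lowercase and rejecting duplicates are assumptions about the tool's behaviour.

```python
from typing import Dict, List

DEFAULT_CATEGORIES = (
    "book", "manual", "research", "report", "blog", "conversation",
    "contract", "invoice", "proposal", "documentation", "analysis",
    "notes", "presentation", "template", "archive", "other",
)


class CategoryRegistry:
    """Hypothetical in-memory model of the categories tool (list/add)."""

    def __init__(self) -> None:
        # Map category name -> description; defaults start with no description.
        self._items: Dict[str, str] = {name: "" for name in DEFAULT_CATEGORIES}

    def names(self) -> List[str]:
        return sorted(self._items)

    def add(self, name: str, description: str = "") -> None:
        key = name.strip().lower()  # assumption: names are case-insensitive
        if not key:
            raise ValueError("category name must not be empty")
        if key in self._items:
            raise ValueError(f"category already exists: {key}")
        self._items[key] = description

    def __contains__(self, name: str) -> bool:
        return name.strip().lower() in self._items
```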

Bug Fixes Applied

✅ Locker Password Mismatch (Fixed 2026-01-06 by Maverick)
- Locker l_5137 had incorrect password stored (28dTIp)
- Fixed: Updated to correct password (crfWls) via locker.update
- Impact: Corpus MCP tools failed when credentials_helper was available

✅ Operational Auth Bug (Fixed 2026-01-06 by Maverick)
- Operational client was passing a password to port 6651 (no auth required)
- Fixed: Removed password from get_operational_client(), which now uses password=None
- File modified: /opt/mcp-servers/corpus/mcp_corpus_server.py

Security Assessment

✅ Locker password now correct
✅ No command injection (no shell execution)
✅ Stable ID system prevents key enumeration
✅ Hierarchy depth limit prevents infinite recursion

Audited by Maverick (a_7yma) | Documented by Rocky (o_cq0c) | 2026-01-06