Document Workflow Architecture
Overview
The Nexus document workflow uses three environments working together:
- TEMP - Staging for fast content preparation
- Corpus - Searchable text storage (Quadfecta indexed)
- Nexus Docs - File storage and PDF generation
Environment Details
| Environment | Port | Prefix | Stable ID | Purpose |
|---|---|---|---|---|
| TEMP | 6680/6681 | temp: | tmp_XXXX | Staging, 24h TTL |
| Corpus | 6650/6651 | corp: | c_XXXX | Searchable text, Quadfecta |
| Nexus Docs | 6653 (CopyParty) | (paths) | (files) | Files, PDFs, sharing |
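The prefixes in the environment table make it possible to tell from an ID alone which environment owns it. The following is a minimal sketch of that lookup. The helper names `environment_for` and `storage_key` are hypothetical, and the key layout (prefix followed by the stable ID) is an assumption; the table does not state how keys are actually composed.

```python
# Hypothetical lookup built only from the environment table.
# Maps a stable-ID prefix to (environment name, key prefix).
ENVIRONMENTS = {
    "tmp_": ("TEMP", "temp:"),
    "c_": ("Corpus", "corp:"),
}


def environment_for(stable_id: str) -> str:
    """Return the environment that owns an ID such as tmp_a1b2 or c_x1y2."""
    for id_prefix, (name, _key_prefix) in ENVIRONMENTS.items():
        if stable_id.startswith(id_prefix):
            return name
    raise ValueError(f"unrecognised stable ID: {stable_id!r}")


def storage_key(stable_id: str) -> str:
    """Assumed key layout: key prefix + stable ID (not confirmed by the source)."""
    for id_prefix, (_name, key_prefix) in ENVIRONMENTS.items():
        if stable_id.startswith(id_prefix):
            return f"{key_prefix}{stable_id}"
    raise ValueError(f"unrecognised stable ID: {stable_id!r}")
```

Nexus Docs is left out on purpose: it addresses files by path rather than by stable ID.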
Pipeline Cluster (6650-6653)
The Pipeline cluster handles all document ingestion and processing:
| Port | Service | Type | Description |
|---|---|---|---|
| 6650 | Corpus Redis | Vault | Primary data store for ingested documents |
| 6651 | Corpus Redis | Operational | Read replica for corpus queries |
| 6652 | PDF-Converter MCP | Stateless | PDF conversion service (no Redis) |
| 6653 | Nexus Docs (CopyParty) | Web | File hosting CDN |
Workflow: Research → Document → PDF
┌──────────────────────────────────────────────────────────────┐
│ 1. RESEARCH PHASE │
│ AI gathers info from: web, corpus, kb, contacts │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ 2. STAGE PHASE │
│ temp.stage(content, source_type="generated") │
│ → Returns tmp_XXXX │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ 3. COMMIT PHASE │
│ A) docs.create_pdf(temp_id=tmp_XXXX) → PDF + public URL │
│ B) corpus.create(content, cdn_urls=[url]) → c_XXXX │
│ C) TEMP auto-cleaned after PDF creation │
└──────────────────────────────────────────────────────────────┘
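The three phases can be sketched end to end with in-memory stand-ins. This is an illustration of the ordering only, not the real MCP tools: the dictionaries, the ID counter, and the `example.invalid` URL are all placeholders invented for the sketch.

```python
import itertools

# In-memory stand-ins for the TEMP and Corpus environments (illustrative only).
temp_store = {}
corpus_store = {}
_ids = itertools.count(1)


def stage(content, source_type="generated"):
    """Stage phase: park content in TEMP and hand back a tmp_ ID."""
    temp_id = f"tmp_{next(_ids):04d}"
    temp_store[temp_id] = {"content": content, "source_type": source_type}
    return temp_id


def create_pdf(temp_id, cleanup_temp=True):
    """Commit step A: 'render' the staged content and return a placeholder URL."""
    _ = temp_store[temp_id]["content"]  # a real service would render this to PDF
    url = f"https://example.invalid/{temp_id}.pdf"  # placeholder, not a real URL
    if cleanup_temp:
        del temp_store[temp_id]  # commit step C: TEMP is cleaned after the PDF
    return url


def create_corpus_entry(content, cdn_urls):
    """Commit step B: store searchable text that links back to the PDF."""
    corpus_id = f"c_{next(_ids):04d}"
    corpus_store[corpus_id] = {"content": content, "cdn_urls": list(cdn_urls)}
    return corpus_id


def run_pipeline(content):
    temp_id = stage(content)
    url = create_pdf(temp_id)
    # The caller still holds `content`; by now the TEMP copy has been removed.
    corpus_id = create_corpus_entry(content, [url])
    return temp_id, url, corpus_id
```

One consequence of the ordering is visible in `run_pipeline`: because TEMP is cleaned as soon as the PDF exists, the caller has to keep its own copy of the content for the Corpus step, or disable cleanup until that step is done.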
TEMP Environment
temp.stage - Stage content with metadata and TTL
temp.get - Retrieve staged content
temp.list - List staged items
temp.cleanup - Manual cleanup
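To make the staging semantics concrete, here is a small in-memory sketch of a store with the four operations listed above and the 24h TTL from the environment table. The class `TempStore`, its method names, and the four-hex-character ID format are assumptions for illustration; the real server's behavior on expired items is not documented here.

```python
import secrets
import time

TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as stated in the environment table


class TempStore:
    """Illustrative stand-in for TEMP; not the actual server implementation."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._items = {}

    def stage(self, content, source_type="generated", metadata=None):
        # Assumed ID shape: tmp_ plus four hex characters, retried on collision.
        temp_id = f"tmp_{secrets.token_hex(2)}"
        while temp_id in self._items:
            temp_id = f"tmp_{secrets.token_hex(2)}"
        self._items[temp_id] = {
            "content": content,
            "source_type": source_type,
            "metadata": dict(metadata or {}),
            "expires_at": self._clock() + TTL_SECONDS,
        }
        return temp_id

    def get(self, temp_id):
        item = self._items.get(temp_id)
        if item is None:
            return None
        if item["expires_at"] <= self._clock():
            del self._items[temp_id]  # expired items are dropped lazily
            return None
        return item

    def list_ids(self):
        self.cleanup()
        return sorted(self._items)

    def cleanup(self):
        """Remove expired items and return how many were removed."""
        now = self._clock()
        expired = [tid for tid, item in self._items.items()
                   if item["expires_at"] <= now]
        for tid in expired:
            del self._items[tid]
        return len(expired)
```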
Corpus Environment
corpus.create - Store searchable text with Quadfecta indexing
corpus.get - Retrieve by c_XXXX ID
corpus.search - Quadfecta search (keyword+vector+graph+temporal)
corpus.ingest - Ingest existing PDFs (extract → store)
corpus.convert - PDF to markdown (Docling)
corpus.extract - Quick text extraction (PyMuPDF)
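The list above says that Quadfecta search draws on four signals (keyword, vector, graph, temporal) but not how their results are merged. As a generic illustration only, the sketch below uses reciprocal rank fusion, a common way to combine several ranked lists. It is a swapped-in technique, not a description of Quadfecta's actual scoring, and the function name and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per ID.

    `rankings` maps a signal name (e.g. "keyword") to IDs ordered best first.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-signal results for one query.
merged = reciprocal_rank_fusion({
    "keyword": ["c_a", "c_b"],
    "vector": ["c_b", "c_a"],
    "graph": ["c_b"],
    "temporal": ["c_c"],
})
```

An entry that appears near the top of several lists outranks one that tops only a single list, which is the property a multi-signal search generally wants.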
Nexus Docs Environment
docs.create_pdf - Generate PDF from markdown or TEMP staging
docs.upload - Upload any file
docs.list - List files in directory
docs.get_url - Get public URL
docs.create_pdf Parameters
| Parameter | Type | Description |
|---|---|---|
| user | string | Required - User folder |
| content | string | Markdown content (or use temp_id/file_path) |
| temp_id | string | TEMP staging ID (tmp_XXXX) |
| file_path | string | Local markdown file path |
| title | string | PDF title (auto-detected if not set) |
| destination | string | Path in user folder (default: 'documents') |
| template | string | 'professional', 'minimal', 'report', 'default' |
| logo_path | string | Path to header logo image |
| header_text | string | Text next to logo |
| footer_text | string | Footer text (default: 'Confidential') |
| show_page_numbers | boolean | Show page numbers (default: true) |
| cleanup_temp | boolean | Auto-cleanup TEMP after PDF (default: true) |
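The parameter table can be expressed as a small request object that applies the documented defaults and rejects obviously invalid input. This is a sketch under assumptions: the class `CreatePdfRequest` is hypothetical, the rule that exactly one of content, temp_id, or file_path must be supplied is inferred from the "or use" wording, and the table does not say which template applies when none is given, so 'default' is assumed.

```python
from dataclasses import dataclass
from typing import Optional

TEMPLATES = {"professional", "minimal", "report", "default"}


@dataclass
class CreatePdfRequest:
    """Hypothetical client-side model of the docs.create_pdf parameters."""

    user: str
    content: Optional[str] = None
    temp_id: Optional[str] = None
    file_path: Optional[str] = None
    title: Optional[str] = None          # auto-detected by the service if unset
    destination: str = "documents"
    template: str = "default"            # assumed default, not stated in the table
    logo_path: Optional[str] = None
    header_text: Optional[str] = None
    footer_text: str = "Confidential"
    show_page_numbers: bool = True
    cleanup_temp: bool = True

    def __post_init__(self):
        if not self.user:
            raise ValueError("user is required")
        sources = [s for s in (self.content, self.temp_id, self.file_path)
                   if s is not None]
        if len(sources) != 1:
            # Assumption: the three content sources are mutually exclusive.
            raise ValueError("pass exactly one of content, temp_id, file_path")
        if self.temp_id is not None and not self.temp_id.startswith("tmp_"):
            raise ValueError("temp_id must look like tmp_XXXX")
        if self.template not in TEMPLATES:
            raise ValueError(f"unknown template: {self.template!r}")
```

For example, `CreatePdfRequest(user="chris", temp_id="tmp_a1b2", template="professional")` passes validation and picks up the documented defaults for destination, footer text, page numbers, and cleanup.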
Example Workflow
# 1. Stage content
temp.stage(
    content="# Sales Proposal\n\n...",
    source_type="generated",
    metadata={"title": "Sales Proposal", "client": "Acme Corp"}
)
# Returns: tmp_a1b2

# 2. Create PDF from staging
docs.create_pdf(
    user="chris",
    temp_id="tmp_a1b2",
    destination="sales/clients/acme/",
    template="professional",
    logo_path="/data/cdn/users/chris/images/logo.png"
)
# Returns: URL https://docs.corlera.com/home/chris/sales/clients/acme/sales-proposal.pdf

# 3. Store in Corpus for searchability (optional)
corpus.create(
    title="Sales Proposal - Acme Corp",
    content="# Sales Proposal\n\n...",
    cdn_urls=["https://docs.corlera.com/home/chris/sales/clients/acme/sales-proposal.pdf"],
    tags=["sales", "proposal", "acme"]
)
# Returns: c_x1y2
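The URL returned in step 2 appears to combine the public base URL, the user folder, the destination, and a filename derived from the title. The sketch below reproduces that one example. The naming rule (lowercase, non-alphanumerics collapsed to hyphens) is inferred from this single case and is not confirmed by the source; `slugify` and `public_url` are hypothetical helpers.

```python
import re

BASE_URL = "https://docs.corlera.com/home"  # taken from the example URL above


def slugify(title: str) -> str:
    """Assumed rule: lowercase, runs of non-alphanumerics become one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def public_url(user: str, destination: str, title: str) -> str:
    """Compose the expected public URL for a generated PDF (inferred layout)."""
    parts = [BASE_URL, user]
    folder = destination.strip("/")
    if folder:
        parts.append(folder)
    parts.append(f"{slugify(title)}.pdf")
    return "/".join(parts)


url = public_url("chris", "sales/clients/acme/", "Sales Proposal")
```

In practice the URL returned by docs.create_pdf (or docs.get_url) should be treated as authoritative; a helper like this is only useful for sanity checks.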
Architecture Benefits
- Speed - TEMP staging avoids streaming large content through chat
- Separation - Corpus = AI knowledge, Docs = Human files
- Searchability - Quadfecta indexing on Corpus entries
- Traceability - Corpus entries link to PDF URLs
- Multi-tenant - The user parameter is required and has no default
MCP Server Files
- TEMP: /opt/mcp-servers/temp/mcp_temp_server.py
- Corpus: /opt/mcp-servers/corpus/mcp_corpus_server.py
- Nexus Docs: /opt/mcp-servers/docs/mcp_docs_server.py
External Access
- Public URL: https://docs.corlera.com
- Reverse Proxy: DigitalOcean (143.198.133.215) → Tailscale → 100.73.67.78:6653
- Memory (future) - Reserved for true RAG/temporal/graph cognitive system
- Document-v2 - LLMSherpa hierarchical extraction (ports 6750-6752)