Document Workflow Architecture

Overview

The Nexus document workflow uses three environments working together:

TEMP - Staging for fast content preparation
Corpus - Searchable text storage (Quadfecta indexed)
Nexus Docs - File storage and PDF generation

Environment Details

Environment	Port	Prefix	Stable ID	Purpose
TEMP	6680/6681	temp:	tmp_XXXX	Staging, 24h TTL
Corpus	6650/6651	corp:	c_XXXX	Searchable text, Quadfecta
Nexus Docs	6653 (CopyParty)	(paths)	(files)	Files, PDFs, sharing
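Each environment uses its own stable ID format, so a caller can tell from an ID alone which environment owns it. The sketch below shows that dispatch. Only the prefixes come from the table above. `environment_for` and `ID_PREFIXES` are hypothetical names and are not part of the MCP servers.

```python
# Hypothetical helper: the prefixes come from the Environment Details table,
# but environment_for is not part of any Nexus MCP server.
ID_PREFIXES = {
    "tmp_": "TEMP",
    "c_": "Corpus",
}


def environment_for(item_id: str) -> str:
    """Return the environment that owns a stable ID, judged by its prefix."""
    for prefix, environment in ID_PREFIXES.items():
        if item_id.startswith(prefix):
            return environment
    # Nexus Docs items are addressed by path, so they never reach this helper.
    raise ValueError(f"unrecognised ID format: {item_id!r}")


print(environment_for("tmp_a1b2"))  # TEMP
print(environment_for("c_x1y2"))    # Corpus
```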

Pipeline Cluster (6650-6653)

The Pipeline cluster handles all document ingestion and processing:

Port	Service	Type	Description
6650	Corpus Redis	Vault	Primary data store for ingested documents
6651	Corpus Redis	Operational	Read replica for corpus queries
6652	PDF-Converter MCP	Stateless	PDF conversion service (no Redis)
6653	Nexus Docs (CopyParty)	Web	File hosting CDN
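The Vault/Operational split suggests that writes go to 6650 and that corpus queries go to the 6651 read replica. The following is a minimal sketch of that routing, assuming exactly that rule. `corpus_port` is a hypothetical helper.

```python
# Hypothetical routing rule, assuming writes go to the Vault instance and
# queries go to the Operational read replica (ports from the table above).
CORPUS_PORTS = {"vault": 6650, "operational": 6651}


def corpus_port(operation: str) -> int:
    """Return the Corpus Redis port to use for a 'read' or a 'write'."""
    if operation == "write":
        return CORPUS_PORTS["vault"]
    if operation == "read":
        return CORPUS_PORTS["operational"]
    raise ValueError(f"unknown operation: {operation!r}")


print(corpus_port("write"))  # 6650
print(corpus_port("read"))   # 6651
```

A client would then connect with something like `redis.Redis(host=HOST, port=corpus_port("read"))` from redis-py. `HOST` is a placeholder, because the cluster's address is not given here. Redis replication is asynchronous, so a read from 6651 issued immediately after a write to 6650 may not see the new entry yet.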

Workflow: Research → Document → PDF

┌──────────────────────────────────────────────────────────────┐
│  1. RESEARCH PHASE                                            │
│     AI gathers info from: web, corpus, kb, contacts           │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│  2. STAGE PHASE                                               │
│     temp.stage(content, source_type="generated")              │
│     → Returns tmp_XXXX                                        │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│  3. COMMIT PHASE                                              │
│     A) docs.create_pdf(temp_id=tmp_XXXX) → PDF + public URL   │
│     B) corpus.create(content, cdn_urls=[url]) → c_XXXX        │
│     C) TEMP auto-cleaned after PDF creation                   │
└──────────────────────────────────────────────────────────────┘
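The three phases can be modelled end to end with in-memory stand-ins. This is a sketch for reasoning about the ordering, not a client for the real MCP servers. The `Fake*` classes, the hex counter IDs, the `slug` metadata key, and `publish` are all hypothetical. The sketch also shows one consequence of step 3C. Because the PDF call deletes the staged copy by default, the caller must keep `content` and pass it to `corpus.create` itself, which is what the Example Workflow does.

```python
import itertools


# In-memory stand-ins for the three MCP servers. Method names mirror the tools
# in this document; the classes, ID scheme, and "slug" key are assumptions.
class FakeTemp:
    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def stage(self, content, source_type, metadata=None):
        temp_id = f"tmp_{next(self._ids):04x}"
        self.items[temp_id] = {"content": content, "source_type": source_type,
                               "metadata": metadata or {}}
        return temp_id


class FakeDocs:
    def __init__(self, temp):
        self.temp = temp

    def create_pdf(self, user, temp_id, destination="documents", cleanup_temp=True):
        item = self.temp.items[temp_id]
        name = item["metadata"].get("slug", "document")
        # Rendering is skipped; only the resulting public URL is modelled.
        url = f"https://docs.corlera.com/home/{user}/{destination.strip('/')}/{name}.pdf"
        if cleanup_temp:
            del self.temp.items[temp_id]  # step 3C: staged copy is removed
        return url


class FakeCorpus:
    def __init__(self):
        self.entries = {}
        self._ids = itertools.count(1)

    def create(self, title, content, cdn_urls, tags=()):
        corpus_id = f"c_{next(self._ids):04x}"
        self.entries[corpus_id] = {"title": title, "content": content,
                                   "cdn_urls": list(cdn_urls), "tags": list(tags)}
        return corpus_id


def publish(temp, docs, corpus, user, content, title, slug, destination):
    """Run stage -> PDF -> corpus; content is kept locally because TEMP is cleaned."""
    temp_id = temp.stage(content, source_type="generated",
                         metadata={"title": title, "slug": slug})           # phase 2
    url = docs.create_pdf(user=user, temp_id=temp_id, destination=destination)  # 3A
    corpus_id = corpus.create(title=title, content=content, cdn_urls=[url])     # 3B
    return temp_id, url, corpus_id
```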

Key Tools

TEMP Environment

temp.stage - Stage content with metadata and TTL
temp.get - Retrieve staged content
temp.list - List staged items
temp.cleanup - Manually remove staged items before their TTL expires
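TEMP items expire after 24 hours. The following is a hypothetical in-memory model of the four TEMP tools with that TTL. The class, its method signatures, and the injectable clock are assumptions for illustration, and this document does not describe how the real server stores items. If TEMP is Redis-backed, expiry would more likely be delegated to Redis key TTLs. The paired ports suggest this, but it is not stated.

```python
import time

TTL_SECONDS = 24 * 60 * 60  # 24h TTL from the Environment Details table


class TempStore:
    """Hypothetical in-memory model of TEMP staging with expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # temp_id -> (expires_at, content, metadata)
        self._counter = 0

    def stage(self, content, metadata=None, ttl=TTL_SECONDS):
        self._counter += 1
        temp_id = f"tmp_{self._counter:04x}"
        self._items[temp_id] = (self._clock() + ttl, content, metadata or {})
        return temp_id

    def get(self, temp_id):
        # Raises KeyError if the item never existed or has expired.
        expires_at, content, metadata = self._items[temp_id]
        if self._clock() >= expires_at:
            del self._items[temp_id]
            raise KeyError(temp_id)
        return content, metadata

    def list(self):
        now = self._clock()
        return sorted(k for k, (exp, _, _) in self._items.items() if exp > now)

    def cleanup(self, temp_id=None):
        """Remove one item, or every expired item when temp_id is None."""
        if temp_id is not None:
            return int(self._items.pop(temp_id, None) is not None)
        now = self._clock()
        expired = [k for k, (exp, _, _) in self._items.items() if exp <= now]
        for k in expired:
            del self._items[k]
        return len(expired)
```

The injectable clock lets expiry be exercised in tests without waiting 24 hours.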

Corpus Environment

corpus.create - Store searchable text with Quadfecta indexing
corpus.get - Retrieve by c_XXXX ID
corpus.search - Quadfecta search (keyword+vector+graph+temporal)
corpus.ingest - Ingest existing PDFs (extract → store)
corpus.convert - Convert PDF to Markdown (Docling)
corpus.extract - Quick text extraction (PyMuPDF)
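This document does not say how Quadfecta combines its four signals. One common way to merge several ranked result lists is reciprocal rank fusion (RRF). It is sketched below as a swapped-in technique, not as Quadfecta's actual method. The signal names and `c_` IDs in the usage lines are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs: each list adds 1 / (k + rank) per ID."""
    scores = defaultdict(float)
    for ranking in rankings.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical per-signal results for one query.
rankings = {
    "keyword": ["c_1", "c_2", "c_3"],
    "vector": ["c_2", "c_1", "c_4"],
    "graph": ["c_2", "c_3"],
    "temporal": ["c_3", "c_4"],
}
print(reciprocal_rank_fusion(rankings))  # ['c_2', 'c_3', 'c_1', 'c_4']
```

Because RRF uses only ranks, it needs no calibration between keyword scores, vector similarities, and the other signals. Entries found by several signals tend to rise above entries that score well on only one.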

Nexus Docs Environment

docs.create_pdf - Generate a PDF from Markdown content or a TEMP staging item
docs.upload - Upload any file
docs.list - List files in a directory
docs.get_url - Get the public URL for a file

docs.create_pdf Parameters

Parameter	Type	Description
user	string	Required. User folder name
content	string	Markdown content (alternatively, pass temp_id or file_path)
temp_id	string	TEMP staging ID (tmp_XXXX)
file_path	string	Local markdown file path
title	string	PDF title (auto-detected if not set)
destination	string	Path in user folder (default: 'documents')
template	string	One of 'professional', 'minimal', 'report', 'default'
logo_path	string	Path to header logo image
header_text	string	Text next to logo
footer_text	string	Footer text (default: 'Confidential')
show_page_numbers	boolean	Show page numbers (default: true)
cleanup_temp	boolean	Auto-cleanup TEMP after PDF (default: true)
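Several parameters are alternatives or have defaults, so a caller may want to check arguments before invoking the tool. A hypothetical client-side sketch follows. The "exactly one of content, temp_id, file_path" rule is an assumption read from the table, and the server may resolve conflicting sources differently. Parameter names and default values are taken from the table.

```python
PARAMETERS = {
    "user", "content", "temp_id", "file_path", "title", "destination",
    "template", "logo_path", "header_text", "footer_text",
    "show_page_numbers", "cleanup_temp",
}
TEMPLATES = {"professional", "minimal", "report", "default"}
DEFAULTS = {
    "destination": "documents",
    "footer_text": "Confidential",
    "show_page_numbers": True,
    "cleanup_temp": True,
}


def resolve_create_pdf_args(**kwargs):
    """Hypothetical client-side check; returns kwargs merged over the defaults."""
    unknown = set(kwargs) - PARAMETERS
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    if not kwargs.get("user"):
        # Multi-tenant: there is no default user folder.
        raise ValueError("user is required")
    # Assumption: exactly one content source is accepted per call.
    sources = [name for name in ("content", "temp_id", "file_path") if kwargs.get(name)]
    if len(sources) != 1:
        raise ValueError(f"pass exactly one of content, temp_id, file_path (got {sources})")
    if "template" in kwargs and kwargs["template"] not in TEMPLATES:
        raise ValueError(f"unknown template: {kwargs['template']!r}")
    return {**DEFAULTS, **kwargs}
```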

Example Workflow

# 1. Stage content
temp.stage(
    content="# Sales Proposal\n\n...",
    source_type="generated",
    metadata={"title": "Sales Proposal", "client": "Acme Corp"}
)
# Returns: tmp_a1b2

# 2. Create PDF from staging
docs.create_pdf(
    user="chris",
    temp_id="tmp_a1b2",
    destination="sales/clients/acme/",
    template="professional",
    logo_path="/data/cdn/users/chris/images/logo.png"
)
# Returns: URL https://docs.corlera.com/home/chris/sales/clients/acme/sales-proposal.pdf

# 3. Store in Corpus for searchability (optional)
corpus.create(
    title="Sales Proposal - Acme Corp",
    content="# Sales Proposal\n\n...",
    cdn_urls=["https://docs.corlera.com/home/chris/sales/clients/acme/sales-proposal.pdf"],
    tags=["sales", "proposal", "acme"]
)
# Returns: c_x1y2
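The example URL suggests how the public address is formed. It uses the user's folder under /home/, then the destination, then a file name derived from the auto-detected title. The sketch below reproduces that URL. The slug rule (lower-case, runs of other characters turned into hyphens) and the helper names are assumptions inferred from this one example, not the server's documented behavior.

```python
import re

# URL layout inferred from the example above:
# https://docs.corlera.com/home/<user>/<destination>/<slug>.pdf
BASE_URL = "https://docs.corlera.com/home"


def detect_title(markdown):
    """Return the text of the first '# ' heading, or None if there is none."""
    for line in markdown.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return None


def slugify(title):
    """Lower-case the title and collapse runs of non-[a-z0-9] characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def public_url(user, destination, title):
    """Build the expected public URL for a PDF saved under the user's folder."""
    parts = [destination.strip("/"), slugify(title) + ".pdf"]
    return f"{BASE_URL}/{user}/" + "/".join(p for p in parts if p)


title = detect_title("# Sales Proposal\n\n...")
print(public_url("chris", "sales/clients/acme/", title))
```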

Architecture Benefits

Speed - TEMP staging avoids streaming large content through chat
Separation - Corpus holds AI-searchable knowledge; Nexus Docs holds human-facing files
Searchability - Quadfecta indexing on Corpus entries
Traceability - Corpus entries link to their PDFs through cdn_urls
Multi-tenant - The user parameter is required and has no default

MCP Server Files

TEMP: /opt/mcp-servers/temp/mcp_temp_server.py
Corpus: /opt/mcp-servers/corpus/mcp_corpus_server.py
Nexus Docs: /opt/mcp-servers/docs/mcp_docs_server.py

External Access

Public URL: https://docs.corlera.com
Reverse Proxy: DigitalOcean (143.198.133.215) → Tailscale → 100.73.67.78:6653

Related

Memory (future) - Reserved for a true RAG/temporal/graph cognitive system
Document-v2 - LLMSherpa hierarchical extraction (ports 6750-6752)
