Document Workflow Architecture

Overview

The Nexus document workflow uses three environments working together:

  1. TEMP - Staging for fast content preparation
  2. Corpus - Searchable text storage (Quadfecta indexed)
  3. Nexus Docs - File storage and PDF generation

Environment Details

Environment  Port              Prefix   Stable ID  Purpose
TEMP         6680/6681         temp:    tmp_XXXX   Staging, 24h TTL
Corpus       6650/6651         corp:    c_XXXX     Searchable text, Quadfecta
Nexus Docs   6653 (CopyParty)  (paths)  (files)    Files, PDFs, sharing
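
The Stable ID column suggests that an ID alone is enough to tell which environment owns an item. A minimal sketch of that lookup, assuming every ID starts with `tmp_` or `c_` as shown in the table; the helper name `environment_for_id` is hypothetical and not part of any Nexus tool:

```python
# Hypothetical helper: map a stable ID to its owning environment.
# The prefixes come from the table above; everything else is an assumption.
ID_PREFIXES = {
    "tmp_": "TEMP",
    "c_": "Corpus",
}


def environment_for_id(stable_id: str) -> str:
    """Return the environment name for a stable ID, or raise ValueError."""
    for prefix, environment in ID_PREFIXES.items():
        # Require at least one character after the prefix.
        if stable_id.startswith(prefix) and len(stable_id) > len(prefix):
            return environment
    raise ValueError(f"unrecognized stable ID: {stable_id!r}")
```

Nexus Docs items are addressed by path rather than by ID, so they are deliberately left out of the lookup.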

Pipeline Cluster (6650-6653)

The Pipeline cluster handles all document ingestion and processing:

Port  Service                 Type               Description
6650  Corpus                  Redis Vault        Primary data store for ingested documents
6651  Corpus                  Redis Operational  Read replica for corpus queries
6652  PDF-Converter           MCP                Stateless PDF conversion service (no Redis)
6653  Nexus Docs (CopyParty)  Web                File hosting CDN

Workflow: Research → Document → PDF

┌──────────────────────────────────────────────────────────────┐
│  1. RESEARCH PHASE                                            │
│     AI gathers info from: web, corpus, kb, contacts           │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│  2. STAGE PHASE                                               │
│     temp.stage(content, source_type="generated")             │
│     → Returns tmp_XXXX                                        │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│  3. COMMIT PHASE                                              │
│     A) docs.create_pdf(temp_id=tmp_XXXX) → PDF + public URL  │
│     B) corpus.create(content, cdn_urls=[url]) → c_XXXX       │
│     C) TEMP auto-cleaned after PDF creation                   │
└──────────────────────────────────────────────────────────────┘
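
The stage and commit phases above can be sketched as one function. The classes below are hypothetical in-memory stand-ins for the `temp`, `docs`, and `corpus` tools, written only to show the order of calls and the cleanup step; the real tools are MCP calls, and the placeholder URL is not the real one.

```python
import itertools

# Shared counter so the fake IDs are unique across the stubs.
_counter = itertools.count(1)


class FakeTemp:
    """Stand-in for the TEMP environment (assumption, not the real server)."""

    def __init__(self):
        self.items = {}

    def stage(self, content, source_type="generated"):
        temp_id = f"tmp_{next(_counter):04d}"
        self.items[temp_id] = content
        return temp_id


class FakeDocs:
    """Stand-in for Nexus Docs; it only checks that the staged item exists."""

    def __init__(self, temp):
        self.temp = temp

    def create_pdf(self, user, temp_id, cleanup_temp=True):
        if temp_id not in self.temp.items:
            raise KeyError(temp_id)
        if cleanup_temp:
            del self.temp.items[temp_id]  # step C: TEMP cleaned after the PDF
        # Placeholder URL; the real service returns its own public URL.
        return f"https://docs.example.invalid/{user}/{temp_id}.pdf"


class FakeCorpus:
    """Stand-in for Corpus; stores the text next to its PDF link."""

    def __init__(self):
        self.entries = {}

    def create(self, content, cdn_urls):
        corpus_id = f"c_{next(_counter):04d}"
        self.entries[corpus_id] = {"content": content, "cdn_urls": cdn_urls}
        return corpus_id


def publish(content, user, temp, docs, corpus):
    """Stage the content, render the PDF, then index the text in Corpus."""
    temp_id = temp.stage(content, source_type="generated")
    url = docs.create_pdf(user=user, temp_id=temp_id)
    corpus_id = corpus.create(content, cdn_urls=[url])
    return url, corpus_id
```

The caller keeps its own copy of `content` for the Corpus step because, with cleanup enabled, the staged item no longer exists once the PDF has been created.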

Key Tools

TEMP Environment

  • temp.stage - Stage content with metadata and TTL
  • temp.get - Retrieve staged content
  • temp.list - List staged items
  • temp.cleanup - Manual cleanup
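
To make the TTL behavior concrete, here is a minimal in-memory model of the four TEMP tools. It assumes the 24h TTL from the environment table and lazy expiry on read; the class `TempStore`, the method `list_ids`, and the injected `clock` are illustrative names, not the actual server implementation.

```python
import secrets
import time

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24h TTL from the environment table


class TempStore:
    """Hypothetical in-memory model of TEMP staging with expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so expiry can be tested
        self._items = {}     # temp_id -> (expires_at, content, metadata)

    def stage(self, content, metadata=None, ttl=DEFAULT_TTL_SECONDS):
        temp_id = "tmp_" + secrets.token_hex(2)  # 4 hex chars, like tmp_a1b2
        while temp_id in self._items:            # retry on the rare collision
            temp_id = "tmp_" + secrets.token_hex(2)
        self._items[temp_id] = (self._clock() + ttl, content, metadata or {})
        return temp_id

    def get(self, temp_id):
        entry = self._items.get(temp_id)
        if entry is None or entry[0] <= self._clock():
            self._items.pop(temp_id, None)  # drop expired entries lazily
            return None
        return entry[1]

    def list_ids(self):
        now = self._clock()
        return sorted(t for t, (exp, _, _) in self._items.items() if exp > now)

    def cleanup(self):
        """Remove expired items and return how many were dropped."""
        now = self._clock()
        expired = [t for t, (exp, _, _) in self._items.items() if exp <= now]
        for temp_id in expired:
            del self._items[temp_id]
        return len(expired)
```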

Corpus Environment

  • corpus.create - Store searchable text with Quadfecta indexing
  • corpus.get - Retrieve by c_XXXX ID
  • corpus.search - Quadfecta search (keyword+vector+graph+temporal)
  • corpus.ingest - Ingest existing PDFs (extract → store)
  • corpus.convert - PDF to markdown (Docling)
  • corpus.extract - Quick text extraction (PyMuPDF)

Nexus Docs Environment

  • docs.create_pdf - Generate PDF from markdown or TEMP staging
  • docs.upload - Upload any file
  • docs.list - List files in directory
  • docs.get_url - Get public URL

docs.create_pdf Parameters

Parameter          Type     Description
user               string   Required - User folder
content            string   Markdown content (or use temp_id/file_path)
temp_id            string   TEMP staging ID (tmp_XXXX)
file_path          string   Local markdown file path
title              string   PDF title (auto-detected if not set)
destination        string   Path in user folder (default: 'documents')
template           string   'professional', 'minimal', 'report', 'default'
logo_path          string   Path to header logo image
header_text        string   Text next to logo
footer_text        string   Footer text (default: 'Confidential')
show_page_numbers  boolean  Show page numbers (default: true)
cleanup_temp       boolean  Auto-cleanup TEMP after PDF (default: true)
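
A caller could check arguments against this table before invoking the tool. The sketch below fills in the documented defaults and rejects unknown templates. The rule that exactly one of `content`, `temp_id`, or `file_path` must be supplied is an assumption read from "or use temp_id/file_path", and `build_create_pdf_args` is a hypothetical helper, not part of the server.

```python
# Allowed values and defaults copied from the parameter table.
TEMPLATES = {"professional", "minimal", "report", "default"}
DEFAULTS = {
    "destination": "documents",
    "footer_text": "Confidential",
    "show_page_numbers": True,
    "cleanup_temp": True,
}
SOURCES = ("content", "temp_id", "file_path")


def build_create_pdf_args(user, **params):
    """Validate arguments and fill in the documented defaults."""
    if not user:
        raise ValueError("user is required")
    # Assumption: exactly one content source may be given per call.
    given = [name for name in SOURCES if params.get(name)]
    if len(given) != 1:
        raise ValueError(f"expected exactly one of {SOURCES}, got {given}")
    template = params.get("template")
    if template is not None and template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    # Explicit parameters override the defaults.
    return {"user": user, **DEFAULTS, **params}
```

No default is applied for `template` because the table lists the allowed values without saying which one is used when the parameter is omitted.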

Example Workflow

# 1. Stage content
temp.stage(
    content="# Sales Proposal\n\n...",
    source_type="generated",
    metadata={"title": "Sales Proposal", "client": "Acme Corp"}
)
# Returns: tmp_a1b2

# 2. Create PDF from staging
docs.create_pdf(
    user="chris",
    temp_id="tmp_a1b2",
    destination="sales/clients/acme/",
    template="professional",
    logo_path="/data/cdn/users/chris/images/logo.png"
)
# Returns: URL https://docs.corlera.com/home/chris/sales/clients/acme/sales-proposal.pdf

# 3. Store in Corpus for searchability (optional)
corpus.create(
    title="Sales Proposal - Acme Corp",
    content="# Sales Proposal\n\n...",
    cdn_urls=["https://docs.corlera.com/home/chris/sales/clients/acme/sales-proposal.pdf"],
    tags=["sales", "proposal", "acme"]
)
# Returns: c_x1y2
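
The URL returned in step 2 appears to combine the user, the destination, and a slug of the title. The sketch below reproduces that one example. The `/home/<user>/<destination>/<slug>.pdf` layout and the slug rule are inferred from this single URL, so treat both as assumptions; `slugify` and `public_pdf_url` are hypothetical helpers.

```python
import re
from urllib.parse import quote

BASE_URL = "https://docs.corlera.com"  # public URL given in this document


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def public_pdf_url(user: str, destination: str, title: str) -> str:
    """Guess the public URL of a generated PDF (assumed layout)."""
    folders = [part for part in destination.split("/") if part]
    parts = ["home", user, *folders, slugify(title) + ".pdf"]
    # Percent-encode each path segment separately so "/" stays a separator.
    return BASE_URL + "/" + "/".join(quote(part) for part in parts)
```

In practice the URL returned by `docs.create_pdf` (or `docs.get_url`) should be used as-is; a helper like this is only useful for predicting where a file will land.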

Architecture Benefits

  1. Speed - TEMP staging avoids streaming large content through chat
  2. Separation - Corpus = AI knowledge, Docs = Human files
  3. Searchability - Quadfecta indexing on Corpus entries
  4. Traceability - Corpus entries link to PDF URLs
  5. Multi-tenant - User parameter required, no defaults

MCP Server Files

  • TEMP: /opt/mcp-servers/temp/mcp_temp_server.py
  • Corpus: /opt/mcp-servers/corpus/mcp_corpus_server.py
  • Nexus Docs: /opt/mcp-servers/docs/mcp_docs_server.py

External Access

  • Public URL: https://docs.corlera.com
  • Reverse Proxy: DigitalOcean (143.198.133.215) → Tailscale → 100.73.67.78:6653

Related Systems

  • Memory (future) - Reserved for true RAG/temporal/graph cognitive system
  • Document-v2 - LLMSherpa hierarchical extraction (ports 6750-6752)

ID: fe0a50bc
Updated: 2026-01-13T12:51:50