NEXUS SERVER REDUNDANCY RESEARCH REPORT
Agent: Ray | Date: January 13, 2026
EXECUTIVE SUMMARY
Recommend a layered approach for Nexus failover:
1. Redis Sentinel for automatic database failover (built into Redis)
2. Lsyncd for MCP server code replication (simple, proven)
3. Manual failover initially (switch Tailscale DNS/IP)
CURRENT ARCHITECTURE (from KB)
- Primary: cortex-nexus-master (2TB) - 100.73.67.78
- Backup: cortex-storage (1TB) - 100.112.54.111
- Redis: Vault/Operational dual-pod pattern already in place
- Docker containers with volumes at /data/nexus3/
TOOL COMPARISON MATRIX
| Tool | Sync Type | Level | Real-time | Bi-Directional | Complexity | Best For |
|---|---|---|---|---|---|---|
| rsync | File | Periodic | No | No | Low | Simple backups |
| Lsyncd | File | Near real-time | Yes (inotify) | No | Low-Medium | MCP code sync |
| DRBD | Block | Real-time | Yes | Yes (dual-primary) | High | Databases, requires dedicated partition |
| GlusterFS | File | Real-time | Yes | Yes | High | Large-scale distributed storage (needs 3+ nodes) |
| Syncthing | File | Real-time | Yes | Yes | Low | Peer-to-peer sync, easy setup |
| Redis Sentinel | Redis | Real-time | Yes | N/A (master-slave) | Medium | Redis automatic failover |
LAYER 1: REDIS REPLICATION (Built-in Solution)
Current State
Already using vault/operational within same server. Can extend cross-server.
Cross-Server Redis Sentinel Setup
- Deploy 3 Sentinel instances (minimum) across both servers
- Configure vault containers on backup as replicas of primary vaults
- Sentinel monitors masters and auto-promotes replica on failure
Configuration:
sentinel monitor nexus-kb-master 100.73.67.78 6625 2
sentinel down-after-milliseconds nexus-kb-master 5000
sentinel failover-timeout nexus-kb-master 60000
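On the backup server, each vault would run as a replica of its primary counterpart. A minimal redis.conf sketch for the replica side (the port mirrors the primary's vault scheme from the Sentinel example; the auth line is an assumption about the vault setup):

```conf
# redis.conf for the backup's vault container (port assumed to mirror the primary)
replicaof 100.73.67.78 6625
# masterauth <password>   # uncomment if the primary vaults require auth
```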
Pros:
- Native Redis feature, no new software
- Automatic failover (30-60 seconds)
- Already familiar with the vault/operational pattern
Cons:
- Asynchronous replication (some data loss possible)
- Minimum of 3 Sentinel instances needed for quorum
- MCP servers need Sentinel-aware connection logic
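The Sentinel-aware connection logic noted in the cons can be sketched in shell: instead of hard-coding 100.73.67.78, MCP servers (or a wrapper script) ask any reachable Sentinel who the current master is. The master name matches the example config; the Sentinel port 26379 is the Redis default and an assumption here.

```shell
# Sketch: resolve the current master via Sentinel before connecting.

# Ask one Sentinel for the master's address; the reply is two lines (IP, then port)
get_master() {
  redis-cli -h "$1" -p 26379 sentinel get-master-addr-by-name nexus-kb-master
}

# Join Sentinel's two-line reply into a host:port string for MCP config
format_addr() {
  printf '%s:%s' "$1" "$2"
}
```

After a failover, `get_master 100.73.67.78` (or against any surviving Sentinel) returns the backup's address, so clients re-resolve instead of pinning the old master.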
LAYER 2: MCP SERVER CODE REPLICATION
Option A: Lsyncd (RECOMMENDED)
Uses inotify + rsync for near real-time sync.
Setup:
-- /etc/lsyncd.conf on primary
sync {
    default.rsync,
    source = "/opt/mcp-servers/",
    target = "100.112.54.111:/opt/mcp-servers/",
    rsync = {
        archive = true,
        compress = true,
        rsh = "/usr/bin/ssh -l nexus"
    }
}
Pros:
- 15-second batching (configurable)
- Simple setup, low overhead
- Uses existing SSH/rsync
- One-way (prevents split-brain)
Cons:
- Not truly real-time (15 s delay by default)
- One-way only
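If the 15-second default batch is too slow for code pushes, Lsyncd's per-sync delay option shrinks the window at the cost of more frequent rsync invocations; a sketch:

```lua
-- In the sync block: lower the batching delay from the 15 s default
sync {
    default.rsync,
    delay = 1,  -- aggregate filesystem events for 1 second instead of 15
    source = "/opt/mcp-servers/",
    target = "100.112.54.111:/opt/mcp-servers/",
}
```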
Option B: Syncthing
Peer-to-peer real-time sync with web UI.
Pros:
- Bidirectional by default
- Easy Docker deployment
- Encrypted, no manual SSH setup
- Web UI for monitoring
Cons:
- More overhead than Lsyncd
- Conflict handling adds complexity
LAYER 3: DOCKER CONTAINER STRATEGY
NOT Recommended: Live Migration
Docker doesn't natively support live container migration. DRBD/shared storage would be needed.
Recommended: Standby Containers
- Keep identical docker-compose files on backup server
- Sync /data/nexus3/ volumes via Lsyncd
- On failover: start containers on backup
Commands for failover:
# On backup server
cd /opt/mcp-servers
docker-compose up -d
# Promote the local Redis replicas to master (vault port from Layer 1; skip if Sentinel handles it)
redis-cli -p 6625 replicaof no one
LAYER 4: FAILOVER STRATEGY
Manual Failover (Phase 1)
- Detect failure (monitoring/alert)
- SSH to backup: start Docker containers
- Update Tailscale DNS or use floating IP
- MCP clients reconnect
Estimated failover time: 2-5 minutes with manual intervention
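The four steps above can be sketched as a runbook script. IPs and paths come from this report; the DRY_RUN guard (on by default) is a safety assumption so the script prints its actions instead of executing them until the procedure is rehearsed.

```shell
#!/bin/sh
# Sketch of the manual-failover runbook; IPs/paths from this report.
# DRY_RUN=1 (default) only prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}
BACKUP=100.112.54.111

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would: $*"
  else
    "$@"
  fi
}

# Step 1 (detect failure) is assumed to have happened via monitoring/alert.
# Step 2: start the standby containers on the backup
run ssh nexus@"$BACKUP" 'cd /opt/mcp-servers && docker-compose up -d'
# Promote the backup's Redis replicas (vault port from the Sentinel example)
run ssh nexus@"$BACKUP" 'redis-cli -p 6625 replicaof no one'
# Steps 3-4 (MagicDNS/IP cutover, client reconnects) remain manual.
```

Run with DRY_RUN=0 once the output has been reviewed.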
Semi-Automatic (Phase 2)
- Keepalived for floating IP between servers
- Health check scripts monitor primary
- Auto-start containers on failure detection
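A minimal Keepalived sketch for the floating IP between the two servers; the interface name and VIP below are placeholders, and (per the gotchas) this floats a LAN address, not a Tailscale IP.

```conf
# /etc/keepalived/keepalived.conf on the primary
# (the backup server uses state BACKUP and a lower priority)
vrrp_instance NEXUS_VIP {
    state MASTER
    interface eth0          # placeholder: the servers' LAN interface
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.250/24    # placeholder VIP
    }
}
```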
Full Automatic (Phase 3)
- Redis Sentinel handles Redis failover
- Consul/Nomad for service discovery
- Automatic DNS updates
RECOMMENDED APPROACH FOR NEXUS
Phase 1: Foundation (Simplest Path)
- Install Lsyncd on primary
  - Sync /opt/mcp-servers/ → backup
  - Sync /data/nexus3/ volumes → backup (excluding container runtime data)
- Set up Redis replicas on backup
  - Each vault on backup = replica of the primary vault
  - Same port scheme, just replica mode
- Document manual failover procedure
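The /data/nexus3/ volume sync could reuse the same Lsyncd instance with a second sync block; the exclude patterns below are assumptions about what counts as container runtime data and would need tuning against the actual volume layout.

```lua
-- Second sync block in /etc/lsyncd.conf for the data volumes
sync {
    default.rsync,
    source = "/data/nexus3/",
    target = "100.112.54.111:/data/nexus3/",
    exclude = { "*.tmp", "*.lock" },  -- assumed runtime-data patterns
    rsync = { archive = true, compress = true, rsh = "/usr/bin/ssh -l nexus" }
}
```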
Phase 2: Automation
- Deploy Redis Sentinel (3 instances across both servers)
- Add Keepalived for a floating LAN IP (Tailscale IPs can't float between machines; see Gotchas)
- Create failover scripts
GOTCHAS & LIMITATIONS
- Tailscale IPs are fixed - can't easily move IPs between machines. Options:
  - Use Tailscale MagicDNS names
  - Run HAProxy/Nginx on a third node
  - Use Tailscale Funnel for external access
- Redis async replication - some writes may be lost (milliseconds' worth)
- Storage size mismatch - backup has 1TB vs. the primary's 2TB. Prioritize critical data.
- Split-brain risk - if both servers think they're master. Sentinel quorum helps.
- MCP port conflicts - same ports on both servers mean both can't run simultaneously (not an issue for the standby model)
ESTIMATED COMPLEXITY
| Component | Effort | Risk |
|---|---|---|
| Lsyncd setup | Low | Low |
| Redis cross-server replication | Medium | Low |
| Redis Sentinel | Medium | Medium |
| Manual failover docs | Low | Low |
| Automatic failover | High | Medium |
SOURCES
- Redis Sentinel: https://redis.io/docs/latest/operate/oss_and_stack/management/sentinel/
- Lsyncd: https://lsyncd.github.io/lsyncd/
- DRBD vs rsync: https://iamvhl.medium.com/drbd-vs-rsync-92e8c2c53f9d
- Syncthing: https://syncthing.net/
- Docker HA: https://www.evidian.com/products/high-availability-software-for-application-clustering/