NEXUS SERVER REDUNDANCY RESEARCH REPORT
Agent: Ray | Date: January 13, 2026
EXECUTIVE SUMMARY
Recommend a layered approach for Nexus failover:
1. Redis Sentinel for automatic database failover (built into Redis)
2. Lsyncd for MCP server code replication (simple, proven)
3. Manual failover initially (switch Tailscale DNS/IP)
CURRENT ARCHITECTURE (from KB)
- Primary: cortex-nexus-master (2TB) - 100.73.67.78
- Backup: cortex-storage (1TB) - 100.112.54.111
- Redis: Vault/Operational dual-pod pattern already in place
- Docker containers with volumes at /data/nexus3/
TOOL COMPARISON MATRIX
| Tool | Sync Type | Level | Real-time | Bi-Directional | Complexity | Best For |
|---|---|---|---|---|---|---|
| rsync | File | Periodic | No | No | Low | Simple backups |
| Lsyncd | File | Near real-time | Yes (inotify) | No | Low-Medium | MCP code sync |
| DRBD | Block | Real-time | Yes | Yes (dual-primary) | High | Databases, requires dedicated partition |
| GlusterFS | File | Real-time | Yes | Yes | High | Large-scale distributed storage (needs 3+ nodes) |
| Syncthing | File | Real-time | Yes | Yes | Low | Peer-to-peer sync, easy setup |
| Redis Sentinel | Redis | Real-time | Yes | N/A (master-slave) | Medium | Redis automatic failover |
LAYER 1: REDIS REPLICATION (Built-in Solution)
Current State
Already using vault/operational within same server. Can extend cross-server.
Cross-Server Redis Sentinel Setup
- Deploy 3 Sentinel instances (minimum) across both servers
- Configure vault containers on backup as replicas of primary vaults
- Sentinel monitors masters and auto-promotes replica on failure
Configuration:
sentinel monitor nexus-kb-master 100.73.67.78 6625 2
sentinel down-after-milliseconds nexus-kb-master 5000
sentinel failover-timeout nexus-kb-master 60000
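On the backup server, each vault would run as a replica of its primary counterpart. A minimal redis.conf sketch for the replica side (the port mirrors the primary's vault scheme from the Sentinel example; the auth line is an assumption about the vault setup):

```conf
# redis.conf for the backup's vault container (port assumed to mirror the primary)
replicaof 100.73.67.78 6625
# masterauth <password>   # uncomment if the primary vaults require auth
```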
Pros:
- Native Redis feature, no new software
- Automatic failover (30-60 seconds)
- Already familiar with the vault/operational pattern
Cons:
- Asynchronous replication (some data loss possible)
- Minimum of 3 Sentinel instances needed for quorum
- MCP servers need Sentinel-aware connection logic
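The Sentinel-aware connection logic noted in the cons can be sketched in shell: instead of hard-coding 100.73.67.78, MCP servers (or a wrapper script) ask any reachable Sentinel who the current master is. The master name matches the example config; the Sentinel port 26379 is the Redis default and an assumption here.

```shell
# Sketch: resolve the current master via Sentinel before connecting.

# Ask one Sentinel for the master's address; the reply is two lines (IP, then port)
get_master() {
  redis-cli -h "$1" -p 26379 sentinel get-master-addr-by-name nexus-kb-master
}

# Join Sentinel's two-line reply into a host:port string for MCP config
format_addr() {
  printf '%s:%s' "$1" "$2"
}
```

After a failover, `get_master 100.73.67.78` (or against any surviving Sentinel) returns the backup's address, so clients re-resolve instead of pinning the old master.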
LAYER 2: MCP SERVER CODE REPLICATION
Option A: Lsyncd (RECOMMENDED)
Uses inotify + rsync for near real-time sync.
Setup:
-- /etc/lsyncd.conf on primary
sync {
    default.rsync,
    source = "/opt/mcp-servers/",
    target = "100.112.54.111:/opt/mcp-servers/",
    rsync = {
        archive = true,
        compress = true,
        rsh = "/usr/bin/ssh -l nexus"
    }
}
Pros:
- 15-second batching (configurable)
- Simple setup, low overhead
- Uses existing SSH/rsync
- One-way (prevents split-brain)
Cons:
- Not truly real-time (15 s delay by default)
- One-way only
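If the 15-second default batch is too slow for code pushes, Lsyncd's per-sync delay option shrinks the window at the cost of more frequent rsync invocations; a sketch:

```lua
-- In the sync block: lower the batching delay from the 15 s default
sync {
    default.rsync,
    delay = 1,  -- aggregate filesystem events for 1 second instead of 15
    source = "/opt/mcp-servers/",
    target = "100.112.54.111:/opt/mcp-servers/",
}
```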
Option B: Syncthing
Peer-to-peer real-time sync with web UI.
Pros:
- Bidirectional by default
- Easy Docker deployment
- Encrypted, no manual SSH setup
- Web UI for monitoring
Cons:
- More overhead than Lsyncd
- Conflict handling adds complexity
LAYER 3: DOCKER CONTAINER STRATEGY
NOT Recommended: Live Migration
Docker doesn't natively support live container migration. DRBD/shared storage would be needed.
Recommended: Standby Containers
- Keep identical docker-compose files on backup server
- Sync /data/nexus3/ volumes via Lsyncd
- On failover: start containers on backup
Commands for failover:
# On backup server
cd /opt/mcp-servers
docker-compose up -d
# Promote the local Redis replicas to master (vault port from Layer 1; skip if Sentinel handles it)
redis-cli -p 6625 replicaof no one
LAYER 4: FAILOVER STRATEGY
Manual Failover (Phase 1)
- Detect failure (monitoring/alert)
- SSH to backup: start Docker containers
- Update Tailscale DNS or use floating IP
- MCP clients reconnect
Estimated failover time: 2-5 minutes with manual intervention
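The four steps above can be sketched as a runbook script. IPs and paths come from this report; the DRY_RUN guard (on by default) is a safety assumption so the script prints its actions instead of executing them until the procedure is rehearsed.

```shell
#!/bin/sh
# Sketch of the manual-failover runbook; IPs/paths from this report.
# DRY_RUN=1 (default) only prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}
BACKUP=100.112.54.111

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would: $*"
  else
    "$@"
  fi
}

# Step 1 (detect failure) is assumed to have happened via monitoring/alert.
# Step 2: start the standby containers on the backup
run ssh nexus@"$BACKUP" 'cd /opt/mcp-servers && docker-compose up -d'
# Promote the backup's Redis replicas (vault port from the Sentinel example)
run ssh nexus@"$BACKUP" 'redis-cli -p 6625 replicaof no one'
# Steps 3-4 (MagicDNS/IP cutover, client reconnects) remain manual.
```

Run with DRY_RUN=0 once the output has been reviewed.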
Semi-Automatic (Phase 2)
- Keepalived for floating IP between servers
- Health check scripts monitor primary
- Auto-start containers on failure detection
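A minimal Keepalived sketch for the floating IP between the two servers; the interface name and VIP below are placeholders, and (per the gotchas) this floats a LAN address, not a Tailscale IP.

```conf
# /etc/keepalived/keepalived.conf on the primary
# (the backup server uses state BACKUP and a lower priority)
vrrp_instance NEXUS_VIP {
    state MASTER
    interface eth0          # placeholder: the servers' LAN interface
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.250/24    # placeholder VIP
    }
}
```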
Full Automatic (Phase 3)
- Redis Sentinel handles Redis failover
- Consul/Nomad for service discovery
- Automatic DNS updates
RECOMMENDED APPROACH FOR NEXUS
Phase 1: Foundation (Simplest Path)
- Install Lsyncd on primary
  - Sync /opt/mcp-servers/ → backup
  - Sync /data/nexus3/ volumes → backup (excluding container runtime data)
- Set up Redis replicas on backup
  - Each vault on backup = replica of the primary vault
  - Same port scheme, just replica mode
- Document manual failover procedure
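The /data/nexus3/ volume sync could reuse the same Lsyncd instance with a second sync block; the exclude patterns below are assumptions about what counts as container runtime data and would need tuning against the actual volume layout.

```lua
-- Second sync block in /etc/lsyncd.conf for the data volumes
sync {
    default.rsync,
    source = "/data/nexus3/",
    target = "100.112.54.111:/data/nexus3/",
    exclude = { "*.tmp", "*.lock" },  -- assumed runtime-data patterns
    rsync = { archive = true, compress = true, rsh = "/usr/bin/ssh -l nexus" }
}
```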
Phase 2: Automation
- Deploy Redis Sentinel (3 instances across both servers)
- Add Keepalived for a floating LAN IP (Tailscale IPs can't float between machines; see Gotchas)
- Create failover scripts
GOTCHAS & LIMITATIONS
- Tailscale IPs are fixed - can't easily move IPs between machines. Options:
  - Use Tailscale MagicDNS names
  - Run HAProxy/Nginx on a third node
  - Use Tailscale Funnel for external access
- Redis async replication - some writes may be lost (milliseconds' worth)
- Storage size mismatch - backup has 1TB vs. the primary's 2TB. Prioritize critical data.
- Split-brain risk - if both servers think they're master. Sentinel quorum helps.
- MCP port conflicts - same ports on both servers mean both can't run simultaneously (not an issue for the standby model)
ESTIMATED COMPLEXITY
| Component | Effort | Risk |
|---|---|---|
| Lsyncd setup | Low | Low |
| Redis cross-server replication | Medium | Low |
| Redis Sentinel | Medium | Medium |
| Manual failover docs | Low | Low |
| Automatic failover | High | Medium |
SOURCES
- Redis Sentinel: https://redis.io/docs/latest/operate/oss_and_stack/management/sentinel/
- Lsyncd: https://lsyncd.github.io/lsyncd/
- DRBD vs rsync: https://iamvhl.medium.com/drbd-vs-rsync-92e8c2c53f9d
- Syncthing: https://syncthing.net/
- Docker HA: https://www.evidian.com/products/high-availability-software-for-application-clustering/