
Nexus AI Engine

architecture lars local-ai nexus-ai-engine

The local AI subsystem for Nexus 3.0 - a multi-core architecture where each component operates independently, designed to create a self-improving AI that becomes the definitive Nexus expert.

Company Connection: "Core" terminology aligns with Corlera - at the core of AI development.

Philosophy: Knowledge + Validation = Reliability

The AI has two layers of capability:

  1. Fine-tuned Knowledge - Everything about Nexus baked into the model
  2. Validation Tools - Ability to verify knowledge against actual system state

Even when LARS "knows" the answer from training, it should validate by checking actual files. This prevents hallucinations and ensures reliability.

Core Components

Runtime Core

  • Ollama - Model inference engine (unmodified, auto-updates)
  • Nexus Wrapper - Adds MCP integration, auth, custom routing (sketched after this list)
  • Purpose: Run the local AI model and handle requests
  • Status: Independent - can run even if other cores are down
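
As a hedged illustration of the Runtime Core, the sketch below shows one way the Nexus Wrapper could sit in front of an unmodified Ollama instance: a small FastAPI proxy that checks a shared token and forwards chat requests. Ollama's /api/chat endpoint and default port 11434 are standard; the X-Nexus-Token header, the NEXUS_AI_TOKEN variable, and the idea of a single proxy route are assumptions, not the real implementation.

import os

import httpx
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
OLLAMA_URL = "http://localhost:11434"          # unmodified Ollama runtime underneath
API_TOKEN = os.environ.get("NEXUS_AI_TOKEN")   # hypothetical shared secret

@app.post("/api/chat")
async def chat(request: Request):
    # Reject requests that do not carry the Nexus auth token.
    if request.headers.get("X-Nexus-Token") != API_TOKEN:
        raise HTTPException(status_code=401, detail="unauthorized")
    payload = await request.json()
    payload.setdefault("stream", False)        # streaming is omitted in this sketch
    # Forward the request unchanged to the local Ollama instance and return its reply.
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
    return resp.json()

MCP routing and custom endpoints would be layered onto the same app; the point is that Ollama itself stays untouched and auto-updatable.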

Training Core

  • Unsloth - LoRA/QLoRA fine-tuning (runs on 3090 GPUs)
  • Training Queue - Batched instruction-response pairs (JSONL; example after this list)
  • Scheduler - Runs training jobs (e.g., 2 AM nightly)
  • Hot-swap - New LoRA adapters loaded without restart
  • Purpose: Continuously improve the model from Nexus activity
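
To make the queue concrete, here is a hedged sketch of appending one instruction-response pair to a JSONL queue file. The path, field names, and example content are assumptions rather than the actual Nexus schema.

import json
from datetime import datetime, timezone

QUEUE_PATH = "/opt/nexus/training/queue.jsonl"  # hypothetical location

pair = {
    "instruction": "Summarize what the Monitor Core does in the Nexus AI Engine.",
    "response": "It watches KB, Track, Docs, and code files and turns changes into training pairs.",
    "source": "docs-change",                     # hypothetical provenance tag
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

with open(QUEUE_PATH, "a", encoding="utf-8") as f:
    f.write(json.dumps(pair) + "\n")             # one JSON object per line (JSONL)

The scheduled Unsloth job would read batches from this file, train LoRA adapters on the 3090s, and hot-swap the resulting adapters into the running model.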

Tools Core

Local tools that run on the AI server - no Nexus dependency:

  • Calculator - Python eval with safety (scientific calculations, no hallucinated math; see the sketch below)
  • Code Executor - Sandboxed Python/bash execution
  • File System - Full read/write access to Nexus codebase
  • Terminal - Sudo access, system commands
  • Web Search - Independent web searching capability
  • MCP Bridge - Call any Nexus MCP server when needed

Critical Requirement: LARS needs sudo access, terminal command execution, and file read/write on the AI server. Without these, validation is impossible.
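
A minimal sketch of the Calculator tool from the list above, assuming "Python eval with safety" means parsing the expression with the ast module and whitelisting arithmetic operators rather than calling eval() on raw model output:

import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Evaluate a purely arithmetic expression, rejecting anything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

print(calculate("(12 ** 2 + 56) / 8"))  # 25.0 - computed, not hallucinated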

Monitor Core

  • Change Watcher - Monitors KB, Track, Docs, code files for updates (see the sketch after this list)
  • Training Generator - Creates instruction-response pairs from changes
  • Chrono Integration - Time-aware (knows current date/time, timestamps)
  • Self-Training Loop - AI learns from Nexus changes automatically
  • Purpose: The AI trains itself continuously as you work
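
A hedged sketch of how the Change Watcher and Training Generator could fit together, assuming the watchdog library for file events; the watched paths, queue location, and pair format are illustrative only.

import json
import time
from datetime import datetime, timezone
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

WATCHED = ["/opt/nexus/kb", "/opt/nexus/track", "/opt/nexus/docs"]  # hypothetical paths
QUEUE = Path("/opt/nexus/training/queue.jsonl")                     # hypothetical queue file

class ChangeHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory:
            return
        # Training Generator step: turn the changed file into an
        # instruction-response pair, timestamped for Chrono awareness.
        pair = {
            "instruction": f"What does {event.src_path} currently contain?",
            "response": Path(event.src_path).read_text(errors="ignore")[:2000],
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        with QUEUE.open("a", encoding="utf-8") as f:
            f.write(json.dumps(pair) + "\n")

observer = Observer()
for path in WATCHED:
    observer.schedule(ChangeHandler(), path, recursive=True)
observer.start()
try:
    while True:          # keep watching; the Training Core scheduler consumes the queue at 2 AM
        time.sleep(1)
finally:
    observer.stop()
    observer.join()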

What Gets Fine-Tuned Into LARS

Must Be Baked In (Hard Knowledge)

  1. All Nexus Code - Every MCP server, every Python file (~60,000 files)
  2. Architecture Patterns - How MCP servers are structured, naming conventions
  3. Port Mappings - Which service runs on which port
  4. Environment Structure - Context, Track, KB, Docs, User, etc.
  5. Python Expertise - Deep Python knowledge for code generation
  6. Thinking Patterns - Step-by-step reasoning before action
  7. Tool Usage Patterns - When to use which tool
  8. Validation Habits - Always verify, never assume

Accessed Via Tools (Dynamic Knowledge)

  1. Current file contents - Read actual files to validate
  2. Recent changes - What was modified today/this week
  3. Web search results - Current information from internet
  4. Calculations - Guaranteed accurate math
  5. System state - Running services, disk space, etc.

Validation Pattern

When LARS responds, it should:

1. THINK: "I believe the answer is X based on my training"
2. VALIDATE: "Let me check the actual file/system to confirm"
3. COMPARE: "My knowledge matches/differs from current state"
4. RESPOND: "Here is the verified answer with source"

This pattern prevents the hallucination problem seen in the math example where LARS rewrote the question instead of answering it.
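
As a concrete illustration of the four steps, here is a hedged sketch for one common case: answering which port a service uses. The regex, the config path argument, and the message wording are assumptions; the point is that the model's trained belief is checked against the file on disk before responding.

import re
from pathlib import Path

def answer_port_question(service: str, believed_port: int, config_path: str) -> str:
    # THINK: believed_port is what the fine-tuned model "remembers".
    # VALIDATE: read the actual config file on disk.
    text = Path(config_path).read_text()
    match = re.search(r"port\s*[:=]\s*(\d+)", text, re.IGNORECASE)
    actual = int(match.group(1)) if match else None
    # COMPARE: does trained knowledge match current system state?
    if actual is None:
        return f"Could not verify a port for {service} in {config_path}."
    if actual == believed_port:
        # RESPOND: verified answer, with its source.
        return f"{service} runs on port {actual} (verified against {config_path})."
    # RESPOND: prefer current system state over the stale trained belief.
    return f"{service} runs on port {actual} per {config_path}; training data said {believed_port}."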

Target Hardware

  • Current: 48GB VRAM (2x RTX 3090)
  • Planned: 72GB VRAM (3x RTX 3090)
  • Base Model: Qwen 2.5 Coder or Qwen 3 (when available)
  • Training: Unsloth with LoRA adapters

Architecture Diagram

+------------------------------------------------------------------+
|                      NEXUS AI ENGINE                             |
|                   (AI Server - Cortex)                           |
+-----------------+-----------------+-----------------+------------+
|  RUNTIME CORE   |  TRAINING CORE  |   TOOLS CORE    | MONITOR    |
|                 |                 |                 |  CORE      |
|  +-----------+  |  +-----------+  |  +-----------+  |            |
|  |  Ollama   |  |  |  Unsloth  |  |  |Calculator |  | Change     |
|  |(unmodified|  |  |  LoRA/    |  |  |  Web      |  | Watcher    |
|  | runtime)  |  |  |  QLoRA    |  |  |  Search   |  |            |
|  +-----+-----+  |  +-----------+  |  +-----------+  | Training   |
|        |        |                 |                 | Generator  |
|  +-----+-----+  |  +-----------+  |  +-----------+  |            |
|  |  Nexus    |  |  | Training  |  |  | Terminal  |  | Chrono     |
|  |  Wrapper  |  |  |  Queue    |  |  |  (sudo)   |  | Aware      |
|  | (MCP/Auth)|  |  |  (JSON-L) |  |  | File R/W  |  |            |
|  +-----------+  |  +-----------+  |  +-----------+  |            |
|                 |                 |                 |            |
|                 |  +-----------+  |  +-----------+  |            |
|                 |  | Scheduler |  |  |MCP Bridge |  |            |
|                 |  | (2 AM job)|  |  |(Nexus)    |  |            |
|                 |  +-----------+  |  +-----------+  |            |
+-----------------+-----------------+-----------------+------------+

Relationship to Main Nexus

The Nexus AI Engine runs on the AI server (Cortex) and communicates with the main Nexus system via MCP. LARS (the VS Code extension) is a client that talks to the AI Engine.
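
For orientation, a hedged sketch of what a client call might look like if the wrapper exposes Ollama's standard /api/chat format. The host name cortex, the wrapper port 8080, the auth header, and the model tag lars are all assumptions.

import requests

resp = requests.post(
    "http://cortex:8080/api/chat",     # Nexus Wrapper endpoint (hypothetical host/port)
    headers={"X-Nexus-Token": "..."},  # shared secret checked by the wrapper
    json={
        "model": "lars",               # hypothetical fine-tuned model tag
        "messages": [{"role": "user", "content": "Which core handles fine-tuning?"}],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])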

The Vision

LARS becomes the Nexus expert that knows more about Nexus than any external AI (Claude, GPT, etc.) because:

  1. It has all Nexus code fine-tuned into its weights
  2. It continuously learns from changes as you work
  3. It validates its knowledge against the actual system
  4. It has direct file system and terminal access

When Claude needs to know about Nexus, it asks LARS. LARS is the expert.

  • Track Project: LARS Training System (0dd041be)
  • Ollama Repo: /opt/repos/ollama/
  • Training Tool: Unsloth
ID: e213b1c0
Path: Nexus AI Engine
Updated: 2026-01-13T12:51:05