The Nexus Loop Concept
Chris's proposed iterative training approach - train, test, correct, repeat until near-perfect accuracy.
How Standard Fine-Tuning Works
Dataset → Train (one run) → Done → Hope it worked
- Load batches of examples (4-8 at a time)
- Forward pass: the model generates a prediction
- Calculate loss: how wrong was it?
- Backward pass: adjust weights (tiny nudges to billions of parameters)
- Repeat for all batches, multiple epochs
- No testing during training - just math (a sketch of this loop follows below)
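For contrast, here is a minimal sketch of that single-pass loop in Hugging Face / PyTorch style. The model name, dataloader, and hyperparameters are placeholders, not our actual setup:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM

def train_one_run(model_name: str, train_loader: DataLoader, epochs: int = 3):
    """One standard fine-tuning run: no evaluation anywhere, just optimization."""
    model = AutoModelForCausalLM.from_pretrained(model_name)   # placeholder checkpoint
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(epochs):                   # multiple epochs over the data
        for batch in train_loader:            # batches of 4-8 tokenized examples (with labels)
            outputs = model(**batch)          # forward pass: predict the next tokens
            loss = outputs.loss               # how wrong was it?
            loss.backward()                   # backward pass: compute gradients
            optimizer.step()                  # tiny nudges to billions of parameters
            optimizer.zero_grad()
    return model                              # done - hope it worked
```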
The Nexus Loop (Proposed)
Dataset → Train → Test → Score → Correct → Retrain → Loop until 98%+
Step-by-Step
- Train: Standard fine-tuning on the 3D dataset
- Test: Run inference on 100+ evaluation questions
  - Identity questions (Who is LARS?)
  - Nexus knowledge (What is workflow.init?)
  - Reasoning tasks (multi-step problems)
- Score: Claude judges each answer as one of:
  - Correct
  - Incorrect
  - Hallucination (made-up facts)
- Correct: Generate training examples that fix the failures
  - If LARS said 'Chris has an MBA' → create an explicit correction example
  - If LARS forgot a Nexus concept → reinforce it with more examples
- Retrain: Original data + correction examples
- Loop: Repeat until the accuracy threshold is met (98-100%); a sketch of the full cycle follows below
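A minimal sketch of how that cycle could be orchestrated, assuming the four moving parts (training, inference, judging, correction generation) are passed in as callables. Every name, signature, and threshold here is a placeholder, not an existing implementation:

```python
from typing import Callable

def nexus_loop(
    fine_tune: Callable,          # training_data -> model                  (placeholder)
    run_eval: Callable,           # (model, questions) -> answers           (placeholder)
    judge: Callable,              # (questions, answers) -> verdict labels  (placeholder)
    make_corrections: Callable,   # failures -> new training examples       (placeholder)
    dataset: list,
    eval_questions: list,
    target_accuracy: float = 0.98,
    max_cycles: int = 10,
):
    """Train, test, score, correct, and repeat until the accuracy target is hit."""
    training_data = list(dataset)
    model = None
    for cycle in range(1, max_cycles + 1):
        model = fine_tune(training_data)                           # Train
        answers = run_eval(model, eval_questions)                  # Test: 100+ questions
        verdicts = judge(eval_questions, answers)                  # Score: Claude as judge
        accuracy = sum(v == "correct" for v in verdicts) / len(verdicts)
        print(f"Cycle {cycle}: accuracy {accuracy:.1%}")
        if accuracy >= target_accuracy:                            # Threshold logic: stop
            return model
        failures = [(q, a) for q, a, v in zip(eval_questions, answers, verdicts)
                    if v != "correct"]
        training_data += make_corrections(failures)                # Correct: new examples
    return model                                                   # best effort after max_cycles
```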
What This Is Called in Research
- RLHF (Reinforcement Learning from Human Feedback) - but we use AI feedback, which makes this closer to RLAIF
- Iterative Refinement Training
- Online Learning with Evaluation
- Constitutional AI (Anthropic's approach uses similar loops)
Key Differences from Hobbyist Training
| Hobbyist Approach | Nexus Loop |
|---|---|
| Train once, hope it works | Train, verify, correct, repeat |
| No evaluation during process | Continuous evaluation |
| Accept whatever accuracy | Target specific threshold |
| Manual testing after | Automated testing in loop |
Implementation Requirements
- Test Suite: 100+ evaluation questions with ground truth
- AI Judge: Claude scoring LARS responses (see the sketch after this list)
- Correction Generator: Create training examples from failures
- Loop Orchestrator: Manage train/test/correct cycle
- Threshold Logic: Stop when accuracy hits target
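A rough sketch of the AI Judge piece using the Anthropic Python SDK. The prompt, the one-word labels, and the model id are placeholders to refine, not a finished judging rubric:

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

def judge_answer(question: str, ground_truth: str, lars_answer: str) -> str:
    """Ask Claude to label one LARS answer as correct, incorrect, or hallucination."""
    prompt = (
        "You are grading a fine-tuned model's answer against ground truth.\n"
        f"Question: {question}\n"
        f"Ground truth: {ground_truth}\n"
        f"Model answer: {lars_answer}\n"
        "Reply with exactly one word: correct, incorrect, or hallucination."
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",    # placeholder model id
        max_tokens=10,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip().lower()
```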
Open Questions
- How long per training cycle? (affects total loop time)
- Does corrective training destabilize earlier learning?
- How many corrections per loop before diminishing returns?
- Should we use LoRA for corrections or a full fine-tune? (a sketch of the LoRA option follows below)
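If the LoRA route wins out for correction passes, a minimal sketch with the `peft` library might look like this. The rank, alpha, and target modules are guesses to experiment with, not tuned values:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def wrap_for_lora_corrections(model_name: str):
    """Attach small LoRA adapters so a correction pass only updates a sliver of weights."""
    model = AutoModelForCausalLM.from_pretrained(model_name)
    config = LoraConfig(
        r=16,                                  # adapter rank - a guess, not a tuned value
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections only
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()         # sanity check: tiny fraction of total weights
    return model
```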
Honest Assessment
This is how frontier labs iterate on models - the difference is that they have massive compute and human labelers, while we have 72GB of VRAM and Claude as the judge. It's scrappy, but conceptually sound. Worth experimenting with.