The Nexus Loop Concept
Chris's proposed iterative training approach - train, test, correct, repeat until near-perfect accuracy.
How Standard Fine-Tuning Works
Dataset → Train (one run) → Done → Hope it worked
- Load batches of examples (4-8 at a time)
- Forward pass: the model generates a prediction
- Calculate loss: how wrong was it?
- Backward pass: adjust weights (tiny nudges to billions of parameters)
- Repeat for all batches, multiple epochs
- No testing during training - just math (a sketch of this loop follows below)
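For contrast, here is a minimal sketch of that single-pass loop in Hugging Face / PyTorch style. The model name, dataloader, and hyperparameters are placeholders, not our actual setup:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM

def train_one_run(model_name: str, train_loader: DataLoader, epochs: int = 3):
    """One standard fine-tuning run: no evaluation anywhere, just optimization."""
    model = AutoModelForCausalLM.from_pretrained(model_name)   # placeholder checkpoint
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(epochs):                   # multiple epochs over the data
        for batch in train_loader:            # batches of 4-8 tokenized examples (with labels)
            outputs = model(**batch)          # forward pass: predict the next tokens
            loss = outputs.loss               # how wrong was it?
            loss.backward()                   # backward pass: compute gradients
            optimizer.step()                  # tiny nudges to billions of parameters
            optimizer.zero_grad()
    return model                              # done - hope it worked
```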
The Nexus Loop (Proposed)
Dataset → Train → Test → Score → Correct → Retrain → Loop until 98%+
Step-by-Step
- Train: Standard fine-tuning on the 3D dataset
- Test: Run inference on 100+ evaluation questions
  - Identity questions (Who is LARS?)
  - Nexus knowledge (What is workflow.init?)
  - Reasoning tasks (multi-step problems)
- Score: Claude judges each answer as one of:
  - Correct
  - Incorrect
  - Hallucination (made-up facts)
- Correct: Generate training examples that fix the failures
  - If LARS said 'Chris has an MBA' → create an explicit correction example
  - If LARS forgot a Nexus concept → reinforce it with more examples
- Retrain: Original data + correction examples
- Loop: Repeat until the accuracy threshold is met (98-100%); a sketch of the full cycle follows below
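A minimal sketch of how that cycle could be orchestrated, assuming the four moving parts (training, inference, judging, correction generation) are passed in as callables. Every name, signature, and threshold here is a placeholder, not an existing implementation:

```python
from typing import Callable

def nexus_loop(
    fine_tune: Callable,          # training_data -> model                  (placeholder)
    run_eval: Callable,           # (model, questions) -> answers           (placeholder)
    judge: Callable,              # (questions, answers) -> verdict labels  (placeholder)
    make_corrections: Callable,   # failures -> new training examples       (placeholder)
    dataset: list,
    eval_questions: list,
    target_accuracy: float = 0.98,
    max_cycles: int = 10,
):
    """Train, test, score, correct, and repeat until the accuracy target is hit."""
    training_data = list(dataset)
    model = None
    for cycle in range(1, max_cycles + 1):
        model = fine_tune(training_data)                           # Train
        answers = run_eval(model, eval_questions)                  # Test: 100+ questions
        verdicts = judge(eval_questions, answers)                  # Score: Claude as judge
        accuracy = sum(v == "correct" for v in verdicts) / len(verdicts)
        print(f"Cycle {cycle}: accuracy {accuracy:.1%}")
        if accuracy >= target_accuracy:                            # Threshold logic: stop
            return model
        failures = [(q, a) for q, a, v in zip(eval_questions, answers, verdicts)
                    if v != "correct"]
        training_data += make_corrections(failures)                # Correct: new examples
    return model                                                   # best effort after max_cycles
```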
What This Is Called in Research
- RLHF (Reinforcement Learning from Human Feedback) - but we use AI feedback, which makes this closer to RLAIF
- Iterative Refinement Training
- Online Learning with Evaluation
- Constitutional AI (Anthropic's approach uses similar loops)
Key Differences from Hobbyist Training
| Hobbyist Approach | Nexus Loop |
|---|---|
| Train once, hope it works | Train, verify, correct, repeat |
| No evaluation during process | Continuous evaluation |
| Accept whatever accuracy | Target specific threshold |
| Manual testing after | Automated testing in loop |
Implementation Requirements
- Test Suite: 100+ evaluation questions with ground truth
- AI Judge: Claude scoring LARS responses (see the sketch after this list)
- Correction Generator: Create training examples from failures
- Loop Orchestrator: Manage train/test/correct cycle
- Threshold Logic: Stop when accuracy hits target
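A rough sketch of the AI Judge piece using the Anthropic Python SDK. The prompt, the one-word labels, and the model id are placeholders to refine, not a finished judging rubric:

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

def judge_answer(question: str, ground_truth: str, lars_answer: str) -> str:
    """Ask Claude to label one LARS answer as correct, incorrect, or hallucination."""
    prompt = (
        "You are grading a fine-tuned model's answer against ground truth.\n"
        f"Question: {question}\n"
        f"Ground truth: {ground_truth}\n"
        f"Model answer: {lars_answer}\n"
        "Reply with exactly one word: correct, incorrect, or hallucination."
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",    # placeholder model id
        max_tokens=10,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip().lower()
```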
Open Questions
- How long per training cycle? (affects total loop time)
- Does corrective training destabilize earlier learning?
- How many corrections per loop before diminishing returns?
- Should we use LoRA for corrections or a full fine-tune? (a sketch of the LoRA option follows below)
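If the LoRA route wins out for correction passes, a minimal sketch with the `peft` library might look like this. The rank, alpha, and target modules are guesses to experiment with, not tuned values:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def wrap_for_lora_corrections(model_name: str):
    """Attach small LoRA adapters so a correction pass only updates a sliver of weights."""
    model = AutoModelForCausalLM.from_pretrained(model_name)
    config = LoraConfig(
        r=16,                                  # adapter rank - a guess, not a tuned value
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections only
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()         # sanity check: tiny fraction of total weights
    return model
```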
Honest Assessment
This is how frontier labs iterate on models - the difference is that they have massive compute and human labelers, while we have 72GB of VRAM and Claude as the judge. It's scrappy, but conceptually sound. Worth experimenting with.