
Nexus Training Loop

The Nexus Loop Concept

Chris's proposed iterative training approach - train, test, correct, repeat until near-perfect accuracy.

How Standard Fine-Tuning Works

Dataset → Train (one pass) → Done → Hope it worked
  • Load batches of examples (4-8 at a time)
  • Forward pass: model generates prediction
  • Calculate loss: how wrong was it?
  • Backward pass: adjust weights (tiny nudges to billions of parameters)
  • Repeat for all batches, multiple epochs
  • No testing during training - just math
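The one-pass loop above can be sketched end to end. This is a toy stand-in (linear regression on synthetic data with NumPy) rather than real fine-tuning, but the mechanics are the same: batched forward pass, loss, backward pass nudging the weights, repeated over epochs with no evaluation along the way.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))            # 64 "examples", 3 features
true_w = np.array([1.5, -2.0, 0.5])     # the target the model should learn
y = X @ true_w + rng.normal(scale=0.01, size=64)

w = np.zeros(3)                          # the "weights" we nudge
lr, batch_size, epochs = 0.1, 8, 20      # batches of 8, multiple epochs

for epoch in range(epochs):
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        pred = xb @ w                    # forward pass: generate prediction
        err = pred - yb                  # loss signal: how wrong was it?
        grad = xb.T @ err / len(xb)      # backward pass: gradient of squared error
        w -= lr * grad                   # tiny nudge to the parameters

final_loss = float(np.mean((X @ w - y) ** 2))
print(final_loss)  # small after training - and no testing happened along the way
```

The point of the sketch: nothing in the loop ever asks "did it learn the right thing?" - that only happens (or doesn't) after training ends.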

The Nexus Loop (Proposed)

Dataset → Train → Test → Score → Correct → Retrain → Loop until 98%+

Step-by-Step

  1. Train: Standard fine-tuning on 3D dataset
  2. Test: Run inference on 100+ evaluation questions
     • Identity questions (Who is LARS?)
     • Nexus knowledge (What is workflow.init?)
     • Reasoning tasks (multi-step problems)
  3. Score: Claude judges each answer as one of:
     • Correct
     • Incorrect
     • Hallucination (made-up facts)
  4. Correct: Generate training examples that fix failures
     • If LARS said 'Chris has an MBA' → create explicit correction example
     • If LARS forgot a Nexus concept → reinforce with examples
  5. Retrain: Original data + correction examples
  6. Loop: Repeat until accuracy threshold met (98-100%)
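The steps above can be sketched as an orchestrator loop. Everything here is a hypothetical stand-in - train() returns a dummy "model", judge() fakes the Claude scoring, make_corrections() turns failures into new examples - so the control flow stays testable offline; the real versions would call the fine-tuning stack and the AI judge.

```python
def train(dataset):
    """Stand-in for a fine-tuning run; returns a 'model' (here: the facts it saw)."""
    return set(dataset)

def judge(model, question, expected):
    """Stand-in for the AI judge: correct iff the model 'knows' the answer."""
    return expected in model

def make_corrections(failures):
    """Turn each failed question into an explicit correction example."""
    return [expected for _question, expected in failures]

eval_suite = [(f"q{i}", f"fact{i}") for i in range(100)]  # 100+ questions w/ ground truth
dataset = [f"fact{i}" for i in range(0, 100, 2)]          # initially knows half the facts

threshold, max_loops = 0.98, 10
for loop in range(max_loops):
    model = train(dataset)                     # 1. Train / 5. Retrain
    failures = [(q, a) for q, a in eval_suite
                if not judge(model, q, a)]     # 2. Test + 3. Score
    accuracy = 1 - len(failures) / len(eval_suite)
    if accuracy >= threshold:                  # 6. Stop when threshold met
        break
    dataset += make_corrections(failures)      # 4. Correct: fold failures back in

print(f"loop={loop} accuracy={accuracy:.2f}")
```

In this toy run the corrections close the gap in one extra cycle; the real open question (below) is whether corrective retraining behaves that cleanly on an actual model.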

What This Is Called in Research

  • RLAIF (Reinforcement Learning from AI Feedback) - the variant of RLHF where an AI judge replaces human labelers, which is exactly our setup
  • Iterative Refinement Training
  • Online Learning with Evaluation
  • Constitutional AI (Anthropic's approach uses similar loops)

Key Differences from Hobbyist Training

Hobbyist Approach            | Nexus Loop
-----------------------------|-------------------------------
Train once, hope it works    | Train, verify, correct, repeat
No evaluation during process | Continuous evaluation
Accept whatever accuracy     | Target specific threshold
Manual testing after         | Automated testing in loop

Implementation Requirements

  1. Test Suite: 100+ evaluation questions with ground truth
  2. AI Judge: Claude scoring LARS responses
  3. Correction Generator: Create training examples from failures
  4. Loop Orchestrator: Manage train/test/correct cycle
  5. Threshold Logic: Stop when accuracy hits target
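One way requirements 1-2 and 5 might be wired together: build a judging prompt from the question, ground truth, and LARS's response, then parse the judge's reply into correct / incorrect / hallucination. The prompt wording and helper names are assumptions, not an existing API; the actual Claude call is left out so the scoring logic runs offline.

```python
from dataclasses import dataclass

VERDICTS = ("correct", "incorrect", "hallucination")

@dataclass
class EvalItem:
    """One entry in the test suite: a question plus its ground truth."""
    question: str
    ground_truth: str

def build_judge_prompt(item, response):
    """Assemble the (hypothetical) prompt sent to the Claude judge."""
    return (
        "You are grading a fine-tuned model.\n"
        f"Question: {item.question}\n"
        f"Ground truth: {item.ground_truth}\n"
        f"Model answer: {response}\n"
        "Reply with exactly one word: correct, incorrect, or hallucination."
    )

def parse_verdict(judge_reply):
    """Map the judge's free-text reply to a verdict; unparseable replies fail closed."""
    word = judge_reply.strip().lower()
    return word if word in VERDICTS else "incorrect"

def accuracy(verdicts):
    """Threshold logic input: fraction of answers judged correct."""
    return sum(v == "correct" for v in verdicts) / len(verdicts)

# Offline demo with stubbed judge replies:
item = EvalItem("Who is LARS?", "LARS is the Nexus assistant model.")
prompt = build_judge_prompt(item, "LARS is a spreadsheet.")
verdicts = [parse_verdict("Correct"), parse_verdict(" hallucination "),
            parse_verdict("garbage output")]
print(accuracy(verdicts))  # 1 of 3 judged correct
```

Failing closed on unparseable judge replies is a deliberate choice: a flaky judge should lower measured accuracy, not inflate it past the stop threshold.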

Open Questions

  • How long per training cycle? (affects total loop time)
  • Does corrective training destabilize earlier learning?
  • How many corrections per loop before diminishing returns?
  • Should we use LoRA for corrections or full fine-tune?

Honest Assessment

This is how frontier labs iterate on models - the difference is that they have massive compute and human labelers, while we have 72GB VRAM and Claude as judge. It's scrappy, but conceptually sound. Worth experimenting with.

ID: 96046629
Path: Accelerated AI Training > Proposed Architecture > Nexus Training Loop
Updated: 2026-01-01T19:34:47