Background Training Operations

The Problem

Training runs can take hours. We can't sit and watch. We need to:

- Start training and work on other things
- Check progress periodically
- Run overnight training
- Review results next morning

Solution: Background Training

Start training in the background:

cd ~/corlera-training
mkdir -p logs   # the redirect below fails if logs/ does not exist
nohup python3 scripts/train_3d.py \
    --model /data/models/huggingface/MODEL_NAME \
    --dataset datasets/DATASET.json \
    --output outputs/OUTPUT_NAME \
    --epochs 10 \
    --lr 3e-4 \
    > logs/training_$(date +%Y%m%d_%H%M%S).log 2>&1 &
echo "Started training, PID $!"
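The launch command can also be wrapped in a reusable function. This is a minimal sketch, assuming bash. The function name `start_background` and the files `training.pid` and `latest_log` are hypothetical and not part of the existing scripts. They only record the PID and log path so that later checks do not depend on wildcards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in the background under nohup,
# send its output to a timestamped log, and remember its PID and log path.
start_background() {
    local log_dir="$1"; shift
    mkdir -p "$log_dir"
    local log_file
    log_file="$log_dir/training_$(date +%Y%m%d_%H%M%S).log"

    # nohup keeps the job alive after the terminal session ends
    nohup "$@" > "$log_file" 2>&1 &

    echo "$!" > "$log_dir/training.pid"        # PID of the background job
    echo "$log_file" > "$log_dir/latest_log"   # path of the newest log
    echo "$log_file"                           # print the path for the caller
}
```

It would be called as `start_background logs python3 scripts/train_3d.py --dataset datasets/DATASET.json --output outputs/OUTPUT_NAME`, after which `cat logs/training.pid` gives the process ID to check.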

Check progress anytime:

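# Watch the most recent log live
tail -f "$(ls -t logs/training_*.log | head -n 1)"

# Check the latest loss values
grep "loss" "$(ls -t logs/training_*.log | head -n 1)" | tail -20

# Check if training is still running (the [t] keeps grep from matching itself)
ps aux | grep '[t]rain_3d'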
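Once several runs have accumulated, a wildcard over `logs/` matches every old log as well. The sketch below, assuming bash, wraps the progress checks in two helper functions that only look at the newest log. The names `latest_log` and `recent_loss` are hypothetical, and the code assumes log file names contain no newlines.

```shell
#!/usr/bin/env bash
# Print the most recently modified training log in a directory (default: logs).
latest_log() {
    ls -t "${1:-logs}"/training_*.log 2>/dev/null | head -n 1
}

# Print the last N lines (default: 20) mentioning "loss" in the newest log.
recent_loss() {
    local log
    log=$(latest_log "${1:-logs}")
    if [ -z "$log" ]; then
        echo "no training logs found" >&2
        return 1
    fi
    grep "loss" "$log" | tail -n "${2:-20}"
}
```

Running `recent_loss logs 5` would then show the five most recent loss lines from the current run only.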

Review when done:

# Final results (grep reads the logs directly; no cat needed)
grep -E "loss|train_runtime|Completed" logs/training_*.log

# Check output
ls -la outputs/OUTPUT_NAME/
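The review step can be condensed into a short report. This is a sketch, assuming bash and assuming that the training script prints a line containing `Completed` when it finishes, as the grep pattern above suggests. The function name `run_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Summarize a finished (or interrupted) run from its log file.
run_summary() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo "log not found: $log" >&2
        return 1
    fi

    # Assumption: the training script logs "Completed" on a clean finish.
    if grep -q "Completed" "$log"; then
        echo "status: completed"
    else
        echo "status: incomplete (no 'Completed' marker in log)"
    fi

    echo "last metrics:"
    grep -E "loss|train_runtime" "$log" | tail -n 5
}
```

A missing marker does not distinguish a crash from a run that is still in progress, so it is worth pairing this with the process check.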

Training Time Estimates

Examples   Epochs   7B Model   14B Model   30B Model
20         10       ~5 min     ~15 min     ~45 min
50         10       ~12 min    ~35 min     ~2 hr
100        10       ~25 min    ~1 hr       ~4 hr
500        10       ~2 hr      ~5 hr       Overnight

Estimates are approximate and based on the current hardware (2x RTX 3090).
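The 7B column scales roughly linearly with the number of examples: 20 examples at 10 epochs take about 5 minutes, or about 0.025 minutes per example per epoch. The sketch below, assuming bash with awk available, turns that into a rough estimator. The function name `estimate_minutes` is hypothetical, and linear scaling with epochs is an assumption, since the table only lists 10-epoch runs. The default rate applies to the 7B column only; larger models would need their own measured rate.

```shell
#!/usr/bin/env bash
# Rough estimate: minutes = examples * epochs * rate
# rate is minutes per example per epoch. The default 0.025 is read off the
# 7B column (20 examples * 10 epochs = ~5 min). Linear scaling is assumed.
estimate_minutes() {
    local examples="$1" epochs="${2:-10}" rate="${3:-0.025}"
    awk -v n="$examples" -v e="$epochs" -v r="$rate" \
        'BEGIN { printf "%.0f\n", n * e * r }'
}

estimate_minutes 100      # prints 25  (table: ~25 min)
estimate_minutes 500 10   # prints 125 (table: ~2 hr)
```

The result is only a planning aid for deciding whether a run fits in a work session or should be left overnight.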

Overnight Training Workflow

  1. Prepare dataset during day
  2. Start training before leaving
  3. Check logs next morning
  4. Review results, plan next iteration

This allows continuous progress without waiting.

ID: bf4aeac2
Updated: 2025-12-29T15:54:02