Background Training Operations
The Problem
Training runs can take hours, and we can't sit and watch them. We need to be able to:
- Start training and work on other things
- Check progress periodically
- Run training overnight
- Review results the next morning
Solution: Background Training
Start training in the background:
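```bash
cd ~/corlera-training
mkdir -p logs   # the redirect below fails if logs/ does not exist yet
nohup python3 scripts/train_3d.py \
    --model /data/models/huggingface/MODEL_NAME \
    --dataset datasets/DATASET.json \
    --output outputs/OUTPUT_NAME \
    --epochs 10 \
    --lr 3e-4 \
    > logs/training_$(date +%Y%m%d_%H%M%S).log 2>&1 &
echo "PID: $!"  # note the PID so you can check on or stop the run later
```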
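If you start runs often, a small wrapper can also record the PID and log path, so later checks don't have to guess which run is current. This is a minimal bash sketch. The function name `start_bg`, the `LOG_DIR` override and the files `latest.pid` / `latest.log` are hypothetical conventions, not part of the existing scripts.

```bash
# Hypothetical helper: run any command in the background under nohup,
# send stdout and stderr to a timestamped log, and remember the PID and log path.
start_bg() {
    local logdir="${LOG_DIR:-logs}"      # assumption: logs live in ./logs by default
    mkdir -p "$logdir"
    local log="$logdir/training_$(date +%Y%m%d_%H%M%S).log"
    nohup "$@" > "$log" 2>&1 &
    echo $! > "$logdir/latest.pid"       # PID of the background job just started
    echo "$log" > "$logdir/latest.log"   # path of the log that job writes to
    echo "Started PID $! -> $log"
}

# Example, with the same placeholders as the nohup command above:
# start_bg python3 scripts/train_3d.py --model /data/models/huggingface/MODEL_NAME \
#     --dataset datasets/DATASET.json --output outputs/OUTPUT_NAME --epochs 10 --lr 3e-4
```

Define it in your shell (for example, by sourcing a file containing it) before use. Because the job is started from your current shell, `wait "$(cat logs/latest.pid)"` also works if you ever want to block until it finishes.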
Check progress anytime:
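```bash
# Watch the newest log live (Ctrl+C stops tail, not the training job)
tail -f "$(ls -t logs/training_*.log | head -n 1)"

# Check the latest loss lines in the newest log
grep "loss" "$(ls -t logs/training_*.log | head -n 1)" | tail -n 20

# Check if training is still running (unlike ps aux | grep, pgrep never matches itself)
pgrep -af train_3d
```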
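For a one-shot status check, the steps above can be rolled into a single function. This is a minimal bash sketch. The name `training_status` and the `LOG_DIR` override are illustrative. Matching on the process name assumes only one training job runs at a time.

```bash
# Print the newest training log, its last loss line, and whether training is running.
training_status() {
    local logdir="${LOG_DIR:-logs}"
    local pattern="${1:-train_3d}"   # process name to look for (assumption: one job at a time)
    local latest
    latest=$(ls -t "$logdir"/training_*.log 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no logs in $logdir"
        return 1
    fi
    echo "log:    $latest"
    echo "last:   $(grep -i "loss" "$latest" | tail -n 1)"
    if pgrep -f "$pattern" > /dev/null; then
        echo "status: running"
    else
        echo "status: not running"
    fi
}
```

Calling `training_status` with no arguments checks `logs/` for `train_3d`. Pass a different process name as the first argument if the script name changes.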
Review when done:
```bash
# Final results from the newest log
grep -E "loss|train_runtime|Completed" "$(ls -t logs/training_*.log | head -n 1)"

# Check output
ls -la outputs/OUTPUT_NAME/
```
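To compare runs quickly, it helps to pull out just the first and last loss and the runtime. This is a rough bash sketch that assumes the log contains Hugging Face Trainer-style dict lines such as `{'loss': 0.8123, ...}` and `{'train_runtime': 612.3, ...}`. The `train_runtime` key in the grep above suggests this, but check it against your own logs. The function name `summarize_run` is hypothetical.

```bash
# Summarise a finished run from its log.
# Assumption: Trainer-style lines like
#   {'loss': 0.8123, 'learning_rate': 0.0003, 'epoch': 1.0}
#   {'train_runtime': 612.3, ..., 'train_loss': 0.95, 'epoch': 10.0}
summarize_run() {
    local log="$1" losses
    # Per-step 'loss' values only; "'loss'" does not match inside "'train_loss'"
    losses=$(grep -oE "'loss': [0-9.eE+-]+" "$log" | awk '{print $2}')
    echo "first loss:    $(echo "$losses" | head -n 1)"
    echo "last loss:     $(echo "$losses" | tail -n 1)"
    echo "train_runtime: $(grep -oE "'train_runtime': [0-9.]+" "$log" | tail -n 1 | awk '{print $2}') s"
}
```

Usage: `summarize_run "$(ls -t logs/training_*.log | head -n 1)"`.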
Training Time Estimates
| Training examples | Epochs | 7B Model | 14B Model | 30B Model |
|---|---|---|---|---|
| 20 | 10 | ~5 min | ~15 min | ~45 min |
| 50 | 10 | ~12 min | ~35 min | ~2 hr |
| 100 | 10 | ~25 min | ~1 hr | ~4 hr |
| 500 | 10 | ~2 hr | ~5 hr | Overnight |
Approximate wall-clock times, based on current hardware (2x RTX 3090).
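The estimates grow roughly linearly with dataset size (7B goes from ~25 min at 100 examples to ~2 hr at 500), so you can ballpark other sizes by scaling. This rough bash sketch assumes linear scaling in examples × epochs. The per-example rates are derived from the 100-example row, so the output is no more accurate than the table itself. The function name `estimate_minutes` is hypothetical.

```bash
# Ballpark wall-clock minutes, scaled linearly from the 100-example / 10-epoch row.
# Rates are tenths of a second per example per epoch:
#   7B: 25 min / 1000 = 1.5 s, 14B: 60 min / 1000 = 3.6 s, 30B: 240 min / 1000 = 14.4 s
estimate_minutes() {
    local size="$1" examples="$2" epochs="$3" rate
    case "$size" in
        7b|7B)   rate=15 ;;
        14b|14B) rate=36 ;;
        30b|30B) rate=144 ;;
        *) echo "unknown model size: $size" >&2; return 1 ;;
    esac
    # tenths of a second -> minutes (divide by 600), rounded to the nearest minute
    echo $(( (examples * epochs * rate + 300) / 600 ))
}
```

For example, `estimate_minutes 14b 250 10` prints 150, about 2.5 hours. That is long enough to be worth starting before leaving rather than waiting on it.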
Overnight Training Workflow
- Prepare the dataset during the day
- Start training before leaving
- Check the logs the next morning
- Review the results and plan the next iteration

This way training keeps progressing overnight, and no one has to wait on a run.
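An overnight run that fails in its first minute wastes the whole night, so a quick pre-flight check before leaving can pay off. This is a minimal bash sketch. The function name `preflight` is hypothetical. It assumes `DATASET.json` is a single JSON document (not JSON Lines). It also assumes you want to avoid reusing a non-empty output directory so an earlier run's results are not mixed with the new one.

```bash
# Hypothetical pre-flight check to run before starting an overnight job.
# Prints one line per problem found and returns non-zero if anything failed.
preflight() {
    local model="$1" dataset="$2" output="$3" ok=0
    if [ ! -d "$model" ]; then
        echo "FAIL: model directory not found: $model"; ok=1
    fi
    # python3 -m json.tool exits non-zero if the file is missing or not valid JSON
    if ! python3 -m json.tool "$dataset" > /dev/null 2>&1; then
        echo "FAIL: dataset missing or not valid JSON: $dataset"; ok=1
    fi
    # Flag an output directory that already contains files
    if [ -d "$output" ] && [ -n "$(ls -A "$output")" ]; then
        echo "FAIL: output directory already has files: $output"; ok=1
    fi
    [ "$ok" -eq 0 ] && echo "OK: ready to start"
    return "$ok"
}
```

Usage: `preflight /data/models/huggingface/MODEL_NAME datasets/DATASET.json outputs/OUTPUT_NAME`. If it prints `OK`, start the run with the nohup command from the background-training section.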