Background Training Operations

The Problem

Training runs can take hours, and nobody can sit and watch them. We need to:

- Start training and work on other things
- Check progress periodically
- Run training overnight
- Review results the next morning

Solution: Background Training

Start training in the background:

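cd ~/corlera-training
# The redirect below fails if logs/ does not exist yet
mkdir -p logs
# nohup keeps the job alive after logout; stdout and stderr both go to a timestamped log
nohup python3 scripts/train_3d.py \
    --model /data/models/huggingface/MODEL_NAME \
    --dataset datasets/DATASET.json \
    --output outputs/OUTPUT_NAME \
    --epochs 10 \
    --lr 3e-4 \
    > logs/training_$(date +%Y%m%d_%H%M%S).log 2>&1 &
# Print the PID of the background job for later checks
echo "Training PID: $!"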
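The launch command can be wrapped so that the PID and log path are saved for later checks instead of being rediscovered with globs. The following is a minimal sketch, not part of the existing setup: the function name `start_bg`, the `run/` directory, and the files `run/train.pid` and `run/train.log_path` are all hypothetical.

```shell
# Hypothetical helper: run any command in the background under nohup,
# saving its PID and log path under run/ (an illustrative location).
start_bg() {
    mkdir -p logs run
    local log
    log="logs/training_$(date +%Y%m%d_%H%M%S).log"
    # stdout and stderr both go to the timestamped log
    nohup "$@" > "$log" 2>&1 &
    # $! is the PID of the job that was just backgrounded
    echo "$!" > run/train.pid
    echo "$log" > run/train.log_path
    echo "Started PID $(cat run/train.pid), logging to $log"
}
```

With this in place, the training command would be started as `start_bg python3 scripts/train_3d.py ...` with the same arguments as above, and `tail -f "$(cat run/train.log_path)"` would follow exactly that run's log.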

Check progress anytime:

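# Watch live (newest log only; a bare glob would follow every old log too)
tail -f "$(ls -t logs/training_*.log | head -n 1)"

# Check latest loss
grep "loss" "$(ls -t logs/training_*.log | head -n 1)" | tail -20

# Check if still running (unlike ps | grep, pgrep does not match itself)
pgrep -af train_3d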
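The three checks above can be combined into a single status command. This is a sketch under assumptions: the function names `latest_log` and `training_status` are hypothetical, and it assumes log file names contain no spaces, which holds for the timestamped names used here.

```shell
# Print the most recently modified training log (prints nothing if none exist).
latest_log() {
    ls -t logs/training_*.log 2>/dev/null | head -n 1
}

# One-shot status: is the job alive, and what is the last reported loss?
training_status() {
    local log
    log="$(latest_log)"
    # pgrep -f matches against the full command line
    if pgrep -f "train_3d" > /dev/null; then
        echo "Training is running"
    else
        echo "Training is not running"
    fi
    if [ -n "$log" ]; then
        echo "Latest log: $log"
        grep "loss" "$log" | tail -n 1 || echo "(no loss lines yet)"
    fi
}
```

Run `training_status` from `~/corlera-training` so that the relative `logs/` path resolves.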

Review when done:

# Final results (newest log only)
grep -E "loss|train_runtime|Completed" "$(ls -t logs/training_*.log | head -n 1)"

# Check output
ls -la outputs/OUTPUT_NAME/
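For a quicker read of a finished run, the same grep patterns can be condensed into a short summary. This is a sketch only: `summarize_log` is a hypothetical name, and it assumes nothing about the log format beyond the `loss`, `train_runtime`, and `Completed` strings already searched for above.

```shell
# Hypothetical helper: condense one training log into a few lines.
summarize_log() {
    local log="$1"
    # grep -c counts matching lines; first and last show how the loss moved
    echo "Loss lines: $(grep -c "loss" "$log")"
    echo "First: $(grep "loss" "$log" | head -n 1)"
    echo "Last: $(grep "loss" "$log" | tail -n 1)"
    # Without a completion marker, the run may have crashed or still be going
    grep -E "train_runtime|Completed" "$log" \
        || echo "No completion marker found"
}
```

Comparing the first and last loss lines shows at a glance whether the run made progress, and the missing-marker message flags a run that stopped early.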

Training Time Estimates

Training Examples	Epochs	7B Model	14B Model	30B Model
20	10	~5 min	~15 min	~45 min
50	10	~12 min	~35 min	~2 hr
100	10	~25 min	~1 hr	~4 hr
500	10	~2 hr	~5 hr	Overnight

Estimates are rough and based on the current hardware (2x RTX 3090).
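For dataset sizes that fall between the rows, a rough figure can be interpolated. The sketch below assumes that run time scales linearly with examples times epochs, which the table only loosely supports, and it only lists 10-epoch runs. The rate of 1.5 seconds per example per epoch is derived from the 7B row (100 examples, 10 epochs, ~25 min); the function name `estimate_minutes` is hypothetical.

```shell
# Hypothetical helper: linear estimate of training time in minutes.
# Arguments: examples, epochs, seconds per example per epoch.
estimate_minutes() {
    awk -v n="$1" -v e="$2" -v s="$3" 'BEGIN { printf "%.0f\n", n * e * s / 60 }'
}

# 7B model, 100 examples, 10 epochs at the assumed 1.5 s rate
estimate_minutes 100 10 1.5   # prints 25
```

The same rate gives 125 minutes for 500 examples, close to the ~2 hr in the table. The 14B and 30B columns imply different rates, which would need to be derived from their own rows.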

Overnight Training Workflow

1. Prepare dataset during day
2. Start training before leaving
3. Check logs next morning
4. Review results, plan next iteration

This allows continuous progress without waiting.
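For overnight runs it helps to know when the job actually ended. The shell's built-in `wait` only works on children of the current shell, so the sketch below polls with `kill -0` instead, which works from any terminal for processes you own. The function name `wait_for_pid` and the 60-second default interval are hypothetical choices.

```shell
# Hypothetical helper: block until a PID exits, then print the finish time.
# Arguments: PID, optional polling interval in seconds (default 60).
wait_for_pid() {
    local pid="$1" interval="${2:-60}"
    # kill -0 sends no signal; it only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
    echo "Process $pid finished at $(date '+%Y-%m-%d %H:%M:%S')"
}
```

For example, `wait_for_pid 12345 >> logs/finished.txt` (where 12345 stands in for the training PID and `logs/finished.txt` is an illustrative file name) leaves a timestamp to read the next morning.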