Local LLM Server Setup - Tomorrow's Plan
Session: s_801t Date: December 9, 2025 Track Project: 0187a93c
Prerequisites (Christopher)
- Flash Ubuntu Server 24.04 to USB drive
- Download: https://ubuntu.com/download/server
- Use Rufus or balenaEtcher
- Move GPUs from desktop to new machine
- GTX 1070 (8GB) - Primary
- GTX 1060 (6GB) - Secondary
- Connect desktop monitors to motherboard (Intel UHD 630)
- Boot new machine from USB, install Ubuntu
During Ubuntu Install
- Hostname: llm-server or local-ai
- Username: nexus (or your preference)
- Enable OpenSSH Server (quick reachability check after this list)
- Use entire disk (or configure NVMe RAID)
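Before handing the machine over, it's worth confirming SSH is actually reachable from the desktop. A minimal sketch, assuming the nexus username chosen above; <server-ip> is whatever address the first command reports:
# On the new machine's console, find its LAN address
ip -4 addr show
# From the desktop, confirm SSH login works
ssh nexus@<server-ip>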
Once SSH is Available (Claude takes over)
Step 1: System Updates
sudo apt update && sudo apt upgrade -y
Step 2: NVIDIA Drivers
sudo apt install nvidia-driver-550 -y
sudo reboot
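If there's any doubt that the 550 series is the right fit for both Pascal cards, Ubuntu's driver tool can confirm what it recommends (optional, run before the install above):
ubuntu-drivers devices
# Lists detected GPUs and the recommended driver package;
# install that one instead if it differs from nvidia-driver-550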
Step 3: Verify GPUs
nvidia-smi
# Should show both GTX 1070 and GTX 1060
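For a tighter check of names and VRAM, nvidia-smi's query flags work well:
nvidia-smi --query-gpu=index,name,memory.total --format=csv
# Expect two rows: the GTX 1070 (~8 GB) and the GTX 1060 (~6 GB)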
Step 4: CUDA Toolkit
sudo apt install nvidia-cuda-toolkit -y
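Quick sanity check that the toolkit landed:
nvcc --version
# Prints the CUDA compiler release installed by apt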
Step 5: Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
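Grab the Tailscale address now; it's the <tailscale-ip> referenced in later steps:
tailscale ip -4
# Prints this machine's 100.x.x.x tailnet address
tailscale status
# Confirms the node is connected and lists peers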
Step 6: Ollama
curl -fsSL https://ollama.com/install.sh | sh
Step 7: Pull Test Model
ollama pull qwen2.5:1.5b
Step 8: Test Locally
ollama run qwen2.5:1.5b
# Chat, check responsiveness
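To put a number on responsiveness (this feeds the 40+ tokens/sec target in the success metrics), ollama can print timing stats per reply:
ollama run qwen2.5:1.5b --verbose
# After each response, check the "eval rate" line for generation tokens/sec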
Step 9: Configure Network Access
sudo systemctl edit ollama
# Add under [Service]: Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
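For reference, the override that systemctl edit creates only takes effect if the Environment line sits under a [Service] header. A sketch of the drop-in, plus a check that Ollama is now bound to all interfaces:
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# After the restart, the listener should show 0.0.0.0 (or *) rather than 127.0.0.1
sudo ss -tlnp | grep 11434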
Step 10: Test from Cortex
curl http://<tailscale-ip>:11434/api/generate \
-d '{"model":"qwen2.5:1.8b","prompt":"Hello"}'
Optional: Open WebUI
sudo apt install docker.io -y
sudo docker run -d -p 3000:8080 \
--gpus all \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
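One assumption baked into the command above: --gpus all only works once the NVIDIA Container Toolkit is installed and Docker is configured to use it. A sketch following NVIDIA's published apt instructions (verify the repo URLs against their current docs before running); the container may also need Open WebUI's OLLAMA_BASE_URL environment variable pointed at http://<tailscale-ip>:11434 so it can find Ollama:
# NVIDIA Container Toolkit (required for --gpus all)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker, then restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker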
LM Studio Setup (Windows Desktop)
- Download LM Studio from lmstudio.ai
- Settings → Remote Server
- Enter Tailscale IP: http://<tailscale-ip>:11434
- Browse and test models (endpoint sanity check below)
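Whichever Windows client ends up in use, a quick check from the desktop that the remote endpoint answers is cheap (curl ships with Windows 10/11; Ollama also exposes an OpenAI-compatible /v1 API that many clients can point at):
curl http://<tailscale-ip>:11434/api/tags
# OpenAI-compatible model listing, for clients that expect that API shape
curl http://<tailscale-ip>:11434/v1/models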
Success Metrics
- [ ] Both GPUs detected in nvidia-smi
- [ ] Ollama responding on port 11434
- [ ] Qwen2.5 1.5B running at 40+ tokens/sec
- [ ] LM Studio connected from Windows
- [ ] Accessible via Tailscale from anywhere
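A small script can walk most of the checklist above at the end of the session; a sketch, assuming it runs on the server itself (the LM Studio and remote-access items still need a manual check from the desktop):
#!/usr/bin/env bash
# check-llm-server.sh - quick pass over the success metrics (run on the server)
set -u

echo "== GPUs =="
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader || echo "nvidia-smi failed"

echo "== Ollama port 11434 =="
if curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is responding"
else
  echo "Ollama is NOT responding"
fi

echo "== Tailscale =="
tailscale ip -4 || echo "Tailscale not up"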