Local LLM Server Setup - Tomorrow's Plan
Session: s_801t Date: December 9, 2025 Track Project: 0187a93c
Prerequisites (Christopher)
- Flash Ubuntu Server 24.04 to USB drive
- Download: https://ubuntu.com/download/server
- Use Rufus or balenaEtcher
- Move GPUs from desktop to new machine
- GTX 1070 (8GB) - Primary
- GTX 1060 (6GB) - Secondary
- Connect desktop monitors to motherboard (Intel UHD 630)
- Boot new machine from USB, install Ubuntu
During Ubuntu Install
- Hostname: llm-server or local-ai
- Username: nexus (or your preference)
- Enable OpenSSH Server (quick reachability check after this list)
- Use entire disk (or configure NVMe RAID)
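Before handing the machine over, it's worth confirming SSH is actually reachable from the desktop. A minimal sketch, assuming the nexus username chosen above; <server-ip> is whatever address the first command reports:
# On the new machine's console, find its LAN address
ip -4 addr show
# From the desktop, confirm SSH login works
ssh nexus@<server-ip>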
Once SSH is Available (Claude takes over)
Step 1: System Updates
sudo apt update && sudo apt upgrade -y
Step 2: NVIDIA Drivers
sudo apt install nvidia-driver-550 -y
sudo reboot
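If there's any doubt that the 550 series is the right fit for both Pascal cards, Ubuntu's driver tool can confirm what it recommends (optional, run before the install above):
ubuntu-drivers devices
# Lists detected GPUs and the recommended driver package;
# install that one instead if it differs from nvidia-driver-550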
Step 3: Verify GPUs
nvidia-smi
# Should show both GTX 1070 and GTX 1060
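For a tighter check of names and VRAM, nvidia-smi's query flags work well:
nvidia-smi --query-gpu=index,name,memory.total --format=csv
# Expect two rows: the GTX 1070 (~8 GB) and the GTX 1060 (~6 GB)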
Step 4: CUDA Toolkit
sudo apt install nvidia-cuda-toolkit -y
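Quick sanity check that the toolkit landed:
nvcc --version
# Prints the CUDA compiler release installed by apt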
Step 5: Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
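Grab the Tailscale address now; it's the <tailscale-ip> referenced in later steps:
tailscale ip -4
# Prints this machine's 100.x.x.x tailnet address
tailscale status
# Confirms the node is connected and lists peers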
Step 6: Ollama
curl -fsSL https://ollama.com/install.sh | sh
Step 7: Pull Test Model
ollama pull qwen2.5:1.5b
Step 8: Test Locally
ollama run qwen2.5:1.5b
# Chat, check responsiveness
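To put a number on responsiveness (this feeds the 40+ tokens/sec target in the success metrics), ollama can print timing stats per reply:
ollama run qwen2.5:1.5b --verbose
# After each response, check the "eval rate" line for generation tokens/sec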
Step 9: Configure Network Access
sudo systemctl edit ollama
# Add under [Service]: Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
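For reference, the override that systemctl edit creates only takes effect if the Environment line sits under a [Service] header. A sketch of the drop-in, plus a check that Ollama is now bound to all interfaces:
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# After the restart, the listener should show 0.0.0.0 (or *) rather than 127.0.0.1
sudo ss -tlnp | grep 11434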
Step 10: Test from Cortex
curl http://<tailscale-ip>:11434/api/generate \
-d '{"model":"qwen2.5:1.8b","prompt":"Hello"}'
Optional: Open WebUI
sudo apt install docker.io -y
sudo docker run -d -p 3000:8080 \
--gpus all \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
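One assumption baked into the command above: --gpus all only works once the NVIDIA Container Toolkit is installed and Docker is configured to use it. A sketch following NVIDIA's published apt instructions (verify the repo URLs against their current docs before running); the container may also need Open WebUI's OLLAMA_BASE_URL environment variable pointed at http://<tailscale-ip>:11434 so it can find Ollama:
# NVIDIA Container Toolkit (required for --gpus all)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker, then restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker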
LM Studio Setup (Windows Desktop)
- Download LM Studio from lmstudio.ai
- Settings → Remote Server
- Enter Tailscale IP: http://<tailscale-ip>:11434
- Browse and test models (endpoint sanity check below)
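Whichever Windows client ends up in use, a quick check from the desktop that the remote endpoint answers is cheap (curl ships with Windows 10/11; Ollama also exposes an OpenAI-compatible /v1 API that many clients can point at):
curl http://<tailscale-ip>:11434/api/tags
# OpenAI-compatible model listing, for clients that expect that API shape
curl http://<tailscale-ip>:11434/v1/models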
Success Metrics
- [ ] Both GPUs detected in nvidia-smi
- [ ] Ollama responding on port 11434
- [ ] Qwen2.5 1.5B running at 40+ tokens/sec
- [ ] LM Studio connected from Windows
- [ ] Accessible via Tailscale from anywhere
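A small script can walk most of the checklist above at the end of the session; a sketch, assuming it runs on the server itself (the LM Studio and remote-access items still need a manual check from the desktop):
#!/usr/bin/env bash
# check-llm-server.sh - quick pass over the success metrics (run on the server)
set -u

echo "== GPUs =="
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader || echo "nvidia-smi failed"

echo "== Ollama port 11434 =="
if curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is responding"
else
  echo "Ollama is NOT responding"
fi

echo "== Tailscale =="
tailscale ip -4 || echo "Tailscale not up"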