LARS - Local AI Resource Server
Overview
LARS (Local AI Resource Server) is a dedicated local AI inference server built on repurposed hardware with dual NVIDIA RTX 3090 GPUs. It serves as a local AI assistant that can work alongside Nexus AI (Claude) for parallel processing, code assistance, and task delegation.
Hardware Specifications
System
- CPU: Intel Core i7-8700K @ 3.70GHz (6 cores, 12 threads)
- RAM: 32GB DDR4
- Hostname: local-ai
- OS: Ubuntu 22.04 LTS Server
GPUs
| Slot | GPU | VRAM | Notes |
|---|---|---|---|
| GPU 0 (PCIe 01:00.0) | EVGA RTX 3090 FTW3 Ultra | 24GB | RGB working |
| GPU 1 (PCIe 02:00.0) | ZOTAC Gaming RTX 3090 | 24GB | Compute verified |
Total VRAM: 48GB (47GB usable for inference)
Storage
| Device | Size | Mount | Purpose |
|---|---|---|---|
| nvme0n1 | 232GB | / (LVM 100GB) | Ubuntu OS |
| nvme1n1 | 232GB | /data/models | AI model storage |
Network Configuration
- LAN IP: 10.0.0.250
- Tailscale IP: 100.89.34.86
- Ollama API: http://100.89.34.86:11434
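Any node on the tailnet can call the API directly. Below is a minimal sketch of a one-shot request using Ollama's documented `/api/generate` endpoint and the qwen2.5-coder:32b model listed under Software Stack; only the prompt is made up.

```typescript
// Minimal sketch: one-shot (non-streaming) completion against LARS's
// Ollama API over Tailscale, using Ollama's documented /api/generate.
const res = await fetch("http://100.89.34.86:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:32b",
    prompt: "Write a TypeScript function that reverses a string.",
    stream: false, // return one JSON object instead of an NDJSON stream
  }),
});
const { response } = await res.json();
console.log(response);
```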
Software Stack
- NVIDIA Driver: 590.48.01
- CUDA Version: 13.1
- Ollama: Latest (models at /data/models/ollama)
- Model Loaded: qwen2.5-coder:32b (19GB)
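As a quick remote sanity check that the model is actually registered, Ollama's documented `/api/tags` endpoint lists installed models:

```typescript
// Sketch: verify qwen2.5-coder:32b shows up in Ollama's model list.
const tags = await (await fetch("http://100.89.34.86:11434/api/tags")).json();
const names = tags.models.map((m: { name: string }) => m.name);
console.log(names.includes("qwen2.5-coder:32b") ? "model ready" : "model missing");
```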
Ollama Configuration
Ollama runs as a systemd service and starts automatically on boot. The override below binds the API to all interfaces and moves model storage to the second NVMe drive; after editing it, apply the change with `systemctl daemon-reload` followed by `systemctl restart ollama`.
```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/data/models/ollama"
```
VS Code Extension
- Name: lars-assistant
- Version: 0.3.0
- Location: /home/nexus/.config/systemd/user/.cache/lars-extension/
- Features: Chat interface, Ollama streaming, voice output (pending)
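The extension's source isn't reproduced here, but the Ollama-streaming feature can be sketched roughly as below: Ollama's `/api/generate` streams newline-delimited JSON objects, each carrying a `response` fragment. Function and variable names are illustrative, not the extension's actual code.

```typescript
// Illustrative sketch only (not the extension's actual source): stream
// tokens from Ollama's /api/generate, which emits newline-delimited
// JSON objects, each with a `response` fragment.
async function* streamCompletion(prompt: string): AsyncGenerator<string> {
  const res = await fetch("http://100.89.34.86:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "qwen2.5-coder:32b", prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    // Split the buffer on newlines; each complete line is one JSON chunk.
    let nl: number;
    while ((nl = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (line) yield JSON.parse(line).response as string;
    }
  }
}

// Usage: surface fragments incrementally, as a chat view would.
for await (const token of streamCompletion("Explain LVM in one sentence.")) {
  process.stdout.write(token);
}
```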
Voice Integration
- Voice ID: UgBBYS2sOqTuMpoF3BR0 (male voice)
- Voice Server: v4.2.0 with multi-voice support
- Usage: pass the `voice:'lars'` parameter (sketched below)
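The voice server's HTTP interface isn't documented in this section, so the following is a hypothetical sketch: the host, port, and path are invented, and only the `voice:'lars'` parameter comes from the notes above.

```typescript
// Hypothetical sketch: voice-server.example, port 5000, and /speak are
// invented placeholders; only voice:'lars' comes from this document.
await fetch("http://voice-server.example:5000/speak", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: "LARS is online.", voice: "lars" }),
});
```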
Credentials
- SSH: lars / LARS25 (Locker: l_f38d)
Track Projects
- c91dd504: Tuesday Demo - Tool Use & Voice
- f65ed2f4: VS Code Extension
- 048d1528: LoRA Training Pipeline
- dbc1600a: Main LARS Project