LARS - Local AI Resource Server
Overview
LARS (Local AI Resource Server) is a dedicated local AI inference server built on repurposed hardware with dual NVIDIA RTX 3090 GPUs. It serves as a local AI assistant that can work alongside Nexus AI (Claude) for parallel processing, code assistance, and task delegation.
Hardware Specifications
System
- CPU: Intel Core i7-8700K @ 3.70GHz (6 cores, 12 threads)
- RAM: 32GB DDR4
- Hostname: local-ai
- OS: Ubuntu 22.04 LTS Server
GPUs
| Slot | GPU | VRAM | Notes |
|---|---|---|---|
| GPU 0 (PCIe 01:00.0) | EVGA RTX 3090 FTW3 Ultra | 24GB | RGB working |
| GPU 1 (PCIe 02:00.0) | ZOTAC Gaming RTX 3090 | 24GB | Compute verified |
Total VRAM: 48GB (47GB usable for inference)
Storage
| Device | Size | Mount | Purpose |
|---|---|---|---|
| nvme0n1 | 232GB | / (LVM 100GB) | Ubuntu OS |
| nvme1n1 | 232GB | /data/models | AI model storage |
Network Configuration
- LAN IP: 10.0.0.250
- Tailscale IP: 100.89.34.86
- Ollama API: http://100.89.34.86:11434
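Any node on the tailnet can call the API directly. Below is a minimal sketch of a one-shot request using Ollama's documented `/api/generate` endpoint and the qwen2.5-coder:32b model listed under Software Stack; only the prompt is made up.

```typescript
// Minimal sketch: one-shot (non-streaming) completion against LARS's
// Ollama API over Tailscale, using Ollama's documented /api/generate.
const res = await fetch("http://100.89.34.86:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:32b",
    prompt: "Write a TypeScript function that reverses a string.",
    stream: false, // return one JSON object instead of an NDJSON stream
  }),
});
const { response } = await res.json();
console.log(response);
```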
Software Stack
- NVIDIA Driver: 590.48.01
- CUDA Version: 13.1
- Ollama: Latest (models at /data/models/ollama)
- Model Loaded: qwen2.5-coder:32b (19GB)
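As a quick remote sanity check that the model is actually registered, Ollama's documented `/api/tags` endpoint lists installed models:

```typescript
// Sketch: verify qwen2.5-coder:32b shows up in Ollama's model list.
const tags = await (await fetch("http://100.89.34.86:11434/api/tags")).json();
const names = tags.models.map((m: { name: string }) => m.name);
console.log(names.includes("qwen2.5-coder:32b") ? "model ready" : "model missing");
```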
Ollama Configuration
Ollama runs as a systemd service and starts automatically on boot. The override below binds the API to all interfaces and moves model storage to the second NVMe drive; after editing it, apply the change with `systemctl daemon-reload` followed by `systemctl restart ollama`.
```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/data/models/ollama"
```
VS Code Extension
- Name: lars-assistant
- Version: 0.3.0
- Location: /home/nexus/.config/systemd/user/.cache/lars-extension/
- Features: Chat interface, Ollama streaming, voice output (pending)
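The extension's source isn't reproduced here, but the Ollama-streaming feature can be sketched roughly as below: Ollama's `/api/generate` streams newline-delimited JSON objects, each carrying a `response` fragment. Function and variable names are illustrative, not the extension's actual code.

```typescript
// Illustrative sketch only (not the extension's actual source): stream
// tokens from Ollama's /api/generate, which emits newline-delimited
// JSON objects, each with a `response` fragment.
async function* streamCompletion(prompt: string): AsyncGenerator<string> {
  const res = await fetch("http://100.89.34.86:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "qwen2.5-coder:32b", prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    // Split the buffer on newlines; each complete line is one JSON chunk.
    let nl: number;
    while ((nl = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (line) yield JSON.parse(line).response as string;
    }
  }
}

// Usage: surface fragments incrementally, as a chat view would.
for await (const token of streamCompletion("Explain LVM in one sentence.")) {
  process.stdout.write(token);
}
```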
Voice Integration
- Voice ID: UgBBYS2sOqTuMpoF3BR0 (male voice)
- Voice Server: v4.2.0 with multi-voice support
- Usage: pass the `voice:'lars'` parameter (sketched below)
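The voice server's HTTP interface isn't documented in this section, so the following is a hypothetical sketch: the host, port, and path are invented, and only the `voice:'lars'` parameter comes from the notes above.

```typescript
// Hypothetical sketch: voice-server.example, port 5000, and /speak are
// invented placeholders; only voice:'lars' comes from this document.
await fetch("http://voice-server.example:5000/speak", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: "LARS is online.", voice: "lars" }),
});
```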
Credentials
- SSH: lars / LARS25 (Locker: l_f38d)
Track Projects
- c91dd504: Tuesday Demo - Tool Use & Voice
- f65ed2f4: VS Code Extension
- 048d1528: LoRA Training Pipeline
- dbc1600a: Main LARS Project