Local LLM Integration Guide

Overview

This guide covers integrating local LLMs into the Nexus 3.0 ecosystem using Ollama and the Local MCP Server.

Architecture

Hardware Requirements (Practice Rig)

  • GPU: 8GB+ VRAM minimum (14GB recommended for 7B models)
  • CPU: Any modern multi-core
  • RAM: 16GB+ system RAM
  • Storage: 50GB+ for models

Hardware Requirements (Production/Client)

  • GPU: NVIDIA RTX Pro 6000 (96GB VRAM)
  • CPU: AMD Threadripper (64 cores)
  • RAM: 128GB+
  • Storage: 1TB+ NVMe

Software Stack

Server (Headless Linux)

  • Ubuntu Server 24.04 LTS
  • NVIDIA Driver 550+
  • CUDA Toolkit
  • Ollama
  • Docker (for Open WebUI)

Client (Windows Desktop)

  • LM Studio - GUI for model testing
  • Connects to remote Ollama API

Network Setup

  • Tailscale for secure mesh networking
  • Ollama API on port 11434
  • Open WebUI on port 3000 (optional)
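With Tailscale in place, the client reaches Ollama's REST API directly over the mesh. A minimal sketch of a non-streaming call to the `/api/generate` endpoint (the hostname `gpu-server` is an assumption; substitute your Tailscale machine name):

```python
import json
import urllib.request

# Assumed Tailscale hostname of the headless Linux box; adjust to your mesh.
OLLAMA_URL = "http://gpu-server:11434"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """POST to /api/generate and return the completed text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `generate("qwen2.5:7b", "Summarize this contact note: ...")`. Because Tailscale handles authentication and encryption, no extra API key layer is needed for this internal path.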

Model Recommendations by VRAM

VRAM   Model             Tokens/sec (est.)
8GB    Qwen2.5-1.8B      60+
8GB    Qwen2.5-7B-Q4     20-30
14GB   Qwen2.5-7B-Q8     25-35
14GB   Qwen2.5-14B-Q4    15-20
24GB   Llama2-13B-Q8     20-30
48GB   Llama2-70B-Q4     10-15
96GB   Llama2-70B-Q8     15-25
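The table follows a simple rule of thumb: weight memory is roughly parameter count times bits per weight, plus headroom for the KV cache and runtime. A sketch of that heuristic (the 20% overhead margin is an illustrative assumption, not a measured figure):

```python
def estimate_vram_gb(params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes plus a ~20% margin for
    KV cache and runtime overhead (illustrative rule of thumb)."""
    weight_gb = params_b * bits_per_weight / 8  # params in billions -> GB
    return round(weight_gb * overhead, 1)

# A 7B model at Q4 (~4 bits/weight): estimate_vram_gb(7, 4) -> 4.2,
# which is why Qwen2.5-7B-Q4 fits in the 8GB practice rig.
```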

Local MCP Server

The local MCP server exposes these tools:

  • local.chat - Conversational completion
  • local.complete - Text completion
  • local.models - List available models
  • local.status - GPU memory, tokens/sec
  • local.embed - Generate embeddings
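One way the tool names above could map onto handlers is a simple dispatch table. This is a hypothetical sketch, not the server's actual implementation: the handlers here are stubs, and a real server would forward to the Ollama API:

```python
from typing import Callable

def _chat(args: dict) -> dict:
    # Stub: a real handler would call Ollama's chat endpoint.
    return {"role": "assistant", "content": f"(reply to {args['message']})"}

def _models(args: dict) -> dict:
    # Stub: a real handler would query Ollama for installed models.
    return {"models": ["qwen2.5:7b", "llama2:13b"]}

# Dispatch table mirroring the tool list above (local.complete,
# local.status, and local.embed would register the same way).
TOOLS: dict[str, Callable[[dict], dict]] = {
    "local.chat": _chat,
    "local.models": _models,
}

def dispatch(tool: str, args: dict) -> dict:
    """Route an MCP tool call to its handler, rejecting unknown names."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return TOOLS[tool](args)
```

Rejecting unknown tool names at the dispatch layer keeps malformed model output from silently no-op'ing.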

What Small Models (1.8B-7B) Can Do

  • Tool calling (add contact, create track, search Nexus)
  • Short voice responses
  • Structured data extraction
  • Company knowledge Q&A (with training)
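Small models handle tool calling most reliably when prompted to emit strict JSON, with the output validated before anything executes. A sketch of that validation step (the `{"tool": ..., "args": ...}` shape is an assumed convention, not a fixed Nexus schema):

```python
import json

REQUIRED = {"tool", "args"}

def parse_tool_call(raw: str) -> dict:
    """Validate a model's tool-call output: must be a JSON object with
    a string 'tool' and a dict 'args'. Raises ValueError otherwise."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not JSON: {e}") from e
    if not isinstance(call, dict) or not REQUIRED <= call.keys():
        raise ValueError("missing 'tool' or 'args'")
    if not isinstance(call["tool"], str) or not isinstance(call["args"], dict):
        raise ValueError("bad field types")
    return call
```

Failing loudly here lets the caller re-prompt the model instead of executing a half-formed call.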

What Requires Larger Models

  • Long document summarization
  • Complex multi-step reasoning
  • Offload these tasks to the Claude API
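That split (local for short tool calls, Claude for long or complex work) can be a one-line routing rule. A sketch with illustrative thresholds; the token estimate and cutoff are assumptions to tune:

```python
def pick_backend(prompt: str, needs_reasoning: bool, ctx_limit: int = 4000) -> str:
    """Route to the local model unless the task is long or needs
    multi-step reasoning (thresholds are illustrative, not benchmarked)."""
    approx_tokens = len(prompt) // 4  # crude chars-to-tokens estimate
    if needs_reasoning or approx_tokens > ctx_limit:
        return "claude"
    return "local"
```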

Training / Fine-Tuning

Use LoRA (Low-Rank Adaptation):

  • Base model stays frozen
  • Train a small adapter (~50-100MB)
  • Works completely offline
  • Tools: Unsloth, Axolotl, LLaMA-Factory
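The ~50-100MB adapter figure falls out of LoRA's low-rank factorization: each adapted weight matrix gains two small factors of rank r, so the adapter holds 2·r·d_model parameters per matrix instead of d_model². A back-of-envelope sketch (layer counts and the "4 adapted matrices per layer" figure are typical assumptions, not exact for any one model):

```python
def lora_adapter_mb(layers: int, d_model: int, rank: int,
                    matrices_per_layer: int = 4, bytes_per_param: int = 2) -> float:
    """Adapter size: each adapted d_model x d_model projection adds two
    low-rank factors of rank*d_model params each, stored in fp16."""
    params = layers * matrices_per_layer * 2 * rank * d_model
    return round(params * bytes_per_param / 1e6, 1)

# A 7B-class model (32 layers, d_model=4096) at rank 16 gives ~33.6MB;
# rank 32 gives ~67.1MB -- consistent with the 50-100MB range above.
```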

Training Data Ideas

  • Corlera company information
  • User preferences and style
  • Nexus tool usage patterns
  • Contact and project context
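These data sources typically become (instruction, response) pairs serialized as JSONL for the fine-tuning tools above. A sketch using the common Alpaca-style layout (the exact field names a given tool expects may differ; this format is an assumption):

```python
import json

def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (instruction, response) pairs as one JSON object per
    line, the instruction-tuning layout tools like Unsloth can ingest."""
    lines = [
        json.dumps({"instruction": q, "input": "", "output": a})
        for q, a in pairs
    ]
    return "\n".join(lines)
```

Usage: feed it pairs like `("Who is the contact for project Alpha?", "...")` drawn from Nexus records, then write the result to `train.jsonl`.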
ID: 2b3ea2a3
Path: Local LLM Integration Guide
Updated: 2026-01-13T12:51:28