
Enterprise Local LLM: RTX 6000 Capabilities & Custom Training

Dual RTX Pro 6000 Specs

  • VRAM: 96GB each (192GB total)
  • Memory Bandwidth: ~1.8 TB/s each
  • CUDA Cores: 24,064 each
  • Target: Enterprise AI workloads

Model Capabilities with 192GB VRAM

Model                        VRAM     Fits?                              Tokens/sec
Llama 70B Q4                 ~40GB    Single GPU                         30-50
Llama 70B Q8                 ~75GB    Single GPU                         25-40
Llama 70B FP16               ~140GB   Dual GPU                           15-25
Llama 405B Q4                ~200GB   No (needs partial CPU offload)     5-10
Multiple 70B simultaneously  -        Yes                                -
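The VRAM column above follows from a simple back-of-the-envelope rule: parameter count times bytes per weight. A minimal sketch (raw weight storage only; the table's figures additionally include some runtime overhead for buffers and KV cache, so they run a few GB higher):

```python
def weight_bytes_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw bytes needed just to hold the weights, in GB.

    params_billion: parameter count in billions (e.g. 70 for Llama 70B)
    bits_per_weight: 4 for Q4, 8 for Q8, 16 for FP16
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, p, bits in [("70B Q4", 70, 4), ("70B Q8", 70, 8),
                      ("70B FP16", 70, 16), ("405B Q4", 405, 4)]:
    print(f"{name}: ~{weight_bytes_gb(p, bits):.1f}GB")
```

Note that 405B at Q4 is ~202GB of weights alone, which is why it exceeds 192GB of VRAM even before any KV cache is allocated.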

Training Capabilities

Task                  96GB    192GB
Fine-tune 7B full     Easy    Easy
Fine-tune 13B full    Yes     Easy
Fine-tune 70B LoRA    Yes     Easy
Fine-tune 70B full    No      Only with aggressive CPU/NVMe offload
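The gap between the LoRA and full rows comes from optimizer state: mixed-precision Adam costs roughly 16 bytes per *trainable* parameter (fp32 master copy, two moment buffers, gradient), and LoRA only trains a tiny adapter. A rough sketch; the 16 bytes/param rule of thumb and the ~200M adapter size are assumptions, and activations are ignored:

```python
def adam_training_gb(trainable_params: float) -> float:
    """Approximate gradient + optimizer-state footprint for mixed-precision
    Adam: ~16 bytes per trainable parameter (assumed rule of thumb)."""
    return trainable_params * 16 / 1e9

full_70b = adam_training_gb(70e9)   # all 70B params trainable
lora_70b = adam_training_gb(200e6)  # assumed ~200M adapter params

print(f"full fine-tune: ~{full_70b:.0f}GB of optimizer state")  # far beyond 192GB
print(f"LoRA fine-tune: ~{lora_70b:.1f}GB of optimizer state")  # trivial
```

This is why full 70B fine-tuning is out of reach for 192GB without offloading, while 70B LoRA is easy: the frozen base weights are the dominant cost, and the adapter's training state is a rounding error.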

Custom Training: Making a Coder BETTER

What Training Adds (LoRA Adapter)

Base model stays unchanged. You add a small adapter (~100-500MB) that teaches:

  • YOUR codebase patterns
  • YOUR naming conventions
  • YOUR preferred libraries
  • YOUR error handling style
  • Company coding standards
  • Project-specific knowledge

Training Data for Corlera/Nexus

  1. All 18 MCP server source files
  2. Example tool implementations
  3. Redis patterns (Track Pattern, Quadfecta)
  4. Documentation and docstrings
  5. Bug fix examples (before/after)
  6. Good conversation examples
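Item 5 (before/after bug fixes) maps naturally onto instruction-tuning pairs: the buggy snippet becomes the prompt, the fixed snippet the target. A minimal sketch of converting one pair to a JSONL record; the "prompt"/"response" field names are a common convention, not a fixed standard, so match whatever your training framework expects:

```python
import json

def to_jsonl_record(before: str, after: str, note: str) -> str:
    """Turn a before/after bug fix into one training example.
    Field names are illustrative, not a required schema."""
    return json.dumps({
        "prompt": f"Fix this bug ({note}):\n{before}",
        "response": after,
    })

# Hypothetical example pair, one line of a JSONL training file:
record = to_jsonl_record(
    before="redis.get(key)",
    after="await redis.get(key)",
    note="missing await on async Redis call",
)
print(record)
```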

Result: Expert at YOUR Code

Before training:

  • Generic MCP structure
  • Guesses at patterns
  • May conflict with your style

After training:

  • Uses YOUR exact patterns
  • Knows the Nexus architecture
  • Production-ready output
  • Closely matches your style

Is It BETTER Than Base Model?

  • For generic coding: about the same
  • For YOUR codebase: significantly better

The trained model becomes an expert at Nexus-specific work.

KV Cache Clarification

What's Stored

  • Input tokens (your question)
  • Output tokens (model response)
  • ALL conversation history
  • Cumulative across turns
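Because the cache is cumulative across turns, it can be sized directly: per token it stores a key and a value vector for every layer. A sketch of the arithmetic; the defaults approximate a Llama-70B-style model with grouped-query attention (80 layers, 8 KV heads, head dim 128, fp16), which is an assumed configuration:

```python
def kv_cache_gb(tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """KV cache size: 2 (K and V) x layers x kv-heads x head-dim x dtype
    bytes, per token, times the number of cached tokens."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
    return tokens * per_token / 1e9

print(f"{kv_cache_gb(32_000):.1f}GB")  # a 32k-token conversation: ~10.5GB
```

So a long-running conversation can quietly consume GPU memory on the same scale as a quantization-level change in the model itself.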

Multi-GPU KV Cache

  • The cache does NOT "overflow" onto the second GPU when the first fills up
  • The model is split by layers across GPUs
  • Each GPU holds its layers plus the KV cache for exactly those layers
  • Plan model size and context length accordingly
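The layer-based split above can be sketched as a contiguous partition of layer indices; each GPU then owns the KV cache for its block of layers. An even split, shown as a minimal sketch (real runtimes may rebalance by memory rather than layer count):

```python
def split_layers(n_layers: int, n_gpus: int) -> list[range]:
    """Partition layers 0..n_layers-1 into contiguous blocks, one per GPU.
    Earlier GPUs absorb the remainder when the split is uneven."""
    per, extra = divmod(n_layers, n_gpus)
    out, start = [], 0
    for g in range(n_gpus):
        count = per + (1 if g < extra else 0)
        out.append(range(start, start + count))
        start += count
    return out

print(split_layers(80, 2))  # [range(0, 40), range(40, 80)]
```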

Production Setup Vision

Threadripper 64-core
128GB System RAM
Dual RTX Pro 6000 (192GB VRAM)
Qwen Coder 32B + Nexus LoRA

= Coding assistant that knows your entire 
  codebase and matches your style exactly
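One way to serve this stack is vLLM with tensor parallelism across both GPUs and the LoRA adapter attached at launch. A sketch only: the model ID and adapter path are placeholders, and flag behavior should be checked against your installed vLLM version:

```shell
# Serve Qwen Coder across both GPUs with the Nexus LoRA adapter.
# Model ID and adapter path are placeholders, not tested values.
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --tensor-parallel-size 2 \
  --enable-lora \
  --lora-modules nexus=/path/to/nexus-lora
```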

Last updated: 2025-12-08 Session: s_801t

ID: 7daf4386
Path: Enterprise Local LLM - RTX 6000 & Custom Training
Updated: 2026-01-13T12:51:35