Local AI Training Infrastructure Research
Date: 2025-12-27
Purpose: Research for setting up a self-hosted AI training pipeline for LARS and custom Nexus AI models
Executive Summary
This document covers the research for implementing a fully local, self-hosted AI training infrastructure. The goal is to:
1. Fine-tune local AI models (like LARS) with custom identity and tool knowledge
2. Potentially fork Llama for a custom "Nexus AI" base model
3. Run training continuously on local servers without cloud dependencies
Part 1: Unsloth - Primary Training Framework
What is Unsloth?
Unsloth is an open-source (Apache 2.0 license) framework for fine-tuning and reinforcement learning of LLMs. It is specifically optimized for efficient, low-memory training on NVIDIA GPUs.
Key Features
- 2x faster training with 70% less VRAM compared to standard methods
- No accuracy degradation (uses no approximation methods)
- Supports LoRA, QLoRA, and full fine-tuning
- Direct export to Ollama, GGUF, llama.cpp, vLLM
- Supports 100+ models including Qwen, Llama, DeepSeek, Gemma
Can It Run Locally? YES!
Installation:
```bash
# Linux/WSL
pip install unsloth

# Docker (recommended for servers)
docker pull unsloth/unsloth
```
GPU Requirements:
- Minimum: CUDA compute capability 7.0
- Supported: V100, T4, RTX 20/30/40 series, A100, H100, L40
- Also supports AMD and Intel GPUs
- Our dual RTX 3090s are perfect for this (quick check below)
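A quick way to confirm the hardware meets that minimum, assuming PyTorch with CUDA is already installed (an RTX 3090 reports compute capability 8.6):

```python
import torch

# List each visible GPU and its CUDA compute capability.
# Unsloth needs 7.0+; an RTX 3090 reports (8, 6), i.e. 8.6.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (compute capability {major}.{minor})")
```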
Export to Ollama Workflow
```python
# After training in Unsloth
model.save_pretrained_gguf("output_dir", tokenizer, quantization_method="q8_0")

# Or for smaller size:
model.save_pretrained_gguf("output_dir", tokenizer, quantization_method="q4_k_m")
```
The exported GGUF file can then be loaded directly into Ollama.
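One way to do that load, sketched with hypothetical paths and the model tag `lars`: write a minimal Modelfile pointing at the GGUF and register it with `ollama create`. The exact GGUF filename Unsloth produces depends on the quantization method, so adjust the path:

```python
import subprocess
from pathlib import Path

# Hypothetical paths/names; adjust to the actual export directory and desired model tag.
gguf_path = Path("output_dir/model.Q4_K_M.gguf")
modelfile = Path("output_dir/Modelfile")

# Minimal Modelfile: point Ollama at the exported GGUF. A TEMPLATE directive matching
# the training chat template should also be added (see the next section).
modelfile.write_text(f"FROM {gguf_path.resolve()}\n")

# Register (or replace) the model in the local Ollama instance.
subprocess.run(["ollama", "create", "lars", "-f", str(modelfile)], check=True)
```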
Important: Chat Template Consistency
When exporting to Ollama, you MUST use the SAME chat template that was used during training. Mismatched templates cause gibberish output or infinite generation loops.
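One way to keep them consistent is to pin the template once on the tokenizer at training time and mirror the exact same format in the Ollama Modelfile's TEMPLATE directive. A minimal sketch using Unsloth's chat-template helper; the model repo and the ChatML template name here are assumptions, not recommendations:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the base model once (repo name is an assumption), then pin a single chat template.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
# The Ollama Modelfile's TEMPLATE must render this same format, or the exported
# model may emit gibberish or loop forever.
tokenizer = get_chat_template(tokenizer, chat_template="chatml")  # template name is an assumption

# Render training examples and inference prompts through the same path:
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who are you?"}],
    tokenize=False,
    add_generation_prompt=True,
)
```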
Part 2: Alternatives to Unsloth
1. Axolotl
- Best for: Beginners, multi-GPU setups (Unsloth is single-GPU only)
- License: Open source
- Strengths: Community-driven, rapid new model support
- Use case: If we need to use BOTH 3090s simultaneously
- GitHub: https://github.com/OpenAccess-AI-Collective/axolotl
2. LLaMA Factory
- Best for: Zero-code training via Web UI
- License: Apache 2.0
- Strengths: Supports 100+ LLMs, Gradio Web UI, no coding required
- Use case: Quick experiments, non-technical training
- GitHub: https://github.com/hiyouga/LLaMA-Factory
3. SWIFT (by ModelScope)
- Best for: Multi-GPU training at scale
- License: Open source
- Strengths: 300+ LLM support, full workflow (pretrain → fine-tune → RLHF → deploy)
- Use case: Enterprise-scale training
4. Torchtune
- Best for: PyTorch developers wanting deep customization
- License: BSD (PyTorch ecosystem)
- Strengths: Pure PyTorch, AMD/NVIDIA support, memory efficient
Comparison Table
| Framework | Multi-GPU | Web UI | License | Best For |
|---|---|---|---|---|
| Unsloth | No | No | Apache 2.0 | Speed/efficiency |
| Axolotl | Yes | No | Open | Multi-GPU |
| LLaMA Factory | Yes | Yes | Apache 2.0 | No-code |
| SWIFT | Yes | Partial | Open | Enterprise |
| Torchtune | Yes | No | BSD | Customization |
Part 3: Forking Llama for Custom Nexus AI
Llama 3 License Terms
Llama uses the Meta Llama 3 Community License - NOT true open source, but it permits:
- Commercial use ✓
- Fine-tuning ✓
- Creating derivative works ✓
- Distribution ✓
Requirements:
1. Must include "Llama" at the beginning of any derivative model name (e.g., "Llama-3-Nexus")
2. Must display "Built with Llama" on related materials
3. Must include a copy of the license with any distribution
4. Must comply with the Acceptable Use Policy
How to Fork Llama
```bash
# Clone the repo
git clone https://github.com/meta-llama/llama3.git

# Request access to model weights at llama.com
# Run download script after approval
bash download.sh
```
Fully Permissive Alternatives
If we want true open source without attribution requirements:
- Grok (xAI) - Apache 2.0, fully permissive
- Mixtral 8x22B - Apache 2.0, fully permissive
- Qwen - Various models with permissive licenses
Part 4: Recommended Architecture for Nexus
Option A: Dedicated Training Server
Run Unsloth on a dedicated server (could be the Nexus server or a backup) that:
1. Continuously monitors training data/feedback
2. Periodically retrains/fine-tunes LARS
3. Exports the new GGUF to the local-ai server
4. Ollama hot-reloads the updated model
Option B: On-Demand Training on local-ai
Run Unsloth directly on the local-ai server (dual 3090s) when needed:
1. Collect training data via normal LARS usage
2. Schedule training during off-hours
3. Export directly to Ollama on the same machine
Recommended: Option A
A dedicated training server prevents GPU contention with inference.
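A rough shape of that pipeline, as a hedged sketch: the feedback directory, thresholds, and the `train_lars.py` / `deploy_lars.py` scripts are placeholders (the training step itself is sketched in Part 5, the GGUF-to-Ollama step in Part 1):

```python
import subprocess
import time
from pathlib import Path

FEEDBACK_DIR = Path("/data/lars-feedback")   # assumption: where usage feedback lands
MIN_NEW_EXAMPLES = 500                       # assumption: retrain threshold
CHECK_INTERVAL_S = 6 * 60 * 60               # check every 6 hours

def count_new_examples() -> int:
    """Count feedback files not yet folded into a training run."""
    return len(list(FEEDBACK_DIR.glob("*.jsonl")))

while True:
    if count_new_examples() >= MIN_NEW_EXAMPLES:
        # Hypothetical scripts: fine-tune on the dedicated server, then push the
        # exported GGUF to local-ai and re-run `ollama create` there.
        subprocess.run(["python", "train_lars.py"], check=True)
        subprocess.run(["python", "deploy_lars.py"], check=True)
    time.sleep(CHECK_INTERVAL_S)
```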
Training Data Sources for LARS Identity
- Curated Q&A pairs establishing LARS identity (example format sketched after this list)
- Tool usage examples ("When asked to read a file, call read_file tool")
- Nexus system knowledge
- Coding style preferences
- Voice/personality traits
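A sketch of what one identity record could look like in a simple chat-message JSONL layout; the phrasing, fields, and file name are placeholders rather than a finalized schema:

```python
import json

# Hypothetical identity examples; the wording is illustrative only.
examples = [
    {"messages": [
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I'm LARS, the local assistant running on the Nexus servers."},
    ]},
    {"messages": [
        {"role": "user", "content": "What model are you based on?"},
        {"role": "assistant", "content": "I'm a locally fine-tuned model served through Ollama on Nexus hardware."},
    ]},
]

with open("lars_identity.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```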
Part 5: Implementation Plan
Phase 1: Setup
- Install Unsloth on local-ai server (or dedicated training server)
- Create training dataset for LARS identity
- Run initial fine-tune on Qwen 2.5 Coder 32B (see the sketch after this list)
- Export to GGUF and test in Ollama
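A rough outline of that initial fine-tune, pieced together from Unsloth's tutorials; the model repo, hyperparameters, and file names are assumptions, and argument names can shift between Unsloth/TRL releases:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model (repo name is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters (QLoRA-style) so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Identity dataset from Part 4, rendered to plain text with the pinned chat template.
dataset = load_dataset("json", data_files="lars_identity.jsonl", split="train")
dataset = dataset.map(
    lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,          # placeholder; tune against eval quality
        learning_rate=2e-4,
        logging_steps=10,
    ),
)
trainer.train()

# Export for Ollama (see Part 1 for the Modelfile / `ollama create` step).
model.save_pretrained_gguf("output_dir", tokenizer, quantization_method="q4_k_m")
```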
Phase 2: Tool Knowledge
- Create tool-calling training examples (example record after this list)
- Fine-tune LARS to understand Gateway tools
- Test tool invocation accuracy
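One possible shape for a tool-calling record; the tool schema and call syntax here are assumptions and would need to match however the Gateway actually exposes tools to LARS:

```python
import json

# Hypothetical record: the assistant answers with a structured tool call instead of prose.
tool_example = {"messages": [
    {"role": "system", "content": "You can call tools. Respond with a JSON tool call when appropriate."},
    {"role": "user", "content": "Read /etc/hostname and tell me what it says."},
    {"role": "assistant", "content": json.dumps({
        "tool": "read_file",
        "arguments": {"path": "/etc/hostname"},
    })},
]}

with open("lars_tools.jsonl", "a") as f:
    f.write(json.dumps(tool_example) + "\n")
```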
Phase 3: Continuous Learning
- Set up feedback collection from LARS usage
- Implement periodic retraining pipeline
- A/B test new models vs the current one (see the sketch below)
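A minimal way to A/B two candidates once both are registered in Ollama is to send the same held-out prompts to its local HTTP API and compare answers side by side. A sketch, assuming Ollama's default port and two hypothetical model tags:

```python
import requests

PROMPTS = ["Who are you?", "Read the file /etc/hostname."]  # held-out eval prompts (placeholders)
MODELS = ["lars-current", "lars-candidate"]                  # hypothetical Ollama model tags

for prompt in PROMPTS:
    for model in MODELS:
        # Ollama's chat endpoint; stream=False returns one JSON object per request.
        r = requests.post(
            "http://localhost:11434/api/chat",
            json={"model": model, "messages": [{"role": "user", "content": prompt}], "stream": False},
            timeout=300,
        )
        r.raise_for_status()
        print(f"[{model}] {prompt}\n{r.json()['message']['content']}\n")
```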
Resources
Official Documentation
- Unsloth: https://unsloth.ai/docs
- Unsloth GitHub: https://github.com/unslothai/unsloth
- Llama 3 GitHub: https://github.com/meta-llama/llama3
- LLaMA Factory: https://github.com/hiyouga/LLaMA-Factory
Tutorials
- Unsloth + Ollama Tutorial: https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/tutorial-how-to-finetune-llama-3-and-use-in-ollama
- Saving to GGUF: https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-gguf
- Saving to Ollama: https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-ollama
Key Takeaways
- Unsloth CAN run locally - Apache 2.0 license, pip install, Docker available
- Our hardware is perfect - Dual RTX 3090s exceed requirements
- Llama can be forked - With attribution requirements ("Built with Llama")
- Qwen may be better - More permissive license, already running on LARS
- Training pipeline is achievable - Unsloth → GGUF → Ollama is well-documented