
Local AI Training Infrastructure - Unsloth, Llama Fork, and Self-Hosted Fine-Tuning

ai-training unsloth llama fine-tuning lora local-ai lars infrastructure

Local AI Training Infrastructure Research

Date: 2025-12-27
Purpose: Research for setting up a self-hosted AI training pipeline for LARS and custom Nexus AI models


Executive Summary

This document covers the research for implementing a fully local, self-hosted AI training infrastructure. The goals are to:

  1. Fine-tune local AI models (like LARS) with custom identity and tool knowledge
  2. Potentially fork Llama for a custom "Nexus AI" base model
  3. Run training continuously on local servers without cloud dependencies


Part 1: Unsloth - Primary Training Framework

What is Unsloth?

Unsloth is an open-source (Apache 2.0 license) framework for fine-tuning and reinforcement learning of LLMs. It is specifically optimized for efficient, low-memory training, primarily on NVIDIA GPUs.

Key Features

  • 2x faster training with 70% less VRAM compared to standard methods
  • No accuracy degradation (uses no approximation methods)
  • Supports LoRA, QLoRA, and full fine-tuning
  • Direct export to Ollama, GGUF, llama.cpp, vLLM
  • Supports 100+ models including Qwen, Llama, DeepSeek, Gemma

Can It Run Locally? YES!

Installation:

# Linux/WSL
pip install unsloth

# Docker (recommended for servers)
docker pull unsloth/unsloth

GPU Requirements:

  • Minimum: CUDA Capability 7.0
  • Supported: V100, T4, RTX 20/30/40 series, A100, H100, L40
  • Also supports AMD and Intel GPUs
  • Our dual RTX 3090s (CUDA Capability 8.6) are perfect for this
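For orientation, a minimal LoRA fine-tune sketch following the quickstart pattern in Unsloth's documentation. The base model name, dataset file, and hyperparameters below are placeholder assumptions for our setup, and exact trl argument names vary by version:

# Minimal LoRA fine-tune sketch (Unsloth quickstart pattern).
# Model name, dataset file, and hyperparameters are placeholders.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit",  # assumed base
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA-style 4-bit loading to cut VRAM use
)

# Attach LoRA adapters; only these small matrices are trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

dataset = load_dataset("json", data_files="lars_identity.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=100,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()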

Export to Ollama Workflow

# After training in Unsloth
model.save_pretrained_gguf("output_dir", tokenizer, quantization_method="q8_0")
# Or for smaller size:
model.save_pretrained_gguf("output_dir", tokenizer, quantization_method="q4_k_m")

The exported GGUF file can then be loaded directly into Ollama.
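A minimal registration sketch, assuming the export landed in output_dir (the GGUF filename and the lars-custom model name are placeholders; check what save_pretrained_gguf actually wrote):

# Modelfile - FROM points at the exported GGUF (placeholder filename)
FROM ./output_dir/unsloth.Q8_0.gguf

# Register and run the model in Ollama
ollama create lars-custom -f Modelfile
ollama run lars-custom

The Modelfile can also carry a TEMPLATE directive, which ties directly into the consistency point below.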

Important: Chat Template Consistency

When exporting to Ollama, you MUST use the SAME chat template that was used during training. Mismatched templates cause gibberish output or infinite generation loops.
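Unsloth's chat template helper can pin the template on the tokenizer before training, so the Modelfile's TEMPLATE directive can reproduce it at inference time. A minimal sketch, assuming a ChatML-style model (the template name is an example, not a fixed choice):

from unsloth.chat_templates import get_chat_template

# Apply the same named template the Ollama Modelfile will use at
# inference time ("chatml" is an example; Qwen models use ChatML)
tokenizer = get_chat_template(tokenizer, chat_template="chatml")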


Part 2: Alternatives to Unsloth

1. Axolotl

  • Best for: Beginners, multi-GPU setups (Unsloth is single-GPU only)
  • License: Apache 2.0
  • Strengths: Community-driven, rapid new model support
  • Use case: If we need to use BOTH 3090s simultaneously
  • GitHub: https://github.com/OpenAccess-AI-Collective/axolotl

2. LLaMA Factory

  • Best for: Zero-code training via Web UI
  • License: Apache 2.0
  • Strengths: Supports 100+ LLMs, Gradio Web UI, no coding required
  • Use case: Quick experiments, non-technical training
  • GitHub: https://github.com/hiyouga/LLaMA-Factory

3. SWIFT (by ModelScope)

  • Best for: Multi-GPU training at scale
  • License: Apache 2.0
  • Strengths: 300+ LLM support, full workflow (pretrain → fine-tune → RLHF → deploy)
  • Use case: Enterprise-scale training

4. Torchtune

  • Best for: PyTorch developers wanting deep customization
  • License: BSD (PyTorch ecosystem)
  • Strengths: Pure PyTorch, AMD/NVIDIA support, memory efficient

Comparison Table

Framework      Multi-GPU  Web UI   License     Best For
Unsloth        No         No       Apache 2.0  Speed/efficiency
Axolotl        Yes        No       Apache 2.0  Multi-GPU
LLaMA Factory  Yes        Yes      Apache 2.0  No-code
SWIFT          Yes        Partial  Apache 2.0  Enterprise
Torchtune      Yes        No       BSD         Customization

Part 3: Forking Llama for Custom Nexus AI

Llama 3 License Terms

Llama uses the Meta Llama 3 Community License - NOT true open source, but it permits:

  • Commercial use ✓
  • Fine-tuning ✓
  • Creating derivative works ✓
  • Distribution ✓

Requirements:

  1. Must include "Llama" at the beginning of derivative model names (e.g., "Llama-Nexus")
  2. Must display "Built with Llama" on related materials
  3. Must include a copy of the license with distribution
  4. Must comply with the Acceptable Use Policy

How to Fork Llama

# Clone the repo
git clone https://github.com/meta-llama/llama3.git

# Request access to model weights at llama.com
# Run download script after approval
bash download.sh

Fully Permissive Alternatives

If we want true open source without attribution requirements:

  • Grok (xAI) - Apache 2.0, fully permissive
  • Mixtral 8x22B - Apache 2.0, fully permissive
  • Qwen - various models with permissive licenses


Part 4: Training Architecture Options

Option A: Dedicated Training Server

Run Unsloth on a dedicated server (could be the Nexus server or a backup) that:

  1. Continuously monitors training data/feedback
  2. Periodically retrains/fine-tunes LARS
  3. Exports the new GGUF to the local-ai server
  4. Ollama hot-reloads the updated model

Option B: On-Demand Training on local-ai

Run Unsloth directly on the local-ai server (dual 3090s) when needed:

  1. Collect training data via normal LARS usage
  2. Schedule training during off-hours (see the crontab sketch below)
  3. Export directly to Ollama on the same machine

A dedicated training server prevents GPU contention with inference.
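A sketch of Option B's off-hours scheduling, assuming a hypothetical wrapper script at /opt/lars/train.sh:

# Hypothetical crontab entry: retrain at 02:00 daily, append output to a log
0 2 * * * /opt/lars/train.sh >> /var/log/lars-train.log 2>&1

and the wrapper itself:

#!/bin/bash
# /opt/lars/train.sh (sketch): fine-tune, then re-register the Ollama model
python /opt/lars/finetune.py && ollama create lars-custom -f /opt/lars/Modelfile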

Training Data Sources for LARS Identity

  1. Curated Q&A pairs establishing LARS identity (example records are sketched after this list)
  2. Tool usage examples ("When asked to read a file, call read_file tool")
  3. Nexus system knowledge
  4. Coding style preferences
  5. Voice/personality traits
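A sketch of what such records could look like on disk, written from Python. The file name, identity wording, ChatML markup, and read_file call format are all illustrative assumptions:

import json

# Two illustrative records: one identity example, one tool-usage example.
# The schema must match whatever the trainer and chat template expect.
examples = [
    {"text": "<|im_start|>user\nWho are you?<|im_end|>\n"
             "<|im_start|>assistant\nI am LARS, the Nexus local AI assistant.<|im_end|>"},
    {"text": "<|im_start|>user\nRead /etc/hostname for me.<|im_end|>\n"
             "<|im_start|>assistant\n{\"tool\": \"read_file\", \"path\": \"/etc/hostname\"}<|im_end|>"},
]

with open("lars_identity.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")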

Part 5: Implementation Plan

Phase 1: Setup

  1. Install Unsloth on local-ai server (or dedicated training server)
  2. Create training dataset for LARS identity
  3. Run initial fine-tune on Qwen 2.5 Coder 32B
  4. Export to GGUF and test in Ollama

Phase 2: Tool Knowledge

  1. Create tool-calling training examples
  2. Fine-tune LARS to understand Gateway tools
  3. Test tool invocation accuracy (spot-check sketched below)
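A spot-check can go straight at Ollama's HTTP API; a minimal sketch, assuming the model was registered as lars-custom and emits tool calls as bare JSON:

import json
import requests

# Send a prompt that should trigger a tool call, then check whether
# the reply parses as the expected JSON shape
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "lars-custom",
        "messages": [{"role": "user", "content": "Read /etc/hostname for me."}],
        "stream": False,
    },
)
reply = resp.json()["message"]["content"]
try:
    call = json.loads(reply)
    print("tool call ok:", call.get("tool") == "read_file")
except json.JSONDecodeError:
    print("no tool call produced:", reply)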

Phase 3: Continuous Learning

  1. Set up feedback collection from LARS usage
  2. Implement periodic retraining pipeline
  3. A/B test new models vs current

Resources

Official Documentation

  • Unsloth: https://unsloth.ai/docs
  • Unsloth GitHub: https://github.com/unslothai/unsloth
  • Llama 3 GitHub: https://github.com/meta-llama/llama3
  • LLaMA Factory: https://github.com/hiyouga/LLaMA-Factory

Tutorials

  • Unsloth + Ollama Tutorial: https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/tutorial-how-to-finetune-llama-3-and-use-in-ollama
  • Saving to GGUF: https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-gguf
  • Saving to Ollama: https://docs.unsloth.ai/basics/running-and-saving-models/saving-to-ollama

Key Takeaways

  1. Unsloth CAN run locally - Apache 2.0 license, pip install, Docker available
  2. Our hardware is perfect - Dual RTX 3090s exceed requirements
  3. Llama can be forked - With attribution requirements ("Built with Llama")
  4. Qwen may be better - More permissive license, and it is already the model LARS runs on
  5. Training pipeline is achievable - Unsloth → GGUF → Ollama is well-documented
