Auto-Deploy Pipeline: Training to Ollama
The Gap We Need to Close
Current State:
[Dataset] → [Train Script] → [LoRA Adapter] → ??? (manual steps)
Goal State:
[Dataset] → [Train Script] → [LoRA Adapter] → [Auto-Deploy] → [Talk to LARS]
Key Discovery: Ollama Supports LoRA Adapters Directly
Ollama's Modelfile Reference documents an ADAPTER instruction that loads a LoRA adapter on top of a base model. No merge into the base weights is required.
Modelfile Example:
FROM qwen2.5:7b-instruct
ADAPTER /path/to/lars-3d-v2-tasks/adapter_model.safetensors
SYSTEM "You are LARS, the Local AI Runtime System..."
Then: ollama create lars -f Modelfile
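Once created, a quick sanity check with standard Ollama CLI commands (output formatting may vary by version):
ollama list                    # confirm the 'lars' model is registered
ollama show lars --modelfile   # inspect the Modelfile Ollama stored for it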
Supported Adapter Formats
Ollama officially supports Safetensor adapters for:
- Llama (1, 2, 3, 3.1)
- Mistral (1, 2, Mixtral)
- Gemma (1, 2)
Note: Qwen is not explicitly listed. We need to test whether the adapter loads directly or requires GGUF conversion first; a quick check is sketched below.
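A minimal way to run that test, assuming the trained adapter sits at the path shown (the lars-test name and the paths are illustrative):
# Attempt a direct safetensors load of the Qwen adapter.
# If Ollama rejects it, fall back to the GGUF conversion path (Path 2 below).
cat > /tmp/Modelfile.test << EOF
FROM qwen2.5:7b-instruct
ADAPTER ${HOME}/corlera-training/outputs/lars-3d-v2-tasks/adapter_model.safetensors
EOF
ollama create lars-test -f /tmp/Modelfile.test
ollama run lars-test "Who are you?"   # spot-check that the adapter actually applied
ollama rm lars-test                   # clean up the test model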
Two Deployment Paths
Path 1: Direct Adapter Loading (Preferred - Faster)
- Use ADAPTER instruction in Modelfile
- Points directly to safetensors file
- Skip merge step entirely
Path 2: Convert LoRA to GGUF (Fallback)
- Use convert_lora_to_gguf.py from llama.cpp
- Convert adapter to GGUF format
- Then reference the converted adapter in the Modelfile (see the sketch below)
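A sketch of that fallback, assuming a local llama.cpp checkout and the base model's Hugging Face weights on disk. Paths are illustrative, and the converter's flags vary across llama.cpp versions, so check convert_lora_to_gguf.py --help before relying on them:
# Convert the safetensors LoRA adapter to GGUF
python llama.cpp/convert_lora_to_gguf.py \
    --base /path/to/Qwen2.5-7B-Instruct \
    --outfile lars-adapter.gguf \
    --outtype f16 \
    "${HOME}/corlera-training/outputs/lars-3d-v2-tasks"

# Reference the GGUF adapter in the Modelfile instead of the safetensors file
cat > /tmp/Modelfile << EOF
FROM qwen2.5:7b-instruct
ADAPTER ${PWD}/lars-adapter.gguf
SYSTEM "You are LARS, the Local AI Runtime System..."
EOF
ollama create lars -f /tmp/Modelfile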
Auto-Deploy Script Design
#!/bin/bash
# deploy_lars.sh - Run after training completes
set -euo pipefail

# Fail early with a usage hint if no adapter directory was given
ADAPTER_PATH="${1:?usage: $0 <adapter-output-dir>}"
MODEL_NAME="lars"
BASE_MODEL="qwen2.5:7b-instruct"

# Sanity check: the trained adapter must exist before we point Ollama at it
if [ ! -f "${ADAPTER_PATH}/adapter_model.safetensors" ]; then
    echo "No adapter_model.safetensors found in ${ADAPTER_PATH}" >&2
    exit 1
fi

# Create Modelfile
cat > /tmp/Modelfile << EOF
FROM ${BASE_MODEL}
ADAPTER ${ADAPTER_PATH}/adapter_model.safetensors
SYSTEM "You are LARS, the Local AI Runtime System. You are owned by Corlera and Christopher Foust. You run locally on dedicated hardware as part of the Nexus AI engine."
EOF

# Deploy to Ollama (re-creating the same model name replaces the old version)
ollama create "${MODEL_NAME}" -f /tmp/Modelfile
echo "LARS deployed with adapter from ${ADAPTER_PATH}"
Full Pipeline Integration
# End of train_3d.py or wrapper script:
# 1. Training completes, saves to output dir
# 2. Auto-deploy kicks in:
./deploy_lars.sh ~/corlera-training/outputs/lars-3d-v2-tasks
# 3. Extension now talks to updated LARS
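As a concrete sketch, a thin wrapper script could chain the two steps. The --output_dir flag on train_3d.py is hypothetical and should be matched to that script's actual interface:
#!/bin/bash
# train_and_deploy.sh - train, deploy, then smoke-test in one shot (sketch)
set -euo pipefail

OUTPUT_DIR="${HOME}/corlera-training/outputs/lars-3d-v2-tasks"

# 1. Train (adjust arguments to whatever train_3d.py actually expects)
python train_3d.py --output_dir "${OUTPUT_DIR}"

# 2. Deploy the fresh adapter to Ollama under the same model name
./deploy_lars.sh "${OUTPUT_DIR}"

# 3. Smoke-test the updated model before pointing the extension at it
ollama run lars "Who are you?"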
Benefits
- No manual steps after training
- Model name stays the same (lars) - the extension doesn't need reconfiguring
- Fast iteration - train, deploy, test, repeat
- No Ollama core modifications - just Modelfile + create command
Sources
- Ollama Modelfile Reference: https://docs.ollama.com/modelfile
- Deploy Fine-Tuned LoRA with Ollama: https://kaitchup.substack.com/p/deploy-your-fine-tuned-langue-models
- Unsloth LoRA with Ollama: https://sarinsuriyakoon.medium.com/unsloth-lora-with-ollama-lightweight-solution-to-full-cycle-llm-development-edadb6d9e0f0
- Ollama LoRA GitHub Issue: https://github.com/ollama/ollama/issues/4432