Auto-Deploy Pipeline: Training to Ollama
The Gap We Need to Close
Current State:
[Dataset] → [Train Script] → [LoRA Adapter] → ??? (manual steps)
Goal State:
[Dataset] → [Train Script] → [LoRA Adapter] → [Auto-Deploy] → [Talk to LARS]
Key Discovery: Ollama Supports LoRA Adapters Directly
Ollama's Modelfile Reference documents an ADAPTER instruction that loads a LoRA adapter on top of a base model. No merge into the base weights is required.
Modelfile Example:
FROM qwen2.5:7b-instruct
ADAPTER /path/to/lars-3d-v2-tasks/adapter_model.safetensors
SYSTEM "You are LARS, the Local AI Runtime System..."
Then: ollama create lars -f Modelfile
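Once created, a quick sanity check with standard Ollama CLI commands (output formatting may vary by version):
ollama list                    # confirm the 'lars' model is registered
ollama show lars --modelfile   # inspect the Modelfile Ollama stored for it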
Supported Adapter Formats
Ollama officially supports Safetensor adapters for:
- Llama (1, 2, 3, 3.1)
- Mistral (1, 2, Mixtral)
- Gemma (1, 2)
Note: Qwen is not explicitly listed. We need to test whether the adapter loads directly or requires GGUF conversion first; a quick check is sketched below.
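A minimal way to run that test, assuming the trained adapter sits at the path shown (the lars-test name and the paths are illustrative):
# Attempt a direct safetensors load of the Qwen adapter.
# If Ollama rejects it, fall back to the GGUF conversion path (Path 2 below).
cat > /tmp/Modelfile.test << EOF
FROM qwen2.5:7b-instruct
ADAPTER ${HOME}/corlera-training/outputs/lars-3d-v2-tasks/adapter_model.safetensors
EOF
ollama create lars-test -f /tmp/Modelfile.test
ollama run lars-test "Who are you?"   # spot-check that the adapter actually applied
ollama rm lars-test                   # clean up the test model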
Two Deployment Paths
Path 1: Direct Adapter Loading (Preferred - Faster)
- Use ADAPTER instruction in Modelfile
- Points directly to safetensors file
- Skip merge step entirely
Path 2: Convert LoRA to GGUF (Fallback)
- Use convert_lora_to_gguf.py from llama.cpp
- Convert adapter to GGUF format
- Then reference the converted adapter in the Modelfile (see the sketch below)
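A sketch of that fallback, assuming a local llama.cpp checkout and the base model's Hugging Face weights on disk. Paths are illustrative, and the converter's flags vary across llama.cpp versions, so check convert_lora_to_gguf.py --help before relying on them:
# Convert the safetensors LoRA adapter to GGUF
python llama.cpp/convert_lora_to_gguf.py \
    --base /path/to/Qwen2.5-7B-Instruct \
    --outfile lars-adapter.gguf \
    --outtype f16 \
    "${HOME}/corlera-training/outputs/lars-3d-v2-tasks"

# Reference the GGUF adapter in the Modelfile instead of the safetensors file
cat > /tmp/Modelfile << EOF
FROM qwen2.5:7b-instruct
ADAPTER ${PWD}/lars-adapter.gguf
SYSTEM "You are LARS, the Local AI Runtime System..."
EOF
ollama create lars -f /tmp/Modelfile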
Auto-Deploy Script Design
#!/bin/bash
# deploy_lars.sh - Run after training completes
set -euo pipefail

# Fail early with a usage hint if no adapter directory was given
ADAPTER_PATH="${1:?usage: $0 <adapter-output-dir>}"
MODEL_NAME="lars"
BASE_MODEL="qwen2.5:7b-instruct"

# Sanity check: the trained adapter must exist before we point Ollama at it
if [ ! -f "${ADAPTER_PATH}/adapter_model.safetensors" ]; then
    echo "No adapter_model.safetensors found in ${ADAPTER_PATH}" >&2
    exit 1
fi

# Create Modelfile
cat > /tmp/Modelfile << EOF
FROM ${BASE_MODEL}
ADAPTER ${ADAPTER_PATH}/adapter_model.safetensors
SYSTEM "You are LARS, the Local AI Runtime System. You are owned by Corlera and Christopher Foust. You run locally on dedicated hardware as part of the Nexus AI engine."
EOF

# Deploy to Ollama (re-creating the same model name replaces the old version)
ollama create "${MODEL_NAME}" -f /tmp/Modelfile
echo "LARS deployed with adapter from ${ADAPTER_PATH}"
Full Pipeline Integration
# End of train_3d.py or wrapper script:
# 1. Training completes, saves to output dir
# 2. Auto-deploy kicks in:
./deploy_lars.sh ~/corlera-training/outputs/lars-3d-v2-tasks
# 3. Extension now talks to updated LARS
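As a concrete sketch, a thin wrapper script could chain the two steps. The --output_dir flag on train_3d.py is hypothetical and should be matched to that script's actual interface:
#!/bin/bash
# train_and_deploy.sh - train, deploy, then smoke-test in one shot (sketch)
set -euo pipefail

OUTPUT_DIR="${HOME}/corlera-training/outputs/lars-3d-v2-tasks"

# 1. Train (adjust arguments to whatever train_3d.py actually expects)
python train_3d.py --output_dir "${OUTPUT_DIR}"

# 2. Deploy the fresh adapter to Ollama under the same model name
./deploy_lars.sh "${OUTPUT_DIR}"

# 3. Smoke-test the updated model before pointing the extension at it
ollama run lars "Who are you?"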
Benefits
- No manual steps after training
- Model name stays the same (lars) - the extension doesn't need reconfiguring
- Fast iteration - train, deploy, test, repeat
- No Ollama core modifications - just Modelfile + create command
Sources
- Ollama Modelfile Reference: https://docs.ollama.com/modelfile
- Deploy Fine-Tuned LoRA with Ollama: https://kaitchup.substack.com/p/deploy-your-fine-tuned-langue-models
- Unsloth LoRA with Ollama: https://sarinsuriyakoon.medium.com/unsloth-lora-with-ollama-lightweight-solution-to-full-cycle-llm-development-edadb6d9e0f0
- Ollama LoRA GitHub Issue: https://github.com/ollama/ollama/issues/4432