Qwen3-Coder Research for LARS
Research completed: 2025-12-28
Selected Model: Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated
Why Abliterated:
- LARS needs to be uncensored to perform system tasks without refusals
- Standard models have safety training that blocks certain operations
- Abliteration removes refusal patterns without damaging model capabilities
- LARS operates under Claude's oversight, so it doesn't need its own guardrails
Source: https://huggingface.co/huihui-ai/Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated
Ollama Pull: ollama pull huihui_ai/qwen3-coder-abliterated
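After pulling, a quick smoke test confirms the model responds without refusals. A minimal sketch, assuming Ollama is serving on its default port (11434) and that the local tag matches the pull string above (check `ollama list` if it differs):

```python
# Minimal smoke test against Ollama's local REST API.
# Assumes the default Ollama port (11434); adjust MODEL_TAG if `ollama list`
# shows a different local tag for the pulled model.
import requests

MODEL_TAG = "huihui_ai/qwen3-coder-abliterated"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL_TAG,
        "prompt": "Write a bash one-liner that lists the 5 largest files under /var/log.",
        "stream": False,  # single JSON response instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```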
Architecture
- Total Parameters: 30.5 billion
- Active Parameters: 3.3 billion per token (MoE)
- Experts: 128 total, 8 active per token
- VRAM Required: ~18GB for Q4_K_M weights (fits on 2x RTX 3090; see the estimate sketch below)
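The ~18GB figure is consistent with a back-of-envelope estimate of weight size: total parameters times effective bits per weight, divided by 8. A rough sketch (the bits-per-weight values are approximate averages, not exact GGUF figures, and KV cache plus runtime buffers add several GB on top):

```python
# Back-of-envelope weight-size estimate per GGUF quantization.
# Effective bits-per-weight values are approximations; actual files also
# include metadata, and inference needs extra room for the KV cache.
TOTAL_PARAMS = 30.5e9  # Qwen3-Coder-30B-A3B total parameter count

APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q6_K": 6.56,
    "Q8_0": 8.50,
}

for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
    size_gb = TOTAL_PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB of weights")
# Q4_K_M works out to roughly 18-19 GB, matching the ~18GB figure above.
```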
Abliteration Process
- Weight modification to suppress refusal patterns
- Uses the P-E-W abliteration method for de-censoring
- Intended to leave model capabilities intact
- Available as GGUF for Ollama/llama.cpp
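For context on what the weight modification does: the core idea behind abliteration is to estimate a "refusal direction" in activation space and orthogonalize selected weight matrices against it, so those layers can no longer write along that direction. A conceptual numpy sketch of the projection step only (illustrative; not the actual P-E-W implementation, and the shapes and names are made up):

```python
import numpy as np

def orthogonalize(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix's output space.

    Conceptual version of the abliteration edit: refusal_dir would be the
    difference between mean activations on refused vs. accepted prompts.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    # W' = W - r (r^T W): after the edit, r^T W' = 0, so the layer can no
    # longer write anything along the refusal direction.
    return W - np.outer(r, r @ W)

# Toy demo with random data standing in for real weights and activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))
refusal_dir = rng.normal(size=512)
W_edited = orthogonalize(W, refusal_dir)
r_hat = refusal_dir / np.linalg.norm(refusal_dir)
print(np.abs(r_hat @ W_edited).max())  # ~0: no output left along the refusal direction
```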
GGUF Quantizations Available
- Q4_K_M: 18.6 GB (recommended for 48GB VRAM)
- Q5_K_M: 21.7 GB
- Q6_K: 25.1 GB
- Q8_0: 32.5 GB
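Given a VRAM budget, the choice reduces to the largest file that still leaves headroom for KV cache, activations, and runtime buffers. A small helper sketch using the sizes listed above (the headroom value is an assumption, not a measured figure):

```python
# Pick the largest listed quant whose file size fits the VRAM budget,
# keeping headroom for KV cache, activations, and runtime buffers.
QUANT_SIZES_GB = {
    "Q4_K_M": 18.6,
    "Q5_K_M": 21.7,
    "Q6_K": 25.1,
    "Q8_0": 32.5,
}

def pick_quant(vram_gb: float, headroom_gb: float = 8.0) -> str | None:
    usable = vram_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    return max(fitting)[1] if fitting else None

print(pick_quant(48.0))                   # "Q8_0" fits by file size alone
print(pick_quant(24.0, headroom_gb=5.0))  # "Q4_K_M" on a single 24 GB card
```

Note this filters by file size only; the Q4_K_M recommendation above is more conservative and leaves additional VRAM free.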
Alternative Models Considered
- Standard qwen3:30b-a3b (censored - rejected)
- qwen2.5-coder:32b (older, censored)
- DavidAU 42B variant (larger; may not fit in 48GB VRAM)
Related
- Track: LARS Implementation Sprint (fc0dd483)
- Track: LARS Training System (0dd041be)
- KB: Nexus AI Engine (e213b1c0)
- KB: Unsloth Setup Guide (1c99b911)