Qwen3-Coder Research for LARS
Research completed: 2025-12-28
Selected Model: Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated
Why Abliterated:
- LARS needs to be uncensored to perform system tasks without refusals
- Standard models have safety training that blocks certain operations
- Abliteration removes refusal patterns without damaging model capabilities
- LARS operates under Claude's oversight, so it doesn't need its own guardrails
Source: https://huggingface.co/huihui-ai/Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated
Ollama Pull: ollama pull huihui_ai/qwen3-coder-abliterated
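After pulling, a quick smoke test confirms the model responds without refusals. A minimal sketch, assuming Ollama is serving on its default port (11434) and that the local tag matches the pull string above (check `ollama list` if it differs):

```python
# Minimal smoke test against Ollama's local REST API.
# Assumes the default Ollama port (11434); adjust MODEL_TAG if `ollama list`
# shows a different local tag for the pulled model.
import requests

MODEL_TAG = "huihui_ai/qwen3-coder-abliterated"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL_TAG,
        "prompt": "Write a bash one-liner that lists the 5 largest files under /var/log.",
        "stream": False,  # single JSON response instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```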
Architecture
- Total Parameters: 30.5 billion
- Active Parameters: 3.3 billion per token (MoE)
- Experts: 128 total, 8 active per token
- VRAM Required: ~18GB for Q4_K_M weights (fits on 2x RTX 3090; see the estimate sketch below)
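The ~18GB figure is consistent with a back-of-envelope estimate of weight size: total parameters times effective bits per weight, divided by 8. A rough sketch (the bits-per-weight values are approximate averages, not exact GGUF figures, and KV cache plus runtime buffers add several GB on top):

```python
# Back-of-envelope weight-size estimate per GGUF quantization.
# Effective bits-per-weight values are approximations; actual files also
# include metadata, and inference needs extra room for the KV cache.
TOTAL_PARAMS = 30.5e9  # Qwen3-Coder-30B-A3B total parameter count

APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q6_K": 6.56,
    "Q8_0": 8.50,
}

for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
    size_gb = TOTAL_PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB of weights")
# Q4_K_M works out to roughly 18-19 GB, matching the ~18GB figure above.
```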
Abliteration Process
- Weight modification to suppress refusal patterns
- Uses the P-E-W abliteration method for de-censoring
- Intended to leave model capabilities intact
- Available as GGUF for Ollama/llama.cpp
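For context on what the weight modification does: the core idea behind abliteration is to estimate a "refusal direction" in activation space and orthogonalize selected weight matrices against it, so those layers can no longer write along that direction. A conceptual numpy sketch of the projection step only (illustrative; not the actual P-E-W implementation, and the shapes and names are made up):

```python
import numpy as np

def orthogonalize(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix's output space.

    Conceptual version of the abliteration edit: refusal_dir would be the
    difference between mean activations on refused vs. accepted prompts.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    # W' = W - r (r^T W): after the edit, r^T W' = 0, so the layer can no
    # longer write anything along the refusal direction.
    return W - np.outer(r, r @ W)

# Toy demo with random data standing in for real weights and activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))
refusal_dir = rng.normal(size=512)
W_edited = orthogonalize(W, refusal_dir)
r_hat = refusal_dir / np.linalg.norm(refusal_dir)
print(np.abs(r_hat @ W_edited).max())  # ~0: no output left along the refusal direction
```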
GGUF Quantizations Available
- Q4_K_M: 18.6 GB (recommended for 48GB VRAM)
- Q5_K_M: 21.7 GB
- Q6_K: 25.1 GB
- Q8_0: 32.5 GB
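Given a VRAM budget, the choice reduces to the largest file that still leaves headroom for KV cache, activations, and runtime buffers. A small helper sketch using the sizes listed above (the headroom value is an assumption, not a measured figure):

```python
# Pick the largest listed quant whose file size fits the VRAM budget,
# keeping headroom for KV cache, activations, and runtime buffers.
QUANT_SIZES_GB = {
    "Q4_K_M": 18.6,
    "Q5_K_M": 21.7,
    "Q6_K": 25.1,
    "Q8_0": 32.5,
}

def pick_quant(vram_gb: float, headroom_gb: float = 8.0) -> str | None:
    usable = vram_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    return max(fitting)[1] if fitting else None

print(pick_quant(48.0))                   # "Q8_0" fits by file size alone
print(pick_quant(24.0, headroom_gb=5.0))  # "Q4_K_M" on a single 24 GB card
```

Note this filters by file size only; the Q4_K_M recommendation above is more conservative and leaves additional VRAM free.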
Alternative Models Considered
- Standard qwen3:30b-a3b (censored - rejected)
- qwen2.5-coder:32b (older, censored)
- DavidAU 42B variant (larger; may not fit in 48GB VRAM)
Related
- Track: LARS Implementation Sprint (fc0dd483)
- Track: LARS Training System (0dd041be)
- KB: Nexus AI Engine (e213b1c0)
- KB: Unsloth Setup Guide (1c99b911)