
Qwen3 Coder Research

Tags: research, qwen, lars, models, vram

Qwen3-Coder Research for LARS

Research completed: 2025-12-28

Selected Model: Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated

Why Abliterated:

  • LARS needs to be uncensored to perform system tasks without refusals
  • Standard models have safety training that blocks certain operations
  • Abliteration removes refusal patterns without damaging model capabilities
  • LARS operates under Claude's oversight, so it does not need its own guardrails

Source: https://huggingface.co/huihui-ai/Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated

Ollama Pull: ollama pull huihui_ai/qwen3-coder-abliterated
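
To sanity-check the pull, the model can be queried through Ollama's local HTTP chat endpoint. A minimal Python sketch, assuming Ollama is running on its default port and using the tag from the pull command above:

    import requests

    # Minimal smoke test against a local Ollama instance (default port 11434).
    # The model tag matches the pull command above; adjust if a specific quant tag was pulled.
    MODEL = "huihui_ai/qwen3-coder-abliterated"

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": MODEL,
            "messages": [
                {"role": "user",
                 "content": "Write a bash one-liner that lists the five largest files under /var/log."},
            ],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])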

Architecture

  • Total Parameters: 30.5 billion
  • Active Parameters: 3.3 billion per token (MoE)
  • Experts: 128 total, 8 active per token
  • VRAM Required: ~18 GB for Q4_K_M (fits on 2x RTX 3090, 48 GB total; see the estimate below)
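
The ~18 GB figure can be sanity-checked with simple arithmetic: weight memory scales with total parameters and bits per weight, not with the 3.3B active parameters, because every expert must stay resident. A rough estimate, assuming average per-weight bit widths for the quant formats (the averages are assumptions, not published figures):

    # Back-of-envelope weight-memory estimate for a 30.5B-parameter MoE model.
    # Active parameters drive compute per token, not resident memory: all 128
    # experts must stay loaded even though only 8 are routed to per token.
    TOTAL_PARAMS = 30.5e9

    def weights_gb(bits_per_weight: float) -> float:
        """Approximate weight memory in GB for an average quantization width."""
        return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

    # ~4.85 bits/weight for Q4_K_M and ~8.5 for Q8_0 are assumed averages
    # (block scales and a few higher-precision tensors push them above the nominal 4/8 bits).
    print(f"Q4_K_M: {weights_gb(4.85):.1f} GB")  # ~18.5 GB, close to the 18.6 GB file listed below
    print(f"Q8_0:   {weights_gb(8.5):.1f} GB")   # ~32.4 GB, close to the 32.5 GB file listed below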

Abliteration Process

  • Weight modification to suppress refusal patterns
  • Uses the P-E-W method for optimal de-censoring
  • No damage to model capabilities
  • Available as GGUF for Ollama/llama.cpp
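
For context, the published directional-ablation ("abliteration") technique estimates a refusal direction in activation space and projects it out of the weight matrices that write into the residual stream. The sketch below illustrates only that projection step; it is not the huihui-ai / P-E-W pipeline itself, and the refusal-direction estimation is omitted.

    import numpy as np

    def ablate_direction(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
        """Project the refusal direction out of a weight matrix that writes to the residual stream.

        W has shape (d_model, d_in); refusal_dir lives in residual-stream space and is
        typically estimated from mean activation differences between prompts the model
        refuses and prompts it answers (estimation not shown here).
        """
        r = refusal_dir / np.linalg.norm(refusal_dir)
        # W' = W - r (r^T W): the layer's output can no longer push the residual stream along r.
        return W - np.outer(r, r @ W)

    # Toy usage on random data, just to show shapes and the invariant.
    d_model, d_in = 8, 4
    W = np.random.randn(d_model, d_in)
    r = np.random.randn(d_model)
    W_abl = ablate_direction(W, r)
    print(np.allclose(r @ W_abl, 0.0))  # True: no remaining component along the refusal direction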

GGUF Quantizations Available

  • Q4_K_M: 18.6 GB (recommended for 48 GB VRAM)
  • Q5_K_M: 21.7 GB
  • Q6_K: 25.1 GB
  • Q8_0: 32.5 GB
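
To turn these file sizes into a deployment decision, compare them against the VRAM budget minus headroom for KV cache and activations. A minimal helper, assuming a flat (and hypothetical) 6 GB overhead margin:

    # Which of the listed GGUF files fit a given VRAM budget, leaving headroom for
    # KV cache and activations? The 6 GB margin is an assumption; long contexts need more,
    # which is one reason Q4_K_M is the conservative pick even at 48 GB.
    QUANTS_GB = {"Q4_K_M": 18.6, "Q5_K_M": 21.7, "Q6_K": 25.1, "Q8_0": 32.5}

    def fitting_quants(vram_gb: float, overhead_gb: float = 6.0) -> list[str]:
        usable = vram_gb - overhead_gb
        return [name for name, size in QUANTS_GB.items() if size <= usable]

    print(fitting_quants(48.0))  # ['Q4_K_M', 'Q5_K_M', 'Q6_K', 'Q8_0']
    print(fitting_quants(24.0))  # [] -- a single 24 GB card is too tight with this margin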

Alternative Models Considered

  • Standard qwen3:30b-a3b (censored - rejected)
  • qwen2.5-coder:32b (older, censored)
  • DavidAU 42B variant (larger, may not fit)

Related

  • Track: LARS Implementation Sprint (fc0dd483)
  • Track: LARS Training System (0dd041be)
  • KB: Nexus AI Engine (e213b1c0)
  • KB: Un-sloth Setup Guide (1c99b911)

ID: b21cea46
Path: Qwen3 Coder Research
Updated: 2026-01-13T12:51:00