AM-DeepSeek-R1-Distilled Dataset Analysis

AM-DeepSeek-R1-Distilled-1.4M Analysis

Date: 2025-12-29

Dataset Structure

{
  "messages": [
    {
      "role": "user",
      "content": "<prompt>",
      "info": {"source": "...", "reference_answer": "..."}
    },
    {
      "role": "assistant",
      "content": "<think>reasoning</think><answer>solution</answer>",
      "info": {"think_content": "...", "answer_content": "..."}
    }
  ]
}
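
A minimal sketch of how a record in this layout could be split into its reasoning and answer parts, assuming the `<think>` and `<answer>` tags appear exactly as shown above. The function name `split_think_answer` and the sample record are hypothetical and only illustrate the format; they are not taken from the dataset.

```python
import re

# Assumes the tag layout shown above: <think>...</think><answer>...</answer>
_TAGS = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def split_think_answer(content):
    """Return (think, answer) from an assistant message, or None if the tags are missing."""
    match = _TAGS.search(content)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()

# Hypothetical record following the structure above (the "info" fields are omitted).
record = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {
            "role": "assistant",
            "content": "<think>Okay, let me see. 2 + 2 = 4.</think><answer>4</answer>",
        },
    ]
}

# Pick the assistant turn and split it into reasoning and answer.
assistant = next(m for m in record["messages"] if m["role"] == "assistant")
think, answer = split_think_answer(assistant["content"])
```

Where the `info` object already carries `think_content` and `answer_content`, reading those fields directly is likely simpler than parsing the tags.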

Thinking Patterns Observed

Natural language: 'Okay, let me see', 'Hmm', 'I remember that...'
Self-questioning: 'So maybe I can...', 'Then I can...'
Step enumeration in answers
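
One rough way to check for these patterns is to count marker phrases in the thinking text. The sketch below uses only the phrases quoted above; the grouping labels, the `count_markers` name, and the example string are assumptions made for illustration. Plain substring matching will over-count in some cases (for example, "hmm" inside a longer word).

```python
from collections import Counter

# Phrases quoted in the observations above; the labels are illustrative.
MARKERS = {
    "natural_language": ["okay, let me see", "hmm", "i remember that"],
    "self_questioning": ["so maybe i can", "then i can"],
}

def count_markers(think_text):
    """Count case-insensitive occurrences of each marker group in a thinking trace."""
    text = think_text.lower()
    counts = Counter()
    for label, phrases in MARKERS.items():
        counts[label] = sum(text.count(phrase) for phrase in phrases)
    return counts

# Hypothetical thinking trace.
example = "Okay, let me see. Hmm, I remember that 12 = 3 * 4. So maybe I can factor it."
counts = count_markers(example)
```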

Metrics

Average thinking length: 3,000-3,500 characters
Average answer length: 300-1,300 characters
Ratio: ~3:1 (thinking to answer, by character count)
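
Figures like these could be computed along the following lines. This is a sketch under assumptions: the notes do not say whether the ratio is a ratio of averages or an average of per-sample ratios, so the code uses the ratio of averages. The `length_stats` name and the toy data are hypothetical.

```python
from statistics import mean

def length_stats(pairs):
    """Average character lengths of (think, answer) pairs and their ratio.

    The ratio is avg_think / avg_answer (ratio of averages), which is an
    assumption; the notes do not specify how the ratio was derived.
    """
    pairs = list(pairs)
    avg_think = mean(len(think) for think, _ in pairs)
    avg_answer = mean(len(answer) for _, answer in pairs)
    return {
        "avg_think": avg_think,
        "avg_answer": avg_answer,
        "ratio": avg_think / avg_answer,
    }

# Toy data for illustration only, not drawn from the dataset.
sample = [("a" * 3000, "b" * 1000), ("a" * 3600, "b" * 1200)]
stats = length_stats(sample)
```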

Applied To

Created DS-002 (lars_3d_identity.json) based on this format
Successfully trained LARS with 3D reasoning (EXP-003)
