AM-DeepSeek-R1-Distilled-1.4M Analysis
Date: 2025-12-29
Dataset Structure
{
  "messages": [
    {
      "role": "user",
      "content": "<prompt>",
      "info": {"source": "...", "reference_answer": "..."}
    },
    {
      "role": "assistant",
      "content": "<think>reasoning</think><answer>solution</answer>",
      "info": {"think_content": "...", "answer_content": "..."}
    }
  ]
}
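A minimal sketch of reading one record in this layout, assuming the assistant content always wraps its two parts in a single <think> block followed by a single <answer> block. The function name split_assistant_content and the sample record are hypothetical and are not taken from the dataset.

```python
import json
import re

# Hypothetical sample record following the structure above (not real dataset content).
SAMPLE = json.loads("""
{
  "messages": [
    {"role": "user", "content": "What is 2 + 2?",
     "info": {"source": "example", "reference_answer": "4"}},
    {"role": "assistant",
     "content": "<think>Okay, let me see. 2 + 2 is 4.</think><answer>4</answer>",
     "info": {"think_content": "Okay, let me see. 2 + 2 is 4.", "answer_content": "4"}}
  ]
}
""")

# Assumption: exactly one <think> block followed by one <answer> block.
_PATTERN = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def split_assistant_content(content: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a tagged assistant message."""
    match = _PATTERN.search(content)
    if match is None:
        raise ValueError("content does not match the <think>/<answer> layout")
    return match.group(1).strip(), match.group(2).strip()


assistant = next(m for m in SAMPLE["messages"] if m["role"] == "assistant")
thinking, answer = split_assistant_content(assistant["content"])
```

Comparing the parsed parts against info.think_content and info.answer_content is a cheap consistency check when loading records.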
Thinking Patterns Observed
- Natural language: 'Okay, let me see', 'Hmm', 'I remember that...'
- Self-questioning: 'So maybe I can...', 'Then I can...'
- Step enumeration in answers
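The patterns above could be checked mechanically with something like the following sketch. The phrase lists come from the observations in this note, but the grouping, the function names, and the step-enumeration heuristic are assumptions made for illustration, not how the original observations were produced.

```python
import re

# Marker phrases taken from the observations above; the grouping is illustrative.
MARKERS = {
    "natural_language": ["okay, let me see", "hmm", "i remember that"],
    "self_questioning": ["so maybe i can", "then i can"],
}


def count_markers(thinking: str) -> dict[str, int]:
    """Count case-insensitive occurrences of each marker group in a thinking trace."""
    text = thinking.lower()
    return {
        group: sum(text.count(phrase) for phrase in phrases)
        for group, phrases in MARKERS.items()
    }


def has_step_enumeration(answer: str) -> bool:
    """Heuristic: true if at least two lines start with '1.', '2)', 'Step 3', etc."""
    steps = re.findall(
        r"^\s*(?:step\s+\d+|\d+[.)])",
        answer,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    return len(steps) >= 2
```

Plain substring counting is crude (it will also match "hmm" inside longer words), so the counts are best read as rough signals rather than exact frequencies.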
Metrics
- Average thinking length: 3000-3500 characters
- Average answer length: 300-1300 characters
- Thinking-to-answer length ratio: ~3:1
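Figures like these can be recomputed from the info fields of each assistant message. The sketch below assumes records are already loaded as Python dicts in the structure shown earlier. The function name length_metrics is hypothetical, and the ratio is taken as the average thinking length divided by the average answer length, which may differ from how the numbers in this note were derived.

```python
from statistics import mean


def length_metrics(records: list[dict]) -> dict[str, float]:
    """Average think/answer lengths in characters, plus their ratio."""
    think_lens, answer_lens = [], []
    for record in records:
        for message in record["messages"]:
            if message["role"] != "assistant":
                continue
            info = message["info"]
            think_lens.append(len(info["think_content"]))
            answer_lens.append(len(info["answer_content"]))
    avg_think = mean(think_lens)
    avg_answer = mean(answer_lens)
    return {
        "avg_think_chars": avg_think,
        "avg_answer_chars": avg_answer,
        # Ratio of averages; averaging per-record ratios would weight short answers more.
        "ratio": avg_think / avg_answer,
    }
```

Reporting the ratio of averages keeps a handful of very short answers from dominating the result.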
Applied To
- Created DS-002 (lars_3d_identity.json) based on this format
- Successfully trained LARS with 3D reasoning (EXP-003)
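For reference, a record in the same format could be assembled as in the sketch below. How DS-002 was actually built is not recorded in this note. The helper make_record, its parameters, the default source value, and the choice to copy the answer into reference_answer are all assumptions for illustration.

```python
import json


def make_record(prompt: str, thinking: str, answer: str, source: str = "custom") -> dict:
    """Build one two-message record in the <think>/<answer> layout."""
    return {
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Assumption: the reference answer mirrors the assistant's answer.
                "info": {"source": source, "reference_answer": answer},
            },
            {
                "role": "assistant",
                "content": f"<think>{thinking}</think><answer>{answer}</answer>",
                "info": {"think_content": thinking, "answer_content": answer},
            },
        ]
    }


record = make_record("What is 2 + 2?", "Okay, let me see. 2 + 2 is 4.", "4")
serialized = json.dumps(record, ensure_ascii=False)
```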