LARS 3D Reasoning Dataset Format
Overview
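The 3D Reasoning format teaches AI models HOW to think, not just WHAT to answer: each training example pairs the prompt with an explicit reasoning trace and a final answer. LARS is trained on this format so that it learns reusable reasoning patterns rather than memorized responses.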
Based on: AM-DeepSeek-R1-Distilled-1.4M dataset (1.4 million reasoning examples)
The 3D Format Structure
JSON Schema
{
"messages": [
{
"role": "user",
"content": "The question or prompt"
},
{
"role": "assistant",
"content": "<think>\n[Step-by-step reasoning process]\n[Analysis of the problem]\n[Consideration of options]\n[Logic chain to reach conclusion]\n</think>\n<answer>\n[Final polished response]\n[Clear, actionable answer]\n</answer>"
}
]
}
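A record in this shape can be assembled programmatically. The sketch below is illustrative only: `make_record` is a hypothetical helper name, not part of any LARS tooling, and it assumes the reasoning and the answer are already available as plain strings.

```python
import json


def make_record(question: str, reasoning: str, answer: str) -> dict:
    """Build one 3D-format training record (hypothetical helper)."""
    # Wrap the reasoning and the answer in the tags the schema expects.
    content = (
        f"<think>\n{reasoning.strip()}\n</think>\n"
        f"<answer>\n{answer.strip()}\n</answer>"
    )
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": content},
        ]
    }


record = make_record(
    "What is 2 + 2?",
    "Adding 2 and 2 gives 4.",
    "4",
)
print(json.dumps(record, indent=2))
```

Keeping the tag wrapping in one function means every record gets the same delimiters and newline layout, which is easier to validate later.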
Key Elements
| Element | Purpose |
|---|---|
| <think> | Opens the reasoning block; shows HOW to arrive at the answer |
| </think> | Closes the reasoning block |
| <answer> | Opens the final response block |
| </answer> | Closes the final response block |
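The four tags make an assistant message easy to check mechanically. The following is a minimal sketch, assuming each message contains exactly one reasoning block followed by exactly one answer block; `split_reasoning` is a hypothetical name introduced here for illustration.

```python
import re

# One <think> block followed by one <answer> block, nothing else around them.
_PATTERN = re.compile(
    r"\A\s*<think>\n?(?P<think>.*?)\n?</think>\s*"
    r"<answer>\n?(?P<answer>.*?)\n?</answer>\s*\Z",
    re.DOTALL,
)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer); raise ValueError if the tags are malformed."""
    match = _PATTERN.match(content)
    if match is None:
        raise ValueError("expected <think>...</think> followed by <answer>...</answer>")
    return match.group("think").strip(), match.group("answer").strip()


reasoning, answer = split_reasoning(
    "<think>\nstep 1\nstep 2\n</think>\n<answer>\n42\n</answer>"
)
```

A message that lacks either block raises `ValueError`, so the same function can serve as a filter when cleaning generated data.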
Why 3D vs 2D?
2D Format (Simple Q&A)
- Prompt → Response
- Model memorizes answers
- No reasoning transfer
- Limited generalization
3D Format (Reasoning Chain)
- Prompt → Thinking → Response
- Model learns reasoning patterns
- Transfers to new problems
- Better generalization
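The difference between the two formats can be made concrete by reducing a 3D record to its 2D equivalent: drop the reasoning block and keep only the final response. This is a sketch under the assumption that records follow the schema above; `to_2d` is a hypothetical helper, not an existing LARS function.

```python
import re


def to_2d(record: dict) -> dict:
    """Return a copy of a 3D record with the reasoning removed (hypothetical helper)."""
    messages = []
    for message in record["messages"]:
        content = message["content"]
        if message["role"] == "assistant":
            # Drop the reasoning block, then unwrap the <answer> tags.
            content = re.sub(r"<think>.*?</think>\s*", "", content, flags=re.DOTALL)
            content = re.sub(r"</?answer>", "", content).strip()
        messages.append({"role": message["role"], "content": content})
    return {"messages": messages}


three_d = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {
            "role": "assistant",
            "content": "<think>\n2 + 2 is 4.\n</think>\n<answer>\n4\n</answer>",
        },
    ]
}
two_d = to_2d(three_d)
```

The 2D record keeps the same prompt and final answer; what it loses is the intermediate reasoning that the 3D format is designed to teach.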
Example: Salary Negotiation
{
"messages": [
{
"role": "user",
"content": "What are the best salary negotiation tactics?"
},
{
"role": "assistant",
"content": "<think>\nLet me analyze effective salary negotiation strategies...\n\nFirst, research is crucial - knowing market rates gives leverage.\nSecond, timing matters - negotiate after receiving an offer, not before.\nThird, focus on value provided, not personal needs.\nFourth, practice the conversation to build confidence.\nFifth, be prepared to walk away if the offer doesn't meet minimums.\n</think>\n<answer>\nThe most effective salary negotiation tactics are:\n\n1. **Research thoroughly** - Know your market value\n2. **Time it right** - Negotiate after the offer\n3. **Lead with value** - Emphasize what you bring\n4. **Practice beforehand** - Rehearse the conversation to build confidence\n5. **Have a walk-away number** - Know your minimum\n</answer>"
}
]
}
Generating 3D Datasets
Using Synth Server
gateway.run([{
server: 'synth',
tool: 'dataset',
args: {
content: 'source content here',
content_type: 'transcript',
format: 'reasoning' // DEFAULT - uses 3D format
}
}])
Prompt Template for AI Generation
Given this content, generate training examples with:
1. A user question
2. An assistant response containing:
- <think> section with step-by-step reasoning
- </think> to close reasoning
- <answer> section with final response
- </answer> to close
The thinking shows HOW to arrive at the answer.
The answer is the polished final response.
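One way to use this template is to store it as a string and append the source content before sending it to a generator model. The sketch below assumes the content is appended under a trailing "Content:" label; that label and the `build_prompt` name are assumptions made for illustration, not part of the original template.

```python
PROMPT_TEMPLATE = """\
Given this content, generate training examples with:
1. A user question
2. An assistant response containing:
   - <think> section with step-by-step reasoning
   - </think> to close reasoning
   - <answer> section with final response
   - </answer> to close
The thinking shows HOW to arrive at the answer.
The answer is the polished final response.

Content:
{content}
"""


def build_prompt(content: str) -> str:
    """Fill the template with source content (hypothetical helper)."""
    # str.format only interprets braces in the template, so braces that
    # appear inside `content` are inserted unchanged.
    return PROMPT_TEMPLATE.format(content=content.strip())


prompt = build_prompt("  Some transcript text.  ")
```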
Sources
- Dataset: https://huggingface.co/datasets/a-m-team/AM-DeepSeek-R1-Distilled-1.4M
- Paper: https://arxiv.org/abs/2503.19633
- Track Project: 53134a0f (3D Reasoning Training Research)
Integration Points
- Synth Server: Generates 3D datasets from any content
- Transcript Server: Auto-generates datasets from YouTube transcripts
- KB Server: Can generate datasets from knowledge base content
- Corpus Server: Can generate datasets from documents
Training Notes
- Use SFT (Supervised Fine-Tuning) with this format
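- According to the paper, AM-Distill-Qwen-32B, trained with SFT on this data, outperformed DeepSeek-R1-Distill-Qwen-32B on the four benchmarks it reports (AIME2024, MATH-500, GPQA-Diamond, LiveCodeBench)
- The quality of the thinking chain matters more than quantity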
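Before fine-tuning, records with an empty or missing reasoning block can be dropped and the rest serialized for the trainer. This is a rough sketch: the one-object-per-line (JSON Lines) layout, the `min_chars` threshold, and the helper names are all assumptions, and a character count is only a crude stand-in for reasoning quality.

```python
import json
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def has_reasoning(record: dict, min_chars: int = 1) -> bool:
    """True if the last message carries a non-empty <think> block."""
    content = record["messages"][-1]["content"]
    match = _THINK.search(content)
    return match is not None and len(match.group(1).strip()) >= min_chars


def to_jsonl(records: list[dict]) -> str:
    """Serialize the records that pass the check, one JSON object per line."""
    kept = [r for r in records if has_reasoning(r)]
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in kept)


good = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "<think>\n2 + 2 is 4.\n</think>\n<answer>\n4\n</answer>"},
    ]
}
empty = {
    "messages": [
        {"role": "user", "content": "What is 3 + 3?"},
        {"role": "assistant", "content": "<think>\n</think>\n<answer>\n6\n</answer>"},
    ]
}
jsonl = to_jsonl([good, empty])
```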