
3D Reasoning Dataset Format

training lars dataset 3d-reasoning critical deepseek

LARS 3D Reasoning Dataset Format

Overview

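The 3D Reasoning format teaches a model how to think, not just what to answer: each training example pairs the final answer with the step-by-step reasoning that leads to it. This format is critical for training LARS to reason through new problems rather than recall memorized answers.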

Based on: AM-DeepSeek-R1-Distilled-1.4M dataset (1.4 million reasoning examples)

The 3D Format Structure

JSON Schema

{
  "messages": [
    {
      "role": "user",
      "content": "The question or prompt"
    },
    {
      "role": "assistant",
      "content": "<think>\n[Step-by-step reasoning process]\n[Analysis of the problem]\n[Consideration of options]\n[Logic chain to reach conclusion]\n</think>\n<answer>\n[Final polished response]\n[Clear, actionable answer]\n</answer>"
    }
  ]
}

Key Elements

Element      Purpose
<think>      Opens the reasoning block - shows HOW to arrive at answer
</think>     Closes the reasoning block
<answer>     Opens the final response block
</answer>    Closes the final response block
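
The four tags above can be checked mechanically before a record goes into a training set. The following is a minimal sketch, not part of LARS itself: the function name `split_response` is hypothetical, and it assumes that only whitespace may appear before, between, or after the two blocks.

```python
import re

# Layout from the table above: one <think> block followed by one <answer> block.
# Assumption: only whitespace is allowed outside the two blocks.
_RESPONSE_RE = re.compile(
    r"\A\s*<think>(?P<thinking>.*?)</think>\s*<answer>(?P<answer>.*?)</answer>\s*\Z",
    re.DOTALL,
)


def split_response(content: str) -> tuple[str, str]:
    """Return (thinking, answer) from an assistant message, or raise ValueError."""
    match = _RESPONSE_RE.match(content)
    if match is None:
        raise ValueError("content does not follow <think>...</think><answer>...</answer>")
    thinking = match.group("thinking").strip()
    answer = match.group("answer").strip()
    if not thinking or not answer:
        raise ValueError("both the thinking block and the answer block must be non-empty")
    return thinking, answer
```

Calling `split_response` on the assistant `content` of each record gives an early failure for records with a missing or unclosed tag, which is cheaper to catch here than after training.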

Why 3D vs 2D?

2D Format (Simple Q&A)

  • Prompt → Response
  • Model memorizes answers
  • No reasoning transfer
  • Limited generalization

3D Format (Reasoning Chain)

  • Prompt → Thinking → Response
  • Model learns reasoning patterns
  • Transfers to new problems
  • Better generalization
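
The difference between the two formats comes down to one extra field. As a rough sketch, the helper below wraps a question, its reasoning, and its answer in the messages schema shown earlier. The name `to_3d_record` and the sample question are illustrative assumptions, not part of any LARS API.

```python
import json


def to_3d_record(question: str, thinking: str, answer: str) -> dict:
    """Wrap a question, its reasoning, and its final answer in the messages schema."""
    content = (
        f"<think>\n{thinking.strip()}\n</think>\n"
        f"<answer>\n{answer.strip()}\n</answer>"
    )
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": content},
        ]
    }


# A 2D pair carries only the question and the answer; the 3D record also
# needs the reasoning. The values below are made-up illustration data.
record = to_3d_record(
    question="Is 91 a prime number?",
    thinking="91 = 7 * 13, so it has divisors other than 1 and itself.",
    answer="No. 91 is divisible by 7 and 13.",
)
print(json.dumps(record, indent=2))
```

An existing 2D dataset cannot be upgraded by this helper alone: the reasoning text has to come from somewhere, for example from a stronger model or a human annotator.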

Example: Salary Negotiation

{
  "messages": [
    {
      "role": "user",
      "content": "What are the best salary negotiation tactics?"
    },
    {
      "role": "assistant",
      "content": "<think>\nLet me analyze effective salary negotiation strategies...\n\nFirst, research is crucial - knowing market rates gives leverage.\nSecond, timing matters - negotiate after receiving an offer, not before.\nThird, focus on value provided, not personal needs.\nFourth, practice the conversation to build confidence.\nFifth, be prepared to walk away if the offer doesn't meet minimums.\n</think>\n<answer>\nThe most effective salary negotiation tactics are:\n\n1. **Research thoroughly** - Know your market value\n2. **Time it right** - Negotiate after the offer\n3. **Lead with value** - Emphasize what you bring\n4. **Use silence** - After stating your number, wait\n5. **Have a walk-away number** - Know your minimum\n</answer>"
    }
  ]
}

Generating 3D Datasets

Using Synth Server

gateway.run([{
    server: 'synth',
    tool: 'dataset',
    args: {
        content: 'source content here',
        content_type: 'transcript',
        format: 'reasoning'  // DEFAULT - uses 3D format
    }
}])

Prompt Template for AI Generation

Given this content, generate training examples with:
1. A user question
2. An assistant response containing:
   - <think> section with step-by-step reasoning
   - </think> to close reasoning  
   - <answer> section with final response
   - </answer> to close

The thinking shows HOW to arrive at the answer.
The answer is the polished final response.
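
One way to use the template above is to keep it as a constant and append the source material to it. This is only a sketch: `build_generation_prompt` is a hypothetical helper, and the trailing "Content:" section is an assumption about how the source text is attached, since the template itself does not say.

```python
# The template text is copied from the section above. The trailing
# "Content:" block is an assumption, not part of the original template.
PROMPT_TEMPLATE = """Given this content, generate training examples with:
1. A user question
2. An assistant response containing:
   - <think> section with step-by-step reasoning
   - </think> to close reasoning
   - <answer> section with final response
   - </answer> to close

The thinking shows HOW to arrive at the answer.
The answer is the polished final response.

Content:
{content}
"""


def build_generation_prompt(content: str) -> str:
    """Fill the template with the source material (transcript, document, etc.)."""
    return PROMPT_TEMPLATE.format(content=content.strip())
```

The resulting string can be sent to whichever model generates the examples. Braces inside `content` are safe here because only the template, not the inserted text, is interpreted by `str.format`.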

Sources

  • Dataset: https://huggingface.co/datasets/a-m-team/AM-DeepSeek-R1-Distilled-1.4M
  • Paper: https://arxiv.org/abs/2503.19633
  • Track Project: 53134a0f (3D Reasoning Training Research)

Integration Points

  • Synth Server: Generates 3D datasets from any content
  • Transcript Server: Auto-generates datasets from YouTube transcripts
  • KB Server: Can generate datasets from knowledge base content
  • Corpus Server: Can generate datasets from documents

Training Notes

  • Use SFT (Supervised Fine-Tuning) with this format
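  • AM-Distill-Qwen-32B, trained with SFT alone on this data, outperformed DeepSeek-R1-Distill-Qwen-32B on the benchmarks reported in the paper listed under Sources (AIME2024, MATH-500, GPQA-Diamond, LiveCodeBench)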
  • Quality of thinking chain is more important than quantity
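
Following the quality note above, a dataset can be screened for records whose reasoning is empty or trivially short before fine-tuning. The sketch below is a crude structural check only; it cannot tell whether the reasoning is correct. The function names and the `min_steps` threshold are illustrative assumptions, not values from the source.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def reasoning_steps(record: dict) -> int:
    """Count non-empty lines inside the <think> block of the last message."""
    content = record["messages"][-1]["content"]
    match = _THINK_RE.search(content)
    if match is None:
        return 0
    return sum(1 for line in match.group(1).splitlines() if line.strip())


def keep_record(record: dict, min_steps: int = 2) -> bool:
    # min_steps is an arbitrary illustrative threshold, not a recommended value.
    return reasoning_steps(record) >= min_steps
```

A list of records can then be filtered with `[r for r in records if keep_record(r)]`. Judging the substance of the reasoning still needs human review or a separate grading model.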
ID: be6097dc
Updated: 2026-01-06T16:35:10