
Open Reproduction of DeepSeek-R1

81 points | 3 hours ago | github.com

Tiberium 2 hours ago

Last update over a year ago, so I hope (2025) gets added to the title:

> [2025/05/26] (Step 1 completed!) We release Mixture-of-Thoughts--a curated reasoning dataset of 350k verified traces distilled from R1. The dataset spans tasks in mathematics, coding, and science, and is designed to teach language models to reason step-by-step. We also provide a recipe to train OpenR1-Distill-7B, which replicates the reasoning capabilities of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and marks the completion of step 1 in the Open R1 project.

Doesn't look like they managed to actually reproduce R1; they stopped after Step 1 of their 3-step plan.

spmurrayzzz 2 hours ago

One of my favorite code comments of all time is still in the src:

"# TODO: implement a proper validator to compare against ground truth. For now we just check for exact string match on each line of stdout." [1]

This was one of my chief complaints about the entire R1 news cycle: it felt like no one actually read the technical report. DeepSeek was being heralded for its openness, but they left out the most meaningful details that you'd need to reproduce their work.

[1] https://github.com/huggingface/open-r1/blob/1416fa0cf21595d2...
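To make the gap that TODO describes concrete, here is a minimal sketch contrasting a strict per-line stdout comparison with a more tolerant one. This is not the open-r1 code; the function names `exact_line_match` and `tolerant_match` and the tolerance value are hypothetical, chosen only for illustration.

```python
import math


def exact_line_match(expected: str, actual: str) -> bool:
    """Strict check: every stdout line must be identical, in order."""
    return expected.splitlines() == actual.splitlines()


def tolerant_match(expected: str, actual: str, rel_tol: float = 1e-6) -> bool:
    """Looser check (an assumption, not the project's validator):
    compare whitespace-separated tokens, numerically where both parse as floats."""
    exp_lines = [line.split() for line in expected.strip().splitlines()]
    act_lines = [line.split() for line in actual.strip().splitlines()]
    if len(exp_lines) != len(act_lines):
        return False
    for exp_tokens, act_tokens in zip(exp_lines, act_lines):
        if len(exp_tokens) != len(act_tokens):
            return False
        for exp_tok, act_tok in zip(exp_tokens, act_tokens):
            if exp_tok == act_tok:
                continue
            try:
                # Fall back to a numeric comparison for tokens like "0.5" vs "0.50".
                if not math.isclose(float(exp_tok), float(act_tok), rel_tol=rel_tol):
                    return False
            except ValueError:
                # Non-numeric tokens that differ are a real mismatch.
                return False
    return True
```

With the strict check, a correct program that prints `0.50` instead of `0.5`, or leaves trailing whitespace, is scored as wrong; the tolerant version accepts both but still rejects genuinely different answers.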

neutronicus 1 hour ago

Reminds me of my days in a computational physics PhD program.

aesthesia 1 hour ago

If you really want to see fully open training pipelines for modern LLMs, Olmo and to a lesser extent Nemotron are what you should look at.

https://github.com/allenai/OLMo

https://github.com/NVIDIA-NeMo/Nemotron

spijdar 39 minutes ago

I'm not really familiar with either, but I'm more familiar with Olmo. My impression is Nemotron is newer -- why is it less applicable? Is it not totally open like Olmo?

madiator 2 hours ago

Check out OpenThoughts. It has a widely used dataset, a model that beats DeepSeek's smaller reasoning models, and a paper that describes the data curation methodology in detail.

https://www.open-thoughts.ai/

yogthos 1 hour ago

neat

poppafuze 28 minutes ago

"This will likely involve curating new, large-scale datasets for math, reasoning, and code." Everybody likes to hand-wave on this.

yieldcrv 2 hours ago

Too old now

christkv 2 hours ago

What is the estimated cost these days to train something like this to conclusion?
