Guide: Train a Reasoning Model
Build a reasoning model, like o3 or R1, for your use case
Kiln is a platform that makes building task-specific AI models easy and fast. By creating a fine-tuned model targeted to your use case, you can produce a model that's higher quality, faster, and cheaper than standard foundation models.
In this guide, we'll walk through how to build a reasoning model, like OpenAI o3 or Deepseek R1, for your specific use case. The whole process can be completed in as little as 30 minutes, and does not require coding.
We already have a guide for fine-tuning models in Kiln. This article covers the settings to use throughout that process to ensure the final model you produce is a reasoning model, and not just a standard fine-tune.
When creating your training dataset, be sure to filter it to samples that include reasoning/thinking data.
To train your own reasoning model, you must select the Final Responses and Intermediate Reasoning training strategy. This will include the reasoning data in the fine-tune data.
If you select Final only, the fine-tune will only learn from the final result, not the reasoning. This is still a valid approach and can produce a viable model for your task; however, it won't produce a model with learned reasoning skills.
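For intuition, a single training example under each strategy might look roughly like the following. These are hypothetical OpenAI-style chat rows, not Kiln's exact export format; Kiln generates the real training file for you.

```python
import json

# Hypothetical "Final Responses and Intermediate Reasoning" row: the model's
# reasoning appears as its own turn, so the fine-tune learns to think first.
with_reasoning = {
    "messages": [
        {"role": "system", "content": "<your task prompt + thinking instructions>"},
        {"role": "user", "content": "<task input>"},
        {"role": "assistant", "content": "<intermediate reasoning>"},
        {"role": "user", "content": "Considering the above, return a final response."},
        {"role": "assistant", "content": "<final response>"},
    ]
}

# Hypothetical "Final only" row: the reasoning turns are dropped, so the
# model learns to map the input directly to the final response.
final_only = {
    "messages": [
        {"role": "system", "content": "<your task prompt>"},
        {"role": "user", "content": "<task input>"},
        {"role": "assistant", "content": "<final response>"},
    ]
}

print(json.dumps(with_reasoning, indent=2))
```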
Whenever you call a fine-tune, we recommend using the same prompt that was used in training.
The fine-tuning approach described in this article is a general one: it trains for an intermediate "thinking" output, and can be used for both reasoning models and chain of thought.
Both approaches can build great task-specific models. Which to choose depends on your use case, and it can be worth training and comparing several to find the best option for you.
Distill a Reasoning Model: Reasoning models have learned reasoning skills across a range of domains. If large reasoning models like Deepseek R1 perform well on your task, but are too expensive or slow, it can be a good choice to fine-tune a smaller model from R1 outputs (this is called distilling a model). The smaller model will learn task-specific reasoning patterns from R1 samples, and be faster and cheaper to run.
Chain of thought with default prompt: Sometimes a simple "think step by step" prompt is all you need for chain of thought to greatly improve your output quality. If large models work great with a simple prompt but smaller models fail to produce the same quality, you can build a fine-tune with task-specific examples so the smaller model can distill the thinking patterns from the larger model.
Chain of thought with a custom thinking prompt: When building a model for a specific task, it's very possible you or your team understand the nuance of the task better than a generalized model like Deepseek R1. If you can create a "thinking instructions" prompt that works well with large models like Sonnet or GPT-4o, you can use that to build a synthetic training set, create a fine-tune, and reproduce that quality on a much smaller and faster model (see the sketch after this list).
In each case, you're building a model that will be focused on the use-case samples it is trained on. This can produce a model that's faster, cheaper and higher quality than the original model, within the domain of your task.
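If you take the custom thinking prompt route, the instructions are yours to design. As a purely illustrative sketch (the task and the steps are hypothetical), task-specific thinking instructions might look like this:

```python
# Hypothetical thinking instructions for a support-ticket triage task.
# Encode the steps a domain expert would follow; if large models produce
# high-quality output with these, a fine-tune can learn the same pattern.
THINKING_INSTRUCTIONS = """\
Think step by step before answering:
1. Identify the product area the ticket refers to.
2. Quote any error message the user reports, verbatim.
3. Assess severity (low/medium/high) and justify it in one sentence.
4. Only then draft the final response.
"""
```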
Human curation and feedback can add the nuance that makes a truly great model and product. Kiln offers a number of tools to make this easy:
When developing your training data with our synthetic data generation tool, be sure to use either a reasoning model or chain-of-thought prompting. Using either of these will ensure your dataset has reasoning data to learn from. See our model documentation for which models have native reasoning support.
If you're using multi-shot prompting, also ensure your prompt examples have appropriate reasoning data. Consider a custom prompt with examples demonstrating ideal reasoning for your task, like the sketch below.
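For instance, a single multi-shot example demonstrating ideal reasoning might look like this, reusing the hypothetical triage task from the earlier sketch (the fields and format are illustrative only):

```python
# Hypothetical few-shot example for a ticket-triage task. Including the
# "Thinking" section shows the model the reasoning you want it to reproduce.
FEW_SHOT_EXAMPLE = """\
Input: The checkout page shows a blank screen after clicking Pay.
Thinking: The user describes a UI failure during payment. No error message is
quoted. Payment flows are business-critical, so severity should be high, in
the payments area.
Output: {"area": "payments", "severity": "high"}
"""
```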
If calling your model from custom code, follow the chat call flow described in our docs, and use the same prompts used to fine-tune the model, which you can find by clicking the model in the "Fine Tune" tab of Kiln's UI.
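As a minimal sketch, assuming your fine-tune is hosted behind an OpenAI-compatible API and was trained with a two-turn reasoning flow (the model ID, prompts, and follow-up message below are placeholders; copy the real values from Kiln):

```python
from openai import OpenAI

# Placeholders: use your real host, model ID, and the exact prompts shown in
# Kiln's "Fine Tune" tab. The two-turn flow below assumes the model was
# trained to produce intermediate reasoning before a final-response request.
client = OpenAI()  # set base_url/api_key for your fine-tune's host if needed

MODEL_ID = "ft:<your-fine-tuned-model-id>"
SYSTEM_PROMPT = "<the exact system prompt used in training>"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "<your task input>"},
]

# Turn 1: the model produces its intermediate reasoning.
thinking = client.chat.completions.create(model=MODEL_ID, messages=messages)
messages.append({"role": "assistant", "content": thinking.choices[0].message.content})

# Turn 2: ask for the final answer, conditioned on the reasoning above.
messages.append({"role": "user", "content": "Considering the above, return a final response."})
final = client.chat.completions.create(model=MODEL_ID, messages=messages)
print(final.choices[0].message.content)
```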
Have a subject matter expert rate your samples, and filter your training data to only use high-quality samples.
Have subject matter experts repair low-quality responses, giving the model important examples of places it likely would have failed without fine-tuning.
Use human-led chain-of-thought prompts, as described above, to generate reasoning data for fine-tuning.
When you find a pattern of bugs, use synthetic data generation to create samples of correct input/output pairs. Add these to your training set to fix the behaviour the next time you train.
Use Kiln's collaboration features to let anyone on your team contribute to model quality through feedback, data generation, and ratings. Our UI is designed for everyone, and does not require command-line or coding skills.