Reasoning & Chain of Thought

Improve your model's quality with inference time scaling

PreviousGuide: Train a Reasoning Model NextPrompts

Last updated 9 days ago

Reasoning & Chain of Thought

Improve your model's quality with inference time scaling

Want to dive in and build a reasoning model? See our

Kiln has powerful support for reasoning models and chain of thought. These techniques can generate higher quality results, while also reducing costs and improving performance.

What are reasoning models and chain of thought?

Reasoning models and chain of thought (COT) are methods that give models time to "think" before giving a final answer. Their "thinking" takes the form of discussing the request and possible answers in a stream of generated tokens. These additional tokens allow for more complex reasoning, step-by-step thinking, and have been shown to improve the quality of results.

These approaches are also known as "inference time scaling," where models improve from spending more compute power at inference time — as opposed to improving by spending more compute at training time.

While similar in some ways, the methods have some differences:

Chain of thought is a method that's been around for a few years, and simply involves asking the model to think before giving an answer. This can be as simple as appending "Think step by step" to your prompt or adding detailed instructions for what the model should "think" about before giving its final answer.
Reasoning/thinking models like Deepseek R1 or OpenAI's O3 are a newer form of inference time compute, where the model itself was trained to develop powerful reasoning skills. These models are trained with reinforcement learning, where the model is rewarded for being correct and penalized when incorrect. This training system uses deep learning to help models develop reasoning skills across a range of domains.

While reasoning models are generally more powerful than chain of thought, it's often worth testing both approaches for your use case. Thinking models strive to reason about everything effectively, but a well-crafted chain of thought prompt from a human expert can often outperform them when developing use-case-specific models/APIs.

How Kiln handles reasoning models and chain of thought

Kiln has native support for both these methods. This includes:

Reasoning Parsers: Kiln includes parsers that separate out "thinking" from answers for common thinking models.

Using Reasoning or COT in Kiln for inference

Using reasoning models or COT in Kiln is easy! Simply do one of the following:

Run a model with any "Chain of Thought" prompt selected, including a custom prompt with thinking instructions included.
Run a reasoning model (e.g., Deepseek R1) with any prompt. If a Chain of Thought prompt is used, the thinking instructions will be passed along to the model in the system prompt. If a non-COT prompt is used, the reasoning model will still "think," but using its own reasoning guidance.

Once the run is complete, you'll see both a final answer and reasoning in the model output.

Building your own reasoning model (distillation)

Kiln can fine-tune a thinking model using your dataset. Often called distillation, these models can learn the reasoning strategies for your use case from examples in your Kiln dataset. By fine-tuning a model, you can produce a model that's smaller, faster, cheaper, and better than the original model (for your use case).

See our guide on fine-tuning reasoning models

Kiln uses supervised fine tuning to distill reasoning models.

Performance & Cost

Reasoning and COT doesn't necessarily mean slower or more costly requests. Sometimes a smaller model with these methods can be faster, better, and cheaper than a larger model performing the same task. Fine-tuning can help further reduce costs and improve quality.

Supported Reasoning Models

While you can call OpenAI's reasoning models (o1, o3) from Kiln, they behave like normal models. OpenAI hides the reasoning tokens from users, only returning the final answer.

Custom Message Chat Flow

Here's the message call flow Kiln uses for each configuration:

Normal Call-Flow (non-reasoning model)

[System-Message]: System prompt
[User-Message]: User inputs
[Assistant-Message]: Final Answer, optionally structured data

Chain of Thought Call-Flow (non-reasoning model):

[System-Message]: System prompt
[User-Message]: User inputs
[User-Message]: Thinking instructions. User-provided if available, defaults to "Think step by step, explaining your reasoning."
[Assistant-Message]: COT reasoning tokens
[User-Message]: Kiln managed message: "Considering the above, return a final result."
[Assistant-Message]: Final Answer, optionally structured data

This flow is also used on fine-tunes you create with Kiln if the fine-tune was created with the "Final answer and intermediate reasoning" training strategy.

Reasoning Model Call-Flow:

[System-Message]: System prompt, optionally appending thinking instructions if the selected prompt includes them.
[User-Message]: User inputs
[Assistant-Message]: Final Answer and reasoning in one message, but will be parsed into separate reasoning and answer fields. Will parse structured data if the task has structured output.