Reasoning & Chain of Thought
Improve your model's quality with inference-time scaling
Want to dive in and build a reasoning model? See our Guide for Training A Reasoning Model
Kiln has powerful support for reasoning models and chain of thought. These techniques can generate higher quality results, while also reducing costs and improving performance.
Kiln has native support for both these methods. This includes:
Creating Reasoning Models: You can fine-tune/distill reasoning models using Kiln. Models train on your Kiln dataset, using samples generated from reasoning models. This approach allows you to build small, fast, and high-quality thinking models, tuned to your use case.
Data Model: Our data model stores thinking separately from final answers, allowing you to evaluate or train on them independently.
Custom Message Flow: When using chain-of-thought with models that don't support reasoning, Kiln makes a chain of calls to the model to formally separate the thinking from the answer.
Structured Data: Our chat call flow allows for the final answer messages to use structured data tools (json_schema, json_object, tool-calls, etc.) without adding "thinking" fields to your data structures.
Prompts: Prompts are divided into the primary system message and a separate "thinking instruction."
Reasoning Parsers: Kiln includes parsers that separate out "thinking" from answers for common thinking models.
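For illustration, here's a minimal sketch of what an R1-style reasoning parser does, assuming the model wraps its reasoning in <think>...</think> tags. The function and names below are illustrative, not Kiln's internal API:

```python
import re

# Minimal sketch of an R1-style reasoning parser. Models like Deepseek R1
# emit their reasoning inside <think>...</think> tags, followed by the
# final answer. Names below are illustrative, not Kiln's internal API.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Return (reasoning, final_answer) parsed from a raw model response."""
    match = THINK_BLOCK.search(raw_output)
    if not match:
        # No thinking tags: treat the whole response as the final answer.
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>120 km in 90 minutes is 120 / 1.5 = 80 km/h.</think>80 km/h"
)
print(reasoning)  # 120 km in 90 minutes is 120 / 1.5 = 80 km/h.
print(answer)     # 80 km/h
```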
Using reasoning models or COT in Kiln is easy! Simply do one of the following:
Run a model with any "Chain of Thought" prompt selected, including a custom prompt with thinking instructions included.
Run a reasoning model (e.g., Deepseek R1) with any prompt. If a Chain of Thought prompt is used, the thinking instructions will be passed along to the model in the system prompt. If a non-COT prompt is used, the reasoning model will still "think," but using its own reasoning guidance.
Once the run is complete, you'll see both a final answer and reasoning in the model output.
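For illustration, a completed run might look roughly like the following, with the reasoning stored separately from the final answer. The field names here are assumptions for illustration, not Kiln's exact schema:

```python
# Illustrative shape of a completed run. Field names are assumptions for
# illustration only, not Kiln's exact schema; the point is that the model's
# reasoning is stored alongside, but separate from, the final answer, so
# each can be evaluated or trained on independently.
run = {
    "input": "A train travels 120 km in 90 minutes. What is its average speed?",
    "output": "80 km/h",
    "intermediate_outputs": {
        "reasoning": "90 minutes is 1.5 hours, and 120 km / 1.5 h = 80 km/h.",
    },
}
```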
Kiln can fine-tune a thinking model using your dataset. This process, often called distillation, teaches the model the reasoning strategies for your use case from examples in your Kiln dataset. The result can be a model that's smaller, faster, cheaper, and better than the original (for your use case).
See our guide on fine-tuning reasoning models
See our guide for general fine-tuning (non-reasoning models)
Kiln uses supervised fine-tuning to distill reasoning models.
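As a rough sketch, a single chat-format training example that includes intermediate reasoning might look like the following. The exact JSONL schema depends on the fine-tuning provider; the OpenAI-style messages format here is an assumption, mirroring the chain of thought call flow described later on this page:

```python
import json

# Rough sketch of one chat-format training example that includes the model's
# intermediate reasoning, mirroring the chain of thought call flow described
# below. The exact JSONL schema depends on the fine-tuning provider; the
# OpenAI-style "messages" format here is an assumption.
example = {
    "messages": [
        {"role": "system", "content": "You are a support-ticket classifier."},
        {"role": "user", "content": "Ticket: 'App crashes when I open settings.'"},
        {"role": "user", "content": "Think step by step, explaining your reasoning."},
        {"role": "assistant", "content": "The ticket describes a crash on a specific screen, so it is a bug report rather than a feature request."},
        {"role": "user", "content": "Considering the above, return a final result."},
        {"role": "assistant", "content": "bug_report"},
    ]
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```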
Reasoning and COT don't necessarily mean slower or more costly requests. Sometimes a smaller model with these methods can be faster, better, and cheaper than a larger model performing the same task. Fine-tuning can help further reduce costs and improve quality.
Currently we support Deepseek R1 and its official distillations. We expect to see many more open reasoning models emerge over the next few months. See the model capability list in our docs for the latest.
While you can call OpenAI's reasoning models (o1, o3) from Kiln, they behave like normal models. OpenAI hides the reasoning tokens from users, only returning the final answer.
Here's the message call flow Kiln uses for each configuration:
Standard prompts (no reasoning or chain of thought):
[System-Message]: System prompt
[User-Message]: User inputs
[Assistant-Message]: Final Answer, optionally structured data
Chain of thought with non-reasoning models:
[System-Message]: System prompt
[User-Message]: User inputs
[User-Message]: Thinking instructions. User-provided if available, defaults to "Think step by step, explaining your reasoning."
[Assistant-Message]: COT reasoning tokens
[User-Message]: Kiln managed message: "Considering the above, return a final result."
[Assistant-Message]: Final Answer, optionally structured data
This flow is also used on fine-tunes you create with Kiln if the fine-tune was created with the "Final answer and intermediate reasoning" training strategy.
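For illustration, here is a minimal sketch of that two-call flow using the OpenAI Python SDK against any OpenAI-compatible endpoint. The client setup, model name, and task are assumptions; this is not Kiln's internal implementation:

```python
from openai import OpenAI

# Sketch of the two-call chain of thought flow above, using the OpenAI
# Python SDK against any OpenAI-compatible endpoint. The model name and
# task are placeholders; this is not Kiln's internal implementation.
client = OpenAI()
MODEL = "gpt-4o-mini"

messages = [
    {"role": "system", "content": "You are a helpful assistant for math word problems."},
    {"role": "user", "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"},
    # Thinking instructions: user-provided if available, else Kiln's default.
    {"role": "user", "content": "Think step by step, explaining your reasoning."},
]

# First call: collect the chain of thought reasoning tokens.
reasoning = client.chat.completions.create(model=MODEL, messages=messages).choices[0].message.content
messages.append({"role": "assistant", "content": reasoning})

# Second call: ask for the final answer. For tasks with structured output,
# this call can request json_schema / json_object / tool calls instead of text.
messages.append({"role": "user", "content": "Considering the above, return a final result."})
final = client.chat.completions.create(model=MODEL, messages=messages)

print(reasoning)
print(final.choices[0].message.content)
```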
Reasoning models:
[System-Message]: System prompt, optionally appending thinking instructions if the selected prompt includes them.
[User-Message]: User inputs
[Assistant-Message]: Final Answer and reasoning in one message. Kiln parses this into separate reasoning and answer fields, and parses structured data if the task has structured output.
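For illustration, here's a minimal sketch of this single-call flow, assuming an OpenAI-compatible endpoint serving an R1-style model that wraps its reasoning in think tags. The base URL, model name, and task are placeholders:

```python
from openai import OpenAI

# Sketch of the single-call reasoning model flow above, assuming an
# OpenAI-compatible endpoint serving an R1-style model. The base URL and
# model name are placeholders.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="none")

system_prompt = "You are a helpful assistant for math word problems."
thinking_instructions = "Think step by step, explaining your reasoning."

response = client.chat.completions.create(
    model="deepseek-r1:8b",
    messages=[
        # Thinking instructions appended to the system prompt, since the
        # selected prompt includes them.
        {"role": "system", "content": f"{system_prompt}\n\n{thinking_instructions}"},
        {"role": "user", "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"},
    ],
)

# The single reply contains both reasoning and answer. R1-style models wrap
# the reasoning in <think>...</think>, which a parser splits into separate
# reasoning and answer fields.
raw = response.choices[0].message.content
reasoning, _, answer = raw.partition("</think>")
print(reasoning.removeprefix("<think>").strip())
print(answer.strip())
```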