Organizing Datasets
Tag, Filter, Sort, Import, Freeze and Split

Using Tags to Organize Your Dataset
Kiln uses tags to organize your dataset. You can add tags to any run/sample and then filter by tag, which makes it easy to find specific runs and keep large datasets organized.
Some examples of how you might use tags within a team:
Working with eval teams:
Add the "needs_review" tag when data is ready for review by a human eval team
Ask a human eval team to review the new batch of synthetic data. New synth data is automatically tagged with "synthetic" and "synthetic_session_id".
Defining a "golden" dataset: Have QA tag "golden" data reserved for evals
Bug resolution: QA can tag examples of a common issue with a tag (e.g. "issue_unprofessional_tone"). Data scientists can run evals of different methods of fixing the issue.
Regression testing: tag important customer use cases (e.g. "customer_use_case") and run evals to ensure the model doesn't regress on them prior to a new release.
Fine-tuning: exclude tagged data (golden, customer_use_case, etc.) from fine-tuning datasets to prevent contamination.
Sort and Filter
The dataset view offers a number of tools that make working with large datasets easier:
Filter: Tap the filter button to filter to specific tags
Sort: You can sort by any column by clicking its header
Multi-select: you can enter "selection" mode by clicking the select button
Select any row by clicking it
Select a range of rows by clicking the first, then holding shift while clicking the last
Batch Editing
Once you have selected rows you can perform a number of batch actions:
Add tags
Remove tags
Delete dataset items
Importing Data into your Dataset
If you already have a dataset, it's easy to import it into Kiln. Open the dataset tab, then click "Upload File" to add your data.
The format must be a CSV file with a header row. The following columns are supported:
input
[Required] - The input to the task. If the task has an input schema, this must be a JSON string conforming to that schema.
output
[Required] - The output of the task. If the task has an output schema, this must be a JSON string conforming to that schema.
reasoning
[Optional] - If your model is a reasoning model that outputs reasoning/thinking text before the output (for example, R1, QwQ, etc.), you can provide that text here. It will be visible in the UI and available for fine-tuning a reasoning model.
chain_of_thought
[Optional] - If your model outputs chain-of-thought text before the output, you can provide that text here. It will be visible in the UI and available for fine-tuning a thinking model.
tags
[Optional] - A comma-separated string listing the tags you want to add to this row. For example: tag1, tag2.
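As a sketch, here is how you might build an import file in this format with Python's standard csv module. The column names come from the list above; the rows, filename, and tag values are hypothetical examples, not anything Kiln ships.

```python
import csv

# Hypothetical example rows. "input" and "output" are required;
# "reasoning" and "tags" are optional columns from the list above.
rows = [
    {
        "input": "Summarize: Kiln uses tags to organize datasets.",
        "output": "Kiln organizes datasets with tags.",
        "reasoning": "The user asked for a one-sentence summary.",
        "tags": "synthetic, needs_review",
    },
    {
        "input": "Translate 'hello' to French.",
        "output": "bonjour",
        "reasoning": "",
        "tags": "golden",
    },
]

with open("kiln_import.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "output", "reasoning", "tags"])
    writer.writeheader()  # Kiln requires a header row
    writer.writerows(rows)
```

Note that if your task has an input or output schema, the corresponding cells would need to contain JSON strings conforming to that schema rather than plain text.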

Dataset Splits
When creating a fine-tune, you can define a "dataset split". This is a frozen subset of your data.
Dataset splits may be broken into sub-sets like "train", "validation" and "test" which are useful for systematically training and evaluating models.
Dataset splits will randomly assign items between sub-sets (train/test/val), but the assignment is static. Items do not shift between subsets once the dataset split is created.
Dataset splits do not grow or change when you add new data. They are frozen at the point in time when they are created. This makes it easier to run multiple experiments (fine-tunes, evals, etc.) on exactly the same training/eval datasets.
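The behavior described above can be sketched as a one-time random assignment that is then frozen. This is an illustrative sketch only: the function name, split names, ratios, and seeding are assumptions, not Kiln's actual implementation.

```python
import random

def assign_splits(item_ids, ratios=None, seed=42):
    """Randomly assign items to subsets once; the mapping is then frozen.

    Hypothetical sketch: split names, ratios, and the seed are
    illustrative assumptions, not Kiln's implementation.
    """
    if ratios is None:
        ratios = {"train": 0.8, "test": 0.1, "val": 0.1}
    rng = random.Random(seed)
    shuffled = list(item_ids)
    rng.shuffle(shuffled)  # random but reproducible for a fixed seed
    assignment = {}
    start = 0
    names = list(ratios)
    for i, name in enumerate(names):
        # The last subset takes the remainder so every item is assigned.
        if i == len(names) - 1:
            end = len(shuffled)
        else:
            end = start + round(len(shuffled) * ratios[name])
        for item in shuffled[start:end]:
            assignment[item] = name
        start = end
    return assignment  # frozen: items added to the dataset later are not included

splits = assign_splits(range(100))
```

Because the assignment is computed once and stored, re-running an experiment against the same split sees exactly the same train/test/val membership, even if the underlying dataset has since grown.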