Frequently Asked Questions
What is this for?
Safe Training Data is a dataset for training and fine-tuning AI models that are safe, honest, and beneficial. Every document is rated by the community, so the data comes with human judgments about what is helpful and what is not.
Who is this for?
Anyone working on AI alignment or AI safety, or building language models that behave well. It is also useful for researchers who need a reference dataset to classify or filter larger collections of text.
How can I use the data?
Two main ways:
- Fine-tuning. Use the highly rated documents directly as training data for a language model you want to behave in prosocial, beneficial ways.
- Classification. Use the rated dataset as a reference to classify a much larger corpus. For example, if you have a large collection of documents and want to identify which ones are prosocial, you can use this data to train a classifier or build a scoring system.
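As a rough illustration of both uses, here is a minimal Python sketch. It assumes each record has a "text" field and a "rating" field holding one of the labels used on this site; those field names, the sample records, and the helper functions are hypothetical and do not describe the real export format. The scoring method (smoothed word log-odds between Great and Unsafe documents) is just one simple option, not the method the project uses.

```python
import math
import re
from collections import Counter

# Hypothetical records: the "text" and "rating" field names are assumptions,
# not the dataset's documented schema.
rated_docs = [
    {"text": "Thank you for asking, here is a clear and honest answer.", "rating": "Great"},
    {"text": "Happy to help, let us check the facts together.", "rating": "Great"},
    {"text": "You are foolish and I will trick you.", "rating": "Unsafe"},
    {"text": "I will trick people and hide the facts.", "rating": "Unsafe"},
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Use 1 (fine-tuning): keep only the highly rated documents.
fine_tuning_texts = [d["text"] for d in rated_docs if d["rating"] == "Great"]

# Use 2 (classification): count words in Great and Unsafe documents.
great_counts = Counter()
unsafe_counts = Counter()
for doc in rated_docs:
    if doc["rating"] == "Great":
        great_counts.update(tokenize(doc["text"]))
    elif doc["rating"] == "Unsafe":
        unsafe_counts.update(tokenize(doc["text"]))

vocab = set(great_counts) | set(unsafe_counts)


def score(text):
    """Sum of add-one-smoothed log-odds per known word.

    Positive values mean the text looks more like the Great documents,
    negative values mean it looks more like the Unsafe documents.
    Words that never appear in the reference data are ignored.
    """
    great_total = sum(great_counts.values()) + len(vocab)
    unsafe_total = sum(unsafe_counts.values()) + len(vocab)
    total = 0.0
    for token in tokenize(text):
        if token not in vocab:
            continue
        p_great = (great_counts[token] + 1) / great_total
        p_unsafe = (unsafe_counts[token] + 1) / unsafe_total
        total += math.log(p_great / p_unsafe)
    return total


# Score documents from a larger, unrated corpus.
for candidate in ["honest and clear help", "trick people"]:
    print(candidate, "->", round(score(candidate), 3))
```

With real data you would load far more documents and would likely replace the word-count scorer with a proper classifier. The sketch only shows why both highly rated and poorly rated examples are needed to produce a score.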
What should I upload?
Any text content you think is worth rating. The dataset benefits from the full range of quality: beneficial, neutral, and less beneficial content all have value, because the community ratings are what create the signal. You do not need to limit your uploads to content you think is great.
Content must comply with our Acceptable Use Policy.
How does the rating system work?
Each document can be rated as Unsafe, Neutral, or Great. These ratings are aggregated across all users who vote on a document. The community consensus determines where each document falls on the spectrum.
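The exact aggregation method is not described here, so the following Python sketch is only one plausible way to turn individual votes into a consensus. It assumes a simple majority vote with ties broken toward the more cautious label, plus a mean score on a -1 to 1 scale. The numeric scores, the tie-breaking rule, and the function name are all assumptions made for illustration.

```python
from collections import Counter

# The three labels come from the rating system described above.
# The numeric values are an assumption used only for this sketch.
RATING_SCORES = {"Unsafe": -1, "Neutral": 0, "Great": 1}


def aggregate_votes(votes):
    """Return (consensus_label, mean_score) for a list of rating labels.

    consensus_label is the most common label; ties go to the label with
    the lowest score (the more cautious choice). Returns (None, None)
    when there are no votes.
    """
    if not votes:
        return None, None
    counts = Counter(votes)
    highest = max(counts.values())
    tied = [label for label, n in counts.items() if n == highest]
    consensus = min(tied, key=lambda label: RATING_SCORES[label])
    mean_score = sum(RATING_SCORES[v] for v in votes) / len(votes)
    return consensus, mean_score


label, mean_score = aggregate_votes(["Great", "Great", "Neutral", "Unsafe"])
print(label, mean_score)
```

Keeping both the majority label and the mean score is a design choice: the label is easy to filter on, while the mean score shows how contested a document is.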
Do I need an account?
You can review and vote on documents without an account. To upload documents or access the full dataset, you need to create one.
Is the data free to use?
Yes. The dataset is open and free to download. You can use it for research, training, fine-tuning, or any other purpose.
Why include content that is not beneficial?
A dataset with only positive examples is less useful than one with the full spectrum. If you are training a classifier to identify prosocial content, you need examples of what is not prosocial as well. The community ratings provide that distinction.