A free, open dataset of clean text for training language models. Community-rated and filtered.
Share plain-text, Markdown, or HTML documents. We're looking for useful, non-harmful content.
Vote on documents. Help filter out low-quality or antisocial content.
Get the curated dataset filtered by community ratings.