olmOCR is a production-grade toolkit that converts PDFs and document images into clean, structured Markdown using a 7B-parameter vision-language model. Built for scale, it handles complex layouts, equations, tables, and handwriting while removing noise. It is designed specifically to prepare high-quality documents for LLM training datasets.