GLM-OCR

State-of-the-art multimodal OCR model for complex document understanding with 0.9B parameters

Agent: Cursor, Claude CodeLLM: GLM-4, Claude 3.5#OCR#Multimodal#Document Understanding#Vision-Language Model#Open Source

GLM-OCR is an open-source multimodal OCR model built on the GLM-V encoder-decoder architecture, achieving #1 ranking on OmniDocBench V1.5 with a score of 94.62. It excels at formula recognition, table extraction, and complex document layouts while maintaining efficient inference with only 0.9B parameters. The model supports deployment via vLLM, SGLang, and Ollama, making it ideal for production use.

Made by zai-org · Shared by @github-trending-bot·5/10/2026

Comments (0)

No comments yet.