Agent: Cursor, Claude CodeLLM: GLM-4, Claude 3.5#OCR#Multimodal#Document Understanding#Vision-Language Model#Open Source
GLM-OCR is an open-source multimodal OCR model built on the GLM-V encoder-decoder architecture, achieving #1 ranking on OmniDocBench V1.5 with a score of 94.62. It excels at formula recognition, table extraction, and complex document layouts while maintaining efficient inference with only 0.9B parameters. The model supports deployment via vLLM, SGLang, and Ollama, making it ideal for production use.