A performance-focused fork of llama.cpp featuring state-of-the-art quantization types, improved CPU/GPU hybrid inference, BitNet support, and optimized operations for DeepSeek models. Designed for developers building efficient local LLM applications that need higher throughput and lower latency.