Intelligent routing engine for LLM inference with KV-cache aware load balancing and request prioritization