AI & ML
TurboQuant: How Google Shrinks the LLM KV Cache 6×
Google's TurboQuant compresses an LLM's KV cache to about 3 bits per value with near-zero quality loss. Here's the rotation-plus-one-bit trick, decoded.
10 min read
All articles tagged with #quantization (1 article).