TurboQuant: Redefining AI efficiency with extreme compression

(research.google)

109 points | by ray__ 3 hours ago

7 comments

amitport 15 minutes ago
This is a great development for KV cache compression. I did notice a missing citation in the related works regarding the core mathematical mechanism, though. The foundational technique of applying a geometric rotation prior to extreme quantization, specifically for managing the high-dimensional geometry and enabling proper bias correction, was introduced in our NeurIPS 2021 paper, "DRIVE" (https://proceedings.neurips.cc/paper/2021/hash/0397758f8990c...). We used this exact rotational approach and a similar bias correction mechanism to achieve optimal distributed mean estimation. I also presented this work and subsequent papers in a private invited talk at Google shortly after publication. Given the strong theoretical overlap with the mechanisms in TurboQuant and PolarQuant, I hope to see this prior art acknowledged in the upcoming camera-ready versions.
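The rotate-then-quantize idea described above fits in a few lines of code. This is a minimal pure-Python illustration, not DRIVE's or TurboQuant's code. It assumes a randomized Hadamard transform (random ±1 signs followed by a normalized Walsh-Hadamard transform) as the rotation, 1 bit per rotated coordinate, and one float scale per vector. The scale `||x||^2 / ||Rx||_1` is an assumed choice that makes the reconstruction's inner product with `x` equal `||x||^2` exactly. The papers' exact bias-correction terms may differ. The hypothetical `signs` list stands in for randomness shared by encoder and decoder, for example through a common seed.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform (len(v) must be a power of 2).

    With the 1/sqrt(n) normalization the transform is symmetric and its own inverse.
    """
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [x * s for x in v]


def rotate(x, signs):
    """Randomized Hadamard rotation R = H D, where D = diag(signs) with entries +/-1."""
    return fwht([s * xi for s, xi in zip(signs, x)])


def unrotate(y, signs):
    """Inverse rotation R^T = D H (H and D are both self-inverse)."""
    return [s * yi for s, yi in zip(signs, fwht(y))]


def encode_1bit(x, signs):
    """Rotate, keep only the sign of each coordinate, plus one float scale per vector."""
    y = rotate(x, signs)
    bits = [1 if yi >= 0 else -1 for yi in y]
    l1 = sum(abs(yi) for yi in y)
    # Assumed bias-correcting scale: makes <x_hat, x> equal ||x||^2 exactly.
    scale = sum(xi * xi for xi in x) / l1 if l1 > 0 else 0.0
    return bits, scale


def decode_1bit(bits, scale, signs):
    """Undo the rotation on the scaled sign vector."""
    return unrotate([scale * b for b in bits], signs)


random.seed(0)
d = 8
signs = [random.choice((-1, 1)) for _ in range(d)]
x = [random.gauss(0.0, 1.0) for _ in range(d)]
bits, scale = encode_1bit(x, signs)
x_hat = decode_1bit(bits, scale, signs)
```

Roughly speaking, the rotation spreads a vector's energy across its coordinates, so keeping only signs after rotating loses less than it would on a spiky, unrotated vector.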
benob 1 hour ago
This is the worst layperson's explanation of an AI component I have seen in a long time. It doesn't even seem AI generated.
- BenoitP 0 minutes ago
  It is AI generated. Or it was written by someone a bit removed from the technical advances, IMHO. The Johnson-Lindenstrauss Lemma is a very specific and powerful concept, whereas the article's QJL explanation is vacuous. A knowledgeable human would not have left the reader wondering how it relates to the Lemma.
- spencerflem 58 minutes ago
  I think it is, though:
  “ TurboQuant, QJL, and PolarQuant are more than just practical engineering solutions; they’re fundamental algorithmic contributions backed by strong theoretical proofs. These methods don't just work well in real-world applications; they are provably efficient and operate near theoretical lower bounds.”
  - benob 53 minutes ago
    Maybe they quantized the model parameters a bit too much...
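For readers wondering how QJL connects to the Johnson-Lindenstrauss Lemma, here is a hedged sketch of a 1-bit JL-style inner-product estimator. It projects with a random Gaussian matrix (the JL part), stores only the signs of the projected key (the quantized part), and projects the query without quantizing it. The `sqrt(pi/2)` factor comes from the identity E[(s·q) sign(s·k)] = sqrt(2/pi) <q, k> / ||k|| for a standard Gaussian vector s. Function names, dimensions and the seed are hypothetical, and the sketch is not claimed to match the QJL paper's implementation details.

```python
import math
import random


def make_projection(m, d, seed=0):
    """Hypothetical shared random Gaussian projection S with m rows of dimension d."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantize_key(S, k):
    """Store only sign(S k) (1 bit per row) plus the key's norm."""
    return [1 if dot(row, k) >= 0 else -1 for row in S], math.sqrt(dot(k, k))


def estimate_inner_product(S, q, key_bits, key_norm):
    """Unbiased estimate of <q, k>; the query is projected but NOT quantized.

    For Gaussian s, E[(s.q) * sign(s.k)] = sqrt(2/pi) * <q, k> / ||k||,
    so the average is rescaled by sqrt(pi/2) * ||k||.
    """
    m = len(S)
    acc = sum(dot(row, q) * b for row, b in zip(S, key_bits))
    return math.sqrt(math.pi / 2) * key_norm * acc / m


d, m = 4, 20000
S = make_projection(m, d, seed=1)
q = [0.5, -1.0, 0.25, 0.8]
k = [1.0, -0.5, 0.0, 0.3]
bits, norm = quantize_key(S, k)
est = estimate_inner_product(S, q, bits, norm)
```

The estimate converges to the true inner product as `m` grows; with fewer rows it is still unbiased but noisier.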
bluequbit 1 hour ago
I did not understand what PolarQuant is.
Is it something like pattern-based compression, where the algorithm finds repeating patterns and creates an index of those common symbols or numbers?
- Maxious 1 hour ago
  https://mesuvash.github.io/blog/2026/turboquant-interactive/ has a little visualisation
  - spencerflem 51 minutes ago
    I like the visualization, but I don’t understand the grid quantization. If every point is on the unit circle, aren’t all the center grid coords unused?
    - vincnetas 23 minutes ago
      I think the grid can be on the surface of the unit sphere.
- mrugge 1 hour ago
  1. Efficient recursive transform of KV embeddings into polar coordinates.
  2. Quantization of the resulting angles without the need for explicit normalization.
  This saves memory thanks to a key insight: the angles follow a known distribution with an analytical form.
  - quotemstr 47 minutes ago
    Reminds me vaguely of Burrows-Wheeler transformations in bzip2.
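The two-step PolarQuant summary in this subthread can be sketched as follows, assuming a power-of-two dimension. Coordinates are paired up and each pair becomes (radius, angle); the same step then repeats on the radii until one radius is left, which yields d-1 angles. The uniform angle buckets and the `bits` parameter are assumptions for illustration. Per the summary, PolarQuant exploits the angles' analytical distribution, which this sketch does not attempt. It also bears on the grid question: in this sketch only angles are quantized, so no codes are spent on the interior of the circle.

```python
import math
import random


def polar_forward(x):
    """Recursive polar transform of a vector whose length is a power of 2.

    Returns the final radius and a list of angle levels (first level first).
    """
    r = list(x)
    levels = []
    while len(r) > 1:
        pairs = [(r[2 * i], r[2 * i + 1]) for i in range(len(r) // 2)]
        levels.append([math.atan2(b, a) for a, b in pairs])  # one angle per pair
        r = [math.hypot(a, b) for a, b in pairs]              # radii feed the next level
    return r[0], levels


def polar_inverse(radius, levels):
    """Rebuild the vector by splitting each radius with its angle, top level first."""
    r = [radius]
    for angles in reversed(levels):
        nxt = []
        for ri, th in zip(r, angles):
            nxt.extend((ri * math.cos(th), ri * math.sin(th)))
        r = nxt
    return r


def quantize_angles(levels, bits):
    """Hypothetical uniform quantizer: bucket midpoints, 2**bits buckets per angle.

    First-level angles lie in [-pi, pi]; deeper levels see non-negative radii,
    so their angles lie in [0, pi/2].
    """
    n = 2 ** bits
    out = []
    for k, angles in enumerate(levels):
        lo, hi = (-math.pi, math.pi) if k == 0 else (0.0, math.pi / 2)
        step = (hi - lo) / n
        out.append([lo + (min(int((a - lo) / step), n - 1) + 0.5) * step for a in angles])
    return out


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(8)]
radius, levels = polar_forward(x)
x_roundtrip = polar_inverse(radius, levels)
x_q = polar_inverse(radius, quantize_angles(levels, bits=8))
```

Because each angle only redistributes a radius between two children, any set of quantized angles reconstructs a vector with exactly the original norm; the error is purely in direction.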
maurelius2 16 minutes ago
I'm somewhat at a loss here beyond understanding the fundamentals. Can someone tell me how the compression impacts performance?
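One concrete way to see the impact is memory. The KV cache stores one key and one value vector per layer, KV head and token, so its size grows linearly with context length and with bits per value. The model shape below is hypothetical, and the count ignores per-vector overhead such as stored norms or scales. A smaller cache lets longer contexts or bigger batches fit in the same memory and reduces the bytes read per generated token, which matters because decoding is often memory-bandwidth bound. Whether end-to-end speed improves also depends on the cost of dequantizing.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value, batch=1):
    """Size of the key + value cache: 2 tensors (K and V) per layer."""
    values = 2 * layers * kv_heads * head_dim * seq_len * batch
    return values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)
fp16 = kv_cache_bytes(**cfg, bits_per_value=16)
three_bit = kv_cache_bytes(**cfg, bits_per_value=3)
print(fp16 / 2**30, three_bit / 2**30)  # 4.0 0.75 (GiB)
```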
moktonar 30 minutes ago
Aren’t polar coordinates still n-1 angles + 1 radius for an n-dim vector? If so, I understand that angles can be quantized better, but when the radius r is big, the error is large for highly quantized angles, right? What am I missing?
- amitport 28 minutes ago
  r is a single value per vector. You don't have to quantize it, you can keep it and quantize the billion+ other coordinates of the vector.
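The question and reply above can be made concrete with two small calculations under assumed bit widths. The first counts storage for d-1 angles at a few bits each plus one radius kept at full 32-bit precision. The second is a 2-D example showing that, with an exact radius, the absolute error grows linearly with r while the relative error depends only on the angle bits. The uniform quantizer and the specific numbers are hypothetical, not TurboQuant's or PolarQuant's actual settings.

```python
import math


def polar_storage_bits(d, angle_bits, radius_bits=32):
    """Bits per d-dim vector: d-1 quantized angles plus one unquantized radius."""
    return (d - 1) * angle_bits + radius_bits


def quantize_angle(theta, bits):
    """Uniformly quantize an angle in [-pi, pi] to the midpoint of one of 2**bits buckets."""
    n = 2 ** bits
    step = 2 * math.pi / n
    idx = min(int((theta + math.pi) / step), n - 1)
    return -math.pi + (idx + 0.5) * step


def error_2d(r, theta, bits):
    """Reconstruction error for a 2-D point stored as (exact r, quantized theta)."""
    t = quantize_angle(theta, bits)
    return math.hypot(r * math.cos(theta) - r * math.cos(t),
                      r * math.sin(theta) - r * math.sin(t))


d = 128
print(polar_storage_bits(d, angle_bits=3), 16 * d)  # 413 2048 (vs. fp16 per coordinate)
e_small = error_2d(1.0, 0.7, bits=3)
e_big = error_2d(10.0, 0.7, bits=3)
```

Since `e_big` is ten times `e_small`, the error is a fixed fraction of r, so keeping the one radius exact is what stops a large r from amplifying the relative error.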