Fooling around with encrypted reasoning blobs

(blog.cryptographyengineering.com)

57 points | by supermatou 3 days ago

5 comments

glitchc 2 hours ago
Very interesting. The state management is the really insightful find here.
I always wondered how these large AI companies managed access for millions of simultaneous users without having to allocate a dedicated LLM instance for each user. Pushing the complete state down to the user after every call makes perfect sense. The LLM itself stays memoryless and ready to respond to an arbitrary prompt. Very nice.
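A minimal sketch of that pattern, in Python with only the standard library. Everything in it is hypothetical: `seal`, `unseal`, `handle`, and `SERVER_KEY` are illustrative names, not any provider's API. Python's stdlib has no cipher, so this blob is only authenticated with HMAC-SHA256. The client can't alter it, but it can read it. A real deployment would also encrypt the blob, for example with an AEAD such as AES-GCM.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional, Tuple

# Hypothetical secret shared by every server replica; the client never sees it.
SERVER_KEY = b"hypothetical-server-secret"


def seal(state: dict) -> str:
    """Serialize state and prepend an HMAC tag so the client can't tamper with it.
    NOTE: this authenticates the state but does not hide it (no encryption)."""
    payload = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()


def unseal(blob: str) -> dict:
    """Check the tag and rehydrate the state; reject anything that was modified."""
    raw = base64.urlsafe_b64decode(blob.encode())
    tag, payload = raw[:32], raw[32:]  # SHA-256 tags are 32 bytes
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state blob failed authentication")
    return json.loads(payload)


def handle(prompt: str, blob: Optional[str]) -> Tuple[str, str]:
    """A memoryless handler: everything it needs arrives with the request."""
    state = unseal(blob) if blob else {"history": []}
    state["history"].append(prompt)
    reply = f"turn {len(state['history'])}: you said {prompt!r}"
    return reply, seal(state)


# The client carries the state between calls; the server keeps nothing.
reply1, blob = handle("hello", None)
reply2, blob = handle("again", blob)
print(reply2)  # turn 2: you said 'again'
```

Any replica holding `SERVER_KEY` can serve the next call, so no server has to remember the conversation.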
- geocar 2 hours ago
  N.B. This is exactly how Seaside, vba, and even Arc[1] handle server-side state generally: by encrypting the blob that represents the state and sending it to the client, which sends it back on future requests (where it is decrypted and rehydrated).
  It's an old trick that everyone designing protocols should know, since there are lots of applications beyond AI companies.
  [1]: As in, pg's Lisp: https://arclanguage.github.io/ref/srv.html#:~:text=The%20pre...
- b65e8bee43c2ed0 14 minutes ago
  The exchange rate between text and its representation in memory is brutal. Here's a bit from a recent article:
  >An 82 GB footprint in DDR3 on a 2016 Xeon. About 25 GB of weights and 56 GB of KV cache at the full 262K context. The KV cache is larger than the model.
  262K tokens is not much at all. With ~5 characters per token, that's only about 1.3 MB of plaintext.
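The commenter's arithmetic, extended to the KV-cache side, as a quick Python check. Some assumptions are not stated in the thread: "262K" is taken as 2**18 = 262,144 tokens, "GB" as 10**9 bytes, and each plaintext character as one byte. The 5 characters per token is the commenter's rough figure.

```python
# Back-of-envelope: plaintext size vs. KV-cache size for a full long context.
# Assumptions: 262K = 2**18 tokens, GB = 10**9 bytes, 1 byte per character.
CONTEXT_TOKENS = 2**18         # 262,144 tokens
CHARS_PER_TOKEN = 5            # commenter's rough estimate
KV_CACHE_BYTES = 56 * 10**9    # "56 GB of KV cache" from the quoted article

plaintext_bytes = CONTEXT_TOKENS * CHARS_PER_TOKEN
kv_bytes_per_token = KV_CACHE_BYTES / CONTEXT_TOKENS
blowup = KV_CACHE_BYTES / plaintext_bytes

print(f"plaintext:        {plaintext_bytes / 10**6:.2f} MB")     # 1.31 MB
print(f"KV cache / token: {kv_bytes_per_token / 1000:.0f} kB")   # 214 kB
print(f"KV cache is ~{blowup:,.0f}x the plaintext")               # ~42,725x
```

Under these assumptions, each token of context costs roughly 214 kB of KV cache. The cache for the full context is on the order of 40,000 times the size of the text it represents.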
Groxx 1 hour ago
One possible use for the "replay across accounts": if you can get a reasoning block that jailbreaks the model, you could share that block without sharing how you did it, and others can immediately take advantage of it too.
- denysvitali 18 minutes ago
  Not necessarily for the "without sharing" part, but to increase the reliability of the jailbreak. The same prompt isn't guaranteed to return the same result, but combining the internal thinking with the prompt might be more effective.
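Why would a blob minted for one account be accepted from another? One plausible explanation, assumed here rather than taken from the article, is that the provider's integrity check covers only the blob itself. The Python sketch below uses only the standard library, and `mint`, `verify`, and `PROVIDER_KEY` are hypothetical names. It contrasts a tag over the blob alone with one that also binds the receiving account, which is one way a provider could block cross-account replay.

```python
import hashlib
import hmac

# Hypothetical: a single provider-wide key used for every account.
PROVIDER_KEY = b"hypothetical-provider-key"


def mint(blob: bytes, account: str, bind: bool) -> bytes:
    """Compute the tag a provider might attach to a reasoning blob.
    bind=False: the tag covers only the blob, so any account can replay it.
    bind=True: the receiving account's id is mixed into the MAC input."""
    context = account.encode() + b"\x00" if bind else b""
    return hmac.new(PROVIDER_KEY, context + blob, hashlib.sha256).digest()


def verify(blob: bytes, tag: bytes, presenter: str, bind: bool) -> bool:
    """Accept the blob only if its tag matches for the account presenting it."""
    return hmac.compare_digest(tag, mint(blob, presenter, bind))


reasoning = b"opaque reasoning state"

# Unbound: a blob minted for alice is accepted when bob replays it.
token = mint(reasoning, "alice", bind=False)
print(verify(reasoning, token, "bob", bind=False))    # True

# Bound: the same replay now fails, while alice's own request still passes.
token = mint(reasoning, "alice", bind=True)
print(verify(reasoning, token, "alice", bind=True))   # True
print(verify(reasoning, token, "bob", bind=True))     # False
```

The `\x00` separator keeps the account id and the blob from running together. A production design would use a length-prefixed encoding, or the associated-data input of an AEAD if the blob is encrypted.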
Reubend 3 hours ago
Super cool side-channel attack. I tend to agree that it's pretty impractical, but it's such a fun discovery!
Retr0id 3 hours ago
Very cool idea to use thinking duration (either in tokens or in wall time) as a side-channel!
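Here is a toy illustration of a duration side channel in Python. It is not the article's actual attack. The hypothetical `hidden_reasoning_time` stands in for a model whose reasoning is hidden. The rule that it "thinks" longer the further a guess is from a secret is invented for the example. The observer sees only noisy durations, yet recovers the secret by picking the guess that minimizes them.

```python
import random
import statistics


def hidden_reasoning_time(secret: int, prompt: int, rng: random.Random) -> float:
    """Hypothetical stand-in for a model whose reasoning content is hidden.
    Only the elapsed 'thinking time' (arbitrary units) is observable.
    Assumption for illustration: cost grows with distance from the secret."""
    true_cost = 10 + 3 * abs(secret - prompt)
    jitter = rng.uniform(-1.0, 1.0)  # bounded measurement noise
    return true_cost + jitter


def infer_secret(observe, candidates, samples: int = 5) -> int:
    """Pick the candidate with the smallest average observed thinking time.
    The attacker never sees what was reasoned, only how long it took."""
    return min(
        candidates,
        key=lambda c: statistics.mean(observe(c) for _ in range(samples)),
    )


rng = random.Random(0)
secret = 42
guess = infer_secret(lambda p: hidden_reasoning_time(secret, p, rng), range(100))
print(guess)  # 42
```

Here the jitter is bounded by ±1 and neighbouring guesses differ by 3 units, so the minimum is unambiguous. With real, unbounded wall-clock noise an attacker would need many more samples per guess, which is one reason attacks like this can be slow in practice.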