Querying 3B Vectors

(vickiboykis.com)

37 points | by surprisetalk 3 days ago

2 comments

  • antirez 20 minutes ago
    As always happens, it takes people 2 years or more to realize there is something new in Redis. With Streams it tragically took like 4 years, and then everybody started to use it for this use case, with a sharp acceleration in the last few years. I believe this is what is happening for vector sets as well. For a problem like that you just git clone Redis, add the vectors into a key with VADD, and query with VSIM. It's a 10-line Python script that will deliver 20k to 50k queries per second, more or less, out of the box with zero optimizations. The point of Redis vector sets is that it does not want to be yet another vector store; it just gives you HNSWs as a trivial-to-use data structure.

    https://gist.github.com/antirez/b3cc9af4db69b04756606ad91cab...
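For readers who have not seen the two commands named above, here is a minimal sketch of how VADD and VSIM calls might be assembled from Python. It is not the script in the linked gist. The key name `fruit_demo`, the toy 3-dimensional vectors, and the helper names are hypothetical, and the `client` is assumed to be any object exposing `execute_command(*args)` (as `redis.Redis` from redis-py does), connected to a Redis build that includes vector sets.

```python
from typing import List, Sequence


def vadd_args(key: str, vector: Sequence[float], element: str) -> List[object]:
    """Build the arguments for: VADD key VALUES <dim> v1 ... vN element."""
    return ["VADD", key, "VALUES", len(vector), *vector, element]


def vsim_args(key: str, query: Sequence[float], count: int = 10) -> List[object]:
    """Build the arguments for: VSIM key VALUES <dim> q1 ... qN WITHSCORES COUNT <count>."""
    return ["VSIM", key, "VALUES", len(query), *query, "WITHSCORES", "COUNT", count]


def run_demo(client) -> object:
    """Add three toy vectors to one key, then ask for the 2 nearest elements.

    `client` is an assumption: anything with execute_command(*args).
    """
    items = {
        "apple": [1.0, 0.0, 0.0],
        "pear": [0.9, 0.1, 0.0],
        "car": [0.0, 0.0, 1.0],
    }
    for name, vec in items.items():
        client.execute_command(*vadd_args("fruit_demo", vec, name))
    # The reply is returned unparsed; its exact shape depends on the client library.
    return client.execute_command(*vsim_args("fruit_demo", [1.0, 0.0, 0.0], count=2))
```

With redis-py installed and a server running locally, something like `run_demo(redis.Redis())` would issue the commands. Throughput will depend on vector dimension, quantization, and hardware.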

  • sdenton4 2 hours ago
    Depending on how 'one-off' the query is, sequential read is the right answer. The alternative is indexing the data for ANN, which will generally require doing the equivalent of many queries across the dataset.

    On the bright side, smart folks have already thought pretty hard about this. In my work, I ended up picking usearch for large-scale vector storage and ANN search. It's plenty fast and is happy working with vectors on disk. Solutions which are /purely/ concerned with latency often don't include support for vectors on disk, which forces you into using a hell of a lot of RAM.

    https://github.com/unum-cloud/USearch
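To make the "sequential read" option from this comment concrete, here is a minimal sketch of an exact top-k scan in pure Python. It assumes cosine similarity as the metric and an iterable of `(id, vector)` pairs that could be streamed from disk. The function names are hypothetical, and a real scan over billions of vectors would use vectorized or SIMD code rather than Python loops.

```python
import heapq
import math
from typing import Iterable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def scan_top_k(
    query: Sequence[float],
    vectors: Iterable[Tuple[str, Sequence[float]]],
    k: int = 10,
) -> List[Tuple[float, str]]:
    """Read every vector once and keep the k best (score, id) pairs."""
    heap: List[Tuple[float, str]] = []  # min-heap: the worst kept score is at heap[0]
    for vec_id, vec in vectors:
        score = cosine(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (score, vec_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, vec_id))
    return sorted(heap, reverse=True)  # best match first
```

Memory stays at O(k) regardless of dataset size, and no index has to be built first, which is the trade-off the comment describes: a single pass costs one full read, while building an ANN index pays for itself only if it is queried repeatedly.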