Hi all,
I am currently using Spanner as a vector database. I run similarity searches by ordering on COSINE_DISTANCE() to find the vector embeddings most similar to the search query.
As I scale up my vector database to ~2 million embeddings, these queries take multiple minutes to execute. I have looked into the new vector indexing feature to get APPROX_COSINE_DISTANCE() results, but I cannot use it because the feature is in preview and my instance has less than 1 node (i.e., it is a granular instance).
I would love to hear thoughts on either: (1) Do you know how long it will take for the VECTOR INDEX feature to become available to all Spanner users (General Availability status instead of Preview)? (2) Do you know of any other ways to speed up this search, other than reducing vector dimensions?
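For anyone reading along, here is a rough sketch of what an exact ORDER BY COSINE_DISTANCE() search amounts to: every stored embedding is scored against the query, so the work grows linearly with the number of rows. This is plain Python for illustration only, not Spanner code; the function names and the toy rows are made up, and cosine distance is taken as 1 minus cosine similarity.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, rows, k):
    """Score every (row_id, embedding) pair and keep the k closest.

    Hypothetical helper: it touches all rows, which is what makes an
    exact search slow once the table holds millions of embeddings.
    """
    scored = ((cosine_distance(query, emb), row_id) for row_id, emb in rows)
    return heapq.nsmallest(k, scored)


# Toy data standing in for stored embeddings (illustrative only).
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
top = exact_top_k([1.0, 0.1], rows, k=2)
```

Here `top` holds (distance, row_id) pairs sorted from closest to farthest. An approximate index avoids scoring every row, which is why it helps at this scale.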
Any ideas welcome. Thanks!
ANN is not available on granular instances at the moment; a minimum of 1 node is needed. This is likely to change in the near future, so please stay tuned, and thanks for your patience.
Thanks for your response @kshenoy ! I appreciate the Google Cloud team's work on expanding ANN availability. I was hoping this was the case, since these features are pretty new. In the meantime I'll work on my own solution to help reduce load times.
Just a quick update on this thread. A couple of things to note:
Relevant docs: https://cloud.google.com/spanner/docs/reference/standard-sql/data-definition-language (search for "vector_length")
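As a rough illustration of what the "vector_length" annotation in those docs looks like, here is a small sketch that builds a CREATE TABLE statement with a fixed-length embedding column. The helper name, the table and column names, and the 768 dimension are all hypothetical, and the DDL shape is my reading of the linked reference, so please check it against the docs before relying on it.

```python
def embedding_table_ddl(table, column, dimensions):
    """Build a CREATE TABLE statement with a fixed-length embedding column.

    Hypothetical helper: the names and the key column are placeholders.
    """
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return (
        f"CREATE TABLE {table} (\n"
        f"  Id INT64 NOT NULL,\n"
        # vector_length fixes the number of elements in every stored array.
        f"  {column} ARRAY<FLOAT32>(vector_length=>{dimensions})\n"
        f") PRIMARY KEY (Id)"
    )


# Example with made-up names and a made-up embedding size.
ddl = embedding_table_ddl("Documents", "Embedding", 768)
```

Printing `ddl` gives a statement you could adapt to your own schema.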