Re: Vectors not indexing : Dense vector count appe...

KJ_24 · 04-07-2025 01:21 PM

Hi ,
I’m using Vertex AI Vector Search (streaming update method) and having trouble indexing datapoints. My data is cleaned and is vertex ai ready in accordance with online guidance. The initial CSV was uploaded to big query and then exported as a json to build the vector.

Despite successful 200 OK responses from the API, the index remains empty:

No vectors are showing in vectorCount
Queries return empty []
Even when using the exact same 384-dim vector used during indexing

The core issue

It seems that no vectors are actually being ingested into the index, even though the API call completes successfully. The index is created and deployed fine, but querying returns nothing — and when checking the index info its because the vectors just aren’t there.

Setup and what we've tried:

Index created with dimensions: 384, DOT_PRODUCT_DISTANCE, STREAM_UPDATE
Data successfully transformed from BigQuery CSV → valid JSON
Sample rows of 10 contain properly formatted vectors (embedding_0 to embedding_383)
All restricts fields (e.g. restricts_keywords, restricts_devices) were formatted and validated and below the character length permittes.
JSON payload looks correct — matches API spec
Used curl POST
The deployedIndexId is valid and matches what we see from

What we suspect:

The vectors may be silently rejected, maybe either due to formatting (e.g. invalid datapoint ID, malformed restricts) or because the endpoint isn’t fully enabled for streaming upserts. But no error message is returned, and no logs in log explorer making this very hard to debug.

We’d really appreciate guidance on:

How to verify whether vectors were actually stored
Any known issues that cause silent failure of upsertDatapoints
How to confirm if our deployed index is truly ready for streaming ingestion

Thanks so much in advance, we’ve exhausted documentation and have reached out last week but not had a response back from GCP support and would love any help getting this unblocked.

ruthseki

Hi @KJ_24,

Welcome to Google Cloud Community!

To address your questions about Vertex AI Vector Search and streaming upserts:

1. How to Verify Whether Vectors Were Actually Stored:

Query the Index: This is the most direct method. After upserting, immediately query the index using the exact same vector (or a very similar vector) that you just indexed. Be sure your query is set up to return all results, not just the top N. If you get an empty result, that confirms the vector was not successfully stored. Remember to account for potential latency; wait a minute or two after upserting before querying.
Get Index Statistics: The primary indicator is monitoring the vectorCount metric of the index resource itself. Use the Google Cloud Console or the gcloud command-line tool to fetch the index's details:
gcloud ai indexes describe INDEX_ID --region=REGION

Look for the vectorCount field in the output. This value should increment after successful upserts. Be aware that there can be a delay (minutes to hours) before this metric is fully updated, so this is not a real time measurement of what has just been upserted.
Metric Monitoring: Use Cloud Monitoring to track the aiplatform.googleapis.com/index/vectors metric for your index. This shows the number of vectors in the index over time. Again, expect a delay in seeing updates.
Sampling with Larger Datasets: If you're indexing a large dataset, don't try to check every vector. Instead, create a set of "known good" vectors and their corresponding datapoint IDs. Upsert these, then query specifically for them to verify ingestion is working at all.

2. Any Known Issues That Cause Silent Failure of upsertDatapoints:

Here are the most commonly reported causes of silent failures with upsertDatapoints in Vertex AI Vector Search, especially using streaming updates:

Non-Unique datapoint_id Values: Vertex AI Vector Search requires all datapoint_id values within an index to be globally unique across all upsert operations. If you re-use IDs, the upsert may succeed (200 OK), but the vector will not be updated, or worse, it could silently fail without throwing a visible error.
Restrict Field Issues (Formatting and Limits): Even if you think you're within the limits, double-check the encoding and character counts of your restrict fields. Ensure there are no unexpected special characters or whitespace. Try removing all restrict fields temporarily to see if that resolves the issue.
Large Request Sizes/DataPoint Size Limits: There's a limit to the total size of a single upsertDatapoints request and the size of individual DataPoint objects. If you're sending large batches of vectors or individual datapoints are large due to extensive metadata, reduce the batch size.
Data Validation Issues: While the API might return a 200 OK, there could be underlying data validation issues (e.g., invalid data types in metadata fields) that prevent the vector from being indexed. Log everything.
Service Account Permissions (Insufficient): Double-check that the service account your application or code is using has the necessary permissions to write to the Vector Search index. Ensure that either roles/aiplatform.dataOwner OR BOTH roles/aiplatform.user AND roles/storage.objectViewer (if applicable) roles are assigned to the Service Account.
Vector Format Issues: While less common, extremely small, or extremely large vector values could cause issues. Normalize the vectors (ensure they are unit length) before indexing.
Undeployed Index Configuration Changes: If you made any changes to the Index configuration after the Deployed Index was provisioned, those changes will not be reflected in the Deployed Index until you deploy a new Deployed Index based on the updated Index. You can check your index configuration by following the steps in the “View index configuration” section below.
Resource contention: The service may be experiencing a temporary period of resource contention. Backoff logic is highly recommended in your code to retry the call to upsertDatapoints.
Incomplete Index Endpoint Initialization: The deployed index endpoint might not be fully initialized immediately after deployment, and can take up to 15 minutes. Retry upserts after a delay.
Region Inconsistency: Make sure that all resources (client code, index, and deployed index) are in the same Google Cloud region.

3. How to Confirm if Our Deployed Index Is Truly Ready for Streaming Ingestion:

Check the Deployed Index Endpoint Status in the Cloud Console: Go to the Vertex AI section in the Google Cloud Console, navigate to your index, and then to the deployed index. The status should be "Active" or "Ready."
Initial Test Upsert/Query Cycle: After deploying the index, wait a few minutes (5-10). Then, perform a single upsertDatapoints call with a small number of vectors (1-2). Immediately query the index for those vectors. If the query returns results, the index is likely ready for streaming ingestion. This verifies the end-to-end pipeline.
Monitoring Latency Metrics: Monitor the request latency metrics for the deployed index endpoint. High latency or a large number of timeout errors can indicate that the endpoint is not fully ready or is experiencing issues. The absence of latency spikes or timeout errors gives some confidence that it is ready.
Retry Logic with Backoff: Implement retry logic with exponential backoff in your code that calls upsertDatapoints. This helps handle transient errors that can occur during the initial startup phase of the deployed index endpoint.
Wait Before Upserting (Best Practice): As a general best practice, always wait a reasonable amount of time (e.g., 5-10 minutes) after deploying the index endpoint before starting the main streaming upsert process. This gives the system time to fully initialize.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

KJ_24

Thanks we managed to find the solution ourselves. Basically the person that we reached out to a GCP support via our account manager prior to this issue happening had given us incorrect advice. We eventually managed to figure it ourselves. Thank you anyway

saroyogi

I am facing the same issue. please let me know how you have solved

igodfried

Yes I'm having this issue too. The silent failures are really ridiculous you need better error messages

KJ_24

The silent errors are awful and time consuming.
Not sure if this will be helpful but our issue is below... but maybe it was a rookie error!

Our CSV looked fine (IDs, 384D embeddings, restricts as quoted strings), but Vertex Vector Search wouldn’t create the index. No errors, no clues.

Reached out to GCP support shared on a google meet that they had no problems converting with our csv sample data into an index and just said it's an easy fix : “Just convert to JSON in BigQuery and stream it in.” They showed us how they created an index from our sample. And yes it created an index! So we did exactly the same steps using exactly the same sample we shared with them.

But as we later found out- it was an empty index! We couldn't get any more GCP support. We could only assume that everything up to the point they gave the advice was correct.

Turns out our CSV sample wasn’t clean enough to convert properly to a json. The engineers advice to just convert it lost us so much time. Our restricts and embeddings weren’t parsed correctly. The root formatting issue was never flagged, we had to figure it out ourselves.

Once we rebuilt a proper JSON and uploaded via Python SDK, it worked immediately. Basically formatting is key.

Lesson:
Just because your CSV looks fine doesn’t mean it will upload properly.
If you’re using embeddings or restricts, skip the guesswork and use JSON from the start, and double-check GCP’s formatting requirements. Don't assume the support engineers know everything!

Vectors not indexing : Dense vector count appears empty in Vertex AI

The core issue

Setup and what we've tried:

What we suspect: