Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Datastore indexing issue with vertex ai

I submitted my website links for advanced indexing by Vertex AI, and it then asked me to verify the domain.

I verified the domain and, after a few hours, the status changed to "Indexed". But I have checked my website's logs, and the Vertex AI bot has not hit my website even once.

Also, when I check a URL's indexing status, it says "not in index", and I don't get any answers from the data.

Is there any specific procedure for successful indexing that I might be missing, or will it just take some time? (I have already waited 4-5 days.)

URL formats I submitted (example): www.mydomain.com/faq/specific-page and www.mydomain.com/faq/*

Solved
1 ACCEPTED SOLUTION

I was able to get it to crawl and index the URLs using this method:

1. Check the URL by entering it in the "Check URL" option.

2. Select all the URLs showing "not in index" status.

3. Click the "Recrawl" option. A message will confirm that the recrawl task has been queued, and crawling and indexing will complete within a few minutes (1-15 minutes).
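The console steps above can also be triggered programmatically: the Discovery Engine API, which backs Vertex AI Search data stores, exposes a `recrawlUris` method on the data store's site search engine. Below is a minimal sketch of the REST request, assuming hypothetical project and data store IDs (`my-project`, `my-datastore`); verify the exact resource path against the current API reference before use:

```python
import json

# Hypothetical identifiers -- replace with your own project and data store IDs.
PROJECT = "my-project"
DATA_STORE = "my-datastore"

# REST endpoint for the recrawlUris method (Discovery Engine API, global location).
endpoint = (
    "https://discoveryengine.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/global/collections/default_collection/"
    f"dataStores/{DATA_STORE}/siteSearchEngine:recrawlUris"
)

# Request body: the exact URLs that showed "not in index".
body = json.dumps({
    "uris": [
        "https://www.mydomain.com/faq/specific-page",
    ]
})

print(endpoint)
print(body)
```

Send this as an authenticated POST (for example with `curl -H "Authorization: Bearer $(gcloud auth print-access-token)"`). The call returns a long-running operation, which matches the 1-15 minute window observed in the console.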


2 REPLIES

Hi @babuharsh,

Welcome to Google Cloud Community!

I understand you're having an issue with data store indexing in Vertex AI: the status shows "Indexed", but your logs show no bot activity, and URL inspection reports "not in index". This can stem from several factors.

Here are several suggestions and possible causes that may help resolve the issue:

  • Crawling delay: There may be a delay between the status change and the actual crawling process.
  • Verify website domains: Double-check the domain of the URL; you can check this documentation on how to properly verify a website domain.
  • Check data store configuration: For best practices, you can check this documentation before indexing your data.
  • Website server issues: Check whether your web server returns errors when a bot attempts to connect. Also ensure your site's robots.txt does not block indexing.
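The robots.txt point in particular is easy to check offline with Python's standard `urllib.robotparser`. A small sketch follows; the user agent shown (`Google-CloudVertexBot`) is an assumption for the Vertex AI Search crawler, so confirm the exact token in Google's current crawler documentation:

```python
import urllib.robotparser

# Sample robots.txt content -- in practice, fetch your site's real file.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_lines)

# "Google-CloudVertexBot" is assumed here as the Vertex AI Search crawler's
# user agent; verify the exact name against Google's documentation.
page = "https://www.mydomain.com/faq/specific-page"
blocked_page = "https://www.mydomain.com/private/internal"

print(rp.can_fetch("Google-CloudVertexBot", page))          # True: not disallowed
print(rp.can_fetch("Google-CloudVertexBot", blocked_page))  # False: under /private/
```

If the page you submitted comes back `False` against your real robots.txt, the crawler is being blocked regardless of what the console status says.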

If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
