Hi, I have created a batch data pipeline from GCP Datastore to BigQuery using Dataflow. Our team has concerns about data metrics such as total counts, null attribute counts, etc. We have been retrieving those metrics from the __stats__ entities of Datastore, which are documented as updating every 24-48 hours. In practice, I have found that __stats__ consistently lags by about 48 hours (2 days behind the latest data ingestion). Is there any workaround to get more recent statistics for Datastore entities (for yesterday, maybe), and are there any steps to ensure the quality of those statistics, such as completeness? (We want to use the __stats__ entities for data quality assessment of GCP Datastore.)
Hi @yogocik,
There is no way to force the statistics to refresh sooner; they are updated by a background process on Datastore's own schedule. However, the best way to get a consistent view of the statistics is to query for the statistics entity with the most recent timestamp, then use that timestamp value as an equality filter when fetching the other statistics entities. That guarantees all the numbers you read come from the same statistics run, rather than a mix of old and new runs.
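A minimal sketch of that pattern with the `google-cloud-datastore` Python client, assuming default credentials and project. The stat kind names (`__Stat_Total__`, `__Stat_Kind__`) and properties (`timestamp`, `kind_name`, `count`, `bytes`) are the documented Datastore statistics entities; adjust to the stat kinds you actually need:

```python
from google.cloud import datastore

client = datastore.Client()

# 1. Read the single __Stat_Total__ entity to learn the timestamp of
#    the most recent statistics computation run.
total_stats = list(client.query(kind="__Stat_Total__").fetch(limit=1))
latest_ts = total_stats[0]["timestamp"]

# 2. Fetch per-kind statistics filtered to that same timestamp, so
#    every row comes from one consistent statistics run.
kind_query = client.query(kind="__Stat_Kind__")
kind_query.add_filter("timestamp", "=", latest_ts)

for stat in kind_query.fetch():
    print(stat["kind_name"], stat["count"], stat["bytes"])
```

For a completeness check, you can compare `latest_ts` against the current time and flag the run as stale if it exceeds the lag your team is willing to tolerate.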
Thank you.