"I'm trying to find the best way to get metrics about incoming data sources in dataflow, regardless of whether the source is Pub/Sub, Kafka, etc.
When I check storage_read_bytes for a running pipeline, I don't see data being populated in explorer because its only available for pipelines running with streaming engine enabled.
The element_count metric can provide input-source metrics if filtered by a specific stage, but I'm looking for a metric that gives insight into incoming data volumes across sources without needing to specify the stage.
I tried checking the documentation for the inbound_messages metric but couldn't find any information on it.
Does anyone have recommendations on the best way to get aggregated metrics on data volumes coming into a Dataflow pipeline from any source? Is there configuration needed to populate the inbound_messages metric? Or is there a better metric I should use to monitor inbound data rates?
Any guidance is much appreciated!"
This documentation on input and output metrics might help with your inquiry. These metrics are available when a streaming Dataflow job reads or writes records using Pub/Sub.
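If you'd rather pull job metrics programmatically than browse the console, here's a rough sketch using the google-cloud-monitoring Python client to read element_count (the metric already mentioned above) for a single job. The project and job names are placeholders, and I'm assuming the per-PCollection labeling that element_count is documented to use:

```python
import time

from google.cloud import monitoring_v3

# Placeholders; substitute your own project and job name.
PROJECT_ID = "my-project"
JOB_NAME = "my-dataflow-job"

client = monitoring_v3.MetricServiceClient()

# Look at the last hour of data.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# element_count is reported per PCollection; the series for the
# PCollection produced by the read step reflects this job's input volume.
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "dataflow.googleapis.com/job/element_count" '
            f'AND resource.labels.job_name = "{JOB_NAME}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # Each series is labeled with the PCollection it counts.
    print(series.metric.labels.get("pcollection"),
          series.points[0].value.int64_value)
```

This still ties you to knowing which PCollection is the read output, so it has the same limitation you described with filtering by stage.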
That would give me the metrics by examining the job, whose source in this case happens to be Pub/Sub. But say I'm running multiple jobs: one using a Pub/Sub topic and another using a Kafka topic. I want to open Monitoring and view the metrics for both jobs without depending on the input source they use, i.e., a general input metric.
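Since the built-in I/O metrics are source-specific, one source-agnostic workaround is a custom Beam counter attached immediately after whatever read transform each job uses. As far as I know, user-defined counters surface in Cloud Monitoring under dataflow.googleapis.com/job/user_counter with a metric_name label, so one chart can cover all jobs regardless of source. A minimal sketch, where the DoFn and counter names are my own placeholders:

```python
import apache_beam as beam
from apache_beam.metrics import Metrics


class CountInput(beam.DoFn):
    """Counts elements flowing out of the read step, whatever the source."""

    def __init__(self):
        super().__init__()
        # Should surface in Cloud Monitoring as
        # dataflow.googleapis.com/job/user_counter
        # with metric_name = "input_elements".
        self.input_elements = Metrics.counter(self.__class__, "input_elements")

    def process(self, element):
        self.input_elements.inc()
        yield element


# Attach it right after whichever read the job uses, e.g.:
#   p | beam.io.ReadFromPubSub(subscription=...) | beam.ParDo(CountInput())
#   p | ReadFromKafka(consumer_config=..., topics=...) | beam.ParDo(CountInput())
```

Because every job increments the same counter name, a single Metrics Explorer chart filtered on metric_name = "input_elements" would then show inbound element rates for the Pub/Sub and Kafka pipelines alike.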