Data pipeline health is a cornerstone of an effective Detection & Response capability. A robust data pipeline ensures that security-relevant events flow continuously from monitored systems to the tools your team uses to detect and analyze suspicious or notable behavior. Any disruption to this flow or degradation in data quality, whether it's a latency issue, a log parsing error, or a misconfigured log source, can create blind spots in your defenses or hamstring a security investigation with missing logs.
In this blog series, we’ll explore some practical techniques for monitoring the health of your data pipeline with Google Security Operations (SecOps). In part one, I’ll explain the importance of monitoring your data pipeline and cover some of the monitoring & alerting features available in Google SecOps and Google Cloud. Part two will demonstrate how to implement proactive health checks to validate that logging, search, and detection are working end-to-end with Google SecOps.
Over the years, while working as a defensive practitioner at various companies, I’ve witnessed firsthand the impact that data pipeline issues can have on security investigations, detection, and incident response activities. These experiences have solidified my belief that data pipeline monitoring is not just a best practice but an absolute necessity. Do any of these scenarios sound familiar to you?
Before diving into specific monitoring techniques, let’s establish a common understanding of the core components that make up a typical security data pipeline. These components work together to collect, process, store, and analyze security-relevant events, forming the foundation of your security monitoring capabilities. Familiarizing yourself with these building blocks will help you identify potential points of failure and vulnerabilities within your pipeline.
Please note, this is a simplified overview of the core components in a security data pipeline. In reality, these pipelines can be much more complex, involving additional stages and specialized tools depending on the specific security monitoring needs of an organization.
Example security data pipeline
Data sources are where security-relevant events originate, and they include a wide range of systems and applications. Examples include endpoint detection & response (EDR) agents, firewalls, cloud service providers, identity & access management (IAM) services, and Software as a Service (SaaS) applications.
Agents such as BindPlane are responsible for gathering logs and events from the data sources and forwarding them to the SIEM for ingestion. Agents may also perform some preprocessing or filtering of events, such as reducing data volume or redacting personally identifiable information (PII).
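To make the preprocessing idea concrete, here’s a minimal, purely illustrative Python sketch (not BindPlane configuration) of an agent-side step that drops noisy events and redacts email addresses; the event shape and field names are hypothetical:

```python
import re
from typing import Optional

# Hypothetical event types an agent might drop to reduce data volume.
NOISY_EVENT_TYPES = {"heartbeat", "keepalive"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def preprocess(event: dict) -> Optional[dict]:
    """Filter noisy events and redact email addresses before forwarding."""
    if event.get("event_type") in NOISY_EVENT_TYPES:
        return None  # dropped; never forwarded to the SIEM
    event["message"] = EMAIL_PATTERN.sub("[REDACTED]", event.get("message", ""))
    return event

print(preprocess({"event_type": "login", "message": "login by alice@example.com"}))
```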
Log forwarders are software components that run on your network and forward logs to a SIEM for ingestion. Some SIEMs also have the capability to manage feeds that pull logs directly from data sources and ingest them.
In many cases, log data needs to be processed before it is indexed in a SIEM so that the security team can easily search and correlate it. Processing includes parsing and normalizing logs into a common schema and extracting timestamps, usernames, source IP addresses, and other relevant fields.
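As a toy illustration of this stage (the field names below are hypothetical and not Google SecOps’ actual UDM schema), consider parsing a failed SSH login out of a raw syslog-style line:

```python
import re
from datetime import datetime, timezone

# A hypothetical raw log line; real parsers handle many formats per log type.
RAW = "2024-05-01T12:34:56Z sshd[101]: Failed password for alice from 203.0.113.7"

PATTERN = re.compile(
    r"(?P<ts>\S+) sshd\[\d+\]: Failed password for (?P<user>\S+) from (?P<ip>\S+)"
)

def normalize(raw: str) -> dict:
    """Parse a raw line into a common schema and extract key fields."""
    match = PATTERN.match(raw)
    if match is None:
        # Parse errors should be surfaced and monitored, not silently dropped.
        raise ValueError("unparsed log line")
    ts = datetime.fromisoformat(match["ts"].replace("Z", "+00:00"))
    return {
        "event_timestamp": ts.astimezone(timezone.utc).isoformat(),
        "principal_user": match["user"],  # illustrative names, not actual UDM
        "source_ip": match["ip"],
        "event_type": "USER_LOGIN_FAILED",
    }

print(normalize(RAW))
```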
Events can be enriched before or after they’re ingested into a SIEM, by the SIEM itself and/or another system or application in your data pipeline. Examples of enrichment include adding geolocation data to IP addresses, adding WHOIS information to domains, and incorporating threat intelligence feeds to identify known indicators of compromise.
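Continuing the toy example above, an enrichment step might attach geolocation and threat intelligence context to the normalized event; the lookup tables here are hypothetical stand-ins for a GeoIP database and a threat intel feed:

```python
# Hypothetical enrichment tables; in practice these would come from a GeoIP
# database and your organization's threat intelligence feeds.
GEO_BY_IP = {"203.0.113.7": {"country": "NL", "city": "Amsterdam"}}
KNOWN_BAD_IPS = {"203.0.113.7"}

def enrich(event: dict) -> dict:
    """Attach geolocation and threat-intel context to a normalized event."""
    ip = event.get("source_ip", "")
    event["source_geo"] = GEO_BY_IP.get(ip, {})
    event["ioc_match"] = ip in KNOWN_BAD_IPS  # known indicator of compromise?
    return event
```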
The SIEM stores the processed & enriched security data, often in a specialized database or data lake optimized for search and analysis.
After events are ingested, normalized, indexed, and enriched, they’re available for searching, and detection rules can alert on suspicious or notable activity. These search and detection components must be monitored in addition to the pipeline components above.
Now that we've covered the importance of monitoring a data pipeline and some typical components, let's explore some monitoring techniques to help identify and address potential issues before they impact your security monitoring capabilities. This section will describe features available in Google SecOps and Google Cloud to help you maintain a robust and resilient data pipeline.
Google SecOps’ Data Ingestion and Health dashboard provides information about the type, volume, and health of data being ingested. Think of this as your “at a glance” view to monitor for data pipeline health issues in your environment. For example, visualizations on the dashboard show how many event parsing errors have occurred per log type and the ingestion throughput for the last hour, week, and so on.
Viewing the Data Ingestion and Health dashboard in Google SecOps
Monitoring Feed Status
The Feeds settings page in Google SecOps provides information about the status of each configured log feed and a timestamp for the last time the feed’s data transfer process completed successfully. In the example below, my feed that ingests Okta system logs into Google SecOps is in a failed state.
Reviewing the status of feeds in Google SecOps
Opening the details for the feed shows an error message explaining why the feed is in a failed state: authentication between Google SecOps and my Okta organization failed. This issue needs to be fixed before Google SecOps can resume pulling and ingesting logs for this feed.
Reviewing an error message for a feed with a status of failed
It’s also possible to retrieve a list of feeds programmatically via the feeds.list API method and determine the status of each feed. As an alternative to monitoring the state of feeds in the Feeds settings page, code like this can be executed on a schedule (perhaps in a Cloud Run function) so that your team can be alerted to any feed issues.
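For illustration, here’s a minimal sketch of such a check using Application Default Credentials. The endpoint path, OAuth scope, and response field names are assumptions based on the v1alpha API surface; verify them against the Google SecOps API reference for your instance:

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Replace the placeholders with your SecOps instance's values; the path and
# API version are assumptions to confirm in the feeds.list documentation.
FEEDS_URL = (
    "https://chronicle.googleapis.com/v1alpha/projects/PROJECT_ID"
    "/locations/LOCATION/instances/INSTANCE_ID/feeds"
)

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

response = session.get(FEEDS_URL)
response.raise_for_status()

# Flag any feed that isn't healthy so a scheduled job (e.g., a Cloud Run
# function) can notify the team.
for feed in response.json().get("feeds", []):
    name = feed.get("displayName", feed.get("name", "<unknown>"))
    state = feed.get("state", "UNKNOWN")  # field name is an assumption
    if state != "ACTIVE":
        print(f"ALERT: feed '{name}' is in state {state}")
```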
The output from my GitHub Actions workflow below shows that my “GitHub Enterprise Audit Log Stream” feed is ok, but my “Okta System Logs” feed is in a failed state and requires attention.
Monitoring the status of feeds via Google SecOps’ REST API
Google Cloud Monitoring offers a suite of tools to monitor the performance and availability of your cloud resources, including data ingestion for Google SecOps. Below is a list of Cloud Monitoring alerting policies that you can configure to help identify issues with data ingestion and address them quickly, before your security monitoring capabilities are impacted.
For a deeper understanding of using Cloud Monitoring for ingestion notifications, I highly recommend Chris Martin’s blog post on this subject and Google SecOps’ documentation.
Please note, a prerequisite to configuring ingestion notifications in Cloud Monitoring is to have a Google Cloud project that’s bound to your Google SecOps instance.
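To make this concrete, here’s a hedged sketch of one such policy using the google-cloud-monitoring Python client: a metric-absence condition that fires when a given log type stops ingesting. The metric type and label names are assumptions to verify against the Google SecOps ingestion metrics documentation:

```python
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# Hypothetical project bound to the SecOps instance.
PROJECT_NAME = "projects/your-secops-bound-project"

client = monitoring_v3.AlertPolicyServiceClient()

# Fire when no Okta records have been ingested for 30 minutes.
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="No OKTA records ingested for 30 minutes",
    condition_absent=monitoring_v3.AlertPolicy.Condition.MetricAbsence(
        filter=(
            'metric.type = "chronicle.googleapis.com/ingestion/log/record_count" '
            'AND metric.labels.log_type = "OKTA"'  # metric/label names are assumptions
        ),
        duration=duration_pb2.Duration(seconds=1800),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period=duration_pb2.Duration(seconds=300),
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
            )
        ],
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Google SecOps ingestion silent: OKTA",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

created = client.create_alert_policy(name=PROJECT_NAME, alert_policy=policy)
print(f"Created alerting policy: {created.name}")
```

Attach a notification channel to the policy so your team is paged when the condition fires, and adjust the duration and filter to match each log source’s expected cadence.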
That’s it for part one of this series, where I covered the following:
Please join me in part two where I’m excited to take things to the next level and demonstrate how to implement proactive health checks to validate that your logging, search, detection, and alerting capabilities are working end to end.
Special thanks to the following people for sharing their valuable feedback and expertise: Dan Dye, Serhat Gülbetekin, Ermyas Haile, Dave Herrald, Utsav Lathia, Christopher Martin, Othmane Moustaid, and John Stoner.