Data pipeline health is a cornerstone of an effective Detection & Response capability. A robust data pipeline ensures that security-relevant events flow continuously from monitored systems to the tools your team uses to detect and analyze suspicious or notable behavior. Any disruption to this flow or degradation in data quality, whether it's a latency issue, a log parsing error, or a misconfigured log source, can create blind spots in your defenses or hamstring a security investigation with missing logs.
In this blog series, we’ll explore some practical techniques for monitoring the health of your data pipeline with Google Security Operations (SecOps). In part one, I’ll explain the importance of monitoring your data pipeline and cover some of the monitoring & alerting features available in Google SecOps and Google Cloud. Part two will demonstrate how to implement proactive health checks to validate that logging, search, and detection are working end-to-end with Google SecOps.
Over the years, while working as a defensive practitioner at various companies, I’ve witnessed firsthand the impact that data pipeline issues can have on security investigations, detection, and incident response activities. These experiences have solidified my belief that data pipeline monitoring is not just a best practice but an absolute necessity. Do any of these scenarios sound familiar to you?
Before diving into specific monitoring techniques, let’s establish a common understanding of the core components that make up a typical security data pipeline. These components work together to collect, process, store, and analyze security-relevant events, forming the foundation of your security monitoring capabilities. Familiarizing yourself with these building blocks will help you identify potential points of failure and vulnerabilities within your pipeline.
Please note, this is a simplified overview of the core components in a security data pipeline. In reality, these pipelines can be much more complex, involving additional stages and specialized tools depending on the specific security monitoring needs of an organization.
Example security data pipeline
Data sources are where security-relevant events originate, and they include a wide range of systems and applications. Examples include endpoint detection & response (EDR) agents, firewalls, cloud service providers, identity & access management (IAM) services, and Software as a Service (SaaS) applications.
Agents such as BindPlane are responsible for gathering logs and events from the data sources and forwarding them to the SIEM for ingestion. Agents may also perform some preprocessing or filtering of events, such as reducing data volume or redacting personally identifiable information (PII).
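To make the preprocessing idea concrete, here’s a minimal, purely illustrative Python sketch (not BindPlane configuration) of an agent-side step that drops noisy events and redacts email addresses; the event shape and field names are hypothetical:

```python
import re
from typing import Optional

# Hypothetical event types an agent might drop to reduce data volume.
NOISY_EVENT_TYPES = {"heartbeat", "keepalive"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def preprocess(event: dict) -> Optional[dict]:
    """Filter noisy events and redact email addresses before forwarding."""
    if event.get("event_type") in NOISY_EVENT_TYPES:
        return None  # dropped; never forwarded to the SIEM
    event["message"] = EMAIL_PATTERN.sub("[REDACTED]", event.get("message", ""))
    return event

print(preprocess({"event_type": "login", "message": "login by alice@example.com"}))
```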
Log forwarders are software components that run on your network and forward logs to a SIEM for ingestion. Some SIEMs also have the capability to manage feeds that pull logs directly from data sources and ingest them.
In many cases, log data needs to be processed before it is indexed in a SIEM so that the security team can easily search and correlate it. Processing includes parsing and normalizing logs into a common schema and extracting timestamps, usernames, source IP addresses, and other relevant fields.
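As a toy illustration of this stage (the field names below are hypothetical and not Google SecOps’ actual UDM schema), consider parsing a failed SSH login out of a raw syslog-style line:

```python
import re
from datetime import datetime, timezone

# A hypothetical raw log line; real parsers handle many formats per log type.
RAW = "2024-05-01T12:34:56Z sshd[101]: Failed password for alice from 203.0.113.7"

PATTERN = re.compile(
    r"(?P<ts>\S+) sshd\[\d+\]: Failed password for (?P<user>\S+) from (?P<ip>\S+)"
)

def normalize(raw: str) -> dict:
    """Parse a raw line into a common schema and extract key fields."""
    match = PATTERN.match(raw)
    if match is None:
        # Parse errors should be surfaced and monitored, not silently dropped.
        raise ValueError("unparsed log line")
    ts = datetime.fromisoformat(match["ts"].replace("Z", "+00:00"))
    return {
        "event_timestamp": ts.astimezone(timezone.utc).isoformat(),
        "principal_user": match["user"],  # illustrative names, not actual UDM
        "source_ip": match["ip"],
        "event_type": "USER_LOGIN_FAILED",
    }

print(normalize(RAW))
```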
Events can be enriched before or after they’re ingested into a SIEM, by the SIEM itself and/or another system or application in your data pipeline. Examples of enrichment include adding geolocation data to IP addresses, adding WHOIS information to domains, and incorporating threat intelligence feeds to identify known indicators of compromise.
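Continuing the toy example above, an enrichment step might attach geolocation and threat intelligence context to the normalized event; the lookup tables here are hypothetical stand-ins for a GeoIP database and a threat intel feed:

```python
# Hypothetical enrichment tables; in practice these would come from a GeoIP
# database and your organization's threat intelligence feeds.
GEO_BY_IP = {"203.0.113.7": {"country": "NL", "city": "Amsterdam"}}
KNOWN_BAD_IPS = {"203.0.113.7"}

def enrich(event: dict) -> dict:
    """Attach geolocation and threat-intel context to a normalized event."""
    ip = event.get("source_ip", "")
    event["source_geo"] = GEO_BY_IP.get(ip, {})
    event["ioc_match"] = ip in KNOWN_BAD_IPS  # known indicator of compromise?
    return event
```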
The SIEM stores the processed & enriched security data, often in a specialized database or data lake optimized for search and analysis.
After events are ingested, normalized, indexed, and enriched, they’re available for searching, and detection rules can alert on suspicious or notable activity. These search and detection components must be monitored in addition to the pipeline components above.
Now that we've covered the importance of monitoring a data pipeline and some typical components, let's explore some monitoring techniques to help identify and address potential issues before they impact your security monitoring capabilities. This section will describe features available in Google SecOps and Google Cloud to help you maintain a robust and resilient data pipeline.
Google SecOps’ Data Ingestion and Health dashboard provides information about the type, volume, and health of data being ingested. Think of this as your “at a glance” view to monitor for data pipeline health issues in your environment. For example, visualizations on the dashboard show how many event parsing errors have occurred per log type and the ingestion throughput for the last hour, week, and so on.
Viewing the Data Ingestion and Health dashboard in Google SecOps
Monitoring Feed Status
The Feeds settings page in Google SecOps provides information about the status of each configured log feed and a timestamp for the last time the feed’s data transfer process completed successfully. In the example below, my feed that ingests Okta system logs into Google SecOps is in a failed state.
Reviewing the status of feeds in Google SecOps
Opening the details for the feed shows an error message explaining why the feed is in a failed state: authentication between Google SecOps and my Okta organization failed. This issue needs to be fixed before Google SecOps can resume pulling and ingesting logs for this feed.
Reviewing an error message for a feed with a status of failed
It’s also possible to retrieve a list of feeds programmatically via the feeds.list API method and determine the status of each feed. As an alternative to monitoring the state of feeds in the Feeds settings page, code like this can be executed on a schedule (perhaps in a Cloud Run function) so that your team can be alerted to any feed issues.
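For illustration, here’s a minimal sketch of such a check using Application Default Credentials. The endpoint path, OAuth scope, and response field names are assumptions based on the v1alpha API surface; verify them against the Google SecOps API reference for your instance:

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Replace the placeholders with your SecOps instance's values; the path and
# API version are assumptions to confirm in the feeds.list documentation.
FEEDS_URL = (
    "https://chronicle.googleapis.com/v1alpha/projects/PROJECT_ID"
    "/locations/LOCATION/instances/INSTANCE_ID/feeds"
)

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

response = session.get(FEEDS_URL)
response.raise_for_status()

# Flag any feed that isn't healthy so a scheduled job (e.g., a Cloud Run
# function) can notify the team.
for feed in response.json().get("feeds", []):
    name = feed.get("displayName", feed.get("name", "<unknown>"))
    state = feed.get("state", "UNKNOWN")  # field name is an assumption
    if state != "ACTIVE":
        print(f"ALERT: feed '{name}' is in state {state}")
```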
The output from my GitHub Actions workflow below shows that my “GitHub Enterprise Audit Log Stream” feed is ok, but my “Okta System Logs” feed is in a failed state and requires attention.
Monitoring the status of feeds via Google SecOps’ REST API
Google Cloud Monitoring offers a suite of tools to monitor the performance and availability of your cloud resources, including data ingestion for Google SecOps. Below is a list of Cloud Monitoring alerting policies that you can configure to help identify issues with data ingestion and address them quickly, before your security monitoring capabilities are impacted.
For a deeper understanding of using Cloud Monitoring for ingestion notifications, I highly recommend Chris Martin’s blog post on this subject and Google SecOps’ documentation.
Please note, a prerequisite to configuring ingestion notifications in Cloud Monitoring is to have a Google Cloud project that’s bound to your Google SecOps instance.
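To make this concrete, here’s a hedged sketch of one such policy using the google-cloud-monitoring Python client: a metric-absence condition that fires when a given log type stops ingesting. The metric type and label names are assumptions to verify against the Google SecOps ingestion metrics documentation:

```python
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# Hypothetical project bound to the SecOps instance.
PROJECT_NAME = "projects/your-secops-bound-project"

client = monitoring_v3.AlertPolicyServiceClient()

# Fire when no Okta records have been ingested for 30 minutes.
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="No OKTA records ingested for 30 minutes",
    condition_absent=monitoring_v3.AlertPolicy.Condition.MetricAbsence(
        filter=(
            'metric.type = "chronicle.googleapis.com/ingestion/log/record_count" '
            'AND metric.labels.log_type = "OKTA"'  # metric/label names are assumptions
        ),
        duration=duration_pb2.Duration(seconds=1800),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period=duration_pb2.Duration(seconds=300),
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
            )
        ],
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Google SecOps ingestion silent: OKTA",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

created = client.create_alert_policy(name=PROJECT_NAME, alert_policy=policy)
print(f"Created alerting policy: {created.name}")
```

Attach a notification channel to the policy so your team is paged when the condition fires, and adjust the duration and filter to match each log source’s expected cadence.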
That’s it for part one of this series, where I covered the following:
Please join me in part two where I’m excited to take things to the next level and demonstrate how to implement proactive health checks to validate that your logging, search, detection, and alerting capabilities are working end to end.
Special thanks to the following people for sharing their valuable feedback and expertise: Dan Dye, Serhat Gülbetekin, Ermyas Haile, Dave Herrald, Utsav Lathia, Christopher Martin, Othmane Moustaid, and John Stoner.