Scaling Detection-as-Code with Google SecOps: An MSSP’s Perspective

atticuslin1

Atticus Lin is a Cloud Security Manager at Arctiq and has been building detection rules, parsers, and automations in Google SecOps for the last three years. He serves as the technical lead of SecOps Onboarding and SecOps Detection Engineering.

Reid Hurlburt is a member of the automation team at Arctiq, specializing in Security Orchestration, Automation & Response (SOAR) with Google SecOps. The automation team regularly develops automated playbooks that aid and augment the standard operating procedures of SOC analysts.


Arctiq’s Managed Extended Detection and Response (MXDR) service delivers 24/7, year-round vigilance, using Google Security Operations (SecOps) to detect, investigate and respond to threats. Our fully managed service automates security detection and response to safeguard IT infrastructure, systems, data, and more. As a Google Premier Services Partner, Arctiq has enhanced levels of enablement and partnership with Google Cloud Security to bring our customers a best-in-breed level of platform expertise and service.

At Arctiq, we were faced with an interesting problem as we experienced hypergrowth in the security operations space: “How do we manage content (e.g. detection rules, data tables, and rule exclusions) across multiple customer environments in a scalable, consistent, version-controlled, and automated way?” Enter Detection-as-Code (DAC) and our efforts in GitHub to standardize Detection Engineering at Arctiq.

Evolving in a Multi-Tenant World

Early on in our Google SecOps journey, our Detection Engineering processes involved manually logging in to customer tenants to create and maintain rules. As we onboarded more customers, a list of pain points quickly came to light:

  1. The deployment, testing, and tuning of our rules library across our fleet of Google SecOps tenants was tedious and time-consuming. For example, our workflow for creating a new rule in a customer’s Google SecOps tenant was to navigate to the tenant, authenticate, create the rule, initiate a test against the last two weeks of data, analyze the results once the test completed, tune the rule accordingly, and then set the rule to alert.
  2. Our processes were slowed by manually checking (and double-checking) to ensure that no sensitive customer data — such as production subnet ranges, admin/user group names, comments, or reference list entries — left a tenant.
  3. Tracking modifications to rules, both on a per-customer basis and in our master library of rules, was difficult.
  4. Answering the question “Who changed this rule and why?” became difficult as our team grew and as leadership with varying levels of technical knowledge got involved in our processes.
  5. Review and collaboration around rule development and changes required setting up meetings and screen-sharing sessions for our remote-first team spread across North America.

The Path Forward: Embracing Detection-as-Code

After coming across a blog post written by David French about implementing Detection-as-Code with Google SecOps, we immediately recognized our north star for scalability. Detection-as-Code leverages software engineering principles to manage and deploy detection rules as code, offering significant benefits including:

  • Scalability across multiple environments: Our engineers can create/modify a rule once in GitHub and have the changes deployed to multiple Google SecOps tenants.
  • Consistency in rule deployment: Using GitHub as the single source of truth for our detection content makes it easy to test and deploy rules to protect our customers.
  • Version control for tracking changes to content: The process to revert to an earlier version of a rule or understand the reasons for changes to a rule is straightforward.
  • Automation of deployment and tuning processes: Once an engineer’s proposed changes are tested, reviewed, and approved, the changes are automatically deployed to customers.
  • Enhanced collaboration among security teams: The unique experience and skill sets of individuals on the team ensures that we build the best rules possible.
  • Increased efficiency by reducing manual tasks: Proposed changes to detection content are tested automatically prior to deploying changes to Google SecOps.
  • Improved accountability through clear audit trails: Changes to our detection content are tracked in GitHub and the associated artifacts (e.g. GitHub issues and pull requests) record who changed what, when, and why.

This approach modernizes Detection Engineering practices and has allowed us to build more robust, adaptable, and consistently applied security measures. We knew that, in order to grow with our customer base, we would need to invest time and resources into automation and the latest methodologies.

Custom Tool Development for MSSP Scale

Google Cloud Security’s Content Manager tool was awesome to use on a tenant-by-tenant basis, and served us well in the beginning. We saw an opportunity to customize the tooling based on our own requirements: we needed to manage content in our customers’ Google SecOps tenants at scale and concurrently. We decided to fork the GitHub repository and wrap the CLI with additional functionality.

To develop our own Detection-as-Code pipeline for Google SecOps, we wrote a custom Python-based CLI that uses a subset of the methods from Content Manager provided by Google. Changes to the code that interacts with the Google SecOps API were minimal, which meant that we could focus on other crucial tasks such as secure storage, the management of credentials & secrets, and our detection logic. By designating each tenant to its own directory, we’re able to independently manage secrets, rules, and reference lists for each customer.
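Concretely, a per-tenant repository layout might look something like the sketch below. The directory and file names here are illustrative assumptions, not Arctiq's actual structure:

```
tenants/
├── clientX/
│   ├── secrets/           # credentials for clientX's Google SecOps tenant
│   ├── rules/             # YARA-L detection rules deployed to clientX
│   └── reference_lists/
└── clientY/
    ├── secrets/
    ├── rules/
    └── reference_lists/
```

Keeping each customer's content and secrets in its own directory makes it much harder for one tenant's data to leak into another tenant's deployment.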

We were able to distribute our custom CLI to our team’s Detection Engineers quickly, allowing them to update multiple Google SecOps tenants with new/edited rules with a single command – greatly increasing efficiency.

Reviewing the help message for our custom CLI for managing content across multiple Google SecOps tenants

The following command shows how we can push rule updates to multiple Google SecOps tenants by specifying the customer IDs.

(venv) ➜  automation git:(main) python3 rules-cli.py push --rules c2_beaconing_dns_tcp_udp.yaral --tenants clientX clientY clientZ
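A minimal sketch of what such a `push` subcommand could look like, assuming an argparse-based wrapper over a `tenants/<name>/rules/` layout. The names here (`plan_push`, the `tenants/` directory, the flag help text) are illustrative; the real CLI wraps Google's Content Manager methods and authenticates against each tenant's Google SecOps API:

```python
# Hypothetical sketch of a multi-tenant "push" wrapper; not Arctiq's actual CLI.
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rules-cli.py")
    subcommands = parser.add_subparsers(dest="command", required=True)
    push = subcommands.add_parser("push", help="deploy rules to one or more tenants")
    push.add_argument("--rules", nargs="+", required=True, help="rule files to deploy")
    push.add_argument("--tenants", nargs="+", required=True, help="target customer IDs")
    return parser


def plan_push(rules: list[str], tenants: list[str], base_dir: str = "tenants") -> list[tuple[str, str]]:
    """Pair every rule with every target tenant's rules directory."""
    return [
        (tenant, str(Path(base_dir) / tenant / "rules" / rule))
        for tenant in tenants
        for rule in rules
    ]


if __name__ == "__main__":
    # Demo invocation mirroring the command shown above.
    args = build_parser().parse_args(
        ["push", "--rules", "c2_beaconing_dns_tcp_udp.yaral",
         "--tenants", "clientX", "clientY", "clientZ"]
    )
    for tenant, rule_path in plan_push(args.rules, args.tenants):
        # The real CLI would load this tenant's credentials here and call
        # the Google SecOps API to create or update the rule.
        print(f"[{tenant}] would deploy {rule_path}")
```

Fanning out one rule change to many tenants then becomes a simple loop over per-tenant directories rather than a series of manual console sessions.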

Automating Our Detection Pipeline

Once our custom CLI was developed and working consistently, our next step was to introduce further automation via GitHub Actions. We wanted our Detection Engineering process to follow an expedited yet intuitive workflow:

  1. A Detection Engineer creates a pull request in GitHub containing a new rule and specifies one or more customer environments for deployment.
  2. A test of the rule logic is automatically kicked off against the last two weeks of data in the customer’s Google SecOps tenant(s), which returns an overview of the results.
  3. A SOC analyst and a fellow engineer review, request modifications if needed, then approve the new rule.
    1. The goal at this stage is to reduce the risk of pushing changes to customers that result in adverse effects such as false positives or worse, false negatives.
    2. The team collaborates around proposed changes with the objective of building the most efficient and precise detections leveraging the unique experience that we have on the team.
  4. The rule is automatically deployed to the target customer tenants once the pull request has been merged, and its state is set to “Alerting.”
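The testing step of the workflow above can be wired up with a GitHub Actions definition along these lines. This is a hedged sketch: the file name, trigger paths, secret name, and the CLI's `test` subcommand and `--changed-only` flag are assumptions for illustration, not our production workflow:

```yaml
# .github/workflows/test-rules.yml (illustrative)
name: Test detection rules
on:
  pull_request:
    paths:
      - "tenants/**/*.yaral"

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write   # allow the bot to comment with test results
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Test changed rules against the last two weeks of data
        env:
          SECOPS_CREDENTIALS: ${{ secrets.SECOPS_CREDENTIALS }}
        run: |
          pip install -r requirements.txt
          python3 rules-cli.py test --changed-only
```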

Let’s take a look at the above workflow in action!

In the screenshot below, user “atticus-arctiq” has created a detection rule for CrowdStrike and has included the detection logic in a file for review in a new pull request on GitHub.

Reviewing a pull request to create a new rule

A GitHub Actions workflow is executed immediately after the pull request is created. In the example below, we can see that the “github-actions” bot executed our CLI tool to test the rule over the events ingested in Google SecOps.

Testing the rule via GitHub Actions and Google SecOps’ API

The bot leaves a comment on the pull request once the workflow has completed. Atticus can review any detections that were generated by the rule during the testing and tweak the detection logic appropriately.

Reviewing results after testing a rule against events ingested in Google SecOps

After a couple of iterations via commits, a peer review of the pull request is submitted. Once the pull request is approved, the proposed changes are merged into the “main” branch of the GitHub repository. This kicks off another GitHub Actions workflow to push the changes to the specified Google SecOps tenants.

Deploying the rule via GitHub Actions and Google SecOps’ API to target tenants
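The merge-triggered deployment can be sketched the same way. Again, the file name, trigger paths, secret name, and CLI flags are illustrative assumptions:

```yaml
# .github/workflows/deploy-rules.yml (illustrative)
name: Deploy detection rules
on:
  push:
    branches: [main]
    paths:
      - "tenants/**/*.yaral"

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Push merged rules to their target tenants and enable alerting
        env:
          SECOPS_CREDENTIALS: ${{ secrets.SECOPS_CREDENTIALS }}
        run: |
          pip install -r requirements.txt
          python3 rules-cli.py push --changed-only --enable-alerting
```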

The Payoff

Arctiq’s Detection Engineering capabilities are now significantly more efficient since implementing our Detection-as-Code pipeline and updated processes. We estimate that our engineers are 15-25% more efficient across the entire lifecycle of a rule. This increased efficiency comes from the time they save during automated testing, parallelized collaboration, and streamlined deployment. We have also found that our own team’s ad hoc audits of rules have become far less frequent, and when they do happen, the version history native to GitHub has resulted in an expedited process.

These efficiency gains allow us to provide greater detection coverage across our customers. Our team has more time for research and the development of new rules to detect emerging threats and more advanced attacker tradecraft. Our automated testing practices ensure greater accuracy and fidelity of our rules, which improves our customer security posture and confidence. As we look to adopt additional AI advancements into our processes, this accuracy and process control will be crucial to operating with confidence at the speed of business.

What’s Next?

With our initial Detection-as-Code pipeline now active and an integral part of our workflow, we are turning our attention to automating content creation and refining our detection rules. We plan to leverage the newest capabilities of MCP servers and LLMs for this purpose. Our existing GitHub integration provides robust version control and facilitates collaboration, enabling a streamlined "trust-but-verify" approach. This allows Gemini to effectively function as a collaborative team member whom we can trust to do security research and content creation, but in a way we can validate and have confidence in.

What we demonstrated in this post is just one of the many ways you can leverage the Google SecOps APIs to develop custom tooling that fits the workflow of your operations team. The team at Google Cloud Security is continuously developing new open source tools, such as the new Google SecOps SDK that can be leveraged in this same way – enabling many possibilities for automation and process improvements within your organization.

Acknowledgement

A special thanks to David French at Google Cloud Security for his valuable insight and feedback on this blog, as well as for his thought leadership on top of which we were able to build this pipeline. 

Thank you also to Eugene Dimarsky, Google Cloud Security Partner Engineer, for his partnership and advice as we embarked on this journey.
