In a leaky basement with no windows, an alarm sounds. Defenders are given a set of observed adversary actions and must make a decision that will shape the direction of a security event investigation. The stakes are high: the wrong decision can misdirect analysts or prove a costly mistake that gives the adversary room to maneuver further. This is not a SOC at a real company but a proof-of-concept game called Attack Poker. I presented the idea during a lightning talk at ATT&CKCon 2.0. Ever since working on that project, I have believed that only by analyzing chains of ATT&CK techniques could we begin to improve our ability to stop adversaries.
At ATT&CKCon 4.0, a small group of like-minded individuals discussed how ATT&CK could make it easier to prioritize alert data. These individuals included my colleague Andy Shepard and Ingrid Skoog, who was the head of R&D for the Center for Threat-Informed Defense (the Center) at the time. The discussion turned into a project proposal at the Center. Many Center members sponsored the project, and their input and data contributions further refined the idea. Together with the sponsors, Center researchers worked tirelessly on the project. This combined effort resulted in the publication of the Technique Inference Engine (TIE).
The concepts that led to TIE were inspired by two previous Center research projects. The first was ATT&CK Flow, which provides a method to describe entire chains of techniques and their associated STIX objects. Flow shifted the emphasis away from isolated atomic techniques and moved us closer to analyzing sequential adversary behavior. The second project was known as Top ATT&CK Techniques. It was the first attempt to identify choke points within the matrix that would limit adversary movement depending on the techniques chosen. The existence of choke points in the matrix is a key point to understand. It is often said that adversaries can move freely among the techniques, but this is not necessarily true. Depending on their goal, adversaries will choose certain techniques at a given point in time, which makes it possible to find the techniques most likely to be part of the path they must take to achieve that goal. These two projects were important steps toward formulating the approach taken with TIE.
The largest obstacle to achieving the goals of the project was collecting the dataset that would be used to train the ML model. There were two main difficulties to overcome. The first was the amount of training data needed. A major risk when developing a model is overfitting. This happens when the dataset is too small, or does not contain enough variation, to accurately represent the correlation between data points. When a model overfits, it no longer captures a true relationship with reality but instead just reinforces the data it was trained on. Its findings will not help the analyst, as they mostly repeat the small amount of data available. The second difficulty was the bias that might be brought into the model depending on the data chosen and, more importantly, the variation within that data. For this reason no emulated data was used, as such data might represent what security teams typically test rather than what is actually observed in adversary behavior in the wild. Although emulated data would have increased both volume and variation, the bias could have severely skewed the results. The dataset also had to be chosen so that the largest possible volume of data from CTI reports could be included. An early decision was to use concurrent techniques without regard for order. Analyzing techniques concurrently rather than sequentially meant that more of the available data could be used in the model. Currently there is not enough sequential data to provide a good dataset for the models considered. The dataset collected is one of the largest collections of concurrent TTPs available and is a very significant outcome of the project.
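To make the idea of "concurrent techniques without regard for order" concrete, here is a minimal sketch of how reports could be reduced to unordered sets and then to rows of a binary matrix. The report names and their contents are invented for illustration, and the helper to_binary_matrix is hypothetical; this is not the TIE data pipeline, only one plausible way to represent such data.

```python
# Toy data, invented for illustration; not drawn from the TIE dataset.
TOY_REPORTS = {
    "report_a": ["T1566", "T1059", "T1105"],
    "report_b": ["T1105", "T1566", "T1059"],  # same techniques, different order
    "report_c": ["T1059", "T1003"],
}


def to_binary_matrix(reports):
    """Return (sorted technique IDs, {report name: 0/1 row}).

    Order and repetition inside a report are discarded, so only the
    presence or absence of each technique is kept.
    """
    techniques = sorted({t for ts in reports.values() for t in ts})
    rows = {}
    for name, ts in reports.items():
        present = set(ts)
        rows[name] = [1 if t in present else 0 for t in techniques]
    return techniques, rows


techniques, rows = to_binary_matrix(TOY_REPORTS)
for name, row in rows.items():
    print(name, dict(zip(techniques, row)))
```

In this representation report_a and report_b become identical rows, which is the trade-off described above: sequence information is lost, but any report that lists techniques can contribute to the dataset.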
The first available method is the web application. To use it, choose the techniques that have been observed; the results are the most likely concurrent techniques, listed from most to least likely. Also included are a number of convenient filters and ordering choices. The filters include Platform, Group and Campaign. One of the most convenient features is the ability to group the techniques by tactic, which gives better context regarding the goal of the adversary. For advanced users, a Python development notebook is available.
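The workflow of entering observed techniques and getting back a ranked list can be illustrated with a deliberately naive stand-in. The sketch below scores candidate techniques by simple co-occurrence counts over toy reports. It is not the model behind TIE, and the function rank_concurrent and the sample reports are hypothetical, used only to show the shape of the input and output.

```python
from collections import Counter

# Toy reports, invented for illustration; each is an unordered set of technique IDs.
TOY_REPORTS = [
    {"T1566", "T1059", "T1105"},
    {"T1566", "T1059", "T1027"},
    {"T1059", "T1105", "T1003"},
    {"T1566", "T1105"},
]


def rank_concurrent(reports, observed, top_n=5):
    """Rank techniques that co-occur with the observed ones.

    Each report votes for its unobserved techniques, weighted by how many
    observed techniques it shares. This is a naive co-occurrence heuristic,
    not the trained model used by TIE.
    """
    observed = set(observed)
    scores = Counter()
    for techniques in reports:
        overlap = len(observed & techniques)
        if overlap == 0:
            continue  # report shares nothing with the observation
        for technique in techniques - observed:
            scores[technique] += overlap
    # Highest score first; ties broken by technique ID for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


ranking = rank_concurrent(TOY_REPORTS, ["T1566", "T1059"])
for technique, score in ranking:
    print(technique, score)
```

A real recommender would also need to handle the filters mentioned above, for example by restricting the candidate techniques to a chosen platform before ranking.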
I'm most excited about the future direction of this research. With TIE complete, the project will lead to further research and innovative ways to analyze technique chains. My goals include exploring the use of ATT&CK Flow and LLMs to capture these sequential chains. ATT&CK is often called a common language, and ATT&CK Flow provides the perfect structure to describe sequential technique analysis within that language. The challenge lies in building a comprehensive database, but LLMs could help by automatically generating ATT&CK Flows from observed adversary behavior. I believe that as the community uses TIE, we'll see both novel applications and a clearer picture of its future potential.