Chronicle Search API to Pull UDM data

Good morning--

Looking for some feedback on the feasibility of pulling large UDM datasets out of Chronicle using a curl command. We previously did this successfully using Splunk but I'd prefer to not waste my time going down any rabbit-holes if it is simply not possible in Chronicle. 

Basically I'd like to query our Tenable vulnerability data (currently parsed out into UDM fields in Chronicle) and export 15 or so of the fields to a CSV, which would then be uploaded to JIRA for remediation.

I see within the API documentation there is a way to search UDM but I don't see any documentation around pulling all events from a specific log type. 

I've been using the following documentation as a guide so far:

https://cloud.google.com/chronicle/docs/reference/search-api

Thanks in advance for any help you can provide

ACCEPTED SOLUTION

The UDM search endpoint is documented here: https://cloud.google.com/chronicle/docs/reference/search-api#udmsearch

You can see one of the parameters is a query, so for your example, if you want to pull your Tenable data, the query would look like:

metadata.log_type = "TENABLE_IO"

That should retrieve all the Tenable events from the time span you specify in the API request.

-mike 


17 REPLIES


Understood -- thank you for confirming -- glad I'm not wasting my time. 

FYI, udmSearch is limited to 10k results with no paging, so you would need to break the queries up into time chunks (without really knowing how much data is in each one), all while keeping under the rate limit of 2 queries per minute.
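Roughly, that chunking could look something like this - just a sketch; the window size, the 30-second sleep, and the loop structure are assumptions, and http_session is an authenticated requests session (e.g. google.auth's AuthorizedSession):

import time
from datetime import timedelta

def chunked_udm_search(http_session, base_url, query, start, end,
                       window=timedelta(hours=6), limit=10000):
    # Pull UDM events in fixed time windows, pausing between requests to
    # stay under the ~2 queries/minute rate limit. The window size is a
    # guess - shrink it if any window comes back at the 10k-row ceiling.
    events = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + window, end)
        params = {
            "query": query,
            "time_range.start_time": cursor.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "time_range.end_time": window_end.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "limit": limit,
        }
        resp = http_session.get(base_url, params=params)
        resp.raise_for_status()
        batch = resp.json().get("events", [])
        if len(batch) >= limit:
            print(f"Window {cursor} - {window_end} hit the {limit} cap; "
                  "use a smaller window to avoid dropping events.")
        events.extend(batch)
        cursor = window_end
        time.sleep(30)  # keep under 2 queries per minute
    return events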

Forgot to mention: the only real solution for mass-exporting normalized data is the BigQuery export option. Request that Google export datalake.events to your own GCP project so you can query it at your leisure. It's not real-time and you pay the storage and compute costs (assuming your instance is associated with a GCP project).

 

https://cloud.google.com/chronicle/docs/preview/cloud-integration/export-to-customer-managed-project
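
Once that export is in place you can query it with the regular BigQuery client library. A minimal sketch - the project ID, dataset/table name, and field paths below are placeholders, so check the schema of your exported table before relying on them:

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")  # placeholder project ID

sql = """
SELECT
  principal.hostname,
  security_result[SAFE_OFFSET(0)].rule_id AS plugin_id
FROM `your-gcp-project.datalake.events`
WHERE metadata.log_type = 'TENABLE_IO'
LIMIT 1000
"""

for row in client.query(sql).result():
    print(row.hostname, row.plugin_id)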

Ahh OK, so this may explain why my shell script is not actually able to pull back any data. Are you saying that if I don't use the BigQuery / datalake.events option in a GCP project I'm not able to simply pull UDM events from my Chronicle instance URI? This is the way we were able to pull this data using Splunk.

For example:

curl -v --ca-native 'https://backstory.chronicle.security/events:udmSearch?query=metadata.log_type="TENABLE_IO"' --user email\*TOKEN*

As of now (not using BigQuery) this does not return any events from the Tenable index.

Thanks

 

No you can do that, it will just be limited to 10k rows. 

See: https://cloud.google.com/chronicle/docs/reference/search-api#udmsearch

Understood - the documentation certainly makes it look like it's possible, although I'm still unable to pull any UDM data with a curl command.

Have you personally been able to pull UDM data using this approach? If so, did you have to integrate Google Sign-In before you could access googleapis.com? The following documentation makes it seem like that might be a prerequisite:

https://developers.google.com/identity/sign-in/web/sign-in

 

You need a Chronicle Backstory key, which support can provide if you do not have one.

I do query udmSearch (via Python); I do not use curl. There is authentication information in the reference docs provided. I use Python so I can use the examples, and Google already provides libraries like google.oauth2 service_account that handle all the authentication.

https://cloud.google.com/chronicle/docs/reference/search-api#getting_api_authentication_credentials

there is also:

https://github.com/chronicle/api-samples-python/blob/master/search/udm_search.py
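
The basic pattern from those docs/samples looks roughly like this (the key file path, time range, and limit below are placeholders):

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/chronicle-backstory"]

# Placeholder path to the Backstory service-account JSON key from support.
credentials = service_account.Credentials.from_service_account_file(
    "chronicle_credentials.json", scopes=SCOPES)
session = AuthorizedSession(credentials)

resp = session.get(
    "https://backstory.googleapis.com/v1/events:udmSearch",
    params={
        "query": 'metadata.log_type = "TENABLE_IO"',
        "time_range.start_time": "2024-11-01T00:00:00Z",
        "time_range.end_time": "2024-11-02T00:00:00Z",
        "limit": 1000,
    },
)
resp.raise_for_status()
print(len(resp.json().get("events", [])), "events returned")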

 

There seems to be a lot more documentation / support for python so I am going to switch over to that for my script. Thanks for all the information!

So I've gotten the Python query up to the point where I can generate a JSON file pulling UDM events from the Tenable log type. I have also incorporated a function that writes the JSON response to a CSV. 

Currently though - the query parameters are quite simple: 

query = 'metadata.log_type = "TENABLE_IO"'

Where the query seems to be failing is when I go to incorporate statistical functions into my query, specifically:

# Simplified query parameters
query ='''
events:
  metadata.log_type = "TENABLE_IO" 
  $pluginID = security_result.rule_id
  $assetID = principal.asset.product_object_id

match:
  $pluginID, $assetID  

outcome:
  $Description = array_distinct(extensions.vulns.vulnerabilities.description)
  $FQDN = array_distinct(principal.hostname)
'''

Once the string is run with that query it returns this error:

Error: 400 - {
  "error": {
    "code": 400,
    "message": "generic::invalid_argument: compilation error query uses a feature that is not yet allowed: invalid argument",
    "status": "INVALID_ARGUMENT"
  }
}

Is this simply a limitation of the current query parameters?

Full Python script here:

import csv
from urllib.parse import urlencode

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

# Set up API credentials and scopes
SCOPES = ['https://www.googleapis.com/auth/chronicle-backstory']
SERVICE_ACCOUNT_FILE = r'C:\API_keys\OAuth_2_Keys.json'

# Authenticate using the service account key
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES
)
http_session = AuthorizedSession(credentials)

# API endpoint
base_url = "https://backstory.googleapis.com/v1/events:udmSearch"

# Simplified query parameters
query ='''
events:
  metadata.log_type = "TENABLE_IO" 
  $pluginID = security_result.rule_id
  $assetID = principal.asset.product_object_id

match:
  $pluginID, $assetID  

outcome:
  $Description = array_distinct(extensions.vulns.vulnerabilities.description)
  $FQDN = array_distinct(principal.hostname)
'''

start_time = "2024-11-01T00:00:00Z"  # Replace with desired start time
end_time = "2024-12-20T01:00:00Z"  # Replace with desired end time
limit = 100  # Replace with desired limit

# Build the query string
params = {
    "query": query,
    "time_range.start_time": start_time,
    "time_range.end_time": end_time,
    "limit": limit
}

# Encode the query string for the URL (properly handle spaces and special characters)
query_string = urlencode(params)

# Full URL with query parameters
url = f"{base_url}?{query_string}"

# Make the GET request
response = http_session.request("GET", url)

# Function to write JSON response to CSV file
def write_to_csv(json_data, file_path):
    if not json_data:
        print("No data to write to CSV.")
        return

    # Extract keys (column names) from the first dictionary in the list
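    # (Note: only the first event's keys are used as columns; an event with
    #  extra fields would make DictWriter raise unless extrasaction="ignore"
    #  is passed.)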
    keys = json_data[0].keys()

    with open(file_path, 'w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=keys)
        writer.writeheader()
        writer.writerows(json_data)

# Check the response status and write to CSV
if response.status_code == 200:
    response_json = response.json()
    # Assuming the response JSON contains a list of events
    events = response_json.get('events', [])
    csv_file_path = r'C:\temp\Tenable\output1.csv'  # Replace with your desired file location
    write_to_csv(events, csv_file_path)
    print(f"Data written to {csv_file_path}")
else:
    print(f"Error: {response.status_code} - {response.text}")
    

Hi @russell_pfeifer,

That's correct - currently Statistics and Aggregates aren't supported in the /v1/events:udmSearch endpoint.
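
In the meantime, one workaround is to pull the raw events with the simple log-type query and do the grouping client-side. A rough sketch - the field names below are assumptions about the JSON response shape (they may be camelCased and/or nested under a "udm" key), so adjust them to what your response actually contains:

from collections import defaultdict

def summarize(events):
    # Client-side stand-in for the match/outcome sections: group events by
    # (plugin ID, asset ID) and collect distinct descriptions and hostnames.
    groups = defaultdict(lambda: {"descriptions": set(), "fqdns": set()})
    for event in events:
        udm = event.get("udm", event)  # assumption: UDM record may be nested
        principal = udm.get("principal") or {}
        rule_id = (udm.get("securityResult") or [{}])[0].get("ruleId")
        asset_id = (principal.get("asset") or {}).get("productObjectId")
        if not rule_id or not asset_id:
            continue
        key = (rule_id, asset_id)
        vulns = ((udm.get("extensions") or {}).get("vulns") or {}).get("vulnerabilities") or []
        for vuln in vulns:
            if vuln.get("description"):
                groups[key]["descriptions"].add(vuln["description"])
        if principal.get("hostname"):
            groups[key]["fqdns"].add(principal["hostname"])
    return groups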

 

Kind Regards,

Ayman

Thanks Ayman -- good to know. I wonder if there are future plans to integrate stats / aggregates. 

Hi @russell_pfeifer,

I believe there has been mention of plans to do so / that it is likely.

Kind Regards,

Ayman

@russell_pfeifer Hi, I encountered this problem too - did you manage to solve it? Thanks

Hi @zoooz

See Ayman's response above. I have a formal inquiry out to Google as well, but I believe that stats and aggregate commands simply aren't supported at this time, given the specific error message generated by the query that attempts to incorporate them.

Regards

Here is an example from my scratchpad which might be helpful for you: 

https://gist.github.com/emeryray2002/1f3a4e19726b3fe1dffdceb930f33557

Thanks for sharing @raybrian -- seeing if I can integrate your code into our instance now