Hello,
I recently integrated an S3 bucket for one of our application logs. The logs are in JSON format, and the files in the S3 bucket are named in the format FileName.json.gz.
SecOps is successfully pulling the logs, but I'm encountering an issue with the log format. The logs are breaking improperly (after each line, comma, or curly brace { }), resulting in an incorrect format when ingested.
I've double-checked the basics but haven't been able to pinpoint the root cause of this issue.
Has anyone faced a similar problem? If so, how did you resolve it? Any guidance or troubleshooting steps would be greatly appreciated.
Thanks in advance for your help!
Chronicle SIEM requires newline delimited JSON at present, e.g., each JSON log is on a single line and ends with a newline character.
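To illustrate the requirement, a pretty-printed JSON log needs to be re-serialized onto a single line before ingestion. A minimal sketch in Python (the sample log record is invented for illustration):

```python
import json

# A pretty-printed JSON log as it might appear in the source file
pretty = """{
    "timestamp": "2024-01-01T00:00:00Z",
    "level": "INFO",
    "message": "user logged in"
}"""

# Parse, then re-serialize compactly: one object per line,
# terminated with a newline character, as Chronicle expects
record = json.loads(pretty)
ndjson_line = json.dumps(record, separators=(",", ":")) + "\n"
print(ndjson_line)
```

The same round-trip (parse, then dump compactly) works for an array of objects by emitting one `json.dumps` line per element.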
Are your source logs pretty-printed (spread across multiple lines) rather than newline delimited? If so, that could be the cause. As far as I know there is no way to collect these with the native Feed Management; it requires either 1) a custom ingestion solution, or 2) updating the source logging to output single-line JSON.
Hello @cmmartin_google
Yes, the logs are in formatted (pretty-printed) JSON, and unfortunately there is no way to change the source logging to output single-line JSON.
To create a custom ingestion, I can write a Python script. But can you please help me with what needs to be imported to support the Boto3 library in my IDE?
Example Python scripts for Chronicle SIEM are available in the Google SecOps GitHub repo: https://github.com/chronicle/api-samples-python Specifically, you'd want to send the flattened JSON via the https://github.com/chronicle/api-samples-python/blob/master/ingestion/create_unstructured_log_entrie... endpoint (ideally as a batch, up to 1MB per request).
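To stay under the 1MB-per-request limit mentioned above, the flattened log lines can be grouped into batches before sending. A rough sketch of one way to do that (the function name and threshold handling are illustrative, not part of the Chronicle samples):

```python
def batch_entries(entries, max_bytes=1_000_000):
    """Group log entries (strings) into batches whose combined
    UTF-8 encoded size stays under max_bytes, so each batch can
    be sent as a single API request."""
    batches, current, size = [], [], 0
    for entry in entries:
        entry_size = len(entry.encode("utf-8"))
        # Start a new batch when adding this entry would exceed the cap
        if current and size + entry_size > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(entry)
        size += entry_size
    if current:
        batches.append(current)
    return batches

# Example: three ~400KB entries fit into two requests, not three
entries = ["x" * 400_000 for _ in range(3)]
batches = batch_entries(entries)
print([len(b) for b in batches])  # [2, 1]
```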
An alternative that may work at low volume is to mount the S3 storage into a VM and use our Chronicle SIEM Collector (BindPlane) with the FileLog receiver, which supports multi-line logs: https://observiq.com/docs/resources/sources/filelog
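As a rough illustration of the FileLog approach, a receiver configured for multi-line JSON might look like the following. The mount path and the regex are assumptions for this example; check the linked docs for the exact schema and options:

```yaml
receivers:
  filelog:
    # Path where the S3 bucket is mounted inside the VM (assumed)
    include:
      - /mnt/s3-logs/*.json
    multiline:
      # Treat a line beginning with "{" as the start of a new log record,
      # so a pretty-printed object is collected as one entry
      line_start_pattern: '^\{'
```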
Hello @cmmartin_google
Thanks for sharing the repo, but I am unable to find anything related to S3 buckets.
For reading from an AWS S3 bucket, I would look at something like: https://docs.aws.amazon.com/code-library/latest/ug/python_3_s3_code_examples.html
You could adapt an example from https://github.com/chronicle/ingestion-scripts to work with S3.
Hello @cmmartin_google
Thanks for sharing, but I am not able to "import boto3" in my IDE. I'm getting this error: ModuleNotFoundError: No module named 'boto3'
I am also not able to find anything for AWS or S3 in https://github.com/chronicle/ingestion-scripts
I tried installing the AWS S3 integration from the Marketplace to explore the code and see how Boto3 is used. Surprisingly, I hit the same issue: when I try running the code, I get an error indicating that Boto3 is not recognized.
Apologies. If I understand correctly, the question is about setting up the AWS SDK (Boto3) in your IDE, which is beyond what I can offer support for.
That said, there is a Boto3 plugin for VSCode which may be a starting point - https://marketplace.visualstudio.com/items?itemName=Boto3typed.boto3-ide
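The usual fix for a ModuleNotFoundError is to install Boto3 into the same Python environment the IDE runs, for example:

```shell
# Install the AWS SDK for Python into the active environment;
# pip pulls in its dependencies (botocore, jmespath, s3transfer) automatically
pip install boto3

# Verify the install from the same interpreter the IDE is configured to use
python -c "import boto3; print(boto3.__version__)"
```

If the IDE points at a different interpreter or virtual environment than the one you installed into, the import will still fail, so check the interpreter setting first.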
Hello @cmmartin_google
Thank you.
I found the solution now. We need to add the "boto3" library to the integration, along with its dependency library "jmespath".