Hi All,
Is there any way we can find duplicate events ingested into Chronicle? If yes, could you please share more information.
With Regards,
Shaik Shaheer
@jstoner @mikewilusz @manthavish - Could you please help me identify duplicate logs in Chronicle.
This should be possible using a pivot table, provided the log contains a unique identifier (Global Event ID, Event ID, Log ID, etc.). In the following case we are using Google Chronicle's demo instance and the 'Crowdstrike Falcon' log source, where the UDM field that contains an event's unique identifier is "metadata.product_log_id".
[1] - First, search for the log type you want; in this case 'Crowdstrike Falcon' is: metadata.log_type = "CS_EDR"
[2] - Navigate to 'Pivot'
[3] - Apply Pivot settings like the screenshot below (grouping by the unique identifier)
[4] - Click on the ⋮ menu, export the data to a .csv, and remove all rows with a count equal to "1" (which, if you order by Descending, will be at the bottom) :).
This should show you the event count grouped by the chosen UDM field (this assumes metadata.product_log_id is a unique identifier for each 'CS_EDR' log). Depending on your needs, creating a dashboard may be better suited.
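If you'd rather not eyeball the exported .csv by hand, the grouping step can also be done with a short script. This is a minimal sketch, assuming the export contains a column named "metadata.product_log_id" (adjust the column name to match your actual export):

```python
import csv
from collections import Counter

def find_duplicates(csv_path, id_column="metadata.product_log_id"):
    """Return {identifier: count} for identifiers that appear more than once.

    The column name is an assumption based on the UDM field used above;
    change id_column to match the header in your exported .csv.
    """
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[id_column]] += 1
    # Keep only identifiers seen more than once (the duplicates)
    return {log_id: n for log_id, n in counts.items() if n > 1}
```

This removes the manual step of sorting and deleting the count-of-1 rows in a spreadsheet.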
Hope this helps!
Hi Ayman C,
Greetings...!!!
Thank you for your suggestion; we attempted to implement this method. However, it makes the analyst's job tedious, as they have to manually export and individually check the logs. Is there an automated alternative available?
With Regards,
Shaik Shaheer
Hi Shaik,
Google Chronicle SIEM customers can leverage several automation strategies to check for duplicate ingested data. Here's a breakdown:
1. Hash-Based Deduplication
Mechanism: Compute a hash (e.g., SHA-256) of each raw log, or of a normalized subset of its fields, before ingestion, and drop or flag events whose hash has already been seen within a time window.
Pros: Exact-match detection is fast and reliable, and is straightforward to implement in a pre-ingestion pipeline.
Cons: Misses near-duplicates that differ in any hashed field (e.g., timestamps), and requires maintaining hash state outside Chronicle.
2. Similarity Detection with Chronicle Rules
Mechanism: Write a YARA-L detection rule that matches events sharing the same unique identifier (e.g., metadata.product_log_id) within a match window, and alerts when the count exceeds one.
Pros: Runs natively in Chronicle with no external infrastructure, and surfaces duplicates to analysts automatically.
Cons: Flags duplicates after ingestion rather than preventing them, and is constrained by rule match-window limits.
3. External Data Deduplication
Mechanism: Deduplicate logs upstream, before they are forwarded to Chronicle - for example, in a log shipper or aggregator that tracks recently seen events.
Pros: Prevents duplicate ingestion entirely, and reduces ingestion volume.
Cons: Adds infrastructure to build and maintain, and a stateful dedup layer can become a bottleneck or a point of failure.
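To make strategy 1 concrete, here is a minimal sketch of hash-based deduplication for a log-forwarding pipeline. The class name and interface are illustrative (not a Chronicle API): raw log lines whose SHA-256 hash has been seen within a configurable window are treated as duplicates.

```python
import hashlib
import time

class LogDeduplicator:
    """Drop raw log lines whose hash was seen within window_seconds.

    Illustrative sketch only - in production you would likely back the
    seen-hash state with an external store rather than a dict.
    """

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.seen = {}  # hash -> timestamp when first seen

    def is_duplicate(self, raw_log: str, now=None) -> bool:
        now = time.time() if now is None else now
        digest = hashlib.sha256(raw_log.encode("utf-8")).hexdigest()
        # Evict hashes that have aged out of the dedup window
        self.seen = {h: t for h, t in self.seen.items()
                     if now - t <= self.window}
        if digest in self.seen:
            return True
        self.seen[digest] = now
        return False
```

A forwarder would call is_duplicate() on each line and only ship lines where it returns False. The trade-off noted above applies: any byte-level difference (such as an embedded timestamp) defeats the exact-match hash.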
Jumping into this discussion because I just want to ask whether Chronicle has any built-in/automated/default mechanism that would prevent ingesting the same log within a given time frame? For example, when pulling O365 audit logs, will Chronicle know not to ingest a log already ingested in previous pulls?
Hi Shaik,
I think there are numerous ways you could solve this. You could create a dashboard with a scheduled delivery containing the data you need, received as a csv to allow for review / further automation (depending on your goal). This is achievable if there's a vendor-specific event identifier that uniquely identifies the event at source; one example vendor that provides this is Office 365. In the below screenshot I'm using the public Google Chronicle SIEM instance to test this concept, utilising 'WORKSPACE_ACTIVITY' as an example log source to identify duplicate logs. Feel free to utilise the dashboard, and remove the filters within the table to fit your needs!
https://demo.backstory.chronicle.security/dashboards?name=7432
To use this, save the dashboard and then import it into the SIEM Dashboard (legacy dashboard) on your Chronicle instance.
Once the dashboard is imported, and modified to suit your needs, set up scheduled reporting!
Hi, how/where can we test/implement this feature? I have one log source that is randomly duplicating logs (same timestamp, etc.). Assuming we cannot solve this at source, I would like it solved prior to alert generation. Is this currently possible?
Hello,
Please open a support case with the feed ID - they will fix this.
Oh, it's not a problem with the feed. I just have a log source that sometimes sends duplicates (via syslog). I think this is due to the way it reads from the file and/or some artifact of the HA setup it's running on that causes the duplication. While I will try to solve it at source, I was wondering if there is a way to de-duplicate in the log ingestion pipeline.