Hi, Community!
I support quite a large number of proxies.
Management would like to collect analytics data
(app names, user names, dates and times, and so on)
from all of the proxies and export it to BigQuery.
The exported data is going to be consumed and
processed by machine learning.
Given:
My question is, what are the best practices for data collectors and the DataCapture policy?
Specifically, from the documentation and my tiny proof of concept, it seems
at first glance that
Apigee hybrid captures data (e.g. emails) from all of the proxies
into a single data collector, "dc_req_email", so that the emails from all proxies are stored mixed together:
"many proxies to one data collector (emails)".
However, on the "DataCapture policy | Apigee X | Google Cloud" documentation page there is a note:
I am hoping that maybe @dknezic @dchiesa1 @shirishv @markjkelly @kurtkanaskie have some advice.
Many dimensions and metrics are already captured in analytics by default, including the ones you're referring to. You can refer to the list here.
The data capture policy can be used to supplement this list with your own additional analytics custom dimensions. As mentioned, you'll want to have only a single data capture policy in your API proxy if you end up needing to use it.
The built-in analytics collects aggregate data based on the metrics and dimensions, but not message content (e.g. headers, request body content), so you would need to use the Data Capture policy for those fields. You can place multiple Capture elements in that policy.
For example:
<DataCapture name="DC-custom" continueOnError="false" enabled="true">
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <Capture>
        <Collect ref="flow.username" default="0"/>
        <DataCollector>dc_req_username</DataCollector>
    </Capture>
    <Capture>
        <Collect ref="flow.useremail" default="0"/>
        <DataCollector>dc_req_email</DataCollector>
    </Capture>
</DataCapture>
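The policy above reads the flow variables flow.username and flow.useremail, so something earlier in the flow has to set them. As a rough sketch only, one way to do that is an ExtractVariables policy. The policy name EV-user-fields and the JSON request body shape (a top-level "user" object with "name" and "email" fields) are assumptions made up for illustration; adjust the JSONPath expressions to wherever these values actually live in your requests.

```xml
<!-- Hypothetical policy name; not from the original post. -->
<ExtractVariables name="EV-user-fields" continueOnError="false" enabled="true">
    <Source>request</Source>
    <!-- With this prefix, the variables below become flow.username and flow.useremail. -->
    <VariablePrefix>flow</VariablePrefix>
    <JSONPayload>
        <!-- Assumed payload shape: {"user": {"name": "...", "email": "..."}} -->
        <Variable name="username">
            <JSONPath>$.user.name</JSONPath>
        </Variable>
        <Variable name="useremail">
            <JSONPath>$.user.email</JSONPath>
        </Variable>
    </JSONPayload>
    <!-- Do not fail the request when a field is missing from the payload. -->
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</ExtractVariables>
```

This policy would need to run before DC-custom in the request flow. If a field is missing, the Collect element's default value is what ends up in the data collector.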
Since you've looked at Data Capture, you may have seen "Exporting data from Analytics", which describes how you can export data to BigQuery. This will include your custom data.