Segment is a customer data hub that lets customers track data from all their tools with a single tracking code; Segment’s servers then route the data to the right application, in the right way. This allows you to follow individual user event streams and compare cross-platform analytics. A sampling of the many types of analysis that come with this Block includes:
Integrations include analytics tools, email schedulers, customer success tools, and many others. You can also check Segment’s partner site for the full list of integration partners. A sampling of these tools is included below:
Segment offers a product called Segment Warehouses, in which data can be piped into Redshift, Postgres, Google BigQuery, or Snowflake. Looker is an Integration Partner for Segment, meaning that users can connect their database to Looker and explore the Segment event data.
Please note that Source Blocks are only available to current and prospective customers. Please reach out to your assigned Looker Analyst for questions, assistance implementing this block, and access to the LookML code.
Included below are some sample screenshots of a few out of the many dashboards that can be created with this Block:
Redshift or Postgres - The LookML for this block can be found in this Github repo.
BigQuery - The LookML for this block can be found in this Github repo.
Snowflake - The LookML for this block can be found in this Github repo.
You can either download the entire block into your Looker application by following the directions outlined here, or selectively migrate pieces of the block by copying and pasting the block LookML into your Looker instance.
If you don’t have a Github account, we encourage you to create one for easy access to this block. If you don’t have access to the repo, or cannot make a Github account, please contact a Looker Analyst or email blocks@looker.com and we’d be happy to help.
@Dillon_Morrison Why do you use the received_at timestamp instead of the sent_at timestamp, for example in track facts? Isn’t sent_at the real time at which the event gets fired?
In my experience with Segment, using sent_at is unreliable because it depends on the sender’s clock, especially for browser-side events. In addition, Segment also uses received_at as their sort key.
Sort Keys:
All tables use received_at for the sort key as we have found that timestamp to be the most reliable indicator of when an event was sent. We recommend using received_at for all time sorted queries as those queries will run considerably faster.
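As an illustration of the advice above (a sketch only — the schema and table name javascript.tracks and the 30-day window are assumptions, not part of this Block), a time-sorted query should filter and order on received_at so the warehouse can use its sort key:

```sql
-- Hypothetical example: daily event counts from a Segment tracks table.
-- Filtering and ordering on received_at (the sort key) lets Redshift skip
-- disk blocks outside the date range, which is why time-sorted queries on
-- received_at run considerably faster than the same queries on sent_at.
select
  date_trunc('day', received_at) as event_day,
  count(*)                       as event_count
from javascript.tracks
where received_at >= dateadd(day, -30, getdate())
group by 1
order by 1
```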
Hey @Dillon_Morrison, thanks for your work on this block! I ran into something just now implementing this on a Segment data set with very few logged-in users. In 2_mapped_tracks.view.lkml and _B_mapped_events.view.lkml you’re creating event_id strings based on track.user_id (or uuid). Where user_id is null (anonymous users), the event_id will also be null, which can cause issues downstream. @Bryan_Weber did some work on this model earlier and was creating event_ids using looker_visitor_id, which always has a non-null value. Something to keep an eye out for!
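One possible fix, sketched below (the table and column names are assumptions about this model, not the Block’s actual LookML): coalesce user_id with anonymous_id so the derived event_id is never null for anonymous users, similar in spirit to the looker_visitor_id approach mentioned above.

```sql
-- Hypothetical derived-table fragment: build event_id from a value that
-- can never be null. coalesce(user_id, anonymous_id) falls back to the
-- anonymous id for users who never logged in.
select
  t.*,
  coalesce(t.user_id, t.anonymous_id) || '-' || t.id as event_id
from javascript.tracks as t
```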
I think there is an issue with the INNER JOIN from mapped_tracks to [page_]alias_mapping.
In alias_mapping, alias is the distinct anonymous_id, but this is the inner join:

inner join ${aliases_mapping.SQL_TABLE_NAME} as a2v
on a2v.alias = coalesce(t.user_id, t.anonymous_id)

Shouldn’t this be:

inner join ${aliases_mapping.SQL_TABLE_NAME} as a2v
on a2v.alias = t.anonymous_id

because a2v.alias is the distinct anonymous_id?
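For context, a simplified sketch of the shape an aliases_mapping derived table usually takes in this pattern (an assumption based on the discussion, not the Block’s actual SQL): one row per distinct anonymous_id, which is why the tracks-side join key would be t.anonymous_id rather than a coalesce.

```sql
-- Assumed shape of the aliases_mapping derived table: one row per
-- distinct anonymous_id (the alias), mapped to a canonical visitor id
-- (the user_id when one was ever observed, else the anonymous_id).
select
  anonymous_id                         as alias,
  coalesce(max(user_id), anonymous_id) as visitor_id
from javascript.tracks
group by anonymous_id
```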
It does look that way, doesn’t it? I’ll try and track down the creator of this block and see.
Hi Jeff - thanks for the feedback. I’m investigating on my end and will push changes accordingly. Thanks!
You can’t import this project remotely; since you have to customize the schema names, you need to copy the files directly into your project.