This content, written by Erin Franz, was initially posted in Looker Blog on Oct 12, 2015. The content is subject to limited support.
HP Vertica’s Flex Zone allows semi-structured data to be quickly loaded and analyzed using Vertica, without having to create schemas up front. Because Looker connects directly to Vertica to provide data exploration and visualization capabilities, Looker can take advantage of the the underlying power of Vertica as well as its analytic features such as Flex Zone.
In this tutorial we’ll build out an exploration framework for a JSON Flex Table in Vertica using Looker’s LookML modeling layer. Twitter data is a great example for this since it is commonly obtained in JSON format and analysis can provide valuable insights into social presence and sentiment. Utilizing Vertica’s JSON parser, Flex Table integration, and the LookML modeling layer, we’ll examine how to quickly and easily get valuable insights from Twitter data.
In order to make the data from the raw JSON file easily queryable we’ll first need to create a Flex Table in Vertica using Vertica’s JSON parser. In this case, I have a raw JSON tweets file stored locally on my computer and I’ll copy it to Vertica:
create flex table tweets_table();
copy tweets_table from local '/Users/erinfranz/Documents/tweets.json' parser fjsonparser();
In order to query the flex table we just created, we’ll generate keys using the following command. This will also generate a table of keys that we can use as a reference for querying:
select compute_flextable_keys_and_build_view('tweets_table');
Now we’re ready to start querying the table. We can determine the available keys by examining the tweets_table_keys
table that we created in the previous step by running the following query:
select key_name::varchar, data_type_guess::varchar from tweets_table_keys;
This produces a list of possible keys to query from the tweets_table. We can query these keys as if they were just columns in a regular Vertica table. For instance, if we wanted to look at the text of the tweets, we’d only need to run the following:
select text::varchar from tweets_table limit 10
Because of this functionality, it’s easy to translate these keys and any derived variation of their values into LookML dimensions and measures in Looker’s modeling layer. Dimensions and measures function as snippets of SQL - we just need to determine what and how we’d like to explore the values from the keys table and translate those into LookML.
We’ll start by creating a view file for tweets_table. For text, we’d simply like the text value as a varchar data type. We can easily add a text dimension by putting the key we’d like to query in the sql: parameter of the dimension definition. We’ll also add a count measure so we can easily count the number of tweets that meet certain conditions.
- explore: tweets_table
- view: tweets_table
sql_table_name: public.tweets_table
fields:
- dimension: text
sql: ${TABLE}."text"::varchar
- measure: count
type: count
This allows us to easily grab and filter on the text field in Looker’s explore interface. We could now see, for instance, the number of tweets that contain a certain word or phrase.
We could then expand on this filter and gather further metrics about those specific tweets. For instance, we might be interested in the number of followers each tweet is reaching. We can grab the followers count from the tweets table by adding the following dimension:
- dimension: followers_count
type: number
sql: ${TABLE}."user.followers_count"::int
We can then define further metrics on top of followers count to aggregate information about tweets we might be interested. In this case, we’ll define a minimum, maximum, and average amount of followers using LookML measures:
- measure: min_followers_count
type: min
sql: ${followers_count}
- measure: max_followers_count
type: max
sql: ${followers_count}
- measure: average_followers_count
type: average
sql: ${followers_count}
Now we can use those measures to learn more about the tweets we’re interested in:
With Looker on Vertica flex tables, semi-unstructured data can be explored with the same ease as perfectly processed data by taking advantage of HP Vertica’s parsing and querying capability and Looker’s LookML modeling layer. You don’t have to move or transform your JSON data when you can harness HP Vertica’s analytical and processing power directly and visualize it with Looker.