Writing a simple data dictionary to CSV using the Looker API and the Python requests library

There are many reasons to want a document that explains which fields, explores, and models are available or exposed in your Looker application. The script below parses the model definition and writes out a CSV of fields, using the Looker API to fetch the model metadata.

I use Python, and while you could use the Looker SDK, I prefer the Python requests library.

With a few further tweaks, you can see exactly how fields are set up and audit the model for things such as:

  • Do we follow a consistent naming convention?
  • Do we have redundant or similar fields?
  • Did we do a good job annotating fields via descriptions?

We’ve already built something similar as an interactive webpage using the Ruby SDK. You can see that here:
https://discourse.looker.com/t/creating-a-data-dictionary-using-lookers-api/3589

endpoints

To get this started, I need endpoints for authentication, get_model, and get_explore. I handle these with an API class:

import requests


class LookerApi(object):

    def __init__(self, token, secret, host):

        self.token = token
        self.secret = secret
        self.host = host

        self.session = requests.Session()
        self.session.verify = False  # skip SSL verification; convenient for testing
        self.auth()

    def auth(self):
        url = '{}{}'.format(self.host, 'login')
        params = {'client_id': self.token,
                  'client_secret': self.secret
                  }
        r = self.session.post(url, params=params)
        access_token = r.json().get('access_token')
        # print(access_token)
        self.session.headers.update({'Authorization': 'token {}'.format(access_token)})

    # GET /lookml_models/{{NAME}}
    def get_model(self, model_name=None, fields=None):
        url = '{}{}/{}'.format(self.host, 'lookml_models', model_name)
        # print(url)
        params = fields or {}  # avoid a mutable default argument
        r = self.session.get(url, params=params)
        if r.status_code == requests.codes.ok:
            return r.json()

    # GET /lookml_models/{{NAME}}/explores/{{NAME}}
    def get_explore(self, model_name=None, explore_name=None, fields=None):
        url = '{}{}/{}/{}/{}'.format(self.host, 'lookml_models', model_name, 'explores', explore_name)
        # print(url)
        params = fields or {}
        r = self.session.get(url, params=params)
        if r.status_code == requests.codes.ok:
            return r.json()
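
As a quick sanity check, you can instantiate the class and request a model. One assumption worth calling out: the host value should include the full API path (something like https://mycompany.looker.com:19999/api/3.0/), since the class appends endpoint names directly to it. The credentials and model name below are placeholders:

# Placeholder credentials, host, and model name, for illustration only
looker = LookerApi(token='my_client_id',
                   secret='my_client_secret',
                   host='https://mycompany.looker.com:19999/api/3.0/')

model = looker.get_model('thelook')   # GET /lookml_models/thelook
print(model['explores'][0]['name'])   # name of the first explore in the model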

csv writing

Once we can call those endpoints, the script should request all models and parse through each explore, calling for all of the field information in a loop. We then write each field and its metadata to a new row. For each row, I have created a function to call:

def write_fields(explore, fields):

    ### First, compile the fields you need for your row

    explore_fields = explore['fields']
    explore_name = str(explore.get('name', ''))
    connection_name = str(explore.get('connection_name', ''))

    for dimension in explore_fields[fields]:
        # print(dimension)

        field_type = fields
        project = str(dimension['project_name'])
        view = str(dimension['view'])
        view_label = str(dimension['view_label'])
        name = str(dimension['name'])
        hidden = str(dimension['hidden'])
        label = str(dimension['label'])
        label_short = str(dimension['label_short'])
        description = str(dimension['description'])
        sql = str(dimension['sql'])
        ftype = str(dimension['type'])
        value_format = str(dimension['value_format'])
        source = str(dimension['source_file'])

        ### compile the line - this could be combined with the above, but is kept separate for simplicity
        rowout = []
        rowout.append(connection_name)
        rowout.append(field_type)
        rowout.append(project)
        rowout.append(explore_name)
        rowout.append(view)
        rowout.append(view_label)
        rowout.append(name)
        rowout.append(hidden)
        rowout.append(label)
        rowout.append(label_short)
        rowout.append(description)
        rowout.append(sql)
        rowout.append(ftype)
        rowout.append(value_format)
        rowout.append(source)

        # w is the csv writer opened at the top level of the script
        w.writerow(rowout)
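
To make the expected input concrete, here is a minimal, made-up explore payload trimmed down to just the keys write_fields reads (a real get_explore response contains many more):

# Made-up payload for illustration; a real explore JSON is much larger
explore_json = {
    'name': 'orders',
    'connection_name': 'thelook_db',
    'fields': {
        'dimensions': [{
            'project_name': 'thelook',
            'view': 'orders',
            'view_label': 'Orders',
            'name': 'orders.created_date',
            'hidden': False,
            'label': 'Orders Created Date',
            'label_short': 'Created Date',
            'description': 'Date the order was placed',
            'sql': '${TABLE}.created_at',
            'type': 'date',
            'value_format': None,
            'source_file': 'orders.view.lkml',
        }],
        'measures': [],
    },
}

write_fields(explore_json, 'dimensions')  # appends one row to the open csv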

csv formatting

Then all I need to do is instantiate the API, open a CSV, write the header, and iterate through my models. Using the csv library, we can start the CSV with this code:

import csv

# newline='' keeps the csv module from inserting blank rows on some platforms
csvfile = open('dictionary.csv', 'w', newline='')

w = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
header = ['connection_name',
          'field_type',
          'project',
          'explore',
          'view',
          'view_label',
          'name',
          'hidden',
          'label',
          'label_short',
          'description',
          'sql',
          'ftype',
          'value_format',
          'source']

w.writerow(header)

Parse the model

The rest of the script looks like this:

  • Get Looker API 3.0 credentials

  • Call for the model

  • Parse through the model and write each field as a row into our CSV

  • Close the file

    import yaml
    from pprint import pprint as pp

    f = open('config.yml')
    params = yaml.safe_load(f)
    f.close()

    hostname = 'localhost'

    my_host = params['hosts'][hostname]['host']
    my_secret = params['hosts'][hostname]['secret']
    my_token = params['hosts'][hostname]['token']

    looker = LookerApi(host=my_host,
                       token=my_token,
                       secret=my_secret)

    ## --------- API Calls -------------

    ## -- Get all models --
    models = looker.get_model("")
    pp(models)

    for model in models:
        model_name = model['name']

        ## -- Get single model --
        model_def = looker.get_model(model_name)
        # pp(model_def)

        ## -- Get single explore --
        for explore_def in model_def['explores']:
            explore = looker.get_explore(model_name, explore_def['name'])
            # pp(explore)

            ## -- parse explore --
            try:
                write_fields(explore, 'measures')
            except Exception:
                print('Problem with measure fields in', explore_def['name'])
            try:
                write_fields(explore, 'dimensions')
            except Exception:
                print('Problem with dimension fields in', explore_def['name'])

    csvfile.close()

The end result of executing this script is a CSV file called “dictionary.csv”.
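
For instance, with the made-up explore payload shown earlier, the header and first data row of the CSV would look something like this (illustrative values only):

connection_name,field_type,project,explore,view,view_label,name,hidden,label,label_short,description,sql,ftype,value_format,source
thelook_db,dimensions,thelook,orders,orders,Orders,orders.created_date,False,Orders Created Date,Created Date,Date the order was placed,${TABLE}.created_at,date,None,orders.view.lkml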

Check out the full script called get_data_dictionary.py here: https://github.com/llooker/python_api_samples

Note: the linked repo uses a LookerAPI.py file to hold the class, and a configuration file for keys. Check the readme for setup instructions.
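
For reference, given the keys the script reads (a hosts section keyed by hostname, each entry holding host, token, and secret), a config.yml would be shaped like this; all values here are placeholders:

# config.yml -- placeholder values; substitute your own API3 credentials
hosts:
  localhost:
    host: https://mycompany.looker.com:19999/api/3.0/
    token: my_client_id
    secret: my_client_secret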
