Authored by: @maryiaborukhava and @nishantmaharish. Co-authored by: @marout
This article provides a comprehensive guide on how to split the content of a production Looker instance across multiple instances. This strategy is designed to improve performance and provide content isolation for Looker instances, particularly those with a large number of users and extensive content.
From a single Looker instance to multiple instances
Recently, our team decided to introduce an additional Looker instance and allocate a group of users from one tenant to it. Our Looker setup has undergone multiple iterations. In addition to the production instance, we had already established a development environment to offload development and QA activities from production, which improved performance for multiple workflows. However, users of the production environment still experienced challenges as the number of LookML models, Explores, dashboards, Looks, and schedules grew. These issues included delayed responses from API endpoints, longer dashboard rendering times, slower search performance in Explores, and interface delays when interacting with buttons in Looker. As a result, we introduced a second production environment for Looker and relocated a group of users there.
Advantages and disadvantages of supporting multiple Looker instances
The primary benefits of utilizing multiple environments include enhanced availability and improved performance, among others.
The main disadvantages of maintaining multiple Looker instances include increased administrative effort and cost, as well as the need for a strategy that lets users from different tenants work together effectively.
Is an additional instance necessary?
When deciding whether your organization needs additional Looker instances, it's essential to evaluate whether the advantages outweigh the disadvantages. In particular, consider adding an additional Looker instance if your current setup suffers from server performance issues such as extensive Looker queueing, frequent rush hours with many scheduled reports, reduced responsiveness, or difficulties with search and content validation functionality.
Problem statement
Our main objective was to migrate and separate the content for a specific group of users (belonging to a particular tenant) from the single production instance to a newly created production environment dedicated to them. The goal was to facilitate a smooth transition for users who stayed with the original instance while minimizing any disruption for those transitioning to the new production environment.
The process of splitting the Looker instance involves three key steps:
Before starting the instance split, you must secure the additional Looker instance with Google. Once you have access to the new instance, follow these steps to ensure a smooth transition of code, reports, and users:
The peripheral jobs include microservices, scripts, and UIs. For our Looker environment, we maintain a suite of peripheral jobs that use the Looker API to programmatically monitor and manage the environment. These tasks include checking server availability, testing the health of database connections, cleaning up unused content, assigning users to groups based on their accounts, and killing long-running queries. All of these jobs had to be extended to support the new Looker production instance. The changes to your environment will depend on how many jobs you have and their internal structure, but this step is fairly straightforward and mostly consists of extending the list of Looker instance URLs to include the new instance.
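As a sketch of what that extension can look like (the instance list, job signature, and function name here are hypothetical illustrations, not our actual job code), each peripheral job iterates over a list of instance base URLs, so onboarding a new instance amounts to appending one entry:

```python
# Hypothetical sketch: peripheral jobs loop over every known instance URL,
# so supporting a new Looker instance means adding one entry to this list.
LOOKER_INSTANCES = [
    "https://source.looker.com:19999",
    "https://target.looker.com:19999",  # the newly added production instance
]


def run_job_on_all_instances(job, instances=None):
    """Run a peripheral job (a callable taking a base URL) against each
    instance, collecting per-instance results and isolating failures so that
    one unreachable instance does not abort the job on the others."""
    results = {}
    for base_url in (instances or LOOKER_INSTANCES):
        try:
            results[base_url] = job(base_url)
        except Exception as exp:  # one failing instance shouldn't stop the rest
            results[base_url] = f"failed: {exp}"
    return results
```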
The final and most crucial step is migrating the Looker content for users. This task can be challenging if you aim to ensure a smooth transition for users while addressing the following requirements:
The Looker platform does not offer this functionality, and a straightforward solution has yet to be found on the market. There are open-source command-line tools available for migrating content between Looker instances, for example, Looker Deployer and Gazer. However, these tools do not cover the whole spectrum of the activities required for the successful instance split.
Instead of building a custom solution from scratch, we leveraged Gazer and enhanced it with additional functionality. We used Gazer to perform basic migration for dashboards, Looks, alerts, and scheduled plans. Subsequently, we enhanced it with additional capabilities to meet extended requirements, such as preserving content ownership, migrating boards and favorites, and providing visibility into the migrated content.
| | Gazer | Looker Deployer | Custom Solution |
| --- | --- | --- | --- |
| Migrates roles, groups, and permission and model sets | ✓ | ✓ | ✓ |
| Migrates dashboards and Looks | ✓ | ✓ | ✓ |
| Migrates folder structure | ✓ | ✓ | ✓ |
| Migrates schedules | ✓ | ✓ | ✓ |
| Migrates alerts | ✓ | ✓ | ✓ |
| Migrates boards | ❌ | ✓ | ✓ |
| Migrates favorites | ❌ | ❌ | ✓ |
| Disables alerts and schedules in the original instance after migration | ❌ | ❌ | ✓ |
| Preserves ownership (dashboards, Looks, schedules, and alerts) | ❌ | ❌ | ✓ |
| Links to the migrated content in the original instance | ❌ | ❌ | ✓ |
To maintain content ownership during migration, we first instructed users to log in to the new instance so that their accounts were created, since we use SAML for all instances. Gazer's commands for migrating dashboards and Looks rely on user credentials and assign that user as the content owner. In our custom script, we therefore temporarily elevated each content owner to admin status and generated their API3 credentials, which were then passed to the Gazer command that migrates dashboards and Looks. If an owner's account didn't exist in the new Looker instance, the client credentials were used to migrate the content instead. Let's have a look at the code snippet.
```python
# migrate_content.py
import time
import subprocess
import tempfile
import json
from subprocess import CalledProcessError

import looker_sdk
from looker_sdk.sdk.api40 import models

source_base_url = "https://source.looker.com:19999"  # Replace with the source instance url
target_base_url = "https://target.looker.com:19999"  # Replace with the target instance url
source_client_id = "CLIENT_ID"  # Replace with the client id
source_client_secret = "CLIENT_SECRET"  # Replace with the client secret
target_client_id = "CLIENT_ID"  # Replace with the client id
target_client_secret = "CLIENT_SECRET"  # Replace with the client secret
content_id = "1234"  # Replace with the content id
content_type = "dashboard"  # Replace with the content type, it can be look or dashboard
folder_id = "4321"  # Replace with the folder id where the content should be deployed


class LookerSettings(looker_sdk.api_settings.ApiSettings):
    """
    A helper class needed to initialise the Looker SDK by passing
    client_id and client_secret instead of an ini file,
    see https://pypi.org/project/looker-sdk/
    """

    def __init__(self, *args, **kw_args):
        self.client_id = kw_args.pop("client_id")
        self.client_secret = kw_args.pop("client_secret")
        self.base_url = kw_args.pop("base_url")
        self.timeout = kw_args.pop("timeout")
        super().__init__(*args, **kw_args)

    def read_config(self) -> looker_sdk.api_settings.SettingsConfig:
        config = super().read_config()
        config["client_id"] = self.client_id
        config["client_secret"] = self.client_secret
        config["base_url"] = self.base_url
        config["timeout"] = self.timeout
        return config


# Initialize the Looker SDKs
source_sdk = looker_sdk.init40(config_settings=LookerSettings(
    client_id=source_client_id, client_secret=source_client_secret,
    base_url=source_base_url, timeout=300))
target_sdk = looker_sdk.init40(config_settings=LookerSettings(
    client_id=target_client_id, client_secret=target_client_secret,
    base_url=target_base_url, timeout=300))


def find_owner_id_in_target(owner_id_in_source):
    """Finds if the owner exists in the target instance."""
    try:
        # Fetch owner details from the source instance
        owner_details_in_source = source_sdk.user(owner_id_in_source)
        # Search for the owner in the target instance using their email
        owners_in_target = target_sdk.search_users(email=owner_details_in_source.email)
        # Filter out the embedded user
        non_embedded_owners_in_target = [
            owner for owner in owners_in_target if owner['display_name'] != 'Embed User']
        if len(non_embedded_owners_in_target) == 0:
            print("Could not find owner in target instance.")
            return None
        return non_embedded_owners_in_target[0].get('id')
    except Exception as exp:
        print(f"Error while finding if owner exists in the target instance. "
              f"Owner id in source is {owner_id_in_source}. Error: {exp}")
        return None


def get_owner_details_in_target(owner_id_in_source):
    """Fetches the owner details in the target instance."""
    try:
        start_time = time.time()
        # Find the owner ID in the target instance
        owner_id_in_target = find_owner_id_in_target(owner_id_in_source)
        if owner_id_in_target is None:
            return None
        # Add the owner to the admin group in the target instance
        target_sdk.add_group_user(
            group_id="3",  # Replace with the admin group id where the user should be added
            body=models.GroupIdForGroupUserInclusion(
                user_id=owner_id_in_target
            ))
        # Create API credentials for the owner in the target instance
        api_creds = target_sdk.create_user_credentials_api3(user_id=owner_id_in_target)
        if not api_creds:
            print(f"Could not fetch the api credentials of the owner. "
                  f"Owner id in target is {owner_id_in_target}")
            return None
        print(f"Fetching owner details in target took: {time.time() - start_time}")
        return {
            "owner_id_in_target": owner_id_in_target,
            "api_creds": api_creds
        }
    except Exception as exp:
        print(f"Error while fetching owner details in target. "
              f"Owner id in source is {owner_id_in_source}. Error: {exp}")


def revoke_permission_and_delete_credentials(owner_id_in_target, api_3_credentials_id):
    """Revokes the permissions and deletes the credentials of the owner in the target instance."""
    try:
        start_time = time.time()
        target_sdk.delete_user_credentials_api3(user_id=owner_id_in_target,
                                                credentials_api3_id=api_3_credentials_id)
        target_sdk.delete_group_user(
            group_id="3",  # Replace with the admin group id the user was added to
            user_id=owner_id_in_target
        )
        print(f"Revoking permissions and deleting credentials took: {time.time() - start_time}")
    except Exception as exp:
        print(f"Error while revoking permissions and deleting credentials. "
              f"Owner id in target is {owner_id_in_target}. Error: {exp}")


def __run_cli_command(gzr_command: 'list[str]', arguments_to_scrap: 'list[int]') -> str:
    """
    Run the passed command, returning the output. If the command results in an
    error, it will throw the corresponding error while also masking the
    specified arguments, to avoid exposing secrets in the log.
    """
    proc = subprocess.run(
        gzr_command,
        universal_newlines=True,
        capture_output=True,
        check=False
    )
    if proc.returncode != 0:
        # Replace the credentials so they would not be exposed in the logs
        gzr_command_safe = gzr_command.copy()
        for arg in arguments_to_scrap:
            gzr_command_safe[arg] = "XXX"
        raise CalledProcessError(
            returncode=proc.returncode,
            cmd=gzr_command_safe,
            output=proc.stdout,
            stderr=proc.stderr,
        )
    return str(proc.stdout)


def fetch_content_from_looker(content_id, content_type, host, port, client_id, client_secret):
    """Fetches the content from the source instance using Gazer."""
    gzr_command = ["gzr", content_type, "cat", content_id,
                   "--host", host,
                   "--port", port,
                   "--client-id", client_id,
                   "--client-secret", client_secret
                   ]
    try:
        output = __run_cli_command(gzr_command, [-1, -3])
    except CalledProcessError as err:
        data = {
            "statusCode": 500,
            "message": f"Failed to fetch content using Gazer: {err}. "
                       f"\nStdErr: {err.stderr}, StdOut: {err.output}",
        }
        print(data)
        raise err
    # Parsing JSON
    try:
        return json.loads(output)
    except json.JSONDecodeError as err:
        data = {
            "statusCode": 500,
            "message": f"Failed to parse the output from Gazer: {err}. \nOutput: {output}",
        }
        print(data)
        raise err


def deploy_content_to_looker(content_type, content_file_path, folder_id, host, port,
                             client_id, client_secret):
    """Deploys the content to the target instance using Gazer."""
    gzr_command = ["gzr", content_type, "import", content_file_path, folder_id,
                   "--host", host,
                   "--port", port,
                   "--client-id", client_id,
                   "--client-secret", client_secret,
                   "--force"
                   ]
    try:
        output = __run_cli_command(gzr_command, [-2, -4])
    except CalledProcessError as err:
        data = {
            "statusCode": 500,
            "message": f"Failed to deploy content using Gazer: {err}, StdOut: {err.output}",
        }
        print(data)
        raise err
    except Exception as err:
        data = {
            "statusCode": 500,
            "message": f"Failed to deploy content using Gazer: {err}",
        }
        print(data)
        raise err
    return output


def write_content_to_file(content_type, content_id, content, tmpdirname):
    """Writes the content to a file."""
    try:
        local_file_path = f"{tmpdirname}/{content_type.capitalize()}_{content_id}.json"
        # Writing a file to a temporary directory
        with open(local_file_path, "w") as file:
            file.write(json.dumps(content))
        return local_file_path
    except Exception as exp:
        print(f"Error while writing the content to a file. Content type: {content_type}, "
              f"Content id: {content_id}. Error: {exp}")
        return None


with tempfile.TemporaryDirectory() as tmpdirname:
    source_host, source_port = source_base_url.replace("https://", "").split(":")
    content = fetch_content_from_looker(content_id, content_type, source_host, source_port,
                                        source_client_id, source_client_secret)
    local_file_path = write_content_to_file(content_type, content_id, content, tmpdirname)
    owner_id_in_source = content["user_id"]
    owner_details_in_target = get_owner_details_in_target(owner_id_in_source)
    if owner_details_in_target is not None:
        print('Using Owner Details')
        target_host, target_port = target_base_url.replace("https://", "").split(":")
        # Deploy the content to the target instance
        try:
            deploy_content_to_looker(content_type, local_file_path, folder_id,
                                     target_host, target_port,
                                     owner_details_in_target.get('api_creds').client_id,
                                     owner_details_in_target.get('api_creds').client_secret)
        except Exception as exp:
            print(f"Error while deploying the content to the target instance. Error: {exp}")
        finally:
            revoke_permission_and_delete_credentials(
                owner_details_in_target["owner_id_in_target"],
                owner_details_in_target["api_creds"].id)
    else:
        print('Using Client Credentials')
        target_host, target_port = target_base_url.replace("https://", "").split(":")
        # Use the client credentials to deploy the content
        try:
            deploy_content_to_looker(content_type, local_file_path, folder_id,
                                     target_host, target_port,
                                     target_client_id, target_client_secret)
        except Exception as exp:
            print(f"Error while deploying the content to the target instance. Error: {exp}")
```
Ensure looker_sdk and Gazer are installed on your machine and that you have replaced the placeholder values (base URL, client ID, client secret, and so on) in the script. Then use the following command to run it.
python3 migrate_content.py
Once the content is migrated, the credentials will be deleted and the owner will be removed from the admin group.
This approach also proved effective in preserving the ownership of migrated alerts and scheduled plans. However, there is an important consideration: The owner of an alert or scheduled plan may differ from the owner of the associated content. In such cases, the ownership of alerts and plans will default to the owner of the migrated content, as Gazer utilizes the same credentials provided to create these alerts and plans.
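To make the ownership caveat concrete, here is a minimal, hypothetical sketch (plain dicts stand in for the Looker API's scheduled_plan resource, and the set of dropped fields is illustrative): when rebuilding a plan on the target instance, instance-specific ids are stripped and the dashboard and owner ids are re-pointed, because whichever credentials perform the create become the effective owner.

```python
def clone_schedule_payload(source_plan: dict, target_dashboard_id: str,
                           owner_id_in_target: str) -> dict:
    """Build a create-scheduled-plan payload for the target instance from a
    source plan. Instance-specific fields are dropped, and the dashboard and
    owner ids are replaced with their counterparts in the target instance."""
    instance_specific = {"id", "user_id", "dashboard_id", "created_at", "updated_at"}
    payload = {k: v for k, v in source_plan.items() if k not in instance_specific}
    payload["dashboard_id"] = target_dashboard_id
    payload["user_id"] = owner_id_in_target  # the owner in the *target* instance
    return payload
```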
We maintained a record of the content mappings and associated users during the migration process across both instances. Subsequently, we retrieved the details of the boards and favorites on the original instance using the Looker SDK and utilized this mapping to facilitate their migration to the new instance.
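The mapping-driven part of that process can be sketched as follows (the data shapes and names here are hypothetical simplifications, not the Looker SDK's actual content_favorite resource): favorites recorded on the original instance are translated through the dashboard/Look id mapping, and anything whose content was not migrated is skipped.

```python
def remap_favorites(source_favorites: list, dashboard_id_map: dict,
                    look_id_map: dict) -> list:
    """Translate a user's favorites from source-instance ids to target-instance
    ids using the content mapping recorded during migration. Favorites whose
    content has no mapping entry (i.e. was not migrated) are skipped."""
    remapped = []
    for fav in source_favorites:
        dashboard_id = fav.get("dashboard_id")
        look_id = fav.get("look_id")
        if dashboard_id in dashboard_id_map:
            remapped.append({"dashboard_id": dashboard_id_map[dashboard_id]})
        elif look_id in look_id_map:
            remapped.append({"look_id": look_id_map[look_id]})
    return remapped
```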
To provide visibility to the migrated content, we made the following changes to the original content present in the old instance:
These enhancements minimized disruption during the migration, allowing users to navigate to the migrated content in the new instance easily. The following method adds buttons to dashboards in the original instance with a message and a link to the migrated content.
```python
def _get_content_id_from_gazer_output(gazer_output: 'str | None',
                                      slug_of_content_in_source: str) -> str:
    """Extracts the content id in the target from the gazer output or using the slug."""
    try:
        content_id_on_target = None
        if gazer_output is None:
            # If gazer output is not available, search for the content using the
            # slug, as the content might already have been migrated
            dashboards = target_sdk.search_dashboards(slug=slug_of_content_in_source)
            if len(dashboards) > 0:
                content_id_on_target = dashboards[0].get('id')
            else:
                return None
        else:
            # Gazer output will have the content id in the target
            content_id_on_target = str(gazer_output.split()[-1]).strip()
        return content_id_on_target
    except Exception as exp:
        print(f'Unable to retrieve content id from gazer output or slug for the content. '
              f'Exception: {exp}')
        return None


def update_content_in_source_with_its_id_in_target(gazer_output: 'str | None',
                                                   content_id_in_source: str,
                                                   slug_of_content_in_source: str):
    start_time = time.time()
    try:
        content_id_in_target = _get_content_id_from_gazer_output(gazer_output,
                                                                 slug_of_content_in_source)
        if content_id_in_target is None:
            print(f'Could not find the content id in the target for the content '
                  f'with id {content_id_in_source}')
            return
        link_in_the_new_instance = f'{target_base_url}/dashboards/{content_id_in_target}'
        request_body = {
            "dashboard_id": content_id_in_source,
            "type": "button",
            "rich_content_json": f'{{"text":"This content has been migrated to the new Looker instance. '
                                 f'Please, click here to access it in the new location.",'
                                 f'"description":"Clicking on the button will take you to the new instance",'
                                 f'"newTab":true,"alignment":"center","size":"medium","style":"FILLED",'
                                 f'"color":"#f9ab00","href":"{link_in_the_new_instance}"}}',
        }
        # Create a dashboard element in the source content (dashboard)
        dashboard_element_details = source_sdk.create_dashboard_element(request_body)
        # Get the dashboard layout of the source content
        dashboard_layout_of_source_content = source_sdk.dashboard_dashboard_layouts(
            dashboard_id=content_id_in_source, fields="dashboard_layout_components")
        # Get the dashboard layout components of the source content
        dashboard_layout_components_of_source_content = \
            dashboard_layout_of_source_content[0].get('dashboard_layout_components')
        # Find the layout component of the new element (we need to update it so that
        # the new element, which is a button, sits at the top of the dashboard)
        dashboard_layout_component_of_new_element = ''
        for dashboard_layout in dashboard_layout_components_of_source_content:
            if dashboard_layout.get('dashboard_element_id') == dashboard_element_details.get('id'):
                dashboard_layout_component_of_new_element = dashboard_layout
                break
        if not dashboard_layout_component_of_new_element:
            print(f'No dashboard layout component found for the new element in the '
                  f'source content with id {content_id_in_source}')
            return
        # Update the layout component of the new element to move it to the top of the dashboard
        source_sdk.update_dashboard_layout_component(
            dashboard_layout_component_id=dashboard_layout_component_of_new_element.get('id'),
            body=models.WriteDashboardLayoutComponent(
                dashboard_element_id=dashboard_element_details.get('id'),
                dashboard_layout_id=dashboard_layout_component_of_new_element.get('dashboard_layout_id'),
                row=-10,
                column=0,
                width=22,
                height=0
            ))
    except Exception as exp:
        print(f'Error occurred while updating the content in the source with its id '
              f'in the target. Exception: {exp}')
    print(f'Time taken to update the content in the source with its id in the target: '
          f'{time.time() - start_time} seconds')
```
To use the methods, include them in the same file mentioned above (migrate_content.py). The updated file should look like this:
```python
# Rest of the code

owner_id_in_source = content["user_id"]
owner_details_in_target = get_owner_details_in_target(owner_id_in_source)
gazer_output = None
if owner_details_in_target is not None:
    ...  # Code in the if block
else:
    ...  # Code in the else block
update_content_in_source_with_its_id_in_target(gazer_output, content_id, content.get('slug'))
```
Then migrate the content again using the command:
python3 migrate_content.py
The method above performs the following steps: it finds the migrated content's id in the target instance (from Gazer's output, or by searching on the content's slug), creates a button element on the source dashboard that links to the new location, and updates the dashboard layout so that the button appears at the top of the dashboard.
This process generates a button that appears as shown below.
User communication
Effective communication with users is one of the most critical aspects of a successful Looker instance split. Since the process can disrupt users' workflows, keeping them informed about the migration details is essential. Here are a few crucial steps that helped us ensure a smooth transition:
Managing multiple production Looker instances requires increased maintenance and monitoring effort from the team overseeing the Looker environment and incurs additional costs. However, this setup offers improved performance for heavily utilized Looker instances. Notably, we have observed a more than threefold reduction in the average number of query errors since the migration, which can be attributed to content pruning during the migration process. Furthermore, the high density of scheduled jobs has decreased significantly in both instances, as the schedules are now distributed across them. The delays experienced when searching for dashboards and Explores have also been visibly reduced.