BQ Loader - Multiple Service Accounts

Dear All,

I need some help understanding how I can use the following commands:

  • bq show
  • bq load

in my shell script with two different service accounts, selected dynamically.

So far I have not found a solution. I noticed that the commands connect using the currently active account, so the only way seems to be switching between the two accounts every time, which is not a suitable solution for me.

The goal is to load a file into a dedicated table, specifying the SA that has access to that table.

Could you please help me?

Details:

  • Latest gcloud CLI
  • 2 service accounts
  • 2 JSON key files for authentication (one for each service account)

Best Regards

Simone


To dynamically utilize two different service accounts within a shell script for bq show and bq load commands in Google Cloud BigQuery, you can leverage the GOOGLE_APPLICATION_CREDENTIALS environment variable. This avoids globally switching accounts and ensures each command executes with the correct service account.

Steps:

  1. Key Components

    • Service Accounts: Application-specific accounts used for interacting with Google Cloud services.
    • GOOGLE_APPLICATION_CREDENTIALS: An environment variable pointing to a service account key file. Used for authentication by Google Cloud SDK and client libraries.
    • bq show: Displays information about BigQuery datasets, tables, or jobs.
    • bq load: Loads data from a source into a BigQuery table.
  2. Environment Setup

    • Install the latest gcloud CLI.
    • Obtain two JSON key files corresponding to your service accounts.
  3. Sample Script

     
    #!/bin/bash

    # Define paths to service account key files
    SA1_KEY_FILE="/path/to/your/first_service_account_key.json"
    SA2_KEY_FILE="/path/to/your/second_service_account_key.json"

    # Project, dataset, table, and source file variables
    PROJECT_ID="your-project-id"
    DATASET_ID="your_dataset_id"
    TABLE_ID_1="your_table_1_id"
    TABLE_ID_2="your_table_2_id"
    DATA_FILE="path/to/your/data_file.csv"

    # Load a file into a table while authenticating with the given key file.
    # Arguments: <service account key file> <table ID> <data file>
    load_data() {
        local sa_key_file="$1"
        local table_id="$2"
        local data_file="$3"

        export GOOGLE_APPLICATION_CREDENTIALS="$sa_key_file"
        # --project_id is a global bq flag, so it goes before the "load" command
        bq --project_id="$PROJECT_ID" load "$DATASET_ID.$table_id" "$data_file"
        local status=$?   # keep bq's exit code; unset would otherwise mask it
        unset GOOGLE_APPLICATION_CREDENTIALS
        return "$status"
    }

    # Load into Table 1 using Service Account 1
    load_data "$SA1_KEY_FILE" "$TABLE_ID_1" "$DATA_FILE"

    # Load into Table 2 using Service Account 2
    load_data "$SA2_KEY_FILE" "$TABLE_ID_2" "$DATA_FILE"
    

Explanation

  • The load_data function temporarily sets GOOGLE_APPLICATION_CREDENTIALS for targeted authentication within each bq load operation.
  • It takes the service account key file path, table ID, and data file path for flexibility.
  • After each bq command, GOOGLE_APPLICATION_CREDENTIALS is unset to maintain a clean environment.

Important:

  • Permissions: Ensure your service accounts possess the necessary BigQuery permissions.
  • Adaptability: Extend this approach to bq show or other commands.
  • Security: Safeguard your service account key files and apply the principle of least privilege.
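To illustrate the "Adaptability" point, here is a minimal sketch of a generic wrapper that runs any command under a chosen key file. The function name run_with_sa and the paths are hypothetical and not part of the original script. It assumes, as described above, that your bq version picks up GOOGLE_APPLICATION_CREDENTIALS; it is worth confirming this with a quick bq show against a table that only one of the service accounts can read.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): run any command
# with GOOGLE_APPLICATION_CREDENTIALS set to the given key file.
# The subshell ( ... ) keeps the export from leaking into the calling shell.
run_with_sa() {
    local sa_key_file="$1"
    shift
    (
        export GOOGLE_APPLICATION_CREDENTIALS="$sa_key_file"
        "$@"
    )
}

# Intended usage with bq (placeholder paths and IDs):
#   run_with_sa "/path/to/sa1.json" bq show "your_dataset_id.your_table_1_id"
#   run_with_sa "/path/to/sa2.json" bq load "your_dataset_id.your_table_2_id" "data.csv"

# Stand-in demonstration that needs no Google Cloud access:
# print the variable as the wrapped command sees it.
run_with_sa "/path/to/sa1.json" printenv GOOGLE_APPLICATION_CREDENTIALS
```

Because the variable is exported inside a subshell, there is nothing to unset afterwards, and the wrapped command's exit code is returned unchanged.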

Hi @ms4446 ,

thanks for your feedback!

I would like to go into detail:

I understand that by setting the GOOGLE_APPLICATION_CREDENTIALS variable I can switch between more than one service account (my example was with 2 SAs).

Your script exports and unsets the GOOGLE_APPLICATION_CREDENTIALS variable in sequence, but what if my scenario is the following:

  1. I create a shell script, Shell1.sh, that takes the GOOGLE_APPLICATION_CREDENTIALS value as an input parameter.
  2. User 1 goes into my application and calls Shell1.sh, passing SA1_KEY_FILE as the input parameter (GOOGLE_APPLICATION_CREDENTIALS).
  3. Simultaneously (in parallel), User 2 goes into my application and calls Shell1.sh, passing SA2_KEY_FILE as the input parameter (GOOGLE_APPLICATION_CREDENTIALS).

Since GOOGLE_APPLICATION_CREDENTIALS is the same environment variable and the two processes are running simultaneously, won't we have concurrency problems on that variable? Couldn't the two values get mixed up and cause problems?

Thanks and Best Regards

Simone

When a shell script like Shell1.sh sets an environment variable such as GOOGLE_APPLICATION_CREDENTIALS, the concurrency issues you're concerned about do not arise, because of how environment variables are handled in Linux environments.

Each time a shell script is executed, it runs in a separate process. When you set an environment variable within a script (using export in bash, for instance), it affects only the current shell and its child processes. This means that:

  • If User 1 and User 2 run Shell1.sh simultaneously in your application, each invocation of the script runs in its own process.
  • When Shell1.sh sets GOOGLE_APPLICATION_CREDENTIALS using export, it sets the variable only for that script's process and any processes spawned by it, not for the entire system or other concurrent instances of the script.

Example Scenario 

  • User 1 Calls Shell1.sh: This script sets GOOGLE_APPLICATION_CREDENTIALS to SA1_KEY_FILE. This change is local to the process started for User 1.
  • User 2 Calls Shell1.sh Simultaneously: A new, separate process is started for User 2. When this script sets GOOGLE_APPLICATION_CREDENTIALS to SA2_KEY_FILE, it does so only within its process.

Implications

  • No Interference: The GOOGLE_APPLICATION_CREDENTIALS set by User 1's process does not interfere with the one set by User 2's process. They are entirely isolated.
  • Concurrency Safety: You can safely run multiple instances of your script concurrently without worrying about them overwriting each other's environment variables.

Best Practices 

While the environment variable approach is safe from concurrency issues in the scenario you described, consider the following to ensure overall system robustness:

  • Permissions: Ensure each service account has only the permissions necessary for the tasks it needs to perform to adhere to the principle of least privilege.
  • Security: Secure your service account key files and consider using secrets management solutions to handle them, especially in a multi-user environment.
  • Monitoring: Implement logging and monitoring to track the usage of service accounts and detect any unexpected access patterns or failures.

Your approach of dynamically setting GOOGLE_APPLICATION_CREDENTIALS within separate instances of a script is safe from concurrency issues due to the process isolation provided by the operating system.
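If you want to see the isolation for yourself, the following is a small sketch that needs no Google Cloud access. The function simulate_shell1 is a hypothetical stand-in for Shell1.sh and the key file paths are placeholders. It starts two separate bash processes in parallel, each exporting a different value, and records what each one sees.

```shell
#!/bin/bash

# Stand-in for Shell1.sh (hypothetical): a separate bash process that exports
# the key file path it was given, waits so that both runs overlap, then
# reports the value it sees.
simulate_shell1() {
    bash -c '
        export GOOGLE_APPLICATION_CREDENTIALS="$1"
        sleep 1
        printenv GOOGLE_APPLICATION_CREDENTIALS
    ' _ "$1"
}

out1=$(mktemp)
out2=$(mktemp)

# Start both "users" in parallel, each with its own key file path.
simulate_shell1 "/path/to/sa1.json" > "$out1" &
simulate_shell1 "/path/to/sa2.json" > "$out2" &
wait

seen1=$(cat "$out1")
seen2=$(cat "$out2")
rm -f "$out1" "$out2"

echo "User 1 process saw: $seen1"
echo "User 2 process saw: $seen2"
echo "Calling shell sees: ${GOOGLE_APPLICATION_CREDENTIALS:-<not set>}"
```

Each process reports the path it was given, even though both were running at the same time, and the calling shell's own value is not touched by either of them.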