
Automatically sync a Google Cloud bucket from Google Drive

I have a few hundred folders on Google Drive which I'd like to upload to a Google Cloud Storage bucket. The Google Drive folders are updated regularly, so I need a way to check whether a folder has new files and then upload them to the bucket. I know it is possible to do this with Apps Script, but it doesn't scale, since the script cannot scan all the folders in a single run. Is there a cleaner way to do it using Google Cloud functionality itself? Is there a way to do it using gsutil? Could someone please provide a step-by-step guide? I am new to GCP. Thanks a lot!


Hello @Priyanka1311, welcome to the Google Cloud Community.

Unfortunately, there is no out-of-the-box solution that addresses your needs. Your case could be handled by a combination of a Cloud Function (a Python script that searches through the Google Drive folders and copies items to a Cloud Storage bucket), Cloud Scheduler to run it on a schedule, and Pub/Sub to trigger the function execution. This approach is serverless, so you don't have to worry about infrastructure.
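
To give a rough idea of the wiring, here is a minimal sketch of what the Pub/Sub-triggered entry point of such a function could look like in Python with the Functions Framework. It assumes a 2nd-gen Cloud Function; the run_sync() helper is hypothetical and stands in for the actual Drive-to-bucket copy logic.

import base64
import functions_framework

# Cloud Scheduler publishes to a Pub/Sub topic on a schedule; Pub/Sub delivers
# the event here, and the function then runs the sync logic.
@functions_framework.cloud_event
def scheduled_sync(cloud_event):
    message = cloud_event.data.get("message", {})
    # The Pub/Sub payload (if any) arrives base64-encoded.
    payload = base64.b64decode(message["data"]).decode("utf-8") if message.get("data") else ""
    print(f"Triggered by Pub/Sub, payload: {payload!r}")
    run_sync()  # hypothetical helper: list Drive files and copy them to the bucket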

--
cheers,
DamianS

Thanks for your response @DamianS, however I am not sure I follow everything. I found a reference to gsutil rsync. Could that be used?

Basically, gsutil (including gsutil rsync) is no longer the recommended CLI tool for Cloud Storage; you should use gcloud storage instead (its equivalent command is gcloud storage rsync).

Best practices aside, how would you connect to Google Drive with gsutil rsync or gcloud storage commands? There is no straightforward way to interact with Google Drive via gcloud CLI commands. You must use the Google Drive API to work with files stored in Google Drive, which is easiest to do from either Python or Go (Python is easier, IMHO).

 

I've tested this approach (though I triggered the function manually, not via Cloud Scheduler or Pub/Sub) and it works like a charm. So, from my point of view, this is the only way to search through a Google Drive folder and copy its contents to a Cloud Storage bucket.

--
cheers,
DamianS

Could you please share what you've done?

I'm not that proficient with Python coding, so I asked AI for a little help. Here is what I ended up with:

main.py

from googleapiclient.discovery import build
from google.oauth2 import service_account
from google.cloud import storage
from google.cloud import secretmanager
from googleapiclient.http import MediaIoBaseDownload
import io
import os
import json

FOLDER_ID = os.environ.get('FOLDER_ID')
BUCKET_NAME = os.environ.get('BUCKET_NAME')
SECRET_ID = os.environ.get('SECRET_ID')
PROJECT_ID = os.environ.get('PROJECT_ID')
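
# Note: all four variables above are read from the Cloud Function's runtime
# environment variables (configured in step 10 of the setup below).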

def access_secret_version(project_id, secret_id, version_id='latest'):
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version_id}"
    try:
        response = client.access_secret_version(request={"name": name})
        payload = response.payload.data.decode("UTF-8")
        print(f"Successfully accessed secret: {secret_id}")
        return payload
    except Exception as e:
        print(f"Error accessing secret version: {e}")
        raise e

def get_service_account_credentials(secret_payload):
    try:
        info = json.loads(secret_payload)
        credentials = service_account.Credentials.from_service_account_info(info)
        print("Service account credentials successfully created.")
        return credentials
    except Exception as e:
        print(f"Error creating service account credentials: {e}")
        raise e

def list_files(service, folder_id):
    try:
        results = service.files().list(
            q=f"'{folder_id}' in parents and mimeType != 'application/vnd.google-apps.folder'",
            pageSize=1000, fields="nextPageToken, files(id, name)"
        ).execute()
        items = results.get('files', [])
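        # NOTE: only the first page of results (up to 1000 files) is processed here;
        # for larger folders you would need to loop over nextPageToken.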
        print(f"Successfully listed files in folder: {folder_id}. Found {len(items)} files.")
        return items
    except Exception as e:
        print(f"Error listing files: {e}")
        raise e

def download_file(service, file_id, file_name):
    request = service.files().get_media(fileId=file_id)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        try:
            status, done = downloader.next_chunk()
            print(f"Download {int(status.progress() * 100)}% complete for file {file_name}.")
        except Exception as e:
            print(f"Error downloading file {file_name}: {e}")
            raise e
    fh.seek(0)
    print(f"File {file_name} downloaded successfully.")
    return fh

def upload_to_gcs(file_stream, bucket_name, object_name):
    try:
        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        blob.upload_from_file(file_stream, rewind=True)
        print(f"{object_name} uploaded to {bucket_name}.")
    except Exception as e:
        print(f"Error uploading to GCS: {e}")
        raise e

def sync_drive_to_gcs(request):
    try:
        secret_payload = access_secret_version(PROJECT_ID, SECRET_ID)
        credentials = get_service_account_credentials(secret_payload)
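        # Build the Drive v3 client with the service account credentials; the Drive
        # folder must be shared with this service account's email (step 6 below).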
        service = build('drive', 'v3', credentials=credentials)
        items = list_files(service, FOLDER_ID)

        for item in items:
            file_id = item['id']
            file_name = item['name']
            print(f"Downloading {file_name} from Google Drive")
            file_stream = download_file(service, file_id, file_name)
            print(f"Uploading {file_name} to GCS")
            upload_to_gcs(file_stream, BUCKET_NAME, file_name)
        return 'Sync complete', 200
    except Exception as e:
        print(f"Error in sync_drive_to_gcs: {e}")
        return f"Internal Server Error: {e}", 500

requirements.txt

google-cloud-secret-manager
google-api-python-client
google-auth-httplib2
google-auth-oauthlib
google-cloud-storage

Function variables

(screenshot: Cloud Function runtime variables)

1. Create a Service Account and create a service account key.
2. Create an entry in Secret Manager to store the key content.
3. Grant the IAM role Secret Manager Secret Accessor to your newly created Service Account.
4. Enable APIs: Google Drive API.
5. Create a bucket.
6. Obtain the Service Account email address and share the Google Drive folder with it (go to Google Drive -> Folder -> Share -> paste the email address -> set to Editor -> Share).
7. Create the Cloud Function. If you are asked about enabling APIs, click Enable APIs.
8. Name your function -> Allow unauthenticated invocations (only for testing!).
9. Choose the service account created at the beginning as the "Runtime service account".
10. Add runtime variables with values:
FOLDER_ID = folder ID taken from Google Drive
BUCKET_NAME = name of the bucket where files will be copied
SECRET_ID = name of the secret where you keep the service account key content
PROJECT_ID = ID of the project where the Cloud Function will be created

 

EXAMPLE:

(screenshot: example runtime variable values)

11. Choose Python 3.10 as the runtime and sync_drive_to_gcs as the Entry point.

Once the function is deployed, you will see the function URL; opening it will trigger the function.

Output from logs:

 

(screenshot: function execution logs)

IMPORTANT: If you have large files, such as videos, you must create the Cloud Function with a larger amount of memory.
IMPORTANT 2: If the folder contains any Google Docs, Sheets, or Slides files, you will get the following error:

Error downloading file Untitled spreadsheet: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1TiaGARyv4IdFuh2jcDgQBJERBVNdcJDWBVfFfEGziVQ?alt=media returned "Only files with binary content can be downloaded. Use Export with Docs Editors files.". Details: "[{'message': 'Only files with binary content can be downloaded. Use Export with Docs Editors files.', 'domain': 'global', 'reason': 'fileNotDownloadable', 'location': 'alt', 'locationType': 'parameter'}]">
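
That 403 happens because Google Workspace files (Docs, Sheets, Slides) have no binary content; they have to be exported to a concrete format with files().export_media() instead of files().get_media(). Here is a rough sketch of how the download helper above could be adapted; the export MIME type mapping is only an example, and the file listing would also need to request the mimeType field (e.g. fields="nextPageToken, files(id, name, mimeType)").

from googleapiclient.http import MediaIoBaseDownload
import io

# Example export formats for Google Workspace files; everything else is
# downloaded as-is with get_media().
EXPORT_MIME_TYPES = {
    'application/vnd.google-apps.document': 'application/pdf',
    'application/vnd.google-apps.spreadsheet': 'text/csv',
    'application/vnd.google-apps.presentation': 'application/pdf',
}

def download_or_export_file(service, file_id, mime_type):
    if mime_type in EXPORT_MIME_TYPES:
        # Docs Editors files must be exported, not downloaded directly.
        request = service.files().export_media(fileId=file_id,
                                               mimeType=EXPORT_MIME_TYPES[mime_type])
    else:
        request = service.files().get_media(fileId=file_id)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()
    fh.seek(0)
    return fh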

 

Thank you for your help. For now, it seems complicated, so I am going to stick with Apps Script.

If you are open to considering a more general-purpose tool, you can use Application Integration to do this work.

Application Integration lets you set up a workflow that automates tasks like this. You start an integration with a "trigger", some event that kicks the work off. The set of possible triggers includes Salesforce, ServiceNow, or Zendesk (you start an integration on a new record), as well as Pub/Sub (obviously), Kafka, a webhook, or MQ.

Then you stitch together a workflow consisting of "tasks". Some examples are: listing all the files in a Google Drive folder, sending an email, encrypting or decrypting with Cloud KMS, invoking a Cloud Function, sending an arbitrary HTTP request, reading from or writing to Cloud Storage, publishing to a Pub/Sub topic, and so on.

The tool doesn't automatically read folders and zip them, but it's a framework that lets you assemble an integration, partly from off-the-shelf tasks and partly from your own logic embedded in a Cloud Function or Cloud Run service.

You could do all of this in one big Python script, but the idea is that you will probably build other automations in the future, and this gives you a standard framework for managing the integrations, logging, security, performance, and so on.