
list_blobs not listing current directory.

Hi,
I have a scenario where I've tried multiple options, but nothing is working.
Suppose I have the following directory and files:
Source:
gs://bucket_name/table_nm/table_nm_YYYYMMDD/file1.parquet
gs://bucket_name/table_nm/table_nm_YYYYMMDD/file2.parquet
Move files to archive
gs://archive_bucket_name/table_nm/table_nm_YYYYMMDD/file1.parquet
gs://archive_bucket_name/table_nm/table_nm_YYYYMMDD/file2.parquet
 
I need to copy both files into an archive path. The code I have works fine to copy and delete file1.parquet and file2.parquet.
After copying the files to the archive, I want to delete the folder table_nm_YYYYMMDD from the source, but it is retained in the source. How do I delete this folder?
 
The code below only lists file1.parquet and file2.parquet; it doesn't list the folder itself.
 
Any suggestions, please?
 
    files_blobs = []
    for blob in client.list_blobs(lz_bucket, prefix=lz_folder):
        files_blobs.append(blob)
 
Note: I tried passing lz_folder both with and without a trailing "/", and it still doesn't work.
 
Thanks in advance for your help.
Solved
1 ACCEPTED SOLUTION

GCS does not have the concept of folders in the traditional file system sense. The directory structure you see in GCS is actually a convention formed by the object name prefixes. Therefore, deleting a "folder" in GCS means deleting all the objects that share a common prefix.
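To see this in practice, here is a minimal sketch (with a placeholder bucket name) that lists the immediate "subfolders" under table_nm/ by passing a delimiter; what comes back are plain name prefixes, not directory objects:

    from google.cloud import storage

    client = storage.Client()

    # Treat "/" as a delimiter so objects are grouped by their next path segment.
    iterator = client.list_blobs("your-bucket-name", prefix="table_nm/", delimiter="/")

    # The prefixes set is only populated once the iterator has been consumed.
    _ = list(iterator)
    for folder in iterator.prefixes:
        print(folder)  # e.g. "table_nm/table_nm_YYYYMMDD/"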

In your current code, you're listing files within a specific prefix but not addressing the deletion of the folder itself. To effectively delete the folder, you need to delete all objects that have the prefix table_nm_YYYYMMDD/.

Here's an updated approach to achieve this:

  1. List and Delete All Objects with the Prefix: Use the list_blobs method to list all objects with the specified prefix and then delete them. Ensure that the prefix includes the trailing slash to target all objects within the specific directory.   

     
    from google.cloud import storage
    
    # Initialize the client
    client = storage.Client()
    
    # Specify bucket and folder details (replace with your values)
    lz_bucket = 'your-bucket-name'
    lz_folder = 'table_nm/table_nm_YYYYMMDD'
    
    # Define the prefix with a trailing slash
    prefix = f"{lz_folder}/"
    
    # List and delete all objects with the prefix
    bucket = client.bucket(lz_bucket)
    blobs = client.list_blobs(bucket, prefix=prefix)
    for blob in blobs:
        blob.delete()
    

    Replace 'your-bucket-name' and 'table_nm/table_nm_YYYYMMDD' with your actual bucket name and folder path. This script will delete all objects within the specified 'folder' in GCS.

  2. Caution: It's important to use such scripts with caution. Deletion operations are irreversible, so it's crucial to ensure that the script behaves as expected. Always test thoroughly in a safe environment before applying it to your production data (see the dry-run sketch below).
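A simple way to sanity-check the prefix first is a dry run that only prints the matching objects, for example (same placeholder names as above):

    from google.cloud import storage

    client = storage.Client()
    prefix = "table_nm/table_nm_YYYYMMDD/"  # placeholder folder path

    # Print what would be deleted without actually deleting anything.
    for blob in client.list_blobs("your-bucket-name", prefix=prefix):
        print(f"Would delete: gs://your-bucket-name/{blob.name}")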

By following this approach, you effectively delete the "folder" (i.e., all objects with the given prefix) from GCS. Remember, the inclusion of the trailing slash in the prefix is vital to ensure accurate targeting of the objects within the specific directory.
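To make the trailing-slash point concrete: GCS prefix filtering is plain string-prefix matching on object names, so a prefix without the slash can also match sibling "folders" that merely start with the same characters. A small sketch with made-up object names:

    objects = [
        "table_nm/table_nm_20240101/file1.parquet",
        "table_nm/table_nm_20240101_reload/file1.parquet",  # sibling we do NOT want to delete
    ]

    loose_prefix = "table_nm/table_nm_20240101"    # no trailing slash
    exact_prefix = "table_nm/table_nm_20240101/"   # with trailing slash

    print([o for o in objects if o.startswith(loose_prefix)])  # matches both objects
    print([o for o in objects if o.startswith(exact_prefix)])  # matches only the intended folder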


2 REPLIES


Thanks for the advice... looks like it is working.