I have an Amazon S3 bucket and I want to write a Cloud Composer pipeline that uploads CSV files into it on a daily schedule. Currently all my data is in BigQuery; I'll convert it into CSVs and put them into the S3 bucket. Does anyone have any examples of doing this? What is the easiest way to do it? I would be grateful if someone could help. CC: @ms4446
For your scenario, where you're aiming to upload files to an Amazon S3 bucket from a Cloud Composer environment and you have a single-user access requirement, you don't necessarily need a third-party tool like JumpCloud for identity and access management. AWS IAM and Google Cloud IAM, together with workload identity federation, can cover your needs.
Given your use case, here's a simplified approach without needing third-party identity providers:
Steps to Implement Workload Identity Federation

1. Create an AWS IAM Role: create a role that trusts the web identity provider accounts.google.com (https://accounts.google.com) and allows the sts:AssumeRoleWithWebIdentity action, with a condition (for example on accounts.google.com:aud) restricting it to your service account, and attach a policy granting write access to the target S3 bucket.
2. Set Up a Google Cloud Service Account: use the service account your Composer environment runs as, or create a dedicated one for this pipeline.
3. Obtain Google Service Account Credentials: fetch a Google-signed OIDC ID token for that service account (a minimal sketch follows this list).
4. Exchange Tokens for AWS Credentials: call AWS STS AssumeRoleWithWebIdentity with the ID token to get temporary AWS credentials, then use them with boto3 to write to S3.
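Here is a minimal sketch of step 3, fetching a Google-signed ID token from inside a Composer task with the google-auth library. The audience value is a placeholder of mine; it has to match whatever your AWS role's trust policy checks (typically the accounts.google.com:aud condition).

import google.auth.transport.requests
from google.oauth2 import id_token

# Placeholder audience: must match the audience configured in the
# AWS role's trust policy (accounts.google.com:aud condition).
AWS_AUDIENCE = "your-configured-audience"

# On Composer/GCE this goes through the metadata server and uses the
# environment's service account, so no key file is required.
request = google.auth.transport.requests.Request()
google_id_token = id_token.fetch_id_token(request, AWS_AUDIENCE)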
Example: exchanging the ID token for temporary AWS credentials
import boto3

# Assuming you have obtained the Google ID token by authenticating
# with your service account (see the sketch above)
google_id_token = 'YOUR_GOOGLE_ID_TOKEN'

# Exchange the Google ID token for temporary AWS credentials
sts_client = boto3.client('sts')
assumed_role_object = sts_client.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::AWS_ACCOUNT_ID:role/YOUR_AWS_ROLE",
    RoleSessionName="SessionName",
    WebIdentityToken=google_id_token
)
credentials = assumed_role_object['Credentials']

# Now you can use these temporary credentials to access AWS services
s3_client = boto3.client(
    's3',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)

# Example: list buckets to verify access
response = s3_client.list_buckets()
print(response)
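To bring this back to your original question (a daily export of BigQuery data as CSV into S3), here is a rough sketch of a Composer/Airflow DAG that exports a table to CSV in a GCS staging bucket and then copies the files to S3. This is only an outline under a few assumptions: the Google and Amazon provider packages are installed in your environment, an Airflow connection I've called aws_s3_conn holds the AWS credentials (you could instead build temporary credentials with the token-exchange snippet above inside a PythonOperator), and the project, dataset, table, and bucket names are placeholders. Operator parameter names can differ between provider versions, so check the docs for the versions pinned in your Composer image.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)
from airflow.providers.amazon.aws.transfers.gcs_to_s3 import GCSToS3Operator

with DAG(
    dag_id="bq_to_s3_daily",            # hypothetical DAG name
    schedule_interval="@daily",         # run once a day
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:

    # Export the BigQuery table to CSV files in a GCS staging bucket
    export_to_gcs = BigQueryToGCSOperator(
        task_id="export_bq_to_gcs",
        source_project_dataset_table="my-project.my_dataset.my_table",  # placeholder
        destination_cloud_storage_uris=[
            "gs://my-staging-bucket/exports/my_table_{{ ds }}_*.csv"
        ],
        export_format="CSV",
    )

    # Copy the exported CSV files from the GCS staging bucket to S3
    copy_to_s3 = GCSToS3Operator(
        task_id="copy_gcs_to_s3",
        gcs_bucket="my-staging-bucket",      # older provider versions call this `bucket`
        prefix="exports/",
        dest_s3_key="s3://my-s3-bucket/exports/",
        dest_aws_conn_id="aws_s3_conn",      # hypothetical Airflow connection ID
        replace=True,
    )

    export_to_gcs >> copy_to_s3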
If you're finding it challenging, consider the following resources: