I've been trying for days, now I'm too frustrated...
This is just 1 part of a larger workflow I have to create. The goal is to extract data from emailed PDFs, then do stuff with the text, automating as much as possible.
I tried using Apps Script to detect when a label was applied in Gmail, then uploading the attachments to the GCS bucket. I couldn't make it work, no matter what configurations I tried. I restarted five times, but still screwed up somewhere.
Since I couldn't figure it out using Gmail, I scrapped the idea of going directly from Gmail to GCS and instead used the Apps Script to copy the attachments to Google Drive. Maybe I could make it work approaching it from GDrive rather than Gmail. Nope.
The Apps Script I made to detect new Drive files starts to work, then crashes when connecting to GCS. Same thing happened with Gmail.
I'm starting over again with a new GC project and bucket. I've enabled pretty much every API we thought might be needed or helpful.
I created a new Service Account and gave it the Roles: Document AI Admin, Service Account Token Creator, Storage Admin, Owner, etc.
Getting super frustrated, I basically just gave the SA permission to do everything.
None of it worked. Now it's Friday, time to go home, and I've got nothing good to tell the boss on Monday morning when he asks how it's going. I'd love some insight from anyone who's actually done this before, or from anyone who has a better idea for pulling text from emailed PDFs than Google Cloud's Document AI tools.
Thanks folks.
Hi @Prosperity_Tech,
Welcome to the Google Cloud Community!
Look into Uploading Files from Google Drive to Google Cloud Storage With Google Apps Script. The blog gives a detailed, step-by-step explanation of how to upload to GCS using Apps Script.
If you still can't upload to Google Cloud Storage, you should look into configuring your Google Cloud Projects. Every Apps Script project uses Google Cloud to manage authorization, advanced services, and other details.
You can also try looking into this Stack Overflow post about copying files from Google Drive to Google Cloud Storage, as it may be a workaround for your case.
You can also get in touch with Google Cloud Support if the above options don't work. Let me know if it helped, thanks!
That sounds frustrating. You seem to be doing something innovative and important for your company. That's often hard.
To work this out, let's start by looking at Apps Script logs to see the errors from the crashes. Can you check there? Maybe share them directly if there is no confidential information there?
You should be able to get to them from the tool menu above the Apps Script editing window.
Good afternoon. Indeed, this project is an attempt at beginning some important innovations for our company.
The Apps Script error that I'm currently struggling with is this:
Exception: Request failed for https://storage.googleapis.com returned code 403. Truncated server response: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message></Error> (use muteHttpExceptions option to examine full response)
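For what it's worth, the useful parts of that truncated XML body are just the Code and Message elements. A throwaway sketch for pulling them out (plain JavaScript, regex-based, not a real XML parser; the body below is the one quoted above):

```javascript
// Sketch: extract Code and Message from a truncated GCS XML error body.
function parseGcsError(body) {
  var code = (body.match(/<Code>([^<]+)<\/Code>/) || [])[1] || null;
  var message = (body.match(/<Message>([^<]+)<\/Message>/) || [])[1] || null;
  return { code: code, message: message };
}

var body = "<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code>" +
    "<Message>Access denied.</Message></Error>";
console.log(parseGcsError(body)); // { code: 'AccessDenied', message: 'Access denied.' }
```

Setting muteHttpExceptions: true on the fetch, as the error message suggests, returns the full (untruncated) body for inspection instead of throwing.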
I can't see why I'm being denied access to my own project. Here's what I know and what I've done to figure this out:
I'm not sure what else to do or what could be causing the Access Denied error. I am the only person who has made any attempts to access anything at all with this project.
It sounds like your Cloud Storage client is not using the credentials for your account or the service account you created. How are you registering the service account with Apps Script?
I don't fully know how this works in Apps Script specifically, so I can only offer hints:
- There is an interesting post here that outlines something similar to what you are trying to do.
- If you share a code snippet of how you initialize the GCS client, we might be able to make progress together.
- In the end, the question is what identity your Apps Script uses to call GCS.
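On that last point: when a service account is the caller, the token request is built from a signed JWT whose claim set names that identity. A rough sketch of just the claims (field names from Google's OAuth 2.0 service-account flow; the email and timestamp below are placeholders taken from this thread):

```javascript
// Sketch: the unsigned claim set behind a service-account access-token request.
// iss = the calling identity, scope = what the token may do, aud = the token endpoint.
function buildClaimSet(serviceAccountEmail, scope, nowSeconds) {
  return {
    iss: serviceAccountEmail,                    // identity that will call GCS
    scope: scope,                                // e.g. devstorage.read_write
    aud: 'https://oauth2.googleapis.com/token',  // Google's token endpoint
    iat: nowSeconds,
    exp: nowSeconds + 3600                       // tokens live at most one hour
  };
}

var claims = buildClaimSet(
  'docai-sa@docai-123456.iam.gserviceaccount.com',
  'https://www.googleapis.com/auth/devstorage.read_write',
  1700000000
);
console.log(claims.iss);
```

If the token was minted for a different identity than the one your bucket's IAM policy grants, GCS answers exactly the 403 AccessDenied you're seeing.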
I read and examined the post from your link, but I couldn't resolve the problem using the examples and details given, perhaps because the post is from January 2020 and things may have changed since.
I'll paste code below for review after removing sensitive info. Hopefully there's something that can be figured out.
// Global variables
var BUCKET_NAME = "processing";
var PROJECT_ID = "docai-123456";
var SERVICE_ACCOUNT_EMAIL = "docai-sa@docai-123456.iam.gserviceaccount.com";
var SERVICE_ACCOUNT_KEY = "abcdefghijklmnopqrstuvwxyz1234567890";
var CLIENT_ID = "1234567890";
var API_KEY = "zyxwvutsrqponmlkjihgfedcba0987654321";
var FOLDER_NAME = 'intake'; // Google Drive folder to monitor
var CREDENTIALS = {
  "private_key": SERVICE_ACCOUNT_KEY,
  "client_email": SERVICE_ACCOUNT_EMAIL,
  "client_id": CLIENT_ID,
  "user_email": "me@companydomain.com",
  "api_key": API_KEY,
};
function checkForNewFiles() {
  var service = getOAuthService(CREDENTIALS.user_email);
  service.reset();
  if (service.hasAccess()) {
    var folder = DriveApp.getFoldersByName(FOLDER_NAME).next();
    var files = folder.getFiles();
    while (files.hasNext()) {
      var file = files.next();
      var fileName = file.getName();
      var targetBucket = 'https://storage.googleapis.com/' + BUCKET_NAME + '/' + fileName;
      copyFileToGCS(file, targetBucket, service);
      file.setTrashed(true); // Move file to GDrive trash after copying to GCS
    }
  }
}
// Helper function to upload a file to Google Cloud Storage
function uploadToCloudStorage(blob, destination) {
  var accessToken = getCloudStorageAccessToken();
  var options = {
    method: "PUT",
    headers: {
      "Authorization": "Bearer " + accessToken,
      "Content-Type": blob.getContentType()
    },
    payload: blob.getBytes()
  };
  var url = "https://storage.googleapis.com/" + BUCKET_NAME + "/" + encodeURIComponent(destination);
  var response = UrlFetchApp.fetch(url, options);
}
function copyFileToGCS(file, targetBucket, service) {
  var blob = file.getBlob();
  var options = {
    headers: {
      'Content-Type': blob.getContentType(),
      'Authorization': 'Bearer ' + service.getAccessToken()
    },
    method: 'post',
    payload: blob,
    muteHttpExceptions: true
  };
  var response = UrlFetchApp.fetch(targetBucket, options);
  if (response.getResponseCode() === 200) {
    Logger.log('File copied successfully.');
  } else {
    Logger.log('Error copying file: ' + response.getContentText());
  }
}
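One detail worth flagging in the snippet above: targetBucket is the plain public object URL, while GCS's JSON API expects uploads to go to a dedicated upload endpoint (the shape the working script later in this thread uses). A throwaway sketch contrasting the two URL shapes in plain JavaScript (the bucket name is the one above; the file name is made up):

```javascript
// Sketch: plain object URL vs. the JSON API media-upload URL for the same object.
function objectUrl(bucket, name) {
  return 'https://storage.googleapis.com/' + bucket + '/' + encodeURIComponent(name);
}
function mediaUploadUrl(bucket, name) {
  return 'https://www.googleapis.com/upload/storage/v1/b/' + bucket +
      '/o?uploadType=media&name=' + encodeURIComponent(name);
}

console.log(objectUrl('processing', 'invoice 01.pdf'));
console.log(mediaUploadUrl('processing', 'invoice 01.pdf'));
```

Note also that encodeURIComponent matters for file names with spaces or slashes, and that the JSON upload endpoint takes the object name as a query parameter rather than as a path segment.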
Twice I tried replying with some of the code, but my post disappeared almost immediately both times. I'm guessing there's something in place to prevent posts with potentially "risky" snippets? This is try #3, let's see if this one sticks.
I made variables at the beginning of the Apps Script for the project ID, service account email, service account key, client id, and API key. I then created another variable called CREDENTIALS and put the others into it. Nothing out of the ordinary there.
Here's the snippet:
function checkForNewFiles() {
  var service = getOAuthService(CREDENTIALS.user_email);
  service.reset();
  if (service.hasAccess()) {
    var folder = DriveApp.getFoldersByName(FOLDER_NAME).next();
    var files = folder.getFiles();
    while (files.hasNext()) {
      var file = files.next();
      var fileName = file.getName();
      var targetBucket = 'https://storage.googleapis.com/' + BUCKET_NAME + '/' + fileName;
      copyFileToGCS(file, targetBucket, service);
      file.setTrashed(true); // Move file to GDrive trash after copying to GCS
    }
  }
}
This errored with "getOAuthService is not defined" so I added the OAuth2 library from here and attempted to import it like this:
import { getOAuthService } from 'oauth2';
That errored with "Cannot use import statement outside a module" and prevented me from saving the changes, so I tried creating a module like this:
module myScript {
  import { getOAuthService } from 'oauth2';
}
Same exact error. I don't know how to fix it. Is there another way to do this?
Thanks for persevering. Sorry this is so hard. So is the issue that your script can't get to Drive, or that it can't get to GCS? Or both?
The script can't connect to GCS.
Ok. Let's take this one step at a time. It seems like the issue we really need to figure out is how to authenticate with a GCP -- or any other -- API using OAuth2 from Apps Script. It might be more effective to post this in the Apps Script community, since this is not really a GCS question. But let's see how far we can take this here:
1. You need to import the OAuth2 library. You are trying to do this explicitly in the code, but the way to do it in Apps Script is with the + next to Libraries in the left-hand menu of the Apps Script editor. You can paste in the library's script ID from the GitHub repo you pointed out (https://github.com/googleworkspace/apps-script-oauth2).
Once you do this, you can get to the client creation method using OAuth2.createService(...).
I did already add the library from GitHub. I thought I'd mentioned that earlier. And I did create a service account, it's one of the variables at the top.
I changed line 18 to the following:
var service = OAuth2.createService(CREDENTIALS);
Executing this returned no errors. However, nothing happened, either. There was no file uploaded to the bucket, and the file in the Drive that was the target was not deleted.
I believe createService interprets the first argument as the name of the service -- just a string. Please take a look at the example I posted above for how to pass the credentials to the "service" client. Or you can look at more examples here: https://github.com/googleworkspace/apps-script-oauth2/tree/main/samples (none as relevant as the one I posted above, but very likely well-tested).
For what it's worth, I've adapted this and successfully gotten it working with the following steps, based on the original example I posted as well as this example in the OAuth2 library repo.
1. Download the service account key in JSON format. Assign it to a variable named key in your code. I've created a new Apps Script file called "key" that contains
key = CONTENTS_OF_THE_KEY_FILE_JSON
Now I can access all the fields of the key using key.__ notation. You can make the code a bit cleaner by storing the key itself in Drive and downloading it only when your app runs.
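For reference, a downloaded service-account key file is a JSON object, so once it's assigned to key the fields are plain properties. A sketch of its shape (standard key-file field names; every value below is a placeholder, with the email and project ID borrowed from earlier in this thread):

```javascript
// Sketch: the shape of a service-account key file (all values are placeholders).
var key = {
  type: 'service_account',
  project_id: 'docai-123456',
  private_key_id: 'PLACEHOLDER',
  private_key: '-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n',
  client_email: 'docai-sa@docai-123456.iam.gserviceaccount.com',
  client_id: '1234567890',
  token_uri: 'https://oauth2.googleapis.com/token'
};

// The two fields the OAuth2 service construction actually reads:
console.log(key.client_email);
console.log(key.private_key.indexOf('BEGIN PRIVATE KEY') !== -1);
```

The private_key and client_email fields are the ones wired into setPrivateKey and setIssuer in the script below.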
2. In a separate script file:
function getGcsUrl(bucket, object) {
  const API = `https://www.googleapis.com/upload/storage/v1/b`;
  var gcsObjectName = encodeURIComponent(object);
  return `${API}/${bucket}/o?uploadType=media&name=${gcsObjectName}`;
}

function getGcsService() {
  return OAuth2.createService('CloudStorage')
      .setPrivateKey(key.private_key)
      .setIssuer(key.client_email)
      .setCache(CacheService.getUserCache())
      .setTokenUrl('https://oauth2.googleapis.com/token')
      .setScope('https://www.googleapis.com/auth/devstorage.read_write')
      .setPropertyStore(PropertiesService.getUserProperties());
}

function run() {
  var gcs = getGcsService();
  const file = DriveApp.getFileById(REPLACE_WITH_YOUR_DRIVE_FILE_ID);
  const blob = file.getBlob();
  const response = UrlFetchApp.fetch(getGcsUrl('REPLACE_WITH_YOUR_BUCKET', blob.getName()), {
    method: 'POST',
    contentLength: blob.getBytes().length,
    contentType: blob.getContentType(),
    payload: blob.getBytes(),
    headers: {Authorization: `Bearer ${gcs.getAccessToken()}`},
  });
  const result = JSON.parse(response.getContentText());
  Logger.log(JSON.stringify(result, null, 2));
}
3. Run the run function in Apps Script.
Using your method of pulling the credentials from a separate "Key" file containing a "key" variable seems to have the best potential so far. I haven't gotten any errors when doing it this way.
However, there still isn't any successful copying/uploading from the Drive to the bucket. The execution log displays a File List and the name of the most recent file I've uploaded to my Drive (regardless of the folder into which the file was uploaded).
I need to fix the Apps Script so it focuses on new uploads to a specific folder, but that's not the main problem now. Getting the file that has clearly been identified copied over to the GCS bucket is what I need help with.
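The folder-scoping part, at least, is separable from the GCS problem: "only new uploads" is just a timestamp comparison against the last successful run. A sketch of that logic on plain objects (the file names and lastRun value are made up; in Apps Script the dates would come from file.getDateCreated() and the stored timestamp from PropertiesService):

```javascript
// Sketch: keep only files created after the last successful run.
function newFilesSince(files, lastRun) {
  return files.filter(function (f) { return f.created > lastRun; });
}

var lastRun = new Date('2023-06-01T00:00:00Z');
var files = [
  { name: 'old-invoice.pdf', created: new Date('2023-05-30T10:00:00Z') },
  { name: 'new-invoice.pdf', created: new Date('2023-06-02T09:00:00Z') }
];
var fresh = newFilesSince(files, lastRun).map(function (f) { return f.name; });
console.log(fresh); // [ 'new-invoice.pdf' ]
```

Combined with DriveApp.getFoldersByName(FOLDER_NAME) (already in the earlier snippet) to restrict the search to one folder, this keeps the script from re-processing everything in the Drive on every run.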
Sounds like you've added some looping logic. Try running this in the Apps Script debugger, maybe? The script I shared above definitely copied file data from Drive to GCS successfully.
I still haven't been able to figure out why this isn't working. So, as another test, I created a brand new project and used the script from your previous post. I inserted my GCS bucket name and Drive file ID where indicated, changed the sharing settings on that file to Anyone With the Link, and (likely unnecessarily) directly shared that file with the GC Service Account email. I also added the OAuth2 library to the project.
But, it didn't work. There were no errors in the Apps Script execution logs, but the Drive file was not copied to GCS.
I am very frustrated, especially after not being able to get even this simple test to work properly. What could be missing? Do I need to create OAuth credentials in the Google Cloud project? Looking at your script sample I used for this test, I don't see where I'd even put that. If this worked for you, it should work for me as well.
Can we get back to the basics here and see if we can find the difference between your GCS and Apps Script project settings and mine? It would be helpful if you could explain how you set up the relevant components for them, and I can compare with my own settings.
I am experiencing the exact same frustrations as you are right now. Google offers great products, but they are so difficult to access because of their unintuitive security systems. Incredibly frustrating.
It's September 2024, and it's still the same situation you experienced in 2023.
I am new to GCP and I am experiencing the same thing. The only difference is I am accessing a shared drive (from a third party).