
How to power up Cloud Shell?

akk
Bronze 1

Hello, 

Currently I don't have a simple way to search for a string contained in files under subdirectories matching a specific pattern, and I would like to do it from Cloud Shell.

I can scan all the files in my bucket, but I have thousands of subdirectories and each subdirectory contains many files, so I apply a filter (on a date) to focus on only some subdirectories and scan a smaller number of files.

I start by listing the paths of all the files I want to scan, using the subdirectory pattern, here *2022-11-09*:
gsutil ls gs://randomname1/*2022-11-09*/** > test.txt

Then I tried the commands below (the string I want to find in my files is 3012227427).


-------1--------OK 
parallel -j 8 "gsutil cat {} | awk -v f={} '/3012227427/{print f \": \" \$0}'" < test.txt > results.txt
----------------

-------2--------OK
while read -r line; do
  gsutil -m cat "$line" | awk -v f="$line" '/3012227427/{print f ": " $0}' >> results.txt
done < test.txt
----------------

-------3--------KO
while read -r line; do
  gsutil -o "Cpu=parallel" -o "ParallelCompositeUploadThreshold=500o" cat "$line" | awk -v f="$line" '/3012227427/{print f ": " $0}' >> results.txt
done < test.txt
----------------

-------4--------OK
while read -r line; do
  gsutil cp "$line" - | awk -v f="$line" '/3012227427/{print f ": " $0}' >> results.txt
done < test.txt
----------------

-------5--------OK
parallel -j 8 "gsutil cp {} - | awk -v f={} '/3012227427/{print f \": \" \$0}'" < test.txt > results.txt
----------------


All the OK solutions I tried take about the same amount of time, and I'm out of ideas. For just one day (1000+ files) I'm already at 20 minutes of runtime and it hasn't finished. Is it possible to power up Cloud Shell? If yes, how?


Thanks in advance,





Solved
1 ACCEPTED SOLUTION

akk
Bronze 1

I gave up and downloaded all the files to my computer to run the grep locally:

gsutil cp gs://randomname1/*2022-11-09*/** C:\Users\myname\dir1

cd C:\Users\myname
grep -r -H "3012227427" dir1 > output.txt

1-2 seconds to grep 2000 files.
I will use Cloud Shell only for the strict minimum now.
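
If the download step itself is the slow part, gsutil's -m flag should help, since it runs the copy with parallel threads (a sketch using the same paths as above):

gsutil -m cp gs://randomname1/*2022-11-09*/** C:\Users\myname\dir1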





3 REPLIES

I think you really should index the files before putting them into cloud storage.
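
For example (just a sketch of the idea; the file name and the assumption that your IDs are always 10-digit numbers like 3012227427 are made up here): at upload time, record which object contains which IDs in one small index object, then later searches only read the index.

# at upload time, append "path id" lines to a small index and push it to the bucket
gsutil cat gs://randomname1/2022-11-09/newfile.log | grep -oE '[0-9]{10}' | sort -u \
  | sed 's|^|gs://randomname1/2022-11-09/newfile.log |' >> index.txt
gsutil cp index.txt gs://randomname1/index.txt
# later, one small read instead of scanning every object:
gsutil cat gs://randomname1/index.txt | grep '3012227427'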

Using gsutil cat on all the files in practice retrieves every file from storage. This has some effect on billing, both for the retrieval and for the number of operations.

First, you should start passing several URLs to a single gsutil cat command. The process itself is slow, not Cloud Shell. So cat as many URLs as you can in one go, and once you have found the right batch of URLs, split it up to find the right file. GCE might be a good choice.

If you know where the search string sits within each file, you should use the -r parameter (-r range) to read only that byte range.
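
For example (the object name is made up; adjust the range to wherever the string is expected):

# read only the first 4 KiB of the object instead of the whole file
gsutil cat -r 0-4095 gs://randomname1/2022-11-09/somefile.log | grep '3012227427'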

If you post a sample of the gsutil ls output, I can reformat the command.

edit: Something similar to

gsutil cat -h $(cat test.txt) | egrep '==> |3012227427'

but as said, this will retrieve from Cloud Storage the full contents of everything listed in test.txt.
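
With thousands of URLs, $(cat test.txt) can also exceed the shell's argument-length limit; a batched variant of the same idea (sketch, same file names as above) avoids that:

# 100 URLs per gsutil call; keep the ==> headers so matches stay attached to their object
xargs -n 100 gsutil cat -h < test.txt | grep -E '==> |3012227427' > results.txt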

Hi,

There is no way to boost Cloud Shell itself.

best
DamianS
