
How to power up Cloud Shell?

akk

Hello, 

Currently I don't have a simple way to search for a string contained in files whose subdirectory matches a specific pattern, and I would like to do it from Cloud Shell.

I can scan all the files in my bucket, but I have thousands of subdirectories and each subdirectory contains many files, so I filter (by a date) to focus on only some subdirectories and scan a small number of files.

I begin by writing out the paths of all the files I want to scan, using the subdirectory pattern, here *2022-11-09*:
gsutil ls gs://randomname1/*2022-11-09*/** > test.txt

Then I try the commands below (here the string I want to find in my files is 3012227427):


-------1--------OK
# note: a bare `parallel -j 8` before the loop launches nothing;
# this still runs one sequential gsutil process per file
while read -r line; do
    # prefix each match with its source file; the redirect sits on the loop
    # so results.txt is not truncated on every iteration
    gsutil cat "$line" | awk -v l="$line: " '/3012227427/{print l $0}'
done < test.txt > results.txt
----------------

-------2--------OK
while read -r line; do
    # -m enables parallel operations, but it has no effect on a single-object cat
    gsutil -m cat "$line" | awk -v l="$line: " '/3012227427/{print l $0}'
done < test.txt > results.txt
----------------

-------3--------KO
while read -r line; do
    # KO: -o expects boto settings of the form Section:option=value
    # (e.g. GSUtil:...); "Cpu=parallel" is not a gsutil option, and composite
    # upload thresholds apply to uploads, not to cat
    gsutil -o "Cpu=parallel" -o "ParallelCompositeUploadThreshold=500o" cat "$line" | awk -v l="$line: " '/3012227427/{print l $0}'
done < test.txt > results.txt
----------------

-------4--------OK
while read -r line; do
    # stream the object to stdout with cp ("-") instead of cat; same behavior
    gsutil cp "$line" - | awk -v l="$line: " '/3012227427/{print l $0}'
done < test.txt > results.txt
----------------

-------5--------OK
# same as 4; the standalone `parallel -j 8` again has no effect
while read -r line; do
    gsutil cp "$line" - | awk -v l="$line: " '/3012227427/{print l $0}'
done < test.txt > results.txt
----------------


All the OK solutions I tried take the same amount of time, and I'm out of ideas. For just one day (1000+ files) the search has been running for 20 minutes and still isn't finished. Is it possible to power up Cloud Shell? If yes, how?
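One thing that might cut the runtime without a bigger machine, assuming GNU parallel is available in the session (the attempts above suggest it is): feed test.txt to parallel directly, so several gsutil fetches actually run at once. The job count of 8 and the switch from the awk labeling to grep --label are assumptions for the sketch, not something from the original commands:

# keep up to 8 `gsutil cat` jobs in flight; {} is each object path read
# from test.txt; grep -H --label prints that path in front of every match,
# and parallel groups each job's output so the single redirect stays coherent
parallel -j 8 'gsutil -q cat {} | grep -H --label={} "3012227427"' < test.txt > results.txt

Since the sequential loops spend most of their wall-clock time on per-object gsutil startup and network latency rather than CPU, the runtime should drop roughly in proportion to the job count.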


Thanks in advance,





1 ACCEPTED SOLUTION

akk

I gave up and copied all the files to my computer, then ran the grep locally:

gsutil cp gs://randomname1/*2022-11-09*/** C:\Users\myname\dir1

cd C:\Users\myname
grep -r -H "3012227427" dir1 > output.txt

1-2 seconds for grep to scan 2000 files.
I will only use Cloud Shell for the strict minimum now.
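For reference, the same download-then-grep approach also works inside Cloud Shell, which avoids pulling the data down to a workstation. A sketch assuming the bucket layout from the post; /tmp/scan is an arbitrary local directory, and note that gsutil's ** wildcard flattens the subdirectory structure on copy, so objects with identical basenames would collide:

# parallel-download the day's objects (-m runs the copies concurrently),
# then search them locally, exactly like above
mkdir -p /tmp/scan
gsutil -m cp "gs://randomname1/*2022-11-09*/**" /tmp/scan/
grep -r -H "3012227427" /tmp/scan > output.txt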



