Hello,
Currently I don't have a simple way to search for a string contained in files that sit in subdirectories matching a specific pattern, and I would like to do this from Cloud Shell.
I could scan all the files in my bucket, but I have thousands of subdirectories and each one contains many files, so I filter on a date to target only some subdirectories and scan a smaller number of files.
I start by writing the paths of all the files I want to scan to a file, using the subdirectory pattern (here *2022-11-09*):
gsutil ls gs://randomname1/*2022-11-09*/** > test.txt
Then I tried the commands below (the string I want to find in my files is 3012227427):
-------1--------OK
parallel -j 8
while read -r line; do
gsutil cat "$line" | awk -v l="'Command: gsutil cat $line | awk '/3012227427/{print ARGV[ARGIND] ":" $0}':" '/3012227427/{print l $0}' > results.txt
done < test.txt
----------------
-------2--------OK
while read -r line; do
gsutil -m cat "$line" | awk -v l="'Command: gsutil cat $line | awk '/3012227427/{print ARGV[ARGIND] ":" $0}':" '/3012227427/{print l $0}' > results.txt
done < test.txt
----------------
-------3--------KO
while read -r line; do
gsutil -o "Cpu=parallel" -o "ParallelCompositeUploadThreshold=500o" cat "$line" | awk -v l="'Command: gsutil cat $line | awk '/3012227427/{print ARGV[ARGIND] ":" $0}':" '/3012227427/{print l $0}' > results.txt
done < test.txt
----------------
-------4--------OK
while read -r line; do
gsutil cp "$line" - | awk -v l="'Command: gsutil cp $line - | awk '/3012227427/{print ARGV[ARGIND] ":" $0}':" '/3012227427/{print l $0}' > results.txt
done < test.txt
----------------
-------5--------OK
parallel -j 8
while read -r line; do
gsutil cp "$line" - | awk -v l="'Command: gsutil cp $line - | awk '/3012227427/{print ARGV[ARGIND] ":" $0}':" '/3012227427/{print l $0}' > results.txt
done < test.txt
----------------
All the OK solutions I tried take about the same amount of time, and I'm out of ideas. For a single day (1000+ files), the command has been running for 20 minutes and still hasn't finished. Is it possible to give Cloud Shell more power? If so, how?
Thanks in advance,
I gave up, copied all the files to my computer, and ran grep locally:
gsutil cp gs://randomname1/*2022-11-09*/** C:\Users\myname\dir1
cd C:\Users\myname
grep -r -H "3012227427" dir1 > output.txt
Grepping 2000 files this way takes 1-2 seconds.
From now on I will use Cloud Shell only for the strict minimum.
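For anyone who wants to stay in Cloud Shell, the same copy-then-grep idea could probably be wrapped in a small function. This is only a sketch: fetch_and_grep is a made-up helper name, the destination directory is an assumption, and I have not measured whether it is faster than downloading to a local machine. The -m option asks gsutil to run the copy with multiple threads/processes, and -F makes grep treat the search term as a fixed string rather than a regular expression.

```shell
#!/usr/bin/env bash
# Sketch: copy the matching objects with ONE gsutil invocation, then grep locally.
# fetch_and_grep is a hypothetical helper name, not part of gsutil.
fetch_and_grep() {
  local pattern="$1" needle="$2" dest="$3"

  mkdir -p "$dest"

  # -m: let gsutil perform the copy in parallel.
  gsutil -m cp "$pattern" "$dest" || return 1

  # -r: recurse into dest, -H: print the file name, -F: fixed-string match.
  grep -r -H -F -- "$needle" "$dest"
}

# Example call (bucket name and search string taken from the post above):
# fetch_and_grep 'gs://randomname1/*2022-11-09*/**' 3012227427 ./dir1 > output.txt
```

The point of the design is that gsutil is started once for the whole batch instead of once per file, and the search itself runs on local disk.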
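Looking back at the loops above, three things stand out. A `parallel -j 8` line on its own does not make the following while loop parallel. The `> results.txt` inside the loop truncates the file on every iteration, so only the last file's matches survive. And each iteration starts a fresh gsutil process for a single object, which is likely where most of the time goes. Below is a rough sketch of a streaming search that runs several `gsutil cat` calls at once and writes the results once at the end. The names search_one and search_listed are hypothetical, the default of 8 concurrent jobs is an arbitrary assumption, and it is meant to be saved and run as a script rather than typed at the prompt.

```shell
#!/usr/bin/env bash
# Sketch: search every object listed in a file for a fixed string, a few at a time.
# search_one and search_listed are hypothetical helper names.

# Print "object:line" for every line of one object that contains the needle.
search_one() {
  local needle="$1" object="$2"
  gsutil cat "$object" |
    awk -v name="$object" -v s="$needle" 'index($0, s) { print name ":" $0 }'
}

# Read object paths from a list file and search them in batches of $jobs.
search_listed() {
  local needle="$1" list="$2" jobs="${3:-8}"
  local running=0 object

  while IFS= read -r object; do
    [ -n "$object" ] || continue               # skip empty lines
    search_one "$needle" "$object" < /dev/null &   # run in the background
    running=$((running + 1))
    if [ "$running" -ge "$jobs" ]; then
      wait                                     # let the current batch finish
      running=0
    fi
  done < "$list"
  wait                                         # wait for the last batch
}

# Example call, using the list produced by the gsutil ls command above:
# search_listed 3012227427 test.txt 8 > results.txt
```

Because the jobs in a batch run concurrently, matches from different files can appear in any order in results.txt; each line still carries the object path it came from.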