Running cbt deleterow in a loop when the rowkeys contain space characters

Hi,

I have a huge number of rowkeys to delete from a Bigtable table, and those rowkeys contain space characters.

The problem I'm facing is that cbt deleterow takes as the rowkey only the beginning of the loop variable's value, up to the first space character, and treats everything after it as unexpected parameters. It fails with an error message like 'bad argument "OF"'. (The rowkey looks like ALL OF US#1234567890#, and in the cbt command it is surrounded by $' and ', i.e. $'ALL OF US#1234567890#'.)

What is going on:

During the iteration (I set count=3 to get results faster), every execution fails with the error message mentioned above. But when I run by hand the command that the iteration built, with the same rowkey, it works fine: it returns the rowkey and the column I asked for.

Why does it behave this way?

If you see the reason, or a typo, a mistake, or something I'm not aware of, please point out what is needed to make it work.

Thank you very much in advance.

 

ACCEPTED SOLUTION

Hi, it's me again.

I wanted to share what I did to get the result I wanted.

I wanted a way to automate generating the list of rowkeys, or even deleting them. I had a huge number of bad rowkeys to delete, but the real pain point was rowkeys containing space characters.

This is what I arrived at:

cbt -project aproject -instance aninstance read agiventable regex=^[0-9].*20240301.*$ cells-per-column="1" prefix=ROYAUME columns=CF:id_produit count=5 | grep ^ROYAUME | while read line; do echo cbt -project aproject -instance aninstance lookup agiventable \$\'$line\' cells-per-column="1" columns=CF:id_produit | bash - ; done | grep ^ROYAUME > result_test_ROYAUME.txt

cat result_test_ROYAUME.txt
ROYAUME DU MONROI20240301#00000000000000000003#
ROYAUME DU MONROI20240301#00000000000000000004#
ROYAUME DU MONROI20240301#00000000000000000005#
ROYAUME DU MONROI20240301#00000000000000000006#
ROYAUME DU MONROI20240301#00000000000000000007#

First, I use cbt read with a regex and a prefix to build the list of rowkeys I want. Then I pipe that list into a while loop, which builds the cbt command with the loop variable surrounded by $' and ' and echoes it to bash. The $' and ' mark the rowkey as a raw-byte string, so the spaces stay part of the rowkey instead of splitting it into several arguments.
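The splitting behavior behind the 'bad argument "OF"' error can be reproduced without Bigtable. The sketch below is only an illustration: count_args is a hypothetical stand-in for cbt that reports how many arguments it received, and the rowkey values are the sample ones from this thread.

```sh
#!/bin/sh
# count_args is a hypothetical stand-in for cbt: it only prints how many
# arguments it received, so the effect of quoting is visible locally.
count_args() {
  echo "$#"
}

rowkey='ALL OF US#1234567890#'

# Unquoted: the shell splits the value at each space before the command runs,
# so the command sees "ALL", "OF" and "US#1234567890#".
count_args $rowkey      # prints 3

# Double-quoted: the whole value arrives as one argument.
count_args "$rowkey"    # prints 1

# Reading rowkeys line by line. IFS= and -r keep leading/trailing spaces
# and backslashes in each line intact.
printf '%s\n' 'ROYAUME DU MONROI20240301#00000000000000000003#' |
while IFS= read -r line; do
  count_args "$line"    # prints 1
done
```

Under the same assumption, a call such as cbt -project aproject -instance aninstance deleterow agiventable "$line" inside the loop should receive the rowkey as a single argument, without the echo and bash step. I have not run it against a real table, so treat it as a starting point and try it with count=3 first.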

My opinion, if it can serve as advice: avoid spaces in Bigtable rowkeys, and if there are articles on best practices for distributing rowkeys across Bigtable, read them. (The example I gave, with a sequential number in the rowkey, is close to a bad example. Instead of 00000000000000000003, reverse it to 3000000000000000000, so that the low-order digits, which change most often, come first and the rows are spread more evenly.)
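The digit reversal can be sketched in plain POSIX shell. reverse_string is a hypothetical helper name used only for this illustration; it is not part of cbt. Whether reversing actually helps depends on where the number sits in the rowkey.

```sh
#!/bin/sh
# reverse_string (hypothetical helper) prints its first argument with the
# characters in reverse order, using only POSIX parameter expansion.
reverse_string() {
  input=$1
  output=
  while [ -n "$input" ]; do
    rest=${input#?}            # everything after the first character
    first=${input%"$rest"}     # the first character itself
    output=$first$output       # prepend, so the order ends up reversed
    input=$rest
  done
  printf '%s\n' "$output"
}

# Short example: the fastest-changing digit moves to the front.
reverse_string 0003    # prints 3000

# The zero-padded sequence number from the sample rowkeys keeps its width.
reverse_string 00000000000000000003
```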

 

 
