I have some questions about the HBase shell command tool:
1: How to list all column family names (just names!) in a table?
2: How to count the number of rows in a column family?
1: How to list all column family names (just names!) in a table?
Not possible OOTB, but you could do something like this:
echo "scan 'table'" | bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $1}'
2: How to count the number of rows in a column family?
What do you mean by this? Do you intend to ask how to count the number of column families in a row? If this is what you need, try this:
echo "scan 'table'" | bin/hbase shell | grep cf | wc -l
Use describe; it will show the column families as NAME => 'columnfamilyname'.
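If you want just the names out of describe, something like this could strip the rest. A sketch that assumes the NAME => '...' formatting shown above:
echo "describe 'table'" | bin/hbase shell | grep -o "NAME => '[^']*'" | sed "s/NAME => '\(.*\)'/\1/"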
I have a listColumns script based on Tariq's answer that limits the scan (because I'd like it to finish in my lifetime).
echo "scan '$1', LIMIT => 1" | hbase shell | awk '{print $2}' | grep column | sort | uniq | awk -F = '{print $2} '
Obviously, since only one row is scanned, you run the risk of missing columns when rows have different column sets.
I have a grep'ed string from curl, and I want to summarize the lines by their content.
e.g.
input
SomethingA v2.3
SomethingA v2.4
SomethingElse v1.1
SomethingElse v1.2
output
SomethingA 2
SomethingElse 1
The numbers in the output are not a must, but if they are easy to achieve, they would be very nice. The "v" after the leading space is a fixed prefix for the version numbers, which don't have to contain a dot.
I tried echo "$str" | grep -Po '(.*(?<=))v[0-9]', but the output still contains the "v1", and I don't know how to collapse the repeated leading strings into one line per name.
You can use
$ awk '{print $1}' filename | sort | uniq -c | awk '{print $2,$1}'
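For example, fed the sample input from the question (here via printf instead of a file), it prints each distinct first field with its count. Note that SomethingElse comes out as 2, since both of its lines are counted:
$ printf 'SomethingA v2.3\nSomethingA v2.4\nSomethingElse v1.1\nSomethingElse v1.2\n' | awk '{print $1}' | sort | uniq -c | awk '{print $2,$1}'
SomethingA 2
SomethingElse 2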
Note: this will also count blank lines. If you want to get rid of those, use:
$ grep -v '^$' filename | awk '{print $1}' | sort | uniq -c | awk '{print $2,$1}'
In a part of my script I am trying to generate a list of the year and month that a file was submitted. Since the filenames contain the timestamp, I should be able to cut them at the month position and then do a sort+uniq filtering. However, sed is generating an outlier for one of the files.
I am using this command sequence:
ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
This works most of the time, except in some cases it outputs more of the timestamp than requested:
$ ls
service-parent-20181119092630.json service-parent-20181123134132.json service-parent-20181202124532.json service-parent-20190121091830.json service-parent-20190125124209.json
service-parent-20181119101003.json service-parent-20181126104300.json service-parent-20181211095939.json service-parent-20190121092453.json service-parent-20190128163539.json
service-parent-20181120095850.json service-parent-20181127083441.json service-parent-20190107035508.json service-parent-20190122093608.json
service-parent-20181120104838.json service-parent-20181129155835.json service-parent-20190107042234.json service-parent-20190122115053.json
$ ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
service-parent-201811
service-parent-201811201048
service-parent-201812
service-parent-201901
I have also tried this variation but the second output line is still returned:
ls -1 service*json | sed -e "s|\(.*201.\{3\}\).*json$|\1|g" | sort |uniq
Can somebody explain why service-parent-201811201048 is returned past the requested 3 characters?
Thanks.
service-parent-20181120104838.json happens to contain a second 201 (in 201048), and the greedy .* lets the pattern 201... match that later occurrence.
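You can see the greedy match on that one file; .* backtracks only as far as the second 201:
$ echo service-parent-20181120104838.json | sed -e "s|\(.*201...\).*json$|\1|g"
service-parent-201811201048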
You might try ls -1 service*json | sed -e "s|\(.*-201...\).*json$|\1|g" | sort | uniq, which requires a dash - before 201... and so pins the match to the first occurrence.
It is not recommended to parse the output of ls. Please try instead:
for i in service*json; do
    sed -e "s|^\(service-.*-201[0-9]\{3\}\).*json$|\1|g" <<< "$i"
done | sort | uniq
Your problem is explained at https://stackoverflow.com/a/54565973/1745001 (i.e. .* is greedy) but try this:
$ ls | sed -E 's/(-[0-9]{6}).*/\1/' | sort -u
service-parent-201811
service-parent-201812
service-parent-201901
The above requires a sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed.
I need to modify columns 5 to 9 directly in each line of a file.
Currently I'm doing this in a while loop, extracting each column line by line.
For example, a line looks like:
echo "m.mustermann#muster.com;surnanme;givenname;displayname;1111;2222;3333;44(#44;(5555"
line_9=$(echo "$line" | awk -F "[;]" '{print $9}' | sed 's/[^0-9+*,]*//g')
Is it possible to do that with "sed -i" instead of awk?
Thanks for any help!
I'm not sure it can be done generally in sed, but you could definitely do it in awk:
… | awk -F";" '{ gsub("[^0-9]*","",$9); print $9 }'
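Run against the sample line from the question, the ninth field, (5555, comes out with the non-digits stripped:
$ echo "m.mustermann#muster.com;surnanme;givenname;displayname;1111;2222;3333;44(#44;(5555" | awk -F";" '{ gsub("[^0-9]*","",$9); print $9 }'
5555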
If you really want to do it with sed, the expression will look something like:
… | sed -e 's,\(^[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)\(.*\),\1\2\3\4\5\6\7\8\9,'
For a version using POSIX sed only:
line_9="$(echo $line | sed 'H;x;s/^\(.\)\(\([^;]*;\)\{8\}\)\([^;]*\)/\2\1\4\1/;h;s/\(\n\).*\1/\1/;x;s/.*\(\n\)\(.*\)\1.*/\2/;s/[^0-9+*,]*//g;G;s/\(.*\)\(\n\)\(.*\)\2/\3\1/;h;s/.*//;x' )"
I used
cut -d " " -f 8
and
awk '{print $8}'
But this assumes that the 8th field is the last one, which is not always true.
How can I display the last field in a shell script?
Try doing this:
$ awk '{print $NF}'
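For example:
$ echo "foo bar base" | awk '{print $NF}'
base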
or the funny:
$ echo "foo bar base" | rev | cut -d ' ' -f1 | rev
base
Edit: this answer is wrong now because the question changed. Move along, nothing to see here.
You can use tail to print a specified number of bytes from the end of the input:
tail -c 1
I have a series of index files for some data files which basically take the format
index file: asdfg.log.1234.2345.index
data file: asdfg.log
The idea is to do a search of all the index files. If the value XXXX appears in an index file, go and grep its corresponding data file and print out the line in the data file where the value XXXX appears.
So far I can simply search the index files for the value XXXX, e.g.
find . -name "*.index" | xargs grep "XXXX"   # gives me a list of the index files with XXXX in them
How do I take the index file match and then grep its corresponding data file?
Does this do the trick?
find . -name '*.index' |
xargs grep -l "XXXX" |
sed 's/\.log\..*/.log/' |
xargs grep "XXXX"
The find command is from your example. The first xargs grep lists just the (index) file names. The sed maps the file names to the data file names. The second xargs grep then scans the data files.
You might want to insert a sort -u step after the sed step.
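That is, something like this, so each data file is scanned only once even when several of its index files match:
find . -name '*.index' |
xargs grep -l "XXXX" |
sed 's/\.log\..*/.log/' |
sort -u |
xargs grep "XXXX"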
grep -l "XXXX" *.index | while read -r FOUND
do
if [ -f "${FOUND%.log*}log" ];then
grep "XXXX" "$FOUND"
fi
done