Grep numbers between colon and comma - perl

I want to grep all results which contain over 70 percent of usage
Example of output:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":1,"dir":"/oradump"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Expected View after the grep:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},

Awk is more suited here:
$ awk -F'[:,]' '$6>70' file
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},

Or with Perl:
$ perl -ne'print if /"percentage":([0-9]+),/ and $1 > 70'
(no pesky seperator counting needed)

perl -F'[:,]' -ane 'print if $F[5]>70' file

GNU sed
sed -n '/:[0]\?70,/d;/:[0-1]\?[7-9][0-9],/p' file

Related

split header of string

I want to reformat the lines below. Please see input example and desired output. I have been messing around with awk without finding the correct solution
Input:
>1-672762
TGAGGTAGTAGGTTGTATGGTT
>2-240457
TGAGGTAGTAGGTTGTGTGGTT
>3-130231
TAGCAGCACGTAAATATTGGCG
>4-116485
TGAGGTAGTAGGTTGTATAGTT
Output (needs to be tab separated):
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485
With perl :
$ perl -lne '/^>\d+-(\d+)/ or print "$_\t$1"' file
Output:
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485
Another approach in perl ("-" is chr(055)):
perl -wln055e's/(\S+)\s+(\S+).*/$2\t$1/s and print'
or
perl -wlp055e'BEGIN{<>}s/(\S+)\s+(\S+).*/$2\t$1/s'
$ awk -F- '/>/{x=$2;next} {print $0 "\t" x}' file
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485
This might work for you (GNU sed):
sed -r 'N;s/^[^-]*-(.*)\n(.*)/\2\t\1/' file

Remove all the characters from string after last '/'

I have the followiing input file and I need to remove all the characters from the strings that appear after the last '/'. I'll also show my expected output below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.
You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep is matching greedy. Meaning it will consume as much characters as possible, including /, until the last / in path.
Original Answer:
You can use dirname:
while read line ; do
echo dirname "$line"
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt
perl -lne 'print $1 if(/(.*)\//)' your_file
Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file

Way to get time string in mixed columns

Way to get time string in these mixed columns:
new1 new11 1.1.1.1 application id1223 831582 start 09:21:12 05/24/2013 -- --
new1 new11 1.1.1.1 application ffd1234 1085500 start -- -- 09:21:04 05/24/2013
Expected view:
09:21:12 05/24/2013
09:21:04 05/24/2013
I really think you need to show some effort. Anyway (my fault) I couldn't help trying to do it with grep:
grep -Eo '[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}'
The idea is get data with the following format NN:NN:NN NN/NN/NNNN where N is a number. [0-9]{2} stands for 2 times [0-9].
Test
$ grep -Eo '[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}' file
09:21:12 05/24/2013
09:21:04 05/24/2013
Even shorter (thanks Jaypal):
grep -Eo '([0-9]{2}:){2}[0-9]{2} ([0-9]{2}/){2}[0-9]{4}'
perl -lne 'print $1 if(/(\d+:\d+:\d+\s+\d+\/\d+\/\d+)/)' your_file
This might work for you (GNU sed):
sed -r 's|.*(..:..:.. ../../....).*|\1|' file
sed -r 's/.*start(.*)/\1/;s/-| //g' file
or
awk '{gsub(/-/,"",$0);print $8,$9}' file
Print the columns in that range that aren't "--":
perl -lane 'print "#{[grep { $_ ne q(--) } #F[7..$#F] ]}"' file

extract a substring of 11 characters from a line using sed,awk or perl

I have a file with many lines, in each line
there is either substring
whatever_blablablalsfjlsdjf;asdfjlds;f/watch?v=yPrg-JN50sw&amp,whatever_blabla
or
whatever_blablabla"/watch?v=yPrg-JN50sw&amp" class=whatever_blablablavwhate
I want to extract a substring, like the "yPrg-JN50s" above
the matching pattern is
the 11 characters after the string "/watch?="
how to extract the substring
I hope it is sed, awk in one line
if not, a pn line perl script is also ok
You can do
grep -oP '(?<=/watch\?v=).{11}'
if your grep knows Perl regex, or
sed 's/.*\/watch?v=\(.\{11\}\).*/\1/g'
$ cat file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
$
$ awk 'match($0,/\/watch\?v=/) { print substr($0,RSTART+RLENGTH,11) }' file
yPrg-JN50sw
yPrg-JN50sw
Just with the shell's parameter expansion, extract the 11 chars after "watch?v=":
while IFS= read -r line; do
tmp=${line##*watch?v=}
echo ${tmp:0:11}
done < filename
You could use sed to remove the extraneous information:
sed 's/[^=]\+=//; s/&.*$//' file
Or with awk and sensible field separators:
awk -F '[=&]' '{print $2}' file
Contents of file:
cat <<EOF > file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
EOF
Output:
yPrg-JN50sw
yPrg-JN50sw
Edit accommodating new requirements mentioned in the comments
cat <<EOF > file
<div id="" yt-grid-box "><div class="yt-lockup-thumbnail"><a href="/watch?v=0_NfNAL3Ffc" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto result-item-thumb" data-sessionlink="ved=CAMQwBs%3D&ei=CPTsy8bhqLMCFRR0fAodowXbww%3D%3D"><span class="video-thumb ux-thumb yt-thumb-default-185 "><span class="yt-thumb-clip"><span class="yt-thumb-clip-inner"><img src="//i1.ytimg.com/vi/0_NfNAL3Ffc/mqdefault.jpg" alt="Miniature" width="185" ><span class="vertical-align"></span></span></span></span><span class="video-time">5:15</span>
EOF
Use awk with sensible record separator:
awk -v RS='[=&"]' '/watch/ { getline; print }' file
Note, you should use a proper XML parser for this sort of task.
grep --perl-regexp --only-matching --regexp="(?<=/watch\\?=)([^&]{0,11})"
Assuming your lines have exactly the format you quoted, this should work.
awk '{print substr($0,10,11)}'
Edit: From the comment in another answer, I guess your lines are much longer and complicated than this, in which case something more comprehensive is needed:
gawk '{if(match($0, "/watch\\?v=(\\w+)",a)) print a[1]}'

To remove blank lines in data set

I need a one liner using sed, awk or perl to remove blank lines from my data file. The data in my file looks like this -
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca
These blanks are at random and appear anywhere in my data file. Can someone suggest a one-liner to remove these blank lines from my dataset.
it can be done in many ways.
e.g with awk:
awk '$0' yourFile
or sed:
sed '/^$/d' yourFile
or grep:
grep -v '^$' yourFile
A Perl solution. From the command line.
$ perl -i.bak -n -e'print if /\S/' INPUT_FILE
Edits the file in-place and creates a backup of the original file.
AWK Solution:
Here we loop through the input file to check if they have any field set. NF is AWK's in-built variable that is set to th number of fields. If the line is empty then NF is not set. In this one liner we test if NF is true, i.e set to a value. If it is then we print the line, which is implicit in AWK when the pattern is true.
awk 'NF' INPUT_FILE
SED Solution:
This solution is similar to the ones mentioned as the answer. As the syntax show we are not printing any lines that are blank.
sed -n '/^$/!p' INPUT_FILE
You can do:
sed -i.bak '/^$/d' file
A Perl solution:
perl -ni.old -e 'print unless /^\s*$/' file
...which create as backup copy of the original file, suffixed with '.old'
for perl it is as easier as sed,awk, or grep.
$ cat tmp/tmpfile
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca
$ perl -i -pe 's{^\s*\n$}{}' tmp/tmpfile
$ cat tmp/tmpfile
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca