How can I extract the time string from these mixed columns?
new1 new11 1.1.1.1 application id1223 831582 start 09:21:12 05/24/2013 -- --
new1 new11 1.1.1.1 application ffd1234 1085500 start -- -- 09:21:04 05/24/2013
Expected view:
09:21:12 05/24/2013
09:21:04 05/24/2013
I really think you need to show some effort. Anyway (my fault) I couldn't help trying to do it with grep:
grep -Eo '[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}'
The idea is to match data in the format NN:NN:NN NN/NN/NNNN, where N is a digit. [0-9]{2} means [0-9] repeated twice.
Test
$ grep -Eo '[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}' file
09:21:12 05/24/2013
09:21:04 05/24/2013
Even shorter (thanks Jaypal):
grep -Eo '([0-9]{2}:){2}[0-9]{2} ([0-9]{2}/){2}[0-9]{4}'
perl -lne 'print $1 if(/(\d+:\d+:\d+\s+\d+\/\d+\/\d+)/)' your_file
This might work for you (GNU sed):
sed -r 's|.*(..:..:.. ../../....).*|\1|' file
sed -r 's/.*start//;s/ --//g;s/^ //' file
or
awk '{gsub(/-/,"",$0);print $8,$9}' file
Print the columns in that range that aren't "--":
perl -lane 'print "@{[grep { $_ ne q(--) } @F[7..$#F] ]}"' file
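The same "skip the placeholders" idea can be sketched in awk. This is only an illustration: the function name extract_time is made up here, and it assumes the timestamp always follows seven leading columns (ending in "start"), as in the sample lines.

```shell
# extract_time is a hypothetical helper name used only for this sketch.
# Assumption: the first seven columns are fixed and "--" marks an empty slot.
extract_time() {
    awk '{
        out = ""
        for (i = 8; i <= NF; i++)          # look only at columns after "start"
            if ($i != "--")                # skip the placeholder columns
                out = (out == "" ? $i : out " " $i)
        print out
    }' "$@"
}

# Usage with the two sample lines from the question:
printf '%s\n' \
    'new1 new11 1.1.1.1 application id1223 831582 start 09:21:12 05/24/2013 -- --' \
    'new1 new11 1.1.1.1 application ffd1234 1085500 start -- -- 09:21:04 05/24/2013' |
    extract_time
```

Unlike the regex-based answers, this does not check what the remaining columns look like, so it only fits input laid out like the sample.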
I want to reformat the lines below. Please see the input example and the desired output. I have been messing around with awk without finding the correct solution.
Input:
>1-672762
TGAGGTAGTAGGTTGTATGGTT
>2-240457
TGAGGTAGTAGGTTGTGTGGTT
>3-130231
TAGCAGCACGTAAATATTGGCG
>4-116485
TGAGGTAGTAGGTTGTATAGTT
Output (needs to be tab separated):
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485
With perl :
$ perl -lne '/^>\d+-(\d+)/ or print "$_\t$1"' file
Output:
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485
Another approach in perl ("-" is chr(055)):
perl -wln055e's/(\S+)\s+(\S+).*/$2\t$1/s and print'
or
perl -wlp055e'BEGIN{<>}s/(\S+)\s+(\S+).*/$2\t$1/s'
$ awk -F- '/>/{x=$2;next} {print $0 "\t" x}' file
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485
This might work for you (GNU sed):
sed -r 'N;s/^[^-]*-(.*)\n(.*)/\2\t\1/' file
I have the following input file and I need to remove all the characters that appear after the last '/' in each string. I'll also show my expected output below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.
You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep matches greedily: .* consumes as many characters as possible, including any /, so the match runs up to and including the last / in the path.
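A small sketch to try the greedy match without a file, followed by a pure-shell variant using POSIX parameter expansion for comparison. The helper name strip_basename is made up for this example, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# strip_basename is a hypothetical helper name used only for this sketch.
strip_basename() {
    # ".*/" is greedy, so the match extends to the LAST slash on each line.
    grep -o '.*/' "$@"
}

printf '%s\n' /start/one/two/stopone.js /start/one/stopxyz.js | strip_basename

# Pure-shell variant: ${path%/*} removes the shortest suffix starting at a "/",
# i.e. the file name; the trailing slash is then added back by printf.
path=/start/one/two/three/stoptwo.js
printf '%s/\n' "${path%/*}"
```

The parameter-expansion form avoids starting a process per path, which may matter in a long loop.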
Original Answer:
You can use dirname:
while IFS= read -r line; do
    # dirname drops the trailing slash, so add it back
    printf '%s/\n' "$(dirname "$line")"
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt
perl -lne 'print $1 if(/(.*\/)/)' your_file
Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file
I want to grep all results which contain over 70 percent of usage
Example of output:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":1,"dir":"/oradump"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Expected View after the grep:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Awk is more suited here:
$ awk -F'[:,]' '$6>70' file
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Or with Perl:
$ perl -ne'print if /"percentage":([0-9]+),/ and $1 > 70'
(no pesky separator counting needed)
perl -F'[:,]' -ane 'print if $F[5]>70' file
GNU sed
sed -n '/"percentage":\(7[1-9]\|[89][0-9]\|100\),/p' file
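If the threshold should be adjustable, one possible sketch is to locate the "percentage" key by name in awk and compare numerically. The function name over_threshold and its argument order are assumptions made for this example, not part of any answer above.

```shell
# over_threshold MIN [FILE...]: print lines whose "percentage" value exceeds MIN.
# Hypothetical helper; assumes the key appears as "percentage":<digits> on each line.
over_threshold() {
    min=$1
    shift
    awk -v min="$min" '
        match($0, /"percentage":[0-9]+/) {
            # The literal "percentage": is 13 characters long; skip it to reach the digits.
            value = substr($0, RSTART + 13, RLENGTH - 13)
            if (value + 0 > min) print
        }' "$@"
}

# Usage with two of the sample lines:
printf '%s\n' \
    '{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},' \
    '{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},' |
    over_threshold 70
```

This keeps working if fields are added before "percentage", which is where separator counting would break.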
I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp, but it could be any of a number of text strings).
I tried various sed / awk solutions but I can't seem to get it working. I have the feeling this is a simple problem so, sorry about that.
This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'
requires GNU grep:
grep -Po '(?<=TYPE=)[^;]+'
meaning: preceded by "TYPE=", find some non-semicolon characters
One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt
Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp
You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo "$t"
snp
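The two expansions can be wrapped in a small function. This is a sketch only: the name get_value is made up, and prefixing the string with ";" is an extra assumption added here so that a key such as SUBTYPE= is not mistaken for TYPE=.

```shell
# get_value KEY STRING: print the value of KEY in a ";"-separated key=value string.
# Hypothetical helper; returns 1 and prints nothing when KEY is absent.
get_value() {
    key=$1
    rest=";$2"                     # leading ";" lets us match ";KEY=" exactly
    case $rest in
        *";$key="*) ;;             # key present, carry on
        *) return 1 ;;             # key absent
    esac
    rest=${rest#*";$key="}         # drop the shortest prefix ending in ";KEY="
    printf '%s\n' "${rest%%;*}"    # drop everything from the first ";" onwards
}

get_value TYPE 'red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293'
```

It uses only POSIX parameter expansion, so it should also run in shells other than bash.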
I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and starting a new line for each pair,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred
I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs:
tr -d '( \n' < in_file | tr ')' '\n' > out_file
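A quick way to check the two-stage idea on the sample line, assuming that spaces and line breaks between pairs should be discarded as well. The function name pairs_to_lines is made up for this sketch.

```shell
# pairs_to_lines is a hypothetical helper name used only for this sketch.
pairs_to_lines() {
    # First tr: delete "(", spaces and newlines.
    # Second tr: turn each ")" into a newline, so every pair ends its own line.
    tr -d '( \n' | tr ')' '\n'
}

printf '%s\n' '(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)' |
    pairs_to_lines
```

Because tr works on single characters, this only fits when no space inside a pair needs to be kept.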
As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile
This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/g;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
Guess we all know this, but just to emphasize:
Simple single-purpose utilities such as tr or grep usually take less time than awk or sed doing the same job. For instance, try not to use sed/awk where grep can suffice.
In this particular case, I created a 100000-line file in which every line contains both "(" and ")". Then I ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed
cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.