Way to get time string in mixed columns

Way to get time string in mixed columns - perl

Way to get time string in these mixed columns:
new1 new11 1.1.1.1 application id1223 831582 start 09:21:12 05/24/2013 -- --
new1 new11 1.1.1.1 application ffd1234 1085500 start -- -- 09:21:04 05/24/2013
Expected view:
09:21:12 05/24/2013
09:21:04 05/24/2013

I really think you need to show some effort. Anyway (my fault) I couldn't help trying to do it with grep:
grep -Eo '[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}'
The idea is to match data in the format NN:NN:NN NN/NN/NNNN, where N is a digit. [0-9]{2} stands for two consecutive occurrences of [0-9].
Test
$ grep -Eo '[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{4}' file
09:21:12 05/24/2013
09:21:04 05/24/2013
Even shorter (thanks Jaypal):
grep -Eo '([0-9]{2}:){2}[0-9]{2} ([0-9]{2}/){2}[0-9]{4}'
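A minimal sketch of the same pattern wrapped in a shell function, so it can be fed from a pipe or a file. The name extract_timestamps and the sample variable are hypothetical, introduced only for illustration; the rows are the two from the question.

```shell
#!/usr/bin/env bash
# extract_timestamps (hypothetical name): reads lines on stdin and prints
# every "NN:NN:NN NN/NN/NNNN" match, one per line.
extract_timestamps() {
  grep -Eo '([0-9]{2}:){2}[0-9]{2} ([0-9]{2}/){2}[0-9]{4}'
}

# Sample rows copied from the question.
sample='new1 new11 1.1.1.1 application id1223 831582 start 09:21:12 05/24/2013 -- --
new1 new11 1.1.1.1 application ffd1234 1085500 start -- -- 09:21:04 05/24/2013'

printf '%s\n' "$sample" | extract_timestamps
```

Because the pattern is matched anywhere in the line, it does not matter which columns hold the "--" placeholders.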

perl -lne 'print $1 if(/(\d+:\d+:\d+\s+\d+\/\d+\/\d+)/)' your_file

This might work for you (GNU sed):
sed -r 's|.*(..:..:.. ../../....).*|\1|' file

sed -r 's/.*start//;s/ *-- *//g;s/^ *//' file
or
awk '{gsub(/-/,"",$0);print $8,$9}' file

Print the columns in that range that aren't "--":
perl -lane 'print "@{[grep { $_ ne q(--) } @F[7..$#F] ]}"' file

Related

split header of string

I want to reformat the lines below. Please see input example and desired output. I have been messing around with awk without finding the correct solution
Input:
>1-672762
TGAGGTAGTAGGTTGTATGGTT
>2-240457
TGAGGTAGTAGGTTGTGTGGTT
>3-130231
TAGCAGCACGTAAATATTGGCG
>4-116485
TGAGGTAGTAGGTTGTATAGTT
Output (needs to be tab separated):
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

With perl :
$ perl -lne '/^>\d+-(\d+)/ or print "$_\t$1"' file
Output:
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

Another approach in perl ("-" is chr(055)):
perl -wln055e's/(\S+)\s+(\S+).*/$2\t$1/s and print'
or
perl -wlp055e'BEGIN{<>}s/(\S+)\s+(\S+).*/$2\t$1/s'

$ awk -F- '/>/{x=$2;next} {print $0 "\t" x}' file
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

This might work for you (GNU sed):
sed -r 'N;s/^[^-]*-(.*)\n(.*)/\2\t\1/' file
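The sed command above pairs each header line with the sequence line that follows it. A rough pure-bash sketch of the same pairing, assuming the input strictly alternates header and sequence lines as in the question (reformat_records is a hypothetical name):

```shell
#!/usr/bin/env bash
# reformat_records (hypothetical name): reads header/sequence line pairs
# on stdin and prints "sequence<TAB>count" for each pair.
reformat_records() {
  local header seq
  while IFS= read -r header && IFS= read -r seq; do
    # ${header#*-} drops everything up to and including the first '-',
    # turning ">1-672762" into "672762".
    printf '%s\t%s\n' "$seq" "${header#*-}"
  done
}

printf '%s\n' '>1-672762' 'TGAGGTAGTAGGTTGTATGGTT' \
              '>2-240457' 'TGAGGTAGTAGGTTGTGTGGTT' | reformat_records
```

For large files the awk or sed versions are likely to be faster; the loop is only meant to make the pairing explicit.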

Remove all the characters from string after last '/'

I have the following input file and I need to remove all the characters that appear after the last '/' in each string. I'll also show my expected output below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.

You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep matches greedily, meaning .* consumes as many characters as possible, including any /, up to the last / in the path.
Original Answer:
You can use dirname, which prints each path without its trailing slash:
while IFS= read -r line; do
dirname "$line"
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt
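For comparison, here is a sketch that does the same thing with bash parameter expansion only, without calling grep, dirname or sed. The function name strip_after_last_slash is hypothetical, and passing lines without any '/' through unchanged is an assumption (grep -o would print nothing for them).

```shell
#!/usr/bin/env bash
# strip_after_last_slash (hypothetical name): keeps everything up to and
# including the last '/' of each input line.
strip_after_last_slash() {
  local line
  while IFS= read -r line; do
    case $line in
      # ${line%/*} removes the shortest suffix that starts with '/',
      # i.e. the last '/' and what follows; the '/' is then added back.
      */*) printf '%s/\n' "${line%/*}" ;;
      # Assumption: a line with no '/' is printed unchanged.
      *)   printf '%s\n' "$line" ;;
    esac
  done
}

printf '%s\n' /start/one/two/stopone.js /start/one/stopxyz.js |
  strip_after_last_slash
```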

perl -lne 'print $1 if(/(.*\/)/)' your_file

Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file

Grep numbers between colon and comma

I want to grep all results which contain over 70 percent of usage
Example of output:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":1,"dir":"/oradump"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Expected View after the grep:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},

Awk is more suited here:
$ awk -F'[:,]' '$6>70' file
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},

Or with Perl:
$ perl -ne'print if /"percentage":([0-9]+),/ and $1 > 70'
(no pesky separator counting needed)
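The same "match the key, not the column" idea can be sketched in awk with match(), which avoids depending on the field position. The function name over_threshold and its optional threshold argument are hypothetical; the sketch assumes the key is written exactly as "percentage": followed by an integer.

```shell
#!/usr/bin/env bash
# over_threshold (hypothetical name): prints the lines whose "percentage"
# value is greater than the limit given as $1 (default 70).
over_threshold() {
  awk -v limit="${1:-70}" '
    match($0, /"percentage":[0-9]+/) {
      # The prefix "percentage": is 13 characters long (quotes and colon
      # included), so the digits start at RSTART + 13.
      value = substr($0, RSTART + 13, RLENGTH - 13)
      if (value + 0 > limit + 0) print
    }'
}

printf '%s\n' \
  '{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},' \
  '{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},' |
  over_threshold 70
```

Adding 0 to both values forces a numeric comparison, so 9 is not treated as greater than 70 the way it would be in a string comparison.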

perl -F'[:,]' -ane 'print if $F[5]>70' file

GNU sed
sed -n '/:[0]\?70,/d;/:[0-1]\?[7-9][0-9],/p' file

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon

I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp, but it could be any of a number of text strings).
I tried various sed / awk solutions but I can't seem to get it working. I have the feeling this is a simple problem so, sorry about that.

This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'

requires GNU grep:
grep -Po '(?<=TYPE=)[^;]+'
meaning: preceded by "TYPE=", find some non-semicolon characters

One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt

Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp

You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo $t
snp
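The two expansions above can be wrapped in a small function that takes the key as an argument. This is only a sketch: get_value is a hypothetical name, and it assumes the input is a clean ;-separated list of KEY=value pairs, as in this answer. Anchoring the match on the preceding ; keeps a key such as SUBTYPE from being mistaken for TYPE.

```shell
#!/usr/bin/env bash
# get_value (hypothetical name): prints the value of KEY ($1) found in a
# ';'-separated list of KEY=value pairs ($2); returns 1 if KEY is absent.
get_value() {
  local key=$1 rest=";$2"        # leading ';' lets the first pair match too
  case $rest in
    *";$key="*) ;;               # key present, continue
    *) return 1 ;;               # key not present
  esac
  rest=${rest#*";$key="}         # drop everything up to and including ";KEY="
  printf '%s\n' "${rest%%;*}"    # drop from the next ';' to the end
}

get_value TYPE 'red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293'
```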

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on its own line
so that it look like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred

I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs:
cat in_file | tr -d '(' | tr ')' '\n' > out_file
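The two-tr pipeline leaves the space between pairs at the start of the following line, plus an empty line after the last pair. A possible variant that tidies both, assuming spaces occur only between pairs and never inside them (pairs_per_line is a hypothetical name):

```shell
#!/usr/bin/env bash
# pairs_per_line (hypothetical name): prints one "x,y" pair per line.
pairs_per_line() {
  tr -d '( ' |       # delete opening parentheses and spaces
    tr ')' '\n' |    # each closing parenthesis ends a pair
    sed '/^$/d'      # drop the empty lines left after a final ")"
}

printf '%s\n' '(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)' |
  pairs_per_line
```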

As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile

This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02

This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02

Guess we all know this, but just to emphasize:
A simpler, single-purpose tool usually runs at least as fast as awk or sed doing the same job. For instance, try not to use sed/awk where grep or tr can suffice.
In this particular case, I created a file 100000 lines long, each line containing the characters "(" and ")". Then I ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed

cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.

We Keep Coding
