Efficiently extract multiple log values using sed

I almost have working code for what I need; I just don't quite know all the syntax.
I apologize for my previous, hard-to-understand post, so I am rewriting my request here, hopefully more clearly.
I have a message log of several hundred lines. In this log, there are two lines that I am concerned with extracting data from.
The two lines are:
2357: 11-Feb-2019 09:51:22 (low) [] 1369 floating point MIPS (Whetstone) per CPU
2358: 11-Feb-2019 09:51:22 (low) [] 5388 integer MIPS (Dhrystone) per CPU
I am extracting the values 1369 & 5388 from those two lines. The code I have created is:
proc=( $(boinccmd --get_messages | sed -n 's/\s*integer MIPS (Dhrystone) per CPU//p' | awk -F\ '{print $6}') )
printf "%s"${proc}"\n"
proc=( $(boinccmd --get_messages | sed -n 's/\s*floating point MIPS (Whetstone) per CPU//p' | awk -F\ '{print $6}') )
printf "%s"${proc}"\n"
But this sends sed out on a double fishing trip.
Is there a way I can make this more efficient by either using a different process or by having sed double-up and look for two things at the same time?
Thanks.

How about awk:
$ awk '{print $6}' file
1369
5388
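If you are reading straight from boinccmd instead of a pre-filtered file, the same idea works in one pass by anchoring on the two phrases (a sketch, assuming the message format shown above):
boinccmd --get_messages | awk '/floating point MIPS \(Whetstone\) per CPU|integer MIPS \(Dhrystone\) per CPU/ { print $6 }'
This prints both values with a single invocation, Whetstone first, matching the order of the messages in the log.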

If Procedural Text Edit is an option:
forEach line {
    ifElse (or
            (contains cs "floating point MIPS (Whetstone) per CPU")
            (contains cs "integer MIPS (Dhrystone) per CPU")) {
        select (afterN char 4) { remove }
    } {
        remove
    }
}

Related

Using invert range in sed

I have this:
$ cat f2
123-foo-456
abc-xx
foo-yy
ddd-ao
abc
6778
123
This gives me: (#1)
$ sed -n -e '/456/,/ddd/{/ddd/{!s/a/A/g;!s/o/Q/g};p}' f2
123-foo-456
abc-xx
foo-yy
ddd-ao
And this gives me: (#2)
$ sed -n -e '/456/,/ddd/{/ddd/!{s/a/A/g;s/o/Q/g};p}' f2
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
I prefer #2 since it produces the output I wanted.
Can someone explain the difference between the two?
And point me to a good source of documentation that explains the difference?
/ddd/{!s/a/A/g;!s/o/Q/g}
When ddd is on the line (in the pattern space), execute the sub-block { ... }.
Inside it, ! negates an empty address; an empty address means every line, so the negated form matches no lines, and the substitutions (s/a/A/g, ...) never run.
So it does nothing.
/ddd/!{s/a/A/g;s/o/Q/g}
When ddd is NOT on the line (the ! here negates the address /ddd/), execute the sub-block { ... }: the substitutions run.
It changes a to A (and o to Q) on every line that does not contain ddd.
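A minimal demonstration of the negation (works in any POSIX sed):
$ printf 'abc\nddd\n' | sed '/ddd/!s/a/A/'
Abc
ddd
As for documentation, the section on addresses in the GNU sed manual (https://www.gnu.org/software/sed/manual/sed.html) covers ! negation.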
There is no noteworthy difference between the 2. They are both unintelligible sequences of random characters that became obsolete in the mid-1970s when awk was invented and so should never be used. sed is for simple substitution on individual lines, that is all. If you're using more than s, g, and p (with -n) then you're using the wrong tool. Stop wasting your time on this and just use awk:
$ cat tst.awk
/456/ { f=1 }
f {
    if (/ddd/) {
        f=0
    }
    else {
        gsub(/a/,"A")
        gsub(/o/,"Q")
    }
    print
}
$ awk -f tst.awk file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
Clear, simple, concise, robust, efficient, portable and better in every other way than an equivalent sed solution.
Or if having everything squeezed onto one line is appealing to you:
$ awk '/456/{f=1}f{if(/ddd/)f=0;else{gsub(/a/,"A");gsub(/o/,"Q")}print}' file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
You COULD write the awk script in the same style as the sed script:
$ awk '/456/,/ddd/{if(!/ddd/){gsub(/a/,"A");gsub(/o/,"Q")}print}' file
123-fQQ-456
Abc-xx
fQQ-yy
ddd-ao
but then you get the duplicated condition (/ddd/ twice) that comes with using range expressions, which is one reason they should never be used. Fortunately, unlike sed, awk has variables, so you never need to write range expressions.

How to assign a number to a repeating pattern

I am doing some calculations using Gaussian. From the Gaussian output file, I need to extract the input structure information. The output file contains more than 800 structure coordinates. What I have done so far is collect all the input coordinates using a combination of grep, awk, and sed commands, like so:
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"}1' | sed '/--/d' > test.out
This let me grab all the input coordinates and insert a line reading "structure number". So now I have a file that contains a regularly repeating pattern, like the following:
structure number
4.176801 -0.044096 2.253823
2.994556 0.097622 2.356678
5.060174 -0.115257 3.342200
structure number
4.180919 -0.044664 2.251182
3.002927 0.098946 2.359346
5.037811 -0.103410 3.389953
Here, "Structure number" is being repeated. I want to write a number like "structure number:1", "structure number 2" in increasing order.
How can I solve this problem?
Thanks in advance for your help.
I am not familiar with the program called Gaussian, so I have no clue what the original input looks like. If someone posts an example I might be able to give an even shorter solution.
However, as far as I understand it, the OP is content with the output of his/her code, except that he/she wants to append an increasing number to the lines inserted with awk.
This can be achieved with the following line (adjusting the OP's code):
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"++i}1' | sed '/--/d' > test.out
Addendum:
Even without knowing the actual input, I am sure that one can at least get rid of the sed command, leaving that piece of work to awk. Also, there is no need to quote a single-character grep pattern:
grep -A 7 "Input orientation:" test.log | grep -A 5 C | awk '/C/{print "structure number"++i}!/--/' > test.out
I am not sure since I cannot test, but it should be possible to let awk do grep's work, too. As a first guess I would try the following:
awk '/Input orientation:/{li=7}!li{next}{--li}/C/{print "structure number"++i;lc=5}!lc{next}{--lc}!/--/' test.log > test.out
While this is a little longer in code, it is an awk-only solution doing all the work in one process. If I had input to test with, I might come up with a shorter solution.
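Laid out across several lines with comments, that last guess reads as follows (same untested assumptions as above):
awk '
/Input orientation:/ { li = 7 }    # arm a 7-line window after each header
!li { next }                       # outside the window: skip the line
{ --li }                           # consume one line of the window
/C/ { print "structure number"++i; lc = 5 }    # label the block, arm a 5-line window
!lc { next }
{ --lc }
!/--/                              # print lines that are not -- separators
' test.log > test.out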

How to combine three consecutive lines of a text file in sed?

I have a file that consists of a repeating sequence of three lines that I want to merge together. In other words, I'd like to replace every newline, except each third one, with a space. E.g. I'd like to transform the input
href="file:///home/adam/MyDocs/some_file.pdf"
visited="2013-06-02T20:40:06Z"
exec="'firefox %u'"
href="file:///home/adam/Desktop/FreeRDP-WebConnect-1.0.0.167-Setup.exe"
visited="2013-06-03T08:50:37Z"
exec="'firefox %u'"
href="file:///home/adam/Friends/contact.txt"
visited="2013-06-03T16:01:16Z"
exec="'gedit %u'"
href="file:///home/adam/Pictures/Screenshot%20from%202013-06-03%2019:10:36.png"
visited="2013-06-03T17:10:36Z"
exec="'eog %u'"
into
href="file:///home/adam/MyDocs/some_file.pdf" visited="2013-06-02T20:40:06Z" exec="'firefox %u'"
href="file:///home/adam/Desktop/FreeRDP-WebConnect-1.0.0.167-Setup.exe" visited="2013-06-03T08:50:37Z" exec="'firefox %u'"
href="file:///home/adam/Friends/contact.txt" visited="2013-06-03T16:01:16Z" exec="'gedit %u'"
href="file:///home/adam/Pictures/Screenshot%20from%202013-06-03%2019:10:36.png" visited="2013-06-03T17:10:36Z" exec="'eog %u'"
Unfortunately the file is rather long, so I'd prefer not to load the whole file into memory and not to write the result back into a file - just print the concatenated lines to standard output so I can pipe them further.
I know that sed can potentially do this, but after giving it an honest try I am still at square one; the learning curve is just too steep for me. :-(
I did some rough benchmarking and found that the sed variant is noticeably faster (about 1.4x by wall-clock time):
time awk '{ printf "%s", $0; if (NR % 3 == 0) print ""; else printf " " }' out.txt >/dev/null
real 0m1.893s
user 0m1.860s
sys 0m0.028s
and
time cat out.txt | sed 'N;N;s/\n/ /g' > /dev/null
real 0m1.360s
user 0m1.264s
sys 0m0.236s
It is interesting: why does sed require more kernel time than awk?
out.txt is 200 MB and the processor is an Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz on Linux Mint 14 with kernel 3.8.13-030813-generic.
I need this in my effort to parse recently-used.xbel, the list of recently opened files in Cinnamon.
If you came here for this specific problem, this line should help you:
xpath -q -e "//bookmark[*]/@href | //bookmark[*]/@visited | //bookmark[*]/info/metadata/bookmark:applications[1]/bookmark:application[1]/@exec" recently-used.xbel | sed 's/href="\(.*\)"/"\1"/;N;s/visited="\(.*\)"/\1/;N;s/exec="\(.*\)"/"\1"/;s/\n/ /g' | xargs -n3 whatever-script-you-write
How about this:
sed 'N;N;s/\n/ /g' file
Each N appends the next input line to the pattern space, so the script collects lines in groups of three and the s/\n/ /g replaces both embedded newlines with spaces.
You can use awk to do this pretty easily:
awk '{ printf "%s", $0; if (NR % 3 == 0) print ""; else printf " " }' file
The basic idea is: print each line followed by a space, unless it's every third line, in which case print a newline.
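If it is available, the standard paste utility also deserves a mention here: reading standard input three times merges every three lines, it streams rather than loading the file into memory, and an input whose length is not a multiple of three simply produces a final line with trailing delimiters:
paste -d' ' - - - < file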

I want to print a text file in columns

I have a text file which looks something like this:
jdkjf
kjsdh
jksfs
lksfj
gkfdj
gdfjg
lkjsd
hsfda
gadfl
dfgad
[very many lines, that is]
but would rather like it to look like
jdkjf kjsdh
jksfs lksfj
gkfdj gdfjg
lkjsd hsfda
gadfl dfgad
[and so on]
so I can print the text file on a smaller number of pages.
Of course, this is not a difficult problem, but I'm wondering if there is some excellent tool out there for solving problems like these.
EDIT: I'm not looking for a way to remove every other newline from a text file, but rather a tool which interprets text as "pictures" and then lays these out on the page nicely (by writing the appropriate whitespace symbols).
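One standard tool for exactly this kind of page layout is pr. For the across-the-page arrangement shown in the example, something like this should work:
pr -2 -a -t file.txt
Here -2 requests two columns, -a fills the columns across the page (line 1 next to line 2, and so on), and -t suppresses the page headers; drop the -a to fill each column down the page instead.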
You can use this Python code:
tables = input("Enter number of tables ")
matrix = []
file = open("test.txt")
for line in file:
    matrix.append(line.replace("\n", ""))
    if len(matrix) == int(tables):
        print(" ".join(matrix))  # print each group as one space-separated line
        matrix = []
file.close()
(Since you don't name your operating system, I'll simply assume Linux, Mac OS X or some other Unix...)
Your example looks like it can also be described by the expression "joining 2 lines together".
This can be achieved in a shell (with the help of xargs and awk) -- but only for an input file that is structured like your example (the result always puts 2 words on a line, irrespective of how many words each one contains):
cat file.txt | xargs -n 2 | awk '{ print $1" "$2 }'
This can also be achieved with awk alone (this time it really joins 2 full lines, irrespective of how many words each one contains):
awk '{printf $0 " "; getline; print $0}' file.txt
Or use sed --
sed 'N;s#\n# #' < file.txt
Also, xargs could do it:
xargs -L 2 < file.txt
I'm sure other people could come up with dozens of other, quite different methods and commandline combinations...
Caveat: you'll have to handle files with an odd number of lines explicitly; the last input line may not be processed correctly otherwise.
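A quick demonstration of that caveat with the getline variant (three input lines):
$ printf 'a\nb\nc\n' | awk '{printf $0 " "; getline; print $0}'
a b
c c
When getline fails at end of input it leaves $0 unchanged, so the final line is printed twice.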

Bash alias for ls that prints multiple columns by "type"

I'm listing just the file basenames with an ls command like this:
ls --color -1 . | tr '\n' '\0' | xargs -0 -n 1 basename
I would like to list all the directories in the first column, all the executables in the next, all the regular files last (perhaps also with a column for each extension).
So the first (and main) "challenge" is to print multiple columns of different lengths.
Do you have any suggestions what commands I should be using to write that script? Should I switch to find? Or should I just write the script all in Perl?
I want to be able to optionally sort the columns by size too ;-) I'm not necessarily looking for a script to do the above, but perhaps some advice on ways to approach writing such a script.
#!/bin/bash
width=20
awk -F':' '
# file(1) output is "name: description"; with -F":" the name is $1
/directory/ {
    d[i++]=$1
    next
}
/executable/ {
    e[j++]=$1
    next
}
{
    f[k++]=$1
}
END {
    a[1]=i; a[2]=j; a[3]=k
    asort(a)    # GNU awk extension; a[3] now holds the largest count
    printf("%-*.*s | \t%-*.*s | \t%-*.*s\n", w,w,"Directories", w,w,"Executables", w,w,"Files")
    print "------------------------------------------------------------------------"
    for (i=0; i<a[3]; i++)
        printf("%-*.*s |\t%-*.*s |\t%-*.*s\n", w,w,d[i], w,w,e[i], w,w,f[i])
}' w=$width < <(find . -exec file {} +)
This can be further improved by calculating the longest entry per column and using that as the width; I'll leave the details as an exercise for the reader, but a sketch follows.
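A minimal sketch of that improvement (keeping the same structure; the width w is now computed from the data, so the w=$width assignment is dropped):
awk -F':' '
{ if (length($1) > w) w = length($1) }    # track the longest entry seen
/directory/ { d[i++]=$1; next }
/executable/ { e[j++]=$1; next }
{ f[k++]=$1 }
END {
    if (w < 11) w = 11    # never truncate the column headers
    a[1]=i; a[2]=j; a[3]=k
    asort(a)
    printf("%-*.*s | %-*.*s | %-*.*s\n", w,w,"Directories", w,w,"Executables", w,w,"Files")
    for (i=0; i<a[3]; i++)
        printf("%-*.*s | %-*.*s | %-*.*s\n", w,w,d[i], w,w,e[i], w,w,f[i])
}' < <(find . -exec file {} +)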