Multiple mathematical operations on a file containing numbers

Multiple mathematical operations on a file containing numbers - sed

I have extracted the following data using 'grep' & 'sed' pipes from a file and now I want to perform a mathematical equation on the last two numbers, delete them and replace them with a single number.
Mathematical operations
Add the numbers together
divide by 2
multiply by 141
ROUNDUP to whole number
File Data
AJ29 IO_0_VRN_10 77.234 78.011
AJ30 IO_L1P_T0_100M 89.886 90.789
AJ31 IO_L1N_T0_100S 101.388 102.406
AK29 IO_L2P_T0_101M 66.163 66.828
AL29 IO_L2N_T0_101S 63.626 64.266
So the line starting AJ29 should appear as:
AJ29 IO_0_VRN_10 10945
I could put it in MS excel / Open Office calc and do this but want to avoid MS and keep it in a single linux script if it is possible. Hope you can help. The script I have so far is below and ideally I'd like to add a few more pipes to achieve this.
grep IOB xc7vx690tffg1930.pkg | sed 's/pin//g' | sed 's/IOB_[A-Za-z0-9]*//g' | sed 's/ /-/g' | sed 's/\t//g' | sed 's/^[-]*//g' | sed 's/-/ /g' | sed 's/ [0-9][0-9] //g' | sed 's/[[:space:]]\+/,/g' | sed 's/,X[0-9A-Z]*,//g' | sed 's/,[0-9]*[A-Z],//g' | sed 's/N\.A\.,/,/g' | sed 's/,$//g' | sed 's/,/ /g'

For calculations, use awk!
$ awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 141); NF--}1' file
AJ29 IO_0_VRN_10 10945
AJ30 IO_L1P_T0_100M 12738
AJ31 IO_L1N_T0_100S 14367
AK29 IO_L2P_T0_101M 9376
AL29 IO_L2N_T0_101S 9016
This replaces the penultimate field with the result of (penultimate*last)/2 * 141). To make it round, we use %.0f format as indicated in Awk printf number in width and round it up.
Also, it looks to me that you are piping way too many things: I counted one call to grep and 13 (!) to sed. You can probably use sed -e 'first block' -e 'second block' ... instead.
Explanation
In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
{...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
$(NF-1) + $NF)/2 * 141 perform the calculation: `(penultimate + last) / 2 * 141
{$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
{...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".

Related

How to substitute with basic regex with alternating signs?

I want to do the following to all of the statements in the file:
Input: xblahxxblahxxblahblahx
Output: <blah><blah><blahblah>
So far I am thinking of using sed -i 's/x/</g' something.ucli

You can use
sed 's/x\([^x]*\)x/<\1>/g'
Details:
x - an x
\([^x]*\) - Group 1 (\1 refers to this group value from the replacement pattern): zero or more (*) chars other than x ([^x])
x - an x
See the online demo:
#!/bin/bash
s='xblahxxblahxxblahblahx'
sed 's/x\([^x]*\)x/<\1>/g' <<< "$s"
# => <blah><blah><blahblah>
If x is a multichar string, e.g.xyz, it will be easier with perl:
perl -pe 's/xyz(.*?)xyz/<$1>/g'
See this online demo.

Understanding ibase and obase used

I'm trying to solve the following exercise:
Write a command line that takes numbers from variables FT_NBR1, in ’\"?! base, and FT_NBR2, in mrdoc base, and displays the sum of both in gtaio luSnemf base.
I know the solution is:
echo $FT_NBR1 + $FT_NBR2 | sed 's/\\/1/g' | sed 's/?/3/g' | sed 's/!/4/g' | sed "s/\'/0/g" | sed "s/\"/2/g" | tr "mrdoc" "01234" | xargs echo "ibase=5; obase=23;" | bc | tr "0123456789ABC" "gtaio luSnemf"
I don't understand why ibase=5 and obase=23.
I read about ibase and obase, and I understand this is a base conversion, from base 5 to base 23. Anyone can explain me why 5 and 23. Thank you

The exercise description is a bit weird. A better one would be
Write a command line that takes numbers from variables FT_NBR1, with numbers represented by the letters "’\"?!", and FT_NBR2, represented by "mrdoc", and displays the sum of both with numbers represented by "gtaio luSnemf".
A shorter answer would be
echo $FT_NBR1 + $FT_NBR2 | tr "\'\\\\\"\?" "01234" | tr "mrdoc" "01234" | xargs echo "ibase=5; obase=23;" | bc | tr "0123456789ABC" "gtaio luSnemf"
Let's take it from the beginning:
echo $FT_NBR1 + $FT_NBR2 creates the expression using the input strings
tr "\'\\\\\"\?" "01234" translates the first input alphabet into numbers
tr "mrdoc" "01234" translates the second input alphabet into numbers
xargs echo "ibase=5; obase=23;" prepends number base information; the input base is 5 and the output base is 13, but obase must be expressed in the base of ibase and 13 in base 5 is 23.
bc does the actual calculation
tr "0123456789ABC" "gtaio luSnemf" does the translation into the output alphabet.

Extracting values from a single file

I have a file with multiple lines; but a specific line contains tons of information, with several repeated expressions. I'm trying to extract some specific values. I first tried some commands with sed, for instance, but with no success. So, I was wondering if you could give me some insights.
So, here you have one fraction of the unique line of the given document I mentioned:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My aim here is first to extract all the values that is between the brackets, after "habitat.set.prob={". and put them in a single line in a text file.
Also, it would be important to extract the numbers that appears just before the expression "[&length_range=]", which in this case are "6" and "10". They are the label of the set of numbers after "prob={"
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always rigth before "[&length_range="; but what is before the label is not the same expression; actually it is a random number.
The goal then is end up with a file with the following characteristcs:
6 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command at least to try to extract the set of numbers, but it didn't work
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
| Well... I got some improvement.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
|
Many thanks in advance.
Cheers,
Luiz

Something like that:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
The first part '/.a./,/.b./' searches from pattern a to b, distributed over multiple lines. The -n told sed to do non-printing as default.
In '/.a./,/.b./{s/.c./.d./p; s/.e./.f./p}'
there are two substitution commands with p=print in curly braces.

I am not sure if you really digged a little, so not providing the complete answer, but let's hope this would help you:
for the first part: getting the no(which you call as label) you didn't mention if there is any specific pattern, so try this (data is the file which contains the actual input) - you need to work on how to get the number and tweak the RE a bit
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part which gives the numericals between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now, try to take this as a starter and work on your output to get your desired result!
To explain a bit:
In the first section - I am trying to capture the numericals between anything(.*) and (.*)length_range [you can escape the character [ and & by using \ in front of them]
In the second section: I am capturing pattern in between habitat.set.prob and DLOOP and then doin a tr to remove the brackets.

#include <iostream>
using namespace std;
int main()
{
string p = "1:2:3:4"; //input your string
int arr[4] = {}; //create a new empty integer array to put the integers in it
for(int i=0, j=0; i <p.length(); i++){//loop on the string to extract integers
if( p[i] == ':'){continue;}//if the value = ':' skip it and continue
arr[j]=(int)p[i]-48;j++;//put the integer in the array we created
}
cout << "String={"<<arr[0]<<" "<<arr[1]<<" "<<arr[2]<<" "<<arr[3]<<"}";//print the array
return 0;
}

Add leading 0 in sed substitution

I have input data:
foo 24
foobar 5 bar
bar foo 125
and I'd like to have output:
foo 024
foobar 005 bar
bar foo 125
So I can use this sed substitutions:
s,\([a-z ]\+\)\([0-9]\)\([a-z ]*\),\100\2\3,
s,\([a-z ]\+\)\([0-9][0-9]\)\([a-z ]*\),\10\2\3,
But, can I make one substitution, that will do the same? Something like:
if (one digit) then two leading 0
elif (two digits) then one leading 0
Regards.

I doubt that the "if - else" logic can be incorporated in one substitution command without saving the intermediate data (length of the match for instance). It doesn't mean you can't do it easily, though. For instance:
$ N=5
$ sed -r ":r;s/\b[0-9]{1,$(($N-1))}\b/0&/g;tr" infile
foo 00024
foobar 00005 bar
bar foo 00125
It uses recursion, adding one zero to all numbers that are shorter than $N digits in a loop that ends when no more substitutions can be made. The r label basically says: try to do substitution, then goto r if found something to substitute. See more on flow control in sed here.

Use two substitute commands: the first one will search for one digit and will insert two zeroes just before, and the second one will search for a number with two digits and will insert one zero just before. GNU sed is needed because I use the word boundary command to search for digits (\b).
sed -e 's/\b[0-9]\b/00&/g; s/\b[0-9]\{2\}\b/0&/g' infile
EDIT to add a test:
Content of infile:
foo 24 9
foo 645 bar 5 bar
bar foo 125
Run previous command with following output:
foo 024 009
foo 645 bar 005 bar
bar foo 125

Add the max number of leading zeros first, then take this number of characters from the end:
echo 55 | sed -e 's:^:0000000:' -e 's:0\+\(.\{8\}\)$:\1:'
00000055

You seem to have the sed options covered, here's one way with awk:
BEGIN { RS="[ \n]"; ORS=OFS="" }
/^[0-9]+$/ { $0 = sprintf("%03d", $0) }
{ print $0, RT }

I find the following sed approach to pad an integer number with zeroes to 5 (n) digits quite straighforward:
sed -e "s/\<\([0-9]\{1,4\}\)\>/0000\1/; s/\<0*\([0-9]\{5\}\)\>/\1/"
If there is at least one, at most 4 (n-1) digits, add 4 (n-1) zeroes in
front
If there is any number of zeroes followed by 5 (n) digits after the first transformation, keep just these last 5 (n) digits
When there happen to be more than 5 (n) digits, this approach behaves the usual way -- nothing is padded or trimmed.
Input:
0
1
12
123
1234
12345
123456
1234567
Output:
00000
00001
00012
00123
01234
12345
123456
1234567

This might work for you (GNU sed):
echo '1.23 12,345 1 12 123 1234 1' |
sed 's/\(^\|\s\)\([0-9]\(\s\|$\)\)/\100\2/g;s/\(^\|\s\)\([0-9][0-9]\(\s\|$\)\)/\10\2/g'
1.23 12,345 001 012 123 1234 001
or perhaps a little easier on the eye:
sed -r 's/(^|\s)([0-9](\s|$))/\100\2/g;s/(^|\s)([0-9][0-9](\s|$))/\10\2/g'

Wordnet synsets using perl

I installed Wordnet::Similarity and Wordnet::QueryData as an easy way to calculate information content score and probability that comes with these modules. But I'm stuck at this basic problem: given a word, print n words similar to it - which should not be difficult that iterating through the synsets and doing join.
using the wn command and piping it with a whole lot of tr, sort | uniq I can get all the words:
wn cat -synsn | grep -v Sense | tr '=' ' ' | tr '>' ' ' | tr '\t' ' ' | tr ',' '\n' | sort | uniq
OUTPUT
8 senses of cat
adult female
adult male
African tea
Arabian tea
big cat
bozo
cat
cat
CAT
Caterpillar
cat-o'-nine-tails
computed axial tomography
computed tomography
computerized axial tomography
computerized tomography
CT
excitant
felid
feline
gossip
gossiper
gossipmonger
guy
hombre
kat
khat
man
newsmonger
qat
quat
rumormonger
rumourmonger
stimulant
stimulant drug
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun cat
tracked vehicle
true cat
whip
woman
X-radiation
X-raying
but its kinda nasty,and needs further clean up.
What my script looks like is below, and what I want to get is all the words in cat#n1...8.
SCRIPT
use WordNet::QueryData;
my $wn = WordNet::QueryData->new( noload => 1);
print "Senses: ", join(", ", $wn->querySense("cat#n")), "\n";
print "Synset: ", join(", ", $wn->querySense("cat", "syns")), "\n";
print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
OUTPUT:
Senses: cat#n#1, cat#n#2, cat#n#3, cat#n#4, cat#n#5, cat#n#6, cat#n#7, cat#n#8
Synset: cat#n, cat#v
Hyponyms: domestic_cat#n#1, wildcat#n#3
SCRIPT
use WordNet::QueryData;
my $wn = WordNet::QueryData->new;
foreach $word (qw/cat#n/) {
#senses = $wn->querySense($word);
foreach $wps (#senses) {
#gloss = $wn -> querySense($wps, "syns");
print "$wps : #gloss\n";
}
}
OUTPUT:
cat#n#1 : cat#n#1 true_cat#n#1
cat#n#2 : guy#n#1 cat#n#2 hombre#n#1 bozo#n#2
cat#n#3 : cat#n#3
cat#n#4 : kat#n#1 khat#n#1 qat#n#1 quat#n#1 cat#n#4 Arabian_tea#n#1 African_tea#n#1
cat#n#5 : cat-o'-nine-tails#n#1 cat#n#5
cat#n#6 : Caterpillar#n#2 cat#n#6
cat#n#7 : big_cat#n#1 cat#n#7
cat#n#8 : computerized_tomography#n#1 computed_tomography#n#1 CT#n#2 computerized_axial_tomography#n#1 computed_axial_tomography#n#1 CAT#n#8
P.S.
I have never written perl before, but have been looking into perl scripts since morning - and can now understand the basic stuff. Just need to know if there is cleaner way to do this using the api docs - couldn't figure out from the api or usergroup archives.
Update:
I think I'll settle with:
wn cat -synsn | sed '1,6d' |sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d'
sed rocks!

I think you'll find the following hepful...
http://marimba.d.umn.edu/WordNet-Pairs/
What are the N most similar words to X, according to WordNet?
This data seeks to answer that question, where similarity is based on
measures from WordNet::Similarity. http://wn-similarity.sourceforge.net
-------------- verb data
These files were created with WordNet::Similarity version 2.05 using
WordNet 3.0. They show all the pairwise verb-verb similarities found
in WordNet according to the path, wup, lch, lin, res, and jcn measures.
The path, wup, and lch are path-based, while res, lin, and jcn are based
on information content.
As of March 15, 2011 pairwise measures for all verbs using the six
measures above are availble, each in their own .tar file. Each *.tar
file is named as WordNet-verb-verb-MEASURE-pairs.tar, and is approx
2.0 - 2.4 GB compressed. In each of these .tar files you will find
25,047 files, one for each verb sense. Each file consists of 25,048 lines,
where each line (except the first) contains a WordNet verb sense and the
similarity to the sense featured in that particular file. Doing
the math here, you find that each .tar file contains about 625,000,000
pairwise similarity values. Note that these are symmetric (sim (A,B)
= sim (B,A)) so you have a bit more than 300 million unique values.
-------------- noun data
As of August 19, 2011 pairwise measures for all nouns using the path
measure are available. This file is named WordNet-noun-noun-path-pairs.tar.
It is approximately 120 GB compressed. In this file you will find
146,312 files, one for each noun sense. Each file consists of
146,313 lines, where each line (except the first) contains a WordNet
noun sense and the similarity to the sense featured in that particular
file. Doing the math here, you find that each .tar file contains
about 21,000,000,000 pairwise similarity values. Note that these
are symmetric (sim (A,B) = sim (B,A)) so you have around 10 billion
unique values.
We are currently running wup, res, and lesk, but do not have an
estimated date of availability yet.

Put this is a script, say synonym.sh
wn $1 -synsn | sed '1,6d' |sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d' | sed 's/ //g' | grep -iv $1 | tr '\n' ','
wn $1 -synsv | sed '1,6d' |sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d' | sed 's/ //g' | grep -iv $1 | tr '\n' ',';echo
From your perl script
system("/path/synonym.sh","kittens");
system("/path/synonym.sh","cats");

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Multiple mathematical operations on a file containing numbers - sed

Related

How to substitute with basic regex with alternating signs?

Understanding ibase and obase used

Extracting values from a single file

Add leading 0 in sed substitution

Wordnet synsets using perl

Categories

Resources