Consider the following input file
* Z1 Z2 A1pre A2pre A1post A2post I1pre I2pre I1gs I2gs Eexc1 Eexc2 n1 n2 TKEpre TKEpost
* Z1: Atomic number of first fragment
* Z2: Atomic number of second fragment
* A1pre: Pre-neutron mass number of first fragment
* A2pre: Pre-neutron mass number of second fragment
* A1post: Post-neutron mass number of first fragment
* A2post: Post-neutron mass number of second fragment
* I1pre: Spin of first fragment after scission
* I2pre: Spin of second fragment after scission
* I1gs: Ground-state spin of first fragment
* I2gs: Ground-state spin of second fragment
* Eexc1: Excitation energy of first fragment [MeV]
* Eexc2: Excitation energy of second fragment [MeV]
* n1: Prompt neutrons emitted from first fragment
* n2: Prompt neutrons emitted from second fragment
* TKEpre: Pre-neutron total kinetic energy [MeV]
* TKEpost: Post-neutron total kinetic energy [MeV]
* Calculation with nominal model parameters
38 54 97 138 94 136 5.5 9.0 0.0 0.0 21.72 14.78 3 2 159.43 156.00
39 53 101 134 99 133 2.5 4.0 2.5 3.5 17.12 12.93 2 1 166.45 161.75
36 56 92 143 91 143 3.0 11.5 2.5 2.5 12.27 7.81 1 0 170.00 168.44
38 54 93 142 92 141 3.5 7.0 0.0 2.5 13.94 9.81 1 1 168.95 163.96
40 52 99 136 98 135 2.5 6.0 0.0 3.5 9.28 13.10 1 1 177.04 172.75
40 52 100 135 98 134 0.0 6.5 0.0 0.0 14.74 10.13 2 1 176.61 173.55
What I want to do is print specific columns from the lines that follow the "Calculation with nominal model parameters" line.
So far I've tried
awk '{/Calculation with nominal model parameters/;line=FNR}{for(i=0;i<=NR-19;++i){getline;print $5 "\t" $6 "\t" $16 "\t" $16*$6/($5+$6) "\t" $16*$5/($5+$6)}}{print line}' input
and my output is more or less what I want
88 144 159.10 98.7517 60.3483
87 146 164.87 103.309 61.5609
92 141 163.96 99.2204 64.7396
98 135 172.75 100.091 72.6588
98 134 173.55 100.24 73.3099
98 134 173.55 100.24 73.3099
19
As you can see, the last two lines are identical. My goal is to avoid calculating the same thing twice. To achieve that, I thought I should end the loop after NR-FNR_pattern iterations, which is why I used the line variable.
The weird thing is that even if I hardcode the pattern's FNR I don't get the desired output.
The weirdest thing is that if I replace the hardcoded FNR with the variable that holds it, i.e.
awk '{/Calculation with nominal model parameters/;line=FNR}{for(i=0;i<=NR-line;++i){getline;print $5 "\t" $6 "\t" $16 "\t" $16*$6/($5+$6) "\t" $16*$5/($5+$6)}}' input
I get the error
awk: (FILENAME=input FNR=2) fatal: division by zero attempted
Any idea how to solve this?
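What you are missing, I think, is that getline changes NR, so the bound in i<=NR-19 keeps moving while the loop runs. When getline hits end of file it leaves $0 untouched, which is why the last line is printed twice. Note also that /Calculation with nominal model parameters/ inside the braces is just an expression whose value is discarded, so line=FNR runs for every record; with NR-line as the bound, the loop already starts on record 1, getline reads the header on line 2, and $5+$6 is zero there. From the documentation: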
getline Set $0 from next input record; set NF, NR, FNR, RT.
I wouldn't use getline at all. Set a flag once the marker line has been seen and print only while the flag is set:
awk 'isdata{print $5 "\t" $6 "\t" $16 "\t" $16*$6/($5+$6) "\t" $16*$5/($5+$6)}
/Calculation with nominal model parameters/{isdata="yes"}' input
I get:
94 136 156.00 92.2435 63.7565
99 133 161.75 92.7274 69.0226
91 143 168.44 102.936 65.5044
92 141 163.96 99.2204 64.7396
98 135 172.75 100.091 72.6588
98 134 173.55 100.24 73.3099
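For comparison, here is the same flag idea as a minimal Python sketch. It is only an illustration, not part of the awk answer: the function name split_tke and the two-row sample are made up for the example, and the column positions ($5, $6, $16 in awk, so indices 4, 5, 15 here) are taken from the question.

```python
MARKER = "Calculation with nominal model parameters"

def split_tke(lines, marker=MARKER):
    """Yield (A1post, A2post, TKEpost, share1, share2) for rows after the marker."""
    seen_marker = False
    for line in lines:
        if seen_marker:
            fields = line.split()
            if len(fields) < 16:
                continue  # skip blank or short lines
            a1, a2, tke = float(fields[4]), float(fields[5]), float(fields[15])
            # Same arithmetic as the awk one-liner: $16*$6/($5+$6) and $16*$5/($5+$6)
            yield a1, a2, tke, tke * a2 / (a1 + a2), tke * a1 / (a1 + a2)
        elif marker in line:
            # The marker line itself is not printed, only the lines after it.
            seen_marker = True

# Hypothetical sample: one header line, the marker, and two data rows from the question.
sample = [
    "* TKEpost: Post-neutron total kinetic energy [MeV]",
    "* Calculation with nominal model parameters",
    "38 54 97 138 94 136 5.5 9.0 0.0 0.0 21.72 14.78 3 2 159.43 156.00",
    "39 53 101 134 99 133 2.5 4.0 2.5 3.5 17.12 12.93 2 1 166.45 161.75",
]

for row in split_tke(sample):
    print("\t".join(f"{value:g}" for value in row))
```

Each line is read exactly once, so there is no loop bound to get wrong and nothing is computed twice.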
AWK experts, I have a file as described below, and I wonder whether it can easily be converted to the form that I want:
The file contains multiple variables over one month (only one observation per day, but some days may be missing). The format for each day is the same except for the date and values. However, there are some description lines (containing words and numbers) at the end of each day, and the number of description lines varies from day to day.
KBO BTA Observations at 12Z 01 Feb 2020
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1000.0 92
925.0 765
850.0 1516
754.0 2546 13.0 9.3 78 9.85 150 2 310.2 340.6 312.0
752.0 2569 14.0 9.2 73 9.80 149 2 311.5 342.0 313.4
700.0 3173 -9.20 7.5 89 9.38 120 6 312.6 341.9 314.4
Station information and sounding indices
Station elevation: 2546.0
Lifted index: 1.83
Pres [hPa] of the Lifted Condensation Level: 693.42
1000 hPa to 500 hPa thickness: 5798.00
Precipitable water [mm] for entire sounding: 21.64
8022 KBO BTA Observations at 00Z 02 Feb 2020
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1000.0 97
925.0 758
850.0 1515
753.0 2546 10.8 6.8 76 8.30 190 3 307.9 333.4 309.5
750.0 2580 12.6 7.9 73 8.99 186 3 310.2 338.1 311.9
Here is what I want: remove all the description lines and read the date/time information and put it as the first column.
Time PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
20200201t12Z 754.0 2546 13.0 9.3 78 9.85 150 2 310.2 340.6 312.0
20200201t12Z 752.0 2569 14.0 9.2 73 9.80 149 2 311.5 342.0 313.4
20200201t12Z 700.0 3173 -9.2 7.5 89 9.38 120 6 312.6 341.9 314.4
20200202t00Z 753.0 2546 10.8 6.8 76 8.30 190 3 307.9 333.4 309.5
20200202t00Z 750.0 2580 12.6 7.9 73 8.99 186 3 310.2 338.1 311.9
Any help is appreciated.
Kelly
something like this...
$ awk 'function m(x)
{return sprintf("%02d",int(index("JanFebMarAprMayJunJulAugSepOctNovDec",x)-1)/3+1)}
NR==1 {print "time PRES TEMP WDIR WSPD RELH"}
/^-+$/ {f=!f}
f {date=p[n] m(p[n-1]) p[n-2]}
!f {n=split($0,p)}
NF==11 && !/[^ 0-9.-]/ {print date,$0}' file | column -t
time PRES TEMP WDIR WSPD RELH
20200201 1000 10 230 5 90
20200201 900 9 200 6 85
20200201 800 9 100 6 87
20200202 1000 9.2 233 5 90
20200202 900 9.1 200 4 80
20200202 800 9 176 2 80
Explanation
The function returns the month number for a month abbreviation by looking up its index in the string of month names and formatting the result as a two-digit number.
f keeps track of the dashed lines, so that the date can be parsed from the line preceding the first one.
Finally, the heuristic for finding the data lines is the number of fields plus the absence of any character other than digits, spaces, dots or minus signs.
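The month lookup described above can be sketched in Python as follows. This is only an illustration under assumptions: the helper names month_number and timestamp are invented here, the title line is assumed to end in "<hour>Z <day> <month> <year>" as in the question, and the "t" separator follows the asker's desired output.

```python
MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec"

def month_number(abbrev):
    """Return 1..12 for a three-letter English month abbreviation."""
    pos = MONTHS.find(abbrev)  # 0-based here; awk's index() is 1-based
    if len(abbrev) != 3 or pos < 0 or pos % 3 != 0:
        raise ValueError(f"not a month abbreviation: {abbrev!r}")
    return pos // 3 + 1

def timestamp(title_line):
    """Build e.g. '20200201t12Z' from '... Observations at 12Z 01 Feb 2020'."""
    hour, day, month, year = title_line.split()[-4:]
    return f"{int(year):04d}{month_number(month):02d}{int(day):02d}t{hour}"

print(timestamp("KBO BTA Observations at 12Z 01 Feb 2020"))
```

The extra check on pos % 3 rejects strings such as "anF" that happen to occur inside the lookup string but are not month names.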
$ cat tst.awk
/^-+$/ && ( ((++dashCnt) % 2) == 1 ) {
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",p[n-1])+2)/3
time = sprintf("%04d%02d%02d", p[n], mthNr, p[n-2])
}
/^[[:upper:][:space:]]+$/ && !doneHdr++ { print "Time", $0 }
/^[0-9.[:space:]]+$/ { print time, $0 }
{ n = split($0,p) }
$ awk -f tst.awk file | column -t
Time PRES TEMP WDIR WSPD RELH
20200001 1000 10 230 5 90
20200001 900 9 200 6 85
20200001 800 9 100 6 87
20200002 1000 9.2 233 5 90
20200002 900 9.1 200 4 80
20200002 800 9 176 2 80
I am joining 3 tables to get the retention rate. Here is my query:
select first_visit.first_month as first_month,
new_users.new_users as new_users,
count(distinct visit_tracker.customer__id) as retained,
cast(count(distinct visit_tracker.customer__id) / new_users.new_users as float) as retention_percent
from first_visit
left join visit_tracker
on visit_tracker.customer__id=first_visit.customer__id
left join new_users
on new_users.first_month=first_visit.first_month
group by 1,2;
I get the following output:
first_month new_users retained retention_percent
0 93 34 0
1 119 42 0
2 188 102 0
3 223 71 0
and so on
What I want is this:
first_month new_users retained retention_percent
0 93 34 0.37
1 119 42 0.35
2 188 102 0.54
3 223 71 0.32
I am not sure why it's not producing the results I want. Any inputs?
This looks like a classic case of an integer division problem.
In this case count(distinct visit_tracker.customer__id) returns an integer, which is then divided by new_users.new_users. It looks like the division is carried out on integers, so its result is an integer as well. Because the expected answer is less than one, it truncates to zero. The cast(... as float) in your query does not help, because it is applied to the result of the division, after the truncation has already occurred.
Try making sure both the numerator and the denominator are floats before performing the division, for example cast(count(distinct visit_tracker.customer__id) as float) / new_users.new_users, or multiply by 100 beforehand, as this Stack Overflow answer suggests.
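The effect is easy to reproduce outside SQL. The following Python sketch is only an illustration of the arithmetic, using the first four rows from the question; the function name retention is made up, and Python's // operator stands in for the integer division that the database appears to perform.

```python
# (first_month, new_users, retained) taken from the output in the question
rows = [(0, 93, 34), (1, 119, 42), (2, 188, 102), (3, 223, 71)]

def retention(retained, new_users, as_float):
    """Divide retained by new_users, either as integers or as floats."""
    if as_float:
        # Convert before dividing, like cast(... as float) / new_users
        return round(float(retained) / new_users, 2)
    # Integer division: the fractional part is dropped before any later cast
    return retained // new_users

for month, new_users, retained in rows:
    print(month, retention(retained, new_users, False), retention(retained, new_users, True))
```

Converting the integer result to a float afterwards would only turn 0 into 0.0, which is what the cast around the whole division does in the original query.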
I have a file that looks like this (with real data and much bigger):
A B C D E F G H I
1 105.28 1 22 84 2 10.55 21 2
2 357.01 0 32 34 1 11.43 28 1
3 150.23 3 78 22 0 12.02 11 0
4 357.01 0 32 34 1 11.43 28 1
5 357.01 0 32 34 1 11.43 28 1
6 357.01 0 32 34 1 11.43 28 1
...
17000 357.01 0 32 34 1 11.43 28 1
I want to import all the numerical values into a matrix, skipping the header line. For that purpose I use this code:
Filename = 'test.txt';
A = dlmread(Filename,' ',1,0); %Imports the whole data into a matrix
The problem is that A comes out as a 17000-by-1 vector instead of a matrix with several columns. If I manually edit the data file to remove the header line and run this, it works:
A = dlmread(Filename); %Imports the whole data into a matrix
But I would prefer not to do this, since the header line is used later on in the code. Any advice on how to get this to work?
edit: solved by using
' '
instead of just
' '
Use the import tool.
Make sure you choose the data.
Generate script.
Preallocation of a struct in MATLAB is a problem.
Please see the following code in the MATLAB profiler:
time calls line
2 65 sizeofTLS= 10000;
< 0.01 2 66 LaserS(sizeofTLS).POI(n)={0};
0.03 2 67 LaserS(sizeofTLS).dis(n)={0};
0.04 2 68 LaserS(sizeofTLS).plane(n)={0};
69
70
< 0.01 2 71 for it=1:sizeofTLS
16.74 2823212 72 LaserS(it).POI(1:n)={0};
16.91 2823212 73 LaserS(it).dis(1:n)={0};
16.88 2823212 74 LaserS(it).plane(1:n)={0};
1.04 2823212 75 end
How can I improve lines 72, 73 and 74?
The best way to preallocate structs is with the following syntax:
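myEmptyCell = num2cell( zeros(1,n) );   % 1-by-n cell array of zeros
% Wrap each cell array in {} so that struct() stores it as the field value
% instead of expanding it into a 1-by-n struct array.
b = repmat( struct('POI', {myEmptyCell} ,...
 'dis', {myEmptyCell},...
 'plane', {myEmptyCell} ) , sizeofTLS, 1 );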
This is ~10x faster than not preallocating on my laptop.
I'm fairly sure there should be an elegant solution to this (in MATLAB), but I just can't think of it right now.
I have a list with [classIndex, start, end], and I want to collapse consecutive class indices into one group like so:
This
1 1 40
2 46 53
2 55 55
2 57 64
2 67 67
3 68 91
1 94 107
Should turn into this
1 1 40
2 46 67
3 68 91
1 94 107
How do I do that?
EDIT
Never mind, I think I got it - it's almost like fmarc's solution, but gets the indices right
a=[ 1 1 40
2 46 53
2 55 55
2 57 64
2 67 67
3 68 91
1 94 107];
d = diff(a(:,1));
startIdx = logical([1;d]);
endIdx = logical([d;1]);
b = [a(startIdx,1),a(startIdx,2),a(endIdx,3)];
Here is one solution:
Ad = find([1; diff(A(:,1))]~=0);
output = A(Ad,:);
output(:,3) = A([Ad(2:end)-1; Ad(end)],3);
clear Ad
One way to do it if the column in question is numeric:
Build the differences along the id-column. Consecutive identical items will have zero here:
diffind = diff(a(:,1)');
Use that to index your array, using logical indexing.
b = a([true [diffind~=0]],:);
Since the first item is always included and the difference vector starts with the difference from first to second element, we need to prepend one true value to the list.
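As a cross-check for the MATLAB answers above, the expected result can be reproduced with a short Python sketch. It is not a MATLAB solution, just an illustration of the grouping: the function name collapse_runs is invented, and it uses itertools.groupby, which groups consecutive equal keys only, so the two separate runs of class 1 stay separate.

```python
from itertools import groupby

def collapse_runs(rows):
    """Merge consecutive rows with the same class index into (class, first start, last end)."""
    collapsed = []
    for class_index, run in groupby(rows, key=lambda row: row[0]):
        run = list(run)
        collapsed.append((class_index, run[0][1], run[-1][2]))
    return collapsed

a = [
    (1, 1, 40),
    (2, 46, 53),
    (2, 55, 55),
    (2, 57, 64),
    (2, 67, 67),
    (3, 68, 91),
    (1, 94, 107),
]

for row in collapse_runs(a):
    print(*row)
```

Taking the start from the first row of each run and the end from the last row corresponds to the startIdx and endIdx masks in the asker's own solution.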