Matlab: sum for two time series dataset? - matlab

I would like to sum the observations of two time series datasets, both of which are indexed by YEAR and DOY. I also want to skip the sum when either dataset has a 0, and to use only the maximum observation for each DOY.
Here is the example: data_1 and data_2.
data_1:
% YEAR DOY OBS_1
1994 109 0.42
1994 110 0.73
1994 111 0.69
1994 113 0.8
1994 114 0.43
1994 115 0.75
1994 123 0.6
1994 127 0.2
1994 131 0.44
1994 131 0.43
1994 131 0.63
1994 132 0.99
1994 132 0.51
1994 133 0.71
1994 133 0.99
1994 134 0.65
1994 134 0.69
1994 134 0.97
1994 134 0.03
1994 134 0
1994 134 0
1994 135 0.68
1994 135 0.72
1994 136 1.22
1994 136 0
1994 136 0
1994 136 1.28
1994 136 1.34
data_2:
% YEAR DOY OBS_2
1994 110 0.92
1994 111 0.34
1994 113 0.42
1994 114 0.37
1994 115 0.38
1994 122 0.22
1994 127 0.32
1994 131 0.34
1994 131 0.2
1994 132 0.51
1994 132 0.43
1994 132 0.4
1994 133 0.4
1994 134 0.32
1994 134 0.39
1994 135 0.35
1994 135 0.38
1994 135 0.34
1994 135 1.83
1994 135 0.22
1994 135 0.36
1994 135 0.39
1994 135 0.24
1994 135 0.39
1994 136 0.42
1994 136 0.29
1994 136 0.3
1994 136 0.4
1994 136 0.54
1994 136 0.4

Here is a first attempt:
%# maximum day of year
sz = max(max(data_1(:,2)),max(data_2(:,2)));
doy = (1:sz)';
%# get max value for each DOY in each dataset
v1 = accumarray(data_1(:,2), data_1(:,3), [sz 1], @max);
v2 = accumarray(data_2(:,2), data_2(:,3), [sz 1], @max);
%# compute the sum
v = v1 + v2;
%# keep entries where none of the values were zeros
idx = (v1~=0 & v2~=0);
v = [doy(idx(:)) v(idx(:))];
The result:
>> v
v =
110 1.65
111 1.03
113 1.22
114 0.8
115 1.13
127 0.52
131 0.97
132 1.5
133 1.39
134 1.36
135 2.55
136 1.88
I didn't take the year field into account, since it's 1994 across all your data.
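If the data ever spans more than one year, the same logic can be keyed on the (YEAR, DOY) pair instead of DOY alone. The following is a minimal sketch of that idea in Python, not part of the original answer; the function name `sum_daily_max` and the row layout `(year, doy, obs)` are assumptions made for illustration.

```python
def sum_daily_max(data_1, data_2):
    """Sum the per-day maxima of two datasets, keyed on (year, doy).

    Each dataset is a list of (year, doy, obs) rows. Days that are missing
    from either dataset, or whose maximum is 0 in either one, are skipped.
    """
    def daily_max(rows):
        best = {}
        for year, doy, obs in rows:
            key = (year, doy)
            best[key] = max(best.get(key, obs), obs)
        return best

    m1, m2 = daily_max(data_1), daily_max(data_2)
    result = []
    # Only days present in both datasets can contribute a sum.
    for key in sorted(m1.keys() & m2.keys()):
        if m1[key] != 0 and m2[key] != 0:
            # Round to hide floating-point noise in the printed sums.
            result.append((key[0], key[1], round(m1[key] + m2[key], 10)))
    return result
```

Called with the rows for DOY 131 from both tables, this pairs the maxima 0.63 and 0.34, matching the 0.97 shown in the result above.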

Related

querying table inside a table in kdb

fellow q mortals!
I am stuck on a pretty unusual problem in kdb+/q. Essentially, I have a table that has a column of tables.
Below is the main table called full_tab
time bmm $
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------$
2020.08.12D00:06:12.049002000 +`offerid`source_id`sub1`impc`question_id`offer`total_click`rpc`revenue`rpm!(789 128 3 149 111 523 1037 852f;70995 70995 70995 70995 70995 70995 70995 70995f;31 31 31 31 31 31 31 31f;1 2 21 1 0N 0N 0N 0N;956 6$
2020.08.12D00:10:48.186445000 +`offerid`source_id`sub1`impc`question_id`offer`total_click`rpc`revenue`rpm!(789 128 3 149 111 523 1037 852f;70995 70995 70995 70995 70995 70995 70995 70995f;31 31 31 31 31 31 31 31f;3 7 55 5 0N 0N 0N 0N;956 6$
2020.08.12D00:15:50.596247000 +`offerid`source_id`sub1`impc`question_id`offer`total_click`rpc`revenue`rpm!(789 128 3 149 111 523 1037 852f;70995 70995 70995 70995 70995 70995 70995 70995f;31 31 31 31 31 31 31 31f;4 10 81 5 0N 0N 0N 0N;956 $
...
Each row in the bmm column is a table that looks like this:
offerid source_id sub1 impc question_id offer total_click rpc revenue rpm
---------------------------------------------------------------------------------------------------------------------------
789 70995 31 1 956 "aaaa" 1 0 0 0
128 70995 31 2 698 "bbb" 2 0.4 0.8 400
3 70995 31 21 818 "ccc" 10 1.0575 10.575 503.5714
149 70995 31 1 941 "ddd" 1 0.4 0.4 400
111 70995 31 "eee" 10 1.057 10.575
523 70995 31 "fff" 1 0.4 0.4
1037 70995 31 "ggg" 1 0.4 0.4
852 70995 31 "hhh" 1 0.4 0.4
What I want is a final table that looks like the one below. From full_tab I am trying to extract the time column and, from the corresponding bmm row, the bmm[;`rpm] value that matches a particular bmm[;`question_id]. In the case below, question_id = 818.
time q818
---------------------------------------------
2020.08.12D00:06:12.049002000 503.5714
2020.08.12D00:10:48.186445000 510.665
2020.08.12D00:15:50.596247000 533.445
...
I tried to pull this out using the statement below:
select time, q818: first each bmm[;`rpm][;(where each bmm[;`question_id]=818)] from full_tab;
but it doesn't seem to work! :(
I think you could use something like the below:
q)getQID:{[t;qid] select time,q818:{[t;qid]exec rpm from t where question_id=qid}[;qid]'[bmm] from t}
q)getQID[full_tab;818]
time q818
-------------------------------------
2014.08.30D03:40:50.876084992 503.75
2008.06.26D08:14:03.717355744 510.665

extract the information from a matrix with three columns

I have a matrix with three columns
https://www.dropbox.com/s/jckdmg1p05v8lv7/y.mat?dl=0
i.e.
E1 E2 W
6 1464 0.36
6 1534 0.27
6 1585 0.27
8 1331 0.332
11 445 0.39
13 844 0.286
14 12 0.126
18 952 0.31
19 2376 0.32
20 394 0.22
20 399 0.22
20 589 0.22
21 321 0.22
21 1187 0.22
21 2509 0.22
22 1187 0.22
23 2235 0.22
24 2376 0.22
25 541 0.14
26 229 0.22
26 321 0.22
26 1187 0.22
26 2054 0.22
27 394 0.53
27 541 0.31
28 394 0.22
28 781 0.22
I used this condition
for k=1:size(y,1)
    G(y(k,1),y(k,2))=true;
    G(y(k,2),y(k,1))=true;
end
B=cellfun(@(x1) find(x1),num2cell(G,2),'un',0);
to extract links information like this:
1 394
2 2378
3 282
4 282
5 536
6 [1464,1534,1585]
7 2087
8 [394,399,1331]
9 1187
I need a third column that contains the weights,
i.e. {6,[1464,1534,1585],[0.36;0.27;0.27]}
I tried to use the condition above, but I did not get the right values. Does anyone have an idea how to do that?
This is a possible solution using accumarray:
a=[...
6 1464 0.36
6 1534 0.27
6 1585 0.27
8 1331 0.332
11 445 0.39
13 844 0.286
14 12 0.126
18 952 0.31
19 2376 0.32
20 394 0.22
20 399 0.22
20 589 0.22
21 321 0.22
21 1187 0.22
21 2509 0.22
22 1187 0.22
23 2235 0.22
24 2376 0.22
25 541 0.14
26 229 0.22
26 321 0.22
26 1187 0.22
26 2054 0.22
27 394 0.53
27 541 0.31
28 394 0.22
28 781 0.22];
% concatenate a with its copy, columns 1 and 2 swapped regarding symmetric relations
a = [a ; [fliplr(a(: , 1:2)) , a(: , 3) ]];
%create proper increasing indices for use in accumarray
[S SI] = sort(a(:,1));
S2=[0; (cumsum(diff(S)>0))];
idx = a(:,1);
idx(SI) = S2+1;
%gather elements for each category
c1=accumarray([idx],a(:,1),[],@(x) {x(1)});
c2=accumarray([idx],a(:,2),[],@(x) {x});
c3=accumarray([idx],a(:,3),[],@(x) {x});
%concatenate columns
out=([c1 c2 c3]);
% your example
out(1,:)
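The same grouping can be cross-checked outside MATLAB. The following is a minimal Python sketch, assuming the links are undirected as in the answer above; the function name `group_links` is hypothetical, and the result is a dictionary rather than a cell array.

```python
from collections import defaultdict

def group_links(edges):
    """Group undirected weighted edges by node.

    edges: iterable of (e1, e2, w) rows.
    Returns {node: (neighbours, weights)}, with every edge listed
    under both of its endpoints, in input order.
    """
    grouped = defaultdict(lambda: ([], []))
    for e1, e2, w in edges:
        # Record the edge once per endpoint, mirroring the symmetric copy.
        for node, other in ((e1, e2), (e2, e1)):
            neighbours, weights = grouped[node]
            neighbours.append(other)
            weights.append(w)
    return dict(grouped)
```

For the first three rows of the matrix, `group_links(rows)[6]` gives `([1464, 1534, 1585], [0.36, 0.27, 0.27])`, which is the triple requested in the question.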

Creation of a loop loading values from .txt files

I have a problem creating a loop that loads each value from ".txt" files and uses it in some calculations.
All the values are in the 2nd column, and the first one is always on the 9th line of each file.
Each ".txt" file contains a different number of values in its 2nd column (they all have the same text after the final value), so I want a loop that reads those values and stops when it reaches that text.
Here is an example of these files (the values that interest me are the ones under the G heading: 33, 55, 93, ..., 18):
Latitude: 34°40'30" North,
Longitude: 3°16'6" East
Results for: April
Inclination of plane: 32 deg.
Orientation (azimuth) of plane: 0 deg.
Time G Gd Gc DNI DNIc A Ad Ac
05:52 33 33 25 0 0 233 64 311
06:07 55 44 47 246 361 356 105 473
06:22 93 59 92 312 459 444 124 590
06:37 136 73 147 366 538 514 138 684
06:52 183 86 207 410 602 572 150 760
07:07 232 98 271 447 656 620 160 823
07:22 283 110 337 478 701 659 168 874
16:37 283 110 337 478 701 659 168 874
16:52 232 98 271 447 656 620 160 823
17:07 183 86 207 410 602 572 150 760
17:22 136 73 147 366 538 514 138 684
17:37 93 59 92 312 459 444 124 590
17:52 55 44 47 246 361 356 105 473
18:07 33 33 25 0 0 233 64 311
18:22 18 18 14 0 0 9 8 7
G: Global irradiance on a fixed plane (W/m2)
Gd: Diffuse irradiance on a fixed plane (W/m2)
Gc: Global clear-sky irradiance on a fixed plane (W/m2)
DNI: Direct normal irradiance (W/m2)
DNIc: Clear-sky direct normal irradiance (W/m2)
A: Global irradiance on 2-axis tracking plane (W/m2)
Ad: Diffuse irradiance on 2-axis tracking plane (W/m2)
Ac: Global clear-sky irradiance on 2-axis tracking plane (W/m2)
PVGIS (c) European Communities, 2001-2012
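No answer is recorded for this question. One possible approach, sketched here in Python rather than MATLAB: find the `Time G ...` header row, then collect the second whitespace-separated field of every following line until a line no longer starts with an HH:MM time stamp. The function name `read_g_column` is hypothetical, and locating the table by its header row rather than by a fixed line number is an assumption about the file layout.

```python
import re

# A data row starts with a time stamp such as "05:52" followed by whitespace.
TIME_ROW = re.compile(r"^\d{2}:\d{2}\s")

def read_g_column(lines):
    """Return the G values (2nd column) from the lines of one file."""
    values = []
    in_table = False
    for line in lines:
        line = line.strip()
        if not in_table:
            # The table starts right after the "Time G Gd ..." header row.
            in_table = line.split()[:2] == ["Time", "G"]
            continue
        if not TIME_ROW.match(line):
            if values:
                break      # first non-data line after the table, e.g. "G: Global ..."
            continue       # tolerate blank lines between header and data
        values.append(float(line.split()[1]))
    return values
```

To process several files, the function can be called in a loop, for example over the names returned by `glob.glob('*.txt')`, passing each open file object in turn.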

Saving text matrix in a directory: MATLAB

I have a matrix, say A =
11084 2009 572 277 1095 685 636 365 545 697 518 490 747 1648;
11084 2010 1000 533 340 212 635 254 399 759 110 248 490 214;
11084 2011 587 410 481 146 99 499 547 118 706 20 174 526;
12813 2009 216 486 1443 207 730 369 518 625 816 767 382 1352;
12813 2010 673 544 517 204 704 504 219 1033 633 168 473 272;
12813 2011 348 238 458 107 90 394 1014 196 1109 34 365 250;
Column 1 is the Station ID. I want to save the output in a separate directory, in a file named after the station ID. In this case, a text file named 11084.txt will be created, containing the following data:
2009 572;2009 277;2009 1095;2009 685;2009 636;2009 365;2009 545;2009 697;2009 518;2009 490;2009 747;2009 1648;2010 1000;2010 533;2010 340;2010 212;2010 635;2010 254;2010 399;2010 759;2010 110;2010 248;2010 490;2010 214;2011 587;2011 410;2011 481;2011 146;2011 99;2011 499;2011 547;2011 118;2011 706;2011 20;2011 174;2011 526;
Similarly, the next file, 12813.txt, will contain
2009 216;2009 486;2009 1443;2009 207;2009 730;2009 369;2009 518;2009 625;2009 816;2009 767;2009 382;2009 1352;2010 673;2010 544;2010 517;2010 204;2010 704;2010 504;2010 219;2010 1033;2010 633;2010 168;2010 473;2010 272;2011 348;2011 238;2011 458;2011 107;2011 90;2011 394;2011 1014;2011 196;2011 1109;2011 34;2011 365;2011 250;
Please let me know how to do so. Thanks,
A straightforward solution is just:
d = unique(A(:,1));
for i = 1:length(d)
    fid = fopen([num2str(d(i)) '.txt'],'w');
    aux = find(A(:,1)==d(i))';
    for j = aux
        for k = 3:size(A,2)
            fprintf(fid,'%d %d;', A(j,2), A(j,k));
        end
    end
    fclose(fid);
end
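The loop above writes into the current folder, while the question asks for a separate directory. A minimal Python sketch of the same output with an explicit target directory follows; the function name `write_station_files` and the `out_dir` parameter are assumptions for illustration, and `A` is taken to be a list of rows laid out as in the question.

```python
import os

def write_station_files(A, out_dir):
    """Write one '<station>.txt' file per station ID into out_dir.

    Each row of A is [station, year, v1, v2, ...]; every value is
    written as 'YEAR VALUE;' in row order. Returns the station IDs.
    """
    os.makedirs(out_dir, exist_ok=True)
    contents = {}
    for row in A:
        station, year, values = row[0], row[1], row[2:]
        contents.setdefault(station, []).extend(f"{year} {v};" for v in values)
    for station, parts in contents.items():
        with open(os.path.join(out_dir, f"{station}.txt"), "w") as fh:
            fh.write("".join(parts))
    return sorted(contents)
```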

least squares with seasonal component in matlab

I was reading a paper which looked at investigating trends in monthly wind speed data for the past 20 years or so. The paper uses a number of different statistical approaches, which I am trying to replicate here.
The first method used is a simple linear regression model of the form
$$ y(t) = a_{1}t + b_{1} $$
where $a_{1}$ and $b_{1}$ can be determined by standard least squares.
Then they specify that some of the potential error in the linear regression model can be removed explicitly by accounting for the seasonal signal by fitting a model of the form:
$$ y(t) = a_{2}t + b_{2}\sin\left(\frac{2\pi t}{12} + c_{2}\right) + d_{2}$$
where coefficients $a_{2}$, $b_{2}$, $c_{2}$, and $d_{2}$ can be determined by least squares. They then go on to specify that this model was also tested with additional harmonic components of 3, 4, and 6 months.
Using the following data as an example:
% 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
y = [112 115 145 171 196 204 242 284 315 340 360 417 % Jan
118 126 150 180 196 188 233 277 301 318 342 391 % Feb
132 141 178 193 236 235 267 317 356 362 406 419 % Mar
129 135 163 181 235 227 269 313 348 348 396 461 % Apr
121 125 172 183 229 234 270 318 355 363 420 472 % May
135 149 178 218 243 264 315 374 422 435 472 535 % Jun
148 170 199 230 264 302 364 413 465 491 548 622 % Jul
148 170 199 242 272 293 347 405 467 505 559 606 % Aug
136 158 184 209 237 259 312 355 404 404 463 508 % Sep
119 133 162 191 211 229 274 306 347 359 407 461 % Oct
104 114 146 172 180 203 237 271 305 310 362 390 % Nov
118 140 166 194 201 229 278 306 336 337 405 432 ]; % Dec
[yr, mo] = meshgrid(1949:1960, 1:12); % year and month of each entry of y
time = datestr(datenum(yr(:),mo(:),1));
jday = datenum(time,'dd-mmm-yyyy');
y2 = reshape(y,[],1);
plot(jday,y2)
Can anyone demonstrate how the model above can be written in matlab?
Notice that your model is actually linear in its coefficients; a trigonometric identity shows this. To fit it as a genuinely nonlinear model instead, use nlinfit.
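The identity in question is the angle-addition formula for the sine. Writing $\omega = 2\pi/12$ (a shorthand introduced here), the seasonal term expands as
$$ b_{2}\sin(\omega t + c_{2}) = b_{2}\cos(c_{2})\sin(\omega t) + b_{2}\sin(c_{2})\cos(\omega t) $$
so with the substitutions $\beta_{1} = b_{2}\cos c_{2}$ and $\beta_{2} = b_{2}\sin c_{2}$ (names chosen for illustration) the model becomes
$$ y(t) = a_{2}t + \beta_{1}\sin(\omega t) + \beta_{2}\cos(\omega t) + d_{2} $$
which is linear in $a_{2}$, $\beta_{1}$, $\beta_{2}$, and $d_{2}$. The original amplitude and phase can be recovered afterwards from $b_{2} = \sqrt{\beta_{1}^{2} + \beta_{2}^{2}}$ and $c_{2} = \operatorname{atan2}(\beta_{2}, \beta_{1})$.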
Using your data I wrote the following script to compute and compare the different methods:
(you can comment out the opts.RobustWgtFun = 'bisquare'; line to see that it's exactly like the linear fit with the 12 periodicity)
% y = [112 115 ...
y2 = reshape(y,[],1);
t=(1:144).';
% trend
T = [ones(size(t)) t];
B=T\y2;
y_trend = T*B;
% least squares, using linear fit and the 12 periodicity only
T = [ones(size(t)) t sin(2*pi*t/12) cos(2*pi*t/12)];
B=T\y2;
y_sincos = T*B;
% least squares, using linear fit and 3,4,6,12 periodicities
addharmonics = [3 4 6];
T = [T bsxfun(@(h,t)sin(2*pi*t/h),addharmonics,t) bsxfun(@(h,t)cos(2*pi*t/h),addharmonics,t)];
B=T\y2;
y_sincos2 = T*B;
% least squares with bisquare weights,
% using non-linear model of a linear fit and the 12 periodicity only
opts = statset('nlinfit');
opts.RobustWgtFun = 'bisquare';
b0 = [1;1;0;1];
modelfun = @(b,x) b(1)*x+b(2)*sin((b(3)+x)*2*pi/12)+b(4);
b = nlinfit(t,y2,modelfun,b0,opts);
% plot a comparison
figure
plot(t,y2,t,y_trend,t,modelfun(b,t),t,y_sincos,t,y_sincos2)
legend('Original','Trend','bisquare weight - 12 periodicity only', ...
'least square - 12 periodicity only','least square - 3,4,6,12 periodicities', ...
'Location','NorthWest');
xlim(minmax(t'));