Saving text matrix in a directory: MATLAB

I have a matrix, say A =
11084 2009 572 277 1095 685 636 365 545 697 518 490 747 1648;
11084 2010 1000 533 340 212 635 254 399 759 110 248 490 214;
11084 2011 587 410 481 146 99 499 547 118 706 20 174 526;
12813 2009 216 486 1443 207 730 369 518 625 816 767 382 1352;
12813 2010 673 544 517 204 704 504 219 1033 633 168 473 272;
12813 2011 348 238 458 107 90 394 1014 196 1109 34 365 250;
Column 1 holds the station ID. I want to save the output in a separate directory, in files named after the station ID. In this case a text file named 11084.txt would be created, containing the following data:
2009 572;2009 277;2009 1095;2009 685;2009 636;2009 365;2009 545;2009 697;2009 518;2009 490;2009 747;2009 1648;2010 1000;2010 533;2010 340;2010 212;2010 635;2010 254;2010 399;2010 759;2010 110;2010 248;2010 490;2010 214;2011 587;2011 410;2011 481;2011 146;2011 99;2011 499;2011 547;2011 118;2011 706;2011 20;2011 174;2011 526;
Similarly, the next file, 12813.txt, would contain:
2009 216;2009 486;2009 1443;2009 207;2009 730;2009 369;2009 518;2009 625;2009 816;2009 767;2009 382;2009 1352;2010 673;2010 544;2010 517;2010 204;2010 704;2010 504;2010 219;2010 1033;2010 633;2010 168;2010 473;2010 272;2011 348;2011 238;2011 458;2011 107;2011 90;2011 394;2011 1014;2011 196;2011 1109;2011 34;2011 365;2011 250;
Please let me know how to do so. Thanks,

A straightforward solution is:
d = unique(A(:,1));                          % distinct station IDs
for i = 1:length(d)
    fid = fopen([num2str(d(i)) '.txt'],'w'); % one file per station, e.g. 11084.txt
    aux = find(A(:,1)==d(i))';               % rows belonging to this station
    for j = aux
        for k = 3:size(A,2)
            fprintf(fid,'%d %d;', A(j,2), A(j,k)); % write "year value;" pairs
        end
    end
    fclose(fid);
end
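The question asks for the files to land in a separate directory, whereas the loop above writes them to the current folder. A minimal sketch of that variation follows. The folder name `station_output` and the use of `tempdir` as its parent are assumptions made for illustration, and only three rows of A are used to keep it short:

```matlab
% Sample rows from the question's matrix (station ID, year, 12 monthly values).
A = [11084 2009 572 277 1095 685 636 365 545 697 518 490 747 1648;
     11084 2010 1000 533 340 212 635 254 399 759 110 248 490 214;
     12813 2009 216 486 1443 207 730 369 518 625 816 767 382 1352];

% Hypothetical output folder; replace with the directory you actually want.
outDir = fullfile(tempdir, 'station_output');
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

ids = unique(A(:,1));
for i = 1:numel(ids)
    rows = A(A(:,1) == ids(i), :);                 % all rows for this station
    outFile = fullfile(outDir, sprintf('%d.txt', ids(i)));
    fid = fopen(outFile, 'w');
    if fid == -1
        error('Could not open %s for writing.', outFile);
    end
    for r = 1:size(rows, 1)
        % Build a 2-by-12 matrix [year; value]; fprintf walks it column by
        % column, which yields the "year value;" pairs in order.
        pairs = [repmat(rows(r,2), 1, size(rows,2)-2); rows(r,3:end)];
        fprintf(fid, '%d %d;', pairs);
    end
    fclose(fid);
end
```

Building the path with `fullfile` keeps the code independent of the platform's path separator.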

Related

RMarkdown: Creating two side-by-side heatmaps with full figure borders using the pheatmap package

I am writing my first report in RMarkdown and struggling with specific figure alignments.
I have some data that I am manipulating into a format friendly for the package pheatmap such that it produces heatmap HTML output. The code that produces one of these looks like:
cleaned_mayo<- cleaned_mayo[which(cleaned_mayo$Source=="MayoBrainBank_Dickson"),]
# Segregate data
ad<- cleaned_mayo[which(cleaned_mayo$Diagnosis== "AD"),-c(1:13)]
control<- cleaned_mayo[which(cleaned_mayo$Diagnosis== "Control"),-c(1:13)]
# Average data across patients and assign diagnoses
ad<- as.data.frame(t(apply(ad,2, mean)))
control<- as.data.frame(t(apply(control,2, mean)))
ad$Diagnosis<- "AD"
control$Diagnosis<- "Control"
# Combine
avg_heat<- rbind(ad, control)
# Rearrange columns
avg_heat<- avg_heat[,c(32, 1:31)]
# Mean shift all expression values
avg_heat[,2:32]<- apply(avg_heat[,2:32], 2, function(x){x-mean(x)})
#################################
# CREATE HEAT MAP
#################################
# Plot average heat map
pheatmap(t(avg_heat[,2:32]), cluster_col= F, labels_col= c("AD", "Control"),gaps_col = c(1), labels_row = colnames(avg_heat)[2:32],
main= "Mayo Differential Expression for Genes of Interest: Averaged Across \n Patients within a Diagnosis",
show_colnames = T)
Where the numeric columns of cleaned_mayo look like:
C1QA C1QC C1QB LAPTM5 CTSS FCER1G PLEK CSF1R CD74 LY86 AIF1 FGD2 TREM2 PTK2B LYN UNC93B1 CTSC NCKAP1L TMEM119 ALOX5AP LCP1
1924_TCX 1101 1392 1687 1380 380 279 198 1889 6286 127 252 771 338 5795 409 494 337 352 476 170 441
1926_TCX 881 770 950 1064 239 130 132 1241 3188 76 137 434 212 5634 327 419 292 217 464 124 373
1935_TCX 3636 4106 5196 5206 1226 583 476 5588 27650 384 1139 1086 756 14219 1269 869 868 1378 1270 428 1216
1925_TCX 3050 4392 5357 3585 788 472 350 4662 11811 340 865 1051 468 13446 638 420 1047 850 756 616 1008
1963_TCX 3169 2874 4182 2737 828 551 208 2560 10103 204 719 585 499 9158 546 335 598 593 606 418 707
7098_TCX 1354 1803 2369 2134 634 354 245 1829 8322 227 593 371 411 10637 504 294 750 458 367 490 779
ITGAM LPCAT2 LGALS9 GRN MAN2B1 TYROBP CD37 LAIR1 CTSZ CYTH4
1924_TCX 376 649 699 1605 618 392 328 628 1774 484
1926_TCX 225 381 473 1444 597 242 290 321 1110 303
1935_TCX 737 1887 998 2563 856 949 713 1060 2670 569
1925_TCX 634 1323 575 1661 594 562 421 1197 1796 595
1963_TCX 508 696 429 1030 355 556 365 585 1591 360
7098_TCX 418 1011 318 1574 354 353 179 471 1471 321
All of this code sits in an RMarkdown chunk with the following header: {r heatmaps, echo=FALSE, results="asis", message=FALSE}.
What I would like to achieve is the two heatmaps side by side, with a black box around each individual heatmap (i.e. enclosing its title and legend as well).
If anyone could tell me how to do this, or how to do either part individually, it would be greatly appreciated.
Thanks!

Matlab Spearman Correlation PVAL = 0?

I am conducting Spearman's Correlation with two data sets with 300 objects. These are my variables and commands:
a = [1:300]
b = [1 2 5 11 9 7 24 10 31 23 3 40 6 17 14 20 16 12 33 46 70 37 87 43 98 26 59 58 77 100 35 42 78 80 243 36 33327 4 83 160 163 198 86 94 406 111 28 29 55 113 239 295 110 196 177 32679 229 342 305 300 254 96 210 514 167 172 232 190 117 32081 25 158 19333 241 82 149 159 66 178 24487 68 30 1016 725 266 391 638 348 320 681 242 319 228 381 408 442 202 369 471 821 191 426 8 270 211 2266 619 576 441 680 3431 1167 723 74 318 556 640 395 1059 579 614 212 325 437 323 687 373 599 26637 985 54 84 802 724 154 417 240 1120 818 2309 462 109 104 509 494 427 57 2475 549 396 419 123 580 79 225 1132 351 76 16859 596 862 315 470 992 257 120 409 751 832 285 1534 714 1665 1376 2129 678 416 721 209 31971 183 356 1346 1015 1003 188 1076 1634 608 1056 338 308 145 418 625 1313 121 2484 996 783 329 1185 697 157 1100 175 622 235 456 277 166 2700 1439 461 653 433 540 1191 234 774 1894 1004 741 1062 948 48 99 405 797 237 1104 2286 22620 1429 30672 1808 169 458 22 1115 10660 872 474 1063 88 1727 1017 1107 1398 1519 703 1092 1027 272 263 1152 1770 1099 507 385 2118 19356 1778 2458 410 2110 7522 17166 4065 15136 13294 10876 17174 2434 9898 5663 13594 10506 11552 15635 9322 3223 8949 12388 13216 13851 13852 6696 12177 4700 17199 2067 11110 15486 5664 6593 4701 527 8616 268]
[RHO,PVAL] = corr(b',a','Type', 'Spearman')
RHO =
0.6954
PVAL =
0
Out of the 5 comparisons I made with other data sets of 300 objects, only 1 returned significant P-values. Is there an explanation for this?
I tried a different data set and got a value that was not significant (PVAL > 0.05). I also displayed the answer in a long (15 digits) and exponential form and got 0.00000000000000e+000 using:
format longEng
I also checked with another statistics program that reported the p-value as < 0.0001. This means that the p-value is just really, really small.
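One rough way to see how small the p-value is: plug the reported coefficient into the common large-sample t approximation for a rank correlation. This is only a sketch. The approximation is an assumption here, and `corr` may compute its p-value by a different method. `tcdf` needs the Statistics Toolbox, as `corr` does.

```matlab
rho = 0.6954;   % Spearman coefficient reported above
n   = 300;      % number of observations

% Large-sample t approximation (assumed here, not necessarily what corr uses):
% t = rho * sqrt((n-2)/(1-rho^2)), compared against a t distribution
% with n-2 degrees of freedom.
t = rho * sqrt((n - 2) / (1 - rho^2));
p = 2 * tcdf(-abs(t), n - 2);      % two-sided p-value

fprintf('t = %.2f, p = %.3g\n', t, p);
```

Under this approximation the p-value should come out many orders of magnitude below 0.0001, which agrees with what the other statistics program reported.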

Creation of a loop loading values from .txt files

I have a problem creating a loop that loads each value from ".txt" files and uses it in some calculations.
All the values are in the 2nd column, and the first one is always on the 9th line of each file.
Each ".txt" file contains a different number of values in its 2nd column (they all have the same text after the final value), so I want a loop that reads those values and stops when it reaches that text.
Here is an example of these files (the values that interest me are the ones under the heading G: 33, 55, 93, ..., 18):
Latitude: 34°40'30" North,
Longitude: 3°16'6" East
Results for: April
Inclination of plane: 32 deg.
Orientation (azimuth) of plane: 0 deg.
Time G Gd Gc DNI DNIc A Ad Ac
05:52 33 33 25 0 0 233 64 311
06:07 55 44 47 246 361 356 105 473
06:22 93 59 92 312 459 444 124 590
06:37 136 73 147 366 538 514 138 684
06:52 183 86 207 410 602 572 150 760
07:07 232 98 271 447 656 620 160 823
07:22 283 110 337 478 701 659 168 874
16:37 283 110 337 478 701 659 168 874
16:52 232 98 271 447 656 620 160 823
17:07 183 86 207 410 602 572 150 760
17:22 136 73 147 366 538 514 138 684
17:37 93 59 92 312 459 444 124 590
17:52 55 44 47 246 361 356 105 473
18:07 33 33 25 0 0 233 64 311
18:22 18 18 14 0 0 9 8 7
G: Global irradiance on a fixed plane (W/m2)
Gd: Diffuse irradiance on a fixed plane (W/m2)
Gc: Global clear-sky irradiance on a fixed plane (W/m2)
DNI: Direct normal irradiance (W/m2)
DNIc: Clear-sky direct normal irradiance (W/m2)
A: Global irradiance on 2-axis tracking plane (W/m2)
Ad: Diffuse irradiance on 2-axis tracking plane (W/m2)
Ac: Global clear-sky irradiance on 2-axis tracking plane (W/m2)
PVGIS (c) European Communities, 2001-2012
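One possible approach, sketched under assumptions: skip the first 8 lines, then parse each line as `HH:MM value ...` and stop at the first line that does not fit that pattern (a blank line or the legend text). So that it can run on its own, the sketch first writes a small sample file. The file name `pvgis_sample.txt`, the placeholder header text and the position of the blank lines are all made up for illustration.

```matlab
% Write a small sample file (hypothetical name and simplified header).
sampleLines = { ...
    'Latitude: (header text)', ...
    'Longitude: (header text)', ...
    '', ...
    'Results for: April', ...
    'Inclination of plane: 32 deg.', ...
    'Orientation (azimuth) of plane: 0 deg.', ...
    '', ...
    'Time G Gd Gc DNI DNIc A Ad Ac', ...
    '05:52 33 33 25 0 0 233 64 311', ...
    '06:07 55 44 47 246 361 356 105 473', ...
    '06:22 93 59 92 312 459 444 124 590', ...
    '', ...
    'G: Global irradiance on a fixed plane (W/m2)'};
sampleFile = fullfile(tempdir, 'pvgis_sample.txt');
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', strjoin(sampleLines, sprintf('\n')));
fclose(fid);

% Read the G column: first value on line 9, stop at the first non-data line.
firstDataLine = 9;
fid = fopen(sampleFile, 'r');
for s = 1:firstDataLine - 1
    fgetl(fid);                              % discard header lines
end
G = [];
tline = fgetl(fid);
while ischar(tline)                          % fgetl returns -1 at end of file
    vals = sscanf(tline, '%d:%d %f', 3);     % hour, minute, G
    if numel(vals) < 3
        break;                               % not a data line: stop reading
    end
    G(end+1, 1) = vals(3);                   %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
disp(G.');
```

To process many files, the reading part could be wrapped in a loop over the result of `dir(fullfile(folder, '*.txt'))`, where `folder` is wherever the files are stored.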

Saving (in a matrix) the elapsed time and number of iterations for a large number of cases

I have a program that outputs the number of iterations and a test value, given inputs A1,A2,A3,A4.
I want to run through 5 values each of A1, A2, A3, A4, thus making 625 runs. In the process, I want to save the time elapsed for each run, the number of iterations, and test value in 3 separate matrices.
I have tried using 4 nested for loops, and made progress, but need some help on indexing the elements of the matrices. The iterator variables in the for loops don't match the indexing variables...
The code for the 4 nested loops is below:
m = logspace(-4,4,5);
n = logspace(0,8,5);
eltime = zeros(5,length(m)*length(m)*length(m));
for A1 = m
    for A2 = m
        for A3 = m
            for A4 = n
                tic
                SmallMAX(A1,A2,A3,A4)
                toc;
                for i=1:numel(eltime)
                    for j = 1:length(n)
                        eltime(j,i) = toc;
                    end
                end
            end
        end
    end
end
The code for the main program is excerpted below:
function [k,test] = SmallMAX(A1,A2,A3,A4)
...
end
Thanks for any help.
In your case, the easiest way is to use A1, A2, A3 and A4 as counters instead of the actual values. This way you can use them to index the entries of eltime. The index in the second dimension is then easily calculated with sub2ind, and A4 indexes the first dimension of eltime. The arguments passed to SmallMAX need to be adjusted as well.
Here is the code of the proposed method:
m = logspace(-4,4,5);
n = logspace(0,8,5);
eltime = zeros(length(n),length(m)*length(m)*length(m));
res_k = zeros(length(n),length(m)*length(m)*length(m)); % or zeros(size(eltime));
res_test = zeros(length(n),length(m)*length(m)*length(m)); % or zeros(size(eltime));
for A1 = 1:length(m)
    for A2 = 1:length(m)
        for A3 = 1:length(m)
            for A4 = 1:length(n)
                ind = sub2ind([length(m),length(m),length(m)],A3,A2,A1);
                tic
                [k,test] = SmallMAX(m(A1),m(A2),m(A3),n(A4));
                eltime(A4,ind) = toc;
                res_k(A4,ind) = k;
                res_test(A4,ind) = test;
            end
        end
    end
end
This is the order of the addressed entries of eltime:
eltime_order =
Columns 1 through 18
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86
2 7 12 17 22 27 32 37 42 47 52 57 62 67 72 77 82 87
3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88
4 9 14 19 24 29 34 39 44 49 54 59 64 69 74 79 84 89
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
Columns 19 through 36
91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176
92 97 102 107 112 117 122 127 132 137 142 147 152 157 162 167 172 177
93 98 103 108 113 118 123 128 133 138 143 148 153 158 163 168 173 178
94 99 104 109 114 119 124 129 134 139 144 149 154 159 164 169 174 179
95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180
Columns 37 through 54
181 186 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266
182 187 192 197 202 207 212 217 222 227 232 237 242 247 252 257 262 267
183 188 193 198 203 208 213 218 223 228 233 238 243 248 253 258 263 268
184 189 194 199 204 209 214 219 224 229 234 239 244 249 254 259 264 269
185 190 195 200 205 210 215 220 225 230 235 240 245 250 255 260 265 270
Columns 55 through 72
271 276 281 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356
272 277 282 287 292 297 302 307 312 317 322 327 332 337 342 347 352 357
273 278 283 288 293 298 303 308 313 318 323 328 333 338 343 348 353 358
274 279 284 289 294 299 304 309 314 319 324 329 334 339 344 349 354 359
275 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355 360
Columns 73 through 90
361 366 371 376 381 386 391 396 401 406 411 416 421 426 431 436 441 446
362 367 372 377 382 387 392 397 402 407 412 417 422 427 432 437 442 447
363 368 373 378 383 388 393 398 403 408 413 418 423 428 433 438 443 448
364 369 374 379 384 389 394 399 404 409 414 419 424 429 434 439 444 449
365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 450
Columns 91 through 108
451 456 461 466 471 476 481 486 491 496 501 506 511 516 521 526 531 536
452 457 462 467 472 477 482 487 492 497 502 507 512 517 522 527 532 537
453 458 463 468 473 478 483 488 493 498 503 508 513 518 523 528 533 538
454 459 464 469 474 479 484 489 494 499 504 509 514 519 524 529 534 539
455 460 465 470 475 480 485 490 495 500 505 510 515 520 525 530 535 540
Columns 109 through 125
541 546 551 556 561 566 571 576 581 586 591 596 601 606 611 616 621
542 547 552 557 562 567 572 577 582 587 592 597 602 607 612 617 622
543 548 553 558 563 568 573 578 583 588 593 598 603 608 613 618 623
544 549 554 559 564 569 574 579 584 589 594 599 604 609 614 619 624
545 550 555 560 565 570 575 580 585 590 595 600 605 610 615 620 625
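Because the columns are filled in sub2ind order, the 5-by-125 result can be unfolded into a 4-D array afterwards and addressed directly by the four counters. The sketch below checks this. Since SmallMAX is not shown, each entry is filled with a made-up code built from its own counters instead of a measured time.

```matlab
nm = 5;  nn = 5;                 % length(m) and length(n) in the answer
eltime = zeros(nn, nm^3);

% Stand-in for the timing: store a number that encodes the loop counters.
for A1 = 1:nm
    for A2 = 1:nm
        for A3 = 1:nm
            for A4 = 1:nn
                ind = sub2ind([nm, nm, nm], A3, A2, A1);
                eltime(A4, ind) = 1000*A1 + 100*A2 + 10*A3 + A4;
            end
        end
    end
end

% Unfold the 125 columns into three dimensions: index order is (A4, A3, A2, A1).
eltime4 = reshape(eltime, [nn, nm, nm, nm]);
disp(eltime4(2, 3, 4, 5));       % entry for A1 = 5, A2 = 4, A3 = 3, A4 = 2
```

The same `reshape` call would apply to res_k and res_test, since they are filled with the same indices.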

least squares with seasonal component in matlab

I was reading a paper which looked at investigating trends in monthly wind speed data for the past 20 years or so. The paper uses a number of different statistical approaches, which I am trying to replicate here.
The first method used is a simple linear regression model of the form
$$ y(t) = a_{1}t + b_{1} $$
where $a_{1}$ and $b_{1}$ can be determined by standard least squares.
Then they specify that some of the potential error in the linear regression model can be removed explicitly by accounting for the seasonal signal by fitting a model of the form:
$$ y(t) = a_{2}t + b_{2}\sin\left(\frac{2\pi t}{12} + c_{2}\right) + d_{2} $$
where coefficients $a_{2}$, $b_{2}$, $c_{2}$, and $d_{2}$ can be determined by least squares. They then go on to specify that this model was also tested with additional harmonic components of 3, 4, and 6 months.
Using the following data as an example:
% 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
y = [112 115 145 171 196 204 242 284 315 340 360 417 % Jan
118 126 150 180 196 188 233 277 301 318 342 391 % Feb
132 141 178 193 236 235 267 317 356 362 406 419 % Mar
129 135 163 181 235 227 269 313 348 348 396 461 % Apr
121 125 172 183 229 234 270 318 355 363 420 472 % May
135 149 178 218 243 264 315 374 422 435 472 535 % Jun
148 170 199 230 264 302 364 413 465 491 548 622 % Jul
148 170 199 242 272 293 347 405 467 505 559 606 % Aug
136 158 184 209 237 259 312 355 404 404 463 508 % Sep
119 133 162 191 211 229 274 306 347 359 407 461 % Oct
104 114 146 172 180 203 237 271 305 310 362 390 % Nov
118 140 166 194 201 229 278 306 336 337 405 432 ]; % Dec
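[yr,mo] = meshgrid(1949:1960,1:12); % year and month of each entry of y (columns = years, rows = months)
time = datestr(datenum(yr(:),mo(:),1));
jday = datenum(time,'dd-mmm-yyyy');
y2 = reshape(y,[],1);
plot(jday,y2)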
Can anyone demonstrate how the model above can be written in matlab?
Notice that your model is actually linear in its coefficients; a trigonometric identity shows this. If you do want to fit it as a nonlinear model, use nlinfit.
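Spelled out, the identity is the angle-addition formula applied to the seasonal term. The shorthand $\omega = 2\pi/12$ and the names $\alpha$, $\beta$ are introduced here only for illustration:

$$ b_{2}\sin(\omega t + c_{2}) = \underbrace{b_{2}\cos(c_{2})}_{\alpha}\,\sin(\omega t) + \underbrace{b_{2}\sin(c_{2})}_{\beta}\,\cos(\omega t) $$

The model $y(t) = a_{2}t + \alpha\sin(\omega t) + \beta\cos(\omega t) + d_{2}$ is linear in $a_{2}$, $\alpha$, $\beta$ and $d_{2}$, so ordinary least squares applies. The original amplitude and phase follow as $b_{2} = \sqrt{\alpha^{2} + \beta^{2}}$ and $c_{2} = \operatorname{atan2}(\beta, \alpha)$.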
Using your data I wrote the following script to compute and compare the different methods:
(you can comment out the opts.RobustWgtFun = 'bisquare'; line to see that it's exactly like the linear fit with the 12 periodicity)
% y = [112 115 ...
y2 = reshape(y,[],1);
t=(1:144).';
% trend
T = [ones(size(t)) t];
B=T\y2;
y_trend = T*B;
% least squares, using linear fit and the 12 periodicity only
T = [ones(size(t)) t sin(2*pi*t/12) cos(2*pi*t/12)];
B=T\y2;
y_sincos = T*B;
% least squares, using linear fit and 3,4,6,12 periodicities
addharmonics = [3 4 6];
T = [T bsxfun(@(h,t)sin(2*pi*t/h),addharmonics,t) bsxfun(@(h,t)cos(2*pi*t/h),addharmonics,t)];
B=T\y2;
y_sincos2 = T*B;
% least squares with bisquare weights,
% using non-linear model of a linear fit and the 12 periodicity only
opts = statset('nlinfit');
opts.RobustWgtFun = 'bisquare';
b0 = [1;1;0;1];
modelfun = @(b,x) b(1)*x+b(2)*sin((b(3)+x)*2*pi/12)+b(4);
b = nlinfit(t,y2,modelfun,b0,opts);
% plot a comparison
figure
plot(t,y2,t,y_trend,t,modelfun(b,t),t,y_sincos,t,y_sincos2)
legend('Original','Trend','bisquare weight - 12 periodicity only', ...
'least square - 12 periodicity only','least square - 3,4,6,12 periodicities', ...
'Location','NorthWest');
xlim([min(t) max(t)]); % same limits as minmax(t'), without needing that toolbox function
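If the amplitude and phase of the seasonal term are wanted from the linear fit, they can be recovered from the sin and cos coefficients. The sketch below checks this on synthetic, noise-free data. The "true" parameter values are made up, and the phase convention is sin(2*pi*t/12 + c), which differs from the one used in modelfun above.

```matlab
t  = (1:144).';                          % months, as in the answer
a2 = 2.5; b2 = 30; c2 = 0.8; d2 = 100;   % made-up "true" parameters
y  = a2*t + b2*sin(2*pi*t/12 + c2) + d2; % synthetic data without noise

% Same design matrix as the 12-month fit: [1, t, sin, cos].
T = [ones(size(t)) t sin(2*pi*t/12) cos(2*pi*t/12)];
B = T \ y;

% B(3) = b2*cos(c2) and B(4) = b2*sin(c2), by the angle-addition formula.
b2_hat = hypot(B(3), B(4));              % amplitude
c2_hat = atan2(B(4), B(3));              % phase in radians
fprintf('amplitude = %.4f, phase = %.4f\n', b2_hat, c2_hat);
```

With real, noisy data the recovered values are estimates rather than exact matches.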