I'm trying to use multidimensional scaling in Matlab. The goal is to convert a similarity matrix to a scatter plot (in order to use k-means).
I've got the following test set:
London Stockholm Lisboa Madrid Paris Amsterdam Berlin Prague Rome Dublin
0 569 667 530 141 140 357 396 570 190
569 0 1212 1043 617 446 325 423 787 648
667 1212 0 201 596 768 923 882 714 714
530 1043 201 0 431 608 740 690 516 622
141 617 596 431 0 177 340 337 436 320
140 446 768 608 177 0 218 272 519 302
357 325 923 740 340 218 0 114 472 514
396 423 882 690 337 272 114 0 364 573
569 787 714 516 436 519 472 364 0 755
190 648 714 622 320 302 514 573 755 0
I got this data set from the book Modern Multidimensional Scaling (Borg & Groenen, 2005). I tested it in SPSS using the PROXSCAL MDS method and got the same result as stated in the book.
But I need to use MDS in Matlab in order to speed up the process. The tutorial at http://www.mathworks.nl/help/stats/multidimensional-scaling.html#briu08r-4 looks the same as what I'm doing above. When I replace its data with the data set shown above and run the code, I get the following error: "Not a valid dissimilarity or distance matrix."
I'm not sure what I'm doing wrong, or whether classical MDS is the right choice. I also can't find a way to specify that I want the result in three dimensions (this will be needed at a later stage).
Your matrix is not symmetric: check the entries (9,1) and (1,9), which are 569 and 570. To quickly find the asymmetric indices, use [x,y]=find(~(D'==D))
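Once the asymmetric pair is found, the matrix has to be made symmetric before any MDS routine will accept it. The sketch below is a minimal stdlib-only Python illustration, not MATLAB's cmdscale: it locates asymmetric pairs, makes the matrix symmetric by averaging it with its transpose (an assumption; correcting the typo to the intended value works just as well), and then runs classical (Torgerson) MDS with a chosen number of output dimensions, for example 3. The function names asymmetric_pairs, symmetrize, jacobi_eigen and classical_mds are hypothetical, and the eigendecomposition uses cyclic Jacobi rotations only to keep the example dependency-free.

```python
import math

# Distance matrix from the question (rows/columns: London ... Dublin).
# CITY[0][8] is 570 but CITY[8][0] is 569, so the matrix is not symmetric.
CITY = [
    [0, 569, 667, 530, 141, 140, 357, 396, 570, 190],
    [569, 0, 1212, 1043, 617, 446, 325, 423, 787, 648],
    [667, 1212, 0, 201, 596, 768, 923, 882, 714, 714],
    [530, 1043, 201, 0, 431, 608, 740, 690, 516, 622],
    [141, 617, 596, 431, 0, 177, 340, 337, 436, 320],
    [140, 446, 768, 608, 177, 0, 218, 272, 519, 302],
    [357, 325, 923, 740, 340, 218, 0, 114, 472, 514],
    [396, 423, 882, 690, 337, 272, 114, 0, 364, 573],
    [569, 787, 714, 516, 436, 519, 472, 364, 0, 755],
    [190, 648, 714, 622, 320, 302, 514, 573, 755, 0],
]


def asymmetric_pairs(D):
    """0-based (i, j) pairs with i < j where D[i][j] != D[j][i]."""
    n = len(D)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if D[i][j] != D[j][i]]


def symmetrize(D):
    """Average D with its transpose, i.e. (D + D') / 2."""
    n = len(D)
    return [[(D[i][j] + D[j][i]) / 2 for j in range(n)] for i in range(n)]


def jacobi_eigen(A, sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (values, V); column k of V is the eigenvector for values[k]."""
    n = len(A)
    a = [[float(x) for x in row] for row in A]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < 1e-20 * (sum(x * x for row in a for x in row) + 1e-300):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(D, p):
    """Classical MDS: n x p coordinates whose distances approximate D."""
    n = len(D)
    d2 = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D.^2 * J with J = I - ones(n)/n
    B = [[-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(B)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:p]
    # Scale each leading eigenvector by sqrt(eigenvalue); negative ones are clipped to 0
    return [[vecs[i][k] * math.sqrt(max(vals[k], 0.0)) for k in top] for i in range(n)]
```

For the city data, asymmetric_pairs(CITY) returns [(0, 8)], the 0-based form of the (1,9)/(9,1) pair, and classical_mds(symmetrize(CITY), 3) gives one 3-D point per city that could then be passed to k-means. In MATLAB, the first three columns of the cmdscale output play the same role.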
I am writing my first report in RMarkdown and struggling with specific figure alignments.
I have some data that I manipulate into a format suitable for the pheatmap package, which then produces heatmaps in the HTML output. The code that produces one of them looks like this:
library(pheatmap)
# Keep only samples from the MayoBrainBank_Dickson source
cleaned_mayo<- cleaned_mayo[which(cleaned_mayo$Source=="MayoBrainBank_Dickson"),]
# Segregate data
ad<- cleaned_mayo[which(cleaned_mayo$Diagnosis== "AD"),-c(1:13)]
control<- cleaned_mayo[which(cleaned_mayo$Diagnosis== "Control"),-c(1:13)]
# Average data across patients and assign diagnoses
ad<- as.data.frame(t(apply(ad,2, mean)))
control<- as.data.frame(t(apply(control,2, mean)))
ad$Diagnosis<- "AD"
control$Diagnosis<- "Control"
# Combine
avg_heat<- rbind(ad, control)
# Rearrange columns
avg_heat<- avg_heat[,c(32, 1:31)]
# Mean shift all expression values
avg_heat[,2:32]<- apply(avg_heat[,2:32], 2, function(x){x-mean(x)})
#################################
# CREATE HEAT MAP
#################################
# Plot average heat map
pheatmap(t(avg_heat[,2:32]), cluster_cols = FALSE, labels_col = c("AD", "Control"), gaps_col = c(1), labels_row = colnames(avg_heat)[2:32],
         main = "Mayo Differential Expression for Genes of Interest: Averaged Across \n Patients within a Diagnosis",
         show_colnames = TRUE)
Where the numeric columns of cleaned_mayo look like:
C1QA C1QC C1QB LAPTM5 CTSS FCER1G PLEK CSF1R CD74 LY86 AIF1 FGD2 TREM2 PTK2B LYN UNC93B1 CTSC NCKAP1L TMEM119 ALOX5AP LCP1
1924_TCX 1101 1392 1687 1380 380 279 198 1889 6286 127 252 771 338 5795 409 494 337 352 476 170 441
1926_TCX 881 770 950 1064 239 130 132 1241 3188 76 137 434 212 5634 327 419 292 217 464 124 373
1935_TCX 3636 4106 5196 5206 1226 583 476 5588 27650 384 1139 1086 756 14219 1269 869 868 1378 1270 428 1216
1925_TCX 3050 4392 5357 3585 788 472 350 4662 11811 340 865 1051 468 13446 638 420 1047 850 756 616 1008
1963_TCX 3169 2874 4182 2737 828 551 208 2560 10103 204 719 585 499 9158 546 335 598 593 606 418 707
7098_TCX 1354 1803 2369 2134 634 354 245 1829 8322 227 593 371 411 10637 504 294 750 458 367 490 779
ITGAM LPCAT2 LGALS9 GRN MAN2B1 TYROBP CD37 LAIR1 CTSZ CYTH4
1924_TCX 376 649 699 1605 618 392 328 628 1774 484
1926_TCX 225 381 473 1444 597 242 290 321 1110 303
1935_TCX 737 1887 998 2563 856 949 713 1060 2670 569
1925_TCX 634 1323 575 1661 594 562 421 1197 1796 595
1963_TCX 508 696 429 1030 355 556 365 585 1591 360
7098_TCX 418 1011 318 1574 354 353 179 471 1471 321
All of this code sits inside an R Markdown chunk with the following header: {r heatmaps, echo=FALSE, results="asis", message=FALSE}.
What I would like to achieve is the two heatmaps side-by-side with black boxes around each individual heat map (i.e. containing the title and legend of the heatmap as well).
If anyone could tell me how to do this (or either part on its own), it would be greatly appreciated.
Thanks!
I am computing Spearman's correlation between two data sets of 300 objects each. These are my variables and commands:
a = [1:300]
b = [1 2 5 11 9 7 24 10 31 23 3 40 6 17 14 20 16 12 33 46 70 37 87 43 98 26 59 58 77 100 35 42 78 80 243 36 33327 4 83 160 163 198 86 94 406 111 28 29 55 113 239 295 110 196 177 32679 229 342 305 300 254 96 210 514 167 172 232 190 117 32081 25 158 19333 241 82 149 159 66 178 24487 68 30 1016 725 266 391 638 348 320 681 242 319 228 381 408 442 202 369 471 821 191 426 8 270 211 2266 619 576 441 680 3431 1167 723 74 318 556 640 395 1059 579 614 212 325 437 323 687 373 599 26637 985 54 84 802 724 154 417 240 1120 818 2309 462 109 104 509 494 427 57 2475 549 396 419 123 580 79 225 1132 351 76 16859 596 862 315 470 992 257 120 409 751 832 285 1534 714 1665 1376 2129 678 416 721 209 31971 183 356 1346 1015 1003 188 1076 1634 608 1056 338 308 145 418 625 1313 121 2484 996 783 329 1185 697 157 1100 175 622 235 456 277 166 2700 1439 461 653 433 540 1191 234 774 1894 1004 741 1062 948 48 99 405 797 237 1104 2286 22620 1429 30672 1808 169 458 22 1115 10660 872 474 1063 88 1727 1017 1107 1398 1519 703 1092 1027 272 263 1152 1770 1099 507 385 2118 19356 1778 2458 410 2110 7522 17166 4065 15136 13294 10876 17174 2434 9898 5663 13594 10506 11552 15635 9322 3223 8949 12388 13216 13851 13852 6696 12177 4700 17199 2067 11110 15486 5664 6593 4701 527 8616 268]
[RHO,PVAL] = corr(b',a','Type', 'Spearman')
RHO =
0.6954
PVAL =
0
Out of the 5 comparisons I made with other data sets of 300 objects, only 1 returned significant P-values. Is there an explanation for this?
I tried a different data set and got a value that was not significant (PVAL > 0.05). I also displayed the answer in a long (15 digits) and exponential form and got 0.00000000000000e+000 using:
format longEng
I also checked with another statistics program that reported the p-value as < 0.0001. This means that the p-value is just really, really small.
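To make the "really, really small" concrete, here is a minimal Python sketch (not MATLAB's corr implementation) that computes Spearman's rho as the Pearson correlation of the ranks, with ties given their average rank as MATLAB's tiedrank does, plus a rough two-sided p-value from the large-sample normal approximation z = rho * sqrt(n - 1). That approximation is an assumption chosen for illustration; corr may use a different method. For rho = 0.6954 and n = 300 it gives a p-value on the order of 1e-33, which agrees with the other program's "< 0.0001". The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def approx_p_normal(rho, n):
    """Two-sided p-value from the normal approximation z = |rho| * sqrt(n - 1)."""
    z = abs(rho) * math.sqrt(n - 1)
    return math.erfc(z / math.sqrt(2))
```

With a = 1:300 and the b vector above, spearman(b, a) should reproduce the rho that corr reports (assuming b really has 300 entries), while approx_p_normal(0.6954, 300) returns a positive number far below 0.05.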
I have a problem creating a loop that loads each value from ".txt" files and uses it in some calculations.
All the values are in the 2nd column, and the first one is always on the 9th line of each file.
Each ".txt" file contains a different number of values in its 2nd column (all files have the same text after the final value), so I want a loop that reads those values and stops whenever it reaches that text.
Here is an example of these files (the values that interest me are the ones under the G heading: 33, 55, 93, ..., 18):
Latitude: 34°40'30" North,
Longitude: 3°16'6" East
Results for: April
Inclination of plane: 32 deg.
Orientation (azimuth) of plane: 0 deg.
Time G Gd Gc DNI DNIc A Ad Ac
05:52 33 33 25 0 0 233 64 311
06:07 55 44 47 246 361 356 105 473
06:22 93 59 92 312 459 444 124 590
06:37 136 73 147 366 538 514 138 684
06:52 183 86 207 410 602 572 150 760
07:07 232 98 271 447 656 620 160 823
07:22 283 110 337 478 701 659 168 874
16:37 283 110 337 478 701 659 168 874
16:52 232 98 271 447 656 620 160 823
17:07 183 86 207 410 602 572 150 760
17:22 136 73 147 366 538 514 138 684
17:37 93 59 92 312 459 444 124 590
17:52 55 44 47 246 361 356 105 473
18:07 33 33 25 0 0 233 64 311
18:22 18 18 14 0 0 9 8 7
G: Global irradiance on a fixed plane (W/m2)
Gd: Diffuse irradiance on a fixed plane (W/m2)
Gc: Global clear-sky irradiance on a fixed plane (W/m2)
DNI: Direct normal irradiance (W/m2)
DNIc: Clear-sky direct normal irradiance (W/m2)
A: Global irradiance on 2-axis tracking plane (W/m2)
Ad: Diffuse irradiance on 2-axis tracking plane (W/m2)
Ac: Global clear-sky irradiance on 2-axis tracking plane (W/m2)
PVGIS (c) European Communities, 2001-2012
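The reading logic described above (start at line 9, take the 2nd column, stop at the trailing text) can be sketched in Python as follows. This is an illustration under assumptions: instead of matching the exact legend text, it stops at the first line whose 2nd field is missing or not a number, which covers both a blank line and the "G: Global irradiance ..." legend. The names read_g_column and read_all and the "*.txt" pattern are hypothetical.

```python
import glob


def read_g_column(lines, first_data_line=9):
    """Collect the 2nd column starting at line `first_data_line` (1-based).

    Stops at the first line whose 2nd field is missing or non-numeric,
    e.g. a blank line or the 'G: Global irradiance ...' legend.
    """
    values = []
    for lineno, line in enumerate(lines, start=1):
        if lineno < first_data_line:
            continue  # skip the header block (location, month, column titles)
        fields = line.split()
        if len(fields) < 2:
            break
        try:
            values.append(float(fields[1]))
        except ValueError:
            break  # reached the text that follows the final value
    return values


def read_all(pattern="*.txt"):
    """Map each matching file name to its list of G values."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" guards against the degree signs in a non-UTF-8 file
        with open(path, encoding="utf-8", errors="replace") as fh:
            result[path] = read_g_column(fh)
    return result
```

Each list returned by read_all can then be fed into the per-file calculations; if a file's first value is not on line 9, only first_data_line needs to change.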
I am trying to plot a qqplot graph for different data samples. I am able to draw it for individual data samples. How can I draw a qqplot graph for multiple data samples? Also, I want to connect the points of each data set with a line and give each data set its own color to differentiate them. How can I achieve this in MATLAB?
I am getting the output below (image not included):
I am trying to get the output in the format below, a qqplot for 4 samples (image not included).
I am loading the data from a csv file into MATLAB.
Then I draw the graph with qqplot(1mb);
Data set 1 (variable size: 1mb):
379 398 474 541 656 673 684 712 749 751 770 782 788 829 837 864 886 919 935 946 991 993 995 1000
Data set 2 (variable size: 512kb):
313 406 443 534 558 561 613 645 649 699 705 732 737 748 752 755 766 774 780 795 796 802 806 823 842 846 872 873 889 904 915 936 966 983 993
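One way to get one connected, colored line per data set is to compute the QQ points yourself and plot each set as its own series. The Python sketch below only computes the points: theoretical standard-normal quantiles against the sorted sample values. The plotting positions (i - 0.5)/n are an assumption (other conventions exist), and normal_qq_points and qq_series are hypothetical names. The dictionary keys are just labels; note that in MATLAB a variable name must start with a letter, so a name such as 1mb is not a valid identifier.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """(theoretical N(0,1) quantile, sorted sample value) pairs for a QQ plot.

    Uses plotting positions (i - 0.5) / n for i = 1..n.
    """
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(sorted(sample))]


def qq_series(datasets):
    """One list of QQ points per named data set, to be drawn as one line each."""
    return {name: normal_qq_points(values) for name, values in datasets.items()}
```

In MATLAB the equivalent idea would be to compute each data set's points and call plot once per data set with hold on, so every data set gets its own connected line and color.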
Does anyone know how I can reshape these two columns:
1 1
1 1
1 1
379 346
352 363
330 371
309 379
291 391
271 402
268 403
1 1
1 1
406 318
379 334
351 351
329 359
307 367
287 378
267 390
264 391
into these four columns:
1 1 1 1
1 1 1 1
1 1 406 318
379 346 379 334
352 363 351 351
330 371 329 359
309 379 307 367
291 391 287 378
271 402 267 390
268 403 264 391
That is, how to reshape a matrix that is the size of Nx2 into a size 10xM in matlab?
One solution uses mat2cell, splitting every 10 rows. It is probably easier to understand because no 3-D arrays are used:
cell2mat(mat2cell(x,repmat(10,size(x,1)/10,1),size(x,2))')
A second solution uses reshape and permute; it should be faster, but I did not try it:
reshape(permute(reshape(x,10,[],size(x,2)),[1,3,2]),10,[])
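Both MATLAB one-liners cut the N-row matrix into consecutive 10-row blocks and place the blocks side by side. A plain-Python sketch of the same operation, on lists of rows, may make the block logic easier to follow; side_by_side is a hypothetical name, and the sketch assumes N is a multiple of the block size, just as the size(x,1)/10 in the mat2cell version does.

```python
def side_by_side(rows, block=10):
    """Split an N x M list of rows into `block`-row chunks and concatenate the
    chunks horizontally, giving `block` rows of M * (N // block) values."""
    if len(rows) % block:
        raise ValueError("number of rows must be a multiple of the block size")
    n_blocks = len(rows) // block
    return [
        # Row r of the result is row r of every block, joined left to right
        [value for b in range(n_blocks) for value in rows[b * block + r]]
        for r in range(block)
    ]
```

Applied to the 20 x 2 example above, row 4 of the result is 379 346 379 334, matching the expected four-column layout.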