extract the information from a matrix with three columns - matlab

I have a matrix with three columns (https://www.dropbox.com/s/jckdmg1p05v8lv7/y.mat?dl=0), i.e.:
E1 E2 W
6 1464 0.36
6 1534 0.27
6 1585 0.27
8 1331 0.332
11 445 0.39
13 844 0.286
14 12 0.126
18 952 0.31
19 2376 0.32
20 394 0.22
20 399 0.22
20 589 0.22
21 321 0.22
21 1187 0.22
21 2509 0.22
22 1187 0.22
23 2235 0.22
24 2376 0.22
25 541 0.14
26 229 0.22
26 321 0.22
26 1187 0.22
26 2054 0.22
27 394 0.53
27 541 0.31
28 394 0.22
28 781 0.22
I used this loop
for k=1:size(y,1)
G(y(k,1),y(k,2))=true; % store each link in both directions
G(y(k,2),y(k,1))=true;
end
B=cellfun(@(x1) find(x1),num2cell(G,2),'un',0); % neighbours of every node
to extract the link information like this:
1 394
2 2378
3 282
4 282
5 536
6 [1464,1534,1585]
7 2087
8 [394,399,1331]
9 1187
I need a third column containing the weights,
i.e. {6,[1464,1534,1585],[0.36;0.27;0.27]}
I tried to adapt the loop above, but I did not get the right values. Does anyone have an idea how to do that?
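The loop above stores only `true`, so the weights are lost before `find` ever sees them. One alternative is to keep the weight itself as the matrix entry and then read each row's neighbours and weights together. Below is a minimal sketch of that idea, written in Python for illustration. The name `weighted_links` is hypothetical, and the sample rows are the first rows of the question's matrix. In MATLAB, the analogous idea would be to assign `W(y(k,1),y(k,2)) = y(k,3)` instead of `true` and then use the three-output form `[~,j,v] = find(W(i,:))`, where `W` is also a hypothetical name.

```python
from collections import defaultdict


def weighted_links(rows):
    """Group (node1, node2, weight) rows into node -> (neighbours, weights).

    Each link is stored in both directions, like the G(y(k,1),y(k,2)) /
    G(y(k,2),y(k,1)) loop, but the weight is kept instead of true.
    """
    adj = defaultdict(dict)  # sparse "matrix": adj[i][j] = weight of link i-j
    for e1, e2, w in rows:
        adj[e1][e2] = w
        adj[e2][e1] = w
    links = {}
    for node in sorted(adj):
        nbrs = sorted(adj[node])  # neighbours in increasing order, like find()
        links[node] = (nbrs, [adj[node][j] for j in nbrs])
    return links


# First rows of the question's matrix (E1, E2, W).
y = [(6, 1464, 0.36), (6, 1534, 0.27), (6, 1585, 0.27), (8, 1331, 0.332)]
links = weighted_links(y)
print(links[6])  # ([1464, 1534, 1585], [0.36, 0.27, 0.27])
```

One caveat applies to the sparse-matrix variant: a weight of exactly 0 cannot be told apart from "no link", whereas the dictionary keeps it.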

This is a possible solution using accumarray:
a=[...
6 1464 0.36
6 1534 0.27
6 1585 0.27
8 1331 0.332
11 445 0.39
13 844 0.286
14 12 0.126
18 952 0.31
19 2376 0.32
20 394 0.22
20 399 0.22
20 589 0.22
21 321 0.22
21 1187 0.22
21 2509 0.22
22 1187 0.22
23 2235 0.22
24 2376 0.22
25 541 0.14
26 229 0.22
26 321 0.22
26 1187 0.22
26 2054 0.22
27 394 0.53
27 541 0.31
28 394 0.22
28 781 0.22];
% append a copy of a with columns 1 and 2 swapped, so that links are symmetric
a = [a ; [fliplr(a(:,1:2)) , a(:,3)]];
% create consecutive increasing indices (1, 2, 3, ...) for use in accumarray
[S, SI] = sort(a(:,1));
S2 = [0; cumsum(diff(S)>0)];
idx = a(:,1);
idx(SI) = S2+1;
% gather the elements for each node
c1 = accumarray(idx, a(:,1), [], @(x) {x(1)});
c2 = accumarray(idx, a(:,2), [], @(x) {x});
c3 = accumarray(idx, a(:,3), [], @(x) {x});
% concatenate the columns
out = [c1 c2 c3];
% your example (node 6, the smallest node ID)
out(1,:)
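To see what the index-building lines do, here is a rough Python transcription. The names `dense_indices` and `group_rows` are hypothetical. The sorted first column is split wherever its value increases, so every distinct node gets a consecutive index that accumarray can use as a row number. This is a sketch of the idea, not a drop-in replacement. In MATLAB, the third output of `[~,~,idx] = unique(a(:,1))` should give the same indices.

```python
def dense_indices(keys):
    """Map each key to 1, 2, 3, ... in increasing key order (ties share an index).

    Mirrors [S, SI] = sort(a(:,1)); S2 = [0; cumsum(diff(S)>0)]; idx(SI) = S2+1;
    """
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # SI (0-based)
    s = [keys[i] for i in order]                             # S
    idx = [0] * len(keys)
    current = 1
    for pos, i in enumerate(order):
        # Start a new group whenever the sorted value increases (diff(S) > 0).
        if pos > 0 and s[pos] > s[pos - 1]:
            current += 1
        idx[i] = current
    return idx


def group_rows(rows):
    """Emulate the accumarray step: one (node, neighbours, weights) per index."""
    idx = dense_indices([r[0] for r in rows])
    groups = {}
    for g, (e1, e2, w) in zip(idx, rows):
        _node, nbrs, wts = groups.setdefault(g, (e1, [], []))
        nbrs.append(e2)
        wts.append(w)
    return [groups[g] for g in sorted(groups)]


a = [(6, 1464, 0.36), (6, 1534, 0.27), (6, 1585, 0.27), (8, 1331, 0.332)]
a = a + [(e2, e1, w) for e1, e2, w in a]  # symmetric copy, like fliplr
out = group_rows(a)
print(out[0])  # (6, [1464, 1534, 1585], [0.36, 0.27, 0.27])
```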


querying table inside a table in kdb

fellow q mortals!
I am stuck on a pretty unusual problem in kdb+/q. Essentially, I have a table that has a column of tables.
Below is the main table, called full_tab:
time bmm $
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------$
2020.08.12D00:06:12.049002000 +`offerid`source_id`sub1`impc`question_id`offer`total_click`rpc`revenue`rpm!(789 128 3 149 111 523 1037 852f;70995 70995 70995 70995 70995 70995 70995 70995f;31 31 31 31 31 31 31 31f;1 2 21 1 0N 0N 0N 0N;956 6$
2020.08.12D00:10:48.186445000 +`offerid`source_id`sub1`impc`question_id`offer`total_click`rpc`revenue`rpm!(789 128 3 149 111 523 1037 852f;70995 70995 70995 70995 70995 70995 70995 70995f;31 31 31 31 31 31 31 31f;3 7 55 5 0N 0N 0N 0N;956 6$
2020.08.12D00:15:50.596247000 +`offerid`source_id`sub1`impc`question_id`offer`total_click`rpc`revenue`rpm!(789 128 3 149 111 523 1037 852f;70995 70995 70995 70995 70995 70995 70995 70995f;31 31 31 31 31 31 31 31f;4 10 81 5 0N 0N 0N 0N;956 $
...
Each row in the bmm column is a table that looks like this:
offerid source_id sub1 impc question_id offer total_click rpc revenue rpm
---------------------------------------------------------------------------------------------------------------------------
789 70995 31 1 956 "aaaa" 1 0 0 0
128 70995 31 2 698 "bbb" 2 0.4 0.8 400
3 70995 31 21 818 "ccc" 10 1.0575 10.575 503.5714
149 70995 31 1 941 "ddd" 1 0.4 0.4 400
111 70995 31 "eee" 10 1.057 10.575
523 70995 31 "fff" 1 0.4 0.4
1037 70995 31 "ggg" 1 0.4 0.4
852 70995 31 "hhh" 1 0.4 0.4
What I want is a final table like the one below. From full_tab I am trying to extract the time column and, from the corresponding bmm row, the bmm[;`rpm] value that corresponds to a particular bmm[;`question_id]. In the case below, question_id = 818:
time q818
---------------------------------------------
2020.08.12D00:06:12.049002000 503.5714
2020.08.12D00:10:48.186445000 510.665
2020.08.12D00:15:50.596247000 533.445
...
I tried to pull the values using the statement below:
select time, q818: first each bmm[;`rpm][;(where each bmm[;`question_id]=818)] from full_tab;
but it doesn't seem to work! :(
I think you could use something like the following:
q)getQID:{[t;qid] select time,q818:{[t;qid]first exec rpm from t where question_id=qid}[;qid]'[bmm] from t}
q)getQID[full_tab;818]
time q818
-------------------------------------
2014.08.30D03:40:50.876084992 503.75
2008.06.26D08:14:03.717355744 510.665
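The function above runs one `exec` per nested table. Here is a language-neutral sketch of the same per-row lookup, written in Python. The name `rpm_for_question` is hypothetical, and the small `full_tab` is a toy built from values shown in the question, not real data. For every outer row, the sketch filters the inner table on `question_id` and keeps the first matching `rpm`.

```python
def rpm_for_question(full_tab, qid):
    """Return (time, rpm) per outer row, taking rpm where question_id == qid.

    Rows whose nested table has no matching question_id get None.
    """
    result = []
    for row in full_tab:
        # Scan the nested table (a list of dicts) for the requested question_id.
        matches = [r["rpm"] for r in row["bmm"] if r.get("question_id") == qid]
        # Keep the first match, like `first exec rpm from t where question_id=qid`.
        result.append((row["time"], matches[0] if matches else None))
    return result


# Toy data shaped like full_tab; values copied from the rows shown in the question.
full_tab = [
    {"time": "2020.08.12D00:06:12.049002000",
     "bmm": [{"question_id": 956, "rpm": 0.0},
             {"question_id": 698, "rpm": 400.0},
             {"question_id": 818, "rpm": 503.5714}]},
    {"time": "2020.08.12D00:10:48.186445000",
     "bmm": [{"question_id": 818, "rpm": 510.665}]},
]

for time, q818 in rpm_for_question(full_tab, 818):
    print(time, q818)
```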

Unrecognized Quartz MS font

I generated a training image with the Quartz MS font, then used jTessBoxEditorFX to generate the box file:
0 26 23 97 125 0
1 169 26 189 122 0
2 209 23 279 124 0
3 305 23 370 124 0
4 391 25 461 121 0
5 481 23 551 124 0
6 571 23 641 124 0
7 665 27 731 124 0
8 753 24 822 124 0
9 842 24 912 125 0
The resulting traineddata does not recognize the text correctly. Is there a problem with my approach?
If anyone has had a similar experience, I would be grateful for any guidance. Thank you.
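A quick first check is whether the box file itself is well-formed. Each line of a Tesseract box file is `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The minimal Python sketch below assumes a single line of text written left to right; the helper names are hypothetical. It flags empty or inverted boxes and boxes that overlap their left neighbour. On the ten lines above it finds nothing suspicious, so a malformed box file is not an obvious explanation here.

```python
BOX_TEXT = """0 26 23 97 125 0
1 169 26 189 122 0
2 209 23 279 124 0
3 305 23 370 124 0
4 391 25 461 121 0
5 481 23 551 124 0
6 571 23 641 124 0
7 665 27 731 124 0
8 753 24 822 124 0
9 842 24 912 125 0"""


def parse_box_line(line):
    """Split one box-file line into symbol, left, bottom, right, top, page."""
    symbol, left, bottom, right, top, page = line.split()
    return symbol, int(left), int(bottom), int(right), int(top), int(page)


def check_boxes(lines):
    """Return a list of suspicious boxes (an empty list means none were found).

    Assumes one line of text written left to right, so each box should start
    at or after the previous box's right edge.
    """
    problems = []
    prev_right = None
    for n, line in enumerate(lines, start=1):
        symbol, left, bottom, right, top, _page = parse_box_line(line)
        if left >= right or bottom >= top:
            problems.append(f"line {n}: empty or inverted box for {symbol!r}")
        if prev_right is not None and left < prev_right:
            problems.append(f"line {n}: box for {symbol!r} overlaps the previous box")
        prev_right = right
    return problems


print(check_boxes(BOX_TEXT.splitlines()))  # []
```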

Gradient descent always going to infinity

I've tried everything and can't figure out why my gradient descent isn't working. I've looked at numerous examples and have changed the gradient descent code multiple times. When I run the program, the result is NaN. I then printed every iteration and saw that, before reaching NaN, the values kept growing (or kept falling towards negative infinity). I've tried different alpha values, starting beta values, and numbers of iterations, and it fails every time. What's going on?
Here is my code:
A = load('A2-datasets/data-build-stories.mat');
X = [ones(60,1) A.data_build_stories(:,1)];
y = A.data_build_stories(:,2);
b = gradDes(X, y);
function beta = gradDes(X,y)
alpha = 0.01;
beta = [0;0];
m = length(y);
for i = 1:1000
beta = beta - (alpha/m) * (X' * (X * beta - y));
end
end
And here is the data in data-build-stories.mat:
770 54
677 47
428 28
410 38
371 29
504 38
1136 80
695 52
551 45
550 40
568 49
504 33
560 50
512 40
448 31
538 40
410 27
409 31
504 35
777 57
496 31
386 26
530 39
360 25
355 23
1250 102
802 72
741 57
739 54
650 56
592 45
577 42
500 36
469 30
320 22
441 31
845 52
435 29
435 34
375 20
364 33
340 18
375 23
450 30
529 38
412 31
722 62
574 48
498 29
493 40
379 30
579 42
458 36
454 33
952 72
784 57
476 34
453 46
440 30
428 21
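Why gradDes blows up can be read directly from its update rule. Write $\beta^\ast$ for the least-squares solution, so that $X^\top X\beta^\ast = X^\top y$, and let $m$ be the number of rows. The error then obeys a linear recursion:

```latex
\begin{aligned}
\beta_{k+1} &= \beta_k - \frac{\alpha}{m} X^\top (X\beta_k - y), \\
e_k &= \beta_k - \beta^\ast, \qquad H = \tfrac{1}{m} X^\top X, \\
e_{k+1} &= (I - \alpha H)\, e_k \quad\Longrightarrow\quad e_k = (I - \alpha H)^k e_0, \\
e_k \to 0 \ \text{for every } e_0 &\iff |1 - \alpha\lambda_i(H)| < 1 \ \ \forall i \iff 0 < \alpha < \frac{2}{\lambda_{\max}(H)}, \\
H &= \begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix}, \qquad \lambda_{\max}(H) \ge \overline{x^2}.
\end{aligned}
```

Every x value in this data set is at least 320, so $\overline{x^2} > 10^5$ and the stable range ends below $2 \times 10^{-5}$. With $\alpha = 0.01$, the error component along the top eigenvector is multiplied by a factor far larger than 1 in magnitude at every step. It therefore grows until it overflows to Inf and then NaN, which matches the behaviour described above. The slowest direction shrinks only by $1 - \alpha\lambda_{\min}$ per step, so an extremely small $\alpha$ can leave part of $\beta$ nearly unchanged after 1000 iterations.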
You are running gradient descent with an alpha that is too large for your data.
Try changing it:
A = load('tmp.txt');
X = [ones(60,1) A(:,1)];
y = A(:,2);
b = gradDes(X, y);
function beta = gradDes(X,y)
alpha = 0.00000001;
beta = [0;0];
m = length(y);
for i = 1:1000
beta = beta - (alpha/m) * (X' * (X * beta - y));
end
end
b =[ 0.0001 0.0719]
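For the plain least-squares update used in gradDes, the iteration is stable only for 0 < alpha < 2/λ_max(X'X/m). The usable alpha can therefore be computed from the data rather than found by trial and error. The minimal Python sketch below does this; the function names are hypothetical, and only the first ten rows of the data are used to keep it short. It:

- computes that bound,
- shows alpha = 0.01 diverging,
- applies feature scaling (standardizing x), a common alternative to shrinking alpha that the answer above does not use, so that alpha = 0.1 converges,
- maps the coefficients back to the original x scale.

```python
import math


def grad_des(xs, ys, alpha, iters):
    """Batch gradient descent for y ~ b0 + b1*x, same update as gradDes."""
    m = len(ys)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        r = [b0 + b1 * x - y for x, y in zip(xs, ys)]  # residuals X*beta - y
        g0 = sum(r) / m                                # intercept part of X'*r/m
        g1 = sum(ri * x for ri, x in zip(r, xs)) / m   # slope part of X'*r/m
        b0, b1 = b0 - alpha * g0, b1 - alpha * g1
    return b0, b1


def max_stable_alpha(xs):
    """2 / lambda_max of H = X'X/m for X = [1, x], via the 2x2 eigenvalue formula."""
    m = len(xs)
    mx = sum(xs) / m
    mxx = sum(x * x for x in xs) / m
    lam_max = (1 + mxx) / 2 + math.sqrt(((1 - mxx) / 2) ** 2 + mx ** 2)
    return 2 / lam_max


def standardize(xs):
    """Rescale x to zero mean and unit standard deviation (feature scaling)."""
    m = len(xs)
    mu = sum(xs) / m
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / m)
    return [(x - mu) / sd for x in xs], mu, sd


# First ten rows of data-build-stories, used only as a small illustration.
xs = [770, 677, 428, 410, 371, 504, 1136, 695, 551, 550]
ys = [54, 47, 28, 38, 29, 38, 80, 52, 45, 40]

print(max_stable_alpha(xs))             # far below 0.01, so alpha = 0.01 diverges
zs, mu, sd = standardize(xs)
print(max_stable_alpha(zs))             # about 2 after scaling
b0, b1 = grad_des(zs, ys, alpha=0.1, iters=1000)
slope = b1 / sd                         # map back to the original x scale
intercept = b0 - b1 * mu / sd
print(intercept, slope)
```

A tiny alpha such as 0.00000001 is stable but converges extremely slowly in the intercept direction. That may explain why the intercept in the result above is still close to its starting value of 0.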

Matlab Spearman Correlation PVAL = 0?

I am computing Spearman's correlation between two data sets of 300 objects each. These are my variables and commands:
a = [1:300]
b = [1 2 5 11 9 7 24 10 31 23 3 40 6 17 14 20 16 12 33 46 70 37 87 43 98 26 59 58 77 100 35 42 78 80 243 36 33327 4 83 160 163 198 86 94 406 111 28 29 55 113 239 295 110 196 177 32679 229 342 305 300 254 96 210 514 167 172 232 190 117 32081 25 158 19333 241 82 149 159 66 178 24487 68 30 1016 725 266 391 638 348 320 681 242 319 228 381 408 442 202 369 471 821 191 426 8 270 211 2266 619 576 441 680 3431 1167 723 74 318 556 640 395 1059 579 614 212 325 437 323 687 373 599 26637 985 54 84 802 724 154 417 240 1120 818 2309 462 109 104 509 494 427 57 2475 549 396 419 123 580 79 225 1132 351 76 16859 596 862 315 470 992 257 120 409 751 832 285 1534 714 1665 1376 2129 678 416 721 209 31971 183 356 1346 1015 1003 188 1076 1634 608 1056 338 308 145 418 625 1313 121 2484 996 783 329 1185 697 157 1100 175 622 235 456 277 166 2700 1439 461 653 433 540 1191 234 774 1894 1004 741 1062 948 48 99 405 797 237 1104 2286 22620 1429 30672 1808 169 458 22 1115 10660 872 474 1063 88 1727 1017 1107 1398 1519 703 1092 1027 272 263 1152 1770 1099 507 385 2118 19356 1778 2458 410 2110 7522 17166 4065 15136 13294 10876 17174 2434 9898 5663 13594 10506 11552 15635 9322 3223 8949 12388 13216 13851 13852 6696 12177 4700 17199 2067 11110 15486 5664 6593 4701 527 8616 268]
[RHO,PVAL] = corr(b',a','Type', 'Spearman')
RHO =
0.6954
PVAL =
0
Out of the 5 comparisons I made with other data sets of 300 objects, only 1 returned significant p-values. Is there an explanation for this?
I tried a different data set and got a value that was not significant (PVAL > 0.05). I also displayed the result in long (15-digit) engineering format and got 0.00000000000000e+000, using:
format longEng
I also checked with another statistics program, which reported the p-value as < 0.0001. This means that the p-value is simply very, very small rather than exactly zero.
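A p-value printed as exactly 0 is consistent with a true value that is tiny but positive and lost to floating-point cancellation. The minimal Python sketch below illustrates this; the function names are hypothetical. It assumes the large-sample normal approximation z = ρ√(n−1) for Spearman's rho. MATLAB's corr may compute its p-value differently, so this only illustrates the numerical effect. With ρ = 0.6954 and n = 300, computing the tail as 1 − CDF cancels to exactly 0, whereas computing the tail directly with erfc returns a tiny positive number.

```python
import math


def average_ranks(v):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def two_sided_p_normal(rho, n):
    """Two-sided p-value under z = |rho| * sqrt(n - 1) ~ N(0, 1), computed two ways."""
    z = abs(rho) * math.sqrt(n - 1)
    naive = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # 1 - CDF: cancels to 0
    stable = math.erfc(z / math.sqrt(2))                       # tail computed directly
    return naive, stable


naive, stable = two_sided_p_normal(0.6954, 300)
print(naive, stable)  # naive is 0.0, stable is a tiny positive number
```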

Saving text matrix in a directory: MATLAB

I have a matrix, say A =
11084 2009 572 277 1095 685 636 365 545 697 518 490 747 1648;
11084 2010 1000 533 340 212 635 254 399 759 110 248 490 214;
11084 2011 587 410 481 146 99 499 547 118 706 20 174 526;
12813 2009 216 486 1443 207 730 369 518 625 816 767 382 1352;
12813 2010 673 544 517 204 704 504 219 1033 633 168 473 272;
12813 2011 348 238 458 107 90 394 1014 196 1109 34 365 250;
Column 1 holds the station ID. I want to save the output in a separate directory, one file per station ID. In this case, a text file named 11084.txt would be created, containing the following data:
2009 572;2009 277;2009 1095;2009 685;2009 636;2009 365;2009 545;2009 697;2009 518;2009 490;2009 747;2009 1648;2010 1000;2010 533;2010 340;2010 212;2010 635;2010 254;2010 399;2010 759;2010 110;2010 248;2010 490;2010 214;2011 587;2011 410;2011 481;2011 146;2011 99;2011 499;2011 547;2011 118;2011 706;2011 20;2011 174;2011 526;
Similarly, the next file, 12813.txt, would contain:
2009 216;2009 486;2009 1443;2009 207;2009 730;2009 369;2009 518;2009 625;2009 816;2009 767;2009 382;2009 1352;2010 673;2010 544;2010 517;2010 204;2010 704;2010 504;2010 219;2010 1033;2010 633;2010 168;2010 473;2010 272;2011 348;2011 238;2011 458;2011 107;2011 90;2011 394;2011 1014;2011 196;2011 1109;2011 34;2011 365;2011 250;
Please let me know how to do so. Thanks,
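A straightforward solution is:
outdir = 'stations'; % example output folder name; change as needed
if ~exist(outdir, 'dir'), mkdir(outdir); end
d = unique(A(:,1)); % distinct station IDs
for i = 1:length(d)
fid = fopen(fullfile(outdir, [num2str(d(i)) '.txt']), 'w');
aux = find(A(:,1)==d(i))'; % rows belonging to this station
for j = aux
for k = 3:size(A,2)
fprintf(fid,'%d %d;', A(j,2), A(j,k)); % year, value
end
end
fclose(fid);
end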
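For comparison, here is the same grouping as a small Python sketch. The function names are hypothetical, and the demo writes into a fresh temporary folder rather than a real output path. Like the MATLAB loop, it pairs each row's year with each of that row's values, joins them with semicolons, and writes one <station>.txt file per station ID.

```python
import os
import tempfile


def station_lines(A):
    """Group rows by station ID (column 1) and build 'year value;' text per station."""
    out = {}  # dicts keep insertion order, so stations appear in first-seen order
    for row in A:
        station, year, values = row[0], row[1], row[2:]
        out[station] = out.get(station, "") + "".join(f"{year} {v};" for v in values)
    return out


def write_station_files(A, outdir):
    """Write one <station>.txt file per station into outdir (created if missing)."""
    os.makedirs(outdir, exist_ok=True)
    for station, text in station_lines(A).items():
        with open(os.path.join(outdir, f"{station}.txt"), "w") as f:
            f.write(text)


A = [
    [11084, 2009, 572, 277, 1095, 685, 636, 365, 545, 697, 518, 490, 747, 1648],
    [11084, 2010, 1000, 533, 340, 212, 635, 254, 399, 759, 110, 248, 490, 214],
    [11084, 2011, 587, 410, 481, 146, 99, 499, 547, 118, 706, 20, 174, 526],
    [12813, 2009, 216, 486, 1443, 207, 730, 369, 518, 625, 816, 767, 382, 1352],
    [12813, 2010, 673, 544, 517, 204, 704, 504, 219, 1033, 633, 168, 473, 272],
    [12813, 2011, 348, 238, 458, 107, 90, 394, 1014, 196, 1109, 34, 365, 250],
]

outdir = tempfile.mkdtemp()  # demo folder; use your own directory in practice
write_station_files(A, outdir)
print(sorted(os.listdir(outdir)))  # ['11084.txt', '12813.txt']
```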