Could you please let me know how to get the full p-value from gsl_cdf_tdist_P function when p-value is smaller than 1E-16? I am getting 0 instead.
Thanks,
Woody
print "t-test p-value = " . ttest(\#n,\#t) . "\n";
sub ttest{
my ($n,$t) = #_;
my #n = #$n;
my #t = #$t;
my $nn = pdl(#n);
my $tt = pdl(#t);
my ($tstats, $df) = t_test( $nn, $tt );
use PDL::GSL::CDF;
my $p_2tail = 2 * (1 - gsl_cdf_tdist_P( $tstats->abs, $df ));
return $p_2tail;
}
My input values as follows:
my #n = qw (1 2 4 2 3 1 2 4 2 1 2 4 2 3 1 2 4 2 1 2 4 2 3 1 2 4 2);
my #t = qw (11 12 13 12 13 11 14 11 12 13 12 13 11 14 11 12 13 12 13 11 14);
I found an easy solution for this problem. I used gsl_cdf_tdist_Q to get p-values. The p-values are not limited to 16 digits after decimal because they are not remapped p-values like (1-gsl_cdf_tdist_P).
Woody
Related
I have several data files and totals within those data files that I need to re-calculate.
The variables are broken out by race/ethnicity * sex and then a total is given.
The pattern is repeated for several measures and I cannot re-structure the data files. I have to keep the structure intact.
UPDATED: For example below are the first 32 variables (and 10 rows of data) in one of the files -- Hispanic males, Hispanic females, American Indian males, American Indian females....total males, and total females for grade 8 and then grade 9.
I have over 100 of these totals to do so I want to automate the process. How can I select the 7 prior variables that end in _M or _F to sum (or something to that extent)? TIA!!!
G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F G09_HI_M G09_HI_F G09_AM_M G09_AM_F G09_AS_M G09_AS_F G09_HP_M G09_HP_F G09_BL_M G09_BL_F G09_WH_M G09_WH_F G09_TR_M G09_TR_F TOT_G09_M TOT_G09_F
5 2 9 6 2 3 6 9 7 4 1 4 8 4 . . 7 11 2 13 4 2 14 10 10 13 2 11 9 5 . .
7 1 8 10 2 4 8 0 1 2 8 3 4 5 . . 7 13 12 13 5 15 3 2 2 13 11 15 3 15 . .
7 8 10 9 0 4 7 9 8 0 3 10 7 1 . . 15 9 11 9 11 9 6 7 14 9 12 8 6 14 . .
4 8 9 0 10 6 4 3 10 9 2 5 8 2 . . 13 2 5 13 3 14 5 15 10 15 7 11 9 6 . .
7 6 5 1 4 5 7 4 5 1 8 3 4 4 . . 9 7 7 2 4 8 3 4 3 10 9 8 7 7 . .
3 1 0 2 4 10 2 10 5 9 7 1 8 8 . . 7 9 5 7 13 6 12 13 10 6 2 13 3 12 . .
5 7 4 1 7 9 6 8 3 1 3 2 10 4 . . 14 12 8 5 6 2 2 5 6 4 12 6 4 5 . .
8 9 3 2 3 10 6 5 9 10 8 1 4 5 . . 10 2 3 8 3 15 3 14 9 14 3 12 4 12 . .
4 3 2 6 4 1 2 5 5 6 4 5 4 1 . . 3 14 12 12 15 10 14 11 5 8 9 14 7 15 . .
1 10 4 2 1 3 9 8 3 3 3 0 3 1 . . 12 9 5 7 14 9 13 9 6 14 5 7 13 13 . .
Is this your data? You mention you can change the data structure so you will need to "program the variable names" see below where I have example of create arrays (variable lists) that groups the names for G and sex. as implied by your TOT_: variables.
data G;
input G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F
G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F
G09_HI_M G09_HI_F G09_AM_M G09_AM_F G09_AS_M G09_AS_F G09_HP_M G09_HP_F
G09_BL_M G09_BL_F G09_WH_M G09_WH_F G09_TR_M G09_TR_F TOT_G09_M TOT_G09_F;
cards;
5 2 9 6 2 3 6 9 7 4 1 4 8 4 . . 7 11 2 13 4 2 14 10 10 13 2 11 9 5 . .
7 1 8 10 2 4 8 0 1 2 8 3 4 5 . . 7 13 12 13 5 15 3 2 2 13 11 15 3 15 . .
7 8 10 9 0 4 7 9 8 0 3 10 7 1 . . 15 9 11 9 11 9 6 7 14 9 12 8 6 14 . .
4 8 9 0 10 6 4 3 10 9 2 5 8 2 . . 13 2 5 13 3 14 5 15 10 15 7 11 9 6 . .
7 6 5 1 4 5 7 4 5 1 8 3 4 4 . . 9 7 7 2 4 8 3 4 3 10 9 8 7 7 . .
3 1 0 2 4 10 2 10 5 9 7 1 8 8 . . 7 9 5 7 13 6 12 13 10 6 2 13 3 12 . .
5 7 4 1 7 9 6 8 3 1 3 2 10 4 . . 14 12 8 5 6 2 2 5 6 4 12 6 4 5 . .
8 9 3 2 3 10 6 5 9 10 8 1 4 5 . . 10 2 3 8 3 15 3 14 9 14 3 12 4 12 . .
4 3 2 6 4 1 2 5 5 6 4 5 4 1 . . 3 14 12 12 15 10 14 11 5 8 9 14 7 15 . .
1 10 4 2 1 3 9 8 3 3 3 0 3 1 . . 12 9 5 7 14 9 13 9 6 14 5 7 13 13 . .
;;;;
run;
You can do something like this to create some array that can be used in SAS statistics functions.
proc transpose data=g(obs=0 drop=tot_:) out=gnames;
var _all_;
run;
data gnames;
set gnames;
g = input(substrn(_name_,2,3),2.);
length race $2; race=scan(_name_,2,'_');
length sex $1; sex =scan(_name_,3,'_');
run;
proc sort;
by g sex;
run;
proc print;
run;
data _null_;
set gnames;
by g sex;
if first.sex then put +3 'Array G' g z2. sex $1. '[*] ' #;
put _name_ #;
if last.sex then put ';';
run;
405 data _null_;
406 set gnames;
407 by g sex;
408 if first.sex then put +3 'Array G' g z2. sex $1. '[*] ' #;
409 put _name_ #;
410 if last.sex then put ';';
411 run;
Array G08F[*] G08_HI_F G08_AM_F G08_AS_F G08_HP_F G08_BL_F G08_WH_F G08_TR_F ;
Array G08M[*] G08_HI_M G08_AM_M G08_AS_M G08_HP_M G08_BL_M G08_WH_M G08_TR_M ;
Array G09F[*] G09_HI_F G09_AM_F G09_AS_F G09_HP_F G09_BL_F G09_WH_F G09_TR_F ;
Array G09M[*] G09_HI_M G09_AM_M G09_AS_M G09_HP_M G09_BL_M G09_WH_M G09_TR_M ;
It seems the totals are interspersed between the variables that are to be summed, so we can sum "every variable since the last total that meet some criteria, such as ending in '_F'"?
It can for example be done as below. I used a simplified data set, but the totals are the sum of every variabel since the last total for each gender. I used proc contents to get the variable list. I then go down that list building a sum expression for men and one for females. When a variable named tot is encountered, a finished line of the form tot1_M=sum(var1_M,var2_M,var3_M); is outputed. These lines are collected in the macro variable totals and inserted in a data step.
If you know it's always 7 variables for men, 7 females and then a sum, so you can use simply position and not the name, there is an easier solution below.
data old;
var1_M=1;
var1_F=2;
var2_M=3;
var2_F=4;
var3_M=5;
var3_F=6;
tot1_M=.;
tot1_F=.;
var4_M=7;
var4_F=8;
var5_M=9;
var5_F=10;
var6_M=11;
var6_F=12;
tot2_M=.;
tot2_F=.;
run;
proc contents data=old out=contents noprint;
run;
proc sort data=contents;
by varnum;
run;
data temp;
set contents;
length sumline_F sumline_M $400;
if _n_=1 then do;
sumline_M="sum(";
sumline_F="sum(";
end;
retain sumline_M sumline_F;
if find(name, "_M")>0 and find(name,"tot")=0 then sumline_M=cat(strip(sumline_M),strip(name), ", ");
else if find(name, "_F")>0 and find(name,"tot")=0 then sumline_F=cat(strip(sumline_F), strip(name), ", ");
if find(name,"tot")>0 and find(name,"_M")>0 then do;
sumline_M=substr(sumline_M,1, length(sumline_M)-1);
finline=cat(strip(name), "=", strip(sumline_M),");");
sumline_M="sum(";
end;
if find(name,"tot")>0 and find(name,"_F")>0 then do;
sumline_F=substr(sumline_F,1, length(sumline_F)-1);
finline=cat(strip(name), "=", strip(sumline_F),");");
sumline_F="sum(";
end;
run;
proc sql;
select finline
into :totals separated by " "
from temp
where not missing(finline);
data new;
set old;
&totals;
run;
If the order is always the same (always Male-female), you could go like this:
/* Defining data. Note that _M _F are always alternating, with no variables missing*/
data old;
var1_M=1;
var1_F=2;
var2_M=3;
var2_F=4;
var3_M=5;
var3_F=6;
var4_M=5;
var4_F=6;
var5_M=5;
var5_F=6;
var6_M=5;
var6_F=6;
var7_M=5;
var7_F=6;
tot1_M=.;
tot1_F=.;
var8_M=7;
var8_F=8;
var9_M=9;
var9_F=10;
var10_M=11;
var10_F=12;
var11_M=11;
var11_F=12;
var12_M=11;
var12_F=12;
var13_M=11;
var13_F=12;
var14_M=11;
var14_F=12;
tot2_M=.;
tot2_F=.;
run;
/* We have 7 _M and 7 _F-variables, so the first sum variable is number 15, the next 16. Adding 16 gived us the numbers of the next sum-variables*/
data totals;
do i=15 to 200 by 16;
output;
end;
do i=16 to 200 by 16;
output;
end;
run;
/* Puts the index of the sum variables into a macro variable*/
proc sql;
select i
into :sumvars separated by " "
from totals;
/* Loop variables using an array. If it is a sum variable, it's the sum of the 7 last variables, skipping every other.*/
data new;
set old;
array vars{*} _all_;
do i=1 to dim(vars);
if i in (&sumvars) then do;
vars{i}=sum(vars{i-2}, vars{i-4}, vars{i-6}, vars{i-8}, vars{i-10}, vars{i-12}, vars{i-14});
end;
end;
drop i;
run;
Each iteration in my perl code generates a vector of 5.
Output of first iteration is
out1
1
2
3
4
5
The second iterations generates same length of vector.
out2
10
20
30
40
50
and then it runs for nth time
out n
100
200
300
400
500
I want to merge these columns and have the final output in a tabular format or matrix format if you like:
out1 out2 ... outn
1 10 100
2 20 200
3 30 300
4 40 400
5 50 500
I tried splitting and then using the push but it prints "(101" and only do it once and not for all 20. I also have no idea where the "(101" comes from.
Any suggestions?
First, put all those output lists to a list. Second, iterate on that list: output every first element of each element-list in the first iteration, output every second element of each element-list in the second iteration, and so on.
For example
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my #lists;
for my $i (1..10) {
my #list;
push #list, $_ * $i for (1..5);
push #lists, \#list;
}
$Data::Dumper::Indent = 0;
print Dumper(\#lists), "\n\n";
while (#{$lists[0]}) {
for my $list (#lists) {
print shift #$list, "\t";
}
print "\n";
}
Output:
$ perl t.pl
$VAR1 = [
[1,2,3,4,5],
[2,4,6,8,10],
[3,6,9,12,15],
[4,8,12,16,20],
[5,10,15,20,25],
[6,12,18,24,30],
[7,14,21,28,35],
[8,16,24,32,40],
[9,18,27,36,45],
[10,20,30,40,50]
];
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
Note: The output of Data::Dumper has been edited to make it more compact.
Save your vector information to an array of array as you do your processing. Then you can output the rows using a simple join:
use strict;
use warnings;
my #rows;
for my $i (1..10) {
my #vector = map {$i * $_} (1..5);
push #{$rows[$_]}, $vector[$_] for (0..$#vector);
}
for my $row (#rows) {
print join(" ", map {sprintf "%-3s", $_} #$row), "\n";
}
Outputs:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
Note: It'd be a lot easier to advise if you provided code and actual data.
I have a pretty large matrix M and I am only interested in a few of the columns. I have a boolean vector V where a value of 1 represents a column that is of interest. Example:
-1 -1 -1 7 7 -1 -1 -1 7 7 7
M = -1 -1 7 7 7 -1 -1 7 7 7 7
-1 -1 7 7 7 -1 -1 -1 7 7 -1
V = 0 0 1 1 1 0 0 1 1 1 1
If multiple adjacent values of V are all 1, then I want the corresponding columns of M to be extracted into another matrix. Here's an example, using the matrices from before.
-1 7 7 -1 7 7 7
M1 = 7 7 7 M2 = 7 7 7 7
7 7 7 -1 7 7 -1
How might I do this efficiently? I would like all these portions of the matrix M to be stored in a cell array, or at least have an efficient way to generate them one after the other. Currently I'm doing this in a while loop and it is not as efficient as I'd like it to be.
(Note that my examples only include the values -1 and 7 just for clarity; this isn't the actual data I use.)
You can utilize the diff function for this, to break your V vector into blocks
% find where block differences exist
diffs = diff(V);
% move start index one value forward, as first value in
% diff represents diff between first and second in original vector
startPoints = find(diffs == 1) + 1;
endPoints = find(diffs == -1);
% if the first block begins with the first element diff won't have
% found start
if V(1) == 1
startPoints = [1 startPoints];
end
% if last block lasts until the end of the array, diff won't have found end
if length(startPoints) > length(endPoints)
endPoints(end+1) = length(V);
end
% subset original matrix into cell array with indices
results = cell(size(startPoints));
for c = 1:length(results)
results{c} = M(:,startPoints(c):endPoints(c));
end
The one thing I'm not sure of is if there's a better way to find the being_indices and end_indices.
Code:
X = [1 2 3 4 5 1 2 3 4 5
6 7 8 9 10 6 7 8 9 10
11 12 13 14 15 11 12 13 14 15
16 17 18 19 20 16 17 18 19 20
1 2 3 4 5 1 2 3 4 5
6 7 8 9 10 6 7 8 9 10
11 12 13 14 15 11 12 13 14 15
16 17 18 19 20 16 17 18 19 20];
V = logical([ 1 1 0 0 1 1 1 0 1 1]);
find_indices = find(V);
begin_indices = [find_indices(1) find_indices(find(diff(find_indices) ~= 1)+1)];
end_indices = [find_indices(find(diff(find_indices) ~= 1)) find_indices(end)];
X_truncated = mat2cell(X(:,V),size(X,1),[end_indices-begin_indices]+1);
X_truncated{:}
Output:
ans =
1 2
6 7
11 12
16 17
1 2
6 7
11 12
16 17
ans =
5 1 2
10 6 7
15 11 12
20 16 17
5 1 2
10 6 7
15 11 12
20 16 17
ans =
4 5
9 10
14 15
19 20
4 5
9 10
14 15
19 20
I have a CSV table where I have the merged data for 1024 independent variables and 25 dependent variables that are associated with them. For each independent variable (called 1 .. 1024), I have 10 different outcomes. I would like to
choose the best result for each independent variable, and
pipe the line containing that information into a new CSV file.
It seems like a fairly easy thing to ask of perl, and maybe it would be simple to do with a hash of an array of an array, but I'm still confused about how I could implement something like that for this collection of data.
Current code
I found a very helpful Q&A from 2009 on printing matching lines. It works fairly well after some tinkering, but a few issues remain:
I have to pre-sort the file so that my maximum value is the first value that appears for each case.
I also miss out on getting the best result for the first independent variable and
in some instances I get multiple lines returned to me instead of just the maximum value.
I'm fairly sure there must be an easier way to do this, and I would greatly appreciate any help and/or constructive criticism on my (ripped-off) script.
Thank you!
This is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
unless ($#ARGV == 0) {
print "USAGE: get_best.pl csvfile \n";
exit;
}
### this is a script to get the best "score"
my $input = $ARGV[0];
my $outfile = "bestofthebest.csv";
if (-e $outfile ) {
system "rm $outfile";
}
open(my $fh,'<',"$input") || die "could not open $input"; #try to open input
open (SUMMARY, ">>","$outfile") || die "could not open $outfile"; #open output file for writing
my $this_line = "";
my $do_next = 0;
while (<$fh>) {
chomp($_);
my $last_line = $this_line;
$this_line = $_;
if ($this_line =~ m/Seq/) {
print SUMMARY "$this_line\n";next;
}
my ($compound, $rank, $nnme, $G1, ..., $res1, $res2, $res3, $res4, $res5, $res6 ) = split(/\s+/, $this_line, 26);
my ($compound_old, $rank_old, $nnme_old, $G1_old, ..., $res1_old, $res2_old, $res3_old, $res4_old, $res5_old, $res6_old) = split(/\s+/, $last_line, 26);
foreach ($compound == $compound_old) {
if (($G1 >= $G1_old)){
print SUMMARY "$this_line\n";
print "\n $G1 G1 is >> $G1_old G1_old loop\n";
print "\n compound is $compound G1 is $G1\n";
$do_next = 1;
}
else {
$last_line = "";
$do_next = 0;
}
}
}
close ($fh);
close (SUMMARY);
Example input
This is what the input data looks like (I've left off some columns and rows, obviously)
10 8 3 -18.08 -1.4 -16.68 -15.94 -2.13 -9.45
11 10 4 -15.2 3.2 -18.4 -18.02 2.82 -5
11 5 4 -15.22 2.71 -17.92 -15.88 0.66 -4.51
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
11 4 4 -16.63 0.48 -17.1 -15.75 -0.87 -5.92
11 6 4 -15.21 1.83 -17.04 -18.41 3.21 -7
11 9 4 -15.18 1.82 -17 -16.56 1.38 -7.09
11 8 4 -14.98 1.93 -16.91 -16.78 1.79 -10.81
11 2 4 -18.75 -1.95 -16.8 -17.83 -0.92 -7.35
11 1 4 -19.67 -3.17 -16.5 -16.4 -3.27 -9.01
11 3 4 -16.69 -0.54 -16.14 -16.35 -0.34 -9.17
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 9 4 -19.09 -1.01 -18.08 -16.01 -3.09 -5.56
12 4 4 -19.48 -2.18 -17.3 -16.34 -3.14 -4
12 2 4 -19.86 -2.77 -17.1 -15.97 -3.9 -2.96
12 8 4 -19.49 -2.45 -17.03 -16.39 -3.1 -7.19
12 1 4 -20.28 -3.33 -16.95 -17.12 -3.16 -5.18
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
12 5 4 -19.63 -2.86 -16.77 -16.41 -3.22 -6.54
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
12 10 4 -19.39 -2.95 -16.44 -17.42 -1.97 -7.67
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
13 3 3 -16 2.94 -18.94 -19.24 3.24 -2.78
13 2 3 -13.79 4.9 -18.7 -17.35 3.56 -4.72
13 6 3 -22.08 -3.4 -18.68 -20.12 -1.96 -6.74
13 9 3 -18.98 -0.32 -18.66 -15.97 -3.01 -3.06
13 7 3 -20.4 -2.08 -18.32 -18.24 -2.17 -5.71
13 5 3 -19.94 -1.62 -18.32 -19.42 -0.52 -7.44
13 10 3 -19.26 -1.25 -18.01 -17.52 -1.74 -5.68
13 4 3 -17.75 -1.33 -16.42 -17.75 0 -9.15
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
14 5 3 -21.32 -2.95 -18.37 -18.08 -3.24 -6.03
14 7 3 -24.25 -6.29 -17.96 -18.78 -5.47 -9.21
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
14 4 3 -21.59 -3.93 -17.67 -19.32 -2.28 -6.55
14 1 3 -22.43 -4.79 -17.63 -18.09 -4.34 -5.63
Current Output:
10 2 3 -10.11 8.94 -19.04 -18.48 8.38 -4.09
11 5 4 -15.22 2.71 -17.92 -15.88 0.66 -4.51
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
15 10 4 -21.51 -1.51 -20 -17.63 -3.88 -2.45
16 5 4 -17.81 2.56 -20.37 -19.09 1.28 -1.19
16 2 4 -16.61 1.97 -18.58 -21.06 4.45 -6.47
Perhaps the follow will be helpful:
use strict;
use warnings;
my %hash;
while (<DATA>) {
my ( $indVarID, $val ) = (split)[ 0, 3 ];
$hash{$indVarID} = [ $val, $_ ]
if !exists $hash{$indVarID}
or $hash{$indVarID}[0] < $val;
}
print $hash{$_}[1] for sort { $a <=> $b } keys %hash;
__DATA__
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
11 4 4 -16.63 0.48 -17.1 -15.75 -0.87 -5.92
11 6 4 -15.21 1.83 -17.04 -18.41 3.21 -7
11 9 4 -15.18 1.82 -17 -16.56 1.38 -7.09
11 8 4 -14.98 1.93 -16.91 -16.78 1.79 -10.81
11 2 4 -18.75 -1.95 -16.8 -17.83 -0.92 -7.35
11 1 4 -19.67 -3.17 -16.5 -16.4 -3.27 -9.01
11 3 4 -16.69 -0.54 -16.14 -16.35 -0.34 -9.17
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 9 4 -19.09 -1.01 -18.08 -16.01 -3.09 -5.56
12 4 4 -19.48 -2.18 -17.3 -16.34 -3.14 -4
12 2 4 -19.86 -2.77 -17.1 -15.97 -3.9 -2.96
12 8 4 -19.49 -2.45 -17.03 -16.39 -3.1 -7.19
12 1 4 -20.28 -3.33 -16.95 -17.12 -3.16 -5.18
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
12 5 4 -19.63 -2.86 -16.77 -16.41 -3.22 -6.54
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
12 10 4 -19.39 -2.95 -16.44 -17.42 -1.97 -7.67
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
13 3 3 -16 2.94 -18.94 -19.24 3.24 -2.78
13 2 3 -13.79 4.9 -18.7 -17.35 3.56 -4.72
13 6 3 -22.08 -3.4 -18.68 -20.12 -1.96 -6.74
13 9 3 -18.98 -0.32 -18.66 -15.97 -3.01 -3.06
13 7 3 -20.4 -2.08 -18.32 -18.24 -2.17 -5.71
13 5 3 -19.94 -1.62 -18.32 -19.42 -0.52 -7.44
13 10 3 -19.26 -1.25 -18.01 -17.52 -1.74 -5.68
13 4 3 -17.75 -1.33 -16.42 -17.75 0 -9.15
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
14 5 3 -21.32 -2.95 -18.37 -18.08 -3.24 -6.03
14 7 3 -24.25 -6.29 -17.96 -18.78 -5.47 -9.21
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
14 4 3 -21.59 -3.93 -17.67 -19.32 -2.28 -6.55
14 1 3 -22.43 -4.79 -17.63 -18.09 -4.34 -5.63
Output:
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
This builds a hash of arrays (HoA), where the key is the independent variable ID and the value is a reference to a two-element list. The zeroth element in the list is the value found in the record's fourth column. The first element is the record.
As records are being read, if a new value for an independent variable is greater than the older value (or if there wasn't an older one), the new value and record are stored in the list.
When done, the keys are numerically sorted and the records which contained the greatest value for each independent variable ID are printed.
We have the following case:
Q = [idxcell{:,1}];
Sort = sort(Q,'descend')
Sort =
Columns 1 through 13
23 23 22 22 20 19 18 18 18 18 17 17 17
Columns 14 through 26
15 15 14 14 13 13 13 12 12 12 11 10 9
Columns 27 through 39
9 9 8 8 8 8 8 7 7 7 7 7 7
Columns 40 through 52
7 6 6 6 5 4 4 3 3 3 3 2 2
Columns 53 through 64
2 2 2 2 2 2 2 1 1 1 1 1
How can we sort matrix Sort according to how many times its values are repeated?
Awaiting result should be:
repeatedSort = 2(9) 7(7) 1(5) 8(5) 3(4) 18(4) 6(3) 9(3) 12(3) 13(3) 17(3) 4(2) 14(2) 15(2) 22(2) 23(2) 5(1) 10(1) 11(1) 19(1) 20(1)
or
repeatedSort = 2 7 1 8 3 18 6 9 12 13 17 4 14 15 22 23 5 10 11 19 20
Thank you in advance.
You can use the TABULATE function from the Statistics Toolbox, then call SORTROWS to sort by the frequency.
Example:
x = randi(10, [20 1]); %# random values
t = tabulate(x); %# unique values and counts
t = t(find(t(:,2)),1:2); %# get rid of entries with zero count
t = sortrows(t, -2) %# sort according to frequency
the result, where first column are the unique values, second is their count:
t =
2 4 %# value 2 appeared four times
5 4 %# etc...
1 3
8 3
7 2
9 2
4 1
6 1
Here's one way of doing it:
d = randi(10,1,30); %Some fake data
n = histc(d,1:10);
[y,ii] = sort(n,'descend');
disp(ii) % ii is now sorted according to frequency