I have four arrays of the same dimensions: (10,000, 1). My goal is to average the four arrays into one that has the same dimensions. How can one achieve it the most efficient way?
Edit: Over a year later, the silliness of this question is very apparent. Thank you for those who helped pry off the training wheels. In the immortal words 'Read the documentation!' I trust
You can simply add each array and divide by 4.
Example : arrays A1 A2 A3 A4
Average = (A1 + A2 + A3 + A4)/4
Since the arrays are identical in size, MATLAB will automatically add corresponding elements in the array and provide the accurate result.
Is there another problem you are having with this?
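As a minimal sketch with placeholder data (the names A1 to A4 follow the example above; the random values and the variable names avgSum and avgMean are assumptions for illustration), the element-wise average can be written either as the sum above or by placing the vectors side by side and averaging across columns:

```matlab
% Four placeholder column vectors of size 10000x1 (hypothetical data).
A1 = rand(10000, 1);
A2 = rand(10000, 1);
A3 = rand(10000, 1);
A4 = rand(10000, 1);

% Element-wise average: addition and scalar division act on each element.
avgSum = (A1 + A2 + A3 + A4) / 4;

% Equivalent: stack the vectors as columns, then average across columns.
avgMean = mean([A1, A2, A3, A4], 2);
```

Both results are 10000x1. The second form is convenient when the number of arrays is not fixed.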
Related
I have a big complex single matrix (9040 X 23293).
Because this matrix holds too much data for me, I want to average every n rows. For example, n can be 10, and the new matrix will be 904 X 23293.
I tried to use reshape but it does not work on complex numbers.
I would love to get some help.
Thanks,
Lauren
Reshape works on complex numbers. As you did not share the code, I don't know what the problem is. Anyhow, if the number of rows is not a multiple of n, you can reshape the first fixed_num_rows * n rows and append the average of the remaining rows. A general solution for the given complex matrix m:
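fixed_num_rows = fix(size(m,1)/n);  % number of complete n-row blocks
% Put each block of n consecutive rows along dimension 1, average it, and restore a 2-D result.
means = reshape(mean(reshape(m(1:(fixed_num_rows * n),:), n, fixed_num_rows, []), 1), fixed_num_rows, []);
if fixed_num_rows * n < size(m,1)  % leftover rows: average them column-wise into one extra row
    means = [means; mean(m((fixed_num_rows * n + 1):size(m,1),:), 1)];
end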
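A small worked sketch of block averaging on a complex matrix, to show that reshape handles complex data. The matrix values and the names numBlocks, blocks, and blockMeans are assumptions chosen for illustration, and the row count is a multiple of n to keep it short:

```matlab
% Small complex matrix (hypothetical data): 6 rows, 2 columns.
m = reshape(1:12, 6, 2) + 1i * reshape(1:12, 6, 2);
n = 3;                                   % rows per block

numBlocks = floor(size(m, 1) / n);       % complete blocks only
% Arrange as n x numBlocks x columns so each block sits along dimension 1.
blocks = reshape(m(1:numBlocks*n, :), n, numBlocks, []);
% Average within each block, then restore a numBlocks x columns matrix.
blockMeans = reshape(mean(blocks, 1), numBlocks, []);
% blockMeans is [2+2i, 8+8i; 5+5i, 11+11i]
```

Each output row is the column-wise mean of n consecutive input rows, so the number of columns is preserved.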
I have a structure P with 20 matrices. Each matrix is 53x63x46 double. The names of the matrices are fairly random, for instance S154, S324, S412, etc. Is there any way I can do an average across these matrices without having to type out like this?
M=(P.S154 + P.S324 + P.S412 + ...)/20
Also, does it make sense to use a structure for computation like this? According to this post, perhaps it should be converted to a cell array.
struct2cell(P)
is a cell array each of whose elements is one of your structure fields (the field names are discarded). Then
cell2mat(struct2cell(P))
is the result of concatenating these matrices along the first axis. You might reasonably ask why it does that rather than, say, making a new axis and giving you a 4-dimensional array, but expecting sensible answers to such questions is asking for frustration. Anyway, unless I'm getting the dimensions muddled,
reshape(cell2mat(struct2cell(P)),[53 20 63 46])
will then give you roughly the 4-dimensional array you're after, with the "new" axis being (of course!) number 2. So now
mean(reshape(cell2mat(struct2cell(P)),[53 20 63 46]),2)
will compute the mean along that axis. The result will have shape [53 1 63 46], so now you will need to fix up the axes again:
reshape(mean(reshape(cell2mat(struct2cell(P)),[53 20 63 46]),2),[53 63 46])
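An alternative sketch, not the approach above, that avoids the reshape bookkeeping by stacking the fields along a new fourth dimension with cat. The structure here is a small hypothetical stand-in (three 5x6x4 fields instead of twenty 53x63x46 ones), and the names C, stacked, and M are illustrative:

```matlab
% Hypothetical structure with three small fields standing in for the 20 matrices.
P = struct('S154', rand(5, 6, 4), 'S324', rand(5, 6, 4), 'S412', rand(5, 6, 4));

C = struct2cell(P);           % cell array holding the field values
stacked = cat(4, C{:});       % stack the 3-D arrays along a new 4th dimension
M = mean(stacked, 4);         % average across fields; same size as one field
```

Because the new axis is the last one, the mean comes back with the original 3-D shape and no final reshape is needed.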
Since you are using a structure, each matrix is stored under a field name.
Therefore, you need to:
1 - use the function fieldnames to extract all the matrix names inside your structure. - http://www.mathworks.com/help/matlab/ref/fieldnames.html
2 - then access each matrix through a dynamic field name, like this:
names = fieldnames(P);
matrix1 = P.(names{1})
Using a for loop you can then make your calculations pretty fast!
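A minimal sketch of such a loop, assuming a small hypothetical structure (three 2x3 fields with constant values) in place of the real data. It uses the dynamic field reference P.(name) to read each matrix; the names total and M are illustrative:

```matlab
% Hypothetical structure; the real one would hold 20 matrices of size 53x63x46.
P = struct('S154', ones(2, 3), 'S324', 2 * ones(2, 3), 'S412', 3 * ones(2, 3));

names = fieldnames(P);                  % cell array of field names
total = zeros(size(P.(names{1})));      % accumulator with the size of one matrix
for k = 1:numel(names)
    total = total + P.(names{k});       % dynamic field access
end
M = total / numel(names);               % element-wise average; here all entries are 2
```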
I'm using MATLAB to perform some statistics on some data. I have two 17x206x378 matrices where dimension 1 are subjects from the same group (so 17 subjects in matrix1, 17 in matrix 2). I want to perform ttests so I get 206 p-values. I then want to do this SEPARATELY for each of the 378 elements in the third dimension.
So say u is a 17x206x378 matrix and d is a different 17x206x378 matrix.
I basically started by doing:
[h,p,ci,s] = ttest2(u,d)
Which does in fact give me a p-matrix size 1x206x378 so everything looked great.
Then to do a quick check I just extracted the first of the third dimension elements from each matrix with:
u1=u(:,:,1); d1=d(:,:,1);
and ran ttest2 on this data via what you would expect:
[h1,p1,ci1,s1] = ttest2(u1,d1);
I again got a 1x206 p1-matrix of results but the values are not the same as those in the 1x206x378 p-matrix. When I plot the values in both the p(:,:,1) and the p1 vectors the resulting plots look very similar but not exactly the same.
Obviously, one of these gives results that are significant (below .05) in some instances where the other does not, and I do not want to report a false result. So, two questions:
1) I am under the impression I am doing the ttests on the same data so what exactly is going on here?
2) If I do ultimately want to get 206 p-values for each of the 378 third dimension elements, what is the correct way to do this?
Thanks for your help!
I ran the following code:
u = rand(17,206,378);
d = rand(17,206,378);
u1 = u(:,:,1);
d1 = d(:,:,1);
[h,p,ci,s] = ttest(u,d);
[h1,p1,ci1,s1] = ttest(u1,d1);
sum(abs(p1(1,:)- p(1,:,1)))
And the output was 0, indicating that the corresponding elements of p and p1 are the same. Maybe it's an indexing issue.
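The same check can be sketched with ttest2, the function used in the question, passing the 'Dim' option so the tested dimension is explicit in both calls. This is an illustration on random placeholder data, not a diagnosis of the original discrepancy; maxDiff is an illustrative name, and ttest2 requires the Statistics and Machine Learning Toolbox:

```matlab
% Random placeholder data with the shapes from the question.
u = rand(17, 206, 378);
d = rand(17, 206, 378);

% Two-sample t-test along dimension 1 (subjects) for the whole array at once.
[~, p] = ttest2(u, d, 'Dim', 1);                       % p is 1x206x378

% Same test on the first slice of the third dimension only.
[~, p1] = ttest2(u(:, :, 1), d(:, :, 1), 'Dim', 1);    % p1 is 1x206

% Largest absolute difference between the two sets of p-values.
maxDiff = max(abs(p1 - p(1, :, 1)));
```

If maxDiff is not negligible on your own data, compare the slices being passed in (for example with isequal) before suspecting the test itself.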
I'm working with a fairly large 3D matrix (32x87x378), and I want to be able to extract every Nth element of a matrix, while keeping them in the same order. Similar to a previous question I asked: Matlab: Extracting Nth element of a matrix, while maintaining the original order of matrix
The method I was given was quite practical (and simple) and works well in most instances. For a random (1x20) matrix, where I wanted every 5th value, beginning with 4 and 5 (so that I am left with a 1x8 matrix (ab) of elements 4,5,9,10,14,15,19,20). It is done as follows:
r = rand(1,20);
n = 5;
ab = r(sort([4:n:numel(r) 5:n:numel(r)]))
My question is, how can this method be used on the 3rd dimension of a 3D matrix r (or can it?), such as this:
r = rand(2,5,20);
It should be fairly simple, such as this:
n = 5;
ab = r(sort([4:n:numel(r) 5:n:numel(r)],3));
However, this will then give me a 1x80 matrix, as it does not preserve the original dimensions. Is there a way to correct this using the sort function? I'm also open to other suggestions, but I just want to be sure I am not missing anything.
Thanks in advance.
See if this is what you are after -
ab = r(:,:,sort([4:n:size(r,3) 5:n:size(r,3)]))
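A short sketch confirming the shape of the result for the example sizes in the question. The intermediate variable idx is introduced here only for readability:

```matlab
r = rand(2, 5, 20);
n = 5;

% Indices along the 3rd dimension: 4 5 9 10 14 15 19 20
idx = sort([4:n:size(r, 3), 5:n:size(r, 3)]);

% Keep every row and column, but only the selected pages.
ab = r(:, :, idx);          % size(ab) is [2 5 8]
```

Indexing with colons in the first two positions is what preserves the original dimensions; a single linear index collapses the result to a vector.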
I have production (q) values from 4 different methods stored in the 4 matrices. Each of the 4 matrices contains q values from a different method as:
Matrix_1 = 1 row x 20 column
Matrix_2 = 100 rows x 20 columns
Matrix_3 = 100 rows x 20 columns
Matrix_4 = 100 rows x 20 columns
The number of columns indicate the number of years. 1 row would contain the production values corresponding to the 20 years. Other 99 rows for matrix 2, 3 and 4 are just the different realizations (or simulation runs). So basically the other 99 rows for matrix 2,3 and 4 are repeat cases (but not with exact values because of random numbers).
Consider Matrix_1 as the reference truth (or base case ). Now I want to compare the other 3 matrices with Matrix_1 to see which one among those three matrices (each with 100 repeats) compares best, or closely imitates, with Matrix_1.
How can this be done in Matlab?
I know, manually, that we use confidence interval (CI) by plotting the mean of Matrix_1, and drawing each distribution of mean of Matrix_2, mean of Matrix_3 and mean of Matrix_4. The largest CI among matrix 2, 3 and 4 which contains the reference truth (or mean of Matrix_1) will be the answer.
mean of Matrix_1 = (1 row x 1 column)
mean of Matrix_2 = (100 rows x 1 column)
mean of Matrix_3 = (100 rows x 1 column)
mean of Matrix_4 = (100 rows x 1 column)
I hope the question is clear and relevant to SO. Otherwise please feel free to edit/suggest anything in question. Thanks!
EDIT: My three methods I talked about are a1, a2 and a3 respectively. Here's my result:
ci_a1 =
1.0e+008 *
4.084733001497999
4.097677503988565
ci_a2 =
1.0e+008 *
5.424396063219890
5.586301025525149
ci_a3 =
1.0e+008 *
2.429145282593182
2.838897116739112
p_a1 =
8.094614835195452e-130
p_a2 =
2.824626709966993e-072
p_a3 =
3.054667629953656e-012
h_a1 = 1; h_a2 = 1; h_a3 = 1
None of my CI, from the three methods, includes the mean ( = 3.454992884900722e+008) inside it. So do we still consider p-value to choose the best result?
If I understand correctly, the calculation in MATLAB is pretty straightforward.
Steps 1-2 (mean calculation):
k1_mean = mean(k1);
k2_mean = mean(k2);
k3_mean = mean(k3);
k4_mean = mean(k4);
Step 3, use HIST to plot distribution histograms:
hist([k2_mean; k3_mean; k4_mean]')
Step 4. You can do a t-test comparing your vectors 2, 3 and 4 against a normal distribution with mean k1_mean and unknown variance. See TTEST for details.
[h,p,ci] = ttest(k2_mean,k1_mean);
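Note that mean(k2) on a 100x20 matrix averages down the rows and returns a 1x20 vector (one mean per year). If, as the question's stated dimensions suggest, one mean per realization (100x1) is wanted instead, a sketch under that assumption looks like this. The data are random placeholders, k2_rowMeans is an illustrative name, and ttest requires the Statistics and Machine Learning Toolbox:

```matlab
% Placeholder data with the shapes from the question (hypothetical values).
k1 = rand(1, 20);            % reference: 1 row x 20 years
k2 = rand(100, 20);          % 100 realizations x 20 years

k1_mean = mean(k1);          % scalar reference mean
k2_rowMeans = mean(k2, 2);   % 100x1: one mean per realization

% One-sample t-test of the realization means against the reference mean.
[h, p, ci] = ttest(k2_rowMeans, k1_mean);
```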
EDIT : I misinterpreted your question. See the answer of Yuk and following comments. My answer is what you need if you want to compare distributions of two vectors instead of a vector against a single value. Apparently, the latter is the case here.
Regarding your t-tests, you should keep in mind that they test against a "true" mean. Given the number of values for each matrix and the confidence intervals, it's not too difficult to guess the standard deviation of your results. This is a measure of the "spread" of your results. Now the standard error of your mean is calculated as the standard deviation of your results divided by the square root of the number of observations. And the half-width of the 95% confidence interval is calculated by multiplying that standard error by approximately 2.
This confidence interval contains the true mean in 95% of the cases. So if the true mean is exactly at the border of that interval, the p-value is 0.05; the further away the mean, the lower the p-value. Loosely, the p-value is the probability of seeing means at least this far from the mean of matrix 1 if the values in matrix 2, 3 or 4 really came from a population with that mean. Looking at your p-values, that probability is effectively zero.
So you see that when the number of values gets high, the confidence interval becomes smaller and the t-test becomes very sensitive. What this tells you is nothing more than that the three matrices differ significantly from the reference mean. If you have to choose one, I'd take a look at the distributions anyway. Otherwise the one with the closest mean seems a good guess. If you want to get deeper into this, you could also ask on stats.stackexchange.com
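As a sketch of how such an interval can be computed by hand, using the usual definition of the standard error (sample standard deviation divided by the square root of the number of observations). The sample x is random placeholder data, the names nObs, se, and tCrit are illustrative, and tinv requires the Statistics and Machine Learning Toolbox:

```matlab
x = randn(100, 1) + 5;                 % hypothetical sample of 100 values
nObs = numel(x);

se = std(x) / sqrt(nObs);              % standard error of the mean
tCrit = tinv(0.975, nObs - 1);         % about 1.98 for 99 degrees of freedom
ci = mean(x) + [-1; 1] * tCrit * se;   % two-sided 95% confidence interval
```

Because se shrinks with the square root of the sample size, quadrupling the number of observations halves the width of the interval, which is why large samples make the test so sensitive.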
Your question and your method aren't really clear :
Is the distribution equal in all columns? This is important, as two distributions can have the same mean but differ significantly.
Is there a reason why you don't use the Central Limit Theorem? This seems to me like a very complex way of obtaining a result that can easily be found using the fact that the distribution of a mean approaches a normal distribution where sd(mean) = sd(observations)/sqrt(number of observations). Saves you quite some work, if the distributions are alike!
Now if the question is really the comparison of distributions, you should consider looking at a qqplot for a general idea, and at a 2-sample Kolmogorov-Smirnov test for formal testing. But please read up on this test, as you have to understand what it does in order to interpret the results correctly.
On a side note: if you do this test on multiple cases, make sure you understand the problem of multiple comparisons and use the appropriate correction, e.g. Bonferroni or Dunn-Sidak.
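A rough sketch of a two-sample Kolmogorov-Smirnov test combined with a Bonferroni-adjusted significance level. The samples are random placeholders, the count of three comparisons is an assumption based on the three methods in the question, the names nTests and alphaAdj are illustrative, and kstest2 requires the Statistics and Machine Learning Toolbox:

```matlab
% Hypothetical samples standing in for two sets of realization means.
x2 = randn(100, 1);
x3 = randn(100, 1) + 0.5;

nTests = 3;                          % assumed number of comparisons
alphaAdj = 0.05 / nTests;            % Bonferroni-adjusted significance level

% Two-sample Kolmogorov-Smirnov test at the adjusted level.
% h is 1 if the null hypothesis of equal distributions is rejected.
[h, p] = kstest2(x2, x3, 'Alpha', alphaAdj);
```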