I have a matrix A that contains 24 values for each day of the year (one value for each hour). Each column of A is a different day and each day has 24 rows of data (A is 24-by-365). I want to compare each day to every other day using its hourly data. To do this, I take one column of data and compare it to another: I take the difference of each hour's value in the two columns, then square, sum, and take the square root to get a single value indicating how similar the two days are. I then do this for every possible pair of days, creating a 365-by-365 matrix, d, indicating how similar each day is to every other day. For example, element d(20,100) contains a value indicating how similar the 20th day of the year is to the 100th. The code works, but it is quite slow and I would like to vectorize it. Help would be greatly appreciated.
d = zeros(365);                                  % preallocate the result
for j=1:365
    for k=1:365
        d(j,k)=sqrt(sum((A(:,j)-A(:,k)).^2));    % Euclidean distance between days j and k
    end
end
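Compute the pairwise Euclidean distances with pdist (Statistics and Machine Learning Toolbox), which does the heavy lifting in C and works on the rows of its input (hence the transpose), and use squareform to create the full distance matrix: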
d = squareform(pdist(A.'));
If you need this to be even faster (365-by-365 is not very big though), see my answer here or try this File Exchange program.
You can't beat horchler's answer, but for completeness, here's how this can be done using bsxfun:
d = bsxfun(@minus, permute(A, [2 3 1]), permute(A, [3 2 1]));   % 365-by-1-by-24 minus 1-by-365-by-24
d = sqrt( sum( d.^2, 3 ) );
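A small self-contained check of the bsxfun approach against the original double loop may help when adapting it. The 24-by-10 random matrix is a hypothetical stand-in for the real 24-by-365 data. The permute orders [2 3 1] and [3 2 1] turn A into nDays-by-1-by-24 and 1-by-nDays-by-24 arrays, so bsxfun expands them into an nDays-by-nDays-by-24 array of hourly differences.

```matlab
% Hypothetical test data: 24 hourly values for 10 days (stand-in for 24-by-365)
A = rand(24, 10);
nDays = size(A, 2);

% Reference result from the original double loop
dLoop = zeros(nDays);
for j = 1:nDays
    for k = 1:nDays
        dLoop(j,k) = sqrt(sum((A(:,j) - A(:,k)).^2));
    end
end

% bsxfun version: diffs(j,k,h) = A(h,j) - A(h,k)
diffs = bsxfun(@minus, permute(A, [2 3 1]), permute(A, [3 2 1]));
dVec  = sqrt(sum(diffs.^2, 3));   % sum the squared differences over the 24 hours
```

For 365 days, the intermediate array holds 365*365*24 doubles (roughly 26 MB), which is fine here but grows quadratically with the number of columns.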
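Another nice way of doing this is to use the identity $\|x - y\| = \sqrt{\|x\|^2 - 2\langle x, y\rangle + \|y\|^2}$, which gives all squared distances from the squared column norms and the Gram matrix A'*A. Therefore: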
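n = sum(A.^2, 1);                          % squared norm of each column (day)
b = bsxfun(@plus, n, n') - 2 * (A' * A);   % squared distances
d = sqrt(max(b, 0));                       % clamp tiny negative round-off, then take the root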
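A sketch of the norm/Gram-matrix trick with a spot check against the direct formula, again on hypothetical random data. Expanding the squared norm cancels large terms when two days are nearly identical, so the result can come out slightly negative before the square root; clamping with max(...,0) avoids complex output, and the diagonal may be tiny nonzero values rather than exact zeros.

```matlab
% Hypothetical test data standing in for the 24-by-365 matrix
A = rand(24, 10);

% Squared distances via ||x||^2 - 2<x,y> + ||y||^2
n  = sum(A.^2, 1);                           % 1-by-nDays squared column norms
d2 = bsxfun(@plus, n.', n) - 2 * (A.' * A);  % nDays-by-nDays squared distances
d  = sqrt(max(d2, 0));                       % clamp round-off negatives before sqrt

% Spot-check one entry against the direct formula
j = 3; k = 7;
direct = sqrt(sum((A(:,j) - A(:,k)).^2));
```

This version is usually the fastest of the three because the work is a single matrix product, at the cost of some accuracy for very similar days.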
Related
I'm new to MATLAB and I'm looking for a way to identify blocks of identical dates in one vector and average the corresponding block of data in another vector.
I have a vector consisting of several blocks of dates in the format 'dd-mmm-yyyy'. Blocks of identical dates can have variable length. An example would be
T= ['03-Jan-2013';
'03-Jan-2013';
'03-Jan-2013';
'04-Jan-2013';
'04-Jan-2013';
'05-Jan-2013']
Each date in T corresponds to a data entry in another vector H (for simplicity, identical dates in T have the same value in H here):
H= [1;
1;
1;
5;
5;
6]
The goal is to average the elements of H that correspond to the same date and return modified date and data vectors Tout and Hout, which would look like this:
Tout=['03-Jan-2013';
'04-Jan-2013';
'05-Jan-2013']
and
Hout=[1;
5;
6]
where Hout represents the averaged values.
Both vectors are read from a text file and can have about 100k elements, so looping is probably not the best approach.
I appreciate any help!
Use unique to get the unique dates and, from its third output, the group index of each row of T; then use accumarray to average H over the rows that share a date:
[Tout,~,n] = unique(T, 'rows');
Hout = accumarray(n, H, [], @mean);
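A self-contained version using the example data from the question. The extra accumarray call with a constant 1 counts how many rows share each date, which is often useful alongside the average.

```matlab
% Example data from the question
T = ['03-Jan-2013'; '03-Jan-2013'; '03-Jan-2013'; ...
     '04-Jan-2013'; '04-Jan-2013'; '05-Jan-2013'];
H = [1; 1; 1; 5; 5; 6];

% n(i) is the index of T(i,:) within the unique rows Tout
[Tout, ~, n] = unique(T, 'rows');

% Average H within each group, and count the rows in each group
Hout   = accumarray(n, H, [], @mean);
counts = accumarray(n, 1);
```

Note that unique sorts the rows as character strings, so dates spanning several months will not come out in chronological order ('01-Feb-2013' sorts before '02-Jan-2013'). Converting T with datenum(T, 'dd-mmm-yyyy') and calling unique on the numeric result avoids that.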
I have two vectors of values and I want to compare them statistically. For simplicity, assume A = [2 3 5 10 15] and B = [2.5 3.1 4.8 10 18]. I want to compute the standard deviation, the root mean square error (RMSE) and the mean, and present the results conveniently, maybe as a histogram. Can you please help me do this in a way I understand? I know the question is probably simple, but I am new to this. Many thanks!
Edit: this is how I wanted to implement the RMSE:
dt = 1;
for k=1:numel(A)
    err(k)=sqrt(sum(A(1,1:k)-B(1,1:k))^2/k);
    t(k) = dt*k;
end
However, it gives me larger values than I expect, since, e.g., 3 and 3.1 differ by only 0.1.
This is how I calculate the error between the reference value of each cycle and the corresponding estimate for that cycle:
abs_err = A-B;
Can you tell me whether I am doing this right, or what is wrong?
The way you are looping through the vectors is not element by element but rather by increasing the vector length, that is, you are comparing the following at each iteration:
         A(1,1:k)   B(1,1:k)
k = 1    [2]        [2.5]
k = 2    [2 3]      [2.5 3.1]
k = 3    [2 3 5]    [2.5 3.1 4.8]
...
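At no point do you compare 3 and 3.1 on their own! Note also that sum(...)^2 squares the sum of the differences instead of summing the squared differences, so positive and negative errors can cancel before squaring.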
Assuming A and B are vectors of identical length (both row or both column vectors), you want std(A-B) and mean(A-B). On the MATLAB File Exchange you will find a user-contributed rmse(A-B), but you can also compute the RMSE directly as sqrt(mean((A-B).^2)). To display a histogram, try hist(A-B).
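A minimal sketch with the numbers from the question. It also checks the standard identity linking the three statistics: the squared RMSE equals the squared mean error plus the (1/n-normalised) variance, so a large RMSE can come from a bias, from spread, or both. The variable is named rmseAB so it does not shadow any rmse function on the path.

```matlab
% Data from the question
A = [2 3 5 10 15];
B = [2.5 3.1 4.8 10 18];

e      = A - B;              % per-element errors
mu     = mean(e);            % mean error (bias)
sd     = std(e);             % standard deviation of the errors (n-1 normalisation)
rmseAB = sqrt(mean(e.^2));   % root mean square error

% RMSE combines bias and spread: rmse^2 = mu^2 + sd^2*(n-1)/n
n = numel(e);
check = mu^2 + sd^2 * (n-1)/n;
```

Here most of the RMSE (about 1.36) comes from the single large error at the last element (15 vs 18); hist(e) shows that outlier clearly.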
In your case, computing the statistics over the growing prefixes as your loop does:
dt = 1;
for k=1:numel(A)
    stdab(k) = std(A(1,1:k)-B(1,1:k));
    meanab(k) = mean(A(1,1:k)-B(1,1:k));
    err(k)=sqrt(mean((A(1,1:k)-B(1,1:k)).^2));
    t(k) = dt*k;
end
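If the prefix statistics are really what you want, the loop can be replaced by running sums: cumsum of the errors and of the squared errors gives every prefix mean, RMSE and standard deviation at once. This is a sketch assuming the same A and B as above; the one-pass variance formula used for stdab can lose precision when the mean is large compared with the spread.

```matlab
% Data from the question
A = [2 3 5 10 15];
B = [2.5 3.1 4.8 10 18];
e = A - B;                 % element-wise errors
k = 1:numel(e);            % prefix lengths 1, 2, ..., n
dt = 1;
t = dt * k;                % time axis, as in the loop

% Running sums give every prefix statistic in one vectorized step
S1 = cumsum(e);            % sum of the first k errors
S2 = cumsum(e.^2);         % sum of the first k squared errors
meanab = S1 ./ k;          % running mean, same as mean(e(1:k))
err    = sqrt(S2 ./ k);    % running RMSE, same as sqrt(mean(e(1:k).^2))
% Running std with n-1 normalisation; clamp round-off, and std of one value is 0
stdab  = sqrt(max(S2 - k .* meanab.^2, 0) ./ max(k - 1, 1));
```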
You can also include hist(A(1,1:k)-B(1,1:k)) in the loop if you want to compute histograms for every vector pair difference A(1,1:k)-B(1,1:k).
I am facing a problem and I would be grateful to anyone that could help. The problem is the following:
Consider that we have a vector D = [D1;D2;D3;...;DN] and a set of time instances TI = {t1,t2,t3,...,tM}. Each element of vector D, Di, corresponds to a subset of TI. For example D1 could correspond to time instances {t1,t2,t3} and D2 to {t2,t4,t5}.
I would like to find the combination of elements of D that covers all elements of TI, with no time instance covered more than once, and that at the same time minimizes the cost sum(Dj), where the Dj are the selected elements of D (each corresponding to a set of time instances).
Let me give an example. Let us consider a vector
D = [15;10;5;2;35;15;25;25;25;30;45;5;1;40]
and a set
TI={5,10,15,20,25,30}
Each of D elements corresponds to
{[5 15];[5 20];[5 25];[5 30];[5 15 20];[5 20 25];[5 15 30];[5 20 25 30];[10 15];[10 20];[10 25];[10 15 20];[10 15 20 25];[10 30]}
respectively, e.g. D(1)=15 corresponds to time instance [5 15].
The solution that the procedure has to come up with is the combination of D(4) and D(13), i.e. 2 and 1 respectively, which has the minimum sum and covers all time instances.
I have to mention that the procedure has to be able to work with large vectors.
Thanks for every attempt to help!
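Let the binary vector x select elements of D: x_j = 1 if D_j is chosen, else 0.
Let f = [D1;D2;...;DN].
Let A be the M-by-N binary matrix whose column j, A_j, marks the time instances of D_j: A(k,j) is 1 if D_j corresponds to t_k, else 0.
The problem is:
$\min_x f^\top x$ subject to $A x = \mathbf{1}$, $x \in \{0,1\}^N$,
so that every time instance is covered exactly once.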
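Then use bintprog to solve it. (bintprog is not available in recent MATLAB releases; there, intlinprog(f, 1:N, [], [], A, ones(M,1), zeros(N,1), ones(N,1)) solves the same problem with the variables restricted to integers between 0 and 1.)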
x = bintprog(f,[],[],A,ones(M,1))
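A sketch of how the constraint matrix A can be built from the question's example, followed by a brute-force check over all 2^N selections. The brute force is only meant to validate the formulation on this 14-element example (16384 candidates); for large vectors the integer-programming solver is the right tool. The cell array S is just the question's list of sets.

```matlab
% Example data from the question
D  = [15;10;5;2;35;15;25;25;25;30;45;5;1;40];
TI = [5 10 15 20 25 30];
S  = {[5 15];[5 20];[5 25];[5 30];[5 15 20];[5 20 25];[5 15 30]; ...
      [5 20 25 30];[10 15];[10 20];[10 25];[10 15 20];[10 15 20 25];[10 30]};
N = numel(D);  M = numel(TI);

% Build the M-by-N constraint matrix: A(k,j) = 1 if set j contains TI(k)
A = zeros(M, N);
for j = 1:N
    A(:, j) = ismember(TI, S{j}).';
end

% Brute force over all 2^N selections (only viable for tiny N)
X = dec2bin(0:2^N-1, N) - '0';      % each row is one candidate x'
feasible = all(X * A.' == 1, 2);    % exact cover: every instant covered once
cost = X * D;                       % total cost of each selection
cost(~feasible) = Inf;
[bestCost, bestRow] = min(cost);
chosen = find(X(bestRow, :));       % indices of the selected elements of D
```

The result selects D(4) and D(13) with total cost 3, matching the expected answer.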
I have 19 cells (19x1) with temperature data for an entire year, where the first 18 cells represent 20 days each and the last cell represents 5 days, hence (18*20)+5 = 365 days.
Each cell should contain 7200 measurements (apart from cell 19): one measurement is taken every 4 minutes, giving 360 measurements per day (360*20 = 7200).
The time vector for the measurements is expressed only as a day number, i.e. 1, 2, 3, and so on (no fractional days), so it appears as 360 ones, then 360 twos, and so on.
Because the sensor failed on some days, some of the cells contain fewer than 7200 measurements; one in particular contains only 858 rows and looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
where column 1 is the day and columns 2 and 3 are the data.
Knowing that each day number should be repeated 360 times, is there a way to add the missing rows for every day from 1 to 20 so that each day appears 360 times? For example, the first column requires 79 x 1's, 46 x 2's, 360 x 3's, and so on; the final array should therefore have 7200 values in order from 1 to 20.
If this is possible, the second and third columns of the added rows should be set to NaN.
I realise that this is an unusual question and that it is difficult to understand what is being asked, but I hope I have clearly expressed what I'm attempting to achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell array:
full = nan(7200, 3);          % every data value starts as NaN (note: this name shadows the built-in full)
for i = 1:20                                   % for each day
    starti = (i-1)*360;                        % offset of this day's 360-row block in full
    full( starti + (1:360), 1 ) = i;           % assign the day
    idx = find(a(:,1)==i);                     % find any matching data in a for that day
    full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3);   % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update: to do this for each element of the cell array, the only change is that instead of using i as the day number, you calculate the day from how far along the cell array you are. You'd do something like this (untested):
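result = cell(size(cellarray));
for k = 1:length(cellarray)
    a = cellarray{k};                        % data for this cell
    nDays = min(20, 365 - (k-1)*20);         % 20 days per cell, 5 in the last one
    full = nan(nDays*360, 3);
    for i = 1:nDays
        starti = (i-1)*360;                  % ... as before
        day = (k-1)*20 + i;                  % first cell is days 1-20, second is 21-40,...
        full( starti + (1:360),1 ) = day;    % <-- replace i with day
        idx = find(a(:,1)==day);             % <-- replace i with day
        full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3);   % same as before
    end
    result{k} = full;                        % padded version of cell k
end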
I am not sure I understood correctly what you want to do, but the code below works out how many measurements are missing for each day and appends that many rows at the bottom of your a matrix, so that you get the full 7200x3 matrix.
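nbMissing = 7200 - size(a,1);               % total number of missing rows
a1 = nan(nbMissing, 3);                      % filler rows with NaN data
l = 0;                                       % rows of a1 filled so far
for i = 1:20
    nbMissing_i = 360 - sum(a(:,1) == i);    % missing rows for day i
    a1(l+1:l+nbMissing_i, 1) = i;            % label them with the day
    l = l + nbMissing_i;
end
a_filled = [a; a1];
a_filled = sortrows(a_filled, 1);            % optional: put the rows in day order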
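A loop-free variant of the same idea, as a sketch: accumarray counts the rows present for each day, repelem (MATLAB R2015a or later) generates the missing day labels, and sortrows groups everything by day. It assumes, as in the question, that each cell holds days 1 to 20 and that no day has more than 360 rows.

```matlab
% Example data from the question: column 1 = day (1..20), columns 2:3 = data
a = rand(858, 3);
a(1:281,1)   = 1;
a(281:327,1) = 2;
a(327:328,1) = 5;
a(329:330,1) = 9;
a(331:498,1) = 19;
a(499:858,1) = 20;

nDays  = 20;                                  % days per cell (assumed)
perDay = 360;                                 % measurements per day

counts  = accumarray(a(:,1), 1, [nDays 1]);   % rows present for each day
missing = perDay - counts;                    % rows to add for each day
filler  = [repelem((1:nDays).', missing), nan(sum(missing), 2)];
a_filled = sortrows([a; filler], 1);          % group the rows by day
```

Because sortrows is stable, the measured rows of each day stay ahead of that day's NaN filler rows.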
I'm desperately trying to avoid a for loop in Matlab, but I cannot figure out how to do it. Here's the situation:
I have two m x n matrices A and B and two vectors v and w of length d. I want to outer multiply A and v so that I get an m x n x d matrix where the (i,j,k) entry is A_(i,j) * v_k, and similarly for B and w.
Afterward, I want to add the resulting m x n x d matrices, and then take the mean along the last dimension to get back an m x n matrix.
I'm pretty sure I could handle the latter part, but the first part has me completely stuck. I tried using bsxfun to no avail. Anyone know an efficient way to do this? Thanks very much!
EDIT: This revision comes after the three great answers below. gnovice has the best answer to the question I asked, without a doubt. However, the question I meant to ask involves squaring each entry before taking the mean; I forgot to mention this part originally. Because of this, both of the other answers work well, but the clever trick of doing algebra before coding doesn't help this time. Thanks for the help, everyone!
EDIT:
Even though the problem in the question has been updated, an algebraic approach can still be used to simplify matters. You still don't have to bother with 3-D matrices. Your result is just going to be this:
output = mean(v.^2).*A.^2 + 2.*mean(v.*w).*A.*B + mean(w.^2).*B.^2;
If your matrices and vectors are large, this solution will give you much better performance due to the reduced amount of memory required as compared to solutions using BSXFUN or REPMAT.
Explanation:
Assuming M is the m-by-n-by-d matrix that you get as a result before taking the mean along the third dimension, this is what a span along the third dimension will contain:
M(i,j,:) = A(i,j).*v + B(i,j).*w;
In other words, it is the vector v scaled by A(i,j) plus the vector w scaled by B(i,j). Applying element-wise squaring gives:
M(i,j,:).^2 = (A(i,j).*v + B(i,j).*w).^2;
            = (A(i,j).*v).^2 + ...
              2.*A(i,j).*B(i,j).*v.*w + ...
              (B(i,j).*w).^2;
Now, when you take the mean across the third dimension, the result for each element output(i,j) will be the following:
output(i,j) = mean(M(i,j,:).^2);
            = mean((A(i,j).*v).^2 + ...
                   2.*A(i,j).*B(i,j).*v.*w + ...
                   (B(i,j).*w).^2);
            = sum((A(i,j).*v).^2 + ...
                  2.*A(i,j).*B(i,j).*v.*w + ...
                  (B(i,j).*w).^2)/d;
            = sum((A(i,j).*v).^2)/d + ...
              sum(2.*A(i,j).*B(i,j).*v.*w)/d + ...
              sum((B(i,j).*w).^2)/d;
            = A(i,j).^2.*mean(v.^2) + ...
              2.*A(i,j).*B(i,j).*mean(v.*w) + ...
              B(i,j).^2.*mean(w.^2);
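A quick numerical check that the closed form matches the explicit 3-D computation, on small hypothetical sizes and random data. The explicit version builds the full m-by-n-by-d array; the closed form only needs three scalar means.

```matlab
% Hypothetical sizes and random data for the check
m = 4; n = 3; d = 5;
A = rand(m, n);  B = rand(m, n);
v = rand(d, 1);  w = rand(d, 1);

% Explicit m-by-n-by-d array: M(i,j,k) = A(i,j)*v(k) + B(i,j)*w(k)
M = bsxfun(@times, A, reshape(v, 1, 1, [])) + bsxfun(@times, B, reshape(w, 1, 1, []));
explicit = mean(M.^2, 3);

% Closed form from the derivation: no 3-D array needed
output = mean(v.^2).*A.^2 + 2.*mean(v.*w).*A.*B + mean(w.^2).*B.^2;
```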
Try reshaping the vectors v and w to be 1 x 1 x d:
mean(bsxfun(@times, A, reshape(v, 1, 1, [])) ...
     + bsxfun(@times, B, reshape(w, 1, 1, [])), 3)
Here I am passing [] as a size argument to reshape so that it computes that dimension automatically from the total number of elements in the vector and the sizes of the other dimensions.
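For the question as originally asked (no squaring), the mean is linear, so the bsxfun result collapses to mean(v)*A + mean(w)*B; this is the algebraic shortcut the edited question refers to. A sketch checking that on hypothetical random data, using row vectors for v and w to show that reshape(v, 1, 1, []) does not care about the vectors' orientation:

```matlab
% Hypothetical sizes and random data
m = 4; n = 3; d = 5;
A = rand(m, n);  B = rand(m, n);
v = rand(1, d);  w = rand(1, d);    % row vectors work too

viaBsxfun = mean(bsxfun(@times, A, reshape(v, 1, 1, [])) ...
               + bsxfun(@times, B, reshape(w, 1, 1, [])), 3);

% By linearity of the mean, the un-squared version needs no 3-D array
viaAlgebra = mean(v).*A + mean(w).*B;
```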
Use repmat to tile the matrix in the third dimension.
A =
1 2 3
4 5 6
>> repmat(A, [1 1 10])
ans(:,:,1) =
1 2 3
4 5 6
ans(:,:,2) =
1 2 3
4 5 6
etc.
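Carrying the repmat idea through the whole computation means tiling A and B along the third dimension and the reshaped v and w along the first two, then multiplying element-wise. A sketch with the 2-by-3 A shown above and hypothetical random B, v and w, checked against the closed-form expression from gnovice's answer:

```matlab
m = 2; n = 3; d = 10;
A = [1 2 3; 4 5 6];
B = rand(m, n);
v = rand(d, 1);  w = rand(d, 1);

% Tile A and B along the third dimension, and v, w along the first two
At = repmat(A, [1 1 d]);
Bt = repmat(B, [1 1 d]);
Vt = repmat(reshape(v, 1, 1, d), [m n 1]);
Wt = repmat(reshape(w, 1, 1, d), [m n 1]);

M = At .* Vt + Bt .* Wt;          % m-by-n-by-d, M(i,j,k) = A(i,j)*v(k) + B(i,j)*w(k)
output = mean(M.^2, 3);           % square, then average over the third dimension
```

This materialises four full m-by-n-by-d arrays, so it uses more memory than the bsxfun version; it is mainly useful on old MATLAB releases without bsxfun.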
You still don't have to resort to any explicit loops, or indirect looping using bsxfun et al., for your updated requirements. You can achieve what you want with a simple vectorized solution as follows:
output = reshape(mean((v(:)*A(:)'+w(:)*B(:)').^2),size(A));
Since the OP only says that v and w are vectors of length d, the above solution should work for both row and column vectors. If they are known to be column vectors, v(:) can be replaced by v, and likewise for w.
You can check whether this matches Lambdageek's answer (modified to square the terms) as follows:
outputLG = mean((bsxfun(@times, A, reshape(v, 1, 1, [])) ...
                + bsxfun(@times, B, reshape(w, 1, 1, []))).^2, 3);
isequal(output,outputLG)
ans =
1