Say I have this sample data
A =
1.0000 6.0000 180.0000 12.0000
1.0000 5.9200 190.0000 11.0000
1.0000 5.5800 170.0000 12.0000
1.0000 5.9200 165.0000 10.0000
2.0000 5.0000 100.0000 6.0000
2.0000 5.5000 150.0000 8.0000
2.0000 5.4200 130.0000 7.0000
2.0000 5.7500 150.0000 9.0000
I wish to calculate the variance of each column, grouped by class (the first column).
I have this working with the following code, but it uses hard-coded indices, which requires knowing how many samples each class has and assumes they appear in a specific order.
Is there a better way to do this?
variances = zeros(2,4);
variances = [1.0 var(A(1:4,2)), var(A(1:4,3)), var(A(1:4,4));
2.0 var(A(5:8,2)), var(A(5:8,3)), var(A(5:8,4))];
disp(variances);
1.0 3.5033e-02 1.2292e+02 9.1667e-01
2.0 9.7225e-02 5.5833e+02 1.6667e+00
Separate the class labels and the data into different variables.
cls = A(:, 1);
data = A(:, 2:end);
Get the list of class labels
labels = unique(cls);
Compute the variances
variances = zeros(length(labels), size(data, 2)); % pre-allocate: one row per class
for i = 1:length(labels)
variances(i, :) = var(data(cls == labels(i), :)); % note the use of logical indexing
end
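If you are on a release with table support (R2013b or newer), varfun can also do the grouping in a single call; here is a minimal sketch, assuming A is the matrix from the question (the variable names v1, v2, v3 are just placeholders):
T = array2table(A, 'VariableNames', {'class','v1','v2','v3'}); % label the columns
V = varfun(@var, T, 'GroupingVariables', 'class') % one row per class: class, GroupCount, var_v1, var_v2, var_v3
The result is a table rather than a numeric matrix, but it keeps the class labels attached to the variances.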
I've done a fair bit of this type of work over the years, but to judge "better" versus "best" it would help to know what you expect to change in the data set or its structure.
Otherwise, if no change is anticipated and the hard-coded version works, stick with it.
Easy peasy: use consolidator, which is on the MATLAB File Exchange.
A = [1.0000 6.0000 180.0000 12.0000
1.0000 5.9200 190.0000 11.0000
1.0000 5.5800 170.0000 12.0000
1.0000 5.9200 165.0000 10.0000
2.0000 5.0000 100.0000 6.0000
2.0000 5.5000 150.0000 8.0000
2.0000 5.4200 130.0000 7.0000
2.0000 5.7500 150.0000 9.0000];
[C1,var234] = consolidator(A(:,1),A(:,2:4),@var)
C1 =
1
2
var234 =
0.035033 122.92 0.91667
0.097225 558.33 1.6667
We can test the variances produced, since we know the grouping.
var(A(1:4,2:4))
ans =
0.035033 122.92 0.91667
var(A(5:8,2:4))
ans =
0.097225 558.33 1.6667
It is efficient too.
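If you would rather stick to built-in functions instead of a File Exchange download, accumarray can produce the same grouped variances; a minimal sketch, assuming A is defined as above:
[C1, ~, g] = unique(A(:,1)); % class labels and a group index for every row
getvar = @(col) accumarray(g, A(:,col), [], @var); % per-class variance of one column
var234 = [getvar(2) getvar(3) getvar(4)]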
Related
I have cell arrays A and B containing vectors of different lengths and values.
A={1:0.5:5;1:0.5:2};
B={1:0.5:6;1:0.5:9};
C= [A;B];
This combines the cell arrays into a single cell array C, which then looks like this:
C =
4×1 cell array
{1×9 double}
{1×3 double}
{1×11 double}
{1×17 double}
Then, I want to save this into a text file, that should look like this:
1.0000 1.5000 2.0000 2.5000 3.0000 3.5000 4.0000 4.5000 5.0000
1.0000 1.5000 2.0000
1.0000 1.5000 2.0000 2.5000 3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000
1.0000 1.5000 2.0000 2.5000 3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000 6.5000 7.0000 7.5000 8.0000 8.5000 9.0000
So far, I have only found code for text or same size arrays. This is my attempt, which doesn't work:
fid = open('filename.txt', 'wt');
fprintf(fid, '%f',C{:})
close(fid)
I believe the problem is partly in the format you're specifying for fprintf: with only '%f' and no separator, all the numbers run together with nothing between them. (Note also that open and close should be fopen and fclose.)
One way to do this would then be:
fid = fopen('filename.txt', 'wt');
for i = 1:length(C)
fmt = repmat('%f ',size(C{i})); % this only adds one whitespace in between numbers
fmt = [fmt,'\n']; % remember to add a new line
fprintf(fid,fmt,C{i});
end
fclose(fid);
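If you want the file to look exactly like the sample above (four decimal places), a fixed-width format works just as well; a minimal sketch, assuming C has been built as in the question:
fid = fopen('filename.txt', 'wt');
for i = 1:numel(C)
fprintf(fid, '%.4f ', C{i}); % four decimals, one space between numbers
fprintf(fid, '\n'); % end the line after each cell
end
fclose(fid);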
I want to convert an n*n matrix into its respective row matrix (a row vector) in MATLAB. How can I achieve this?
For example assume the original matrix is
7.0000 26.0000 6.0000 60.0000 78.5000
1.0000 29.0000 15.0000 52.0000 74.3000
11.0000 56.0000 8.0000 20.0000 104.3000
and I want to get the output as
7.0000 26.0000 6.0000 60.0000 78.5000 1.0000 29.0000 15.0000
52.0000 74.3000 11.0000 56.0000 8.0000 20.0000 104.3000
which is the row matrix.
As you want to turn the matrix into a vector, reshape might be a bit overkill, as you can just use linear indexing.
A = randi(10,5,5); %Create some matrix
B=A.'; %SLOW
B = B(:).'; %matrix -> vector conversion
On the other hand, the speed of the matrix-to-vector conversion does not really matter, as it is the initial transpose that is slow, and you will need that for any method (see e.g. Phil's answer).
Easiest solution is:
Anew = reshape(Aold',1,numel(Aold));
Of particular importance is that you need to use the transpose of Aold.
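To see why the transpose matters, compare the two orderings on the matrix from the question; reshape works column by column, so without the transpose you get column-major order:
A = [7 26 6 60 78.5; 1 29 15 52 74.3; 11 56 8 20 104.3];
reshape(A', 1, numel(A)) % 7 26 6 60 78.5 1 29 ... (rows kept together, as requested)
reshape(A, 1, numel(A))  % 7 1 11 26 29 56 ...    (column-major order)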
I think what you want is:
reshape(A, 1, []);
where A is your matrix. For example:
A = rand(5,5);
b = reshape(A, 1, []);
will give you a 1x25 matrix.
Suppose your original matrix is A (which has 15 elements)
A = [7.0000 26.0000 6.0000 60.0000 78.5000;
1.0000 29.0000 15.0000 52.0000 74.3000;
11.0000 56.0000 8.0000 20.0000 104.3000]
Now what you need is to reshape A to become a row vector.
reshape(A,1,[]) % 1 means you want one row; [] lets MATLAB work out the number of columns
If you want a column vector instead, you can use the following:
reshape(A,[],1) % this gives a column vector
For your example, the following code will do the job:
A = [7.0000 26.0000 6.0000 60.0000 78.5000;
1.0000 29.0000 15.0000 52.0000 74.3000;
11.0000 56.0000 8.0000 20.0000 104.3000];
reshape(A,1,[])
The output will be
ans =
7.0000 1.0000 11.0000 26.0000 29.0000 56.0000 6.0000 15.0000 8.0000 60.0000 52.0000 20.0000 78.5000 74.3000 104.3000
For detailed information, type the following in the command window
help reshape
Hi, I have data in MATLAB like this:
F =
1.0000 1.0000
2.0000 1.0000
3.0000 1.0000
3.1416 9.0000
4.0000 1.0000
5.0000 1.0000
6.0000 1.0000
6.2832 9.0000
7.0000 1.0000
8.0000 1.0000
9.0000 1.0000
9.4248 9.0000
10.0000 1.0000
I am looking for a way to sum the data over specific intervals. For example, if I want my sampling interval to be 1, then the end result should be:
F =
1.0000 1.0000
2.0000 1.0000
3.0000 10.0000
4.0000 1.0000
5.0000 1.0000
6.0000 10.0000
7.0000 1.0000
8.0000 1.0000
9.0000 10.0000
10.0000 1.0000
i.e. the data in the second column is accumulated based on sampling the first column. Is there a function in MATLAB to do this?
Yes, by combining histc() and accumarray():
F =[1.0000 1.0000;...
2.0000 1.0000;...
3.0000 1.0000;...
3.1416 9.0000;...
4.0000 1.0000;...
5.0000 1.0000;...
6.0000 1.0000;...
6.2832 9.0000;...
7.0000 1.0000;...
8.0000 1.0000;...
9.0000 1.0000;...
9.4248 9.0000;...
10.0000 1.0000];
range=1:0.5:10;
[~,bin]=histc(F(:,1),range);
result= [range.' accumarray(bin,F(:,2),[])]
If you run the code, keep in mind that I changed the sampling interval (range) to 0.5.
This code works for any sampling interval; just define the interval you want as range.
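For the sampling interval of 1 from the question, the same pattern reproduces the expected output; a quick check, assuming F is defined as above:
range = 1:1:10;
[~, bin] = histc(F(:,1), range);
result = [range.' accumarray(bin, F(:,2), [numel(range) 1])]
Rows 3, 6 and 9 of result come out as 10 and the rest as 1, matching the expected output.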
Yes and that's a job for accumarray:
Use the values in column 1 of F to sum (default behavior of accumarray) the elements in the 2nd column.
For a given interval of size s (Thanks to Luis Mendo for that):
S = accumarray(round(F(:,1)/s),F(:,2),[]); %// or you can use "floor" instead of "round".
S =
1
1
10
1
1
10
1
1
10
1
So constructing the output by concatenation:
NewF = [unique(round(F(:,1)/s)) S]
NewF =
1 1
2 1
3 10
4 1
5 1
6 10
7 1
8 1
9 10
10 1
Yay!!
I have a simple problem, and as I am quite new to MATLAB I am having trouble implementing it. I have two 64x2 matrices, u and h. I have to check whether a given row in u is not equal to any of the rows in h; each row that has no match should be saved in a separate matrix. I have written the code below, but when it runs, r(i,:) gets all the values of u(i,:); what I want is that only those rows of u that do not match any row of the h matrix are stored in r.
h=[];
for j=1:8
for i=1:8
h=[h; i j];
end
end
u=[5.3,1.4;6,8;2,3;3,5.5;2.6,8;3.7,2;4,2;5,3;1.9,8;5.4,4;3.2,3;2,2;2,4;2,3;8,2.2;8,4;7.3,1.5;6.2,5.1;2.4,1.5;3,5;2,7.1;1.8,2.7;3,4;6,5;6,1;5,4;4,6;3.5,2;5,7;7.2,8;7,7;5,5;6,3;6,6;1,2;5,8;3,5;1,5;2,2;2,1;6,3;4,7;6,8;3,6;1,6;5,2;3,5;8,7;8,4;4,8;1,1;6,3;7,5;8,1;1,6;4,5;5,5;6,7;6,7;6,7;6,3;3,4;5,7;1,1]
for i=1
for j=1:64
if u(i,:)==h(j,:)
c=1
else
c=0
if c==0
r(i,:)=u(i,:)
end
end
end
end
Can anyone help me, please?
You can do it in one line with ismember:
r = u(~ismember(u,h,'rows'),:);
With your example data, the result is
>> r
r =
5.3000 1.4000
3.0000 5.5000
2.6000 8.0000
3.7000 2.0000
1.9000 8.0000
5.4000 4.0000
3.2000 3.0000
8.0000 2.2000
7.3000 1.5000
6.2000 5.1000
2.4000 1.5000
2.0000 7.1000
1.8000 2.7000
3.5000 2.0000
7.2000 8.0000
Use setdiff with the 'rows' option to compute r. Please avoid unnecessary loops, and pre-allocate when possible.
% construct h without loop
[h{1} h{2}]=ndgrid(1:8,1:8);
h=[h{1}(:) h{2}(:)];
% get r using setdiff
r = setdiff( u, h, 'rows')
The result is:
r =
1.8000 2.7000
1.9000 8.0000
2.0000 7.1000
2.4000 1.5000
2.6000 8.0000
3.0000 5.5000
3.2000 3.0000
3.5000 2.0000
3.7000 2.0000
5.3000 1.4000
5.4000 4.0000
6.2000 5.1000
7.2000 8.0000
7.3000 1.5000
8.0000 2.2000
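Note that setdiff returns the rows sorted and with duplicates removed, whereas the ismember approach keeps the original order of u (and would keep repeated non-matching rows); a small check, assuming u and h from the question:
r1 = u(~ismember(u, h, 'rows'), :); % original order of u
r2 = setdiff(u, h, 'rows'); % sorted, unique rows
isequal(sortrows(r1), r2) % true here, since the non-matching rows happen to be unique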
A solution to your question with N log N complexity (N = 64):
N=size(h,1);
[husorted,origin_husorted,destination_hu]=unique([h;u],'rows','first');
iduplicates=ismember(destination_hu(N+1:end),destination_hu(1:N)); % rows of u that also appear in h
r=u;
r(iduplicates,:)=0;
destination_hu is the only output of unique that is actually used here; it satisfies [h;u] = husorted(destination_hu,:). If row i of u is equal to row j of h, then destination_hu(i+N) is equal to destination_hu(j), which is exactly what the ismember test above detects (the 'first' option only affects origin_husorted, which is not needed for this test).
Solution for your particular h, with complexity N:
r=u;
r(all(u==round(u)&u>=1&u<=8,2),:)=0;
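A quick sanity check of either variant against the ismember one-liner above, assuming u and h from the question (the rows zeroed out should be exactly the rows of u that occur in h):
zeroed = all(r == 0, 2); % rows that were replaced by zeros
assert(isequal(zeroed, ismember(u, h, 'rows')))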
I have 3 vectors, npdf, tn(:,1) and tn(:,2), and I am finding the values of npdf in tn(:,2) line by line:
[npdf(1:20,1), tn(1:20,:)]
ans =
8.0000 3.0000 1.0000
11.0000 2.9167 1.0000
1.0000 3.3000 1.0000
11.0000 1.2167 1.0000
5.0000 2.8167 1.0000
1.0000 2.4000 1.0000
2.0000 2.4500 1.0000
4.0000 0.2500 1.0000
15.0000 3.7500 1.0000
15.0000 4.9167 1.0000
1.0000 2.8167 2.0000
17.0000 0.2500 2.0000
15.0000 1.0000 3.0000
4.0000 3.0000 3.0000
8.0000 0.5833 3.0000
1.0000 0.5833 3.0000
3.0000 5.0000 5.0000
11.0000 3.7500 6.0000
8.0000 3.0000 7.0000
15.0000 2.8000 7.0000
for i=1:length(npdf)
[LOCA,~]=ismember(tn(:,2),npdf(i,1,1));
dummy=find(LOCA~=0);
tpdf(i,1)=tn(dummy(randi(length(dummy),1,1)),1); % pick one of the matching rows at random
end
Each time it finds the value of npdf in tn(:,2), it chooses a value from tn(:,1).
Here's the problem: if it can't locate the value from npdf in tn(:,2), then I need to choose the nearest value (in magnitude) in tn(:,2) and proceed. Either that, or some sort of interpolation between the nearest values. How would you do this most efficiently?
Feel free to change the code; it doesn't look very efficient to me.
It can be done easily by using knnsearch as follows:
[idx,D]=knnsearch(tn(:,2),npdf,'K',size(tn,1));
for i=1:size(D,1)
tpdf(i,1)=tn(idx(i,randi(sum(D(i,:)==min(D(i,:))))),1); % pick one of the tied nearest rows at random
end
It finds the distance from each value in npdf to all the values in tn(:,2). Then it keeps only the nearest value(s), and selects one of the corresponding entries of tn(:,1) at random, as in your code.
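knnsearch is part of the Statistics and Machine Learning Toolbox; if that is not available, the nearest value (with random tie-breaking) can also be found with min over the absolute differences; a minimal sketch, assuming npdf and tn as in the question:
tpdf = zeros(length(npdf), 1);
for i = 1:length(npdf)
d = abs(tn(:,2) - npdf(i)); % distance of every tn(:,2) entry to npdf(i)
candidates = find(d == min(d)); % all rows at the minimum distance
tpdf(i) = tn(candidates(randi(numel(candidates))), 1); % pick one of them at random
end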