How to do feature selection with rankfeatures? - matlab

I have tried the following code to select the features in the data. The first two columns represent features and the last column represents the classes:
clear all;
close all;
data = [27 9 2
11.6723281 28.93422177 2
25 9 2
23 8 2
5.896096039 23.97745722 1
21 6 2
21.16823369 5.292058423 2
4.242640687 13.43502884 1
22 6 2];
Attributes = data(:,1:2);
Classes = data(:,3);
train = [1 3 4 5 6 7];
testInds = [2 8 9];
BC = Classes == 2;
I = rankfeatures(data,BC);
Error using rankfeatures (line 208)
The length of GROUP must equal the number of columns in X.
Error in selection (line 16)
I = rankfeatures(data,BC);
Is there any other function to do this???
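The error message points at the orientation of the inputs: rankfeatures (Bioinformatics Toolbox) expects one row per feature and one column per observation, with one GROUP entry per column, whereas data above is 9-by-3 and still contains the class column. A minimal sketch, assuming the intent is to rank the two attribute columns against the binary class BC; the variable X is introduced here only for illustration, and the call is guarded because the toolbox may not be installed:

```matlab
data = [27 9 2
    11.6723281 28.93422177 2
    25 9 2
    23 8 2
    5.896096039 23.97745722 1
    21 6 2
    21.16823369 5.292058423 2
    4.242640687 13.43502884 1
    22 6 2];
Attributes = data(:,1:2);      % 9 observations, 2 features
BC = data(:,3) == 2;           % binary class label, one per observation

X = Attributes.';              % 2-by-9: features in rows, observations in columns

if exist('rankfeatures', 'file')      % requires the Bioinformatics Toolbox
    [I, Z] = rankfeatures(X, BC);     % I: feature indices ordered by score Z
    disp(I.');
    disp(Z.');
end
```

With this orientation, numel(BC) equals size(X,2), which is the condition the error message asks for.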

Related

compare two arrays of different length (case of inequality)

I have two arrays of different lengths, for example A = [2 3 11 0 8] and B = [2 6 8] (the real data are larger), and I want to compare them and find the pairs of elements that satisfy abs(A(i)-B(j)) > 2.
Is there a fast function that does this (like ismember, but for inequalities)?
You can write a small function that checks all possible combinations and returns the valid ones.
A = [2 3 11 0 8];
B = [2 6 8];
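C = isbigger(A,B,2); % output: the pairs that satisfy abs(A-B) > 2
% Save this function in isbigger.m (or at the end of the script in R2016b or later)
function COMB = isbigger(A,B,val)
    [X,Y] = meshgrid(A,B);      % every combination of an element of A with one of B
    X = X(:);
    Y = Y(:);
    index = abs(X-Y) > val;     % combinations that satisfy the inequality
    COMB = [X(index),Y(index)];
end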
OUTPUT:
C =
2 6
2 8
3 6
3 8
11 2
11 6
11 8
0 6
0 8
8 2
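The same comparison can also be written without a helper function. A minimal sketch, assuming a full numel(A)-by-numel(B) logical table fits in memory; the names mask, pairs and hasMatch are illustrative, and the rows of pairs come out in a different order than in the output above:

```matlab
A = [2 3 11 0 8];
B = [2 6 8];
val = 2;

A = A(:);                                    % column vector
B = B(:);                                    % column vector
mask = abs(bsxfun(@minus, A, B.')) > val;    % mask(i,j) is true when abs(A(i)-B(j)) > val

[i, j] = find(mask);                         % indices of all valid combinations
pairs = [A(i), B(j)];                        % one valid (A,B) pair per row

hasMatch = any(mask, 2);                     % elements of A with at least one valid partner in B
```

Working with the mask directly is useful when only the indices are needed, or only the answer to "which elements of A have a match".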

vec2mat w/ different number of columns

Referring to "Reshape row wise w/ different starting/ending elements number", @Divakar came up with a nice solution, but what if the number of columns is not always the same?
Sample run -
>> A'
ans =
4 9 8 9 6 1 8 9 7 7 7 4 6 2 7 1
>> out
out =
4 9 8 9 0 0
6 1 8 9 7 7
7 4 6 2 7 1
I took only the first 4 terms of A and put them in the first row of out, then filled the remaining 2 empty cells with 0's. So ncols = [4 6 6]. Unfortunately vec2mat doesn't accept a vector as the number of columns.
Any suggestions?
You can employ bsxfun's masking capability here -
%// Random inputs
A = randi(9,1,15)
ncols = [4 6 5]
%// Initialize the output array with the transposed size of the desired
%// output, because values are inserted row-wise and MATLAB uses
%// column-major indexing
out = zeros(max(ncols),numel(ncols));
mask = bsxfun(@le,[1:max(ncols)]',ncols); %// mask of valid positions in the output
out(mask) = A; %// insert input array elements
out = out.' %// transpose back to the desired output array size
Code run -
A =
5 3 7 2 7 2 4 6 8 1 9 7 5 4 5
ncols =
4 6 5
out =
5 3 7 2 0 0
7 2 4 6 8 1
9 7 5 4 5 0
You could use accumarray for that:
A = [4 9 8 9 6 1 8 9 7 7 7 4 6 2 7 1].'; %// data
ncols = [4 6 6]; %// columns
n = max(ncols);
cs = cumsum(ncols);
ind = 1;
ind(cs+1) = 1;
ind = cumsum(ind(1:end-1)); %// `ind` tells the row for each element of A
result = accumarray(ind(:), A(:), [], @(x) {[x; zeros(n-numel(x),1)]}); %// split `A` as
%// dictated by `ind`, and pad with zeros. Each group is put into a cell.
result = [result{:}].'; %// concatenate all cells
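For comparison, the same padding can be done with a plain loop over the rows, which may be easier to follow than the masking or accumarray versions. A minimal sketch, assuming sum(ncols) equals numel(A); the names start and stop are illustrative:

```matlab
A = [4 9 8 9 6 1 8 9 7 7 7 4 6 2 7 1];
ncols = [4 6 6];

% every element of A must be assigned to exactly one row
assert(sum(ncols) == numel(A));

out = zeros(numel(ncols), max(ncols));   % zero padding is already in place
stop = cumsum(ncols);                    % index in A of the last element of each row
start = stop - ncols + 1;                % index in A of the first element of each row

for r = 1:numel(ncols)
    out(r, 1:ncols(r)) = A(start(r):stop(r));
end
disp(out)
```

For the question's data this reproduces the 3-by-6 matrix shown in the sample run.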

Matrix 1,2,3 how can i generate?

I want to control the creation of random numbers in this matrix:
Mp = floor(1+(10*rand(2,20)));
mp1 = sort(Mp,2);
I want to modify this code to get an output like this:
1 1 2 2 3 3 3 4 5 5 6 7 7 8 9 9 10 10 10 10
1 2 3 3 3 3 3 3 4 5 6 6 6 6 7 8 9 9 9 10
Each row has to contain all the numbers from 1 to 10 in increasing order, and the second matrix, which counts the occurrences of each number, should look like this:
1 2 1 2 1 2 3 1 1 2 1 1 2 1 1 2 1 2 3 4
1 1 1 2 3 4 5 6 1 1 1 2 3 4 1 1 1 2 3 1
The trickiest matrix, which I have been looking for since last week, is the third one. It should go through each row of the first matrix and record the number of occurrences of each number at the position of its last occurrence. Here is an example of how the code should work. This example shows the intended result after running through the first row of the first matrix.
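      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  (positions)
 1    0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 2    0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 3    0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0
 4    0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
 5    0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0
 6    0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
 7    0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0
 8    0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
 9    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0
10    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4
(numbers)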
This example shows the intended result after running through the second row of the first matrix.
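      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  (positions)
 1    1  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 2    0  1  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 3    0  0  0  0  0  0  3  6  0  0  0  0  0  0  0  0  0  0  0  0
 4    0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0
 5    0  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0
 6    0  0  0  0  0  0  0  0  0  0  1  0  0  4  0  0  0  0  0  0
 7    0  0  0  0  0  0  0  0  0  0  0  0  2  0  1  0  0  0  0  0
 8    0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0
 9    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  3  0
10    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4
(numbers)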
So the wanted matrix must be initialized with zeros, and after running through each row of the first matrix, the new result is added to the previous one...
I believe the following code does everything you asked for. If I didn't understand, you need to get a lot clearer in how you pose your question...
Note - I hard coded some values / sizes. In "real code" you would never do that, obviously.
% the bit of code that generates and sorts the initial matrix:
Mp = floor(1+(10*rand(2,20)));
mp1 = sort(Mp, 2);
clc
disp(mp1)
occCount = zeros(size(mp1));
for ii = 1:size(mp1,1)
    for jj = 1:size(mp1,2)
        if (jj == 1)
            occCount(ii,jj) = 1;
        else
            if (mp1(ii,jj) == mp1(ii,jj-1))
                occCount(ii,jj) = occCount(ii, jj-1) + 1;
            else
                occCount(ii,jj) = 1;
            end
        end
    end
end
% this is the second matrix you asked for
disp(occCount)
% now the third:
big = zeros(10, 20);
for ii = 1:size(mp1,1)
    for jj = 1:10
        f = find(mp1(ii,:) == jj); % index of all of them
        if numel(f) > 0
            last = f(end);
            n = numel(f);
            big(jj, last) = big(jj, last) + n;
        end
    end
end
disp(big)
Please see if this is indeed what you had in mind.
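For a single sorted row, the same two results can also be obtained without the inner loops by locating where each run of equal values starts. A minimal sketch for the first example row of the question; the names isStart, startPos, runId, occRow, lastPos, counts and bigRow are illustrative and not part of the answer above:

```matlab
r = [1 1 2 2 3 3 3 4 5 5 6 7 7 8 9 9 10 10 10 10];   % one sorted row
n = numel(r);

isStart  = [true, diff(r) ~= 0];     % true where a new value begins
startPos = find(isStart);            % first position of each run
runId    = cumsum(isStart);          % run number of every element

% second matrix (one row): running occurrence count within each run
occRow = (1:n) - startPos(runId) + 1;

% third matrix (contribution of this row): count stored at the last position
lastPos = [startPos(2:end) - 1, n];  % last position of each run
counts  = lastPos - startPos + 1;    % length of each run
bigRow  = accumarray([r(lastPos).', lastPos.'], counts.', [10 n]);
```

Adding bigRow for every row of the first matrix gives the accumulated third matrix.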
The following code solves both the second and third matrix generation problems with a single loop. For clarity, the second matrix M2 is the 2-by-20 array in the example containing the cumulative occurrence count. The third matrix M3 is the sparse matrix of size 10-by-20 in the example that encodes the number and position of the last occurrence of each unique value. The code only loops over the rows, using accumarray to do most of the work. It is generalized to any size and content of mp1, as long as the rows are sorted first.
% data
mp1 = [1 1 2 2 3 3 3 4 5 5 6 7 7 8 9 9 10 10 10 10;
1 2 3 3 3 3 3 3 4 5 6 6 6 6 7 8 9 9 9 10]; % the example first matrix
nuniq = max(mp1(:));
% accumulate
M2 = zeros(size(mp1));
M3 = zeros(nuniq,size(mp1,2));
for ir=1:size(mp1,1),
    cumSums = accumarray(mp1(ir,:)',1:size(mp1,2),[],@numel,[],true)';
    segments = arrayfun(@(x)1:x,nonzeros(cumSums),'uni',false);
    M2(ir,:) = [segments{:}];
    countCoords = accumarray(mp1(ir,:)',1:size(mp1,2),[],@max,[],true);
    [ii,jj] = find(countCoords);
    nzinds = sub2ind(size(M3),ii,nonzeros(countCoords));
    M3(nzinds) = M3(nzinds) + nonzeros(cumSums);
end
I won't print the outputs because they are a bit big for the answer, and the code is runnable as is.
NOTE: For new test data, I suggest using the commands Mp = randi(10,[2,20]); mp1 = sort(Mp,2);. Or based on your request to user2875617 and his response, ensure all numbers with mp1 = sort([repmat(1:10,2,1) randi(10,[2,10])],2); but that isn't really random...
EDIT: Error in code fixed.
I am editing the previous answer to check if it is fast when mp1 is large, and apparently it is:
N = 20000; M = 200; P = 100;
mp1 = sort([repmat(1:P, M, 1), ceil(P*rand(M,N-P))], 2);
tic
% Initialise output matrices
out1 = zeros(M, N); out2 = zeros(P, N);
for gg = 1:M
    % Frequencies of each row
    freqs(:, 1) = mp1(gg, [find(diff(mp1(gg, :))), end]);
    freqs(:, 2) = histc(mp1(gg, :), freqs(:, 1));
    cumfreqs = cumsum(freqs(:, 2));
    k = 1;
    for hh = 1:numel(freqs(:, 1))
        out1(gg, k:cumfreqs(hh)) = 1:freqs(hh, 2);
        out2(freqs(hh, 1), cumfreqs(hh)) = out2(freqs(hh, 1), cumfreqs(hh)) + freqs(hh, 2);
        k = cumfreqs(hh) + 1;
    end
end
toc

Matlab: sorting a vector by the number of time each unique value occurs

Suppose we have, e.g., i = 1:25 iterations.
The result of each iteration is a 1xlength(N) cell array, where 0<=N<=25.
iteration 1: 4 5 9 10 20
iteration 2: 3 8 9 13 14 6
...
iteration 25: 1 2 3
We combine the results of all iterations into one matrix, sorted in descending order of how often each value is repeated, as in this example:
Matrix=
Columns 1 through 13
16 22 19 25 2 5 8 14 17 21 3 12 13
6 5 4 4 3 3 3 3 3 3 2 2 2
Columns 14 through 23
18 20 1 6 7 9 10 11 15 23
2 2 1 1 1 1 1 1 1 1
Explanation of the result: Column 1: N == 16 is present in 6 iterations; column 2: N == 22 is present in 5 iterations; etc.
If a number N does not appear in any iteration (in this example N == 4 and N == 24), it is not listed with a frequency of zero either.
I want to associate each N with the first iteration (i) in which it appears. For example, N == 9 should be present only in the first iteration i = 1 and not in i = 2 as well, N == 3 only in i = 2 and not in i = 25 as well, and so on, until every N is uniquely associated with one iteration i.
Thank you in advance.
Here's a way that uses a feature of unique (i.e. that it returns the index to the first value) that was introduced in R2012a
%# make some sample data
iteration{1} = [1 2 4 6];
iteration{2} = [1 3 6];
iteration{3} = [1 2 3 4 5 6];
nIter= length(iteration);
%# create an index vector so we can associate N's with iterations
nn = cellfun(@numel,iteration);
idx = zeros(1,sum(nn));
idx([1,cumsum(nn(1:end-1))+1]) = 1;
idx = cumsum(idx); %# has 4 ones, 3 twos, 6 threes
%# create a vector of the same length as idx with all the N's
nVec = cat(2,iteration{:});
%# run `unique` on the vector to identify the first occurrence of each N
[~,firstIdx] = unique(nVec,'first');
%# create a "cleanIteration" array, where each N only appears once
cleanIter = accumarray(idx(firstIdx)',firstIdx(:),[nIter,1],@(x){sort(nVec(x))},{});
cleanIter =
[1x4 double]
[ 3]
[ 5]
>> cleanIter{1}
ans =
1 2 4 6
Here is another solution using accumarray. Explanations in the comments
% example data (from your question)
iteration{1} = [4 5 9 10 20 ];
iteration{2} = [3 8 9 13 14 6];
iteration{3} = [1 2 3];
niterations = length(iteration);
% create iteration numbers
% same as Jonas did in the first part of his code, but using a short loop
for i=1:niterations
idx{i} = i*ones(size(iteration{i}));
end
% count occurrences of values from all iterations
% sort them in descending order
occurrences = accumarray([iteration{:}]', 1);
[occ val] = sort(occurrences, 1, 'descend');
% remove zero occurrences and create the Matrix
nonzero = find(occ);
Matrix = [val(nonzero) occ(nonzero)]'
Matrix =
3 9 1 2 4 5 6 8 10 13 14 20
2 2 1 1 1 1 1 1 1 1 1 1
% find the minimum iteration number for each value
% again, using accumarray, this time with the @min function
assoc = accumarray([iteration{:}]', [idx{:}]', [], @min);
nonzero = find(assoc);
result = [nonzero assoc(nonzero)]'
result =
1 2 3 4 5 6 8 9 10 13 14 20
3 3 2 1 1 2 2 1 1 2 2 1
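If the goal is a cell array in which each N is kept only in the first iteration where it appears, a short loop that remembers the values already seen is another option. A minimal sketch using the example data from the question; the names cleanIter, seen and vals are illustrative:

```matlab
iteration = {[4 5 9 10 20], [3 8 9 13 14 6], [1 2 3]};

cleanIter = cell(size(iteration));
seen = zeros(1, 0);                       % values already assigned to an earlier iteration

for k = 1:numel(iteration)
    vals = unique(iteration{k});          % sorted, without duplicates
    vals = vals(:).';                     % force a row vector
    cleanIter{k} = vals(~ismember(vals, seen));   % keep only values not seen before
    seen = [seen, cleanIter{k}];          %#ok<AGROW> small example, growth is acceptable
end

celldisp(cleanIter)
```

Here 9 stays in iteration 1 only and 3 stays in iteration 2 only, which agrees with the second row of result above.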

What is the simplest way to create a weight matrix based on how frequently each element appears in the matrix?

This is the input matrix
7 9 6
8 7 9
7 6 7
Based on the frequency of their appearance in the matrix (note: these values are for explanation purposes only; I did not calculate them in advance, which is why I am asking this question):
number frequency
6 2
7 4
8 1
9 2
and the output I expect is
4 2 2
1 4 2
4 2 4
Is there a simple way to do this?
Here's a three-line solution. First prepare the input:
X = [7 9 6;8 7 9;7 6 7];
Now do:
[a m n] = unique(X);
b = hist(X(:),a);
c = reshape(b(n),size(X));
Which gives this value for c:
4 2 2
1 4 2
4 2 4
If you also wanted the frequency matrix, you can get it with this code:
[a b']
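The same idea can be written with accumarray in place of hist, so the counts come directly from the index vector that unique returns. A minimal sketch; the names vals, idx, counts and W are illustrative:

```matlab
X = [7 9 6; 8 7 9; 7 6 7];

[vals, ~, idx] = unique(X(:));        % idx maps every element to its unique value
counts = accumarray(idx, 1);          % frequency of each unique value
W = reshape(counts(idx), size(X));    % replace each element by its frequency

disp([vals counts])                   % frequency table
disp(W)                               % weight matrix
```

Because idx already contains the group number of every element, no bin centers need to be specified.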
Here is code with a for loop (a is the input matrix, freq is the frequency matrix with 2 columns):
weight = zeros(size(a));
for k = 1:size(freq,1)
    weight(a==freq(k,1)) = freq(k,2);
end
Maybe it can be solved without loops, but my code looks like:
M = [7 9 6 ;
     8 7 9 ;
     7 6 7 ;];
number = unique(M(:));
frequency = hist(M(:), number)';
map = containers.Map(number, frequency);
[height width] = size(M);
result = zeros(height, width); % preallocate
for i=1:height
    for j=1:width
        result(i,j) = map(M(i,j));
    end
end