permutations of vectors in matrix in matlab - matlab

I have a static matrix from a file:
size(data)=[80 5]
What I want is to change position of each vector randomly
when I use perms like:
N = size(data, 1);
X = perms(1:N); % # Permutations of column indices
Y = meshgrid(1:N, 1:factorial(N)); % # Row indices
idx = (X - 1) * N + Y; % # Convert to linear indexing
C = data(idx)
But its giving me an error: Maximum variable size allowed by the program is
exceeded.
Is there any other function to give me what I need?

perms is not good for large numbers i.e any number >10
Refer the Documentation
It says
perms(v) is practical when length(v) is less than about 10.
See the size it takes from the following code:
MB(10,1) = 0;
for N = 1:10
X = perms(1:N);
dt=whos('X');
MB(N)=dt.bytes*9.53674e-7;
end
plot(1:10,MB,'r*-');
Note the sudden rise in the steepness of the curve beyond 9.

So i think im done
N = size(data, 1);
r=randperm(N);
for ii=1:80
matrix(r(ii),:) =data(ii,:) ;
end

Related

Matlab rref() function precision error after 12th column of hilbert matrices

My question may be a simple one but I could not think of a logical explanation for my question:
When I use
rref(hilb(8)), rref(hilb(9)), rref(hilb(10)), rref(hilb(11))
it gives me the result that I expected, a unit matrix.
However when it comes to the
rref(hilb(12))
it does not give a nonsingular matrix as expected. I used Wolfram and it gives the unit matrix for the same case so I am sure that it should have given a unit matrix. There may be a round off error or something like that but then 1/11 or 1/7 have also some troublesome decimals
so why does Matlab behave like this when it comes to 12?
It indeed seems like a precision error. This makes sense as the determinant of Hilbert matrix of order n tends to 0 as n tends to infinity (see here). However, you can use rref with tol parameter:
[R,jb] = rref(A,tol)
and take tol to be very small to get more precise results. For example, rref(hilb(12),1e-20)
will give you identity matrix.
EDIT- more details regarding the role of the tol parameter.
The source code of rref is provided at the bottom of the answer. The tol is used when we search for a maximal element in absolute value in a certain part of a column, to find the pivot row.
% Find value and index of largest element in the remainder of column j.
[p,k] = max(abs(A(i:m,j))); k = k+i-1;
if (p <= tol)
% The column is negligible, zero it out.
A(i:m,j) = zeros(m-i+1,1);
j = j + 1;
If all the elements are smaller than tol in absolute value, the relevant part of the column is filled by zeros. This seems to be where the precision error for rref(hilb(12)) occurs. By reducing the tol we avoid this issue in rref(hilb(12),1e-20).
source code:
function [A,jb] = rref(A,tol)
%RREF Reduced row echelon form.
% R = RREF(A) produces the reduced row echelon form of A.
%
% [R,jb] = RREF(A) also returns a vector, jb, so that:
% r = length(jb) is this algorithm's idea of the rank of A,
% x(jb) are the bound variables in a linear system, Ax = b,
% A(:,jb) is a basis for the range of A,
% R(1:r,jb) is the r-by-r identity matrix.
%
% [R,jb] = RREF(A,TOL) uses the given tolerance in the rank tests.
%
% Roundoff errors may cause this algorithm to compute a different
% value for the rank than RANK, ORTH and NULL.
%
% Class support for input A:
% float: double, single
%
% See also RANK, ORTH, NULL, QR, SVD.
% Copyright 1984-2005 The MathWorks, Inc.
% $Revision: 5.9.4.3 $ $Date: 2006/01/18 21:58:54 $
[m,n] = size(A);
% Does it appear that elements of A are ratios of small integers?
[num, den] = rat(A);
rats = isequal(A,num./den);
% Compute the default tolerance if none was provided.
if (nargin < 2), tol = max(m,n)*eps(class(A))*norm(A,'inf'); end
% Loop over the entire matrix.
i = 1;
j = 1;
jb = [];
while (i <= m) && (j <= n)
% Find value and index of largest element in the remainder of column j.
[p,k] = max(abs(A(i:m,j))); k = k+i-1;
if (p <= tol)
% The column is negligible, zero it out.
A(i:m,j) = zeros(m-i+1,1);
j = j + 1;
else
% Remember column index
jb = [jb j];
% Swap i-th and k-th rows.
A([i k],j:n) = A([k i],j:n);
% Divide the pivot row by the pivot element.
A(i,j:n) = A(i,j:n)/A(i,j);
% Subtract multiples of the pivot row from all the other rows.
for k = [1:i-1 i+1:m]
A(k,j:n) = A(k,j:n) - A(k,j)*A(i,j:n);
end
i = i + 1;
j = j + 1;
end
end
% Return "rational" numbers if appropriate.
if rats
[num,den] = rat(A);
A=num./den;
end

Count the number of unique values for each column of a submatrix in a fast manner

I have a matrix X with tens of rows and thousands of columns, all elements are categorical and re-organized to an index matrix. For example, ith column X(:,i) = [-1,-1,0,2,1,2]' is converted to X2(:,i) = ic of [x,ia,ic] = unique(X(:,i)), for convenient use of function accumarray. I randomly selected a submatrix from the matrix and counted the number of unique values of each column of the submatrix. I performed this procedure 10,000 times. I know several methods for counting number of unique values in a column, the fasted way I found so far is shown below:
mx = max(X);
for iter = 1:numperm
for j = 1:ny
ky = yrand(:,iter)==uy(j);
% select submatrix from X where all rows correspond to rows in y that y equals to uy(j)
Xk = X(ky,:);
% specify the sites where to put the number of each unique value
mxj = mx*(j-1);
mxi = mxj+1;
mxk = max(Xk)+mxj;
% iteration to count number of unique values in each column of the submatrix
for i = 1:c
pxs(mxi(i):mxk(i),i) = accumarray(Xk(:,i),1);
end
end
end
This is a way to perform random permutation test to calculate information gain between a data matrix X of size n by c and categorical variable y, under which y is randomly permutated. In above codes, all randomly permutated y are stored in matrix yrand, and the number of permutations is numperm. The unique values of y are stored in uy and the unique number is ny. In each iteration of 1:numperm, submatrix Xk is selected according to the unique element of y and number of unique elements in each column of this submatrix is counted and stored in matrix pxs.
The most time costly section in the above code is the iterations of i = 1:c for large c.
Is it possible to perform the function accumarray in a matrix manner to avoid for loop? How else can I improve the above code?
-------
As requested, a simplified test function including above codes is provided as
%% test
function test(x,y)
[r,c] = size(x);
x2 = x;
numperm = 1000;
% convert the original matrix to index matrix for suitable and fast use of accumarray function
for i = 1:c
[~,~,ic] = unique(x(:,i));
x2(:,i) = ic;
end
% get 'numperm' rand permutations of y
yrand(r, numperm) = 0;
for i = 1:numperm
yrand(:,i) = y(randperm(r));
end
% get statistic of y
uy = unique(y);
nuy = numel(uy);
% main iterations
mx = max(x2);
pxs(max(mx),c) = 0;
for iter = 1:numperm
for j = 1:nuy
ky = yrand(:,iter)==uy(j);
xk = x2(ky,:);
mxj = mx*(j-1);
mxk = max(xk)+mxj;
mxi = mxj+1;
for i = 1:c
pxs(mxi(i):mxk(i),i) = accumarray(xk(:,i),1);
end
end
end
And a test data
x = round(randn(60,3000));
y = [ones(30,1);ones(30,1)*-1];
Test the function
tic; test(x,y); toc
return Elapsed time is 15.391628 seconds. in my computer. In the test function, 1000 permutations is set. So if I perform 10,000 permutation and do some additional computations (are negligible comparing to the above code), time more than 150 s is expected. I think whether the code can be improved. Intuitively, perform accumarray in a matrix manner can save lots of time. Can I?
The way suggested by #rahnema1 has significantly improved the calculations, so I posted my answer here, as also requested by #Dev-iL.
%% test
function test(x,y)
[r,c] = size(x);
x2 = x;
numperm = 1000;
% convert the original matrix to index matrix for suitable and fast use of accumarray function
for i = 1:c
[~,~,ic] = unique(x(:,i));
x2(:,i) = ic;
end
% get 'numperm' rand permutations of y
yrand(r, numperm) = 0;
for i = 1:numperm
yrand(:,i) = y(randperm(r));
end
% get statistic of y
uy = unique(y);
nuy = numel(uy);
% main iterations
mx = max(max(x2));
% preallocation
pxs(mx*nuy,c) = 0;
% set the edges of the bin for function histc
binrg = (1:mx)';
% preallocation of the range of matrix into which the results will be stored
mxr = mx*(0:nuy);
for iter = 1:numperm
yt = yrand(:,iter);
for j = 1:nuy
pxs(mxr(j)+1:mxr(j),:) = histc(x2(yt==uy(j)),binrg);
end
end
Test results:
>> x = round(randn(60,3000));
>> y = [ones(30,1);ones(30,1)*-1];
>> tic; test(x,y); toc
Elapsed time is 15.632962 seconds.
>> tic; test(x,y); toc % using the way suggested by rahnema1, i.e., revised function posted above
Elapsed time is 2.900463 seconds.

Vectorize with Matlab Meshgrid in Chebfun

I am trying to use meshgrid in Matlab together with Chebfun to get rid of double for loops. I first define a quasi-matrix of N functions,
%Define functions of type Chebfun
N = 10; %number of functions
x = chebfun('x', [0 8]); %Domain
psi = [];
for i = 1:N
psi = [psi sin(i.*pi.*x./8)];
end
A sample calculation would be to compute the double sum $\sum_{i,j=1}^10 psi(:,i).*psi(:,j)$. I can achieve this using two for loops in Matlab,
h = 0;
for i = 1:N
for j = 1:N
h = h + psi(:,i).*psi(:,j);
end
end
I then tried to use meshgrid to vectorize in the following way:
[i j] = meshgrid(1:N,1:N);
h = psi(:,i).*psi(:,j);
I get the error "Column index must be a vector of integers". How can I overcome this issue so that I can get rid of my double for loops and make my code a bit more efficient?
BTW, Chebfun is not part of native MATLAB and you have to download it in order to run your code: http://www.chebfun.org/. However, that shouldn't affect how I answer your question.
Basically, psi is a N column matrix and it is your desire to add up products of all combinations of pairs of columns in psi. You have the right idea with meshgrid, but what you should do instead is unroll the 2D matrix of coordinates for both i and j so that they're single vectors. You'd then use this and create two N^2 column matrices that is in such a way where each column corresponds to that exact column numbers specified from i and j sampled from psi. You'd then do an element-wise multiplication between these two matrices and sum across all of the columns for each row. BTW, I'm going to use ii and jj as variables from the output of meshgrid instead of i and j. Those variables are reserved for the complex number in MATLAB and I don't want to overshadow those unintentionally.
Something like this:
%// Your code
N = 10; %number of functions
x = chebfun('x', [0 8]); %Domain
psi = [];
for i = 1:N
psi = [psi sin(i.*pi.*x./8)];
end
%// New code
[ii,jj] = meshgrid(1:N, 1:N);
%// Create two matrices and sum
matrixA = psi(:, ii(:));
matrixB = psi(:, jj(:));
h = sum(matrixA.*matrixB, 2);
If you want to do away with the temporary variables, you can do it in one statement after calling meshgrid:
h = sum(psi(:, ii(:)).*psi(:, jj(:)), 2);
I don't have Chebfun installed, but we can verify that this calculates what we need with a simple example:
rng(123);
N = 10;
psi = randi(20, N, N);
Running this code with the above more efficient solution gives us:
>> h
h =
8100
17161
10816
12100
14641
9216
10000
8649
9025
11664
Also, running the above double for loop code also gives us:
>> h
h =
8100
17161
10816
12100
14641
9216
10000
8649
9025
11664
If you want to be absolutely sure, we can have both codes run with the outputs as separate variables, then check if they're equal:
%// Setup
rng(123);
N = 10;
psi = randi(20, N, N);
%// Old code
h = 0;
for i = 1:N
for j = 1:N
h = h + psi(:,i).*psi(:,j);
end
end
%// New code
[ii,jj] = meshgrid(1:N, 1:N);
hnew = sum(psi(:, ii(:)).*psi(:, jj(:)), 2);
%// Check for equality
eql = isequal(h, hnew);
eql checks if both variables are equal, and we do get them as such:
>> eql
eql =
1

Matrices kernelpca

we are working on a project and trying to get some results with KPCA.
We have a dataset (handwritten digits) and have taken the 200 first digits of each number so our complete traindata matrix is 2000x784 (784 are the dimensions).
When we do KPCA we get a matrix with the new low-dimensionality dataset e.g.2000x100. However we don't understand the result. Shouldn;t we get other matrices such as we do when we do svd for pca? the code we use for KPCA is the following:
function data_out = kernelpca(data_in,num_dim)
%% Checking to ensure output dimensions are lesser than input dimension.
if num_dim > size(data_in,1)
fprintf('\nDimensions of output data has to be lesser than the dimensions of input data\n');
fprintf('Closing program\n');
return
end
%% Using the Gaussian Kernel to construct the Kernel K
% K(x,y) = -exp((x-y)^2/(sigma)^2)
% K is a symmetric Kernel
K = zeros(size(data_in,2),size(data_in,2));
for row = 1:size(data_in,2)
for col = 1:row
temp = sum(((data_in(:,row) - data_in(:,col)).^2));
K(row,col) = exp(-temp); % sigma = 1
end
end
K = K + K';
% Dividing the diagonal element by 2 since it has been added to itself
for row = 1:size(data_in,2)
K(row,row) = K(row,row)/2;
end
% We know that for PCA the data has to be centered. Even if the input data
% set 'X' lets say in centered, there is no gurantee the data when mapped
% in the feature space [phi(x)] is also centered. Since we actually never
% work in the feature space we cannot center the data. To include this
% correction a pseudo centering is done using the Kernel.
one_mat = ones(size(K));
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat;
clear K
%% Obtaining the low dimensional projection
% The following equation needs to be satisfied for K
% N*lamda*K*alpha = K*alpha
% Thus lamda's has to be normalized by the number of points
opts.issym=1;
opts.disp = 0;
opts.isreal = 1;
neigs = 30;
[eigvec eigval] = eigs(K_center,[],neigs,'lm',opts);
eig_val = eigval ~= 0;
eig_val = eig_val./size(data_in,2);
% Again 1 = lamda*(alpha.alpha)
% Here '.' indicated dot product
for col = 1:size(eigvec,2)
eigvec(:,col) = eigvec(:,col)./(sqrt(eig_val(col,col)));
end
[~, index] = sort(eig_val,'descend');
eigvec = eigvec(:,index);
%% Projecting the data in lower dimensions
data_out = zeros(num_dim,size(data_in,2));
for count = 1:num_dim
data_out(count,:) = eigvec(:,count)'*K_center';
end
we have read lots of papers but still cannot get the hand of kpca's logic!
Any help would be appreciated!
PCA Algorithm:
PCA data samples
Compute mean
Compute covariance
Solve
: Covariance matrix.
: Eigen Vectors of covariance matrix.
: Eigen values of covariance matrix.
With the first n-th eigen vectors you reduce the dimensionality of your data to the n dimensions. You can use this code for the PCA, it has an integraded example and it is simple.
KPCA Algorithm:
We choose a kernel function in you code this is specified by:
K(x,y) = -exp((x-y)^2/(sigma)^2)
in order to represent your data in a high dimensional space hopping that, in this space your data will be well represented for further porposes like classification or clustering whereas this task could be harder to be solved in the initial feature space. This trick is aslo known as "Kernel trick". Look figure.
[Step1] Constuct gram matrix
K = zeros(size(data_in,2),size(data_in,2));
for row = 1:size(data_in,2)
for col = 1:row
temp = sum(((data_in(:,row) - data_in(:,col)).^2));
K(row,col) = exp(-temp); % sigma = 1
end
end
K = K + K';
% Dividing the diagonal element by 2 since it has been added to itself
for row = 1:size(data_in,2)
K(row,row) = K(row,row)/2;
end
Here because the gram matrix is symetric the half of the values are computed and the final result is obtained by adding the computed so far gram matrix and its transpose. Finally, we divide by 2 as the comments mention.
[Step2] Normalize the kernel matrix
This is done by this part of your code:
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat;
As the comments mention a pseudocentering procedure must be done. For an idea about the proof here.
[Step3] Solve the eigenvalue problem
For this task this part of the code is responsible.
%% Obtaining the low dimensional projection
% The following equation needs to be satisfied for K
% N*lamda*K*alpha = K*alpha
% Thus lamda's has to be normalized by the number of points
opts.issym=1;
opts.disp = 0;
opts.isreal = 1;
neigs = 30;
[eigvec eigval] = eigs(K_center,[],neigs,'lm',opts);
eig_val = eigval ~= 0;
eig_val = eig_val./size(data_in,2);
% Again 1 = lamda*(alpha.alpha)
% Here '.' indicated dot product
for col = 1:size(eigvec,2)
eigvec(:,col) = eigvec(:,col)./(sqrt(eig_val(col,col)));
end
[~, index] = sort(eig_val,'descend');
eigvec = eigvec(:,index);
[Step4] Change representaion of each data point
For this task this part of the code is responsible.
%% Projecting the data in lower dimensions
data_out = zeros(num_dim,size(data_in,2));
for count = 1:num_dim
data_out(count,:) = eigvec(:,count)'*K_center';
end
Look the details here.
PS: I encurage you to use code written from this author and contains intuitive examples.

Generate every binary n x m matrix in matlab

I'd like to generate every boolean matrix in matlab as a 3-dimensional array.
For example:
mat(:,:,1) = [[1 0][0 1]]
mat(:,:,2) = [[1 1][0 1]]
...
My final goal is to generate every trinary matrix of a given size.
Keep in mind that I know that the number of matrices is exponential, but I'll use small numbers.
Not sure that the previous answer actually does what you want... With that method, I get multiple entries in array2D that are the same. Here is a vectorised and (I believe) correct solution:
clear all;
nRows = 2;
nCols = nRows; % Only works for square matrices
% Generate matrix of all binary numbers that fit in nCols
max2Pow = nCols;
maxNum = 2 ^ max2Pow - 1;
allBinCols = bsxfun(#bitand, (0:maxNum)', 2.^((max2Pow-1):-1:0)) > 0;
% Get the indices of the rows in this matrix required for each output
% binary matrix
N = size(allBinCols, 1);
A = repmat({1:N}, nCols, 1);
[B{1:nCols}] = ndgrid(A{:});
rowInds = reshape(cat(3, B{:}), [], nCols)';
% Get the appropriate rows and reshape to a 3D array of right size
nMats = size(rowInds, 2);
binMats = reshape(allBinCols(rowInds(:), :)', nRows, nCols, nMats)
Note that for anything other than small numbers of nRows you will run out of memory pretty quickly, because you're generating 2^(nRows*nRows) matrices of size nRows*nRows. ThatsAlottaNumbers.
Actually the answer is pretty straightforward. Each matrix being boolean, it can be indexed by the binary number obtained when reading all the values in any given order.
For a binary matrix: let n be the number of element in your matrix (n = rows * cols).
for d=0:(2^n-1)
%Convert binary to decimal string
str = dec2bin(d);
%Convert string to array
array1D = str - '0';
array1D = [array1D zeros(1, n-length(array1D))];
%Reshape
array2D(:,:,d+1) = reshape(array1D, rows, cols);
end
This can be very easily generalized to any base by changing dec2bin into dec2base and changing 2^n into (yourbase)^n.