I am working with largish binary matrices, at the moment up to 100x100.
Let's say I am working with 30x30 binary matrices. Then there are a total of 2^(30x30) binary matrices. I want to select a binary matrix at random, where each of the 2^(30x30) matrices has the same probability of being selected.
My attempted solution was to pick a number between 1 and 2^(30x30) using the function randi(n) with n = 2^(30x30) and then convert the result to the corresponding binary matrix. The problem I ran into is that randi(n) does not accept values of n larger than 2^53. MATLAB in general does not seem to like very large numbers.
Any suggestions?
If each matrix of booleans has equal probability, then the elements of the matrix each have equal probability of 0 and 1. You can just fill a matrix of the appropriate size with n² uniform random booleans.
I don't have MATLAB handy, but in Octave you'd do something like unidrnd(2, n, n) - 1.
You can use randint in the range [0 1]:
matrix=randint(30,30,[0 1]);
You can also use rand and threshold the resulting matrix:
matrix=rand(30,30);
matrix=round(matrix);
EDIT: just realized it also works with randi with the following syntax:
matrix=randi([0 1],30,30);
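As a minimal sketch of the same idea at the 100x100 size mentioned in the question, using only core functions (the names n and M are illustrative, not from the original posts):

```matlab
n = 100;                  % matrix dimension (illustrative)
M = rand(n, n) < 0.5;     % each entry is 1 with probability 0.5, independently
M = double(M);            % optional: convert the logical result to double
```

Because every entry is an independent fair coin flip, every one of the 2^(n*n) matrices is equally likely, so no huge integer ever has to be drawn.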
I am using the function X = randsrc(250,600,[[-1,0,1];[0.5/ps,1-1/ps,0.5/ps]]) with ps=2373. It generates a 250x600 matrix whose entries are only -1, 0, or 1, chosen randomly according to the probability distribution 0.5/ps, 1-1/ps, 0.5/ps,
so the density of nonzero entries is about 0.00042.
The above X is called sparse random projection matrix, see https://web.stanford.edu/~hastie/Papers/Ping/KDD06_rp.pdf. It can be used to compress a data vector from dimension 600 to 250 with some nice geometric properties guaranteed.
The problem is that in MATLAB, randsrc seems to be very slow (e.g., compared with randn(250,600)). How can I generate the above matrix quickly?
BTW, how can I compute X*y quickly, where y may be a dense vector?
My code is:
ps=2373;
tic;
X = randsrc(250,600,[[-1,0,1];[0.5/ps,1-1/ps,0.5/ps]]);
toc
a = randn(600,1);
tic;
X*a;
toc
Also, I have tried the equivalent Python function http://scikit-learn.org/stable/modules/generated/sklearn.random_projection.SparseRandomProjection.html, and it is twice as fast as MATLAB.
You can use sprand to generate a sparsity structure, then find to extract the rows and columns of the non-zero elements. Finally randsample will select values -1,1 with 50% probability of each:
ps=2373;
tic
[i,j,~] = find(sprand(250,600,1/ps));
X = sparse(i,j,randsample([-1,1],length(i),true),250,600); % pass the size explicitly so X is always 250x600
toc
Because X is stored as a sparse matrix, X*a only touches the non-zero entries, so the product is very fast.
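If randsample (Statistics Toolbox) is not available, the same construction can be sketched with core functions only. This is an illustration under the question's parameters, not a benchmarked replacement; the names nRows, nCols, U and s are made up for the example:

```matlab
ps = 2373;
nRows = 250; nCols = 600;
U = rand(nRows, nCols);                 % one uniform draw per entry
[i, j] = find(U < 1/ps);                % each entry is non-zero with probability 1/ps
s = 2*(rand(numel(i), 1) < 0.5) - 1;    % random sign: -1 or +1 with probability 0.5 each
X = sparse(i, j, s, nRows, nCols);      % explicit size keeps X at 250x600
a = randn(nCols, 1);
y = X * a;                              % sparse-times-dense product
```

Each entry ends up as -1 or 1 with probability 0.5/ps each and 0 otherwise, matching the distribution in the question.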
I have simulation data in a vector of size 50,000 x 1, which contains both NaNs and non-NaNs. I would like to average the non-NaNs, but the function nanmean returns NaN. I have tried removing the NaNs, but I only get a vector of zeros. Visual inspection of the vector leads me to doubt that the true mean of this vector is really NaN.
Also, I would like to use this vector to compute covariance with several other vectors (at some point). My alternative is doing this in Excel, which would be painful.
Any thoughts?
Thank you
Let's say your data is stored in a vector A. You can take the mean of the vector, excluding the NaNs as well as any Inf and -Inf values, via:
meanA = mean( A(isfinite(A)) );
Assuming you have a vector that only contains finite numeric values, and a NaN here and there, the solution is very simple
nanmean(A)
This should only cause trouble if there are non-finite values (Inf or -Inf) in your vector.
In this case you could filter them out as suggested by @Ryan, but then you need to realize that you are not actually calculating the mean of the vector.
Ask yourself whether you may instead be interested in something like
nanmedian(A)
As for calculating covariances and the like, assuming you have vectors v and w, I would recommend doing something like this:
idx = isfinite(v) & isfinite(w);
cov(v(idx),w(idx))
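A small worked example of both steps, on made-up data (the vectors v and w below are purely illustrative):

```matlab
v = [1 2 NaN 4 5 Inf]';
w = [2 4 6 NaN 10 12]';
meanV = mean(v(isfinite(v)));        % mean over the finite entries 1, 2, 4, 5, which is 3
idx = isfinite(v) & isfinite(w);     % keep only positions where both vectors are finite
C = cov(v(idx), w(idx));             % 2x2 covariance matrix of the paired finite values
```

The off-diagonal entry C(1,2) is the covariance between v and w. Note that idx drops a pair whenever either member is non-finite, so the two vectors stay aligned.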
I want to generate a 100x1 matrix containing the three values -1, 1 and 0. I want to be able to control how many 1's and -1's are assigned. I tried using
Y = rand(10,1)<0.1
but this only gives me 0's and 1's, although I am able to control the number of 1's in the matrix. Is there a similar function that I can use to add and control the number of -1's and 1's along with the default 0? Sorry, I am new to the MATLAB environment.
Thanks
Start by initializing your array:
x = [-1*ones(30,1); zeros(25,1);ones(45,1)];
then use MATLAB's wonderful indexing with randperm to shuffle it:
y= x(randperm(100));
plot (y, 'o')
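Two variants of this idea, as a sketch: one fixes the exact counts of -1 and 1, the other fixes only their probabilities, in the spirit of the rand(...)<0.1 attempt in the question. The counts and probabilities (nNeg, nPos, pNeg, pPos) are illustrative assumptions:

```matlab
n = 100; nNeg = 30; nPos = 45;          % exact counts of -1 and 1 (illustrative)
y = zeros(n, 1);
p = randperm(n);                        % random ordering of the positions 1..n
y(p(1:nNeg)) = -1;                      % first nNeg shuffled positions get -1
y(p(nNeg+1:nNeg+nPos)) = 1;             % next nPos shuffled positions get 1

pNeg = 0.1; pPos = 0.1;                 % probabilities instead of counts; pNeg + pPos <= 1
u = rand(n, 1);
z = (u < pPos) - (u > 1 - pNeg);        % 1 if u < pPos, -1 if u > 1 - pNeg, else 0
```

With the first variant the counts are exact; with the second they fluctuate around n*pNeg and n*pPos.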
I am trying to calculate the correlation matrix of a set of histogram vectors, but the result is a truncated version of what I think I want. I have 200 histograms with 32 bins each. The result from
correlation_matrix = corrcoef(set_of_histograms)
is a 32 by 32 matrix.
I want to use this to calculate how my original histograms match up. (this by later using eigs and other stuff).
But which correlation method is right for this? I have tried "corrcoef" but there are "corr" and "cov" as well. Can't understand their differences by reading matlab help...
correlation_matrix = corrcoef(set_of_histograms')
(Note the ')
1) corrcoef treats every column as a variable and every row as an observation, and calculates the correlation between each pair of columns. I'm assuming your histograms matrix is 200x32; as passed, the 32 bins are the variables, which is why you get a 32x32 result. If you transpose your histograms matrix before running corrcoef, each histogram becomes a column, and you should get the 200x200 result you're looking for:
[rho, p] = corrcoef( set_of_histograms' );
(' transposes the matrix)
2) cov returns the covariance matrix, not the correlation; while the covariance matrix is used in calculating the correlation, it is not the measure you're looking for.
3) As for corr and corrcoef, they have a few implementation differences between them. As long as you are only interested in Pearson's correlation, they are identical for your purposes. corr also has an option to calculate Spearman's or Kendall's correlations, which corrcoef does not have.
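The effect of the transpose on the output size can be checked with stand-in data (H below is random and only illustrates the shapes):

```matlab
H = rand(200, 32);        % stand-in for 200 histograms with 32 bins each
Rbins = corrcoef(H);      % correlations between the 32 bins (columns of H)
Rhist = corrcoef(H');     % correlations between the 200 histograms (columns of H')
sizeBins = size(Rbins);   % [32 32]
sizeHist = size(Rhist);   % [200 200]
```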
I need to generate a given number N of vectors of a given size, each consisting of uniformly distributed 0s and 1s.
This is what I am doing at the moment, but I noticed that the distribution is strongly peaked at half 1s and half 0s, which is no good for what I am doing:
a = randint(1, sizeOfVector, [0 1]);
The unifrnd function looks promising for what I need, but I can't manage to understand how to output a binary vector of that size.
Can I use the unifrnd function (and if so, how?), or is there a more convenient way to obtain such a set of vectors?
Any help appreciated!
Note 1: just to be sure - here's what the requirement actually says:
randomly choose N vectors of given size that are
uniformly distributed over [0;1]
Note 2: I am generating initial configurations for Cellular Automata, that's why I can have only binary values [0;1].
To generate 1 random vector with elements from {0, 1}:
unidrnd(2,sizeOfVector,1)-1
The other variants are similar.
If you want to get uniformly distributed 0's and 1's, you want to use randi. However, from the requirement, I'd think that the vectors can have real values, for which you'd use rand
%# create a such that each row contains a vector
a = randi([0 1],N,sizeOfVector); %# for uniformly distributed 0's and 1's
%# if you want, say, 60% 1's and 40% 0's, use rand and threshold
a = rand(N,sizeOfVector) > 0.4; %# maybe you need to call double(a) if you don't want logical output
%# if you want a random number of 1's and 0's, use
a = rand(N,sizeOfVector) > rand(1);
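Note that the last line above uses a single random threshold shared by all N vectors. If each vector should instead get its own random proportion of 1's (one possible reading of the requirement, so treat this as an assumption), a per-row threshold can be sketched as follows; N, sizeOfVector and t are illustrative values:

```matlab
N = 5; sizeOfVector = 20;                      % illustrative sizes
t = rand(N, 1);                                % one threshold per vector
a = bsxfun(@lt, rand(N, sizeOfVector), t);     % row k has 1's with probability t(k)
a = double(a);                                 % convert logical to double if needed
```

This avoids the sharp peak at half 1's and half 0's, because the expected fraction of 1's varies uniformly from row to row.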