Passing values to a sparse matrix in MATLAB - matlab

Might sound too simple to you but I need some help in regrad to do all folowings in one shot instead of defining redundant variables i.e. tmp_x, tmp_y:
X= sparse(numel(find(G==0)),2);
[tmp_x, temp_y] = ind2sub(size(G), find(G == 0));
X(:)=[tmp_x, tmp_y];
(More info: G is a sparse matrix)
I tried:
X(:)=ind2sub(size(G), find(G == 0));
but that threw an error.
How can I achieve this without defining tmp_x, tmp_y?

A couple of comments with your code:
numel(find(G == 0)) is probably one of the worst ways to determine how many entries that are zero in your matrix. I would personally do numel(G) - nnz(G). numel(G) determines how many elements are in G and nnz(G) determines how many non-zero values are in G. Subtracting these both would give you the total number of elements that are zero.
What you are doing is first declaring X to be sparse... then when you're doing the final assignment in the last line to X, it reconverts the matrix to double. As such, the first statement is totally redundant.
If I understand what you are doing, you want to find the row and column locations of what is zero in G and place these into a N x 2 matrix. Currently with what MATLAB has available, this cannot be done without intermediate variables. The functions that you'd typically use (find, ind2sub, etc.) require intermediate variables if you want to capture the row and column locations. Using one output variable will give you the column locations only.
You don't have a choice but to use intermediate variables. However, if you want to make this more efficient, you don't even need to use ind2sub. Just use find directly:
[I,J] = find(~G);
X = [I,J];

Related

How to perform operations along a certain dimension of an array?

I have a 3D array containing five 3-by-4 slices, defined as follows:
rng(3372061);
M = randi(100,3,4,5);
I'd like to collect some statistics about the array:
The maximum value in every column.
The mean value in every row.
The standard deviation within each slice.
This is quite straightforward using loops,
sz = size(M);
colMax = zeros(1,4,5);
rowMean = zeros(3,1,5);
sliceSTD = zeros(1,1,5);
for indS = 1:sz(3)
sl = M(:,:,indS);
sliceSTD(indS) = std(sl(1:sz(1)*sz(2)));
for indC = 1:sz(1)
rowMean(indC,1,indS) = mean(sl(indC,:));
end
for indR = 1:sz(2)
colMax(1,indR,indS) = max(sl(:,indR));
end
end
But I'm not sure that this is the best way to approach the problem.
A common pattern I noticed in the documentation of max, mean and std is that they allow to specify an additional dim input. For instance, in max:
M = max(A,[],dim) returns the largest elements along dimension dim. For example, if A is a matrix, then max(A,[],2) is a column vector containing the maximum value of each row.
How can I use this syntax to simplify my code?
Many functions in MATLAB allow the specification of a "dimension to operate over" when it matters for the result of the computation (several common examples are: min, max, sum, prod, mean, std, size, median, prctile, bounds) - which is especially important for multidimensional inputs. When the dim input is not specified, MATLAB has a way of choosing the dimension on its own, as explained in the documentation; for example in max:
If A is a vector, then max(A) returns the maximum of A.
If A is a matrix, then max(A) is a row vector containing the maximum value of each column.
If A is a multidimensional array, then max(A) operates along the first array dimension whose size does not equal 1, treating the elements as vectors. The size of this dimension becomes 1 while the sizes of all other dimensions remain the same. If A is an empty array whose first dimension has zero length, then max(A) returns an empty array with the same size as A.
Then, using the ...,dim) syntax we can rewrite the code as follows:
rng(3372061);
M = randi(100,3,4,5);
colMax = max(M,[],1);
rowMean = mean(M,2);
sliceSTD = std(reshape(M,1,[],5),0,2); % we use `reshape` to turn each slice into a vector
This has several advantages:
The code is easier to understand.
The code is potentially more robust, being able to handle inputs beyond those it was initially designed for.
The code is likely faster.
In conclusion: it is always a good idea to read the documentation of functions you're using, and experiment with different syntaxes, so as not to miss similar opportunities to make your code more succinct.

(matlab matrix operation), Is it possible to get a group of value from matrix without loop?

I'm currently working on implementing a gradient check function in which it requires to get certain index values from the result matrix. Could someone tell me how to get a group of values from the matrix?
To be specific, for a result matrx res with size M x N, I'll need to get element res(3,1), res(4,2), res(1,3), res(2,4)...
In my case, M is dimension and N is batch size and there's a label array whose size is 1xbatch_size, [3 4 1 2...]. So the desired values are res(label(:),1:batch_size). Since I'm trying to practice vectorization programming and it's better not using loop. Could someone tell me how to get a group of value without a iteration?
Cheers.
--------------------------UPDATE----------------------------------------------
The only idea I found is firstly building a 'mask matrix' then use the original result matrix to do element wise multiplication (technically called 'Hadamard product', see in wiki). After that just get non-zero element out and do the sum operation, the code in matlab should look like:
temp=Mask.*res;
desired_res=temp(temp~=0); %Note: the temp(temp~=0) extract non-zero elements in a 'column' fashion: it searches temp matrix column by column then put the non-zero number into container 'desired_res'.
In my case, what I wanna do next is simply sum(desired_res) so I don't need to consider the order of those non-zero elements in 'desired_res'.
Based on this idea above, creating mask matrix is the key aim. There are two methods to do this job.
Codes are shown below. In my case, use accumarray function to add '1' in certain location (which are stored in matrix 'subs') and add '0' to other space. This will give you a mask matrix size [rwo column]. The usage of full(sparse()) is similar. I made some comparisons on those two methods (repeat around 10 times), turns out full(sparse) is faster and their time costs magnitude is 10^-4. So small difference but in a large scale experiments, this matters. One benefit of using accumarray is that it could define the matrix size while full(sparse()) cannot. The full(sparse(subs, 1)) would create matrix with size [max(subs(:,1)), max(subs(:,2))]. Since in my case, this is sufficient for my requirement and I only know few of their usage. If you find out more, please share with us. Thanks.
The detailed description of those two functions could be found on matlab's official website. accumarray and full, sparse.
% assume we have a label vector
test_labels=ones(10000,1);
% method one, accumarray(subs,1,[row column])
tic
subs=zeros(10000,2);
subs(:,1)=test_labels;
subs(:,2)=1:10000;
k1=accumarray(subs,1,[10, 10000]);
t1=toc % to compare with method two to check which one is faster
%method two: full(sparse(),1)
tic
k2=full(sparse(test_labels,1:10000,1));
t2=toc

MATLAB spending an incredible amount of time writing a relatively small matrix

I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros only occurring in the second column. This code is taking a truly incredible amount of time (hours) to run what should be achievable in at most some seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on input, but in all usage is smaller than 1000x1000.
The code is as follows
function [data] = DataHandler(D)
n = size(D,1);
s = max(D,1);
data = zeros(s,s);
for i = 1:n
data(D(i,1),D(i,2)+1) = data(D(i,1),D(i,2)+1) + 1;
end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by just changing out the s's in this line for 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.
Obviously there're better ways to do this, but being that I just bashed the code together to quickly format some data I wasn't too concerned. As I said, I fixed it by just replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. New code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why this would be happening. Its a little disconcerting, and it would be good to be able to be confident that I can initialize matrices freely without killing MATLAB.
Your call to zeros is incorrect. Looking at your code, D looks like a D x 2 array. However, your call of s = max(D,1) would actually generate another D x 2 array. By consulting the documentation for max, this is what happens when you call max in the way you used:
C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.
Therefore, because you used max(D,1), you are essentially comparing every value in D with the value of 1, so what you're actually getting is just a copy of D in the end. Using this as input into zeros has rather undefined behaviour. What will actually happen is that for each row of s, it will allocate a temporary zeros matrix of that size and toss the temporary result. Only the dimensions of the last row of s is what is recorded. Because you have a very large matrix D, this is probably why the profiler hangs here at 100% utilization. Therefore, each parameter to zeros must be scalar, yet your call to produce s would produce a matrix.
What I believe you intended should have been:
s = max(D(:));
This finds the overall maximum of the matrix D by unrolling D into a single vector and finding the overall maximum. If you do this, your code should run faster.
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in this post that doing zeros(n,n) is in fact slow and there are several neat tricks to initializing an array of zeros. One way is to accomplish this by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that if you assume that data was not declared / initialized, you can do:
data(n,n) = 0;
If I can also comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of data. You can replace that for loop with a more efficient accumarray call. This also avoids allocating an array of zeros and accumarray will do that under the hood for you.
As such, your code would basically become this:
function [data] = DataHandler(D)
data = accumarray([D(:,1) D(:,2)+1], 1);
accumarray in this case will take all pairs of row and column coordinates, stored in D(i,1) and D(i,2) + 1 for i = 1, 2, ..., size(D,1) and place all that match the same row and column coordinates into a separate 2D bin, we then add up all of the occurrences and the output at this 2D bin gives you the total tally of how many values at this 2D bin which corresponds to the row and column coordinate of interest mapped to this location.

What's an appropriate data structure for a matrix with random variable entries?

I'm currently working in an area that is related to simulation and trying to design a data structure that can include random variables within matrices. To motivate this let me say I have the following matrix:
[a b; c d]
I want to find a data structure that will allow for a, b, c, d to either be real numbers or random variables. As an example, let's say that a = 1, b = -1, c = 2 but let d be a normally distributed random variable with mean 0 and standard deviation 1.
The data structure that I have in mind will give no value to d. However, I also want to be able to design a function that can take in the structure, simulate a uniform(0,1), obtain a value for d using an inverse CDF and then spit out an actual matrix.
I have several ideas to do this (all related to the MATLAB icdf function) but would like to know how more experienced programmers would do this. In this application, it's important that the structure is as "lean" as possible since I will be working with very very large matrices and memory will be an issue.
EDIT #1:
Thank you all for the feedback. I have decided to use a cell structure and store random variables as function handles. To save some processing time for large scale applications, I have decided to reference the location of the random variables to save time during the "evaluation" part.
One solution is to create your matrix initially as a cell array containing both numeric values and function handles to functions designed to generate a value for that entry. For your example, you could do the following:
generatorMatrix = {1 -1; 2 #randn};
Then you could create a function that takes a matrix of the above form, evaluates the cells containing function handles, then combines the results with the numeric cell entries to create a numeric matrix to use for further calculations:
function numMatrix = create_matrix(generatorMatrix)
index = cellfun(#(c) isa(c,'function_handle'),... %# Find function handles
generatorMatrix);
generatorMatrix(index) = cellfun(#feval,... %# Evaluate functions
generatorMatrix(index),...
'UniformOutput',false);
numMatrix = cell2mat(generatorMatrix); %# Change from cell to numeric matrix
end
Some additional things you can do would be to use anonymous functions to do more complicated things with built-in functions or create cell entries of varying size. This is illustrated by the following sample matrix, which can be used to create a matrix with the first row containing a 5 followed by 9 ones and the other 9 rows containing a 1 followed by 9 numbers drawn from a uniform distribution between 5 and 10:
generatorMatrix = {5 ones(1,9); ones(9,1) #() 5*rand(9)+5};
And each time this matrix is passed to create_matrix it will create a new 10-by-10 matrix where the 9-by-9 submatrix will contain a different set of random values.
An alternative solution...
If your matrix can be easily broken into blocks of submatrices (as in the second example above) then using a cell array to store numeric values and function handles may be your best option.
However, if the random values are single elements scattered sparsely throughout the entire matrix, then a variation similar to what user57368 suggested may work better. You could store your matrix data in three parts: a numeric matrix with placeholders (such as NaN) where the randomly-generated values will go, an index vector containing linear indices of the positions of the randomly-generated values, and a cell array of the same length as the index vector containing function handles for the functions to be used to generate the random values. To make things easier, you can even store these three pieces of data in a structure.
As an example, the following defines a 3-by-3 matrix with 3 random values stored in indices 2, 4, and 9 and drawn respectively from a normal distribution, a uniform distribution from 5 to 10, and an exponential distribution:
matData = struct('numMatrix',[1 nan 3; nan 2 4; 0 5 nan],...
'randIndex',[2 4 9],...
'randFcns',{{#randn , #() 5*rand+5 , #() -log(rand)/2}});
And you can define a new create_matrix function to easily create a matrix from this data:
function numMatrix = create_matrix(matData)
numMatrix = matData.numMatrix;
numMatrix(matData.randIndex) = cellfun(#feval,matData.randFcns);
end
If you were using NumPy, then masked arrays would be the obvious place to start, but I don't know of any equivalent in MATLAB. Cell arrays might not be compact enough, and if you did use a cell array, then you would have to come up with an efficient way to find the non-real entries and replace them with a sample from the right distribution.
Try using a regular or sparse matrix to hold the real values, and leave it at zero wherever you want a random variable. Then alongside that store a sparse matrix of the same shape whose non-zero entries correspond to the random variables in your matrix. If you want, the value of the entry in the second matrix can be used to indicate which distribution (ie. 1 for uniform, 2 for normal, etc.).
Whenever you want to get a purely real matrix to work with, you iterate over the non-zero values in the second matrix to convert them to samples, and then add that matrix to your first.

MATLAB: What's [Y,I]=max(AS,[],2);?

I just started matlab and need to finish this program really fast, so I don't have time to go through all the tutorials.
can someone familiar with it please explain what the following statement is doing.
[Y,I]=max(AS,[],2);
The [] between AS and 2 is what's mostly confusing me. And is the max value getting assigned to both Y and I ?
According to the reference manual,
C = max(A,[],dim) returns the largest elements along the dimension of A specified by scalar dim. For example, max(A,[],1) produces the maximum values along the first dimension (the rows) of A.
[C,I] = max(...) finds the indices of the maximum values of A, and returns them in output vector I. If there are several identical maximum values, the index of the first one found is returned.
I think [] is there just to distinguish itself from max(A,B).
C = max(A,[],dim) returns the largest elements along the dimension of A specified by scalar dim. For example, max(A,[],1) produces the maximum values along the first dimension (the rows) of A.
Also, the [C, I] = max(...) form gives you the maximum values in C, and their indices (i.e. locations) in I.
Why don't you try an example, like this? Type it into MATLAB and see what you get. It should make things much easier to see.
m = [[1;6;2] [5;8;0] [9;3;5]]
max(m,[],2)
AS is matrix.
This will return the largest elements of AS in its 2nd dimension (i.e. its columns)
This function is taking AS and producing the maximum value along the second dimension of AS. It returns the max value 'Y' and the index of it 'I'.
note the apparent wrinkle in the matlab convention; there are a number of builtin functions which have signature like:
xs = sum(x,dim)
which works 'along' the dimension dim. max and min are the oddbal exceptions:
xm = max(x,dim); %this is probably a silent semantical error!
xm = max(x,[],dim); %this is probably what you want
I sometimes wish matlab had a binary max and a collapsing max, instead of shoving them into the same function...