Repeating rows of matrix in MATLAB - matlab

I have a question that is related to this post: "Cloning" row or column vectors. I tried to work around the answers posted there, yet failed to apply them to my problem.
In my case, I'd like to "clone" each row of a matrix by converting a matrix like
A = [1,2; 3, 4; 5, 6]
into the matrix
B = [1, 2
1, 2
3, 4
3, 4
5, 6
5, 6]
by repeating each row of A a number of times.
So far, I was able to work with repmat for a single row like
A = [1, 2];
B = repmat(A, 2, 1)
>> B = [1, 2
1, 2]
I was trying to build a loop using that formula, in order to obtain the matrix wanted. The loop looked like
T = 3; N = 2;
for t = 1:T
for I = 1:N
B = repmat(C, 21, 1)
end
end
Does anyone have an idea how to write the loop correctly, or a better way to do this?

kron
There are a few ways you can do this. The shortest way would be to use the kron function as suggested by Adiel in the comments.
A = [1,2; 3, 4; 5, 6];
B = kron(A, [1;1]);
Note that the number of elements in the ones vector controls how many times each row is duplicated. For n times, use kron(A, ones(n,1)).
kron calculates the Kronecker tensor product, which is not necessarily a fast process, nor is it intuitive to understand, but it does give the right result!
reshape and repmat
A more understandable process might involve a combination of reshape and repmat. The aim is to reshape the matrix into a row vector, repeat it the desired number of times, then reshape it again to regain the two-column matrix.
B = reshape(repmat(reshape(A, 1, []), 2, 1), [], 2);
Note that the 2 within the repmat function controls how many times each row is duplicated, while the final 2 in the outer reshape is the number of columns of A. For n times, use reshape(repmat(reshape(A, 1, []), n, 1), [], 2).
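As a quick sanity check, both approaches can be compared against a plain preallocated loop of the kind the question was attempting. This is only a sketch: the repetition count n = 3 and the names B_loop, B_kron and B_reshape are chosen here for illustration. The outer reshape uses the column count of A rather than a hard-coded 2, so it should work for any number of columns.

```matlab
A = [1, 2; 3, 4; 5, 6];
n = 3;                                   % copies of each row (illustrative value)
[rows, cols] = size(A);

% Loop version: preallocate, then write each row into its block of n rows
B_loop = zeros(rows * n, cols);
for t = 1:rows
    B_loop((t - 1) * n + (1:n), :) = repmat(A(t, :), n, 1);
end

% Vectorized versions from the answer, generalized to n copies
B_kron    = kron(A, ones(n, 1));
B_reshape = reshape(repmat(reshape(A, 1, []), n, 1), [], cols);

disp(isequal(B_loop, B_kron, B_reshape)) % displays 1 when all three agree
```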
Speed
A quick benchmark can be written:
% Setup, using a large A
A = rand(1e5, 2);
f = @() kron(A, [1;1]);
g = @() reshape(repmat(reshape(A, 1, []), 2, 1), [], 2);
% timing
timeit(f);
timeit(g);
Output:
kron option: 0.0016622 secs
repmat/reshape option: 0.0012831 secs
[Figure: extended benchmark of both options over different sizes]
Summary:
The reshape option is quicker (~25%) when each row is duplicated just once, so go for this option if you want to end up with 2 of each row for a large matrix.
The reshape option appears to have O(n) complexity in the number of row repetitions. kron has some initial overhead, but is much quicker when you want many repetitions and hardly slows down as they increase. Go for the kron method if you are doing more than a few repetitions.
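A comparison over several repetition counts could be reproduced with a sweep like the one below. It is a rough sketch, not the script behind the numbers above: the repetition counts in reps and the variable names are assumptions, and absolute timings will vary by machine and MATLAB release.

```matlab
A = rand(1e5, 2);
reps = [2, 5, 10, 20];                   % repetition counts to try (assumed values)
t_kron    = zeros(size(reps));
t_reshape = zeros(size(reps));

for k = 1:numel(reps)
    n = reps(k);
    f = @() kron(A, ones(n, 1));
    g = @() reshape(repmat(reshape(A, 1, []), n, 1), [], size(A, 2));
    t_kron(k)    = timeit(f);            % typical run time in seconds
    t_reshape(k) = timeit(g);
end

% Print one line per repetition count
for k = 1:numel(reps)
    fprintf('n = %2d: kron %.6f s, repmat/reshape %.6f s\n', ...
            reps(k), t_kron(k), t_reshape(k));
end
```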

Related

How to vectorize Matlab Code with mvnpdf in?

I have some working code in MATLAB, and speed is vital. I have vectorized/optimized many parts of it, and the profiler now tells me that most of the time is spent in a short piece of code. For this,
I have some parameter sets for a multivariate normal distribution.
I then have to get the value of the corresponding PDF at some point pos,
and multiply it by some other value stored in a vector.
I have produced a minimal working example below:
num_params = 1000;
prob_dist_params = repmat({ [1, 2], [10, 1; 1, 5] }, num_params, 1);
saved_nu = rand( num_params, 1 );
saved_pos = rand( num_params, 2 );
saved_total = 0;
tic()
for param_counter = 1:size(prob_dist_params, 1)
% Evaluate the PDF at specified points
pdf_vals = mvnpdf( saved_pos(param_counter,:), prob_dist_params{param_counter,1}, prob_dist_params{param_counter, 2} );
saved_total = saved_total + saved_nu(param_counter)*pdf_vals;
end % End of looping over parameters
toc()
I am aware that the entries of prob_dist_params are all the same in this case, but in my code each element differs depending on a few things upstream. I call this particular piece of code many tens of thousands of times in my full program, so I am wondering if there is anything at all I can do to vectorize this loop or, failing that, speed it up at all. I do not know how to do so with the inclusion of an mvnpdf() call.
Yes, you can; however, I don't think it will give you a huge performance boost. You will have to reshape your mu's and sigma's.
Checking the doc of mvnpdf(X,mu,sigma), you see that you will have to provide X and mu as n-by-d numeric matrices and sigma as a d-by-d-by-n array.
In your case, d is 2 and n is 1000. You have to split the cell array into two arrays, and reshape as follows:
prob_dist_mu = cell2mat(prob_dist_params(:,1));
prob_dist_sigma = cell2mat(permute(prob_dist_params(:,2),[3 2 1]));
With permute, I make the first dimension of the cell array the third dimension, so cell2mat will result in a 2-by-2-by-1000 matrix. Alternatively you can define them as follows,
prob_dist_mu = repmat([1 2], [num_params 1]);
prob_dist_sigma = repmat([10, 1; 1, 5], [1 1 num_params]);
Now call mvnpdf with
pdf_vals = mvnpdf(saved_pos, prob_dist_mu, prob_dist_sigma);
saved_total = saved_nu.'*pdf_vals; % simple dot product
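To check that the vectorized call gives the same total as the original loop, a small comparison along these lines may help. It is a sketch under a few assumptions: num_params is kept small for readability, the variable names mirror the question, and mvnpdf requires the Statistics and Machine Learning Toolbox.

```matlab
num_params = 5;                                   % small size, for illustration only
prob_dist_mu    = repmat([1 2], [num_params 1]);              % n-by-d means
prob_dist_sigma = repmat([10, 1; 1, 5], [1 1 num_params]);    % d-by-d-by-n covariances
saved_nu  = rand(num_params, 1);
saved_pos = rand(num_params, 2);

% Reference: one mvnpdf call per parameter set
loop_total = 0;
for k = 1:num_params
    loop_total = loop_total + saved_nu(k) * ...
        mvnpdf(saved_pos(k, :), prob_dist_mu(k, :), prob_dist_sigma(:, :, k));
end

% Vectorized: a single mvnpdf call followed by a dot product
vec_total = saved_nu.' * mvnpdf(saved_pos, prob_dist_mu, prob_dist_sigma);

disp(abs(loop_total - vec_total) < 1e-12)         % displays 1 when both totals agree
```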

MATLAB: Rearranging feature matrix

I have a feature matrix of size ~1M x 3 where the columns are doc#,wordID#,wordcount
What's a fast way in Matlab to rearrange this feature matrix so it is instead of size #docs x # unique words i.e.
length(unique(featurematrix(:,1))) x length(unique(featurematrix(:,2)))
so that each row instead represents an entire document, each column represents a different word, and the values are the wordcounts from the 3rd column of the original matrix?
I started writing a bunch of loops, but had the feeling there's probably some short idiomatic way to do this already built-in to Matlab.
You can actually use accumarray to accomplish this
data = [1, 1, 1;
1, 2, 2;
1, 5, 3;
2, 1, 4;
2, 3, 5];
result = accumarray(data(:,1:2), data(:,3))
% 1 2 0 0 3
% 4 0 5 0 0
Alternately you could use sparse
result = full(sparse(data(:,1), data(:,2), data(:,3)))
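Both calls above size the result by the largest ID, so gaps in the doc or word IDs produce all-zero rows or columns. If the result should instead be #docs x #unique words, as the question asks, one option is to map the IDs to consecutive indices with unique first. The data values and the names docs, words, r and c below are made up for illustration.

```matlab
data = [10, 7, 1;
        10, 9, 2;
        42, 7, 4];                        % doc#, wordID#, wordcount (non-contiguous IDs)

[docs,  ~, r] = unique(data(:, 1));       % r maps each row to a document index 1..numel(docs)
[words, ~, c] = unique(data(:, 2));       % c maps each row to a word index 1..numel(words)

result = accumarray([r(:), c(:)], data(:, 3), [numel(docs), numel(words)]);
disp(result)
%  1  2
%  4  0
```

Row i of result then corresponds to docs(i), and column j to words(j).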

How can I find each max element of three matrices as new matrix?

Maybe the question is a little bit confusing, so I'll give an example below.
Let's say I have three matrices a, b, c of the same size.
a = [2, 5; 6, 9];
b = [3, 3; 8, 1];
c = [5, 5; 2, 7];
How can I get a new matrix max that holds, at each position, the largest of the corresponding elements of all three matrices?
max = [5, 5; 8, 9]
I know I could create logical matrices like a>b and work it out from those, but is there a more efficient way to do it?
You can concatenate the matrices into one 2x2x3 matrix using
d=cat(3,a,b,c)
and then use max-function to get your desired output:
maxValues=max(d,[],3)
The 3rd input to max defines along which dimension of the first input you want to find the maximum value.
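For only a handful of matrices, nesting the two-argument form of max is another option and avoids building the 3-D array. The sketch below compares both on the matrices from the question. The result is stored in maxNested rather than max, because a variable called max would shadow the built-in function.

```matlab
a = [2, 5; 6, 9];
b = [3, 3; 8, 1];
c = [5, 5; 2, 7];

maxNested = max(max(a, b), c);            % pairwise element-wise maxima
maxCat    = max(cat(3, a, b, c), [], 3);  % the cat/max approach from the answer

disp(maxNested)                           % [5 5; 8 9]
disp(isequal(maxNested, maxCat))          % displays 1 when both agree
```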

Creating a vector with random sampling of two vectors in matlab

How does one create a vector that is composed of a random sampling of two other vectors?
For example
Vector 1 [1, 3, 4, 7], Vector 2 [2, 5, 6, 8]
Random Vector [random draw from vector 1 or 2 (value 1 or 2), random draw from vector 1 or 2 (value 3 or 5)... etc]
Finally, how can one ask matlab to repeat this process n times to draw a distribution of results?
Thank you,
There are many ways you could do this. One possibility is:
tmp=round(rand(size(vector1)))
res = tmp.*vector1 + (1-tmp).*vector2
To get one mixed sample, you may use the idea of the following code snippet (not the optimal one, but maybe clear enough):
a = [1, 3, 4, 7];
b = [2, 5, 6, 8];
selector = randn(size(a));
sample = a.*(selector>0) + b.*(selector<=0);
For n samples put the above code in a for loop:
for k=1:n
% Sample code (without initial "sample" assignments)
% Here do stuff with the sample
end;
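If all n samples are needed at once, the loop can be replaced by a single logical mask. This is a sketch only: the names V1, V2, pick and samples, and the value of n, are chosen here for illustration.

```matlab
vector1 = [1, 3, 4, 7];
vector2 = [2, 5, 6, 8];
n = 1000;                                 % number of mixed samples (illustrative value)

V1 = repmat(vector1, n, 1);               % n-by-4, every row is vector1
V2 = repmat(vector2, n, 1);               % n-by-4, every row is vector2
pick = rand(n, numel(vector1)) < 0.5;     % true: take the element from vector1

samples = V2;
samples(pick) = V1(pick);                 % each row of samples is one mixed sample
```

Each column of samples then holds the distribution of draws for one position, which can be inspected with, for instance, mean(samples) or a histogram per column.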
More generally, if X is a matrix and for each row you want to take a sample from a column chosen at random, you can do this with a loop:
y = zeros(size(X,1),1);
for ii = 1:size(X,1)
y(ii) = X(ii,ceil(rand*size(X,2)));
end
You can avoid the loop using clever indexing via sub2ind:
idx_n = ceil(rand(size(X,1),1)*size(X,2));
idx = sub2ind(size(X),(1:size(X,1))',idx_n);
y = X(idx);
If I understand your question, you are choosing two random numbers. First you decide whether to select vector 1 or vector 2; next you pick an element from the chosen vector.
The following code takes advantage of the fact that vector1 and vector2 are the same length:
N = 1000;
sampleMatrix = [vector1 vector2];
M = numel(sampleMatrix);
randIndex = ceil(rand(1,N)*M); % N random numbers from 1 to M
randomNumbers = sampleMatrix(randIndex); % sample N times from the matrix
You can then display the result with, for instance
figure; hist(randomNumbers); % draw a histogram of numbers drawn
When vector1 and vector2 have different numbers of elements, you run into a problem. If you concatenate them, you will end up picking elements from the longer vector more often. One way around this is to create random samplings from both arrays, then choose between them:
M1 = numel(vector1);
M2 = numel(vector2);
r1 = ceil(rand(N,1)*M1);
r2 = ceil(rand(N,1)*M2);
randMat = [reshape(vector1(r1),[],1) reshape(vector2(r2),[],1)]; % two columns (also for row-vector inputs), now pick one or the other
randPick = ceil(rand(N,1)*2);
randomNumbers = [randMat(randPick==1, 1); randMat(randPick==2, 2)];
On re-reading, maybe you just want to pick "element 1 from either 1 or 2", then "element 2 from either 1 or 2", etc for all the elements of the vector. In that case, do
N = numel(vector1);
randPick = ceil(rand(N,1)*2); % 1 or 2 for each position
randMat = [vector1(:) vector2(:)]; % N-by-2, one column per source vector
randomNumbers = randMat(sub2ind(size(randMat), (1:N)', randPick)); % keeps the element order
This problem can be solved using the function datasample.
Combine both vectors into one and apply the function. I like this approach more than the handcrafted versions in the other answers. It gives you much more flexibility in choosing what you actually want, while being a one-liner.
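A minimal sketch of that idea, assuming the Statistics and Machine Learning Toolbox is available (datasample is part of it). The names pool, draws and subset, and the sample sizes, are illustrative.

```matlab
vector1 = [1, 3, 4, 7];
vector2 = [2, 5, 6, 8];
pool = [vector1, vector2];                % combine both vectors into one

n = 1000;
draws = datasample(pool, n);              % n draws from pool, with replacement

% Without replacement, the number of draws must not exceed numel(pool)
subset = datasample(pool, 4, 'Replace', false);
```

Note that this draws from the pooled values, so each draw can come from any position in either vector.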

Based on a matrix A, generate a matrix B with all possible multiplications of the columns of Matrix A

Say I have a matrix with 3 columns: c1, c2,c3, and I want to create a new matrix in which each column is any possible product of two of the columns of this matrix.
So, if I had a matrix with d columns, I would like to create a new matrix with d+d(d-1)/2+d columns. For example, consider the matrix with 3 columns c1, c2,c3. The matrix that I would like to create should have the columns c1, c2,c3, c1xc2, c2xc3,c1xc3, c1^2, c2^2 and c3^2.
Is there any efficient way to do this?
I'm embarrassed to post this - I'm sure there must be a simpler way (there is a MUCH simpler way - see my december update at the bottom of the answer), but this will do the job:
A = [1 2 3; 4 5 6];
n = size(A, 2);
B = A(:, reshape(ones(n, 1) * (1:n), 1, n^2)) .* repmat(A, 1, n);
Soln = [A, B(:, logical(reshape(tril(toeplitz(ones(n, 1))), 1, n^2)'))];
The calculation is not efficient, since in the B step I actually calculate double the number of combinations that I need (ie I get c1.*c1, c1.*c2, c1.*c3, c2.*c1, c2.*c2, c2.*c3, c3.*c1, c3.*c2, c3.*c3), and then in the second step I pull out only the columns that I need (eg I get rid of c3.*c1 as I've already got c1.*c3 and so on).
UPDATE: Was just out driving and a much better method occurred to me. You just need to construct two index vectors of the form: I1 = [1 1 1 2 2 3] and I2 = [1 2 3 2 3 3], then (A(:, I1) .* A(:, I2)) will get you all the column products you are after. I'm away from my computer at the moment, but will come back later and work out a general way to construct the index vectors. I think it can be fairly easily accomplished using the tril(toeplitz) construction. Cheers. Will update in a few hours.
UPDATE: Rody's second solution (+1) is exactly what I had in mind with my previous update so I won't bother repeating what he has done there now. Yoda's is quite neat too actually, so another +1.
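For reference, one possible way to build the index vectors I1 and I2 described in the first update is to take the row and column subscripts of a lower-triangular mask with find. This is a sketch of that idea, not necessarily the construction used in the other answers.

```matlab
A = [1 2 3; 4 5 6];
n = size(A, 2);

% Subscripts of the lower triangle, in column-major order
[I2, I1] = find(tril(ones(n)));
% For n = 3: I1.' is [1 1 1 2 2 3] and I2.' is [1 2 3 2 3 3]

Soln = [A, A(:, I1) .* A(:, I2)];         % original columns, then all unique products
disp(Soln)
```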
DECEMBER UPDATE: Funnily enough, after working on it here, I had to revisit this problem for my own research (coding up White's test for heteroscedasticity). I'm actually favoring a new approach now, recommended (somewhat cryptically) by @slayton in the comments. Specifically, using nchoosek. My new solution looks like this:
T = 20; K = 4;
X = randi(100, T, K);
Index = nchoosek((1:K), 2);
XAll = [X, X(:, Index(:, 1)) .* X(:, Index(:, 2)), X.^2];
nchoosek yields exactly the indices we need to construct the cross-products quickly and easily!
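A small worked example makes the column layout easier to see. The 2-by-3 matrix below is chosen purely for illustration. The result has K + K(K-1)/2 + K columns: the original columns, the cross-products, then the squares.

```matlab
X = [1 2 3; 4 5 6];
K = size(X, 2);

Index = nchoosek(1:K, 2);                 % all column pairs: [1 2; 1 3; 2 3]
XAll  = [X, X(:, Index(:, 1)) .* X(:, Index(:, 2)), X.^2];

disp(XAll)
%  1  2  3   2   3   6   1   4   9
%  4  5  6  20  24  30  16  25  36
disp(size(XAll, 2) == K + K * (K - 1) / 2 + K)   % displays 1
```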
The following is somewhat decent:
B = arrayfun(@(x) circshift(A, [0 -x]), 0:size(A,2)-1, 'UniformOutput', false);
B = cat(2, ones(size(A)), B{:});
C = repmat(A, 1, size(A,2)+1) .* B
This will however result in the matrix
c1 c2 c3 (c1.*c1) (c2.*c2) (c3.*c3) (c1.*c2) (c2.*c3) (c3.*c1) (c1.*c3) (c2.*c1) (c3.*c2)
whose column order differs from what you asked for, and in which the products are not unique. If you only want the unique products, use this:
[sA1, sA2] = size(A);
aa = repmat(1:sA2, sA2,1);
C = [A, A(:,nonzeros(triu(aa))).*A(:,nonzeros(triu(aa.')))]
which results in
c1 c2 c3 (c1.*c1) (c2.*c1) (c2.*c2) (c3.*c1) (c3.*c2) (c3.*c3)
which is a different sequence than you asked for, but it contains only the unique products.
Does the sequence matter for your purpose?
Here's another alternative. First, define a function that returns all possible pairs (as per your requirements in the question) for a given number of columns:
cols=@(n)cat(1,num2cell((1:n)'),num2cell((1:n)'*[1,1],2),num2cell(nchoosek(1:n,2),2))
The function should be fairly self-explanatory. Try looking at the output for a few small values of n and see for yourself. With this in place, you can proceed as follows:
s = RandStream('twister','Seed',1); %for reproducibility
x = rand(s, 4, 3) %your matrix
% 0.4170 0.1468 0.3968
% 0.7203 0.0923 0.5388
% 0.0001 0.1863 0.4192
% 0.3023 0.3456 0.6852
o = cellfun(@(c)prod(x(:,c),2),cols(size(x,2)),'UniformOutput',0);
out = cat(2,o{:})
% 0.4170 0.1468 0.3968 0.1739 0.0215 0.1574 0.0612 0.1655 0.0582
% 0.7203 0.0923 0.5388 0.5189 0.0085 0.2903 0.0665 0.3881 0.0498
% 0.0001 0.1863 0.4192 0.0000 0.0347 0.1757 0.0000 0.0000 0.0781
% 0.3023 0.3456 0.6852 0.0914 0.1194 0.4695 0.1045 0.2072 0.2368
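To see what the cols helper produces, its output for n = 3 can be printed one cell at a time. The loop and the use of mat2str below are only for display and are not part of the answer's method.

```matlab
cols = @(n) cat(1, num2cell((1:n)'), ...
                   num2cell((1:n)' * [1, 1], 2), ...
                   num2cell(nchoosek(1:n, 2), 2));

p = cols(3);                              % 9-by-1 cell array of column index sets
for k = 1:numel(p)
    fprintf('%d: %s\n', k, mat2str(p{k}));
end
% 1: 1
% 2: 2
% 3: 3
% 4: [1 1]
% 5: [2 2]
% 6: [3 3]
% 7: [1 2]
% 8: [1 3]
% 9: [2 3]
```

The single indices reproduce the original columns, the repeated indices give the squares, and the pairs from nchoosek give the cross-products, which matches the column order of out above.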