MATLAB - sparse to dense matrix - matlab

I have a sparse matrix in a data file produced by a code(which is not MATLAB). The data file consists of four columns. The first two column are the real and imaginary part of a matrix entry and the third and fourth columns are the corresponding row and column index respectively.
I convert this into a dense matrix in Matlab using the following script.
tic
dataA = load('sparse_LHS.dat');
toc
% Initialise matrix
tic
Nr = 15; Nz = 15; Neq = 5;
A (Nr*Nz*Neq,Nr*Nz*Neq) = 0;
toc
tic
lA = length(dataA)
rowA = dataA(:,3); colA = dataA(:,4);
toc
tic
for i = 1:lA
A(rowA(i), colA(i)) = complex(dataA(i,1), dataA(i,2));
end
toc
This scipt is, however, very slow(the for loop is the culprit).
Elapsed time is 0.599023 seconds.
Elapsed time is 0.001978 seconds.
Elapsed time is 0.000406 seconds.
Elapsed time is 275.462138 seconds.
Is there any fast way of doing this in matlab?
Here is what I tried so far:
parfor - This gives me
valid indices are restricted in parfor loops
I tired to recast the for loop as something like this:
A(rowA(:),colA(:)) = complex(dataA(:,1), dataA(:,2));
and I get an error
Subscripted assignment dimension mismatch.

The reason your last try doesn't work is that Matlab can't take a list of subscripts for both columns and rows, and match them to assign elements in order. Instead, it's making all the combinations of rows and columns from the list - this is how it looks:
dataA = magic(4)
dataA =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
dataA([1,2],[1,4]) =
16 13
5 8
So we got 4 elements ([1,1],[1,4],[2,1],[2,4]) instead of 2 ([1,1] and [2,4]).
In order to use subscripts in a list, you need to converts them to linear indexing, and one simple way to do this is using the function sub2ind.
Using this function you can write the following code to do it all at once:
% Initialise matrix
Nr = 15; Nz = 15; Neq = 5;
A(Nr*Nz*Neq,Nr*Nz*Neq) = 0;
% Place all complex values from dataA(:,1:2) in to A by the subscripts in dataA(:,3:4):
A(sub2ind(size(A),dataA(:,3),dataA(:,4))) = complex(dataA(:,1), dataA(:,2));
sub2ind is not such a quick function (but it will be much quicker than your loop), so if you have a lot of data, you might want to do the computation of the linear index by yourself:
rowA = dataA(:,3);
colA = dataA(:,4);
% compute the linear index:
ind = (colA-1)*size(A,1)+rowA;
% Place all complex values from dataA(:,1:2) in to A by the the index 'ind':
A(ind) = complex(dataA(:,1), dataA(:,2));
P.S.:
If you are using Matlab R2015b or later:
A = zeros(Nr*Nz*Neq,Nr*Nz*Neq);
is quicker than:
A(Nr*Nz*Neq,Nr*Nz*Neq) = 0;

Related

How to vectorize searching function in Matlab?

Here is a Matlab coding problem (A little different version with intersect not setdiff here:
a rating matrix A with 3 cols, the 1st col is user'ID which maybe duplicated, 2nd col is the item'ID which maybe duplicated, 3rd col is rating from user to item, ranging from 1 to 5.
Now, I have a subset of user IDs smallUserIDList and a subset of item IDs smallItemIDList, then I want to find the rows in A that rated by users in smallUserIDList, and collect the items that user rated, and do some calculations, such as setdiff with smallItemIDList and count the result, as the following code does:
userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
A2= A(A(:,1) == smallUserIDList(i), :);
itemIDList_each = unique(A2(:,2));
setDiff = setdiff(itemIDList_each , smallItemIDList);
userStat(i) = length(setDiff);
end
userStat
Finally, I find the profile viewer showing that the loop above is inefficient, the question is how to improve this piece of code with vectorization but the help of for loop?
For example:
Input:
A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5
]
smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]
Output:
userStat =
0
1
0
2
Vanilla MATLAB:
As far as I can tell your code is equivalent to:
%// Create matrix such that: user_item_rating(user,item)==rating
user_item_rating = sparse(A(:,1),A(:,2),A(:,3));
%// Keep all BUT the items in smallItemIDList
user_item_rating(:,smallItemIDList) = [];
%// Keep only those users in `smallUserIDList` and use order of this list
user_item_rating = user_item_rating(smallUserIDList,:);
%// Count the number of ratings
userStat = sum(user_item_rating~=0, 2);
This will work if there is at most one rating per (user,item)-combination. Also it should be quite efficient.
Clean approach without reinventing the wheel:
Check out grpstats from the Statistics Toolbox!
An implementation could look similar to this:
%// Create ratings table
ratings = array2table(A, 'VariableNames', {'user','item','rating'});
%// Remove items we don't care about (smallItemIDList)
ratings = ratings(~ismember(ratings.item, smallItemIDList),:);
%// Keep only users we care about (smallUserIDList)
ratings = ratings(ismember(ratings.user, smallUserIDList),:);
%// Compute the statistics grouped by 'user'.
userStat = grpstats(ratings, 'user');
This could be one vectorized approach -
%// Take care of equality between first column of A and smallUserIDList to
%// find the matching row and column indices.
%// NOTE: This corresponds to "A(:,1) == smallUserIDList(i)" from OP.
[R,C] = find(bsxfun(#eq,A(:,1),smallUserIDList.')); %//'
%// Take care of non-equality between second column of A and smallItemIDList.
%// NOTE: This corresponds to SETDIFF in the original loopy code from OP.
mask1 = ~ismember(A(R,2),smallItemIDList);
AR2 = A(R,2); %// Elements from 2nd col of A that has matches from first step
%// Get only those elements from C and AR2 that has ONES in mask1
C1 = C(mask1);
AR2 = AR2(mask1);
%// Initialized output array
userStat = zeros(numel(smallUserIDList),1);
if ~isempty(C1)%//There is at least one element in C, so do further processing
%// Find the count of duplicate elements for each ID in C1 indexed into AR2.
%// NOTE: This corresponds to "unique(A2(:,2))" from OP.
dup_counts = accumarray(C1,AR2,[],#(x) numel(x)-numel(unique(x)));
%// Get the count of matches for each ID in C in the mask1.
%// NOTE: This corresponds to:
%// "length(setdiff(itemIDList_each , smallItemIDList))" from OP.
accums = accumarray(C,mask1);
%// Store the counts in output array and also subtract the dup counts
userStat(1:numel(accums)) = accums;
userStat(1:numel(dup_counts)) = userStat(1:numel(dup_counts)) - dup_counts;
end
Benchmarking
The code listed next compares runtimes for proposed approach against the original loopy code -
%// Size parameters and random inputs with them
A_nrows = 5000;
IDlist_len = 5000;
max_userID = 1000;
max_itemID = 1000;
A = [randi(max_userID,A_nrows,1) randi(max_itemID,A_nrows,1) randi(5,A_nrows,2)];
smallUserIDList = randi(max_userID,IDlist_len,1);
smallItemIDList = randi(max_itemID,IDlist_len,1);
disp('---------------------------- With Original Approach')
tic
%// Original posted code
toc
disp('---------------------------- With Proposed Approach'))
tic
%// Proposed approach code
toc
The runtimes thus obtained with three sets of datasizes were -
Case #1:
A_nrows = 500;
IDlist_len = 500;
max_userID = 100;
max_itemID = 100;
---------------------------- With Original Approach
Elapsed time is 0.136630 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.004163 seconds.
Case #2:
A_nrows = 5000;
IDlist_len = 5000;
max_userID = 100;
max_itemID = 100;
---------------------------- With Original Approach
Elapsed time is 1.579468 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.050498 seconds.
Case #3:
A_nrows = 5000;
IDlist_len = 5000;
max_userID = 1000;
max_itemID = 1000;
---------------------------- With Original Approach
Elapsed time is 1.252294 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.044198 seconds.
Conclusion: The speedups with the proposed approach over the original loopy code thus seem to be huge!!
I think you are trying to remove a fixed set of ratings for a subset of users and count the number of remaining ratings:
Does the following work:
Asub = A(ismember(A(:,1), smallUserIDList),1:2);
Bremove = allcomb(smallUserIDList, smallItemIDList);
Akeep = setdiff(Asub, Bremove, 'rows');
T = varfun(#sum, array2table(Akeep), 'InputVariables', 'Akeep2', 'GroupingVariables', 'Akeep1');
% userStat = T.GroupCount;
you need the allcomb function from the file exchange from matlab central, it gives a cartesian product of two vectors, and is easy to implement anyway.

Using find with a struct

I have a struct that holds thousands of samples of data. Each data point contains multiple objects. For example:
Structure(1).a = 7
Structure(1).b = 3
Structure(2).a = 2
Structure(2).b = 6
Structure(3).a = 1
Structure(3).b = 6
...
... (thousands more)
...
Structure(2345).a = 4
Structure(2345).b = 9
... and so on.
If I wanted to find the index number of all the '.b' objects containing the number 6, I would have expected the following function would do the trick:
find(Structure.b == 6)
... and I would expect the answer to contain '2' and '3' (for the input shown above).
However, this doesn't work. What is the correct syntax and/or could I be arranging my data in a more logical way in the first place?
The syntax Structure.b for an array of structs gives you a comma-separated list, so you'll have to concatenate them all (for instance, using brackets []) in order to obtain a vector:
find([Structure.b] == 6)
For the input shown above, the result is as expected:
ans =
2 3
As Jonas noted, this would work only if there are no fields containing empty matrices, because empty matrices will not be reflected in the concatenation result.
Handling structs with empty fields
If you suspect that these fields may contain empty matrices, either convert them to NaNs (if possible...) or consider using one of the safer solutions suggested by Rody.
In addition, I've thought of another interesting workaround for this using strings. We can concatenate everything into a delimited string to keep the information about empty fields, and then tokenize it back (this, in my humble opinion, is easier to be done in MATLAB than handle numerical values stored in cells).
Inspired by Jonas' comment, we can convert empty fields to NaNs like so:
str = sprintf('%f,', Structure.b)
B = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN)
and this allows you to apply find on the contents of B:
find(B{:} == 6)
ans =
2
3
Building on EitanT's answer with Jonas' comment, a safer way could be
>> S(1).a = 7;
S(1).b = 3;
S(2).a = 2;
S(2).b = 6;
S(3).a = 1;
S(3).b = [];
S(4).a = 1;
S(4).b = 6;
>> find( cellfun(#(x)isequal(x,6),{S.b}) )
ans =
2 4
It's probably not very fast though (compared to EitanT's version), so only use this when needed.
Another answer to this question! This time, we'll compare the performance of the following 4 methods:
My original method
EitanT's original method (which does not handle emtpies)
EitanT's improved method using strings
A new method: a simple for-loop
Another new method: a vectorized, emtpy-safe version
Test code:
% Set up test
N = 1e5;
S(N).b = [];
for ii = 1:N
S(ii).b = randi(6); end
% Rody Oldenhuis 1
tic
sol1 = find( cellfun(#(x)isequal(x,6),{S.b}) );
toc
% EitanT 1
tic
sol2 = find([S.b] == 6);
toc
% EitanT 2
tic
str = sprintf('%f,', S.b);
values = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN);
sol3 = find(values{:} == 6);
toc
% Rody Oldenhuis 2
tic
ids = false(N,1);
for ii = 1:N
ids(ii) = isequal(S(ii).b, 6);
end
sol4 = find(ids);
toc
% Rody Oldenhuis 3
tic
idx = false(size(S));
SS = {S.b};
inds = ~cellfun('isempty', SS);
idx(inds) = [SS{inds}]==6;
sol5 = find(idx);
toc
% make sure they are all equal
all(sol1(:)==sol2(:))
all(sol1(:)==sol3(:))
all(sol1(:)==sol4(:))
all(sol1(:)==sol5(:))
Results on my machine at work (AMD A6-3650 APU (4 cores), 4GB RAM, Windows 7 64 bit):
Elapsed time is 28.990076 seconds. % Rody Oldenhuis 1 (cellfun)
Elapsed time is 0.119165 seconds. % EitanT 1 (no empties)
Elapsed time is 22.430720 seconds. % EitanT 2 (string manipulation)
Elapsed time is 0.706631 seconds. % Rody Oldenhuis 2 (loop)
Elapsed time is 0.207165 seconds. % Rody Oldenhuis 3 (vectorized)
ans =
1
ans =
1
ans =
1
ans =
1
On my Homebox (AMD Phenom(tm) II X6 1100T (6 cores), 16GB RAM, Ubuntu64 12.10):
Elapsed time is 0.572098 seconds. % cellfun
Elapsed time is 0.119557 seconds. % no emtpties
Elapsed time is 0.220903 seconds. % string manipulation
Elapsed time is 0.107345 seconds. % loop
Elapsed time is 0.180842 seconds. % cellfun-with-string
Gotta love that JIT :)
and wow...anyone know why the two systems behave so differently?
Also, little known fact -- cellfun with one of the possible string arguments is incredibly fast (which goes to show how much overhead anonymous functions require...).
Still, if you can be absolutely sure there are no empties, go for EitanT's original answer; that's what Matlab is for. If you can't be sure, just go for the loop.

Use a vector to index a matrix without linear index

G'day,
I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB.
Usually, I would convert the subscript points to the linear index of the matrix.(for eg. Use a vector as an index to a matrix) However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that have the same 1st and 2nd dimension. Let me hopefully demonstrate with an example:
Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time)
Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th time)
Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th time)
Matrix(4,4,:,:) = 124;
Now, I want to be able to index with the subscripts (1,2) and (3,4), etc and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but the contents which exist at Matrix(:,:,1,2),Matrix(:,:,2,1) and Matrix(:,:,2,2), and so on (IRL, the dimensions of Matrix might be more like size(Matrix) = (300 250 30 200)
I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like:
ans(time=1)
999 888 124
999 888 124
ans(time=2)
etc etc etc
etc etc etc
I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data.
I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind) since I want all the information stored in the extra dimensions, 3 and 4, of the i and jth indices, I don't think that a slightly faster version of sub2ind still would not cut it..
I can think of three ways to go about this
Simple loop
Just loop over all the 2D indices you have, and use colons to access the remaining dimensions:
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
Vectorized calculation of Linear indices
Skip sub2ind and vectorize the computation of linear indices:
% generalized for arbitrary dimensions of M
sz = size(M);
nd = ndims(M);
arg = arrayfun(#(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
#(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
% the linear indices
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
Sub2ind
Just use the ready-made tool that ships with Matlab:
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
Speed
So which one's the fastest? Let's find out:
clc
M = nan(4,4,2,2);
sz = size(M);
nd = ndims(M);
twoDinds = [...
1 2
4 3
3 4
4 4
2 1];
tic
for ii = 1:1e3
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
end
toc
tic
twoDinds_prev = twoDinds;
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(#(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
#(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
M(inds) = rand;
end
toc
tic
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(#(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
#(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
M(inds) = rand;
end
toc
Results:
Elapsed time is 0.004778 seconds. % loop
Elapsed time is 0.807236 seconds. % vectorized linear inds
Elapsed time is 0.839970 seconds. % linear inds with sub2ind
Conclusion: use the loop.
Granted, the tests above are largely influenced by JIT's failure to compile the two last loops, and the non-specificity to 4D arrays (the last two method also work on ND arrays). Making a specialized version for 4D will undoubtedly be much faster.
Nevertheless, the indexing with simple loop is, well, simplest to do, easiest on the eyes and very fast too, thanks to JIT.
So, here is a possible answer... but it is messy. I suspect it would more computationally expensive then a more direct method... And this would definitely not be my preferred answer. It would be great if we could get the answer without any for loops!
Matrix = rand(100,200,30,400);
grabthese_x = (1 30 50 90);
grabthese_y = (61 9 180 189);
result=nan(size(length(grabthese_x),size(Matrix,3),size(Matrix,4));
for tt = 1:size(Matrix,4)
subset = squeeze(Matrix(grabthese_x,grabthese_y,:,tt));
for NN=1:size(Matrix,3)
result(:,NN,tt) = diag(subset(:,:,NN));
end
end
The resulting matrix, result should have size size(result) = (4 N tt).
I think this should work, even if Matrix isn't square. However, it is not ideal, as I said above.

in matlab, calculate mean in a part of one column where another column satisfies a condition

I'm quite new to matlab, and I'm curious how to do this:
I have a rather large (27000x11) matrix, and the 8th column contains a number which changes sometimes but is constant for like 2000 rows (not necessarily consecutive).
I would like to calculate the mean of the entries in the 3rd column for those rows where the 8th column has the same value. This for each value of the 8th column.
I would also like to plot the 3rd column's means as a function of the 8th column's value but that I can do if I can get a new matrix (2x2) containing [mean_of_3rd,8th].
Ex: (smaller matrix for convenience)
1 2 3 4 5
3 7 5 3 2
1 3 2 5 3
4 5 7 5 8
2 4 7 4 4
Since the 4th column has the same value in row 1 and 5 I'd like to calculate the mean of 2 and 4 (the corresponding elements of column 2, italic bold) and put it in another matrix together with the 4th column's value. The same for 3 and 5 (bold) since the 4th column has the same value for these two.
3 4
4 5
and so on... is this possible in an easy way?
Use the all-mighty, underused accumarray :
This line gives you mean values of 4th column accumulated by 2nd column:
means = accumarray( A(:,4) ,A(:,2),[],#mean)
This line gives you number of element in each set:
count = accumarray( A(:,4) ,ones(size(A(:,4))))
Now if you want to filter only those that have at least one occurence:
>> filtered = means(count>1)
filtered =
3
4
This will work only for positive integers in the 4th column.
Another possibility for counting amount of elements in each set:
count = accumarray( A(:,4) ,A(:,4),[],#numel)
A slightly refined approach based on the ideas of Andrey and Rody. We can not use accumarray directly, since the data is real, not integer. But, we can use unique to find the indices of the repeating entries. Then we operate on integers.
% get unique entries in 4th column
[R, I, J] = unique(A(:,4));
% count the repeating entries: now we have integer indices!
counts = accumarray(J, 1, size(R));
% sum the 2nd column for all entries
sums = accumarray(J, A(:,2), size(R));
% compute means
means = sums./counts;
% choose only the entries that show more than once in 4th column
inds = counts>1;
result = [means(inds) R(inds)];
Time comparison for the following synthetic data:
A=randi(100, 1000000, 5);
% Rody's solution
Elapsed time is 0.448222 seconds.
% The above code
Elapsed time is 0.148304 seconds.
My official answer:
A4 = A(:,4);
R = unique(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(R)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
result = [means(inds) R(inds)];
This is because of the following. Here's all of the alternatives we've come up with, in profiling form:
%# sample data
A = [
1 2 3 4 5
3 7 5 3 2
1 3 2 5 3
4 5 7 5 8
2 4 7 4 4];
%# accumarray
%# works only on positive integers in A(:,4)
tic
for ii = 1:1e4
means = accumarray( A(:,4) ,A(:,2),[],#mean);
count = accumarray( A(:,4) ,ones(size(A(:,4))));
filtered = means(count>1);
end
toc
%# arrayfun
%# works only on integers in A(:,4)
tic
for ii = 1:1e4
B = arrayfun(#(x) A(A(:,4)==x, 2), min(A(:,4)):max(A(:,4)), 'uniformoutput', false);
filtered = cellfun(#mean, B(cellfun(#(x) numel(x)>1, B)) );
end
toc
%# ordinary loop
%# works only on integers in A(:,4)
tic
for ii = 1:1e4
A4 = A(:,4);
R = min(A4):max(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(R)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
filtered = means(inds);
end
toc
Results:
Elapsed time is 1.238352 seconds. %# (accumarray)
Elapsed time is 7.208585 seconds. %# (arrayfun + cellfun)
Elapsed time is 0.225792 seconds. %# (for loop)
The ordinary loop is clearly the way to go here.
Note the absence of mean in the inner loop. This is because mean is not a Matlab builtin function (at least, on R2010), so that using it inside the loop makes the loop unqualified for JIT compilation, which slows it down by a factor of over 10. Using the form above accelerates the loop to almost 5.5 times the speed of the accumarray solution.
Judging on your comment, it is almost trivial to change the loop to work on all entries in A(:,4) (not just the integers):
A4 = A(:,4);
R = unique(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(A4)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
filtered = means(inds);
Which I will copy-paste to the top as my official answer :)

MATLAB: duplicating vector 'n' times [duplicate]

This question already has answers here:
Octave / Matlab: Extend a vector making it repeat itself?
(3 answers)
Closed 9 years ago.
I have a vector, e.g.
vector = [1 2 3]
I would like to duplicate it within itself n times, i.e. if n = 3, it would end up as:
vector = [1 2 3 1 2 3 1 2 3]
How can I achieve this for any value of n? I know I could do the following:
newvector = vector;
for i = 1 : n-1
newvector = [newvector vector];
end
This seems a little cumbersome though. Any more efficient methods?
Try
repmat([1 2 3],1,3)
I'll leave you to check the documentation for repmat.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is Using Tony's Trick. Repmat and Reshape are usually found to be slower than Tony's trick as it directly uses Matlabs inherent indexing. To answer you question,
Lets say, you want to tile the row vector r=[1 2 3] N times like r=[1 2 3 1 2 3 1 2 3...], then,
c=r'
cc=c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
EDIT : Reply to #Li-aung Yip's doubts
I conducted a small Matlab test to check the speed differential between repmat and tony's trick. Using the code mentioned below, I calculated the times for constructing the same tiled vector from a base vector A=[1:N]. The results show that YES, Tony's-Trick is FASTER BY AN ORDER of MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much time differential can be critical if such an operation has to be performed in loops. Here is the small script I used;
N= 10 ;% ASLO Try for values N= 10, 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The Times (in seconds) for both methods are given below;
N=10, time_repmat = 8e-5 , time_tony = 3e-5
N=100, time_repmat = 2.9e-4 , time_tony = 6e-5
N=1000, time_repmat = 0.0302 , time_tony = 0.0058
N=10000, time_repmat = 2.9199 , time_tony = 0.5292
My RAM didn't permit me to go beyond N=10000. I am sure, the time difference between the two methods will be even more significant for N=100000. I know, these times might be different for different machines, but the relative difference in order-of-magnitude of times will stand. Also, I know, the avg of times could have been a better metric, but I just wanted to show the order of magnitude difference in time consumption between the two approaches. My machine/os details are given below :
Relevant Machine/OS/Matlab Details : Athlon i686 Arch, Ubuntu 11.04 32 bit, 3gb ram, Matlab 2011b
Based on Abhinav's answer and some tests, I wrote a function which is ALWAYS faster than repmat()!
It uses the same parameters, except for the first parameter which must be a vector and not a matrix.
function vec = repvec( vec, rows, cols )
%REPVEC Replicates a vector.
% Replicates a vector rows times in dim1 and cols times in dim2.
% Auto optimization included.
% Faster than repmat()!!!
%
% Copyright 2012 by Marcel Schnirring
if ~isscalar(rows) || ~isscalar(cols)
error('Rows and cols must be scaler')
end
if rows == 1 && cols == 1
return % no modification needed
end
% check parameters
if size(vec,1) ~= 1 && size(vec,2) ~= 1
error('First parameter must be a vector but is a matrix or array')
end
% check type of vector (row/column vector)
if size(vec,1) == 1
% set flag
isrowvec = 1;
% swap rows and cols
tmp = rows;
rows = cols;
cols = tmp;
else
% set flag
isrowvec = 0;
end
% optimize code -> choose version
if rows == 1
version = 2;
else
version = 1;
end
% run replication
if version == 1
if isrowvec
% transform vector
vec = vec';
end
% replicate rows
if rows > 1
cc = vec(:,ones(1,rows));
vec = cc(:);
%indices = 1:length(vec);
%c = indices';
%cc = c(:,ones(rows,1));
%indices = cc(:);
%vec = vec(indices);
end
% replicate columns
if cols > 1
%vec = vec(:,ones(1,cols));
indices = (1:length(vec))';
indices = indices(:,ones(1,cols));
vec = vec(indices);
end
if isrowvec
% transform vector back
vec = vec';
end
elseif version == 2
% calculate indices
indices = (1:length(vec))';
% replicate rows
if rows > 1
c = indices(:,ones(rows,1));
indices = c(:);
end
% replicate columns
if cols > 1
indices = indices(:,ones(1,cols));
end
% transform index when row vector
if isrowvec
indices = indices';
end
% get vector based on indices
vec = vec(indices);
end
end
Feel free to test the function with all your data and give me feedback. When you found something to even improve it, please tell me.