MATLAB separating array [duplicate] - matlab

I'm trying to elegantly split a vector. For example,
vec = [1 2 3 4 5 6 7 8 9 10]
According to another vector of 0's and 1's of the same length where the 1's indicate where the vector should be split - or rather cut:
cut = [0 0 0 1 0 0 0 0 1 0]
Giving us a cell output similar to the following:
[1 2 3] [5 6 7 8] [10]

Solution code
You can use cumsum & accumarray for an efficient solution -
%// Create ID/labels for use with accumarray later on
id = cumsum(cut)+1
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(mask).',vec(mask).',[],#(x) {x})
Benchmarking
Here are some performance numbers when using a large input on the three most popular approaches listed to solve this problem -
N = 100000; %// Input Datasize
vec = randi(100,1,N); %// Random inputs
cut = randi(2,1,N)-1;
disp('-------------------- With CUMSUM + ACCUMARRAY')
tic
id = cumsum(cut)+1;
mask = cut==0;
out = accumarray(id(mask).',vec(mask).',[],#(x) {x});
toc
disp('-------------------- With FIND + ARRAYFUN')
tic
N = numel(vec);
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(#(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
toc
disp('-------------------- With CUMSUM + ARRAYFUN')
tic
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = arrayfun(#(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
toc
Runtimes
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.068102 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.117953 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 12.560973 seconds.
Special case scenario: In cases where you might have runs of 1's, you need to modify few things as listed next -
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Setup IDs differently this time. The idea is to have successive IDs.
id = cumsum(cut)+1
[~,~,id] = unique(id(mask))
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(:),vec(mask).',[],#(x) {x})
Sample run with such a case -
>> vec
vec =
1 2 3 4 5 6 7 8 9 10
>> cut
cut =
1 0 0 1 1 0 0 0 1 0
>> celldisp(out)
out{1} =
2
3
out{2} =
6
7
8
out{3} =
10

For this problem, a handy function is cumsum, which can create a cumulative sum of the cut array. The code that produces an output cell array is as follows:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = {};
for i=1:numel(sumvals)
output{i} = vec(cutsum == sumvals(i)); %#ok<SAGROW>
end
As another answer shows, you can use arrayfun to create a cell array with the results. To apply that here, you'd replace the for loop (and the initialization of output) with the following line:
output = arrayfun(#(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
That's nice because it doesn't end up growing the output cell array.
The key feature of this routine is the variable cutsum, which ends up looking like this:
cutsum =
0 0 0 NaN 1 1 1 1 NaN 2
Then all we need to do is use it to create indices to pull the data out of the original vec array. We loop from zero to max and pull matching values. Notice that this routine handles some situations that may arise. For instance, it handles 1 values at the very beginning and very end of the cut array, and it gracefully handles repeated ones in the cut array without creating empty arrays in the output. This is because of the use of unique to create the set of values to search for in cutsum, and the fact that we throw out the NaN values in the sumvals array.
You could use -1 instead of NaN as the signal flag for the cut locations to not use, but I like NaN for readability. The -1 value would probably be more efficient, as all you'd have to do is truncate the first element from the sumvals array. It's just my preference to use NaN as a signal flag.
The output of this is a cell array with the results:
output{1} =
1 2 3
output{2} =
5 6 7 8
output{3} =
10
There are some odd conditions we need to handle. Consider the situation:
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14];
cut = [1 0 0 1 1 0 0 0 0 1 0 0 0 1];
There are repeated 1's in there, as well as a 1 at the beginning and end. This routine properly handles all this without any empty sets:
output{1} =
2 3
output{2} =
6 7 8 9
output{3} =
11 12 13

You can do this with a combination of find and arrayfun:
vec = [1 2 3 4 5 6 7 8 9 10];
N = numel(vec);
cut = [0 0 0 1 0 0 0 0 1 0];
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(#(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
We thus get:
>> celldisp(out)
out{1} =
1 2 3
out{2} =
5 6 7 8
out{3} =
10
So how does this work? Well, the first line defines your input vector, the second line finds how many elements are in this vector and the third line denotes your cut vector which defines where we need to cut in our vector. Next, we use find to determine the locations that are non-zero in cut which correspond to the split points in the vector. If you notice, the split points determine where we need to stop collecting elements and begin collecting elements.
However, we need to account for the beginning of the vector as well as the end. ind_after tells us the locations of where we need to start collecting values and ind_before tells us the locations of where we need to stop collecting values. To calculate these starting and ending positions, you simply take the result of find and add and subtract 1 respectively.
Each corresponding position in ind_after and ind_before tell us where we need to start and stop collecting values together. In order to accommodate for the beginning of the vector, ind_after needs to have the index of 1 inserted at the beginning because index 1 is where we should start collecting values at the beginning. Similarly, N needs to be inserted at the end of ind_before because this is where we need to stop collecting values at the end of the array.
Now for ind_after and ind_before, there is a degenerate case where the cut point may be at the end or beginning of the vector. If this is the case, then subtracting or adding by 1 will generate a start and stopping position that's out of bounds. We check for this in the 4th and 5th line of code and simply set these to 1 or N depending on whether we're at the beginning or end of the array.
The last line of code uses arrayfun and iterates through each pair of ind_after and ind_before to slice into our vector. Each result is placed into a cell array, and our output follows.
We can check for the degenerate case by placing a 1 at the beginning and end of cut and some values in between:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [1 0 0 1 0 0 0 1 0 1];
Using this example and the above code, we get:
>> celldisp(out)
out{1} =
1
out{2} =
2 3
out{3} =
5 6 7
out{4} =
9
out{5} =
10

Yet another way, but this time without any loops or accumulating at all...
lengths = diff(find([1 cut 1])) - 1; % assuming a row vector
lengths = lengths(lengths > 0);
data = vec(~cut);
result = mat2cell(data, 1, lengths); % also assuming a row vector
The diff(find(...)) construct gives us the distance from each marker to the next - we append boundary markers with [1 cut 1] to catch any runs of zeros which touch the ends. Each length is inclusive of its marker, though, so we subtract 1 to account for that, and remove any which just cover consecutive markers, so that we won't get any undesired empty cells in the output.
For the data, we mask out any elements corresponding to markers, so we just have the valid parts we want to partition up. Finally, with the data ready to split and the lengths into which to split it, that's precisely what mat2cell is for.
Also, using #Divakar's benchmark code;
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.272810 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.436276 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 17.112259 seconds.
-------------------- With mat2cell
Elapsed time is 0.084207 seconds.
...just sayin' ;)

Here's what you need:
function spl = Splitting(vec,cut)
n=1;
j=1;
for i=1:1:length(b)
if cut(i)==0
spl{n}(j)=vec(i);
j=j+1;
else
n=n+1;
j=1;
end
end
end
Despite how simple my method is, it's in 2nd place for performance:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.264428 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.407963 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 18.337940 seconds.
-------------------- SIMPLE
Elapsed time is 0.271942 seconds.

Unfortunately there is no 'inverse concatenate' in MATLAB. If you wish to solve a question like this you can try the below code. It will give you what you looking for in the case where you have two split point to produce three vectors at the end. If you want more splits you will need to modify the code after the loop.
The results are in n vector form. To make them into cells, use num2cell on the results.
pos_of_one = 0;
% The loop finds the split points and puts their positions into a vector.
for kk = 1 : length(cut)
if cut(1,kk) == 1
pos_of_one = pos_of_one + 1;
A(1,one_pos) = kk;
end
end
F = vec(1 : A(1,1) - 1);
G = vec(A(1,1) + 1 : A(1,2) - 1);
H = vec(A(1,2) + 1 : end);

Related

Find equal rows between two Matlab matrices

I have a matrix index in Matlab with size GxN and a matrix A with size MxN.
Let me provide an example before presenting my question.
clear
N=3;
G=2;
M=5;
index=[1 2 3;
13 14 15]; %GxN
A=[1 2 3;
5 6 7;
21 22 23;
1 2 3;
13 14 15]; %MxN
I would like your help to construct a matrix Response with size GxM with Response(g,m)=1 if the row A(m,:) is equal to index(g,:) and zero otherwise.
Continuing the example above
Response= [1 0 0 1 0;
0 0 0 0 1]; %GxM
This code does what I want (taken from a previous question of mine - just to clarify: the current question is different)
Response=permute(any(all(bsxfun(#eq, reshape(index.', N, [], G), permute(A, [2 3 4 1])), 1), 2), [3 4 1 2]);
However, the command is extremely slow for my real matrix sizes (N=19, M=500, G=524288). I understand that I will not be able to get huge speed but anything that can improve on this is welcome.
Aproach 1: computing distances
If you have the Statistics Toolbox:
Response = ~(pdist2(index, A));
or:
Response = ~(pdist2(index, A, 'hamming'));
This works because pdist2 computes the distance between each pair of rows. Equal rows have distance 0. The logical negation ~ gives 1 for those pairs of rows, and 0 otherwise.
Approach 2: reducing rows to unique integer labels
This approach is faster on my machine:
[~,~,u] = unique([index; A], 'rows');
Response = bsxfun(#eq, u(1:G), u(G+1:end).');
It works by reducing rows to unique integer labels (using the third output of unique), and comparing the latter instead of the former.
For your size values this takes approximately 1 second on my computer:
clear
N = 19; M = 500; G = 524288;
index = randi(5,G,N); A = randi(5,M,N);
tic
[~,~,u] = unique([index; A], 'rows');
Response = bsxfun(#eq, u(1:G), u(G+1:end).');
toc
gives
Elapsed time is 1.081043 seconds.
MATLAB has a multitude of functions for working with sets, including setdiff, intersect, union etc. In this case, you can use the ismember function:
[~, Loc] = ismember(A,index,'rows');
Which gives:
Loc =
1
0
0
1
2
And Response would be constructed as follows:
Response = (1:size(index,1) == Loc).';
Response =
2×5 logical array
1 0 0 1 0
0 0 0 0 1
You could reshape the matrices so that each row instead lies along the 3rd dimension. Then we can use implicit expansion (see bsxfun for R2016b or earlier) for equality of all elements, and all to aggregate on the rows (i.e. false if not all equal for a given row).
Response = all( reshape( index, [], 1, size(index,2) ) == reshape( A, 1, [], size(A,2) ), 3 );
You might even be able to avoid some reshaping by using all in another dimension, but it's easier for me to visualise it this way.

MATLAB - Returning a matrix of sums of elements corresponding to the same kind

Overview
An n×m matrix A and an n×1 vector Date are the inputs of the function S = sumdate(A,Date).
The function returns an n×m vector S such that all rows in S correspond to the sum of the rows of A from the same date.
For example, if
A = [1 2 7 3 7 3 4 1 9
6 4 3 0 -1 2 8 7 5]';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012]';
Then I would expect the returned matrix S is
S = [15 9 9 6 7 6 15 15 15;
26 7 7 2 -1 2 26 26 26]';
Because the elements Date(2) and Date(3) are the same, we have
S(2,1) and S(3,1) are both equal to the sum of A(2,1) and A(3,1)
S(2,2) and S(3,2) are both equal to the sum of A(2,2) and A(3,2).
Because the elements Date(1), Date(7), Date(8) and Date(9) are the same, we have
S(1,1), S(7,1), S(8,1), S(9,1) equal the sum of A(1,1), A(7,1), A(8,1), A(9,1)
S(1,2), S(7,2), S(8,2), S(9,2) equal the sum of A(1,2), A(7,2), A(8,2), A(9,2)
The same for S([4,6],1) and S([4,6],2)
As the element Date(5) does not repeat, so S(5,1) = A(5,1) = 7 and S(5,2) = A(5,2) = -1.
The code I have written so far
Here is my try on the code for this task.
function S = sumdate(A,Date)
S = A; %Pre-assign S as a matrix in the same size of A.
Dlist = unique(Date); %Sort out a non-repeating list from Date
for J = 1 : length(Dlist)
loc = (Date == Dlist(J)); %Compute a logical indexing vector for locating the J-th element in Dlist
S(loc,:) = repmat(sum(S(loc,:)),sum(loc),1); %Replace the located rows of S by the sum of them
end
end
I tested it on my computer using A and Date with these attributes:
size(A) = [33055 400];
size(Date) = [33055 1];
length(unique(Date)) = 2645;
It took my PC about 1.25 seconds to perform the task.
This task is performed hundreds of thousands of times in my project, therefore my code is too time-consuming. I think the performance will be boosted up if I can eliminate the for-loop above.
I have found some built-in functions which do special types of sums like accumarray or cumsum, but I still do not have any ideas on how to eliminate the for-loop.
I would appreciate your help.
You can do this with accumarray, but you'll need to generate a set of row and column subscripts into A to do it. Here's how:
[~, ~, index] = unique(Date); % Get indices of unique dates
subs = [repmat(index, size(A, 2), 1) ... % repmat to create row subscript
repelem((1:size(A, 2)).', size(A, 1))]; % repelem to create column subscript
S = accumarray(subs, A(:)); % Reshape A into column vector for accumarray
S = S(index, :); % Use index to expand S to original size of A
S =
15 26
9 7
9 7
6 2
7 -1
6 2
15 26
15 26
15 26
Note #1: This will use more memory than your for loop solution (subs will have twice the number of element as A), but may give you a significant speed-up.
Note #2: If you are using a version of MATLAB older than R2015a, you won't have repelem. Instead you can replace that line using kron (or one of the other solutions here):
kron((1:size(A, 2)).', ones(size(A, 1), 1))

How to do following in Matlab?

Suppose I have a sequence of n numbers
e=[5,4,45,63,22,22,1,12,3,2,2,16,14,14,16,17,1,19,21,15,32,32,27,27,43,41,7,8,13,23,23]
then for first 10 numbers i.e.
[5,4,45,63,22,22,1,12,3,2]
count numbers other than 1 to 5 and then divide by 10, i.e.
[45,63,22,22,12]
total 5, so result should be 5/10,now for first 20 numbers i.e.
[5,4,45,63,22,22,1,12,3,2,2,16,14,14,16,17,1,19,21,15]
then
[45,63,22,22,12,16,14,14,16,17,19,21,15]
total = 13, so 13/20, like this for 10, 20, 30,... up to n numbers
and then plot figure with x axis points 0 10 20 30 ... n and y axis with 5/10, 13/20, ... how to do this
I tried by this
for e=10:10:400
for u=1:length(e)
d=(numel(u)>5)
h=d/u
end
end
but it shows different.
Try this
e= [5,4,45,63,22,22,1,12,3,2,2,16,14,14,16,17,1,19,21,15,32,32,27,27,43,41,7,8,13,23,23];
bins = 10:10:numel(e);
counts = NaN(1,numel(bins)); %// pre-allocation. I'm pre-allocating with NaN here instead of zeros because 0 is a valid result for an element of counts and thus could make us miss errors should something go wrong
for k = 1:numel(bins)
counts(k) = sum(e(1:bins(k)) > 5)/bins(k);
end
plot(bins, counts) %// or you might prefer bar(bins, counts)
Here e(1:bins(k)) will be the first 10 elements of e in the first iteration of the loop, the first 20 in the second and so on. sum(... > 10) just counts how many elements are greater than 5. To understand how this works, consider x = [3 4 5 6 7 8 5 1 2]. Now x > 5 will return the logical array [0 0 0 1 1 1 0 0 0], so sum(x>10) is the same as sum([0 0 0 1 1 1 0 0 0]) which is 3 i.e. the count of elements in x greater than 5. Now you just need to store this count in a different element of counts at each iteration hence we have counts(k) = ... and not counts = ... as the latter (i.e. how your code has it) just overrides counts with the scalar count at each iteration rather than creating a vector that records each count at each iteration.
In MATLAB you can often do away with loops and you can do so in this case as well:
counts = cumsum(e > 5)./(1:numel(e));
h = counts(10:10:end);
Hope, this can help you
n=30 % this because of the 'e' size
Lim = 5 % your limit
Steps = 10 %
xValues = Steps:Steps:n
PlotSeries = NaN(size(e,2)/Steps,2)
for x = 1:1:size(e,2)/Steps
PlotSeries(x,:) = [xValues(x),size(e(e(1:xValues(x))>Lim),2)/xValues(x)]
end
plot(PlotSeries)

Recurring Function with Matrix Input

I believe most functions in MATLAB should be able to receive matrix input and return the output in the form of matrix.
For example sqrt([1 4 9]) would return [1 2 3].
However, when I tried this recurring factorial function:
function k = fact(z)
if z ~= 0
k = z * fact(z-1);
else
k = 1;
end
end
It works perfectly when a number is input into fact. However, when a matrix is input into fact, it returns the matrix itself, without performing the factorial function.
E.g.
fact(3) returns 6
fact([1 2 3]) returns [1 2 3] instead of [1 2 6].
Any help is appreciated. Thank you very much!
Since MATLAB is not known to be good with recursive functions, how about a vectorized approach? Try this for a vector input -
mat1 = repmat([1:max(z)],[numel(z) 1])
mat1(bsxfun(#gt,1:max(z),z'))=1
output1 = prod(mat1,2)
Sample run -
z =
1 2 7
output1 =
1
2
5040
For the sake of answering your original question, here's the annoying loopy code for a vector or 2D matrix as input -
function k1 = fact1(z1)
k1 = zeros(size(z1));
for ii = 1:size(z1,1)
for jj = 1:size(z1,2)
z = z1(ii,jj);
if z ~= 0
k1(ii,jj) = z .* fact1(z-1);
else
k1(ii,jj) = 1;
end
end
end
return
Sample run -
>> fact1([1 2 7;3 2 1])
ans =
1 2 5040
6 2 1
You can use the gamma function to compute the factorial without recursion:
function k = fact(z)
k = gamma(z+1);
Example:
>> fact([1 2 3 4])
ans =
1 2 6 24
Not sure if all of you know, but there is an actual factorial function defined in MATLAB that can take in arrays / matrices of any size, and computes the factorial element-wise. For example:
k = factorial([1 2 3 4; 5 6 7 8])
k =
1 2 6 24
120 720 5040 40320
Even though this post is looking for a recursive implementation, and Divakar has provided a solution, I'd still like to put my two cents in and suggest an alternative. Also, let's say that we don't have access to factorial, and we want to compute this from first principles. What I would personally do is create a cell array that's the same size as the input matrix, but each element in this cell array would be a linear index array from 1 up to the number defined for each location in the original matrix. You would then apply prod to each cell element to compute the factorial. A precondition is that no number is less than 1, and that all elements are integers. As such:
z1 = ... ; %// Define input matrix here
z1_matr = arrayfun(#(x) 1:x, z1, 'uni', 0);
out = cellfun(#prod, z1_matr);
If z1 = [1 2 3 4; 5 6 7 8];, from my previous example, we get the same output with the above code:
out =
1 2 6 24
120 720 5040 40320
This will obviously be slower as there is an arrayfun then cellfun call immediately after, but I figured I'd add another method for the sake of just adding in another method :) Not sure how constructive this is, but I figured I'd add my own method and join Divakar and Luis Mendo :)

Eliminating zeros in a matrix - Matlab

Hi I have the following matrix:
A= 1 2 3;
0 4 0;
1 0 9
I want matrix A to be:
A= 1 2 3;
1 4 9
PS - semicolon represents the end of each column and new column starts.
How can I do that in Matlab 2014a? Any help?
Thanks
The problem you run into with your problem statement is the fact that you don't know the shape of the "squeezed" matrix ahead of time - and in particular, you cannot know whether the number of nonzero elements is a multiple of either the rows or columns of the original matrix.
As was pointed out, there is a simple function, nonzeros, that returns the nonzero elements of the input, ordered by columns. In your case,
A = [1 2 3;
0 4 0;
1 0 9];
B = nonzeros(A)
produces
1
1
2
4
3
9
What you wanted was
1 2 3
1 4 9
which happens to be what you get when you "squeeze out" the zeros by column. This would be obtained (when the number of zeros in each column is the same) with
reshape(B, 2, 3);
I think it would be better to assume that the number of elements may not be the same in each column - then you need to create a sparse array. That is actually very easy:
S = sparse(A);
The resulting object S is a sparse array - that is, it contains only the non-zero elements. It is very efficient (both for storage and computation) when lots of elements are zero: once more than 1/3 of the elements are nonzero it quickly becomes slower / bigger. But it has the advantage of maintaining the shape of your matrix regardless of the distribution of zeros.
A more robust solution would have to check the number of nonzero elements in each column and decide what the shape of the final matrix will be:
cc = sum(A~=0);
will count the number of nonzero elements in each column of the matrix.
nmin = min(cc);
nmax = max(cc);
finds the smallest and largest number of nonzero elements in any column
[i j s] = find(A); % the i, j coordinates and value of nonzero elements of A
nc = size(A, 2); % number of columns
B = zeros(nmax, nc);
for k = 1:nc
B(1:cc(k), k) = s(j == k);
end
Now B has all the nonzero elements: for columns with fewer nonzero elements, there will be zero padding at the end. Finally you can decide if / how much you want to trim your matrix B - if you want to have no zeros at all, you will need to trim some values from the longer columns. For example:
B = B(1:nmin, :);
Simple solution:
A = [1 2 3;0 4 0;1 0 9]
A =
1 2 3
0 4 0
1 0 9
A(A==0) = [];
A =
1 1 2 4 3 9
reshape(A,2,3)
ans =
1 2 3
1 4 9
It's very simple though and might be slow. Do you need to perform this operation on very large/many matrices?
From your question it's not clear what you want (how to arrange the non-zero values, specially if the number of zeros in each column is not the same). Maybe this:
A = reshape(nonzeros(A),[],size(A,2));
Matlab's logical indexing is extremely powerful. The best way to do this is create a logical array:
>> lZeros = A==0
then use this logical array to index into A and delete these zeros
>> A(lZeros) = []
Finally, reshape the array to your desired size using the built in reshape command
>> A = reshape(A, 2, 3)