Matlab - Using two index arrays to operate on a sub-matrix - matlab

I'm trying to figure out how to access Matlab sub-arrays (part of an array) with a generic set of subscript vectors.
In general, the problem is defined as:
Given two n-dim endpoints of an array index (both size nd), one having the initial set of indices (startInd) and the other having the last set of indices (endInd), how to access the sub-matrix which is included between the pair of index-sets?
For example, I want to replace this:
Mat=rand(10,10,10,10);
Mat(2:7, 1:6, 1:6, 2:8) = 1.0;
With an operation that can accept any set of two n-dim vectors specifying the indices for the last operation, which is "abstractly" expressed as:
Mat=rand(10,10,10,10);
startInd=[2 1 1 2];
endInd =[7 6 6 8];
IndexVar=???
Mat(IndexVar) = 1.0;
Thus I want to access the sub-matrix Mat(2:7, 1:6, 1:6, 2:8) using a variable or some other generic form that allows a generic n-dim. Preferably not a loop (as it is slow).
I have tried using something of this nature:
% Generate each index list separately:
nDims=length(startInd);
ind=cell(nDims,1);
for j=1:nDims
ind{j}=startInd(j):1:endInd(j);
end
% Access the matrix:
S.type = '()';
S.subs = ind;
Mat=subsasgn(Mat,S,1.0)
This seems to get the job done, but is very slow and memory-expansive, but might give someone an idea...

If you don't mind looping over dimensions (which should be much faster than looping over array entries):
indexVar = arrayfun(#(a,b) colon(a,b), startInd, endInd, 'UniformOutput', false);
Mat(indexVar{:}) = 1;
This uses arrayfun (essentially a loop) to create a cell array with the indexing vectors, which is then expanded into a comma-separated list.
Now that I see your code: this uses the same approach, only that the loop is replaced by arrayfun and the comma-separated list allows a more natural indexing syntax instead of subsasgn.

Related

Extract values from a vector and sort them based on their original squence

I have a vector of numbers (temperatures), and I am using the MATLAB function mink to extract the 5 smallest numbers from the vector to form a new variable. However, the numbers extracted using mink are automatically ordered from lowest to largest (of those 5 numbers). Ideally, I would like to retain the sequence of the numbers as they are arranged in the original vector. I hope my problem is easy to understand. I appreciate any advice.
The function mink that you use was introduced in MATLAB 2017b. It has (as Andras Deak mentioned) two output arguments:
[B,I] = mink(A,k);
The second output argument are the indices, such that B == A(I).
To obtain the set B but sorted as they appear in A, simply sort the vector of indices I:
B = A(sort(I));
For example:
>> A = [5,7,3,1,9,4,6];
>> [~,I] = mink(A,3);
>> A(sort(I))
ans =
3 1 4
For older versions of MATLAB, it is possible to reproduce mink using sort:
function [B,I] = mink(A,k)
[B,I] = sort(A);
B = B(1:k);
I = I(1:k);
Note that, in the above, you don't need the B output, your ordered_mink can be written as follows
function B = ordered_mink(A,k)
[~,I] = sort(A);
B = A(sort(I(1:k)));
Note: This solution assumes A is a vector. For matrix A, see Andras' answer, which he wrote up at the same time as this one.
First you'll need the corresponding indices for the extracted values from mink using its two-output form:
[vals, inds] = mink(array);
Then you only need to order the items in val according to increasing indices in inds. There are multiple ways to do this, but they all revolve around sorting inds and using the corresponding order on vals. The simplest way is to put these vectors into a matrix and sort the rows:
sorted_rows = sortrows([inds, vals]); % sort on indices
and then just extract the corresponding column
reordered_vals = sorted_rows(:,2); % items now ordered as they appear in "array"
A less straightforward possibility for doing the sorting after the above call to mink is to take the sorting order of inds and use its inverse to reverse-sort vals:
reverse_inds = inds; % just allocation, really
reverse_inds(inds) = 1:numel(inds); % contruct reverse permutation
reordered_vals = vals(reverse_inds); % should be the same as previously

MATLAB: return both arguments from ISMEMBER when used inside SPLITAPPLY

How can I access both arguments of ismember when it is used inside splitapply?
slitapply only returns scalar values for each group, so in order to compute nonscalar values for each group (as returned by the first argument of ismemebr), one has to enclose the anonymous function (in this case ismember) inside curly brackets {} to return a cell array.
But now, when I provide two output arguments to splitapply, I get an error:
Output argument "varargout{2}" (and maybe others) not assigned during call to
"#(x,y) {ismember(x,y)}"
ADD 1
I can create another function, say, ismember2cell which would apply ismember and turn outputs into cell arrays:
function [a, b] = ismember2cell(x,y)
[a,b] = ismember(x,y);
a = {a};
b = {b};
end
but maybe there is a solution which doesn't require this workaround.
One potentially faster option is to just do what splitapply is already doing under the hood by splitting your data into cell arrays (using functions like mat2cell or accumarray) and then using cellfun to apply your function across them. Using cellfun will allow you to easily capture multiple outputs (such as from ismember). For example:
% Sample data:
A = [1 2 3 4 5];
B = [1 2 1 5 5];
G = [1 1 1 2 2]; % Group index
% Group data into cell arrays:
cellA = accumarray(G(:), A(:), [], #(x) {x(:).'}); % See note below about (:).' syntax
cellB = accumarray(G(:), B(:), [], #(x) {x(:).'});
% Apply function:
[Lia, Locb] = cellfun(#ismember, cellA, cellB, 'UniformOutput', false);
NOTE: My sample data are row vectors, but I had to use the colon operator to reshape them into column vectors when passing them to accumarray (it wants columns). Once distributed into a cell array, each piece of the vector would still be a column vector, and I simply wanted to keep them as row vectors to match the original sample data. The syntax (:).' is a colon reshaping followed by a nonconjugate transpose, ensuring a row vector as a result no matter the shape of x. In this case I probably could have just used .', but I've gotten into the habit of never assuming what the shape of a variable is.
I cannot find a global solution, but the accepted answer of this post helps me to define a helper function for your problem:
function varargout = out2cell(varargin)
[x{1:nargout}]=feval(varargin{:});
varargout = num2cell(x);
I think that you may succeed in calling
splitapply(#(x,y) out2cell(#ismember, x, y), A, B);

Matlab: creating vector from 2 input vectors [duplicate]

I'm trying to insert multiple values into an array using a 'values' array and a 'counter' array. For example, if:
a=[1,3,2,5]
b=[2,2,1,3]
I want the output of some function
c=somefunction(a,b)
to be
c=[1,1,3,3,2,5,5,5]
Where a(1) recurs b(1) number of times, a(2) recurs b(2) times, etc...
Is there a built-in function in MATLAB that does this? I'd like to avoid using a for loop if possible. I've tried variations of 'repmat()' and 'kron()' to no avail.
This is basically Run-length encoding.
Problem Statement
We have an array of values, vals and runlengths, runlens:
vals = [1,3,2,5]
runlens = [2,2,1,3]
We are needed to repeat each element in vals times each corresponding element in runlens. Thus, the final output would be:
output = [1,1,3,3,2,5,5,5]
Prospective Approach
One of the fastest tools with MATLAB is cumsum and is very useful when dealing with vectorizing problems that work on irregular patterns. In the stated problem, the irregularity comes with the different elements in runlens.
Now, to exploit cumsum, we need to do two things here: Initialize an array of zeros and place "appropriate" values at "key" positions over the zeros array, such that after "cumsum" is applied, we would end up with a final array of repeated vals of runlens times.
Steps: Let's number the above mentioned steps to give the prospective approach an easier perspective:
1) Initialize zeros array: What must be the length? Since we are repeating runlens times, the length of the zeros array must be the summation of all runlens.
2) Find key positions/indices: Now these key positions are places along the zeros array where each element from vals start to repeat.
Thus, for runlens = [2,2,1,3], the key positions mapped onto the zeros array would be:
[X 0 X 0 X X 0 0] % where X's are those key positions.
3) Find appropriate values: The final nail to be hammered before using cumsum would be to put "appropriate" values into those key positions. Now, since we would be doing cumsum soon after, if you think closely, you would need a differentiated version of values with diff, so that cumsum on those would bring back our values. Since these differentiated values would be placed on a zeros array at places separated by the runlens distances, after using cumsum we would have each vals element repeated runlens times as the final output.
Solution Code
Here's the implementation stitching up all the above mentioned steps -
% Calculate cumsumed values of runLengths.
% We would need this to initialize zeros array and find key positions later on.
clens = cumsum(runlens)
% Initalize zeros array
array = zeros(1,(clens(end)))
% Find key positions/indices
key_pos = [1 clens(1:end-1)+1]
% Find appropriate values
app_vals = diff([0 vals])
% Map app_values at key_pos on array
array(pos) = app_vals
% cumsum array for final output
output = cumsum(array)
Pre-allocation Hack
As could be seen that the above listed code uses pre-allocation with zeros. Now, according to this UNDOCUMENTED MATLAB blog on faster pre-allocation, one can achieve much faster pre-allocation with -
array(clens(end)) = 0; % instead of array = zeros(1,(clens(end)))
Wrapping up: Function Code
To wrap up everything, we would have a compact function code to achieve this run-length decoding like so -
function out = rle_cumsum_diff(vals,runlens)
clens = cumsum(runlens);
idx(clens(end))=0;
idx([1 clens(1:end-1)+1]) = diff([0 vals]);
out = cumsum(idx);
return;
Benchmarking
Benchmarking Code
Listed next is the benchmarking code to compare runtimes and speedups for the stated cumsum+diff approach in this post over the other cumsum-only based approach on MATLAB 2014B-
datasizes = [reshape(linspace(10,70,4).'*10.^(0:4),1,[]) 10^6 2*10^6]; %
fcns = {'rld_cumsum','rld_cumsum_diff'}; % approaches to be benchmarked
for k1 = 1:numel(datasizes)
n = datasizes(k1); % Create random inputs
vals = randi(200,1,n);
runs = [5000 randi(200,1,n-1)]; % 5000 acts as an aberration
for k2 = 1:numel(fcns) % Time approaches
tsec(k2,k1) = timeit(#() feval(fcns{k2}, vals,runs), 1);
end
end
figure, % Plot runtimes
loglog(datasizes,tsec(1,:),'-bo'), hold on
loglog(datasizes,tsec(2,:),'-k+')
set(gca,'xgrid','on'),set(gca,'ygrid','on'),
xlabel('Datasize ->'), ylabel('Runtimes (s)')
legend(upper(strrep(fcns,'_',' '))),title('Runtime Plot')
figure, % Plot speedups
semilogx(datasizes,tsec(1,:)./tsec(2,:),'-rx')
set(gca,'ygrid','on'), xlabel('Datasize ->')
legend('Speedup(x) with cumsum+diff over cumsum-only'),title('Speedup Plot')
Associated function code for rld_cumsum.m:
function out = rld_cumsum(vals,runlens)
index = zeros(1,sum(runlens));
index([1 cumsum(runlens(1:end-1))+1]) = 1;
out = vals(cumsum(index));
return;
Runtime and Speedup Plots
Conclusions
The proposed approach seems to be giving us a noticeable speedup over the cumsum-only approach, which is about 3x!
Why is this new cumsum+diff based approach better than the previous cumsum-only approach?
Well, the essence of the reason lies at the final step of the cumsum-only approach that needs to map the "cumsumed" values into vals. In the new cumsum+diff based approach, we are doing diff(vals) instead for which MATLAB is processing only n elements (where n is the number of runLengths) as compared to the mapping of sum(runLengths) number of elements for the cumsum-only approach and this number must be many times more than n and therefore the noticeable speedup with this new approach!
Benchmarks
Updated for R2015b: repelem now fastest for all data sizes.
Tested functions:
MATLAB's built-in repelem function that was added in R2015a
gnovice's cumsum solution (rld_cumsum)
Divakar's cumsum+diff solution (rld_cumsum_diff)
knedlsepp's accumarray solution (knedlsepp5cumsumaccumarray) from this post
Naive loop-based implementation (naive_jit_test.m) to test the just-in-time compiler
Results of test_rld.m on R2015b:
Old timing plot using R2015a here.
Findings:
repelem is always the fastest by roughly a factor of 2.
rld_cumsum_diff is consistently faster than rld_cumsum.
repelem is fastest for small data sizes (less than about 300-500 elements)
rld_cumsum_diff becomes significantly faster than repelem around 5 000 elements
repelem becomes slower than rld_cumsum somewhere between 30 000 and 300 000 elements
rld_cumsum has roughly the same performance as knedlsepp5cumsumaccumarray
naive_jit_test.m has nearly constant speed and on par with rld_cumsum and knedlsepp5cumsumaccumarray for smaller sizes, a little faster for large sizes
Old rate plot using R2015a here.
Conclusion
Use repelem below about 5 000 elements and the cumsum+diff solution above.
There's no built-in function I know of, but here's one solution:
index = zeros(1,sum(b));
index([1 cumsum(b(1:end-1))+1]) = 1;
c = a(cumsum(index));
Explanation:
A vector of zeroes is first created of the same length as the output array (i.e. the sum of all the replications in b). Ones are then placed in the first element and each subsequent element representing where the start of a new sequence of values will be in the output. The cumulative sum of the vector index can then be used to index into a, replicating each value the desired number of times.
For the sake of clarity, this is what the various vectors look like for the values of a and b given in the question:
index = [1 0 1 0 1 1 0 0]
cumsum(index) = [1 1 2 2 3 4 4 4]
c = [1 1 3 3 2 5 5 5]
EDIT: For the sake of completeness, there is another alternative using ARRAYFUN, but this seems to take anywhere from 20-100 times longer to run than the above solution with vectors up to 10,000 elements long:
c = arrayfun(#(x,y) x.*ones(1,y),a,b,'UniformOutput',false);
c = [c{:}];
There is finally (as of R2015a) a built-in and documented function to do this, repelem. The following syntax, where the second argument is a vector, is relevant here:
W = repelem(V,N), with vector V and vector N, creates a vector W where element V(i) is repeated N(i) times.
Or put another way, "Each element of N specifies the number of times to repeat the corresponding element of V."
Example:
>> a=[1,3,2,5]
a =
1 3 2 5
>> b=[2,2,1,3]
b =
2 2 1 3
>> repelem(a,b)
ans =
1 1 3 3 2 5 5 5
The performance problems in MATLAB's built-in repelem have been fixed as of R2015b. I have run the test_rld.m program from chappjc's post in R2015b, and repelem is now faster than other algorithms by about a factor 2:

Is there a splat operator (or equivalent) in Matlab?

If I have an array (of unknown length until runtime), is there a way to call a function with each element of the array as a separate parameter?
Like so:
foo = #(varargin) sum(cell2mat(varargin));
bar = [3,4,5];
foo(*bar) == foo(3,4,5)
Context: I have a list of indices to an n-d array, Q. What I want is something like Q(a,b,:), but I only have [a,b]. Since I don't know n, I can't just hard-code the indexing.
There is no operator in MATLAB that will do that. However, if your indices (i.e. bar in your example) were stored in a cell array, then you could do this:
bar = {3,4,5}; %# Cell array instead of standard array
foo(bar{:}); %# Pass the contents of each cell as a separate argument
The {:} creates a comma-separated list from a cell array. That's probably the closest thing you can get to the "operator" form you have in your example, aside from overriding one of the existing operators (illustrated here and here) so that it generates a comma-separated list from a standard array, or creating your own class to store your indices and defining how the existing operators operate for it (neither option for the faint of heart!).
For your specific example of indexing an arbitrary N-D array, you could also compute a linear index from your subscripted indices using the sub2ind function (as detailed here and here), but you might end up doing more work than you would for my comma-separated list solution above. Another alternative is to compute the linear index yourself, which would sidestep converting to a cell array and use only matrix/vector operations. Here's an example:
% Precompute these somewhere:
scale = cumprod(size(Q)).'; %'
scale = [1; scale(1:end-1)];
shift = [0 ones(1, ndims(Q)-1)];
% Then compute a linear index like this:
indices = [3 4 5];
linearIndex = (indices-shift)*scale;
Q(linearIndex) % Equivalent to Q(3,4,5)

Retrieving element index in spfun, cellfun, arrayfun, etc. in MATLAB

Is there any way to retrieve the index of the element on which a function called by cellfun, arrayfun or spfun acts? (i.e. retrieve the index of the element within the scope of the function).
For the sake of simplicity, imagine I have the following toy example:
S = spdiags([1:4]',0,4,4)
f = spfun(#(x) 2*x,S)
which builds a 4x4 sparse diagonal matrix and then multiplies each element by 2.
And say that now, instead of multiplying each element by the constant number 2, I would like to multiply it by the index the element has in the original matrix, i.e. assuming that linear_index holds the index for each element, it would be something like this:
S = spdiags([1:4]',0,4,4)
f = spfun(#(x) linear_index*x,S)
However, note the code above does not work (linear_index is undeclared).
This question is partially motivated by the fact that blocproc gives you access to block_struct.location, which one could argue references the location (~index) of the current element within the full object (an image in this case):
block_struct.location: A two-element vector, [row col], that specifies
the position of the first pixel (minimum-row, minimum-column) of the
block data in the input image.
No, but you can supply the linear index as extra argument.
Both cellfun and arrayfun accept multiple input arrays. Thus, with e.g. arrayfun, you can write
a = [1 1 2 2];
lin_idx = 1:4;
out = arrayfun(#(x,y)x*y,a,lin_idx);
This doesn't work with spfun, unfortunately, since it only accepts a single input (the sparse array).
You can possibly use arrayfun instead, like so:
S = spdiags([1:4]',0,4,4);
lin_idx = find(S);
out = spones(S);
out(lin_idx) = arrayfun(#(x,y)x*y,full(S(lin_idx)),lin_idx);
%# or
out(lin_idx) = S(lin_idx) .* lin_idx;
Note that the call to full won't get you into memory trouble, since S(lin_idx) is 0% sparse.
You can create a sparse matrix with linear_index filling instead of values.
Create A:
A(find(S))=find(S)
Then use A and S without spfun, e.g.: A.*S. This runs very fast.
It's simple. Just make a cell like:
C = num2cell(1:length(S));
then:
out=arrayfun(#(x,c) c*x,S,C)