I have a variable pth which is a cell array of dimension 1xn where n is a user input. Each of the elements in pth is itself a cell array and length(pth{k}) for k=1:n is variable (result of another function). Each element pth{k}{kk} where k=1:n and kk=1:length(pth{k}) is a 1D vector of integers/node numbers of again variable length. So to summarise, I have a variable number of variable-length vectors organised in a avriable number of cell arrays.
I would like to try and find all possible intersections when you take a vector at random from pth{1}, pth{2}, {pth{3}, etc... There are various functions on the File Exchange that seem to do that, for example this one or this one. The problem I have is you need to call the function this way:
mintersect(v1,v2,v3,...)
and I can't write all the inputs in the general case because I don't know explicitly how many there are (this would be n above). Ideally, I would like to do some thing like this;
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},...,pth{n}{1})
mintersect(pth{1}{1},pth{2}{2},pth{3}{1},...,pth{n}{1})
mintersect(pth{1}{1},pth{2}{3},pth{3}{1},...,pth{n}{1})
etc...
mintersect(pth{1}{1},pth{2}{length(pth{2})},pth{3}{1},...,pth{n}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},...,pth{n}{1})
etc...
keep going through all the possible combinations, but I can't write this in code. This function from the File Exchange looks like a good way to find all possible combinations but again I have the same problem with the function call with the variable number of inputs:
allcomb(1:length(pth{1}),1:length(pth{2}),...,1:length(pth{n}))
Does anybody know how to work around this issue of function calls with variable number of input arguments when you can't physically specify all the input arguments because their number is variable? This applies equally to MATLAB and Octave, hence the two tags. Any other suggestion on how to find all possible combinations/intersections when taking a vector at random from each pth{k} welcome!
EDIT 27/05/20
Thanks to Mad Physicist's answer, I have ended up using the following which works:
disp('Computing intersections for all possible paths...')
grids = cellfun(#(x) 1:numel(x), pth, 'UniformOutput', false);
idx = cell(1, numel(pth));
[idx{:}] = ndgrid(grids{:});
idx = cellfun(#(x) x(:), idx, 'UniformOutput', false);
idx = cat(2, idx{:});
valid_comb = [];
k = 1;
for ii = idx'
indices = reshape(num2cell(ii), size(pth));
selection = cellfun(#(p,k) p{k}, pth, indices, 'UniformOutput', false);
if my_intersect(selection{:})
valid_comb = [valid_comb k];
endif
k = k+1;
end
My own version is similar but uses a for loop instead of the comma-separated list:
disp('Computing intersections for all possible paths...')
grids = cellfun(#(x) 1:numel(x), pth, 'UniformOutput', false);
idx = cell(1, numel(pth));
[idx{:}] = ndgrid(grids{:});
idx = cellfun(#(x) x(:), idx, 'UniformOutput', false);
idx = cat(2, idx{:});
[n_comb,~] = size(idx);
temp = cell(n_pipes,1);
valid_comb = [];
k = 1;
for k = 1:n_comb
for kk = 1:n_pipes
temp{kk} = pth{kk}{idx(k,kk)};
end
if my_intersect(temp{:})
valid_comb = [valid_comb k];
end
end
In both cases, valid_comb has the indices of the valid combinations, which I can then retrieve using something like:
valid_idx = idx(valid_comb(1),:);
for k = 1:n_pipes
pth{k}{valid_idx(k)} % do something with this
end
When I benchmarked the two approaches with some sample data (pth being 4x1 and the 4 elements of pth being 2x1, 9x1, 8x1 and 69x1), I got the following results:
>> benchmark
Elapsed time is 51.9075 seconds.
valid_comb = 7112
Elapsed time is 66.6693 seconds.
valid_comb = 7112
So Mad Physicist's approach was about 15s faster.
I also misunderstood what mintersect did, which isn't what I wanted. I wanted to find a combination where no element present in two or more vectors, so I ended writing my version of mintersect:
function valid_comb = my_intersect(varargin)
% Returns true if a valid combination i.e. no combination of any 2 vectors
% have any elements in common
comb_idx = combnk(1:nargin,2);
[nr,nc] = size(comb_idx);
valid_comb = true;
k = 1;
% Use a while loop so that as soon as an intersection is found, the execution stops
while valid_comb && (k<=nr)
temp = intersect(varargin{comb_idx(k,1)},varargin{comb_idx(k,2)});
valid_comb = isempty(temp) && valid_comb;
k = k+1;
end
end
Couple of helpful points to construct a solution:
This post shows you how to construct a Cartesian product between arbitrary arrays using ndgrid.
cellfun accepts multiple cell arrays simultaneously, which you can use to index specific elements.
You can capture a variable number of arguments from a function using cell arrays, as shown here.
So let's get the inputs to ndgrid from your outermost array:
grids = cellfun(#(x) 1:numel(x), pth, 'UniformOutput', false);
Now you can create an index that contains the product of the grids:
index = cell(1, numel(pth));
[index{:}] = ndgrid(grids{:});
You want to make all the grids into column vectors and concatenate them sideways. The rows of that matrix will represent the Cartesian indices to select the elements of pth at each iteration:
index = cellfun(#(x) x(:), index, 'UniformOutput', false);
index = cat(2, index{:});
If you turn a row of index into a cell array, you can run it in lockstep over pth to select the correct elements and call mintersect on the result.
for i = index'
indices = num2cell(i');
selection = cellfun(#(p, i) p{i}, pth, indices, 'UniformOutput', false);
mintersect(selection{:});
end
This is written under the assumption that pth is a row array. If that is not the case, you can change the first line of the loop to indices = reshape(num2cell(i), size(pth)); for the general case, and simply indices = num2cell(i); for the column case. The key is that the cell from of indices must be the same shape as pth to iterate over it in lockstep. It is already generated to have the same number of elements.
I believe this does the trick. Calls mintersect on all possible combinations of vectors in pth{k}{kk} for k=1:n and kk=1:length(pth{k}).
Using eval and messing around with sprintf/compose a bit. Note that typically the use of eval is very much discouraged. Can add more comments if this is what you need.
% generate some data
n = 5;
pth = cell(1,n);
for k = 1:n
pth{k} = cell(1,randi([1 10]));
for kk = 1:numel(pth{k})
pth{k}{kk} = randi([1 100], randi([1 10]), 1);
end
end
% get all combs
str_to_eval = compose('1:length(pth{%i})', 1:numel(pth));
str_to_eval = strjoin(str_to_eval,',');
str_to_eval = sprintf('allcomb(%s)',str_to_eval);
% use eval to get all combinations for a given pth
all_combs = eval(str_to_eval);
% and make strings to eval in intersect
comp = num2cell(1:numel(pth));
comp = [comp ;repmat({'%i'}, 1, numel(pth))];
str_pattern = sprintf('pth{%i}{%s},', comp{:});
str_pattern = str_pattern(1:end-1); % get rid of last ,
strings_to_eval = cell(length(all_combs),1);
for k = 1:size(all_combs,1)
strings_to_eval{k} = sprintf(str_pattern, all_combs(k,:));
end
% and run eval on all those strings
result = cell(length(all_combs),1);
for k = 1:size(all_combs,1)
result{k} = eval(['mintersect(' strings_to_eval{k} ')']);
%fprintf(['mintersect(' strings_to_eval{k} ')\n']); % for debugging
end
For a randomly generated pth, the code produces the following strings to evaluate (where some pth{k} have only one cell for illustration):
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{1},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{2},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{3},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{1},pth{4}{1},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{1},pth{4}{2},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{1},pth{4}{3},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{2},pth{4}{1},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{2},pth{4}{2},pth{5}{1})
mintersect(pth{1}{4},pth{2}{1},pth{3}{2},pth{4}{3},pth{5}{1})
As Madphysicist pointed out, I misunderstood the initial structure of your initial cell array, however the point stands. The way to pass an unknown number of arguments to a function is via comma-separated-list generation, and your function needs to support it by being declared with varargin. Updated example below.
Create a helper function to collect a random subcell from each main cell:
% in getRandomVectors.m
function Out = getRandomVectors(C) % C: a double-jagged array, as described
N = length(C);
Out = cell(1, N);
for i = 1 : length(C)
Out{i} = C{i}{randi( length(C{i}) )};
end
end
Then assuming you already have an mintersect function defined something like this:
% in mintersect.m
function Intersections = mintersect( varargin )
Vectors = varargin;
N = length( Vectors );
for i = 1 : N; for j = 1 : N
Intersections{i,j} = intersect( Vectors{i}, Vectors{j} );
end; end
end
Then call this like so:
C = { { 1:5, 2:4, 3:7 }, {1:8}, {2:4, 3:9, 2:8} }; % example double-jagged array
In = getRandomVectors(C); % In is a cell array of randomly selected vectors
Out = mintersect( In{:} ); % Note the csl-generator syntax
PS. I note that your definition of mintersect differs from those linked. It may just be you didn't describe what you want too well, in which case my mintersect function is not what you want. What mine does is produce all possible intersections for the vectors provided. The one you linked to produces a single intersection which is common to all vectors provided. Use whichever suits you best. The underlying rationale for using it is the same though.
PS. It is also not entirely clear from your description whether what you're after is a random vector k for each n, or the entire space of possible vectors over all n and k. The above solution does the former. If you want the latter, see MadPhysicist's solution on how to create a cartesian product of all possible indices instead.
I have a list of parameters and I need to evaluate my method over this list. Right now, I am doing it this way
% Parameters
params.corrAs = {'objective', 'constraint'};
params.size = {'small', 'medium', 'large'};
params.density = {'uniform', 'non-uniform'};
params.k = {3,4,5,6};
params.constraintP = {'identity', 'none'};
params.Npoints_perJ = {2, 3};
params.sampling = {'hks', 'fps'};
% Select the current parameter
for corrAs_iter = params.corrAs
for size_iter = params.size
for density_iter = params.density
for k_iter = params.k
for constraintP_iter = params.constraintP
for Npoints_perJ_iter = params.Npoints_perJ
for sampling_iter = params.sampling
currentParam.corrAs = corrAs_iter;
currentParam.size = size_iter;
currentParam.density = density_iter;
currentParam.k = k_iter;
currentParam.constraintP = constraintP_iter;
currentParam.Npoints_perJ = Npoints_perJ_iter;
currentParam.sampling = sampling_iter;
evaluateMethod(currentParam);
end
end
end
end
end
end
end
I know it looks ugly and if I want to add a new type of parameter, I have to write another for loop. Is there any way, I can vectorize this? Or maybe use 2 for loops instead of so many.
I tried the following but, it doesn't result in what I need.
for i = 1:numel(fields)
% if isempty(params.(fields{i}))
param.(fields{i}) = params.(fields{i})(1);
params.(fields{i})(1) = [];
end
What you need is all combinations of your input parameters. Unfortunately, as you add more parameters the storage requirements will grow quickly (and you'll have to use a large indexing matrix).
Instead, here is an idea which uses linear indicies of a (never created) n1*n2*...*nm matrix, where ni is the number of elements in each field, for m fields.
It is flexible enough to cope with any amount of fields being added to params. Not performance tested, although as with any "all combinations" operation you should be wary of the non-linear increase in computation time as you add more fields to params, note prod(sz)!
The code I've shown is fast, but the performance will depend entirely on which operations you do in the loop.
% Add parameters here
params.corrAs = {'objective', 'constraint'};
params.size = {'small', 'medium', 'large'};
params.density = {'uniform', 'non-uniform'};
% Setup
f = fieldnames( params );
nf = numel(f);
sz = NaN( nf, 1 );
% Loop over all parameters to get sizes
for jj = 1:nf
sz(jj) = numel( params.(f{jj}) );
end
% Loop for every combination of parameters
idx = cell(1,nf);
for ii = 1:prod(sz)
% Use ind2sub to switch from a linear index to the combination set
[idx{:}] = ind2sub( sz, ii );
% Create currentParam from the combination indices
currentParam = struct();
for jj = 1:nf
currentParam.(f{jj}) = params.(f{jj}){idx{jj}};
end
% Do something with currentParam here
% ...
end
Asides:
I'm using dynamic field name references for indexing the fields
I'm passing multiple outputs into a cell array from ind2sub, so you can handle a variable number of field names when ind2sub has one output for each dimension (or field in this use-case).
Here is a vectorized solution :
names = fieldnames(params).';
paramGrid = cell(1,numel(names));
cp = struct2cell(params);
[paramGrid{:}] = ndgrid(cp{:});
ng = [names;paramGrid];
st = struct(ng{:});
for param = st(:).'
currentParam = param;
end
Instead of nested loops we can use ndgrid to create the cartesian product of the cell entries so we can find all combinations of cell entries without loop.
I have a data, which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix, where the last row is subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is, that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see commented part in the definition of these variables.
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = #(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check numel(uniqueSubs), 10,000 in this case, whether all N, 1,000,000 here, numbers belong to a certain category, which results in 10^12 checks. The proposed code sorts the rows (sorting is NlogN, thus 6,000,000 here), and then loop once over the full array without additional checks.
For completion, here is the original code, along with my version, and it shows the two are the same:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another
I am using the input parser functionality a lot but now I am facing the following issue:
I am coding an object that computes the rate-of-change of time series, i.e. one input argument is a matrix of doubles (timeSeries). A second input parameter is the time period used to compute the rate-of-change (lag). lag can be a vector wherby different rate-of-changes will be computed. A third input vector is called weightVector. weightVector will be used to compute an average of all rate-of-changes and applies the appropriate weights to the respective rate-of-change results.
Now, I always like to define some default values when using input parser. I would like to define weightVector to be an equal weighted vector. However, the length of the default weightVector is dependent on the length of lag. For example, if lag = [1,2,3], then weightVectorshould equal [1/3, 1/3, 1/3]. How am I supposed to code a situation like this? My current code for the constructor looks like this:
function obj = roc(timeSeries, varargin)
%% Input parser
% 1. Create input parser instance
p = inputParser;
% 2. Default values for input arguments
default_lag = 1;
default_weightVector = 1/length(lag); % This line is causing
% problems as LAG isn't
% defined, yet.
% 3. Validation of input arguments
valid_lag = {'vector', 'nonempty', 'integer', 'positive'};
check_lag = #(x) validateattributes(x, {'numeric'}, valid_lag);
valid_weightVector = {'vector', 'nonempty'};
check_weightVector = #(x) validateattributes(x, {'numeric'}, ...
valid_weightVector);
% 4. Add input arguments to input scheme
p.addRequired('timeSeries');
p.addParamater('lag', default_lag, check_lag);
p.addParameter('weightVector', default_weightVector, check_weightVector);
% 5. Parse input arguments
parse(p, timeSeries, varargin{:});
% 6. Assign results to variables
lag = p.Results.lag;
weightVector = p.Results.weightVector;
%% Main code
end % Constructor
function obj = roc(timeSeries, varargin)
%% Input parser
% 1. Create input parser instance
p = inputParser;
% 2. Default values for input arguments
default_lag = 1;
default_weightVector = 1;
% 3. Validation of input arguments
valid_lag = {'vector', 'nonempty', 'integer', 'positive'};
check_lag = #(x) validateattributes(x, {'numeric'}, valid_lag);
% 4. Add input arguments to input scheme
p.addRequired('timeSeries');
p.addParameter('lag', default_lag, check_lag);
p.addParameter('weightVector', default_weightVector);
% 5. Parse input arguments
parse(p, timeSeries, varargin{:});
% 6. Assign results to variables
lag = p.Results.lag;
if check_weightVector(p.Results.weightVector, lag) == true
weightVector = p.Results.weightVector;
end
function vout = check_weightVector(weightVector, lag) % validation function
if length(lag) ~= length(weightVector)
error('lengthWeightVector:WrongNumberOfElements', 'The number of elements in "weightVector" must correspond to the number of elements in "lag"');
elseif sum(weightVector) ~= 1
error('sumWeightVector:SumNotEqualToOne', 'The sum of elements in "weightVector" must equal 1');
end
vout = true;
end
%% Main code
end
I have a 500-by-1 matrix of strings S and a 500-by-1 matrix of numbers N. I want to see them in the variable editor together like this:
S(1) N(1)
S(2) N(2)
...
S(500) N(500)
Is this possible?
The following should allow you to look at the variables together in the Command Window:
disp([char(S) blanks(numel(N))' num2str(N)]);
The array S (which I presume is a cell array) is converted to a character array using the function CHAR. It's then concatenated with a column vector of blanks (made using the function BLANKS) and then a string representation of the numeric array N (made using the function NUM2STR). This is then displayed using the function DISP.
Speaking narrowly to your question, just convert the numbers to cells. You'll have a single variable that the array editor can handle.
X = [ S num2cell(N) ];
More broadly, here's an array-oriented variant of sprintf that's useful for displaying records constructed from parallel arrays. You'd call it like this. I use something like this for displaying tabular data a lot.
sprintf2('%-*s %8g', max(cellfun('prodofsize',S)), S, N)
Here's the function.
function out = sprintf2(fmt, varargin)
%SPRINTF2 Quasi-"vectorized" sprintf
%
% out = sprintf2(fmt, varargin)
%
% Like sprintf, but takes arrays of arguments and returns cellstr. This
% lets you do formatted output on nonscalar arrays.
%
% Example:
% food = {'wine','cheese','fancy bread'};
% price = [10 6.38 8.5];
% sprintf2('%-12s %6.2f', food, price)
% % Fancier formatting with width detection
% sprintf2('%-*s %6.2f', max(cellfun('prodofsize',food)), food, price)
[args,n] = promote(varargin);
out = cell(n,1);
for i = 1:n
argsi = grab(args, i);
out{i} = sprintf(fmt, argsi{:});
end
% Convenience HACK for display to command line
if nargout == 0
disp(char(out));
clear out;
end
function [args,n] = promote(args)
%PROMOTE Munge inputs to get cellstrs
for i = 1:numel(args)
if ischar(args{i})
args{i} = cellstr(args{i});
end
end
n = cellfun('prodofsize', args);
if numel(unique(n(n > 1))) > 1
error('Inconsistent lengths in nonscalar inputs');
end
n = max(n);
function out = grab(args, k)
%GRAB Get the kth element of each arg, popping out cells
for i = 1:numel(args)
if isscalar(args{i})
% "Scalar expansion" case
if iscell(args{i})
out{i} = args{i}{1};
else
out{i} = args{i};
end
else
% General case - kth element of array
if iscell(args{i})
out{i} = args{i}{k};
else
out{i} = args{i}(k);
end
end
end