Optimize algorithm that generates the number of units in each binary state - matlab

TL;DR: I need to find all possible combinations of N row vectors (of size 1xB), whose row-wise sum produces the desired result vector (also of size 1xB).
I have a binary matrix (1 or 0 entries only) of size N x B where N denotes the number of units and B denotes the number of bins. Each unit, i.e., each row, of the matrix can be in one of 2^B states. That is, if B=2, the states possible are {0,0}, {0,1}, {1,0} or {1,1}. If B=3, then the possible states are {0,0,0}, {0,0,1}, {0,1,0}, {0,1,1}, {1,0,0}, {1,0,1}, {1,1,0} or {1,1,1}. Basically the binary representation of the numbers from 0 to 2^B-1.
For the matrix, I know the sum over the rows of the matrix, for example, {1,2}. This sum can be achieved through different binary matrices like [0,0;0,1;1,1] or [0,1;0,1;1,0]. The number of units in each state are {1,1,0,1} and {0,2,1,0}, respectively for each of the matrices, where the first number corresponds to the first state {0,0}, second to the second state {0,1} and so on in increasing order. My problem is to find all possible vectors of these numbers of states that satisfy a particular matrix sum.
To implement this in MATLAB, I used recursion and global variables. This seemed the easiest approach to me, but it takes a lot of time. The code I used is given below:
function output = getallstate()
global nState % stores all the possible vectors
global nStateRow % stores the current row of the vector
global statebin %stores the binary representation of all the possible states
nState = [];
nStateRow = 1;
nBin = 2; % number of columns or B
v = [1 2]; % should always be of the size 1 x nBin
N = 3; % number of units
statebin = de2bi(0:(2 ^ nBin - 1), nBin) == 1; % stored as logical because I use it to index later
getnstate(v, 2 ^ nBin - 1, nBin) % the main function
checkresult(v, nState, nBin) % will result in false if even one of the results is incorrect
% adjust for max number of units, because the total of each row cannot exceed this number.
output = nState(1:end-1, :); % last row is always repeated (needs to be fixed somehow)
output(:, 1) = N - sum(output(:, 2:end), 2); % the first column, that is the number of units in the all 0 state is always determined by the number of units in the other states
if any(output(:, 1) < 0)
output(output(:, 1) < 0, :) = [];
end
end
function getnstate(r, state, nBin)
global nState
global nStateRow
global statebin
if state == 0
if all(r == 0)
nStateRow = nStateRow + 1;
nState(nStateRow, :) = nState(nStateRow - 1, :);
end
else
for a = 0:min(r(statebin(state + 1, :)))
nState(nStateRow, state + 1) = a;
getnstate(r - a * statebin(state + 1, :), state - 1, nBin);
end
end
end
function allOk = checkresult(r, nState, nBin)
% just a function that checks whether the obtained vectors all result in the correct sum
allstate = de2bi(0:(2 ^ nBin - 1), nBin);
allOk = true;
for iRow = 1:size(nState, 1)
sumR = sum(bsxfun(@times, allstate, nState(iRow, :).'), 1);
allOk = allOk & isequal(sumR,r);
end
end
function b = de2bi(d, n)
d = d(:);
[~, e] = log2(max(d));
b = rem(floor(d * pow2(1-max(n, e):0)), 2);
end
The above code works fine and gives all possible states but, as is expected, it gets slower as you increase the number of columns (B) and the number of units (N). Also, it uses globals. The following are my questions:
Is there a way to generate these without using globals?
Is there a non-recursive way for this algorithm?
EDIT 1
Is there a way to do the above and still end up with an optimised algorithm that is faster than the current version?
EDIT 2
Added the de2bi function to remove dependency on the Communications Toolbox.
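For reference, here is a minimal sketch of one way to get the same vectors with neither globals nor recursion. It is not the method used in the question: it enumerates every candidate count with ndgrid and filters by the target sum, so it is exhaustive and its memory use grows quickly with B. It assumes B >= 2, and the variable names (stateBits, candidates, counts) are made up for illustration.

```matlab
v = [1 2];                 % target column sums, size 1 x B
N = 3;                     % number of units
nBin = numel(v);
nStates = 2 ^ nBin;
% Row s+1 holds the bits of state s, most significant bit first
stateBits = rem(floor((0:nStates - 1).' * pow2(1 - nBin:0)), 2);
% A nonzero state can be used at most min(v(bits that are set)) times
ranges = cell(1, nStates - 1);
for s = 1:nStates - 1
    ranges{s} = 0:min(v(stateBits(s + 1, :) == 1));
end
% Every combination of counts for the nonzero states, one per row
grids = cell(1, nStates - 1);
[grids{:}] = ndgrid(ranges{:});
candidates = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));
% Keep the rows whose weighted sum of state bits equals v
keep = all(bsxfun(@eq, candidates * stateBits(2:end, :), v), 2);
counts = candidates(keep, :);
% The all-zero state takes the remaining units; drop rows that need more than N
zeroState = N - sum(counts, 2);
counts = [zeroState(zeroState >= 0), counts(zeroState >= 0, :)];
```

For v = [1 2] and N = 3 this gives the rows {0,2,1,0} and {1,1,0,1} from the example above. Unlike the recursive version it does no pruning, so it is a readability sketch rather than a speed-up.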

Related

Can operations on submatrices (and subvectors) be vectorized?

I'm currently working on an edge detector in Octave. Coming from other programming languages like Java and Python, I'm used to iterating in for loops rather than performing operations on entire matrices. In Octave this causes a serious performance hit, and I'm having some difficulty figuring out how to vectorize my code. I have the following two pieces of code:
1)
function zc = ZeroCrossings(img, T=0.9257)
zc = zeros(size(img));
# Iterate over central positions of all 3x3 submatrices
for y = 2:rows(img) - 1
for x = 2:columns(img) - 1
ndiff = 0;
# Check all necessary pairs of elements of the submatrix (W/E, N/S, NW/SE, NE/SW)
for d = [1, 0; 0, 1; 1, 1; 1, -1]'
p1 = img(y-d(2), x-d(1));
p2 = img(y+d(2), x+d(1));
if sign(p1) != sign(p2) && abs(p1 - p2) >= T
ndiff++;
end
end
# If at least two pairs fit the requirements, these coordinates are a zero crossing
if ndiff >= 2
zc(y, x) = 1;
end
end
end
end
2)
function g = LinkGaps(img, k=5)
g = zeros(size(img));
for i = 1:rows(img)
g(i, :) = link(img(i, :), k);
end
end
function row = link(row, k)
# Find first 1
i = 1;
while i <= length(row) && row(i) == 0
i++;
end
# Iterate over gaps
while true
# Determine gap start
while i <= length(row) && row(i) == 1
i++;
end
start = i;
# Determine gap stop
while i <= length(row) && row(i) == 0
i++;
end
# If stop wasn't reached, exit loop
if i > length(row)
break
end
# If gap is short enough, fill it with 1s
if i - start <= k
row(start:i-1) = 1;
end
end
end
Both of these functions iterate over submatrices (or rows and subrows in the second case), and particularly the first one seems to be slowing down my program quite a bit.
The first function (ZeroCrossings) takes a matrix of pixels (img) and returns a binary (0/1) matrix, with 1s where zero crossings (pixels whose corresponding 3x3 neighbourhoods fit certain requirements) were found.
The outer 2 for loops seem like they should be possible to vectorize somehow. I can put the body into its own function (taking as an argument the necessary submatrix) but I can't figure out how to then call this function on all submatrices, setting their corresponding (central) positions to the returned value.
Bonus points if the inner for loop can also be vectorized.
The second function (LinkGaps) takes the binary matrix output by the first one and fills in gaps in its rows (i.e. sets them to 1). A gap is defined as a series of 0s of length <= k, bounded on both sides by 1s.
Now I'm sure at least the outer loop (the one in LinkGaps) is vectorizable. However, the while loop in link again operates on subvectors, rather than single elements so I'm not sure how I'd go about vectorizing it.
Not a full solution, but here is an idea how you could do the first without any loops:
% W/E
I1 = I(2:end-1,1:end-2);
I2 = I(2:end-1,3:end );
C = (I1 .* I2 < 0) .* (abs(I1 - I2)>=T);
% N/S
I1 = I(1:end-2,2:end-1);
I2 = I(3:end, 2:end-1);
C = C + (I1 .* I2 < 0) .* (abs(I1 - I2)>=T);
% proceed similarly with NW/SE and NE/SW
% ...
% zero-crossings where count is at least 2
ZC = C>=2;
Idea: form two appropriately shifted subimages, test for a difference in sign (negative product), and threshold the difference. Both tests return a logical (0/1) matrix, and their element-wise product acts as a logical AND, so the result is a 0/1 matrix with 1 wherever both tests succeeded. These matrices can be added to keep track of the counts (ndiff).
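The shifted-subimage idea can be extended to all four pixel pairs with a short loop over the offsets. This is only a sketch: the test image and the names offsets and count are made up, and it compares sign(p1) with sign(p2) as the question's code does. The product test above behaves differently when one of the two pixels is exactly 0.

```matlab
img = [-1 -1 -1; 0 0 0; 1 1 1];   % made-up test image
T = 0.9257;                        % threshold from the question
[nr, nc] = size(img);
r = 2:nr - 1;                      % rows of the interior pixels
c = 2:nc - 1;                      % columns of the interior pixels
offsets = [0 1; 1 0; 1 1; 1 -1];   % [dy dx] for W/E, N/S, NW/SE, NE/SW
count = zeros(nr - 2, nc - 2);
for k = 1:size(offsets, 1)
    dy = offsets(k, 1);
    dx = offsets(k, 2);
    p1 = img(r - dy, c - dx);      % one neighbour of every interior pixel
    p2 = img(r + dy, c + dx);      % the opposite neighbour
    count = count + (sign(p1) ~= sign(p2) & abs(p1 - p2) >= T);
end
zc = zeros(size(img));
zc(r, c) = count >= 2;             % the border stays 0, as in the original loops
```

The loop runs only four times regardless of image size, so the per-pixel work is still fully vectorized.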

Indices of constant consecutive values in a matrix, and number of constant values

I have a matrix with constant consecutive values randomly distributed throughout the matrix. I want the indices of the consecutive values, and I also want a matrix of the same size as the original matrix, where the number of consecutive values is stored at the indices of those consecutive values. For example:
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
I have struggled mightily to find a solution to this problem. It is relevant to meteorological data quality control. For example, given a matrix of temperature data from a number of sensors, I want to know which days had constant consecutive values and how many days were constant, so that I can flag the data as possibly faulty.
The temperature matrix is number of days x number of stations, and I want an output matrix that is also number of days x number of stations, where the consecutive values are flagged as described above.
If you have a solution to that, please provide! Thank you.
For this kind of problem, I made my own utility function runlength:
function RL = runlength(M)
% calculates length of runs of consecutive equal items along columns of M
% work along columns, so that you can use linear indexing
% find locations where items change along column
jumps = diff(M) ~= 0;
% add implicit jumps at start and end
ncol = size(jumps, 2);
jumps = [true(1, ncol); jumps; true(1, ncol)];
% find linear indices of starts and stops of runs
ijump = find(jumps);
nrow = size(jumps, 1);
istart = ijump(rem(ijump, nrow) ~= 0); % remove fake starts in last row
istop = ijump(rem(ijump, nrow) ~= 1); % remove fake stops in first row
rl = istop - istart;
assert(sum(rl) == numel(M))
% make matrix of 'derivative' of runlength
% don't need last row, but needs same size as jumps for indices to be valid
dRL = zeros(size(jumps));
dRL(istart) = rl;
dRL(istop) = dRL(istop) - rl;
% remove last row and 'integrate' to get runlength
RL = cumsum(dRL(1:end-1,:));
It only works along columns since it uses linear indexing. Since you want to do something similar along rows, you need to transpose back and forth, so you could use it for your case like so:
>> original = [1 1 1;2 2 3; 1 2 3];
>> original = original.'; % transpose, since runlength works along columns
>> output = runlength(original);
>> output = output.'; % transpose back
>> output(output == 1) = 0; % see hitzg's comment
>> output
output =
3 3 3
2 2 0
0 0 0
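As a cross-check, the same result can be reproduced with a plain per-column loop. This is a simpler and probably slower sketch, not the runlength function above. It assumes the data is already laid out as days x stations, so no transposing is needed, and the names temps, runId, and lens are made up for illustration.

```matlab
temps = [1 2 1; 1 2 2; 1 3 3];    % made-up data: 3 days x 3 stations
RL = zeros(size(temps));
for c = 1:size(temps, 2)
    % Label each run in this column with a running id: 1, 1, 2, ...
    runId = cumsum([true; diff(temps(:, c)) ~= 0]);
    % Count how many entries carry each id
    lens = accumarray(runId, 1);
    % Write every entry's run length back to its position
    RL(:, c) = lens(runId);
end
RL(RL == 1) = 0;                  % a run of length 1 is not a repeat
```

Here temps is the transpose of the example matrix in the question, so RL.' matches the output shown above.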

General method to find submatrix in matlab matrix

I am looking for a 'good' way to find a matrix (pattern) in a larger matrix (arbitrary number of dimensions).
Example:
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
Now I want this to happen:
loc = matrixFind(total, sub)
In this case loc should become [2 1 3].
For now I am just interested in finding one single point (if it exists) and am not worried about rounding issues. It can be assumed that sub 'fits' in total.
Here is how I could do it for 3 dimensions, however it just feels like there is a better way:
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
loc = [];
for x = 1:size(total,1)-size(sub,1)+1
for y = 1:size(total,2)-size(sub,2)+1
for z = 1:size(total,3)-size(sub,3)+1
block = total(x:x+size(sub,1)-1,y:y+size(sub,2)-1,z:z+size(sub,3)-1);
if isequal(sub,block)
loc = [x y z]
end
end
end
end
I hope to find a workable solution for an arbitrary number of dimensions.
Here is a low-performance, but (supposedly) arbitrary-dimensional function. It uses find to create a list of (linear) indices of potential matching positions in total and then just checks whether the appropriately sized subblock of total matches sub.
function loc = matrixFind(total, sub)
%matrixFind find position of array in another array
% initialize result
loc = [];
% pre-check: do all elements of sub exist in total?
elements_in_both = intersect(sub(:), total(:));
if numel(elements_in_both) < numel(unique(sub))
% if not, return nothing
return
end
% select a pivot element
% Improvement: use least common element in total for fewer iterations
pivot_element = sub(1);
% determine linear index of all occurrences of pivot_element in total
starting_positions = find(total == pivot_element);
% prepare cell arrays for variable length subscript vectors
[subscripts, subscript_ranges] = deal(cell([1, ndims(total)]));
for k = 1:length(starting_positions)
% fill subscript vector for starting position
[subscripts{:}] = ind2sub(size(total), starting_positions(k));
% add offsets according to size of sub per dimension
for m = 1:length(subscripts)
subscript_ranges{m} = subscripts{m}:subscripts{m} + size(sub, m) - 1;
end
% is subblock of total equal to sub
if isequal(total(subscript_ranges{:}), sub)
loc = [loc; cell2mat(subscripts)]; %#ok<AGROW>
end
end
end
This is based on doing all possible shifts of the original matrix total and comparing the upper-leftmost-etc sub-matrix of the shifted total with the sought pattern sub. Shifts are generated using strings, and are applied using circshift.
Most of the work is done vectorized. Only one level of loops is used.
The function finds all matchings, not just the first. For example:
>> total = ones(3,4,5,6);
>> sub = ones(3,3,5,6);
>> matrixFind(total, sub)
ans =
1 1 1 1
1 2 1 1
Here is the function:
function sol = matrixFind(total, sub)
nd = ndims(total);
sizt = size(total).';
max_sizt = max(sizt);
sizs = [ size(sub) ones(1,nd-ndims(sub)) ].'; % in case there are
% trailing singletons
if any(sizs>sizt)
error('Incorrect dimensions')
end
allowed_shift = (sizt-sizs);
max_allowed_shift = max(allowed_shift);
if max_allowed_shift>0
shifts = dec2base(0:(max_allowed_shift+1)^nd-1,max_allowed_shift+1).'-'0';
filter = all(bsxfun(@le,shifts,allowed_shift));
shifts = shifts(:,filter); % possible shifts of matrix "total", along
% all dimensions
else
shifts = zeros(nd,1);
end
for dim = 1:nd
d{dim} = 1:sizt(dim); % vectors with subindices per dimension
end
g = cell(1,nd);
[g{:}] = ndgrid(d{:}); % grid of subindices per dimension
gc = cat(nd+1,g{:}); % concatenated grid
accept = repmat(permute(sizs,[2:nd+1 1]), [sizt; 1]); % acceptable values
% of subindices in order to compare with matrix "sub"
ind_filter = find(all(gc<=accept,nd+1));
sol = [];
for shift = shifts
total_shifted = circshift(total,-shift);
if all(total_shifted(ind_filter)==sub(:))
sol = [ sol; shift.'+1 ];
end
end
For an arbitrary number of dimensions, you might try convn.
C = convn(total,reshape(sub(end:-1:1),size(sub)),'valid'); % flip dimensions of sub to be correlation
[~,indmax] = max(C(:));
% thanks to Eitan T for the next line
cc = cell(1,ndims(total)); [cc{:}] = ind2sub(size(C),indmax); subs = [cc{:}]
Thanks to Eitan T for the suggestion to use comma-separated lists for a generalized ind2sub.
Finally, you should test the result with isequal because this is not a normalized cross correlation, meaning that larger numbers in a local subregion will inflate the correlation value potentially giving false positives. If your total matrix is very inhomogeneous with regions of large values, you might need to search other maxima in C.
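One way to combine the convn ranking with the isequal check is to walk the candidates in order of decreasing correlation and stop at the first exact match. This is a sketch under that assumption, using the example from the question. The names order, ranges, and loc are illustrative, and in the worst case it still tests every position.

```matlab
total = rand(3, 4, 5);
sub = total(2:3, 1:3, 3:4);
% Correlate by convolving with sub flipped along every dimension
C = convn(total, reshape(sub(end:-1:1), size(sub)), 'valid');
% Try candidate positions from the highest correlation downwards
[~, order] = sort(C(:), 'descend');
nd = ndims(total);
cc = cell(1, nd);
ranges = cell(1, nd);
loc = [];
for idx = order.'
    [cc{:}] = ind2sub(size(C), idx);
    for m = 1:nd
        ranges{m} = cc{m}:cc{m} + size(sub, m) - 1;
    end
    % Accept only an exact match, since a high correlation alone proves nothing
    if isequal(total(ranges{:}), sub)
        loc = [cc{:}];
        break
    end
end
```

With the example above, loc ends up as [2 1 3], and it stays empty if sub does not occur in total.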

Matrix dimension must agree error in matlab?

I have adapted some existing code for my program, but I am running into an error whose cause I do not know. I have data with N observations, and my goal is to break the data up into increasingly smaller subsamples and do calculations on each of them. To determine how the subsample size will change, the program finds the divisors of N and stores them in an array. The number of observations actually used is stored in OptN.
dmin = 2;
% Find OptN such that it has the largest number of
% divisors among all natural numbers in the interval [0.99*N,N]
N = length(x);
N0 = floor(0.99*N);
dv = zeros(N-N0+1,1);
for i = N0:N,
dv(i-N0+1) = length(divisors(i,dmin));
end
OptN = N0 + find(max(dv)==dv) - 1;
% Use the first OptN values of x for further analysis
x = x(1:OptN);
% Find the divisors >= dmin for OptN
d = divisors(OptN,dmin);
function d = divisors(n,n0)
% Find all divisors of the natural number N greater or equal to N0
i = n0:floor(n/2);
d = find((n./i)==floor(n./i))' + n0 - 1; % Problem line
The problem occurs in the function divisors, where I get 'Error using ./ Matrix dimensions must agree.' This worked with input data of length 60, but when I try data of length 1058 it gives me the above error.
I think that with a large dataset it is possible that find(max(dv)==dv) will return multiple numbers, so OptN will become a vector, not a scalar.
Then the length of i (by the way, not a good variable name in MATLAB, since i is also the imaginary unit) will be unpredictable and probably different from the length of n, causing the dimension error in the next statement.
You can try find(max(dv)==dv,1) instead to get only the first match. Or add a loop.
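A small illustration of the difference, using made-up divisor counts with a tie for the maximum (the values of dv and N0 are hypothetical, not taken from the question's data):

```matlab
dv = [4; 6; 2; 6];        % made-up divisor counts with a tie for the maximum
N0 = 100;                 % made-up lower end of the search interval
% Without a limit, find returns every position of the maximum
allBest = N0 + find(max(dv) == dv) - 1;        % 2x1 vector
% Limiting find to one result yields a scalar
firstBest = N0 + find(max(dv) == dv, 1) - 1;
% The second output of max is also the index of the first maximum
[~, k] = max(dv);
viaMax = N0 + k - 1;
```

Here allBest is [101; 103], while firstBest and viaMax are both 101. If the last of the tied values is wanted instead, find(max(dv)==dv, 1, 'last') returns that one.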

Indexing of unknown dimensional matrix

I have a non-fixed dimensional matrix M, from which I want to access a single element.
The element's indices are contained in a vector J.
So for example:
M = rand(6,4,8,2);
J = [5 2 7 1];
output = M(5,2,7,1)
This time M has 4 dimensions, but this is not known in advance. This is dependent on the setup of the algorithm I'm writing. It could likewise be that
M = rand(6,4);
J = [3 1];
output = M(3,1)
so I can't simply use
output=M(J(1),J(2))
I was thinking of using sub2ind, but this also needs its arguments comma-separated.
@gnovice
This works, but I intend to use this kind of element extraction from the matrix M quite a lot. So if I have to create a temporary variable cellJ every time I access M, wouldn't this slow down the computation tremendously?
I could also write a separate function
function x= getM(M,J)
x=M(J(1),J(2));
% M doesn't change in this function, so no mem copy needed = passed by reference
end
and adapt this for different configurations of the algorithm. This is of course a speed vs flexibility consideration which I hadn't included in my question..
BUT: this is only available for getting the element, for setting there is no other way than actually using the indices (and preferably the linear index). I still think sub2ind is an option. The final result I had intended was something like:
function idx = getLinearIdx(J, size_M)
idx = ...
end
RESULTS:
function lin_idx = Lidx_ml( J, M )%#eml
%LIDX_ML converts an array of indices J for a multidimensional array M to
%linear indices, directly useable on M
%
% INPUT
% J NxP matrix containing P sets of N indices
% M A example matrix, with same size as on which the indices in J
% will be applicable.
%
% OUTPUT
% lin_idx 1xP array of linear indices (method 2; method 1 returns Px1)
%
% method 1
%lin_idx = zeros(size(J,2),1);
%for ii = 1:size(J,2)
% cellJ = num2cell(J(:,ii));
% lin_idx(ii) = sub2ind(size(M),cellJ{:});
%end
% method 2
sizeM = size(M);
J(2:end,:) = J(2:end,:)-1;
lin_idx = cumprod([1 sizeM(1:end-1)])*J;
end
Method 2 is between 20 times (for a small number of index sets P) and 80 times (for a large number of index sets P) faster than method 1, so it is an easy choice.
For the general case where J can be any length (which I assume always matches the number of dimensions in M), there are a couple options you have:
You can place each entry of J in a cell of a cell array using the num2cell function, then create a comma-separated list from this cell array using the colon operator:
cellJ = num2cell(J);
output = M(cellJ{:});
You can sidestep the sub2ind function and compute the linear index yourself with a little bit of math:
sizeM = size(M);
index = cumprod([1 sizeM(1:end-1)]) * (J(:) - [0; ones(numel(J)-1, 1)]);
output = M(index);
Here is a version of gnovice's option 2 that can process a whole matrix of subscripts, where each row contains one set of subscripts. E.g. for 3 sets of subscripts:
J = [5 2 7 1
1 5 2 7
4 3 9 2];
sizeM = size(M);
idx = cumprod([1 sizeM(1:end-1)])*(J - [zeros(size(J,1),1) ones(size(J,1),size(J,2)-1)]).';
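A quick way to convince yourself that the stride arithmetic is right is to compare it against sub2ind on a small example. The sketch below assumes M = rand(6,4,8,2) as in the question and uses made-up subscripts that fit inside that size. The names strides, cols, and ref are illustrative.

```matlab
M = rand(6, 4, 8, 2);
J = [5 2 7 1
     1 4 2 2
     4 3 8 2];                          % one set of subscripts per row
sizeM = size(M);
% Stride of each dimension in linear-index units: [1 6 24 192] here
strides = cumprod([1 sizeM(1:end-1)]);
% Subtract 1 from every subscript except the first, then weight by the strides
offsets = [zeros(size(J,1),1) ones(size(J,1), size(J,2)-1)];
idx = (J - offsets) * strides.';        % Px1 column of linear indices
% Reference result via sub2ind, passing one column of J per dimension
cols = num2cell(J, 1);
ref = sub2ind(sizeM, cols{:});
```

For the first row, 5 + (2-1)*6 + (7-1)*24 + (1-1)*192 = 155, which is what both idx(1) and ref(1) contain. Unlike sub2ind, the manual version does no bounds checking, so out-of-range subscripts silently map to the wrong element.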