Balanced distribution of elements to bins based on index

I have n elements with index 0..(n-1). I want to distribute the elements to m bins like so:
I want to fill the bins sequentially
The size of the bins should be between ⌊number_of_elements / number_of_bins⌋ and ⌈number_of_elements / number_of_bins⌉. The bigger bins should come first.
I want to assign the elements based on the index of the element. So far I can only come up with solutions that use several for loops. It should be possible to assign the elements to bins with a single for loop, using only mod, div and perhaps an if statement.
Example: I have n=7 elements and m=3 bins. The result should be this:
Bin 1: 0, 1, 2
Bin 2: 3, 4
Bin 3: 5, 6

Here is a proof of concept example in Python.
# Initialize
elements = [0, 1, 2, 3, 4, 5, 6]
n = len(elements)  # Number of elements
m = 3  # Number of bins
bins = [[] for _ in range(m)]
# Precalculate these (integer division, so this also works in Python 3)
elementsPerBinFloor = n // m
elementsPerBinCeil = elementsPerBinFloor + 1  # Only used when n % m != 0
# Bins with an index below this number get elementsPerBinCeil elements,
# all later bins get elementsPerBinFloor elements
cutoffNum = n % m
i = 0  # This is which bin to assign the element to
# Assign all elements to a bin
for element in elements:
    bins[i].append(element)
    # Move to the next bin once the current one is full
    if i < cutoffNum and len(bins[i]) >= elementsPerBinCeil:
        i += 1
    elif i >= cutoffNum and len(bins[i]) >= elementsPerBinFloor:
        i += 1
Update: I have several example implementations in Python here. Check the various branches of the repository if you are interested in different ways to do the same thing.
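One possible closed form, sketched here under the assumption that bins are numbered from 0 and that the first n mod m bins are the larger ones. The helper name bin_for_index is hypothetical and not taken from the linked repository; it maps an element index directly to its bin with div, mod and a single if.

```python
def bin_for_index(i, n, m):
    """Return the 0-based bin for element index i (hypothetical helper)."""
    q, r = divmod(n, m)      # q = floor(n / m), r = number of larger bins
    boundary = r * (q + 1)   # first index that falls into a smaller bin
    if i < boundary:
        return i // (q + 1)  # larger bins hold q + 1 elements each
    return r + (i - boundary) // q  # smaller bins hold q elements each


n, m = 7, 3
bins = [[] for _ in range(m)]
for i in range(n):
    bins[bin_for_index(i, n, m)].append(i)
print(bins)  # [[0, 1, 2], [3, 4], [5, 6]]
```

When n < m, q is 0 but the second branch is never reached, because every index lies below the boundary.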

Related

Is there a way to modify the isdag matlab function in order for it to ignore cycles of length zero?

Recently I've been tasked with programming an algorithm that optimizes job-shop scheduling problems, and I'm following an approach which uses directed graphs for this. In these directed graphs, nodes represent events and edges represent time precedence constraints between events, i.e. time inequalities. So, for example, two consecutive nodes A and B separated by a directed edge of length 2 which goes from A to B would represent the inequality tB-tA>=2. It follows that an equality is represented by two directed edges of opposite directions, one with positive length and the other with negative length. Thus, we end up with a graph which has some cycles of length zero.
Matlab has a function called isdag which returns true if a directed graph has no cycles and false otherwise; is there a way to modify this function in order for it to ignore the cycles of length zero? If not, has anyone got any idea on how to program this? Thanks in advance!!
I also tried the function below, but it doesn't work. With the adjacency matrix adjMatrix = [0, 10, 0; -9, 0, 5; 0, 0, 0] it should return true, since there is a cycle between nodes 1 and 2 of length 10+(-9)=1, but it returns false:
function result = hasCycleWithPositiveWeight(adjMatrix)
    n = size(adjMatrix,1);
    visited = false(1,n);
    path = zeros(1,n);
    result = false;
    for i = 1:n
        pathStart = 1;
        pathEnd = 1;
        path(pathEnd) = i;
        totalWeight = 0;
        while pathStart <= pathEnd
            node = path(pathStart);
            visited(node) = true;
            for j = 1:n
                if adjMatrix(node,j) > 0
                    totalWeight = totalWeight + adjMatrix(node,j);
                    if visited(j)
                        if j == i && totalWeight > 0
                            result = true;
                            return;
                        end
                    else
                        pathEnd = pathEnd + 1;
                        path(pathEnd) = j;
                    end
                end
            end
            pathStart = pathStart + 1;
            totalWeight = totalWeight - adjMatrix(node, path(max(pathStart-1,1)));
            visited(node) = false;
        end
    end
end

If you want to find the cycle between just two consecutive nodes use this:
a = [0, 10, 0; -9, 0, 5; 0, 0, 0];
b = a.';
And = (a & b);
Add = (a + b);
result = any( And .* Add, 'all');
It returns true if there is a two-node cycle whose length isn't 0.
Explanation:
In the graph, if the length from node 2 to node 5 is 12, we set the element A(2, 5) of the adjacency matrix to 12, and if the length from node 5 to node 2 is -8, we set the element A(5, 2) of the adjacency matrix to -8. So the two directions of a node pair sit at mirrored positions in the matrix. The matrix transpose moves the entry at (2, 5) to (5, 2) and the entry at (5, 2) to (2, 5).
If we And a matrix with its transpose, an entry of 1 in the result shows that there is a cycle between the two corresponding nodes.
If we Add a matrix to its transpose, the result shows the sum of the pairwise lengths between nodes.
If we multiply the two matrices Add and And element-wise, the result shows the sum of the pairwise lengths between nodes, with the sum set to 0 for every pair of nodes that doesn't form a cycle.
Now the function any can be used to test whether the matrix has a two-node cycle whose length isn't 0.
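The same pairwise test can be written out with explicit loops to make the logic visible. This is only a sketch in plain Python, assuming a square adjacency matrix given as nested lists and no self-loops; the function name has_nonzero_two_cycle is hypothetical.

```python
def has_nonzero_two_cycle(adj):
    """True if some node pair is linked in both directions with a nonzero total length."""
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            # Counterpart of the "And" matrix: edges exist in both directions
            both = adj[i][j] != 0 and adj[j][i] != 0
            # Counterpart of the "Add" matrix: total length of the two-node cycle
            if both and adj[i][j] + adj[j][i] != 0:
                return True
    return False


print(has_nonzero_two_cycle([[0, 10, 0], [-9, 0, 5], [0, 0, 0]]))   # True
print(has_nonzero_two_cycle([[0, 10, 0], [-10, 0, 5], [0, 0, 0]]))  # False
```

Like the matrix version, this only looks at cycles made of exactly two nodes; longer cycles are not detected.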

Any way for MATLAB to sum an array according to specified bins without a for loop? Ideally with a built-in function

For example, if
A = [7,8,1,1,2,2,2]; % the bins (or subscripts)
B = [2,1,1,1,1,1,2]; % the array
then the desired function "binsum" has two outputs, one is the bins, and the other is the sum. It is just adding values in B according to subscripts in A. For example, for 2, the sum is 1 + 1 + 2 = 4, for 1 it is 1 + 1 = 2.
[bins, sums] = binsum(A,B);
bins = [1,2,7,8]
sums = [2,4,2,1]
The elements in "bins" need not be ordered but must correspond to elements in "sums". This can surely be done with a "for" loop, but a loop is not desired because of performance concerns. Ideally there is a built-in function for this.
Thanks a lot!

This is another job for accumarray
A = [7,8,1,1,2,2,2]; % the bins (or subscripts)
B = [2,1,1,1,1,1,2]; % the array
sums = accumarray(A.', B.').';
bins = unique(A);
Results:
>> bins
bins =
1 2 7 8
sums =
2 4 0 0 0 0 2 1
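The index in sums corresponds to the bin value, so sums(2) = 4. You can use nonzeros to remove the unused bins so that bins(n) corresponds to sums(n). Note that this also drops any bin whose values genuinely sum to zero, so it assumes that no used bin has a zero sum.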
sums = nonzeros(sums).';
sums =
2 4 2 1
or, to generate this form of sums in one line:
sums = nonzeros(accumarray(A.', B.')).';
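To illustrate why the dense result contains zeros, here is a plain-Python sketch of the accumulation step. It is not how accumarray is implemented; the helper accumulate_dense is a hypothetical name, and 1-based positive integer subscripts are assumed.

```python
def accumulate_dense(subscripts, values):
    """Add each value into the slot given by its 1-based subscript."""
    out = [0] * max(subscripts)
    for s, v in zip(subscripts, values):
        out[s - 1] += v  # shift from 1-based subscripts to 0-based list positions
    return out


A = [7, 8, 1, 1, 2, 2, 2]  # the bins (or subscripts)
B = [2, 1, 1, 1, 1, 1, 2]  # the array
dense = accumulate_dense(A, B)
print(dense)  # [2, 4, 0, 0, 0, 0, 2, 1]

sums = [x for x in dense if x != 0]  # same idea as nonzeros
bins = sorted(set(A))                # same idea as unique
print(bins, sums)  # [1, 2, 7, 8] [2, 4, 2, 1]
```

Slots 3 to 6 stay at zero because no subscript in A refers to them.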

Another possibility is to use sparse and then find.
Assuming A contains positive integers,
[bins, ~, sums] = find(sparse(A, 1, B));
This works because sparse automatically adds values (third input) for matching positions (as defined by the first two inputs).
If A can contain arbitrary values, you also need a call to unique, and find can be replaced by nonzeros:
[bins, ~, labels]= unique(A);
sums = nonzeros(sparse(labels, 1, B));

Here is a solution using sort and cumsum:
[s,I]=sort(A);
c=cumsum(B(I));
k= [s(1:end-1)~=s(2:end) true];
sums = diff([0 c(k)])
bins = s(k)
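The steps of the sort and cumsum approach can be traced in a short Python sketch, with itertools.accumulate standing in for cumsum. The variable names mirror the MATLAB snippet; everything else is an illustrative assumption.

```python
from itertools import accumulate

A = [7, 8, 1, 1, 2, 2, 2]  # the bins (or subscripts)
B = [2, 1, 1, 1, 1, 1, 2]  # the array

# Sort the bins and remember the permutation, like [s, I] = sort(A)
order = sorted(range(len(A)), key=lambda i: A[i])
s = [A[i] for i in order]
# Running total of the values in sorted-bin order, like cumsum(B(I))
c = list(accumulate(B[i] for i in order))
# Positions where a run of equal bins ends (the last position always counts)
k = [p for p in range(len(s)) if p == len(s) - 1 or s[p] != s[p + 1]]
ends = [c[p] for p in k]
# Differences between consecutive run totals give the per-bin sums
sums = [e - prev for e, prev in zip(ends, [0] + ends[:-1])]
bins = [s[p] for p in k]
print(bins, sums)  # [1, 2, 7, 8] [2, 4, 2, 1]
```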

Using elements of a vector to set elements of a matrix

I have a vector whose elements identify the indices (per column) that I need to set in a different matrix. Specifically, I have:
A = 7
1
2
and I need to create a matrix B with some number of rows of zeros, except for the elements identified by A. In other words, I want B:
B = zeros(10, 3); % number of rows is known; num columns = size(A)
B(A(1), 1) = 1
B(A(2), 2) = 1
B(A(3), 3) = 1
I would like to do this without having to write a loop.
Any pointers would be appreciated.
Thanks.

Use linear indexing:
B = zeros(10, 3);
B(A(:).'+ (0:numel(A)-1)*size(B,1)) = 1;
The second line can be written equivalently with sub2ind (may be a little slower):
B(sub2ind(size(B), A(:).', 1:numel(A))) = 1;
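The arithmetic behind the linear index can be checked with a small Python sketch. It assumes column-major storage and 1-based indices, as in MATLAB; the helper name column_major_linear_indices is hypothetical.

```python
def column_major_linear_indices(rows, n_rows):
    """1-based linear index of (rows[k], k + 1) in a column-major matrix with n_rows rows."""
    return [r + k * n_rows for k, r in enumerate(rows)]


A = [7, 1, 2]
n_rows, n_cols = 10, len(A)
linear = column_major_linear_indices(A, n_rows)
print(linear)  # [7, 11, 22]

# Set those positions in a flat column-major buffer, then view it as rows
flat = [0] * (n_rows * n_cols)
for idx in linear:
    flat[idx - 1] = 1
B = [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]
print(B[6][0], B[0][1], B[1][2])  # 1 1 1
```

Each column contributes an offset of one full column length (10 here), which is what the term (0:numel(A)-1)*size(B,1) computes.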

Loops on a matrix to look at all combinations of rows and columns [duplicate]

This question already has answers here:
Generate a matrix containing all combinations of elements taken from n vectors
(4 answers)
Closed 8 years ago.
I have an arbitrary n-by-n matrix. I want to look at sets of columns and rows of the matrix and do some analysis on them, for example by setting all elements of a specific set of rows and columns equal to zero. To do this I need to analyse all combinations of rows and columns.
For example, if n=3 the process selects the row and columns 1, 2, 3, 12, 13, 23, 123 in succession and creates a new variable for each row and column.
I am currently using the technique below for a matrix of size 4:
H = [some 4-by-4 matrix]
for i1 = 1:n
    for i2 = 1:n
        for i3 = 1:n
            for i4 = 1:n
                % Set all rows and columns of all variables equal to 0
                H(:,i1) = 0;
                H(i1,:) = 0;
                H(:,i2) = 0;
                H(i2,:) = 0;
                H(:,i3) = 0;
                H(i3,:) = 0;
                H(:,i4) = 0;
                H(i4,:) = 0;
                % Some more analysis on i1, i2, i3, i4...
            end
        end
    end
end
This is an extremely crude method but it seems to work. Obviously, this technique looks at the set (1,1,1,1) which is equivalent to just (1) first, then (1,1,1,2) which is equivalent to (1,2), then (1,1,1,3) which is equivalent to (1,3)... and so on...
The problem here is that this is not a general process for any matrix of size n, this is only a crude process for a matrix of size 4.
Is there any way to generalise the process so that it works for any arbitrary n-by-n matrix?
Thanks!

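You can reduce the arbitrary number of loops to one:
for k = 1:2^n-1
    ind = dec2bin(k,n)=='1';
    Hk = H; % work on a copy so that H itself is not zeroed out cumulatively
    Hk(ind,:) = 0;
    Hk(:,ind) = 0;
    % ... analysis on Hk ...
end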
The trick is to use just one loop to create a logical index (ind) that tells which columns will be selected. So for n=4 the variable ind takes the values [0 0 0 1], [0 0 1 0], [0 0 1 1], ... [1 1 1 1].
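The bitmask idea carries over to other languages. Below is a minimal Python sketch, assuming 0-based indices and a matrix stored as nested lists; index_subsets and zero_rows_and_cols are hypothetical helper names, and each subset is applied to a fresh copy of the matrix.

```python
def index_subsets(n):
    """Yield every non-empty subset of range(n), one per integer 1 .. 2**n - 1."""
    for k in range(1, 2 ** n):
        # Bit i of k decides whether index i is part of the subset
        yield [i for i in range(n) if (k >> i) & 1]


def zero_rows_and_cols(matrix, selected):
    """Return a copy of matrix with the selected rows and columns set to 0."""
    n = len(matrix)
    return [[0 if (i in selected or j in selected) else matrix[i][j]
             for j in range(n)] for i in range(n)]


H = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
subsets = list(index_subsets(3))
print(subsets)  # [[0], [1], [0, 1], [2], [0, 2], [1, 2], [0, 1, 2]]
print(zero_rows_and_cols(H, [0]))  # [[0, 0, 0], [0, 5, 6], [0, 8, 9]]
```

The subsets come out in a different order than with dec2bin, which lists the most significant bit first, but the same 2^n - 1 combinations are covered.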

Here is a neat way to do that with only two for loops and no magic function. It uses the binary representation of the integer numbers to decide whether to zero out a column and a row.
I just fix some values for the test
n = 3;
Mat = rand(n,n);
Then, we know that there are 2^n combinations, so let's number them from 0 to 2^n-1:
for tag=0:2^n-1
We make a copy to keep the original matrix untouched
myMat = Mat;
Now loop on the row and columns
for (i=1:n)
Here is the trick: if the i-th bit of tag (in binary) is 1, then we zero out the column and row, otherwise we keep it untouched.
if ( mod( floor(tag/2^(i-1)), 2) == 1 )
myMat(:,i) = 0;
myMat(i,:) = 0;
end
end
Finally display to check that we have what we need.
myMat
end

Matlab: finding largest sum of first dimension, grouping by second dimension

I have a 2-D matrix A(value, label). I want to find the label that has the largest and second largest sum of values. For example:
A = (1, 1;
2, 1;
3, 2;
4, 2;
5, 3)
In this case the result should be largest = 2, second largest = 3. How can I do this in MATLAB?

[b,~,n] = unique(A(:,2)); % b: unique labels, n: group index of each row
[val, idx] = sort(accumarray(n,A(:,1)),'descend'); % per-label sums, largest first
b(idx(1:2))
Output is:
ans =
2
3
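For readers who want to check the grouping logic by hand, here is a small Python sketch of the same idea: sum the values per label, then rank the labels by their sums. The data layout as (value, label) pairs is an assumption made for illustration.

```python
A = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]  # (value, label) pairs

# Sum the values of each label
totals = {}
for value, label in A:
    totals[label] = totals.get(label, 0) + value
print(totals)  # {1: 3, 2: 7, 3: 5}

# Rank the labels by their total, largest first
ranked = sorted(totals, key=totals.get, reverse=True)
print(ranked[:2])  # [2, 3]
```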

Something like this should do the trick.
A = [1, 1;
2, 1;
3, 2;
4, 2;
5, 3];
labels = unique(A(:,2)); % Pull out unique labels
for i = 1:numel(labels)
    idx = (A(:,2) == labels(i)); % Find elements which match current label
    s(i,1) = sum(A(idx,1)); % Sum them
end
r = sortrows([s labels], -1); % Sort by decreasing sum
r(1,2); % Label corresponding to largest sum
r(2,2); % Label corresponding to second largest sum
EDIT: accumarray is a built-in function that will do this for you, although I find the documentation on it somewhat cryptic.

Since your question isn't super clear and I don't get which sum you are referring to, I'm just going to guess that you are aiming for something like this:
q=sortrows(A,-1);
q=q(1:2,:);
which will give the two labels (right column) with the largest values (left column) in q.
If this wasn't what you were looking for, please comment.
EDIT: Misread which column contained the labels; corrected.