I have a matrix that I want to randomize a couple of thousand times, while keeping the row and column totals the same:
1 2 3
A 0 0 1
B 1 1 0
C 1 0 0
An example of a valid random matrix would be:
1 2 3
A 1 0 0
B 1 1 0
C 0 0 1
My actual matrix is a lot bigger (about 600x600 items), so I really need an approach that is computationally efficient.
My initial (inefficient) approach consisted of shuffling arrays using the Perl Cookbook shuffle.
I pasted my current code below. I've got extra code in place to start with a new shuffled list of numbers, if no solution is found in the while loop. The algorithm works fine for a small matrix, but as soon as I start scaling up it takes forever to find a random matrix that fits the requirements.
Is there a more efficient way to accomplish what I'm searching for?
Thanks a lot!
#!/usr/bin/perl -w
use strict;
my %matrix = ( 'A' => {'3' => 1 },
               'B' => {'1' => 1,
                       '2' => 1 },
               'C' => {'1' => 1 }
             );
my @letters = ();
my @numbers = ();
foreach my $letter (keys %matrix){
    foreach my $number (keys %{$matrix{$letter}}){
        push (@letters, $letter);
        push (@numbers, $number);
    }
}
my %random_matrix = ();
&shuffle(\@numbers);
foreach my $letter (@letters){
    while (exists($random_matrix{$letter}{$numbers[0]})){
        &shuffle (\@numbers);
    }
    my $chosen_number = shift (@numbers);
    $random_matrix{$letter}{$chosen_number} = 1;
}
sub shuffle {
    my $array = shift;
    my $i = scalar(@$array);
    my $j;
    foreach my $item (@$array )
    {
        --$i;
        $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
    return @$array;
}
The problem with your current algorithm is that you are trying to shuffle your way out of dead ends -- specifically, when your @letters and @numbers arrays (after the initial shuffle of @numbers) yield the same cell more than once. That approach works when the matrix is small, because it doesn't take too many tries to find a viable re-shuffle. However, it's a killer when the lists are big. Even if you could hunt for alternatives more efficiently -- for example, trying permutations rather than random shuffling -- the approach is probably doomed.
Rather than shuffling entire lists, you might tackle the problem by making small modifications to an existing matrix.
For example, let's start with your example matrix (call it M1). Randomly pick one cell to change (say, A1). At this point the matrix is in an illegal state. Our goal will be to fix it in the minimum number of edits -- specifically 3 more edits. You implement these 3 additional edits by "walking" around the matrix, with each repair of a row or column yielding another problem to be solved, until you have walked full circle (err ... full rectangle).
For example, after changing A1 from 0 to 1, there are 3 ways to walk for the next repair: A3, B1, and C1. Let's decide that the 1st edit should fix rows. So we pick A3. On the second edit, we will fix the column, so we have choices: B3 or C3 (say, C3). The final repair offers only one choice (C1), because we need to return to the column of our original edit. The end result is a new, valid matrix.
    Orig       Change A1  Change A3  Change C3  Change C1
    M1                                          M2
    1 2 3      1 2 3      1 2 3      1 2 3      1 2 3
    -----      -----      -----      -----      -----
A | 0 0 1      1 0 1      1 0 0      1 0 0      1 0 0
B | 1 1 0      1 1 0      1 1 0      1 1 0      1 1 0
C | 1 0 0      1 0 0      1 0 0      1 0 1      0 0 1
If an editing path leads to a dead end, you backtrack. If all of the repair paths fail, the initial edit can be rejected.
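The four edits in the table end up touching the corners of a rectangle (A1, A3, C3, C1): the two 1s on one diagonal become 0s and the two 0s on the other diagonal become 1s. Here is a minimal Python sketch of that single move, assuming a list-of-lists 0/1 matrix. It is a simplification rather than the walk itself: instead of searching and backtracking, it picks two random 1-cells and simply rejects the attempt when the opposite corners are not both 0. The names `try_rectangle_swap` and `totals` are hypothetical.

```python
import random

def try_rectangle_swap(matrix, rng=random):
    """Attempt one random 2x2 'checkerboard' move, in place.

    Picks two 1-cells (r1, c1) and (r2, c2). If they are in different rows
    and columns and the opposite corners (r1, c2) and (r2, c1) are both 0,
    the two 1s move onto those corners, so every row and column total
    stays the same. Returns True if the move was made, False otherwise.
    """
    # Rebuilt on every call for clarity; a 600x600 run would keep this
    # list up to date instead of rescanning the whole matrix each time.
    ones = [(r, c) for r, row in enumerate(matrix)
            for c, value in enumerate(row) if value]
    if len(ones) < 2:
        return False
    (r1, c1), (r2, c2) = rng.sample(ones, 2)
    if r1 == r2 or c1 == c2 or matrix[r1][c2] or matrix[r2][c1]:
        return False                      # dead end: reject this attempt
    matrix[r1][c1] = matrix[r2][c2] = 0
    matrix[r1][c2] = matrix[r2][c1] = 1
    return True

def totals(matrix):
    """Row totals and column totals, for checking."""
    return [sum(row) for row in matrix], [sum(col) for col in zip(*matrix)]

# The question's matrix M1; after many moves the totals are unchanged.
m1 = [[0, 0, 1],
      [1, 1, 0],
      [1, 0, 0]]
before = totals(m1)
rng = random.Random(0)
moves = sum(try_rectangle_swap(m1, rng) for _ in range(100))
```

Rejected attempts are cheap, so calling the move repeatedly is one way to drift away from the starting matrix; how many moves are enough for a 600x600 matrix is not something this sketch settles.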
This approach will generate new, valid matrixes quickly. It will not necessarily produce random outcomes: M1 and M2 will still be highly correlated with each other, a point that will become more directly evident as the size of the matrix grows.
How do you increase the randomness? You mentioned that most cells (99% or more) are zeros. One idea would be to proceed like this: for each 1 in the matrix, set its value to 0 and then repair the matrix using the 4-edit method outlined above. In effect, you would be moving all of the ones to new, random locations.
Here is an illustration. There are probably further speed optimizations in here, but this approach yielded 10 new 600x600 matrixes, at 0.5% density, in 30 seconds or so on my Windows box. Don't know if that's fast enough.
use strict;
use warnings;
# Args: N rows, N columns, density, N iterations.
main(@ARGV);
sub main {
    my $n_iter = pop;
    my $matrix = init_matrix(@_);
    print_matrix($matrix);
    for my $n (1 .. $n_iter){
        warn $n, "\n"; # Show progress.
        edit_matrix($matrix);
        print_matrix($matrix);
    }
}
sub init_matrix {
    # Generate initial matrix, given N of rows, N of cols, and density.
    my ($rows, $cols, $density) = @_;
    my @matrix;
    for my $r (1 .. $rows){
        push @matrix, [ map { rand() < $density ? 1 : 0 } 1 .. $cols ];
    }
    return \@matrix;
}
sub print_matrix {
    # Dump out a matrix for checking.
    my $matrix = shift;
    print "\n";
    for my $row (@$matrix){
        my @vals = map { $_ ? 1 : ''} @$row;
        print join("\t", @vals), "\n";
    }
}
sub edit_matrix {
    # Takes a matrix and moves all of the non-empty cells somewhere else.
    my $matrix = shift;
    my $move_these = cells_to_move($matrix);
    for my $cell (@$move_these){
        my ($i, $j) = @$cell;
        # Move the cell, provided that the cell hasn't been moved
        # already and the subsequent edits don't lead to a dead end.
        $matrix->[$i][$j] = 0
            if $matrix->[$i][$j]
            and other_edits($matrix, $cell, 0, $j);
    }
}
sub cells_to_move {
    # Returns a list of non-empty cells.
    my $matrix = shift;
    my $i = -1;
    my @cells = ();
    for my $row (@$matrix){
        $i ++;
        for my $j (0 .. @$row - 1){
            push @cells, [$i, $j] if $matrix->[$i][$j];
        }
    }
    return \@cells;
}
sub other_edits {
    my ($matrix, $cell, $step, $last_j) = @_;
    # We have succeeded if we've already made 3 edits.
    $step ++;
    return 1 if $step > 3;
    # Determine the roster of next edits to fix the row or
    # column total upset by our prior edit.
    my ($i, $j) = @$cell;
    my @fixes;
    if ($step == 1){
        @fixes =
            map  { [$i, $_] }
            grep { $_ != $j and not $matrix->[$i][$_] }
            0 .. @{$matrix->[0]} - 1
        ;
        shuffle(\@fixes);
    }
    elsif ($step == 2) {
        @fixes =
            map  { [$_, $j] }
            grep { $_ != $i and $matrix->[$_][$j] }
            0 .. @$matrix - 1
        ;
        shuffle(\@fixes);
    }
    else {
        # On the last edit, the column of the fix must be
        # the same as the column of the initial edit.
        @fixes = ([$i, $last_j]) unless $matrix->[$i][$last_j];
    }
    for my $f (@fixes){
        # If all subsequent fixes succeed, we are golden: make
        # the current fix and return true.
        if ( other_edits($matrix, [@$f], $step, $last_j) ){
            $matrix->[$f->[0]][$f->[1]] = $step == 2 ? 0 : 1;
            return 1;
        }
    }
    # Failure if we get here.
    return;
}
sub shuffle {
    my $array = shift;
    my $i = scalar(@$array);
    my $j;
    for (@$array ){
        $i --;
        $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i] unless $i == $j;
    }
}
Step 1: First I would initialize the matrix to zeros and calculate the required row and column totals.
Step 2: Now pick a random row, weighted by the count of 1s that must be in that row (so a row with count 300 is more likely to be picked than a row with count 5).
Step 3: For this row, pick a random column, weighted by the count of 1s in that column (except ignore any cells that may already contain a 1 - more on this later).
Step 4: Place a one in this cell and reduce both the row and column count for the appropriate row and column.
Step 5: Go back to step 2 until no rows have non-zero count.
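Steps 1 to 5 can be sketched in Python as follows, assuming the row and column totals are given as lists of non-negative integers with equal sums. When it gets stuck (the problem described next), this version simply throws the attempt away and starts over, which is cruder than the backtracking of option a); `weighted_fill` and `max_restarts` are hypothetical names.

```python
import random

def weighted_fill(row_totals, col_totals, rng=random, max_restarts=1000):
    """Steps 1-5: build a 0/1 matrix with the given row and column totals.

    Each 1 goes into a row chosen with weight = 1s that row still needs,
    then into a column chosen with weight = 1s that column still needs,
    skipping cells that already hold a 1. A stuck attempt is discarded
    and restarted from scratch (a restart, not backtracking)."""
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    n_rows, n_cols = len(row_totals), len(col_totals)
    for _ in range(max_restarts):
        matrix = [[0] * n_cols for _ in range(n_rows)]             # Step 1
        rows_left, cols_left = list(row_totals), list(col_totals)
        stuck = False
        while any(rows_left):                                      # Step 5
            r = rng.choices(range(n_rows), weights=rows_left)[0]   # Step 2
            weights = [0 if matrix[r][c] else cols_left[c] for c in range(n_cols)]
            if not any(weights):       # row r needs a 1 but has no free column
                stuck = True
                break
            c = rng.choices(range(n_cols), weights=weights)[0]     # Step 3
            matrix[r][c] = 1                                       # Step 4
            rows_left[r] -= 1
            cols_left[c] -= 1
        if not stuck:
            return matrix
    raise RuntimeError("still stuck after max_restarts attempts")
```

For example, `weighted_fill([1, 2, 1], [2, 1, 1])` uses the row and column totals of the question's 3x3 matrix. The sketch makes no claim that the matrices it returns are uniformly distributed.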
The problem though is that this algorithm can fail to terminate because you may have a row where you need to place a one, and a column that needs a one, but you've already placed a one in that cell, so you get 'stuck'. I'm not sure how likely this is to happen, but I wouldn't be surprised if it happened very frequently - enough to make the algorithm unusable. If this is a problem I can think of two ways to fix it:
a) Construct the above algorithm recursively and allow backtracking on failure.
b) Allow a cell to contain a value greater than 1 if there is no other option and keep going. Then at the end you have a correct row and column count but some cells may contain numbers greater than 1. You can fix this by finding a grouping that looks like this:
2 . . . . 0
. . . . . .
. . . . . .
0 . . . . 1
and changing it to:
1 . . . . 1
. . . . . .
. . . . . .
1 . . . . 0
It should be easy to find such a grouping if you have many zeros. I think b) is likely to be faster.
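A minimal Python sketch of the option b) clean-up, assuming the matrix is a list of lists of non-negative integers whose row and column totals are already correct. It looks for a rectangle of the shape shown above and moves one unit around it; `fix_overfull_cells` is a hypothetical name, and the plain linear scan is chosen for clarity, not speed.

```python
def fix_overfull_cells(matrix):
    """Option (b) repair, in place. While some cell (r1, c1) holds 2 or more,
    look for r2, c2 with matrix[r1][c2] == 0, matrix[r2][c1] == 0 and
    matrix[r2][c2] >= 1, then move one unit around that rectangle:

        v . . 0          v-1 . . 1
        0 . . w    ->    1   . . w-1

    Row and column totals never change, and each move lowers the total
    excess above 1 by at least one, so the loop ends either with no
    overfull cells or with an error when no rectangle exists."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    while True:
        over = [(r, c) for r in range(n_rows) for c in range(n_cols)
                if matrix[r][c] > 1]
        if not over:
            return matrix
        r1, c1 = over[0]
        # matrix[r1][c1] > 1, so the zero tests below already exclude
        # r2 == r1 and c2 == c1.
        move = next(((r2, c2) for r2 in range(n_rows) for c2 in range(n_cols)
                     if matrix[r2][c1] == 0 and matrix[r1][c2] == 0
                     and matrix[r2][c2] >= 1), None)
        if move is None:
            raise RuntimeError("no rectangle found for cell %r" % ((r1, c1),))
        r2, c2 = move
        matrix[r1][c1] -= 1
        matrix[r1][c2] += 1
        matrix[r2][c1] += 1
        matrix[r2][c2] -= 1
```

For example, `[[2, 0, 0], [0, 1, 0], [0, 0, 1]]` has row and column totals 2, 1, 1 and becomes `[[1, 1, 0], [1, 0, 0], [0, 0, 1]]`, with the same totals.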
I'm not sure it's the best way, but it's probably faster than shuffling arrays. I'll be tracking this question to see what other people come up with.
I'm not a mathematician, but I figure that if you need to keep the same column and row totals, then random versions of the matrix will have the same quantity of ones and zeros.
Correct me if I'm wrong, but that would mean that making subsequent versions of the matrix would only require you to shuffle around the rows and columns.
Randomly shuffling columns won't change your totals for rows and columns, and randomly shuffling rows won't either. So what I would do is first shuffle the rows and then shuffle the columns.
That should be pretty fast.
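A minimal Python sketch of the row-then-column shuffle, using index lists so the original matrix is not modified; `shuffle_rows_and_columns` is a hypothetical name. Worth checking when you try it: the collection of row totals and the collection of column totals stay the same, but each total travels with its row or column, so a given row can end up with a different total than it had before.

```python
import random

def shuffle_rows_and_columns(matrix, rng=random):
    """Return a new matrix whose rows, and then columns, are in random order.

    Only index lists are shuffled; the input matrix is left untouched."""
    row_order = list(range(len(matrix)))
    col_order = list(range(len(matrix[0])))
    rng.shuffle(row_order)
    rng.shuffle(col_order)
    return [[matrix[r][c] for c in col_order] for r in row_order]
```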
Not sure if it will help, but you can try going from one corner and, for each column and row, track the required total and the actual sum so far. Instead of trying to hit a good matrix by chance, treat each total as an amount to be split up. For each element, find the smaller of (row total - actual row sum) and (column total - actual column sum); that is the upper bound for your random number.
Is it clear? Sorry, I don't know Perl, so I cannot show any code.
Like @Gabriel I'm not a Perl programmer so it's possible that this is what your code already does ...
You've only posted one example. It's not clear whether you want a random matrix which has the same number of 1s in each row and column as your start matrix, or one which has the same rows and columns but shuffled. If the latter is good enough you could create an array of row (or column, it doesn't matter) indexes and randomly permute that. You can then read your original array in the order specified by the randomised index. No need to modify the original array or create a copy.
Of course, this might not meet aspects of your requirements which are not explicit.
Thanks to FMc's Perl code. Based on that solution, I rewrote it in Python (for my own use, and I'm sharing it here for clarity):
import random
import numpy
matrix = numpy.array(
    [[0, 0, 1],
     [1, 1, 0],
     [1, 0, 0]]
)
def shuffle(array):
    # In-place Fisher-Yates shuffle, ported from the Perl version.
    i = len(array)
    j = 0
    for _ in array:
        i -= 1
        j = random.randrange(0, i + 1)  # int rand($i + 1);
        if i != j:
            array[i], array[j] = array[j], array[i]
    return array
def other_edits(matrix, cell, step, last_j):
    # We have succeeded if we've already made 3 edits.
    step += 1
    if step > 3:
        return True
    # Determine the roster of next edits to fix the row or
    # column total upset by our prior edit.
    (i, j) = cell
    fixes = []
    if step == 1:
        fixes = [[i, x] for x in range(len(matrix[0])) if x != j and not matrix[i][x]]
        fixes = shuffle(fixes)
    elif step == 2:
        fixes = [[x, j] for x in range(len(matrix)) if x != i and matrix[x][j]]
        fixes = shuffle(fixes)
    else:
        # On the last edit, the column of the fix must be
        # the same as the column of the initial edit.
        if not matrix[i][last_j]:
            fixes = [[i, last_j]]
    for f in fixes:
        # If all subsequent fixes succeed, we are golden: make
        # the current fix and return true.
        if other_edits(matrix, f, step, last_j):
            matrix[f[0]][f[1]] = 0 if step == 2 else 1
            return True
    # Failure if we get here.
    return False
def cells_to_move(matrix):
    # Returns a list of non-empty cells.
    i = -1
    cells = []
    for row in matrix:
        i += 1
        for j in range(len(row)):
            if matrix[i][j]:
                cells.append([i, j])
    return cells
def edit_matrix(matrix):
    # Takes a matrix and moves all of the non-empty cells somewhere else.
    move_these = cells_to_move(matrix)
    for cell in move_these:
        (i, j) = cell
        # Move the cell, provided that the cell hasn't been moved
        # already and the subsequent edits don't lead to a dead end.
        if matrix[i][j] and other_edits(matrix, cell, 0, j):
            matrix[i][j] = 0
    return matrix
def Shuffle_Matrix(matrix, N, M, n_iter):
    # N and M (rows, columns) are not used by the edits themselves.
    for n in range(n_iter):
        print(f'iteration: {n+1}')  # Show progress.
        matrix = edit_matrix(matrix)
    return matrix
print(matrix.shape[0], matrix.shape[1])
# Args: N rows, N columns, N iterations.
matrix2 = Shuffle_Matrix(matrix, matrix.shape[0], matrix.shape[1], 1)
print("The resulting matrix:\n", matrix2)
Related
My mat file contains 40,000 rows and two columns. I have to read it line by line
and then get values of last column in a single row.
Following is my code:
count1 = 0;
count2 = 0;
for v = 1:40000
    firstRowB = data.d(v,:);
    if(firstRowB(1,2)==1)
        count1=count1+1;
    end
    if(firstRowB(1,2)==2)
        count2=count2+1;
    end
end
firstRowB gets the row; the code then checks whether the last column equals 1 or 2 and increases the respective count by 1.
But I keep getting this error:
Reference to non-existent field 'd'.
You could use vectorization (it is generally worthwhile, especially in MATLAB). Taking advantage of the fact that true is one and false is zero, if you just want to count, you can do:
count1 = sum ( data.d(:, 2) == 1 ) ;
count2 = sum (data.d(:,2) == 2 ) ;
More generally, you could define:
getNumberOfElementsInLastColEqualTo = @(numb) sum(data.d(:,end) == numb);
counts = arrayfun(getNumberOfElementsInLastColEqualTo, [1 2]);
Hope this helps.
I have an array, A = [a1,a2,a3,...aP] with size P. I have to sample q elements from array A.
I plan to use a loop with q iterations, and randomly pick an element from A at each iteration. But how can I make sure that the picked element will be different at each iteration?
The other answers all involve shuffling the array, which is O(n).
It means modifying the original array (destructive) or copying the original array (memory intensive).
The first way to make it more memory efficient is not to shuffle the original array but to shuffle an array of indexes.
use List::Util 'shuffle';
# Shuffled list of indexes into @deck
my @shuffled_indexes = shuffle(0..$#deck);
# Get just N of them.
my @pick_indexes = @shuffled_indexes[ 0 .. $num_picks - 1 ];
# Pick cards from @deck
my @picks = @deck[ @pick_indexes ];
It is at least independent of the contents of @deck, but it's still O(n) time and O(n) memory.
A more efficient algorithm (not necessarily faster; that depends on how big your array is) is to look at each element of the array and decide whether it's going to make it into the picks. This is similar to how you select a random line from a file without reading the whole file into memory: each line has a 1/N chance of being picked, where N is the line number. So the first line has a 1/1 chance (it's always picked), the next a 1/2 chance, then 1/3, and so on. Each pick overwrites the previous pick. This results in each line having a 1/total_lines chance.
You can work it out for yourself. A one-line file has a 1/1 chance, so the first line is always picked. In a two-line file, the first line has a 1/1 chance and then a 1/2 chance of surviving, which is 1/2, and the second line has a 1/2 chance. In a three-line file, the first line has a 1/1 chance of being picked, then a 1/2 * 2/3 chance of surviving, which is 2/6 or 1/3. And so on.
The algorithm is O(n) for speed, it iterates through an unordered array once, and does not consume any more memory than is needed to store the picks.
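The single-pick version can be sketched in Python like this; it works on any iterable, including an open file handle, and `random_line` is a hypothetical name.

```python
import random

def random_line(lines, rng=random):
    """Single pass, O(1) extra memory: the n-th item (counting from 1)
    replaces the current pick with probability 1/n, so each of the
    N items ends up picked with probability 1/N. Returns None if empty."""
    pick = None
    for n, line in enumerate(lines, start=1):
        if rng.randrange(n) == 0:      # true with probability 1/n
            pick = line
    return pick
```

With a file, you would pass the open file handle in place of the list, and only the current pick is kept in memory.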
With a little modification, this works for multiple picks. Instead of a 1/$position chance, it's $picks_left / $position. Each time a pick is successful, you decrement $picks_left. You work from the high position to the low one. Unlike before, you don't overwrite.
my $picks_left = $picks;
my $num_left = @$deck;
my @picks;
my $idx = 0;
while($picks_left > 0 ) {  # when we have all our picks, stop
    # random number from 0..$num_left-1
    my $rand = int(rand($num_left));
    # pick successful
    if( $rand < $picks_left ) {
        push @picks, $deck->[$idx];
        $picks_left--;
    }
    $num_left--;
    $idx++;
}
This is how perl5i implements its pick method (coming next release).
To understand viscerally why this works, take the example of picking 2 from a 4 element list. Each should have a 1/2 chance of being picked.
1. (2 picks, 4 items): 2/4 = 1/2
Simple enough. The next element has a 1/2 chance that an element will already have been picked, in which case its chance is 1/3. Otherwise its chance is 2/3. Doing the math...
2. (1 or 2 picks, 3 items): (1/3 * 1/2) + (2/3 * 1/2) = 3/6 = 1/2
The next one has a 1/6 chance that both earlier elements will already be picked (1/2 * 1/3), in which case it has no chance; a 2/3 chance that exactly one will be picked (1/2 * 2/3 + 1/2 * 2/3), in which case it has 1/2; and the remaining 1/6 chance that neither will be picked (1/2 * 1/3), in which case it's 2/2.
3. (0, 1 or 2 picks, 2 items): (0/2 * 1/6) + (1/2 * 4/6) + (2/2 * 1/6) = 2/6 + 1/6 = 1/2
Finally, for the last item, there's a 1/2 chance that the previous items took the last pick.
4. (0 or 1 pick, 1 item): (0/1 * 2/4) + (1/1 * 2/4) = 1/2
Not exactly a proof, but good for convincing yourself it works.
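To check this arithmetic exactly rather than by hand, here is a small Python sketch that follows the selection loop above and tracks the distribution of `picks_left` with exact fractions. `inclusion_probabilities` is a hypothetical helper, not part of perl5i. For 2 picks from 4 items it gives 1/2 for every position; along the way, before the third item, both earlier items have been picked with probability 1/6, exactly one with 2/3 and neither with 1/6.

```python
from fractions import Fraction

def inclusion_probabilities(picks, n):
    """Exact chance that each of n positions is taken by the loop that
    keeps the current item with probability picks_left / num_left.
    `dist` maps picks_left to its probability before each item."""
    if not 0 <= picks <= n:
        raise ValueError("need 0 <= picks <= n")
    dist = {picks: Fraction(1)}
    result = []
    for num_left in range(n, 0, -1):
        result.append(sum(p * Fraction(k, num_left) for k, p in dist.items()))
        new_dist = {}
        for k, p in dist.items():
            take = Fraction(k, num_left)       # chance this item is picked
            if take:
                new_dist[k - 1] = new_dist.get(k - 1, 0) + p * take
            if take != 1:
                new_dist[k] = new_dist.get(k, 0) + p * (1 - take)
        dist = new_dist
    return result
```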
From perldoc perlfaq4:
How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have
Scalar-List-Utils 1.03 or later installed, you can say:
use List::Util 'shuffle';
@shuffled = shuffle(@list);
If not, you can use a Fisher-Yates shuffle.
sub fisher_yates_shuffle {
    my $deck = shift;  # $deck is a reference to an array
    return unless @$deck; # must not be empty!
    my $i = @$deck;
    while (--$i) {
        my $j = int rand ($i+1);
        @$deck[$i,$j] = @$deck[$j,$i];
    }
}
# shuffle my mpeg collection
#
my @mpeg = <audio/*/*.mp3>;
fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
print @mpeg;
You could also use List::Gen:
my $gen = <1..10>;
print "$_\n" for $gen->pick(5); # prints five random numbers
You can use the Fisher-Yates shuffle algorithm to randomly permute your array and then take a slice of the first q elements. Here's code from PerlMonks:
# randomly permutate @array in place
sub fisher_yates_shuffle
{
    my $array = shift;
    return unless @$array;  # an empty array would make --$i skip past 0
    my $i = @$array;
    while ( --$i )
    {
        my $j = int rand( $i+1 );
        @$array[$i,$j] = @$array[$j,$i];
    }
}
fisher_yates_shuffle( \@array ); # permutes @array in place
You can probably optimize this by having the shuffle stop after it has q random elements selected. (The way this is written, you'd want the last q elements.)
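That early-stopping idea, a partial Fisher-Yates shuffle, can be sketched in Python as below. It runs the same swap loop from the end of the array but stops after q swaps, so the last q slots hold the sample. It copies the input so the caller's array is untouched; `sample_q` is a hypothetical name.

```python
import random

def sample_q(array, q, rng=random):
    """Return q distinct elements of array, chosen uniformly at random.

    Partial Fisher-Yates: do only the first q steps of the shuffle,
    working down from the end of a copy; the last q slots are the sample."""
    a = list(array)                  # copy, so the caller's list is untouched
    n = len(a)
    if not 0 <= q <= n:
        raise ValueError("q must be between 0 and len(array)")
    for i in range(n - 1, n - 1 - q, -1):
        j = rng.randint(0, i)        # like int rand($i + 1) in the Perl code
        a[i], a[j] = a[j], a[i]
    return a[n - q:]
```

The time spent is proportional to q (plus the copy), which is where the saving over a full shuffle comes from when q is much smaller than P.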
You may construct a second, boolean array of size P and store true for the numbers already picked. When a number is picked, check the second array; if it is "true", you must pick again.
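A minimal Python sketch of this boolean-flag approach, reading "pick again" as drawing a fresh random index (taking the neighboring index instead would favor elements that sit right after already-picked ones). `sample_with_flags` is a hypothetical name. It is cheap when q is much smaller than P, but redraws become frequent as q approaches P.

```python
import random

def sample_with_flags(array, q, rng=random):
    """Pick q distinct positions of array by drawing random indexes and
    redrawing whenever the index is already flagged as used."""
    if q > len(array):
        raise ValueError("cannot pick more elements than the array holds")
    used = [False] * len(array)      # the second, boolean array of size P
    picks = []
    while len(picks) < q:
        k = rng.randrange(len(array))
        if used[k]:
            continue                 # already picked: draw again
        used[k] = True
        picks.append(array[k])
    return picks
```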
How can I compare two numbers with an inequality? (greater than or less than)
I want to compare single digits
For example
1 2
5 3
9 2
etc.
This is the best way to compare two numbers. Why? Because if you are intelligent enough, you can use the same code in bigger programs. It's highly portable.
Assume we have two numbers, a and b.
We have two blocks: if (a >= b) and else.
Hope it's enough.
0 1 0 a b 0
Make the array like this, and point to (4), i.e. to a.
+>+< This is for managing if a=0 and b=0
[->-[>]<<] This is a magic loop. if a is the one which
reaches 0 first (a<b),then pointer will be at(4).
Else it will be at (3)
<[-
// BLOCK (a>=b)
//You are at (2) and do whatever you want and come back to (2).
//Its a must
]
<[-<
// BLOCK(a<b)
//You are at (1) and do whatever you want and come back to (1).
//Its a must
]
It will not affect the following program code, as both code blocks end up at (1). You can continue coding assuming that the pointer is at (1).
Please remove the documentation if you copy the code, because it contains some valid Brainfuck symbols like < . , etc.
To find out which of the two numbers is smaller, decrement both of them in the same loop iteration and then check each one for zero: whichever reaches zero first is the smaller one.
E.g.:
+++++ > +++ < [->-< check if first is zero, then second]
(This is just to give you a hint; you will have to take care of equal numbers and similar issues.)
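The decrement-both-until-one-hits-zero idea behind these answers can be modeled in Python to make the logic easy to check. It assumes non-negative integers, as Brainfuck cells hold here, and `compare_by_decrement` is a hypothetical name; the Brainfuck versions implement the same loop with pointer moves instead of variables.

```python
def compare_by_decrement(a, b):
    """Compare two non-negative integers by decrementing both together
    until at least one reaches zero.
    Returns '<' if a < b, '>' if a > b and '=' if they are equal."""
    while a > 0 and b > 0:
        a -= 1
        b -= 1
    if a == b:                 # both reached zero in the same step
        return '='
    return '<' if a == 0 else '>'

for x, y in [(1, 2), (5, 3), (9, 2), (4, 4)]:
    print(x, compare_by_decrement(x, y), y)   # 1 < 2, 5 > 3, 9 > 2, 4 = 4
```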
I was thinking about this too, and while I'm sure this isn't the best solution, at least it can answer the question of which number is larger =)
The program asks for two characters, outputs '<' if the first is smaller, '>' if it is larger, and '=' if they are equal. After outputting one char, the program halts by asking for additional input.
+>,>,<<[>-[>>>]<[>>-[>++++++++++[->++++++<]>.,]++++++++++[->++++++<]>+.,]<-[>>>]<<[>>>++++++++++[->++++++<]>++.,]<<<]
Hopefully somewhat clearer:
+ init (0) to 1
>, read (1)
>, read (2)
<<[ loop forever
>-[>>>] decrement (1) going to (4) if (1) != 0
<[ goto (0) == 1 if (1) reached 0 (otherwise goto (3))
>>-[>++++++++++[->++++++<]>.,] decrement (2) printing lessthan if larger than 0
++++++++++[->++++++<]>+., if (2) == 0 print '='
]
<-[>>>] decrement (2) going to (5) if (2) != 0
<<[ goto (0) == 1 if (2) reached 0 (otherwise goto (3))
>>>++++++++++[->++++++<]>++., print largerthan since (2) reached 0 first
]
<<< goto(0)
]
I made a solution that gives you back a boolean and leaves the pointer at the same position every time.
This is what it looks like at the beginning:
0 0 0 a b 0 0
p
And these are the two possible outputs:
0 0 0 0 0 1 0 #true
p
0 0 0 0 0 0 0 #false
p
The code:
>>>>
[ # while cell != 0
- # decrement a
[ # if a != 0
>- # decrement b
[ # if b != 0
< # go left
<-< # undo the finally-block;
] # finally-block
<[-]> # clear a
>+> # res = 1; move to end-position
<<< # undo the finally-block
] # finally-block
>[-]>> # clear b; res = 0; move to end-position
] #
minified version:
>>>>[-[>-[< <-<]<[-]>>+><<<]>[-]>>]
Given two numbers A and B, the following code will print A if A is greater than B, B if B is greater than A and C if both are equal.
>>>>>>>>>++++++[>+++++++++++<-]>[>+>+>+<<<-]>+>->
<<<<<<<<<<<,>,<
[->-<[>]<<]>>>[>>]>>>>>>>>.
No such thing exists in BF. The > and < in BF move the pointer to the right and to the left, respectively.
Problem statement:
We have an equal number of men and women. Each man has a preference score for each woman, and each woman has one for each man. Each of the men and women has certain interests, and we calculate the preference scores based on those interests.
So initially, we have an input file with x columns. The first column is the person's (man/woman) id. Ids are just numbers from 0 ... n (the first half are men and the second half women). The remaining x-1 columns hold the interests. These are integers too.
Now, using this n by x-1 matrix, we have come up with an n by n/2 matrix. The new matrix has all men and women as its rows and their scores for the opposite sex in its columns.
We have to sort the scores in descending order, and we also need to know which person each score belongs to after sorting.
So here I wanted to use a hash table.
Once we get the scores we need to make up pairs, for which we need to follow some rules.
My trouble is with the second matrix of n by n/2 that needs to give information of which man/woman has how much preference on a woman/man. I need these scores sorted so that I know who is the first preferred woman/man, 2nd preferred and so on for a man/woman.
I hope to get good suggestions on the data structures I use. I prefer PHP or Perl.
NB:
This is not homework. This is a little modified version of stable marriage algorithm. I have a working solution. I am only working on optimizing my code.
It is very similar to stable marriage problem but here we need to calculate the scores based on the interests they share. So, I have implemented it as the way you see in the wiki page http://en.wikipedia.org/wiki/Stable_marriage_problem.
My problem is not solving the problem. I solved it and can run it. I am just trying to have a better solution. So I am asking suggestions on the type of data structure to use.
Conceptually, I tried using an array of hashes, where the array index gives the person id and the hash in it maps ids <=> scores in sorted order. I start with an array of hashes. Then I sort the hashes on values, but I could not store the sorted hashes back in an array, so I just stored the keys after sorting and used them to get the values from my initial unsorted hashes.
Can we store the hashes after sorting?
Can you suggest a better structure?
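One way around "storing the hashes after sorting" is not to sort the hashes at all: keep each person's score hash for lookups and, alongside it, a list of the other ids ranked by score. In Perl, `sort { $h->{$b} <=> $h->{$a} } keys %$h` produces such a list. A minimal Python sketch of that structure, with hypothetical names:

```python
def ranked_preferences(scores):
    """scores maps a person id to {other person id: score}.

    Returns a map from each person id to the other ids, highest score
    first (ties keep their original order). The score dicts themselves
    are left unchanged for direct lookups."""
    return {person: sorted(row, key=row.get, reverse=True)
            for person, row in scores.items()}

scores = {0: {5: 10, 6: 30, 7: 20}}
ranked = ranked_preferences(scores)   # {0: [6, 7, 5]}
# ranked[0][0] is person 0's first choice; scores[0][6] is still that score.
```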
I think the following implements the Gale-Shapley algorithm where each person's preference ordering is given as an array of scores over the members of the opposite sex.
As an aside, I just found out that David Gale passed away (see his Wikipedia entry — he will be missed).
The code is wordy, I just quickly transcribed the algorithm as described on Wikipedia and did not check original sources, but it should give you an idea of how to use appropriate Perl data structures. If the dimensions of the problem grow, profile first before trying to optimize.
I am not going to try to address the specific issues in your problem. In particular, you did not fully flesh out the idea of computing a match score based on interests and trying to guess is bound to be frustrating.
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my (%pref, %people, %proposed_by);
while ( my $line = <DATA> ) {
    my ($sex, $id, @pref) = split ' ', $line;
    last unless $sex and ($sex) =~ /^(m|w)\z/;
    $pref{$sex}{$id} = [ map 0 + $_, @pref ];
    $people{$sex}{$id} = undef;
}
while ( defined( my $man = bachelor($people{m}) ) ) {
    my @women = eligible_women($people{w}, $proposed_by{$man});
    next unless @women;
    my $woman = argmax($pref{m}{$man}, \@women);
    $proposed_by{$man}{$woman} = 1;
    if ( defined ( my $jilted = $people{w}{$woman}{m} ) ) {
        my $proposal_score = $pref{w}{$woman}[$man];
        my $jilted_score = $pref{w}{$woman}[$jilted];
        next if $proposal_score < $jilted_score;
        $people{m}{$jilted}{w} = undef;
    }
    $people{m}{$man}{w} = $woman;
    $people{w}{$woman}{m} = $man;
}
print Dump \%people;
sub argmax {
    my ($pref, $candidates) = @_;
    my ($ret) = sort { $pref->[$b] <=> $pref->[$a] } @$candidates;
    return $ret;
}
sub bachelor {
    my ($men) = @_;
    my ($bachelor) = grep { not defined $men->{$_}{w} } keys %$men;
    return $bachelor;
}
sub eligible_women {
    my ($women, $proposed_to) = @_;
    return grep { not defined $proposed_to->{$_} } keys %$women;
}
__DATA__
m 0 10 20 30 40 50
m 1 50 30 40 20 10
m 2 30 40 50 10 20
m 3 10 10 10 10 10
m 4 50 40 30 20 10
w 0 50 40 30 20 10
w 1 40 30 20 10 50
w 2 30 20 10 50 40
w 3 20 10 50 40 30
w 4 10 50 40 30 20
I have a mathematical set in a Perl array: (1, 2, 3). I'd like to find all the subsets of that set: (1), (2), (3), (1,2), (1,3), (2,3).
With 3 elements this isn't too difficult, but if the set has 10 elements this gets tricky.
Thoughts?
You can use Data::PowerSet like Matthew mentioned. However, if, as indicated in your example, you only want proper subsets and not every subset, you need to do a little bit more work.
# result: all subsets, except {68, 22, 43}.
my $values = Data::PowerSet->new({max => 2}, 68, 22, 43);
Likewise, if you want to omit the null set, just add the min parameter:
# result: all subsets, except {} and {68, 22, 43}.
my $values = Data::PowerSet->new({min => 1, max => 2}, 68, 22, 43);
Otherwise, to get all subsets, just omit both parameters:
# result: every subset.
my $values = Data::PowerSet->new(68, 22, 43);
See Data::PowerSet, http://coding.derkeiler.com/Archive/Perl/comp.lang.perl/2004-01/0076.html , etc.
Since you say "mathematical set", I assume you mean there are no duplicates.
A naive implementation that works for up to 32 elements:
my $set = [1,2,3];
my @subsets;
for my $count ( 1..(1<<@$set)-2 ) {
    push @subsets, [ map $count & (1<<$_) ? $set->[$_] : (), 0..$#$set ];
}
(For the full range of subsets, loop from 0 to (1<<@$set)-1; excluding 0 excludes the null set, and excluding (1<<@$set)-1 excludes the original set.)
Update: I'm not advocating this over using a module, just suggesting it in case you are looking to understand how to go about such a problem. In general, each element is either included in or excluded from any given subset. You want to pick an element and generate first all possible subsets of the other elements not including your picked element, and then all possible subsets of the other elements including your picked element. Recursively apply this to the "generate all possible subsets" step. Finally, discard the null subset and the non-proper subset. In the above code, each element is assigned a bit. First all subsets are generated with the high bit off, then all those with it on. For each of those alternatives, subsets are generated first with the next-to-highest bit off, then on. Continuing this until you are just working on the lowest bit, what you end up with is all the possible numbers, in order.
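The recursive include/exclude idea can be sketched in Python like this (function names are hypothetical). For the first element it lists every subset of the rest without that element, then every subset with it; the filter at the end drops the null set and the full set, as the question asks.

```python
def subsets(items):
    """All subsets of a list of distinct items: first every subset of
    the rest without items[0], then every subset of the rest with it."""
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    without = subsets(rest)
    return without + [[first] + s for s in without]

def proper_nonempty_subsets(items):
    """Drop the null set and the set itself."""
    return [s for s in subsets(items) if 0 < len(s) < len(items)]
```

For `[1, 2, 3]` this yields the six subsets listed in the question, in the order `[3], [2], [2, 3], [1], [1, 3], [1, 2]`.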
If you don't want to use an existing module or can't then you can simply code your own subset generation algorithm using a bit-mask and a binary counter. Sample code follows -
#!/usr/bin/perl
use strict;
use warnings;
my @set = (1, 2, 3);
my @bitMask = (0, 0, 0); # Same size as @set, initially filled with zeroes
printSubset(\@bitMask, \@set) while ( genMask(\@bitMask, \@set) );
sub printSubset {
    my ($bitMask, $set) = @_;
    for (0 .. @$bitMask-1) {
        print "$set->[$_]" if $bitMask->[$_] == 1;
    }
    print"\n";
}
sub genMask {
    my ($bitMask, $set) = @_;
    my $i;
    for ($i = 0; $i < @$set && $bitMask->[$i]; $i++) {
        $bitMask->[$i] = 0;
    }
    if ($i < @$set) {
        $bitMask->[$i] = 1;
        return 1;
    }
    return 0;
}
Note: I haven't been able to test the code, some bugs might need to be ironed out.
Use Algorithm::ChooseSubsets.
It's a counting problem - for N elements there are exactly 2^N subsets and you have to count from 0 to 2^N - 1 in binary to list them all.
For example, with 3 items there are 8 possible subsets: 000, 001, 010, 011, 100, 101, 110 and 111; the bits show which members are present.
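The counting approach can be sketched in Python as follows, with a hypothetical function name: bit k of the counter decides whether `items[k]` is included, so the 000 ... 111 listing above is read from right to left (the rightmost bit is the first item).

```python
def subsets_by_counting(items):
    """Count from 0 to 2**N - 1; bit k of the counter says whether
    items[k] is in the subset."""
    n = len(items)
    return [[items[k] for k in range(n) if (mask >> k) & 1]
            for mask in range(2 ** n)]

print(subsets_by_counting([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```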