Eulerian path, directed graph - perl

Good evening. I'm really a beginner, and I am working in an implementation of an Eulerian path. That means that every edge (not a vertex) of a directed graph has to be used only once.
For some reason it doesn't manage to go over all vertices even if in paper it should. It seems to ignore half of the vertices or simply not adding them to the circuit.
The expected outcome is:
6->7->8->9->6->3->0->2->1->3->4
Nevertheless, the outcome I get is:
6 6 6 6 6 6 6 7 7 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9 9 9 9 9 3 3 3 3 3 3
What i have as code is the following:
my %edges={'6'=>['3','7'],'8'=>['9'],'1'=>['3'],'0'=>['2'],'3'=>['0','4'], '7' =>['8'],'9'=>['6'],'2'=>['1']};
my $startvertex=6; #this i got from additional code
my $location=$startvertex;
my #stack = ($startvertex);
my #circuit = ();
while (#stack)
{
if (#{$edges{$location}}[0])
{
push #stack, $location;
my $newlocation=#{$edges{$location}}[0];
splice #{$edges{$location}},0,1;
$location=$newlocation;
}
else
{
push #circuit, $location;
$location=pop #stack;
}
}
my #revcircuit= reverse #circuit;
print #revcircuit;
Thank you very much in advance for your insight.

The problem is here:
if (#{$edges{$location}}[0])
One of your nodes is called 0 which is false in Perl. Therefore, once the zero node is reached, the program continues as if there were no more nodes.
Use defined instead.
Here's a working version, slightly edited (e.g. removed unnecessary array dereference):
#!/usr/bin/perl
use warnings;
use strict;
my %edges = ( 6 => [3, 7],
8 => [9],
1 => [3],
0 => [2],
3 => [0, 4],
7 => [8],
9 => [6],
2 => [1]);
my $startvertex = 6;
my $location = $startvertex;
my #stack = ($startvertex);
my #circuit;
while (#stack) {
if (defined $edges{$location}[0]) {
push #stack, $location;
$location = shift #{ $edges{$location} };
} else {
push #circuit, $location;
$location = pop #stack;
}
}
my #revcircuit = reverse #circuit;
print "#revcircuit\n";
Also note that it uses round parentheses to define the %edges hash. Curly braces introduce a hash reference, with them I'm getting
Reference found where even-sized list expected at ./1.pl line 5.

Related

Split an array based on value

This is my first question here, so sorry if I make any mistakes posting this.
I'm trying to split an array based on its values. Basically I want to create two arrays whose values are as close to the average as possible. I managed to do this with this code:
function Sum($v) {
[Linq.Enumerable]::Sum([int64[]]$v)
}
$arr = 0..9 | % {get-random -min 1 -max 10}
"ARRAY:" + $arr
"SUM: " + (sum $arr)
"AVG: " + (sum $arr)/2
# start of the code that matters
$wavg = (sum $arr)/2
foreach ($i in (0..($arr.Count-1))) {
$wavg -= $arr[$i]
if ($wavg -le 0) {
$i-=(-$wavg -gt $arr[$i]/2);break
}
}
"SPLIT INDEX: " + $i
"ARR1: " + $arr[0..$i] + " (" + $(sum $arr[0..$i]) + ")"
"ARR2: " + $arr[($i+1)..$arr.Count] + " (" + $(sum $arr[($i+1)..$arr.Count]) + ")"
The reason my foreach is structured this way is because in my actual code the values are in an index hash and are accessed as $index[$arr[$i]].
This means that the resulting two arrays could be of unequal size (it would be easy if I could just split the array in half). Sample output of my code to demonstrate this:
ARRAY: 5 3 6 3 2 3 6 3 1 3
SUM: 35
AVG: 17.5
SPLIT INDEX: 3
ARR1: 5 3 6 3 (17)
ARR2: 2 3 6 3 1 3 (18)
The code works as is, but I feel it could be done in a more elegant and speedier way. Because I need to execute this code a few thousand times in my script I want it to be as fast as possible.

How to calculate number of rows of a referenced array in Perl

A part of my code calculates inverse of a matrix (generated previously in the code) with dimensions more than 300 X 300. I want to use the elements of the inversed matrix further in the code. Have used the below code for this, trying with only 5X5 matrix for testing:
use strict;
use warnings;
use Math::MatrixReal;
my #a=(); #a is the matrix obtained
$a[0][0]=0.18761134;
$a[0][1]=0.010779401; #Have hard-coded the values here till $a[4][4]
my $ref_a = \#a;
my $b = Math::MatrixReal->new_from_rows($ref_a);
my $b_inv = $b->inverse();
print "\n Inverse is\n",$b_inv; #prints correct inverse
print "\n\nTest printing elements\n";
print $$b_inv[0][1][1],"\n"; #prints the correct element
my $row_b=scalar(#{$b});
print "Number of rows in b: ",$row_b,"\n"; #prints 6
my $col_b=#{$$b[0]};
print "Columns in b: ",$col_b,"\n"; #prints 5
my $row_binv=scalar(#$b_inv);
print "Number of rows in b_inv: ",$row_binv,"\n"; #prints 3
my $col_binv=#{$$b_inv[0]};
print "Number of columns in b_inv ",$col_binv,"\n"; #prints 5
I am not able to understand
why the output of number of rows for both b and b_inv is wrong? How to get the correct value of number of rows?
That although the syntax of printing elements of a referenced array is $$b_inv[1][1], I get the correct output when I use $$b_inv[0][1][1]
You are creating a
Math::MatrixReal
matrix object, and then accessing it as a simple Perl array. Poking around inside a Perl object indiscriminately is wrong, and you must use the methods defined in the documentation
In particular, your statement
print $$b_inv[0][1][1],"\n"; # prints the correct element
accesses a three-dimensional array, and there is no way of knowing what the "correct element" should be for this without reading the code of the module
This modification sets up a 5 x 5 identity matrix (in future, please provide data that we can use to reproduce your results) and takes its inverse. The values output are derived using the object's methods as I described and are all correct. Note that the rows and columns are indexed from one instead of from zero that you would expect for Perl arrays
use strict;
use warnings 'all';
use Math::MatrixReal;
my #arr = (
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1],
);
my $ref_a = \#arr;
my $b = Math::MatrixReal->new_from_rows(\#arr);
my $b_inv = $b->inverse;
print "\nInverse is\n", $b_inv;
print "\n\nTest printing elements\n";
print $b_inv->element($_, $_), "\n" for 1 .. 5;
my ($row_b, $col_b) = $b->dim;;
print "Number of rows in b: $row_b\n"; # prints 5
print "Columns in b: $col_b\n"; # prints 5
my ($row_binv, $col_binv) = $b_inv->dim;;
print "Number of rows in b_inv: $row_binv\n"; # prints 5
print "Number of columns in b_inv $col_binv\n"; # prints 5
output
Inverse is
[ 1.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 1.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 0.000000000000E+000 1.000000000000E+000 0.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 1.000000000000E+000 0.000000000000E+000 ]
[ 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 0.000000000000E+000 1.000000000000E+000 ]
Test printing elements
1
1
1
1
1
Number of rows in b: 5
Columns in b: 5
Number of rows in b_inv: 5
Number of columns in b_inv 5

Branch and bound using Perl

I have a problem, I cannot find an answer to. I am using Perl. My input is a symmetric cost-matrix, kind of like the TSP.
I want to know all solutions that lie beneath my boundary, which is 10.
This is my matrix:
- B E G I K L P S
B - 10 10 2 10 10 10 10
E 10 - 2 10 10 10 1 10
G 10 2 - 10 2 3 3 3
I 2 10 10 - 4 10 10 2
K 10 10 2 4 - 10 10 3
L 10 10 3 10 10 - 2 2
P 10 1 3 10 10 2 - 10
S 10 10 3 2 3 2 10 -
Does anybody know how to implement the branch and bound algorithm to solve this? For now, I did replace every 10 in the matrix with "-".
What I did so far:
#verwbez = ( ["-", B, E, G, I, K, L, P, S],
[B,"-", 10, 10, 2, 10, 10, 10, 10],
[E, 10, "-", 2, 10, 10, 10, 1, 10],
[G, 10, 2, "-", 10, 2, 3, 3, 3],
[I, 2, 10, 10, "-", 4, 10, 10, 2],
[K, 10, 10, 2, 4, "-", 10, 10, 3],
[L, 10, 10, 3, 10, 10, "-", 2, 2],
[P, 10, 1, 3, 10, 10, 2, "-", 10],
[S, 10, 10, 3, 2, 3, 2, 10, "-"]);
for ($i=0;$i<=$#verwbez;$i++) {
for ($j=0; $j<=$#{$verwbez[$i]};$j++) {
while ($verwbez[$i][$j] >=7) {
$verwbez[$i][$j] = "-";
}
}
}
Basically just altering the matrix, every 10 is replaced with a "-". Now I want to find all solutions that are beneath 10 and contain 4 districts where always two cities are linked together. But unfortunately, I do not know how to proceed/start...
You're unlikely to get someone to implement the Branch and Bound algorithm for you. However, the following stackoverflow post, TSP - branch and bound, has some links to some helpful resources:
Optimal Solution for TSP using Branch and Bound
B&B Implementations for the TSP -
Part 1: A solution with nodes containing partial tours with
constraints
B&B Implementations for the TSP - Part 2: Single threaded solution with many inexpensive nodes
Since you appear new to perl, we can give you some quick tips
Always include use strict; and use warnings at the top of each and every perl script
Use the range operator .. when creating an incrementing for loop.
Your while loop should actually be an if statement.
For increased style, consider using qw() when initializing a mixed word/number array, especially since it will allow you to easily align a multidimensional array's elements
Your first goal for a project like this should be to create a method to output your multidimensional array in a readable format, so you can observe and verify the changes that you're making.
All of that gives the following changes:
use strict;
use warnings;
my #verwbez = (
[qw(- B E G I K L P S )],
[qw(B - 10 10 2 10 10 10 10)],
[qw(E 10 - 2 10 10 10 1 10)],
[qw(G 10 2 - 10 2 3 3 3 )],
[qw(I 2 10 10 - 4 10 10 2 )],
[qw(K 10 10 2 4 - 10 10 3 )],
[qw(L 10 10 3 10 10 - 2 2 )],
[qw(P 10 1 3 10 10 2 - 10)],
[qw(S 10 10 3 2 3 2 10 - )],
);
for my $i (0 .. $#verwbez) {
for my $j (0 .. $#{$verwbez[$i]}) {
if ($verwbez[$i][$j] =~ /\d/ && $verwbez[$i][$j] >= 7) {
$verwbez[$i][$j] = ".";
}
}
}
for (#verwbez) {
for (#$_) {
printf "%2s ", $_;
}
print "\n";
}
Outputs:
- B E G I K L P S
B - . . 2 . . . .
E . - 2 . . . 1 .
G . 2 - . 2 3 3 3
I 2 . . - 4 . . 2
K . . 2 4 - . . 3
L . . 3 . . - 2 2
P . 1 3 . . 2 - .
S . . 3 2 3 2 . -
Note that B has only 1 city it's near to. So if the goal was solving the TSP, then there isn't a trivial solution. However, given there are only 8 cities and (n-1)! circular permutations. That gives us just 5,040 permutations, so using brute force would totally work for finding a lowest cost solution.
use strict;
use warnings;
use Algorithm::Combinatorics qw(circular_permutations);
my #verwbez = ( ... already defined ... );
# Create a cost between two cities hash:
my %cost;
for my $i (1..$#verwbez) {
for my $j (1..$#{$verwbez[$i]}) {
$cost{ $verwbez[$i][0] }{ $verwbez[0][$j] } = $verwbez[$i][$j] if $i != $j;
}
}
# Determine all Routes and their cost (sorted)
my #cities = keys %cost;
my #perms = circular_permutations(\#cities);
my #cost_with_perm = sort {$a->[0] <=> $b->[0]} map {
my $perm = $_;
my $prev = $perm->[-1];
my $cost = 0;
for (#$perm) {
$cost += $cost{$_}{$prev};
$prev = $_
}
[$cost, $perm]
} #perms;
# Print out lowest cost routes:
print "Lowest cost is: " . $cost_with_perm[0][0] . "\n";
for (#cost_with_perm) {
last if $_->[0] > $cost_with_perm[0][0];
print join(' ', #{$_->[1]}), "\n";
}
It ends up there are only 2 lowest cost solutions to this setup, and they're mirror images of each other, which makes sense since we didn't filter by direction in our circular permutations. Am intentionally not stating what they are here.

How can I improve Perl compare performance

I have an array ref of about 50,000 users. I want to go through all those users and compare each one to all the others in order to build a weighted list of matches (if the name is an exact match it's worth x, a partial match is worth y etc).
After going through the list and doing all the checks, I then want to go get the 10 highest weighted matches. Here is sort of a example of what I'm doing to help explain:
#!/usr/bin/perl
######################################################################
# Libraries
# ---------
use strict;
use warnings;
my $users = [];
$users->[0]{'Name'} = 'xxx';
$users->[0]{'Address'} = 'yyyy';
$users->[0]{'Phone'} = 'xxx';
$users->[1]{'Name'} = 'xxx';
$users->[1]{'Address'} = 'yyyy';
$users->[1]{'Phone'} = 'xxx';
$users->[2]{'Name'} = 'xxx';
$users->[3]{'Address'} = 'yyyy';
$users->[4]{'Phone'} = 'xxx';
foreach my $user_to_check (#$users) {
my $matched_users = [];
foreach my $user (#$users) {
$user_to_check->{'Weight'} = 0;
if (lc($user_to_check->{'Name'}) eq lc($user->{'Name'})) {
$user_to_check->{'Weight'} = ($user_to_check->{'Weight'} + 10);
} elsif ((length($user_to_check->{'Name'}) > 2) && (length($user->{'Name'}) > 2) && ($user_to_check->{'Name'} =~ /\Q$user->{'Name'}\E/i)) {
$user_to_check->{'Weight'} = ($user_to_check->{'Weight'} + 5);
}
if (lc($user_to_check->{'Address'}) eq lc($user->{'Address'})) {
.....
}
if ($user_to_check->{'Weight'} > 0) {
# We have matches, add to matched users
push (#$matched_users,$user);
}
}
# Now we want to get just the top 10 highest matching users
foreach my $m_user (sort { $b->{'Weight'} <=> $a->{'Weight'} } #$matched_users ) {
last if $counter == 10;
.... # Do stuff with the 10 we want
}
}
The problem is, it's sooo slow. It takes more than a day to run (and I've tried it on multiple machines). I know that the "sort" is a killer but I did also try inserting the results into a tmp mysql table and then at the end instead of doing the Perl sort, I just did an order by select, but the difference in time was very minor.
As I'm just going through a existing data structure and comparing it I'm not sure what I could do (if anything) to speed it up. I'd appreciate any advise.
O(n²)
You compare each element in #$users against every element in there. That is 5E4² = 2.5E9 comparisions. For example, you wouldn't need to compare an element against itself. You also don't need to compare an element against one you have already compared. I.e. in this comparision table
X Y Z
X - + +
Y - - +
Z - - -
there only have to be three comparision to have compared each element against all others. The nine comparisions you are doing are 66% unneccessary (asymptotically: 50% unneccessary).
You can implement this by looping over indices:
for my $i (0 .. $#$users) {
my $userA = $users->[$i];
for my $j ($i+1 .. $#$users) {
my $userB = $users->[$j];
...;
}
}
But this means that upon match, you have to increment the weight of both matching users.
Do things once, not 100,000 times
You lowercase the name of each user 1E5 times. This is 1E5 - 1 times to much! Just do it once for each element, possibly at data input.
As a side note, you shouldn't perform lowercasing, you should do case folding. This is available since at least v16 via the fc feature. Just lowercasing will be buggy when you have non-english data.
use feature 'fc'; # needs v16
$user->[NAME] = fc $name;
or
use Unicode::CaseFold;
$user->[NAME] = fc $name;
When hashes are not fast enough
Hashes are fast, in that a lookup takes constant time. But a single hash lookup is more expensive than an array access. As you only have a small, predefined set of fields, you can use the following trick to use hash-like arrays:
Declare some constants with the names of your fields that map to indices, e.g.
use constant {
WEIGHT => 0,
NAME => 1,
ADDRESS => 2,
...;
};
And then put your data into arrays:
$users->[0][NAME] = $name; ...;
You can access the fields like
$userA->[WEIGHT] += 10;
While this looks like a hash, this is actually a safe method to access only certain fields of an array with minimal overhead.
Regexes are slow
Well, they are quite fast, but there is a better way to determine if a string is a substring of another string: use index. I.e.
$user_to_check->{'Name'} =~ /\Q$user->{'Name'}\E/i
Can be written as
(-1 != index $user_to_check->{Name}, $user->{Name})
assuming both are already lowercased case folded.
Alternative implementation
Edit: this appears to be invalidated by your edit to your question. This assumed you were trying to find some global similarities, not to obtain a set of good matches for each user
Implementing these ideas would make your loops look somewhat like
for my $i (0 .. $#$users) {
my $userA = $users->[$i];
for my $j ($i+1 .. $#$users) {
my $userB = $users->[$j];
if ($userA->[NAME] eq $userB->[NAME]) {
$userA->[WEIGHT] += 10;
$userB->[WEIGHT] += 10;
} elsif ((length($userA->[NAME]) > 2) && (length($userB->[NAME]) > 2))
$userA->[WEIGHT] += 5 if -1 != index $userA->[NAME], $userB->[NAME];
$userB->[WEIGHT] += 5 if -1 != index $userB->[NAME], $userA->[NAME];
}
if ($userA->[ADDRESS] eq $userB->[ADDRESS]) {
..... # More checks
}
}
}
my (#top_ten) = (sort { $b->[WEIGHT] <=> $a->[WEIGHT] } #$users)[0 .. 9];
Divide and conquer
The task you show is highly parallelizable. If you have the memory, using threads is easy here:
my $top10 = Thread::Queue->new;
my $users = ...; # each thread gets a copy of this data
my #threads = map threads->create(\&worker, $_), [0, int($#$users/2)], [int($#$users/2)+1, $#users];
# process output from the threads
while (defined(my $ret = $top10->dequeue)) {
my ($user, #top10) = #$ret;
...;
}
$_->join for #threads;
sub worker {
my ($from, $to) = #_;
for my $i ($from .. $to) {
my $userA = $users->[$i];
for $userB (#$users) {
...;
}
my #top10 = ...;
$top10->enqueue([ $userA, #top10 ]); # yield data to the main thread
}
}
You should probably return your output via a queue (as shown here), but do as much processing as possible inside the threads. With more advanced partitioning of the workload, should spawn as many threads as you have processors available.
But if any kind of pipelining, filtering or caching can decrease the number of iterations needed in the nested loops, you should do such optimizations (think map-reduce-style programming).
Edit: Elegantly reducing complexity through hashes for deduplication
What we are essentially doing is calculating a matrix of how good our records match, e.g.
X Y Z
X 9 4 5
Y 3 9 2
Z 5 2 9
If we assume that X is similar to Y implies Y is similar to X, then the matrix is symmetric, and we only need half of it:
X Y Z
X \ 4 5
Y \ 2
Z \
Such a matrix is equivalent to a weighted, undirected graph:
4 X 5 | X – Y: 4
/ \ | X – Z: 5
Y---Z | Y – Z: 2
2 |
Therefore, we can represent it elegantly as a hash of hashes:
my %graph;
$graph{X}{Y} = 4;
$graph{X}{Z} = 5;
$graph{Y}{Z} = 2;
However, such a hash structure implies a direction (from node X to node Y). To make querying the data easier, we might as well include the other direction too (due to the implementation of hashes, this won't lead to a large memory increase).
$graph{$x}{$y} = $graph{$y}{$x} += 2;
Because each node is now only connected to those nodes it is similar to, we don't have to sort through 50,000 records. For the 100th record, we can get the ten most similar nodes like
my $node = 100;
my #top10 = (sort { $graph{$node}{$b} <=> $graph{$node}{$a} } keys %{ $graph{$node} })[0 .. 9];
This would change the implementation to
my %graph;
# build the graph, using the array indices as node ID
for my $i (0 .. $#$users) {
my $userA = $users->[$i];
for my $j ($i+1 .. $#$users) {
my $userB = $users->[$j];
if ($userA->[NAME] eq $userB->[NAME]) {
$graph{$j}{$i} = $graph{$i}{$j} += 10;
} elsif ((length($userA->[NAME]) > 2) && (length($userB->[NAME]) > 2))
$graph{$j}{$i} = $graph{$i}{$j} += 5
if -1 != index $userA->[NAME], $userB->[NAME]
or -1 != index $userB->[NAME], $userA->[NAME];
}
if ($userA->[ADDRESS] eq $userB->[ADDRESS]) {
..... # More checks
}
}
}
# the graph is now fully populated.
# do somethething with each top10
while (my ($node_id, $similar) = each %graph) {
my #most_similar_ids = (sort { $similar->{$b} <=> $similar->{$a} } keys %$similar)[0 .. 9];
my ($user, #top10) = #$users[ $node_id, #most_similar_ids ];
...;
}
Building the graph this way should take half the time of naive iteration, and if the average number of edges for each node is low enough, going through similar nodes should be considerably faster.
Parallelizing this is a bit harder, as the graph each thread produces has to be combined before the data can be queried. For this, it would be best for each thread to perform the above code with the exception that the iteration bounds are given as parameters, and that only one edge should produced. The pair of edges will be completed in the combination phase:
THREAD A [0 .. 2/3] partial
\ graph
=====> COMBINE -> full graph -> QUERY
/ partial
THREAD B [2/3 .. 1] graph
# note bounds recognizing the triangular distribution of workload
However, this is only beneficial if there are only very few similar nodes for a given node, as combination is expensive.

Why does or equals break my REPL in coffeescript?

If I open up a REPL with Coffeescript 1.3.3 and type:
y ?= 5
or
y = 5
y ||= 6
I get an an error that y isn't defined... For compiling the last one works, but the first still errors.. Is this expected?
coffee> y = 5
5
coffee> y ||= 6
Error: In repl, the variable "y" can't be assigned with ||= because it has not been defined.
This is because each evaluation involves separate compilation. The workaround is to access it as a property of the global object.
coffee> y = 5
5
coffee> global.y
5
coffee> #y
5
coffee> #y &&= 6
6
coffee> y
6