sorting nodes according to its members - perl

I don't know where to start. I have a long list of nodes comprised of descendant members, for which I want to make a linked tree, a plain text database in the form of child/parent. For example:
N115713
N115713 N96394
N117904 N18574
N140517 N171639 N179536 N208718 N210073 N226737 N4647 N80403
N171639
N171639 N18574
N171639 N208718
N171639 N208718 N210073
N171639 N208718 N210073 N3690
N171639 N208718 N210073 N96585
N171639 N210073
N18574
N18574 N80403
Obviously, "N115713" will go downstream of "N115713 N96394" but I seem unable to turn that recognition into an algorithm. There are several hundred nodes having up to several dozen members. Pointers to get started? I'm using perl.
Thanks!
UPDATE: Well, I have an idea but haven't been able to implement it yet. I'm searching each line in turn for the other lines it's a "member" of then selecting that result which has the next highest number of members as its parent.

Since the main problem here is to check if the input data is consistent and does not have cycles, I recommend using some graph-theoretical module, for example Graph.
If your data allows a child to have multiple parents you have to check if the directed graph produced from your data does not have a cycle.
Otherwise, if your data should be a tree, you have to check that the undirected graph does not have a cycle.
I sketched up a simple script, that implements these checks and outputs child/parent pairs, it is pretty self explanatory:
use strict;use warnings;
use Graph;
my $g=Graph->new(directed=>1);
while(<>) {
chomp;
my #fields=split;
# this assumes that each line starts with a parent and goes down through its descendants
# adjust the logic to your needs
my $parent;
for my $child(#fields) {
$g->add_vertex($child);
if ($parent) {
$g->add_edge($child,$parent);
}
$parent=$child;
}
}
# check if we have a DAG
my #cycle = $g->find_a_cycle();
if (#cycle) {
printf "The directed graph has a cycle: %s\n", join ',', #cycle
}
# check if we have a tree
my $un_g = $g->undirected_copy();
#cycle = $un_g->find_a_cycle();
if (#cycle) {
printf "The undirected graph has a cycle: %s\n", join ',', #cycle
}
print "child,parent\n";
for my $edge(sort { $a->[0] cmp $b->[0] } $g->edges) {
printf "%s,%s\n", $edge->[0], $edge->[1];
}
And the output for your data:
The undirected graph has a cycle: N179536,N171639,N208718
child,parent
N115713,N96394
N117904,N18574
N140517,N171639
N171639,N208718
N171639,N18574
N171639,N179536
N171639,N210073
N179536,N208718
N18574,N80403
N208718,N210073
N210073,N226737
N210073,N96585
N210073,N3690
N226737,N4647
N4647,N80403

Related

Irssi Loop Through Users in Channel

I am trying to piece together other scripts i have seen to be able to loop through a list of users on the channel.
Here is what i have come up with
my $channel = #_;
foreach my $nick ($channel->nicks()) {
$server->command("msg $chatchannel $nick->{nick}");
}
But all i get from that is
Can't call method "nicks" without a package or object reference at
/root/.irssi/scripts/test.pl line 64.
which is referring to
$channel->nicks()
Am i going about this the wrong way? or should this be working? I have seen many other scripts using $channel->nicks() so i know it must work?
Edit
I should also mention that this is already define further up in the code
my ($server, $msg, $target, $channel, $chatnet) = #_;
But when i try it with that $channel variable i get
Can't locate object method "nicks" via package
"mattigins#mattigins.tmi.twitch.tv" (perhaps you forgot to load
"mattigins#mattigins.tmi.twitch.tv"?) at /root/.irssi/scripts/test.pl
line 64.
Since the left hand side (LHS) of my $channel = #_; is a scalar it imposes scalar context on the #_ array. This means that the length of the array gets assigned to $channel. You want to assign with my ($channel) = #_; so that the LHS is in list context and that the first element in the #_ array gets assigned to your scalar.
Ref:
What is the difference between the scalar and list contexts in Perl?
Scalar and List context in Perl
I figured out how to do it.
$chan = $server->channel_find('#channel');
foreach my $nick ($chan->nicks()) {
$nickname = $nick->{nick};
}

Perl -> Avoiding unnecessary method calls

I have to read log files of a store. The log shows the item id and the word "sold" after it. So I made a script to read this file, counting how many times a word "sold" appears for each item id. Turns out that there are many "owners" for the items. That is, there is a relation between "owner_id" (a data in my DB) and "item_id". Im interested in knowing how many items owners sell per day, so I create a "%item_id_owner_map":
my %item_id_sold_times;
my %item_id_owner_map;
open my $infile, "<", $file_location or die("$!: $file_location");
while (<$infile>) {
if (/item_id:(\d+)\s*,\s*sold/) {
my $item_id = $1;
$item_id_sold_times{$item_id}++;
my $owner_ids =
Store::Model::Map::ItemOwnerMap->fetch_by_keys( [$item_id] )
->entry();
for my $owner_id (#$owner_ids) {
$item_id_owner_map{$owner_id}++;
}
}
}
close $infile;
The "Store::Model::Map::ItemOwnerMap->fetch_by_keys( [$item_id] )->entry();" method takes item_id or ids as input, and gives back owner_id as output.
Everything looks great but actually, you will see that every time Perl finds a regex match (that is, every time the "if" condition applies), my script will call "Store::Model::Map::ItemOwnerMap->fetch_by_keys" method, which is very expensive, as these log files are very very long.
Is there a way to make my script more efficient? If possible, I only want to call my Model method once.
Best!
Separate your logic into two loops:
while (<$infile>) {
if (/item_id:(\d+)\s*,\s*sold/) {
my $item_id = $1;
$item_id_sold_times{$item_id}++;
}
}
my #matched_items_ids = keys %item_id_sold_times;
my $owner_ids =
Store::Model::Map::ItemOwnerMap->fetch_by_keys( \#matched_item_ids )
->entry();
for my $owner_id (#$owner_ids) {
$item_id_owner_map{$owner_id}++;
}
I don't know if the entry() call is correct, but the general shape of that code should do it for you.
In general databases are good at fetching sets of rows, so you're right to minimise the calls to fetch from the DB.

WWW::Mechanize::Firefox looping though links

I am using a foreach to loop through links. Do I need a $mech->back(); to continue the loop or is that implicit.
Furthermore do I need a separate $mech2 object for nested for each loops?
The code I currently have gets stuck (it does not complete) and ends on the first page where td#tabcolor3 is not found.
foreach my $sector ($mech->selector('a.link2'))
{
$mech->follow_link($sector);
foreach my $place ($mech->selector('td#tabcolor3'))
{
if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
{
$mech->follow_link($place);
print $_->{innerHTML}, '\n'
for $mech->selector('td.dataCell');
$mech->back();
}
else
{
$mech->back();
}
}
You cannot access information from a page when it is no longer on display. However, the way foreach works is to build the list first before it is iterated through, so the code you have written should be fine.
There is no need for the call to back as the links are absolute. If you had used click then there must be a link in the page to click on, but with follow_link all you are doing is going to a new URL.
There is also no need to check the number of links to follow, as a for loop over an empty list will simply not be executed.
To make things clearer I suggest that you assign the results of selector to an array before the loop.
Like this
my #sectors = $mech->selector('a.link2');
for my $sector (#sectors) {
$mech->follow_link($sector);
my #places = $mech->selector('td#tabcolor3');
for my $place (#places) {
$mech->follow_link($place);
print $_->{innerHTML}, '\n' for $mech->selector('td.dataCell');
}
}
Update
My apologies. It seems that follow_link is finicky and needs to follow a link on the current page.
I suggest that you extract the href attribute from each link and use get instead of follow_link.
my #selectors = map $_->{href}, $mech->selector('a.link2');
for my $selector (#selectors) {
$mech->get($selector);
my #places = map $_->{href}, $mech->selector('td#tabcolor3');
for my $place (#places) {
$mech->get($place);
print $_->{innerHTML}, '\n' for $mech->selector('td.dataCell');
}
}
Please let me know whether this works on the site you are connecting to.
I recommend to use separate $mech object for this:
foreach my $sector ($mech->selector('a.link2'))
{
my $mech = $mech->clone();
$mech->follow_link($sector);
foreach my $place ($mech->selector('td#tabcolor3'))
{
if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
{
my $mech = $mech->clone();
$mech->follow_link($place);
print $_->{innerHTML}, '\n'
for $mech->selector('td.dataCell');
#$mech->back();
}
# else
# {
# $mech->back();
# }
}
I am using WWW:Mechanize::Firefox to loop over a bunch of URLs with loads of Javascript. The page does not render immediately so need test if a particular page element is visible (similar to suggestion in Mechanize::Firefox documentation except 2 xpaths in the test) before deciding next action.
The page eventually renders a xpath to 'no info' or some wanted stuff after about 2-3 seconds. If no info we go to next URL. I think there is some sort of race condition with both xpaths not existing at once causing the MozRepl::RemoteObject: TypeError: can't access dead object error intermittently (at the sleep 1 in the loop oddly enough).
My solution that seems to work/improve reliability is to enclose all the $mech->getand$mech->is_visible in an eval{}; like this:
eval{
$mech->get("$url");
$retries = 15; #test to see if element visible = page complete
while ($retries-- and ! $mech->is_visible( xpath => $xpath_btn ) and ! $mech->is_visible( xpath => $xpath_no_info )){
sleep 1;
};
last if($mech->is_visible( xpath => $xpath_no_info) ); #skip rest if no info page
};
Others might suggest improvements on this.

How do I create a perl sub by specifying its parse tree?

Given a CODE ref, is it possible to:
Access the parse tree of that CODE ref
Create a new CODE ref by specifying the parse tree of the CODE ref which can contain elements of the parse tree returned in 1
Normally we create subroutines in perl by specifying source code which is then parsed and converted into a parse tree.
I would like to write a perl function which can create a subroutine by specifying its parse tree, and that parse tree could be derived from another parse tree of some other subroutine.
Possible?
I don't know the full answer to your question, but I know that Data::Dumper can deparse a code reference. Looking at its documentation, I see that it uses B::Deparse to do the heavy lifting (the B:: modules are the ones that interact with the compiler). Unfortunately it seems that this only results in a textual representation of the coderef.
Instead I searched for Op on metacpan and got many more interesting possibilities. Since I am now far out of my depth in the deepest Perl magic, I will leave it to you to look over those results. Perhaps something will be useful.
This has nothing to do with opcodes, but it does enclose the same two variables in three different closures. The variables are enclosed within subroutines reminiscent of class get/set routines, and those closed vars are then shared by other closures via their access routine.
This is a response to the comment: I'm sure it will be necessary to access the underlying nodes in the parse tree so that I can create new closures which are closed over the same variables.
use strict;
use warnings;
use v5.14;
# create closed vars
my $v1 = access_closure(6);
my $v2 = access_closure(42);
# play with them
say "v1 ", &$v1;
say "v2 ", &$v2;
say "v1 ", &$v1(5);
say "v2 ", &$v2(43);
say "v1 ", &$v1;
say "v2 ", &$v2;
# create silly closures that use them
my $test1 = test_closure(2);
my $test2 = test_closure(17);
my $test3 = test_closure(50);
# play with those
&$test1;
&$test2;
&$test3;
# create the get/set routine for a closed var
sub access_closure {
my $val = shift;
return sub {
$val = shift if #_;
return $val;
}
}
# create a silly closure that encloses a control var and uses the two other vars
sub test_closure {
my $val = shift;
return sub {
say "\nval is $val";
printf "v1 is %2d, v2 is %2d\n",
&$v1, &$v2;
if (&$v1 < $val) {
say "Increment v1";
&$v1(&$v1+1);
}
if (&$v2 > $val) {
say "Decrement v2";
&$v2(&$v2-1);
}
printf "v1 is %2d, v2 is %2d\n",
&$v1, &$v2;
}
}

is there a return value to file::find in perl? How to tell if the find fails?

I'm trying to learn perl and in particular use the File::Find module to search through a directory tree of pictures to see if any filenames already existing match those on a camera. If the filename is there, I'll assume the file has already been transferred from this camera and won't process it any further. If the filename is not found, I'd like to take some sort of action on the file.
I've used find2perl to create a basic structure and it works for finding the file. But I can't seem to find a way to tell if the find failed. The File::Find::find doesn't seem to return any value to act upon, and I'm not sure how to act upon, or use any return value from the '&wanted' subdirectory that it's using.
What is the optimal method of determining if the File::Find::find was not successful in finding any matching files? Should I use a global flag variable that is set to a certain value at the top of the program and is only changed if the find is successful? I guess I could check that value after the find to see if it has changed (success) or not (nothing found).
Any ideas or suggestions?
Here's the basic structure:
# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, '/files/multimedia/pictures/');
exit;
sub wanted {
my ($dev,$ino,$mode,$nlink,$uid,$gid);
if ( (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) && /^$ARGV[0]\z/si ) {
print("found: $name\n");
}
}
If no desired files are found, the wanted subroutine will not be called.
What I usually do is exactly what you suggested - set a flag to false before calling find, and have wanted set it to true.
That's in cases where I actually need to know that information - I haven't actually needed it for quite a while.
In fact, a count may be better than a flag since it delivers more information. The following code:
use File::Find;
sub wanted {
my ($dev,$ino,$mode,$nlink,$uid,$gid);
$quant += 1;
if ( (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) && /^$ARGV[0]\z/si ) {
print("found: $name\n");
}
}
$quant = 0;
File::Find::find({wanted => \&wanted}, '/nosuchdir');
print ("Found $quant files in /nosuchdir\n");
$quant = 0;
File::Find::find({wanted => \&wanted}, '/tmp');
print ("Found $quant files in /tmp\n");
generates the following output on my system:
Found 0 files in /nosuchdir
Found 39 files in /tmp
That way, a $count of zero means that no files were found, anything else tells you that there were files (and also tells you how many).