Perl -> Avoiding unnecessary method calls - perl

I have to read log files of a store. The log shows the item id and the word "sold" after it. So I made a script to read this file, counting how many times a word "sold" appears for each item id. Turns out that there are many "owners" for the items. That is, there is a relation between "owner_id" (a data in my DB) and "item_id". Im interested in knowing how many items owners sell per day, so I create a "%item_id_owner_map":
my %item_id_sold_times;
my %item_id_owner_map;
open my $infile, "<", $file_location or die("$!: $file_location");
while (<$infile>) {
if (/item_id:(\d+)\s*,\s*sold/) {
my $item_id = $1;
$item_id_sold_times{$item_id}++;
my $owner_ids =
Store::Model::Map::ItemOwnerMap->fetch_by_keys( [$item_id] )
->entry();
for my $owner_id (#$owner_ids) {
$item_id_owner_map{$owner_id}++;
}
}
}
close $infile;
The "Store::Model::Map::ItemOwnerMap->fetch_by_keys( [$item_id] )->entry();" method takes item_id or ids as input, and gives back owner_id as output.
Everything looks great but actually, you will see that every time Perl finds a regex match (that is, every time the "if" condition applies), my script will call "Store::Model::Map::ItemOwnerMap->fetch_by_keys" method, which is very expensive, as these log files are very very long.
Is there a way to make my script more efficient? If possible, I only want to call my Model method once.
Best!

Separate your logic into two loops:
while (<$infile>) {
if (/item_id:(\d+)\s*,\s*sold/) {
my $item_id = $1;
$item_id_sold_times{$item_id}++;
}
}
my #matched_items_ids = keys %item_id_sold_times;
my $owner_ids =
Store::Model::Map::ItemOwnerMap->fetch_by_keys( \#matched_item_ids )
->entry();
for my $owner_id (#$owner_ids) {
$item_id_owner_map{$owner_id}++;
}
I don't know if the entry() call is correct, but the general shape of that code should do it for you.
In general databases are good at fetching sets of rows, so you're right to minimise the calls to fetch from the DB.

Related

Mirc script to find exact match in customer list

I am using this to find customer name in text file. Names are each on a separate line. I need to find exact name. If searching for Nick specifically it should find Nick only but my code will say found even if only Nickolson is in te list.
On*:text:*!Customer*:#: {
if ($read(system\Customer.txt,$2)) {
.msg $chan $2 Customer found in list! | halt }
else { .msg $chan 4 $2 Customer not found in list. | halt }
}
You have to loop through every matching line and see if the line is an exact match
Something like this
On*:text:*!Custodsddmer*:#: {
var %nick
; loop over all lines that contains nick
while ($read(customer.txt, nw, *nick*, $calc($readn + 1))) {
; check if the line is an exact match
if ($v1 == nick) {
%nick = $v1
; stop the loop because a result is found
break;
}
}
if (%nick == $null) {
.msg $chan 4 $2 Customer not found in list.
}
else{
.msg $chan $2 Customer found in list!
}
You can find more here: https://en.wikichip.org/wiki/mirc/text_files#Iterating_Over_Matches
If you're looking for exact match in a new line separate list, then you can use the 'w' switch without using wildcard '*' character.
From mIRC documentation
$read(filename, [ntswrp], [matchtext], [N])
Scans the file info.txt for a line beginning with the word mirc and
returns the text following the match value. //echo $read(help.txt, w,
*help*)
Because we don't want the wildcard matching, but a exact match, we would use:
$read(customers.txt, w, Nick)
Complete Code:
ON *:TEXT:!Customer *:#: {
var %foundInTheList = $read(system\Customer.txt, w, $2)
if (%foundInTheList) {
.msg # $2 Customer found in list!
}
else {
.msg 4 # $2 Customer not found in list.
}
}
Few remarks on Original code
Halting
halt should only use when you forcibly want to stop any future processing to take place. In most cases, you can avoid it, by writing you code flow in a way it will behave like that without explicitly using halting.
It will also resolve new problems that may arise, in case you will want to add new code, but you will wonder why it isn't executing.. because of the darn now forgotten halt command.
This will also improve you debugging, in the case it will not make you wonder on another flow exit, without you knowing.
Readability
if (..) {
.... }
else { .. }
When considering many lines of codes inside the first { } it will make it hard to notice the else (or elseif) because mIRC remote parser will put on the same identification as the else line also the line above it, which contains the closing } code. You should almost always few extra code in case of readability, especially which it costs new nothing!, as i remember new lines are free of charge.
So be sure the to have the rule of thump of every command in a new line. (that includes the closing bracket)
Matching Text
On*:text:*!Customer*:#: {
The above code has critical problem, and bug.
Critical: Will not work, because on*:text contains no space between on and *:text
Bug: !Customer will match EVERYTHING-BEFORE!customerANDAFTER <NICK>, which is clearly not desired behavior. What you want is :!Customer *: will only match if the first word was !customer and you must enter at least another text, because I've used [SPACE]*.

sorting nodes according to its members

I don't know where to start. I have a long list of nodes comprised of descendant members, for which I want to make a linked tree, a plain text database in the form of child/parent. For example:
N115713
N115713 N96394
N117904 N18574
N140517 N171639 N179536 N208718 N210073 N226737 N4647 N80403
N171639
N171639 N18574
N171639 N208718
N171639 N208718 N210073
N171639 N208718 N210073 N3690
N171639 N208718 N210073 N96585
N171639 N210073
N18574
N18574 N80403
Obviously, "N115713" will go downstream of "N115713 N96394" but I seem unable to turn that recognition into an algorithm. There are several hundred nodes having up to several dozen members. Pointers to get started? I'm using perl.
Thanks!
UPDATE: Well, I have an idea but haven't been able to implement it yet. I'm searching each line in turn for the other lines it's a "member" of then selecting that result which has the next highest number of members as its parent.
Since the main problem here is to check if the input data is consistent and does not have cycles, I recommend using some graph-theoretical module, for example Graph.
If your data allows a child to have multiple parents you have to check if the directed graph produced from your data does not have a cycle.
Otherwise, if your data should be a tree, you have to check that the undirected graph does not have a cycle.
I sketched up a simple script, that implements these checks and outputs child/parent pairs, it is pretty self explanatory:
use strict;use warnings;
use Graph;
my $g=Graph->new(directed=>1);
while(<>) {
chomp;
my #fields=split;
# this assumes that each line starts with a parent and goes down through its descendants
# adjust the logic to your needs
my $parent;
for my $child(#fields) {
$g->add_vertex($child);
if ($parent) {
$g->add_edge($child,$parent);
}
$parent=$child;
}
}
# check if we have a DAG
my #cycle = $g->find_a_cycle();
if (#cycle) {
printf "The directed graph has a cycle: %s\n", join ',', #cycle
}
# check if we have a tree
my $un_g = $g->undirected_copy();
#cycle = $un_g->find_a_cycle();
if (#cycle) {
printf "The undirected graph has a cycle: %s\n", join ',', #cycle
}
print "child,parent\n";
for my $edge(sort { $a->[0] cmp $b->[0] } $g->edges) {
printf "%s,%s\n", $edge->[0], $edge->[1];
}
And the output for your data:
The undirected graph has a cycle: N179536,N171639,N208718
child,parent
N115713,N96394
N117904,N18574
N140517,N171639
N171639,N208718
N171639,N18574
N171639,N179536
N171639,N210073
N179536,N208718
N18574,N80403
N208718,N210073
N210073,N226737
N210073,N96585
N210073,N3690
N226737,N4647
N4647,N80403

How can I get Perl DBI's selectrow_hashref to return a new row each iteration?

I am trying to use DBI's selectrow_hashref instead of fetchrow_hashref in order to save a couple lines of code, but it keeps returning the same row of data over and over.
my $select="SELECT * FROM table";
while (my ($user_ref) = $dbh->selectrow_hashref()) {
# $user_ref is the same each time!
}
When I use fetchrow_hashref, everything is fine, and each iteration I get new data.
my $select="SELECT * FROM table";
my $sth = $dbh->prepare($select) || die "prepare: $select: $DBI::errstr";
$sth->execute() || die "execute: $select: $DBI::errstr";
while (my ($user_ref) = $sth->fetchrow_hashref()) {
# works great, new data in $user_ref each iteration
}
Pray tell, what am I doing wrong? Is selectrow_hashref only intended to retrieve a single record? It doesn't seem that way in the doc.
Is selectrow_hashref only intended to retrieve a single record?
Yes.
It doesn't seem that way in the doc.
Well, that documentation says:
It returns the first row of data from the statement.
Which seems pretty clear to me.
Are you looking for selectall_hashref instead?
Update: Actually, I think you want selectall_array:
my $select='SELECT * FROM table';
foreach my $user_ref ($dbh->selectall_array($select, { Slice => {} })) {
# $user_ref is a hash ref
say $user_ref->{some_column};
}

WWW::Mechanize::Firefox looping though links

I am using a foreach to loop through links. Do I need a $mech->back(); to continue the loop or is that implicit.
Furthermore do I need a separate $mech2 object for nested for each loops?
The code I currently have gets stuck (it does not complete) and ends on the first page where td#tabcolor3 is not found.
foreach my $sector ($mech->selector('a.link2'))
{
$mech->follow_link($sector);
foreach my $place ($mech->selector('td#tabcolor3'))
{
if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
{
$mech->follow_link($place);
print $_->{innerHTML}, '\n'
for $mech->selector('td.dataCell');
$mech->back();
}
else
{
$mech->back();
}
}
You cannot access information from a page when it is no longer on display. However, the way foreach works is to build the list first before it is iterated through, so the code you have written should be fine.
There is no need for the call to back as the links are absolute. If you had used click then there must be a link in the page to click on, but with follow_link all you are doing is going to a new URL.
There is also no need to check the number of links to follow, as a for loop over an empty list will simply not be executed.
To make things clearer I suggest that you assign the results of selector to an array before the loop.
Like this
my #sectors = $mech->selector('a.link2');
for my $sector (#sectors) {
$mech->follow_link($sector);
my #places = $mech->selector('td#tabcolor3');
for my $place (#places) {
$mech->follow_link($place);
print $_->{innerHTML}, '\n' for $mech->selector('td.dataCell');
}
}
Update
My apologies. It seems that follow_link is finicky and needs to follow a link on the current page.
I suggest that you extract the href attribute from each link and use get instead of follow_link.
my #selectors = map $_->{href}, $mech->selector('a.link2');
for my $selector (#selectors) {
$mech->get($selector);
my #places = map $_->{href}, $mech->selector('td#tabcolor3');
for my $place (#places) {
$mech->get($place);
print $_->{innerHTML}, '\n' for $mech->selector('td.dataCell');
}
}
Please let me know whether this works on the site you are connecting to.
I recommend to use separate $mech object for this:
foreach my $sector ($mech->selector('a.link2'))
{
my $mech = $mech->clone();
$mech->follow_link($sector);
foreach my $place ($mech->selector('td#tabcolor3'))
{
if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
{
my $mech = $mech->clone();
$mech->follow_link($place);
print $_->{innerHTML}, '\n'
for $mech->selector('td.dataCell');
#$mech->back();
}
# else
# {
# $mech->back();
# }
}
I am using WWW:Mechanize::Firefox to loop over a bunch of URLs with loads of Javascript. The page does not render immediately so need test if a particular page element is visible (similar to suggestion in Mechanize::Firefox documentation except 2 xpaths in the test) before deciding next action.
The page eventually renders a xpath to 'no info' or some wanted stuff after about 2-3 seconds. If no info we go to next URL. I think there is some sort of race condition with both xpaths not existing at once causing the MozRepl::RemoteObject: TypeError: can't access dead object error intermittently (at the sleep 1 in the loop oddly enough).
My solution that seems to work/improve reliability is to enclose all the $mech->getand$mech->is_visible in an eval{}; like this:
eval{
$mech->get("$url");
$retries = 15; #test to see if element visible = page complete
while ($retries-- and ! $mech->is_visible( xpath => $xpath_btn ) and ! $mech->is_visible( xpath => $xpath_no_info )){
sleep 1;
};
last if($mech->is_visible( xpath => $xpath_no_info) ); #skip rest if no info page
};
Others might suggest improvements on this.

DBIx::Class infinite results

Before I describe the details, the problem is, I run a $c->model('ResultName')->search({k=>v}) and when I loop on the results of it's has_many relation, there's only one in the database, yet it loops forever. I've tried googling and found one person who solved the problem, but with too brief an explanation for me. His post was here.
Basically I have 3 tables
Orders, OrderItems and Items. Items are what's available. Orders are collections of Items that one person wants. So I can tie them all together with something like
select oi.order_item_id,oi.order_id,i.item_id from orders as o inner join order_items as oi on oi.order_id = o.order_id inner join items as i on i.item_id = oi.item_id where blah blah blah....
I ran DBIx::Class::Schema::Loader and got what seemed like proper relationships
MyApp::Schema::Result::Order->has_many('order_items'...)
MyApp::Schema::Result::Items->has_many('order_items'...)
MyApp::Schema::Result::OrderItems->belongs_to('items'...)
in a test I try
my $orders = $schema->resultset('Order')->search({
'user_id'=>1
});
while(my $o = $orders->next) {
while(my $oi = $o->order_items->next) {
warn('order_item_id: '.$oi->order_item);
}
}
It loops infinitely on the inner loop
Your solution works but it loses the niceties of next in that it is an iterator. You are in effect loading all the rows as objects into memory and looping over them.
The issue, as you said is that $o->order_items->next recreates the order_items resultset each time. You should do this:
my $orders = $schema->resultset('Order')->search({
'user_id'=>1
});
while(my $o = $orders->next) {
my $oi_rs = $o->order_items;
while(my $oi = $oi_rs->next) {
warn('order_item_id: '.$oi->order_item);
}
}
Reading more carfully in the ResultSet documentation for "next" I found
"Note that you need to store the
resultset object, and call next on it.
Calling resultset('Table')->next
repeatedly will always return the
first record from the resultset."
from here
When I changed the loops to
for my $o ($orders->all) {
for my $oi ($o->order_items->all) {
# stuff
}
}
all is well.