I am trying to extract data from website using perl API. The process is to use a list of uris as input. Then I extract related information for each uri from website. If the information for one uri is not present it dies. Some thing like the code below
my #tags = $c->posts_for(uri =>"$currentURI");
die "No candidate related articles\n" unless #tags;
Now, I don't want the program to stop if it doesn't get any tags. I want the program to skip that particular uri and go to the next available uri. How can i do it?
Thank you for your time and help.
Thank you,
Sammed
Well, assuming that you're inside a loop processing each of the URIs in turn, you should be able to do something like:
next unless #tags;
For example, the following program only prints lines that are numeric:
while (<STDIN>) {
next unless /^\d+$/;
print;
}
The loop processes every input line in turn but, when one is found that doesn't match that regular expression (all numeric), it restarts the loop (for the next input line) without printing.
The same method is used in that first code block above to restart the loop if there are no tags, moving to the next URI.
Besides the traditional flow control tools, i.e. next/last in a loop or return in a sub, one can use exceptions in perl:
eval {
die "Bad bad thing";
};
if ($#) {
# do something about it
};
Or just use Try::Tiny.
However, from the description of the task it seems next is enough (so I voted for #paxdiablo's answer).
The question is rather strange, but as near as I can tell, you are asking how to control the flow of your current loop. Of course, using die will cause your program to exit, so if you do not want that, you should not use die. Seems elementary to me, that's why it is a strange questions.
So, I assume you have a loop such as:
for my $currentURI (#uris) {
my #tags = $c->posts_for(uri =>"$currentURI");
die "No candidate related articles\n" unless #tags;
# do stuff with #tags here....
}
And if #tags is empty, you want to go to the next URI. Well, that's a simple thing to solve. There are many ways.
next unless #tags;
for my $tag (#tags) { ... stuff ... }
if (#tags) { .... }
Next is the simplest one. It skips to the end of the loop block and starts with the next iteration. However, using a for or if block causes the same behaviour, and so are equivalent. For example:
for my $currentURI (#uris) {
my #tags = $c->posts_for(uri =>"$currentURI");
for my $tag (#tags) {
do_something($tag);
}
}
Or even:
for my $currentURI (#uris) {
for my $tag ($c->posts_for(uri =>"$currentURI")) {
do_something($tag);
}
}
In this last example, we removed #tags all together, because it is not needed. The inner loop will run zero times if there are no "tags".
This is not really complex stuff, and if you feel unsure, I suggest you play around a little with loops and conditionals to learn how they work.
Related
What is the difference between the following code in perl dbi?
1.
while (my ($p1, $p2, $p3) = $sth->fetchrow_array()) {
# ... some code ...
}
2.
$sth->bind_columns(\my ($p1, $p2, $p3));
while ($sth->fetch) {
# ... some code ...
}
Both leads to the same result.
Perlmonks advise on bind variant.
I would appreciate if someone explain why.
the docs says, that binding is more efficient way to fetch data:
The binding is performed at a low level using Perl aliasing. Whenever
a row is fetched from the database $var_to_bind appears to be
automatically updated simply because it now refers to the same memory
location as the corresponding column value. This makes using bound
variables very efficient.
I searched SO before asking this question, I am completely new to this and have no idea how to handle these errors. By this I mean Perl language.
When I put this
%name->{#id[$#id]} = $temp;
I get the error Using a hash as a reference is deprecated
I tried
$name{#id[$#id]} = $temp
but couldn't get any results back.
Any suggestions?
The correct way to access an element of hash %name is $name{'key'}. The syntax %name->{'key'} was valid in Perl v5.6 but has since been deprecated.
Similarly, to access the last element of array #id you should write $id[$#id] or, more simply, $id[-1].
Your second variation should work fine, and your inability to retrieve the value has an unrelated reason.
Write
$name{$id[-1]} = 'test';
and
print $name{$id[-1]};
will display test correctly
%name->{...}
has always been buggy. It doesn't do what it should do. As such, it now warns when you try to use it. The proper way to index a hash is
$name{...}
as you already believe.
Now, you say
$name{#id[$#id]}
doesn't work, but if so, it's because of an error somewhere else in the code. That code most definitely works
>perl -wE"#id = qw( a b c ); %name = ( a=>3, b=>4, c=>5 ); say $name{#id[$#id]};"
Scalar value #id[$#id] better written as $id[$#id] at -e line 1.
5
As the warning says, though, the proper way to index an array isn't
#id[...]
It's actually
$id[...]
Finally, the easiest way to get the last element of an array is to use index -1. The means your code should be
$name{ $id[-1] }
The popular answer is to just not dereference, but that's not correct. In other words %$hash_ref->{$key} and %$hash_ref{$key} are not interchangeable. The former is required to access a hash reference nested as an element in another hash reference.
For many moons it has been common place to nest hash references. In fact there are several modules that parse data and store it in this kind of data structure. Instantly depreciating the behavior without module updates was not a good thing. At times my data is trapped in a nested hash and the only way to get it is to do something like.
$new_hash_ref = $target_hash_ref->{$key1}
$new_hash_ref2 = $target_hash_ref->{$key2}
$new_hash_ref3 = $target_hash_ref->{$key3}
because I can't
foreach my $i(keys(%$target_hash_ref)) {
foreach(%$target_hash_ref->{$i} {
#do stuff with $_
}
}
anymore.
Yes the above is a little strange, but creating new variables just to avoid accessing a data structure in a certain way is worse. Am I missing something?
If you want one item from an array or hash use $. For a list of items use # and % respectively. Your use of # as a reference returned a list instead of an item which perl may have interpreted as a hash.
This code demonstrates your reference of a hash of arrays.
#!/usr/bin perl -w
my %these = ( 'first'=>101,
'second'=>102,
);
my #those = qw( first second );
print $these{$those[$#those]};
prints '102'
I am working on a user login and am having trouble with the user creation part. My problem is that I am trying to check the input username against a text file to see if that username already exists. I can't seem to get it to compare the input username to the array that I have brought in. I have tried two different ways of accomplishing this. One using an array and another using something I read online that I don't quite understand. Any help or explanation would be greatly appreciated.
Here is my attempt using an array to compare off of
http://codepad.org/G7xmsf3z
Here is my second attempt
http://codepad.org/SbeqmdbG
In your first attempt, try to put the if inside of the loop:
foreach my $pair(#incomingarray) {
(my $name,my $value) = split (/:/, $pair);
if ($name eq $username) {
print p("Username is already taken, try again");
close(YYY);
print end_html();
}
else {
open(YYY, ">>password.txt");
print YYY $username.":".$hashpass."\n";
print p("Your account has been created sucessfully");
close(YYY);
print end_html();
}
}
In you second attempt, I think you should try and change the line:
if (%users eq $username) {
with this one:
if (defined $users{$username}) {
As has been stated above regarding locking the flatfile from other processes there is the issue with scaling too. the more users you have the slower the lookup will be.
I started years ago with a flat file, believing I would never scale enough to require a real database and didn't want to learn how to use mySQL for example. Eventually after flatfile corruptions and long lookup times I had no choice but to move to a database.
Later you will find yourself wanting to store user preferences and such, it's easy to add a new field to a database. Flatfile will end up having the overhead of splitting each line into separate fields.
I'd suggest you do it properly with a database.
As in my comment, you should not be using a flatfile to hold your user info. You should use a proper database that will handle concurrent access for you rather than having to understand and code up how to deal with all of that yourself!
If you insist on using an array, you can search it with grep() if it is not "too large":
if (grep /^$username:/, #incomingarray) {
print "user name '$username' is already registered, try again\n";
}
else {
print "user name '$username' is not already registered\n";
}
I see some other problems in your code as well.
You should always prefer lexical (my) variables over package (our) variables.
Why do you think (erroneously) that $name and $username cannot be lexical variables?
You should always use the 3-arg form of open() and check its return value like in your 2nd code example. Your open() in the 1st code example is how it was done many many years ago.
I'm using HOP::Lexer to scan BlitzMax module source code to fetch some data from it. One particular piece of data I'm currently interested in is a module description.
Currently I'm searching for a description in the format of ModuleInfo "Description: foobar" or ModuleInfo "Desc: foobar". This works fine. But sadly, most modules I scan have their description defined elsewhere, inside a comment block. Which is actually the common way to do it in BlitzMax, as the documentation generator expects it.
This is how all modules have their description defined in the main source file.
Rem
bbdoc: my module description
End Rem
Module namespace.modulename
This also isn't really a problem. But the line after the End Rem also contains data I want (the module name). This is a problem, since now 2 definitions of tokens overlap each other and after the first one has been detected it will continue from where it left off (position of content that's being scanned). Meaning that the token for the module name won't detect anything.
Yes, I've made sure my order of tokens is correct. It just doesn't seem possible (somewhat understandable) to move the cursor back a line.
A small piece of code for fetching the description from within a Rem-End Rem block which is above a module definition (not worked out, but working for the current test case):
[ 'MODULEDESCRIPTION',
qr/[ \t]*\bRem\n(?:\n|.)*?\s*\bEnd[ \t]*Rem\nModule[\s\t]+/i,
sub {
my ($label, $value) = #_;
$value =~ /bbdoc: (.+)/;
[$label, $1];
}
],
So in my test case I first scan for a single comment, then the block above (MODULEDESCRIPTION), then a block comment (Rem-End Rem), module name, etc.
Currently the only solution I can think of is setup a second lexer only for the module description, though I wouldn't prefer that. Is what I want even possible at all with HOP::Lexer?
Source of my Lexer can be found at https://github.com/maximos/maximus-web/blob/develop/lib/Maximus/Class/Lexer.pm
I've solved it by adding (a slightly modified version of) the MODULEDESCRIPTION. Inside the subroutine I simply filter out the module name and return an arrayref with 4 elements, which I later on iterate over to create a nice usable array with tokens and their values.
Solution is again at https://github.com/maximos/maximus-web/blob/develop/lib/Maximus/Class/Lexer.pm
Edit: Or let me just paste the piece of code here
[ 'MODULEDESCRIPTION',
qr/[ \t]*\bRem\R(?:\R|.)*?\bEnd[ \t]*Rem\R\bModule[\s\t]\w+\.\w+/i,
sub {
my ($label, $value) = #_;
my ($desc) = ($value =~ /\bbbdoc: (.+)/i);
my ($name) = ($value =~ /\bModule (\w+\.\w+)/i);
[$label, $desc, 'MODULENAME', $name];
}
],
I've started a little pet project to parse log files for Team Fortress 2. The log files have an event on each line, such as the following:
L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")
Notice there are some common parts of the syntax for log files. Names, for example consist of four parts: the name, an ID, a Steam ID, and the team of the player at the time. Rather than rewriting this type of regular expression, I was hoping to abstract this out slightly.
For example:
my $name = qr/(.*)<(\d+)><(.*)><(Red|Blue)>/
my $kill = qr/"$name" killed "$name"/;
This works nicely, but the regular expression now returns results that depend on the format of $name (breaking the abstraction I'm trying to achieve). The example above would match as:
my ($name_1, $id_1, $steam_1, $team_1, $name_2, $id_2, $steam_2, $team_2)
But I'm really looking for something like:
my ($player1, $player2)
Where $player1 and $player2 would be tuples of the previous data. I figure the "killed" event doesn't need to know exactly about the player, as long as it has information to create the player, which is what these tuples provide.
Sorry if this is a bit of a ramble, but hopefully you can provide some advice!
I think I understand what you are asking. What you need to do is reverse your logic. First you need to regex to split the string into two parts, then you extract your tuples. Then your regex doesn't need to know about the name, and you just have two generic player parsing regexs. Here is an short example:
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $log = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><
Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")';
my ($player1_string, $player2_string) = $log =~ m/(".*") killed (".*?")/;
my #player1 = $player1_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my #player2 = $player2_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
print STDERR Dumper(\#player1, \#player2);
Hope this what you were looking for.
Another way to do it, but the same strategy as dwp's answer:
my #players =
map { [ /(.*)<(\d+)><(.*)><(Red|Blue)>/ ] }
$log_text =~ /"([^\"]+)" killed "([^\"]+)"/
;
Your log data contains several items of balanced text (quoted and parenthesized), so you might consider Text::Balanced for parts of this job, or perhaps a parsing approach rather than a direct attack with regex. The latter might be fragile if the player names can contain arbitrary input, for example.
Consider writing a Regexp::Log subclass.