Perl: variable scope issue with CGI & DBI modules - perl

I've run into what appears to be a variable scope issue I haven't encountered before. I'm using Perl's CGI module and a call to DBI's do() method. Here's the code structure, simplified a bit:
use DBI;
use CGI qw(:cgi-lib);
&ReadParse;
my $dbh = DBI->connect(...............);
my $test = $in{test};
$dbh->do(qq{INSERT INTO events VALUES (?,?,?)},undef,$in{test},"$in{test}",$test);
The #1 placeholder variable evaluates as if it is uninitialized. The other two placeholder variables work.
The question: Why is the %in hash not available within the context of do(), unless I wrap it in double quotes (#2 placeholder) or reassign the value to a new variable (#3 placeholder)?
I think it's something to do with how the CGI module's ReadParse() function assigns scope to the %in hash, but I don't know Perl scoping well enough to understand why %in is available at the top level but not from within my do() statement.
If someone does understand the scoping issue, is there a better way to handle it? Wrapping all the %in references in double quotes seems a little messy. Creating new variables for each query parameter isn't realistic.
Just to be clear, my question is about the variable scoping issue. I realize that ReadParse() isn't the recommended method to grab query params with CGI.
I'm using Perl 5.8.8, CGI 3.20, and DBI 1.52. Thank you in advance to anyone reading this.
@Pi & @Bob, thanks for the suggestions. Pre-declaring the scope for %in has no effect (and I always use strict). The result is the same as before: in the db, col1 is null while cols 2 & 3 are set to the expected value.
For reference, here's the ReadParse function (see below). It's a standard function that's part of CGI.pm. The way I understand it, I'm not meant to initialize the %in hash (other than satisfying strict) for purposes of setting scope, since the function appears to me to handle that:
sub ReadParse {
local(*in);
if (@_) {
*in = $_[0];
} else {
my $pkg = caller();
*in=*{"${pkg}::in"};
}
tie(%in,CGI);
return scalar(keys %in);
}
I guess my question is what is the best way to get the %in hash within the context of do()? Thanks again! I hope this is the right way to provide additional info to my original question.
@Dan: I hear ya regarding the &ReadParse syntax. I'd normally use CGI::ReadParse() but in this case I thought it was best to stick to how the CGI.pm documentation has it exactly.

It doesn't actually look like you're using it as described in the docs:
https://metacpan.org/pod/CGI#COMPATIBILITY-WITH-CGI-LIB.PL
If you must use it, then CGI::ReadParse(); seems more sensible and less crufty syntax. Although I can't see it making much difference in this situation, but then it is a tied variable, so who the hell knows what it's doing ;)
Is there a particular reason you can't use the more-common $cgi->param('foo') syntax? It's a little bit cleaner, and filths up your namespace in a considerably more predictable manner..

use strict;. Always.
Try declaring
our %in;
and seeing if that helps. Failing that, strict may produce a more useful error.

I don't know what's wrong, but I can tell you some things that aren't:
It's not a scoping issue. If it were then none of the instances of $in{test} would work.
It's not the archaic & calling syntax. (It's not "right" but it's harmless in this case.)
ReadParse is a nasty bit of code. It munges the symbol table to create the global variable %in in the calling package. What's worse is that it's a tied variable, so accessing it could (theoretically) do anything. Looking at the source code for CGI.pm, the FETCH method just invokes the params() method to get the data. I have no idea why the fetch in the $dbh->do() isn't working.

Firstly, that is not in the context/scope of do. It is still in the context of main or global. You don't leave a scope until you enter {} in some way, such as a subroutine body or a different package ('class') in Perl. Within () parens you are not leaving scope.
The sample you gave us is of an uninitialized hash and, as Pi has suggested, using strict will certainly keep those from occurring.
Can you give us a more representative example of your code? Where are you setting %IN and how?

Something's very broken there. Perl's scoping is relatively simple, and you're unlikely to stumble upon anything odd like that unless you're doing something daft. As has been suggested, switch on the strict pragma (and warnings, too. In fact you should be using both anyway).
It's pretty hard to tell what's going on without being able to see how %in is defined (is it something to do with that nasty-looking ReadParse call? why are you calling it with the leading &, btw? that syntax has been considered dead and gone for a long time). I suggest posting a bit more code, so we can see what's going on..

What version of DBI are you using? From looking at the DBI changelog it appears that versions prior to 1.00 didn't support the attribute argument. I suspect that the "uninitialized" $in{test} is actually the undef that you're passing to $dbh->do().

From the example you gave, this is not a scoping issue, or none of the parameters would work.
Looks like DBI (or a DBD, not sure where bind parameters are used) isn't honoring tie magic.
The workaround would be to stringize or copy what you pass to it, like your second and third parameters do.
A simple test using SQLite and DBI 1.53 shows it working ok:
$ perl -MDBI -we'sub TIEHASH { bless {} } sub FETCH { "42" } tie %x, "main" or die; my $dbh = DBI->connect("dbi:SQLite:dbname=dbfile","",""); $dbh->do("create table foo (bar char(80))"); $dbh->do("insert into foo values (?)", undef, $x{foo}); print "got: " . $dbh->selectrow_array("select bar from foo") . "\n"; $dbh->do("drop table foo")'
got: 42
Care to share what database you are using?

Per the DBI documentation: Binding a tied variable doesn't work, currently.
DBI is pretty complicated under the hood, and unfortunately goes through some gyrations to be efficient that are causing your problem. I agree with everyone else who says to get rid of the ugly old cgi-lib style code. It's unpleasant enough to do CGI without a nice framework (go Catalyst), let alone something that's been obsolete for a decade.
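The "copy it first" workaround can be sketched as follows. This is a minimal illustration, not a DBI test: DemoTie is a hypothetical stand-in for the tied %in that ReadParse sets up, and the do() call is shown only as a comment because no database handle is assumed here. A hash slice assigned to an array calls FETCH once per key and stores ordinary, untied scalars.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tied %in created by CGI::ReadParse.
package DemoTie;
sub TIEHASH { my ($class, %data) = @_; my $self = { %data }; return bless $self, $class }
sub FETCH   { my ($self, $key) = @_; return $self->{$key} }

package main;

tie my %in, 'DemoTie', test => 'hello';

# The slice triggers FETCH for each key; the array holds plain copies
# with no tie magic attached.
my @bind = @in{qw(test)};

# With a real handle the copies would be passed as bind values, e.g.:
#   $dbh->do(q{INSERT INTO events VALUES (?)}, undef, @bind);
print "bind values: @bind\n";
```

This keeps the query call free of per-parameter quoting; one slice covers however many keys the statement needs.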

Okay, try this:
use CGI;
my %in;
CGI::ReadParse(\%in);
That might help as it's actually using a variable that you've declared, and therefore can control the scope of (plus it'll let you use strict without other nastiness that could be muddying the waters)

As this is starting to look like a tie() problem, try the following experiment. Save this as a foo.pl and run it as perl foo.pl "x=1"
use CGI;
CGI::ReadParse();
p($in{x}, "$in{x}");
sub p { my @a = @_; print "@a\n" }
It should print 1 1. If it doesn't, we've found the culprit.

I just tried your test code from http://www.carcomplaints.com/test/test.pl.txt, and it works right away on my computer, no problems. I get three values as expected. I didn't run it as CGI, but using:
...
use CGI qw/-debug/;
...
I typed a variable on the console (test=test) and your script inserted it without a problem.
If, however, you leave this out, it will insert an empty string and two NULLs. This is because you interpolate a value into a string: that produces a string from $in{test}, which is undef at that moment. undef stringifies to an empty string, which is what is inserted into the database.

Try this
%in = ReadParse();
but i doubt that. Are you trying to get query parameters or something?

Related

What happens if I reference a package but don't use/require it?

As much as I can (mostly for clarity/documentation), I've been trying to say
use Some::Module;
use Another::Module qw( some namespaces );
in my Perl modules that use other modules.
I've been cleaning up some old code and see some places where I reference modules in my code without ever having used them:
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
So my questions are:
Is there a performance impact (I'm thinking compile time) by not stating use x and simply referencing it in the code?
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
What's the difference between calling functions in example 1's style versus example 2's style?
I would say that this falls firmly into the category of preemptive optimisation and if you're not sure, then leave it in. You would have to be including some vast unused libraries if removing them helped at all
It is typical of Perl to hide a complex issue behind a simple mechanism that will generally do what you mean without too much thought
The simple mechanisms are these
use My::Module 'function' is the same as writing
BEGIN {
require My::Module;
My::Module->import( 'function' );
}
The first time perl successfully executes a require statement, it adds an element to the global %INC hash which has the "pathified" module name (in this case, My/Module.pm) for a key and the absolute location where it found the source as a value
If another require for the same module is encountered (that is, it already exists in the %INC hash) then require does nothing
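The mechanism above can be observed directly. This is a small sketch using the core module List::Util purely as an example; any installed module would behave the same way.

```perl
use strict;
use warnings;

# Load a core module at run time; require records its path in %INC.
require List::Util;
my $first_path = $INC{'List/Util.pm'};

# A second require finds the %INC entry and does nothing more.
require List::Util;
my $second_path = $INC{'List/Util.pm'};

# import is an ordinary class-method call: the other half of `use`.
List::Util->import('sum');

# Parentheses are needed because sum() was not known at compile time.
my $total = sum(1, 2, 3);

print "loaded from $first_path\n";
print "total = $total\n";
```

Because the require and import here run at run time rather than inside a BEGIN block, the imported name is not available to the parser, which is exactly the difference that `use` papers over.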
So your question
What happens if I reference a package but don't use/require it?
We're going to have a problem with use, utilise, include and reference here, so I'm code-quoting only use and require when I mean the Perl language words.
Keeping things simple, these are the three possibilities
As above, if require is seen more than once for the same module source, then it is ignored after the first time. The only overhead is checking to see whether there is a corresponding element in %INC
Clearly, if you use source files that aren't needed then you are doing unnecessary compilation. But Perl is damn fast, and you will be able to shave only fractions of a second from the build time unless you have a program that uses enormous libraries and looks like use Catalyst; print "Hello, world!\n";
We know what happens if you make method calls to a class library that has never been compiled. We get
Can't locate object method "new" via package "My::Class" (perhaps you forgot to load "My::Class"?)
If you're using a function library, then what matters is the part of use that says
My::Module->import( 'function' );
because the first part is require and we already know that require never does anything twice. Calling import is usually a simple function call, and you would be saving nothing significant by avoiding it
What is perhaps less obvious is that big modules pull in multiple subsidiary modules. For instance, if I write just
use LWP::UserAgent;
then it knows what it is likely to need, and these modules will also be compiled
Carp
Config
Exporter
Exporter::Heavy
Fcntl
HTTP::Date
HTTP::Headers
HTTP::Message
HTTP::Request
HTTP::Response
HTTP::Status
LWP
LWP::MemberMixin
LWP::Protocol
LWP::UserAgent
Storable
Time::Local
URI
URI::Escape
and that's ignoring the pragmas!
Did you ever feel like you were kicking your heels, waiting for an LWP program to compile?
I would say that, in the interests of keeping your Perl code clear and tidy, it may be an idea to remove unnecessary modules from the compilation phase. But don't agonise over it, and benchmark your build times before doing any pre-handover tidy. No one will thank you for reducing the build time by 20ms and then causing them hours of work because you removed a non-obvious requirement.
You actually have a bunch of questions.
Is there a performance impact (thinking compile time) by not stating use x and simply referencing it in the code?
No, there is no performance impact, because you can't do that. Every namespace you are using in a working program gets defined somewhere. Either you used or required it before the point where it's called, or one of your dependencies did, or another way (1) was used to make Perl aware of it.
Perl keeps track of those things in symbol tables. They hold all the knowledge about namespaces and variable names. So if your Some::Module is not in the referenced symbol table, Perl will complain.
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
There is no question here. But yes, you should not do that.
It's hard to say if this is a performance impact. If you have a large Catalyst application that just runs and runs for months it doesn't really matter. Startup cost is usually not relevant in that case. But if this is a cronjob that runs every minute and processes a huge pile of data, then an additional module might well be a performance impact.
That's actually also a reason why all use and require statements should be at the top. So it's easy to find them if you need to add or remove some.
What's the difference between calling functions in example 1's style versus example 2's style?
Those are for different purposes mostly.
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
This syntax is very similar to the following:
my $e = Yet::Another::Module::AFunction('Yet::Another::Module', $data)
It's used for class methods in OOP. The most well-known one would be new, as in Foo->new. It passes the thing in front of the -> to the function named AFunction in the package of the thing on the left (either if it's blessed, or if it's an identifier) as the first argument. But it does more. Because it's a method call, it also takes inheritance into account.
package Yet::Another::Module;
use parent 'A::First::Module';
1;
package A::First::Module;
sub AFunction { ... }
In this case, your example would also call AFunction because it's inherited from A::First::Module. In addition to the symbol table referenced above, it uses @ISA to keep track of who inherits from whom. See perlobj for more details.
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
This has a syntax error. There is a : missing after The.
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
This is a function call. It calls the function Here in the package Whats::The::Difference and passes $data and nothing else.
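The difference between the two call styles shows up as soon as inheritance is involved. The following is a sketch that reuses the package names from the examples above; setting @ISA by hand stands in for `use parent`, since both packages live in one file here.

```perl
use strict;
use warnings;

package A::First::Module;
sub AFunction { my ($class, $data) = @_; return "class=$class data=$data" }

package Yet::Another::Module;
our @ISA = ('A::First::Module');   # what `use parent` arranges for a loaded class

package main;

# Method call: the class name is passed as the first argument and @ISA is searched.
my $via_method = Yet::Another::Module->AFunction('x');

# Plain function call: no inheritance lookup, so this dies at run time.
my $via_function = eval { Yet::Another::Module::AFunction('x') };
my $error = $@;

print "$via_method\n";
print "function call failed: $error" unless defined $via_function;
```

The method call succeeds through the parent class, while the fully qualified function call fails because no sub of that name exists in Yet::Another::Module itself.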
Note that as Borodin points out in a comment, your function names are very atypical and confusing. Usually functions in Perl are written with all lowercase and with underscores _ instead of camel case. So AFunction should be a_function, and Here should be here.
1) for example, you can have multiple package definitions in one file, which you should not normally do, or you could assign stuff into a namespace directly with syntax like *Some::Namespace::frobnicate = sub {...}. There are other ways, but that's a bit out of scope for this answer.

Checking for existence of hash key creates key

Given the following code
#!/usr/bin/perl
use Data::Dumper;
my %hash;
my @colos = qw(ac4 ch1 ir2 ird kr3);
foreach my $colo (@colos) {
if(exists $hash{output}{$colo}) {
print "$colo is in the hash\n";
}
}
print Dumper(\%hash);
I have an empty hash that is created. I have an array with a few abbreviations in it. If I cycle through the array to see if these guys are in the hash, nothing is displayed to STDOUT which is expected but the $hash{output} is created for some reason. This does not make sense. All I am doing is an if exists. Where did I go wrong?
exists looks for a hash element in a given hash. Your code is automatically creating the hash %{ $hash{output} } and then checking whether an element with key $colo exists in that hash.
Try the following:
if(exists $hash{output}{$colo}) {
changed to
if(exists $hash{output} and exists $hash{output}{$colo}) {
You can, of course, write a sub that is hiding that complexity from your code.
Perl creates it because exists tests the last key specified, it doesn't test recursively. It should not get created if you instead do:
if( exists $hash{output} && exists $hash{output}{$colo} ) {
However, why do you need the additional key at all? Why not just $hash{$colo}? Also, if you use strict you'd get a warning about an uninitialized value in $hash.
You've already got a couple good answers, but, if you want to read more about this behavior, it's normally called "autovivification" and there is a CPAN module available to disable it if you would prefer that it doesn't happen.
The actual code/hash is more complex. The hash is: $rotation_hash{output}{oor}{$colo}{$type}{$hostname}{file}{$filename} = <html_status_code>
As others have stated, when you ask about the existence of $foo{bar}{fubar}, Perl automatically creates $foo{bar} in order to test whether $foo{bar}{fubar} exists. If you want to prevent this, you have to test to see if $foo{bar} exists, and if it does, then test if $foo{bar}{fubar} exists.
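For a deep structure like the one above, the level-by-level test can be wrapped in a sub. This is a minimal sketch; exists_path is a hypothetical helper name, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the keys one level at a time, so no
# intermediate hash is created as a side effect of the test.
sub exists_path {
    my ($node, @keys) = @_;
    for my $key (@keys) {
        return 0 unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return 1;
}

my %hash;

my $found           = exists_path(\%hash, 'output', 'ac4');
my $keys_after_safe = keys %hash;        # no keys were created

my $naive            = exists $hash{output}{ac4};   # autovivifies $hash{output}
my $keys_after_naive = keys %hash;                  # 'output' now exists

print "safe check left $keys_after_safe key(s), naive check left $keys_after_naive\n";
```

The helper returns early at the first missing level, which is the same short-circuit as chaining exists tests with && but works for any depth.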
However, what caught my eye was your seven-layer hash. When your data structures start to get that complex, you should really be using Perl object-oriented coding. I know a lot of people are scared off by Perl object-oriented programming, but Perl is probably one of the easiest languages for picking up OOP.
If for nothing else, you use OOP for the same reason you use use strict;. When I use strict;, Perl will easily pickup where I used $foobar as a variable in one place, but then refer to it as $fubar in another place. You lose that protection with complex data structures. For example, you might put $rotation_hash{output}{oor} in one place, but $rotation_hash{oor}{output} in another place, and use strict won't catch that. But, if you declare objects via package, and use subroutines as methods and constructors, you gain that back.
Object oriented design will also help you eliminate the need to track your data structure. The objects handle these for you, and you can concentrate on your coding. And, you don't have to create multiple files. You can simply attach the object definitions on the bottom of your file.
There are some excellent tutorials included in the Perl documentation. If you're not familiar with OOP Perl, you should go through the tutorials and give it a try.

How do Perl method attributes work?

A little known built-in Perl feature is attributes. However, the official documentation is doing a rather bad job introducing newbies to the concept. At the same time, frameworks like Catalyst use attributes extensively which seems to make many things easier there. Since using something without knowing the implications sucks a bit, I'd like to know the details. Syntax-wise they look like Python's decorators, but the documentation implies something simpler.
Could you explain (with real-world examples if possible) what attributes are good for and what happens behind the doors?
You are right, the documentation is not very clear in this area, especially since attributes are not so complicated. If you define a subroutine attribute, like this:
sub some_method :Foo { }
While compiling your program (this is important), Perl will look for the magic sub MODIFY_CODE_ATTRIBUTES in the current package or any of its parent classes. This will be called with the name of the current package, a reference to your subroutine, and a list of the attributes defined for this subroutine. If this handler does not exist, compilation will fail.
What you do in this handler is entirely up to you. Yes, that's right. No hidden magic whatsoever. If you want to signal an error, returning the name of the offending attributes will cause the compilation to fail with an "invalid attribute" message.
There is another handler called FETCH_CODE_ATTRIBUTES that will be called whenever someone says
use attributes;
my @attrs = attributes::get(\&some_method);
This handler gets passed the package name and subroutine reference, and is supposed to return a list of the subroutine's attributes (though what you really do is again up to you).
Here is an example to enable simple "tagging" of methods with arbitrary attributes, which you can query later:
package MyClass;
use Scalar::Util qw( refaddr );
my %attrs; # package variable to store attribute lists by coderef address
sub MODIFY_CODE_ATTRIBUTES {
my ($package, $subref, @attrs) = @_;
$attrs{ refaddr $subref } = \@attrs;
return;
}
sub FETCH_CODE_ATTRIBUTES {
my ($package, $subref) = @_;
my $attrs = $attrs{ refaddr $subref };
return @$attrs;
}
1;
Now, in MyClass and all its subclasses, you can use arbitrary attributes, and query them using attributes::get():
package SomeClass;
use base 'MyClass';
use attributes;
# set attributes
sub hello :Foo :Bar { }
# query attributes
print "hello() in SomeClass has attributes: ",
join ', ', attributes::get(SomeClass->can('hello'));
1;
__END__
hello() in SomeClass has attributes: Foo, Bar
In summary, attributes don't do very much which on the other hand makes them very flexible: You can use them as real "attributes" (as shown in this example), implement something like decorators (see Mike Friedman's article), or for your own devious purposes.
Attributes are one of the things that if you don't know how to use them, you shouldn't bother with them. I once made a database_method attribute, to indicate to the system that a record set would be requested before entering this method and that the method knew its main inputs would come from the stored procedure it corresponded to.
I was using attributes to wrap the actual, specified actions with that data. So one of the really seemingly useful ideas is to wrap methods with indirection, but it was harder to make caller work, without overriding it. In the end it was much too visible as an "expert-only" feature and would have required support to trace through the arcane innards--something you want to avoid, if you write Perl in a perl-also shop.
I take from the article cited by the other answer:
Caveats
Although this is a powerful technique, it isn't perfect. The code will not properly wrap anonymous subroutines, and it won't necessarily propagate calling context to the wrapped functions. Further, using this technique will significantly increase the number of subroutine dispatches that your program must execute during runtime. Depending on your program's complexity, this may significantly increase the size of your call stack. If blinding speed is a major design goal, this strategy may not be for you.
These are significant drawbacks unless you're willing to override caller. I don't care about "blinding speed" quite as much, and I'm half-willing to try my hand at overriding caller to bypass any subroutine that registers itself as "DO_NOT_REPORT" -- but I have some coding foolhardiness that hasn't yet been beaten out of me, too.
Even the article admits how ill-documented this feature is, and contains this caveat. Tell me when else it has been a good idea to use a snazzy, obscure feature? Often enough, people end up putting the handlers in the UNIVERSAL namespace to avoid the inheritance issue.

How can I mock %ENV in Perl tests?

I'm trying to retrofit some tests using Test::More to legacy code and I've bumped into a bit of a snag. I don't seem to be able to set %ENV in the test module. The called function definitely uses this variable so %ENV doesn't seem to be carried across to the test object.
#!/usr/bin/perl
use strict; use warnings;
use Test::More qw(no_plan);
BEGIN {
$ENV{HTTP_WWW_AUTHENTICATE} =
'WWW-Authenticate: MyType realm="MyRealm",userid="123",password="abc"';
use_ok('Util');
}
$ENV{HTTP_WWW_AUTHENTICATE} =
'WWW-Authenticate: MyType realm="MyRealm",userid="123",password="abc"';
printf qq{get_authentication_info = "%s"\n}, get_authentication_info();
ok(get_authentication_info(), 'Get authentication info');
I keep getting...
perl t\Util.t
ok 1 - use Util;
Use of uninitialized value in concatenation (.) or string at t\Util.t line 14.
get_authentication_info = ""
As with all things Perl, I'm pretty sure that some one has done this before.
UPDATE: Thanks to all for your help
The problem was between the keyboard & chair ... My test data was just plain wrong
It needed to be
$ENV{HTTP_WWW_AUTHENTICATE} =
'MyType realm="MyRealm",userid="123",password="abc"';
As Sinan said, the $ENV{...} lines are commented out, so it can't work. But if you want really testable code, I'd suggest making the get_authentication_info function take a hash as an argument. That way you can test it without setting the global variable, and in the real code you can pass the real environment hash. Global state will always become a problem eventually.
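A sketch of what passing the environment in could look like. The body of get_authentication_info here is purely hypothetical (the real function in Util was never shown); only the shape of the interface matters.

```perl
use strict;
use warnings;

# Hypothetical version of get_authentication_info that receives the
# environment as a hash reference instead of reading %ENV itself.
sub get_authentication_info {
    my ($env) = @_;
    my $header = $env->{HTTP_WWW_AUTHENTICATE};
    return unless defined $header;

    # Collect the key="value" pairs from the header into a hash.
    my %info = $header =~ /(\w+)="([^"]*)"/g;
    return \%info;
}

# In a test, hand in whatever environment the case needs.
my $info = get_authentication_info({
    HTTP_WWW_AUTHENTICATE => 'MyType realm="MyRealm",userid="123",password="abc"',
});
my $missing = get_authentication_info({});

print "userid = $info->{userid}\n";
```

Production code would call get_authentication_info(\%ENV), so the function never touches global state on its own.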
What does
get_authentication_info()
return?
My guess is nothing.
If this is always true, then line 14 will always return the "Use of uninitialized value..." warning.
If you expect a value, you need to investigate why get_authentication_info() is failing?
Agreed with Lukáš -- get your global environment (and perform validity checks etc) all in one place, such as in its own method, and then pass those values to all other methods that need it. That way in your unit tests you can just drop in a replacement method that determines the environment and config variables in a different way (such as from a file, or directly set at the top of your test script).
Why are the lines setting $ENV{HTTP_WWW_AUTHENTICATE} commented out?
Also, what are the specs for get_authentication_info()?
Try setting the env variable before BEGIN.
If not try this:
First, go to a command prompt and set the env var there. Then run your script. If the tests pass, then, as you predicted, the problem is with setting the env var.
If the tests fail, then the problem lies some where else (probably in get_authentication_info).

Is there a cron-like service written in Perl?

I am mulling over a web-app in Perl that will allow users to create bug monitors. So essentially each "bug watch" will be a bug ID that will be passed to a subroutine along with the "sleep" time, and once the "sleep time" is over it must recur without blocking the parent process or the peer processes.
I tried Schedule::Cron. It supports cron-like format but here the arguments to the subs must be simple scalars hence I ruled it out.
POE/Coro seem to be other options, but I don't know much about them. :(
Any insights ? TIA
-Matt.
What's wrong with Schedule::Cron? You can make any subroutine reference that you like, so you can make closures that refer to the extra or specific data you need. You don't have to rely on the argument list. Was there something else about the module that didn't work for you?
I tried Schedule::Cron. It supports cron-like format but here the arguments to the subs must be simple scalars hence I ruled it out.
The Schedule::Cron documentation says that arguments is a "Reference to array containing arguments to be used when calling the subroutine". Pass a reference to a named array of your arguments, if you wish. Since the cron entry holds a reference to @data, you can add or remove @data elements in your code as needed.
$cron->add_entry(
'* * * * *',
subroutine => \&mysub,
arguments => \@data,
);
You can also use a closure, as Brian suggested:
my $var = 42;
my @arr = get_stuff();
$cron->add_entry(
'* * * * *',
sub { mysub($var, @arr) },
);
See the perlref man page for more information on closures.
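The closure idea can be tried without any scheduler at all. In this sketch make_bug_watch is a hypothetical factory and the bug IDs and intervals are made-up values; each returned sub carries its own data, so whatever calls it needs to pass no arguments.

```perl
use strict;
use warnings;

# Hypothetical factory: each call returns a closure that remembers its
# own bug ID, interval, and run counter.
sub make_bug_watch {
    my ($bug_id, $interval) = @_;
    my $runs = 0;
    return sub {
        $runs++;
        return "bug $bug_id checked (run $runs, every ${interval}s)";
    };
}

my $watch_a = make_bug_watch(101, 60);
my $watch_b = make_bug_watch(202, 300);

# Calling the closures by hand here; a scheduler would do this instead.
my @log = ($watch_a->(), $watch_a->(), $watch_b->());
print "$_\n" for @log;
```

A code reference built this way could be handed to add_entry in place of the anonymous sub in the example above, which sidesteps the question of what the arguments list may contain.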
If you do decide to look into Coro, then it might be worth having a look at Continuity, as this is a web library/framework built around Coro.
Also take a look at Squatting web microframework which "squats" on top of Continuity by default. The Squatting distro comes with some examples of using Coro::Event.
@(brian d foy): The reasons why I think Schedule::Cron isn't good for me:
1: $cron->add_entry doesn't seem to provide me an option to pass @arrays/$vars to the subs.
$cron->add_entry("$temp",{'subroutine' => \&test1,'arguments' => \@array}); is not allowed.
2: I am not sure if there is a way to add a new cron entry after $cron->run(detach=>1); has been fired without restarting the script.