Perl DATA filehandle is empty when read - perl

I have a Perl module with a template (for processing by the Template module) stored between the __DATA__ and __END__ keywords at the end of the file. When attempting to generate a file using the template, the resulting file comes out empty with no warnings or errors output. After debugging, I found that the DATA filehandle is actually empty before it is passed to the Template module.
A previous version of this module is able to correctly read the template from DATA, but none of the changes that I have made should be affecting this part of the code. These changes consist of logic changes within completely separate functions and adding the following use statements to the module:
use DBI;
use DBI::Const::GetInfoType;
use Switch;
I have tried adding write permissions on the perl module (it was originally read-only) and removing the __END__ keyword as I found that wasn't necessary. Unfortunately the DATA filehandle still appears empty.
What kind of problems could cause the DATA filehandle to be empty, and do any of these problems apply to my situation? I am using perl v5.12.5.

The reason that the DATA filehandle is empty in this case is down to the use of the Switch module. This module works by using a source filter which is clobbering the DATA filehandle during the course of its processing.
Alternatives include using if-elsif-else or using the given-when construct, although this is an experimental feature so it may not behave the same in later versions of Perl.
EDIT: Here is a simple reproducer for the issue described above:
# use Switch;
while(<DATA>) {
print($_);
}
__DATA__
One line of data
Second line of data
Without the "use Switch", you'll see the lines printed out, but with it nothing is printed.
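As a rough sketch of the if-elsif-else alternative mentioned above, the following replaces a switch-style block without loading any source filter, so DATA is left alone. The subroutine names and the type strings are made up for illustration and are not from the original module.

```perl
use strict;
use warnings;

# Hypothetical example: pick a label for a value without "use Switch".
sub describe_type {
    my ($type) = @_;
    if    ($type eq 'int')     { return 'integer column' }
    elsif ($type eq 'varchar') { return 'text column' }
    else                       { return 'unknown column' }
}

# A hash lookup is another filter-free option when each case is a plain value.
my %label_for = (int => 'integer column', varchar => 'text column');

sub describe_type_by_lookup {
    my ($type) = @_;
    return exists $label_for{$type} ? $label_for{$type} : 'unknown column';
}

print describe_type('int'), "\n";
print describe_type_by_lookup('blob'), "\n";
```

Neither form touches the source text, which is the property that matters when the same file also carries a __DATA__ section.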

Related

Setting default encoding to utf-8 in perl conditionally on command-line option

In order to process text in utf-8 in Perl, I have been using binmode(<file-handle>, ":encoding(UTF-8)"); on each stream I use. I just discovered that
use open ( ":encoding(UTF-8)", ":std" );
can be used to do the same thing globally. This is great, since it means a lot less repetitive code.
But now I have a problem: I would like to have a command line option to my script, -utf8, which turns everything utf-8 only when supplied. Since use open is a pragma, it is lexically scoped and I cannot put it in an if statement, but without an if statement it cannot depend on command line options.
Here is a minimal example illustrating the problem, call it problem.pl
#!/usr/bin/env perl
# hard-coded in my minimal example, normally set by command line option -utf8
my $use_utf8 = 1;
# use only applies within its lexical scope - this does not work
if ($use_utf8) {
use open ( ":encoding(UTF-8)", ":std" );
}
# if I put it at the right lexical scope, it's not conditional on $use_utf8
# use open ( ":encoding(UTF-8)", ":std" );
while (<>) {
print length($_);
}
When I run this code on a file, call it input, containing one line with a 2-byte UTF-8 character, say à, it outputs 3:
$ ./problem.pl input
3
If I move the use open statement to the global scope, I get the expected results of a length of 2 (one character plus one newline):
$ ./problem.pl input
2
So how can I set the encoding to utf-8 globally, but conditionally on a command-line option, so that I would get 2 with -utf8 but 3 without?
Also, in my real use case, I use the diamond operator (while (<>)) to provide high flexibility in the command line syntax to process multiple files, but in this case I can't call binmode since the file handles are managed automatically by Perl. use open would be a much nicer option, if I could make it conditional.
PS: Yes, I really do still have non-utf8 data that I want to continue to be able to handle. Thank God most of our data is now in utf-8, but unfortunately not all of it yet.
First: you can use if to conditionally apply a lexical pragma. Just make sure the condition is available at compile time (you may need to use a BEGIN block before).
my $use_utf8;
BEGIN { $use_utf8 = 1; }
use if $use_utf8, 'open', ':std', ':encoding(UTF-8)';
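To drive the condition from the real command line rather than a hard-coded value, the BEGIN block can inspect @ARGV itself. This is a minimal sketch, assuming the option is spelled exactly -utf8 as in the question; line_lengths is a hypothetical helper that uses an explicit open so the effect of the pragma is easy to see.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $use_utf8;
BEGIN {
    # BEGIN runs at compile time, so $use_utf8 is set before the
    # "use if" line below is evaluated.
    $use_utf8 = grep { $_ eq '-utf8' } @ARGV;   # number of -utf8 flags (0 if absent)
    @ARGV     = grep { $_ ne '-utf8' } @ARGV;   # keep only the file names
}
use if $use_utf8, 'open', ':std', ':encoding(UTF-8)';

# Hypothetical helper: returns the length of each line in the named file.
# The open call sits in the lexical scope of the conditional pragma above.
sub line_lengths {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lengths = map { length } <$fh>;
    close $fh;
    return @lengths;
}

print "$_\n" for map { line_lengths($_) } @ARGV;
```

Run as ./problem.pl -utf8 input, this should count characters; without -utf8 it should count bytes. The question reports that the unconditional pragma also changed the result of its while (<>) loop, so the same use if line should carry over to that loop.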
Alternatively, the -C option works similarly to the open pragma for utf8 layers. -CSD will set it on the standard handles (S) and any handles opened (D). Unfortunately it uses the less safe :utf8 layer instead of :encoding(UTF-8), so you may end up with broken strings if you use it for input that is not actually UTF-8. Also, -CD sets a default for any handles opened in the whole program, not just the lexical scope of your script; this can break modules that don't expect it. (-CS is always global, as is the ':std' effect of the open pragma, since the standard handles are global.)
perl -CSD problem.pl input

How to override a subroutine such as `length` in Perl?

I would like to simply override the length subroutine to take into account ANSI escape sequences, so I wrote this:
sub length {
my $str = shift;
if ($cfg{color}) {
return length($str =~ s/\x1B\[\d+[^m]*m//gr);
}
return length($str);
}
Unfortunately, Perl detects the ambiguous call and resolves it to CORE::length.
How can I just tell Perl to use the local declaration instead?
Of course, an alternative solution would be to rename each call to length with ansi_length and rename the custom function accordingly.
To those who want more details:
The context where I would like to override the core function length is a short program that generates ASCII tables (a bit like Text::ASCIITable, but with different features like multicolumns and multirows). I don't want to write a dedicated Perl module because I would like to keep my program as monolithic as possible: the people who will use it are not familiar with CPAN or even module installation.
In this code, I need to know the width of each column in each row in order to align them properly. When a cell contains colored text with an ANSI sequence like ^[[33mgreen^[[0m, I need to ignore the coloring sequences.
As I already use UTF-8 characters in my program, I had to add this:
use utf8;
use open ':std', ':encoding(UTF-8)';
I noticed the utf8 module also overloads the core subroutine length. I realized this would also be a good solution in my case.
I think I have now added enough details to this question. I would be glad to know why it got downvotes; I don't think I can make it any clearer. I also think all these details are not useful at all for understanding the initial question...
Overwriting a core function is not a good idea. If you use a library that itself uses the core function, the library would be confronted with the overwritten function and may fail. You could create your own module/namespace, ANSI:: or so, and then use ANSI::length, but I think it is better to use a name like the one you proposed: ansi_length.
If you still insist:
You can overwrite the core function with
BEGIN {
*CORE::GLOBAL::length = sub ...
}
Whenever you need access to the original core function, use
CORE::length.
This works for most of Perl's built-in functions, although not every built-in can be overridden.
Here is a reference : http://perldoc.perl.org/CORE.html
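For the renaming approach recommended above, a minimal sketch of ansi_length could look like this. It assumes the color codes are plain SGR sequences (ESC, "[", digits and semicolons, "m"); other escape sequences would need a broader pattern. It strips the sequences on a copy instead of using the /r modifier, so it does not depend on Perl 5.14 or later.

```perl
use strict;
use warnings;

# Length of a string as displayed, ignoring ANSI SGR (color) sequences
# of the form ESC [ digits-and-semicolons m.
sub ansi_length {
    my ($str) = @_;
    (my $plain = $str) =~ s/\e\[[0-9;]*m//g;   # strip on a copy of the argument
    return length $plain;
}

print ansi_length("\e[33mgreen\e[0m"), "\n";   # prints 5
```

Because the sub has its own name, every call to the built-in length inside it (and inside any library) keeps its normal meaning.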

Special character in the file exit the while loop in Perl

I wrote a simple parser for a .txt file with the following instructions:
my $file2 = "test.txt";
open ($process, "<",$file2) or die "couldn't manage to open the file:$file2!";
while (<$process>)
{
...
}
In some files that I am trying to parse there is a special character that is like the right arrow (->) and that I don't manage to paste here from the file.
Every time the parser hits that character (->), it exits the file without processing it till the end.
Is there a way to avoid it and continue processing the file till the very end?
I am using perl 5.6.1 (I cannot use a newer one) and the files that I need to process might have these special characters.
Thanks for your help.
I don't think it's perl that's causing your problem, but almost certainly something in that middle missing block. Are you using eval in the while block on the input from the file? This is a minimal example that shows that the stream containing -> doesn't cause difficulties:
#!/usr/bin/env perl
use warnings;
use strict;
while(<DATA>) {
print "Data[$.]: $_";
}
__DATA__
this is some data
this is also some data
-> this looks fine
foo->dingle also looks fine
This produces:
$ perl ./foo.pl
Data[1]: this is some data
Data[2]: this is also some data
Data[3]: -> this looks fine
Data[4]: foo->dingle also looks fine
So, the -> characters in perl are special:
"-> " is an infix dereference operator, just as it is in C and C++. If
the right side is either a [...] , {...} , or a (...) subscript, then
the left side must be either a hard or symbolic reference to an array,
a hash, or a subroutine respectively. (Or technically speaking, a
location capable of holding a hard reference, if it's an array or hash
reference being used for assignment.) See perlreftut and perlref.
Otherwise, the right side is a method name or a simple scalar variable
containing either the method name or a subroutine reference, and the
left side must be either an object (a blessed reference) or a class
name (that is, a package name). See perlobj.
So if I had to guess, you're probably using eval to try to parse your content, and it's failing to dereference the left side of the operator and crashing. Please provide command line error messages or the code in the while loop if you want further assistance.

Having a perl script make use of one among several secondary scripts

I have a main program mytool.pl to be run from the command line. There are several auxiliary scripts special1.pl, special2.pl, etc. which each contain a couple of subroutines and a hash, all identically named across scripts. Let's suppose these are named MySpecialFunction(), AnotherSpecialFunction() and %SpecialData.
I'd like for mytool to include/use/import the contents of one of the special*.pl files, only one, according to a command line option. For example, the user will do:
bash> perl mytool.pl --specialcase=5
and mytool will use MySpecialFunction() from special5.pl, and ignore all other special*.pl files.
Is this possible and how to do it?
It's important to note that the selection of which special file to use is made at runtime, so adding a "use" at the top of mytool.pl probably isn't the right thing to do.
Note I am a long-time C programmer, not a perl expert; I may be asking something obvious.
This is for a one-off project that will turn to dust in only a month. Neither mytool.pl nor special?.pl (nor perl itself) will be of interest beyond the end of this short project. Therefore, we don't care for solutions that are elaborate or require learning some deep magic. Quick and dirty preferred. I'm guessing that Perl's module mechanism is overkill for this, but have no idea what the alternatives are.
You can use a hash or array to map values of specialcase to .pl files and require or do them as needed.
#!/usr/bin/env perl
use strict; use warnings;
my @handlers = qw(one.pl two.pl);
my ($case) = @ARGV;
$case = 0 unless defined $case;
# check that $case is within range
do $handlers[$case];
print special_function(), "\n";
When you use a module, Perl just requires the module in a BEGIN block (and imports the module's exported items). Since you want to choose which script to load at runtime, call require yourself.
if ($special_case_1) {
require 'special1.pl';
# and go about your business
}
Here's a good reference on when to use use vs. require.
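Putting the pieces together for the --specialcase option from the question, a quick-and-dirty version might look like the sketch below. It assumes the special*.pl files sit in the current directory, end with a true value such as 1;, and define MySpecialFunction as an ordinary package subroutine; special_file_for is a hypothetical helper added here for validation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;

# Map the option value to a file name, accepting only digits so that
# user input never ends up as an arbitrary path.
sub special_file_for {
    my ($case) = @_;
    die "--specialcase must be a number\n"
        unless defined $case && $case =~ /\A[0-9]+\z/;
    return "./special$case.pl";
}

my $case;
GetOptions('specialcase=i' => \$case)
    or die "Usage: $0 --specialcase=N\n";

if (defined $case) {
    my $file = special_file_for($case);
    require $file;                     # loaded at run time, only this one file
    print MySpecialFunction(), "\n";   # name resolved when this line executes
}
```

For %SpecialData to be visible in mytool.pl it has to be a package variable in the special file (declared with our, not my), and under use strict the main script also needs its own our %SpecialData; declaration.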

Perl: Encoding messed up after text concatenation

I have encountered a weird situation while updating/upgrading some legacy code.
I have a variable which contains HTML. Before I can output it, it has to be filled with lots of data. In essence, I have the following:
for my $line (@lines) {
$output = loadstuff($line, $output);
}
Inside of loadstuff(), there is the following
sub loadstuff {
my ($line, $output) = @_;
# here the process is simplified for better understanding.
my $stuff = getOtherStuff($line);
my $result = $output.$stuff;
return $result;
}
This function builds a page which consists of different areas. Each area is loaded independently, which is why there is a for loop.
Trouble starts right about here. When I load the page from ground up (click on a link, Perl executes and delivers HTML), everything is loaded fine. Whenever I load a second page via AJAX for comparison, that HTML has broken encoding.
I tracked down the problem to this line my $result = $output.$stuff. Before the concatenation, $output and $stuff are fine. But afterward, the encoding in $result is messed up.
Does somebody have a clue why concatenation messes up my encoding? While we are on the subject, why does it only happen when the call is done via AJAX?
Edit 1
The Perl and the AJAX call both execute the very same functions for building up a page. So, whenever I fix it for AJAX, it is broken for freshly reloaded pages. It really seems to happen only if AJAX starts the call.
The only difference in this particular case is that the current values for the page are compared with an older one (it is a backup/restore function). From here, everything is the same. The encoding in the variables (as far as I can tell) is ok. I even tried the Encode functions only on the values loaded from AJAX, but to no avail. The files themselves seem to be utf8 according to "Kate".
Besides that, I have another function with the same behavior which uses the EXACT same functions, values and files. When the call is started from Perl/Apache, the encoding is ok. Via AJAX, again, it is messed up.
I have been examining the AJAX request (jQuery) and could not find anything odd. The encoding seems to be utf8 too.
Perl has a “utf8” flag for every scalar value, which may be “on” or “off”. “On” state of the flag tells perl to treat the value as a string of Unicode characters.
If you take a string with utf8 flag off and concatenate it with a string that has utf8 flag on, perl converts the first one to Unicode. This is the usual source of problems.
You need to either convert both variables to bytes with Encode::encode() or to perl's internal format with Encode::decode() before concatenation.
See perldoc Encode.
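A small sketch of the effect described above, using made-up values in place of the real $output and $stuff: one string is still raw UTF-8 bytes, the other has already been decoded to characters. Concatenating them directly makes Perl treat each byte as a Latin-1 character, which shows up as double encoding on output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $output = "caf\xC3\xA9";   # undecoded UTF-8 bytes (5 bytes)
my $stuff  = "\x{263A}";      # already-decoded character string (1 character)

# Mixed concatenation: the bytes C3 A9 become two separate characters.
my $broken = encode('UTF-8', $output . $stuff);

# Decode the byte string first, concatenate characters, encode once at the end.
my $fixed  = encode('UTF-8', decode('UTF-8', $output) . $stuff);

print length($broken), "\n";   # prints 10
print length($fixed),  "\n";   # prints 8
```

The two extra bytes in the broken version are the re-encoded C3 and A9, which is the typical "Ã©" garbage seen in a browser.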
Expanding on the previous answer, here's a little more information that I found useful when I started messing with character encodings in Perl.
This is an excellent introduction to Unicode in perl: http://perldoc.perl.org/perluniintro.html. The section "Perl's Unicode Model" is particularly relevant to the issue you're seeing.
A good rule to use in Perl is to decode data to Perl characters on its way in and encode it into bytes on its way out. You can do this explicitly using Encode::encode and Encode::decode. If you're reading from or writing to a file handle, you can specify an encoding on the filehandle by using binmode and setting a layer: perldoc -f binmode
You can tell which of the strings in your example has been decoded into Perl characters using Encode::is_utf8:
use Encode qw( is_utf8 );
print is_utf8($stuff) ? 'characters' : 'bytes';
A colleague of mine found the answer to this problem. It really had something to do with the fact that AJAX started the call.
The file structure is as follows:
1 Handler, accessed by Apache
1 Handler, accessed by Apache, which only contains AJAX responders. We call it the AJAX-Handler
1 package, which contains functions relevant for the entire software and which accesses yet other packages from our own Framework
Inside of the AJAX-Handler, we print the result as such
sub handler {
my $r = shift;
# processing output
$r->print($output);
return Apache2::Const::OK;
}
Now, when I replace $r->print($output); by print($output);, the problem disappears! I know that this is not the recommended way to print stuff in mod_perl, but this seems to work.
Still, any ideas how to do this the proper way are welcome.