How can I store a file handle in a Perl object, and how can I access the result?

I wanted to store a file handle in a Perl Object. Here is how I went about it.
sub openFiles {
my $self = shift;
open (my $itemsFile, "<", "items.txt") or die $!;
open (my $nameFile, "<", "FullNames.txt") or die $!;
$self->{itemsFile} = $itemsFile;
$self->{nameFile} = $nameFile;
return $self;
}
Then I'm looking to access some information from one of these files. Here is how I go about it.
sub getItemDescription {
my $self = @_;
chomp(my $record = $self->{itemsFile});
return $record;
}
I attempt to access it in another procedure as follows:
print "Test 3: $self->getItemDescription()\n";
My questions are as follows:
Is the way I'm saving the file handle in the object correct? If not, how is it wrong?
Is the way I'm reading the lines of the file correct? If not, how can I get it right?
Finally, is the way I'm printing the returned object correct?
This is really important to me. If there is any way that I can improve the structure of my code, i.e. making a global variable for file handling or changing the structure of the object, please let me know.

Is the way I'm saving the file handle in the object correct?
Yes.
Is the way I'm reading the lines of the file correct?
No. That just assigns the file handle. One reads a line from the file using the readline operator.
One would normally use the <...> syntax of the readline operator, but <...> is a shortcut for both readline(...) and glob(qq<...>), and Perl thinks <$self->{itemsFile}> is short for glob(qq<$self->{itemsFile}>). You have to use readline specifically:
my $record = readline($self->{itemsFile});
chomp($record) if defined($record);
or do some extra work
my $fh = $self->{itemsFile};
my $record = <$fh>;
chomp($record) if defined($record);
(Note that I don't call chomp unconditionally since readline/<> can return undef.)
Finally, is the way I'm printing the returned object correct?
I presume you mean returned string, as in the string returned by getItemDescription. The catch is, you never actually call the method. ->getItemDescription() has no meaning in double quoted string literals, even after a variable. You need to move $self->getItemDescription() out of the double quotes.
You also fail to check if you've reached the end of the file.
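Putting the three points together, here is a minimal sketch rather than a definitive implementation. The package name ItemReader, the constructor, and the in-memory file standing in for items.txt are assumptions made so the example runs on its own; the readline call, the defined check, and calling the method outside the double quotes are the parts that matter.

```perl
use strict;
use warnings;

package ItemReader;    # hypothetical class name, not from the question

sub new {
    my ($class, %args) = @_;
    # Store the already-opened file handle in the object.
    return bless { itemsFile => $args{itemsFile} }, $class;
}

sub getItemDescription {
    my ($self) = @_;                              # list assignment: first element of @_
    my $record = readline($self->{itemsFile});    # <...> would be parsed as glob here
    chomp($record) if defined $record;            # readline returns undef at end of file
    return $record;
}

package main;

# An in-memory file stands in for items.txt (assumption for the demo).
my $items = "hammer\nnails\n";
open my $itemsFile, '<', \$items or die "Can't open in-memory file: $!";

my $reader = ItemReader->new(itemsFile => $itemsFile);

my @records;
while (defined(my $record = $reader->getItemDescription())) {
    push @records, $record;
    # The method call sits outside the double quotes.
    print "Test 3: " . $record . "\n";
}
```

The loop stops when getItemDescription returns undef, which is how the end of the file is detected.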

You are close.
To read a record (line) from a filehandle, you use the builtin readline function or the <...> operator AFTER you assign the filehandle to a "simple scalar" (see edit below).
chomp(my $record = readline( $self->{itemsFile} ));
or
my $fh = $self->{itemsFile};
chomp(my $record = <$fh>);
There is also a bug in your getItemDescription method. You'll want to say
my ($self) = @_;
instead of
my $self = @_;
The latter call is a scalar assignment of an array, which resolves to the length of the array, not the first element of the array.
EDIT: <$self->{itemsFile}> and <{$self->{itemsFile}}> do not work, as perlop explains:
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even <$x > (note the extra space) is treated as glob("$x "), not readline($x).

The openFiles piece is correct.
The errors occur primarily in the getItemDescription method.
First, as previously mentioned, my $self = @_; should be my ($self) = @_;.
However, the crux of the question is solved in the following fashion:
Change chomp(my $record = $self->{itemsFile}); to two lines:
my $file1 = $self->{itemsFile};
chomp(my $record = <$file1>);
To clarify, you must use a simple scalar variable inside the angle brackets (in my experience; I tried all the solutions suggested).
Finally, see the last two paragraphs in ikegami's answer.

Related

using file handle returned by select

I am pulling out my hair on using the file handle returned by select.
The documentation about select reads:
select
Returns the currently selected filehandle.
I have a piece of code, that prints some data and usually is executed without any re-direction. But there is one use case, where select is used to re-direct the print output to a file.
In this piece of code, I need to use the current selected file handle. I tried the following code fragment:
my $fh = select;
print $fh "test\n";
I wrote a short test program to demonstrate my problem:
#!/usr/bin/perl
use strict;
use warnings;
sub test
{
my $fh=select;
print $fh "@_\n";
}
my $oldfh;
# this works :-)
open my $test1, "> test1.txt";
$oldfh = select $test1;
test("test1");
close select $oldfh if defined $oldfh;
#this doesn't work. :-(
# Can't use string ("main::TEST2") as a symbol ref while "strict refs" in use
open TEST2,">test2.txt";
$oldfh = select TEST2;
test("test2");
close select $oldfh if defined $oldfh;
#this doesn't work, too. :-(
# gives Can't use string ("main::STDOUT") as a symbol ref while "strict refs" in use at
test("test");
It seems, that select is not returning a reference to the file handle but a string containing the name of the file handle.
What do I have to do to always get a usable file handle from select's return value?
P.S. I need to pass this file handle as OutputFile to XML::Simple's XMLout().
Just use
print XMLout(...);
It seems, that select is not returning a reference to the file handle but a string containing the name of the file handle.
It can indeed return a plain ordinary string.
>perl -MDevel::Peek -E"Dump(select())"
SV = PV(0x6cbe38) at 0x260e850
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x261ce48 "main::STDOUT"\0
CUR = 12
LEN = 24
But that's perfectly acceptable as a file handle to Perl. There are four things that Perl accepts as file handles:
A reference to an IO object.
>perl -e"my $fh = *STDOUT{IO}; CORE::say($fh 'foo');"
foo
A glob that contains a reference to an IO object.
>perl -e"my $fh = *STDOUT; CORE::say($fh 'foo');"
foo
A reference to a glob that contains a reference to an IO object.
>perl -e"my $fh = \*STDOUT; CORE::say($fh 'foo');"
foo
A "symbolic reference" to a glob that contains a reference to an IO object.
>perl -e"my $fh = 'STDOUT'; CORE::say($fh 'foo');"
foo
This type doesn't work under strict refs, though.
>perl -Mstrict -e"my $fh = 'STDOUT'; CORE::say($fh 'foo');"
Can't use string ("STDOUT") as a symbol ref while "strict refs" in use at -e line 1.
What do I have to do to always get a usable file handle from select's return value?
As demonstrated above, it already returns a perfectly usable file handle. If XMLout doesn't support it, then it's a bug in XMLout. You could work around it as follows:
my $fh = select();
if (!ref($fh) && ref(\$fh) ne 'GLOB') {
no strict qw( refs );
$fh = \*$fh;
}
This can also be used to make the handle usable in a strict environment.
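As a rough sketch of how that workaround could be wrapped up for the test program in the question: the helper name selected_handle and the in-memory file are assumptions for illustration only. The conversion itself is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the currently selected handle in a form
# that can be used with print under "strict refs".
sub selected_handle {
    my $fh = select();
    if (!ref($fh) && ref(\$fh) ne 'GLOB') {
        no strict qw( refs );
        $fh = \*$fh;    # turn a name such as "main::STDOUT" into a glob reference
    }
    return $fh;
}

sub test {
    my $fh = selected_handle();
    print $fh "@_\n";
}

# Redirect the default output to an in-memory file (assumption for the demo).
my $buffer = '';
open my $mem, '>', \$buffer or die "Can't open in-memory file: $!";
my $oldfh = select $mem;
test("test1");
select $oldfh;
close $mem;

# With no redirection, select() returns the string "main::STDOUT".
test("test");
print "captured: $buffer";
```

The no strict qw( refs ) is limited to the single block that needs it, so the rest of the program keeps strict refs enabled.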
As bad as XML::Simple is at reading XML, it's a million times worse at generating it. See Why is XML::Simple Discouraged?.
Consider XML::LibXML or XML::Twig if you're modifying XML.
Consider XML::Writer if you're generating XML.
The point of select is you don't need to specify the handle at all, since it's the default one.
sub test {
print "@_\n";
}
That's also the reason why select isn't recommended: it introduces global state which is hard to track and debug.
First of all, you shouldn't use XML::Simple , because it will need lots of work to make sure that your output will generate consistent XML. At least make sure you're using the appropriate ForceArray parameters.
Instead of doing filehandle shenanigans, why don't you use the simpler
print XMLout($data, %options);
... instead of trying to pass a default filehandle around?

Data::Dumper wraps second word's output

I'm experiencing a rather odd problem while using Data::Dumper to try and check on my importing of a large list of data into a hash.
My Data looks like this in another file.
##Product ID => Market for product
ABC => Euro
XYZ => USA
PQR => India
Then in my script, I'm trying to read in my list of data into a hash like so:
open(CONFIG_DAT_H, "<", $config_data);
while(my $line = <CONFIG_DAT_H>) {
if($line !~ /^\#/) {
chomp($line);
my @words = split(/\s*\=\>\s/, $line);
%product_names->{$words[0]} = $words[1];
}
}
close(CONFIG_DAT_H);
print Dumper (%product_names);
My parsing is working for the most part that I can find all of my data in the hash, but when I print it using the Data::Dumper it doesn't print it properly. This is my output.
$VAR1 = 'ABC';
';AR2 = 'Euro
$VAR3 = 'XYZ';
';AR4 = 'USA
$VAR5 = 'PQR';
';AR6 = 'India
Does anybody know why the Dumper is printing the '; characters over the first two letters on my second column of data?
There is one unclear thing in the code: is *product_names a hash or a hashref?
If it is a hash, you should use $product_names{key} syntax, not %product_names->{key}, and you need to pass a reference to Data::Dumper, so Dumper(\%product_names).
If it is a hashref then it should be labelled with the correct sigil, so $product_names->{key} and Dumper($product_names).
As noted by mob, if your input ends in anything other than \n it needs to be cleaned up more explicitly, say with s/\s*$// per the comment. See the answer by ikegami.
I'd also like to add that the loop can be simplified by losing the if branch
open my $config_dat_h, "<", $config_data or die "Can't open $config_data: $!";
while (my $line = <$config_dat_h>)
{
next if $line =~ /^\#/; # or /^\s*\#/ to account for possible spaces
# ...
}
I have changed to the lexical filehandle, the recommended practice with many advantages. I have also added a check for open, which should always be in place.
Hmm... this appears wrong to me, even if you're using Perl6:
%product_names->{$words[0]} = $words[1];
I don't know Perl6 very well, but in Perl5 the element access should look like the line below, assuming %product_names exists and is declared:
$product_names{...} = ... ;
If you could expose the full code, I can help to solve this problem.
The file uses CR LF as line endings. This would become evident by adding the following to your code:
local $Data::Dumper::Useqq = 1;
You could convert the file to use unix line endings (seeing as you are on a unix system). This can be achieved using the dos2unix utility.
dos2unix config.dat
Alternatively, replace
chomp($line);
with the more flexible
$line =~ s/\s+\z//;
Note: %product_names->{$words[0]} makes no sense. It happens to do what you want in old versions of Perl, but it rightfully throws an error in newer versions. $product_names{$words[0]} is the proper syntax for accessing the value of an element of a hash.
Tip: You should be using print Dumper(\%product_names); instead of print Dumper(%product_names);.
Tip: You might also find local $Data::Dumper::Sortkeys = 1; useful. Data::Dumper has such bad defaults :(
Tip: Using split(/\s*=>\s*/, $line, 2) instead of split(/\s*=>\s*/, $line) would permit the value to contain =>.
Tip: You shouldn't use global variables without reason. Use open(my $CONFIG_DAT_H, ...) instead of open(CONFIG_DAT_H, ...), and replace other instances of CONFIG_DAT_H with $CONFIG_DAT_H.
Tip: Using next if $line =~ /^#/; would avoid a lot of indenting.
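Taken together, the tips above might look like the following sketch. The in-memory string with CR LF line endings stands in for the real config file, and the variable names $id and $market are made up for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

local $Data::Dumper::Useqq    = 1;    # make stray characters such as \r visible
local $Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# Stand-in for the config file, deliberately using CR LF line endings.
my $config_text = "##Product ID => Market for product\r\n"
                . "ABC => Euro\r\n"
                . "XYZ => USA\r\n"
                . "PQR => India\r\n";
open my $config_fh, '<', \$config_text
    or die "Can't open in-memory config: $!";

my %product_names;
while (my $line = <$config_fh>) {
    next if $line =~ /^#/;     # skip comment lines
    $line =~ s/\s+\z//;        # strips \n, \r\n and other trailing whitespace
    my ($id, $market) = split /\s*=>\s*/, $line, 2;
    $product_names{$id} = $market;
}
close $config_fh;

print Dumper(\%product_names);    # pass a reference, not the flattened hash
```

With Useqq enabled, a leftover carriage return would show up as \r inside the dumped value instead of silently moving the cursor.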

Perl while loops not working

I'm quite new to perl and apologies if this has already been answered in a previous discussion. I have a script that needs to use the declared variables outside the loops, but only one loop is working, even though I have declared the variables outside of the loop, the code is:
my $sample;
open(IN, 'ls /*_R1_*.gz |');
while (my $sample = <IN>) {
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}"; #need to use fastq1 later on hence it's declared here
my $sample2;
open(IN, 'ls /*_R2_*.gz |');
while (my $sample2 = <IN>) {
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}"; #need to use fastq2 later on hence it's declared here
}
}
Sample2 works but sample1 does not, only the first sample is output and then the loop goes onto sample2, the output is:
sample =/sample1_R1_001.fastq.gz
sample2 =/sample1_R2_001.fastq.gz
sample2 =/sample2_R2_001.fastq.gz
sample2 =/sample3_R2_001.fastq.gz
etc..
Can anyone figure this out?
Thanks
From your comments, I assume that your problem is probably that you declare $fastq1 and $fastq2 inside the loop. That means they will be out of scope outside the loops, and not accessible. You need something like:
my ($fastq1, $fastq2);
while ( ... ) {
....
$fastq1 = $sample;
}
Note that this will only save the last value in the loop of that variable. The others will of course be overwritten each loop iteration. If you have more values to save, use an array or hash.
Some other notes on your code.
You should always use
use strict;
use warnings;
Not doing so is a very bad idea, as it will only hide the errors and warnings, not solve them.
my $sample;
You declare this variable twice.
open(IN, 'ls /*_R1_*.gz |');
This is just bad on all possible levels:
System calls are always the least desirable option, unless no alternatives exist
Perl has many ways of reading file names
Parsing the output of ls is fragile and not portable
Piping the result of the system command through open is compounding the other flaws with this approach.
Recommended solution: Use either opendir + readdir or glob:
for my $files (</*_R1_*.gz>) { ... }
# or
opendir my $dh, "/" or die $!;
while (my $file = readdir $dh) {
next unless $file =~ /_R1_.*\.gz$/;
...
}
my $fastq1 = "${sample}";
You do not need to quote a variable, nor do you need the curly braces here.
When you declare the variable with my inside a loop, it only retains its value for that single loop iteration. Since you never use this variable, I assume you meant to use it outside the loop. But it will be out of scope there.
This can be written
my $fastq1 = $sample;
But you probably want to declare those variables outside your while loops, or they will be out of scope there. You should know that this will only save the last value for these variables, of course.
Also, as Rohit says, your loops are nested, which I assume is not what you wanted. This is most likely because you do not use a proper text editor to write your code, so your indentation is all messed up, and it is hard to see where one loop ends. Follow Rohit's advice there.
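Here is one way the glob suggestion and the scoping advice could fit together, offered as a sketch only. The temporary directory and the empty sample files are created purely so the example runs anywhere; in the real script you would glob your own directory instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo setup (assumption): a scratch directory with empty sample files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(
    sample1_R1_001.fastq.gz sample1_R2_001.fastq.gz
    sample2_R1_001.fastq.gz sample2_R2_001.fastq.gz
)) {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $fh;
}

# Declared outside any loop, so both lists stay in scope afterwards.
# Arrays keep every file name, not just the last one seen.
my @fastq1 = glob("$dir/*_R1_*.gz");
my @fastq2 = glob("$dir/*_R2_*.gz");

print "sample = $_\n"  for @fastq1;
print "sample2 = $_\n" for @fastq2;
```

Two separate, un-nested passes avoid the problem of one loop reusing and exhausting the other loop's file handle. Note that glob splits its pattern on whitespace, so a directory name containing spaces would need extra care.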
You are closing the first while loop after the end of the second while loop. Because of that, your second while loop becomes part of your first while loop, in which you re-assign the file handle IN to a different file. And since you exhaust it in the inner while loop, your outer while loop never runs again.
You should close the brace before starting the next while:
while(my $sample = <IN>){
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}";
} # You need this
my $sample2;
open(IN, 'ls /data_n2/vmistry/Fluidigm_Exome/300bp_fastq/*_R2_*.gz |');
while(my $sample2 = <IN>){
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}";
}
# } # Remove this

What do these Perl variables mean?

I'm a little noobish to perl coding conventions, could someone help explain:
why are there / and /< in front of perl variables?
what do \= and =~ mean, and what is the difference?
why does the code require an ending / before the ;, e.g. /start=\'([0-9]+)\'/?
The first three sub-questions were more or less answered by the perldocs, but what does the following line in the code mean?
push(@{$Start{$start}},$features);
I understand that we are pushing $features into a @Start array, but what does @$Start{$start} mean? Is it the same as:
@Start = ($start);
Within the code there is something like this:
use FileHandle;
sub open_infile {
my $file = shift;
my $in = FileHandle->new($file,"<:encoding(UTF-8)")
or die "ERROR: cannot open $file: $!\n" if ($Opt_utf8);
$in = new FileHandle("$file")
or die "ERROR: cannot open $file: $!\n" if (!$Opt_utf8);
return $in;
}
$uamf = shift @ARGV;
$uamin = open_infile($uamf);
while (<$uamin>) {
chomp;
if(/<segment /){
/start=\'([0-9]+)\'/;
/end=\'([0-9]+)\'/;
/features=\'([^\']+)\'/;
$features =~ s/annotation;//;
push(@{$Start{$start}},$features);
push(@{$End{$end}},$features);
}
}
EDITED
So after some intensive reading of the Perl docs, here is what I've gathered:
The /<segment / is a regex check that tests whether the line read in while (<$uamin>) contains the string <segment followed by a space.
Similarly, /start=\'([0-9]+)\'/ has nothing to do with instantiating any variable; it's a regex check to see whether the line read in while (<$uamin>) matches start=\'([0-9]+)\', in which \'([0-9]+)\' matches a quoted numeric string.
In $features =~ s/annotation;// the =~ is used because it applies the regular-expression substitution to the string in $features. See What does =~ do in Perl?
Where did you see this syntax (or more to the point: have you edited stuff out of what you saw)? /foo/ represents the match operator using regular expressions, not variables. In other words, the first line is checking to see if the input string $_ contains the character sequence <segment.
The subsequent three lines essentially do nothing useful, in the sense that they run regular expression matches and then discard the results (there are side-effects, but subsequent regular expressions discard the side-effects, too).
The last line does a substitution, replacing the first occurrence of the characters annotation; with the empty string in the string $features.
Run the command perldoc perlretut to learn about regex in Perl.
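To make the pieces concrete, here is a small sketch of what the loop was presumably meant to do. The two sample lines are invented, and assigning the captures to $start, $end and $features is an assumption about the original intent, since the posted code never sets those variables. The push lines show what @{ $Start{$start} } means: $Start{$start} holds a reference to an array, and @{ ... } dereferences it so push can append to it. Perl creates that array automatically the first time a key is used.

```perl
use strict;
use warnings;

my (%Start, %End);

# Invented sample input in the shape the regexes expect.
my @lines = (
    "<segment start='10' end='20' features='annotation;noun'>",
    "<segment start='10' end='35' features='verb'>",
);

for (@lines) {
    next unless /<segment /;                       # match against $_

    # A match in list context returns its captures; keep them this time.
    my ($start)    = /start='([0-9]+)'/   or next;
    my ($end)      = /end='([0-9]+)'/     or next;
    my ($features) = /features='([^']+)'/ or next;

    $features =~ s/annotation;//;                  # =~ binds s/// to $features

    # %Start maps each start position to an array of feature strings.
    push @{ $Start{$start} }, $features;
    push @{ $End{$end} },     $features;
}

print "start 10: @{ $Start{10} }\n";    # prints "start 10: noun verb"
```

So it is not the same as @Start = ($start); there is no @Start array at all, only a hash %Start whose values are array references.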

Why can't I say print $somehash{$var}{fh} "foo"?

I have a line of code along the lines of:
print $somehash{$var}{fh} "foo";
The hash contains the filehandle a few levels down. The error is:
String found where operator expected at test.pl line 10, near "} "foo""
I can fix it by doing this:
my $fh = $somehash{$var}{fh};
print $fh "foo";
...but is there a one-liner?
see http://perldoc.perl.org/functions/print.html
Note that if you're storing
FILEHANDLEs in an array, or if you're
using any other expression more
complex than a scalar variable to
retrieve it, you will have to use a
block returning the filehandle value
instead: ...
So, in your case, you would use a block like this:
print { $somehash{$var}{fh} } "foo";
If you have anything other than a simple scalar as your filehandle, you need to wrap the reference holding the filehandle in braces so Perl knows how to parse the statement:
print { $somehash{$var}{fh} } $foo;
Part of Perl Best Practices says to always wrap filehandles in braces just for this reason, although I don't get that nutty with it.
The syntax is odd because print is an indirect method on a filehandle object:
method_name Object @arguments;
You might have seen this in old-school CGI.pm. Here are two indirect method calls:
use CGI;
my $cgi_object = new CGI 'cat=Buster&bird=nightengale';
my $value = param $cgi_object 'bird';
print "Indirect value is $value\n";
That almost works fine (see Schwern's answer about the ambiguity) as long as the object is in a simple scalar. However, if I put the $cgi_object in a hash, I get the same syntax error you got with print. I can put the braces around the hash access to make it work out. Continuing with the previous code:
my %hash;
$hash{animals}{cgi} = $cgi_object;
# $value = param $hash{animals}{cgi} 'cat'; # syntax error
$value = param { $hash{animals}{cgi} } 'cat';
print "Braced value is $value\n";
That's all a bit clunky so just use the arrow notation for everything instead:
my $cgi_object = CGI->new( ... );
$cgi_object->param( ... );
$hash{animals}{cgi}->param( ... );
You can do the same with filehandles, although you have to use the IO::Handle module to make it all work out:
use IO::Handle;
STDOUT->print( 'Hello World' );
open my( $fh ), ">", $filename or die ...;
$fh->print( ... );
$hash{animals}{fh} = $fh;
$hash{animals}{fh}->print( ... );
The above answers are all correct. The reason they don't allow a full expression in there is that print FH LIST is already pretty weird syntax. To put anything more complicated in there would introduce a ton of ambiguous syntax. The block removes that ambiguity.
To see where this madness leads to, consider the horror that is indirect object syntax.
foo $bar; # Is that foo($bar) or $bar->foo()? Good luck!
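For comparison, here are the two one-liners from the answers side by side in a runnable sketch. The in-memory file and the hash key log are assumptions so that the output can be inspected without touching the disk.

```perl
use strict;
use warnings;
use IO::Handle;

# In-memory file used as the output target (assumption for the demo).
my $buffer = '';
open my $fh, '>', \$buffer or die "Can't open in-memory file: $!";

my %somehash = ( log => { fh => $fh } );
my $var = 'log';

# Block form: the block returns the file handle for print to use.
print { $somehash{$var}{fh} } "foo\n";

# Method form: arrow notation, no special parsing rules needed.
$somehash{$var}{fh}->print("bar\n");

close $fh;
print $buffer;    # prints "foo" and "bar" on separate lines
```

Both forms write to the same handle; the method form tends to read more naturally once handles live inside nested data structures.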