How to use variable instead of a file handle - perl

I have a big data file dump.all.lammpstrj which I need to split/categorize into a series of files, such as Z_1_filename, Z_2_filename, Z_3_filename etc. based on the coordinates in each record.
The coordinates are saved in a disordered way, so my program reads each line and determines which file this record should be sent to.
I use a variable, $filehandle = "Z_${i}_DUMP"
and I hope to open all of the possible files like this
for ( my $i = 1; $i <= 100; $i++ ) {
$filehandle = "Z_${i}_DUMP";
open $filehandle,'>', "Z_${i}_dump.all.lammpstrj.dat";
...
}
But when running my program, I get a message
Can't use string ("Z_90_DUMP") as a symbol ref while "strict refs" in use at ...
I don't want to scan all the data for each output file, because dump.all.lammpstrj is so big that a scan would take a long time.
Is there any way to use a defined variable as a file handle?

To give you an idea of how this might be done: put the file handles in a hash (or in an array, if they are indexed by numbers).
use strict;
use warnings;
my %fh; #file handles
open $fh{$_}, '>', "Z_${_}_dump.all.lammpstrj.dat" for 1..100; #open 100 files
for(1..10000){ #write 10000 lines in 100 files
my $random=int(1+rand(100)); #pick random file handle
print {$fh{$random}} "something $_\n";
}
close $fh{$_} for 1..100;

Don't assign anything to $filehandle or set it to undef before you call open(). You get this error because you have assigned a string to $filehandle (which is of no use anyway).
Also see "open" in perldoc:
If FILEHANDLE is an undefined scalar variable (or array or hash element), a new filehandle is autovivified, meaning that the variable is assigned a reference to a newly allocated anonymous filehandle. Otherwise if FILEHANDLE is an expression, its value is the real filehandle. (This is considered a symbolic reference, so use strict "refs" should not be in effect.)
To have more file handles at a time and to conveniently map them to the file names consider using a hash with the file name (or whatever identifier suits you) as key to store them in. You can check if the key exists (see "exists") and the value is defined (see "defined") to avoid reopening the file unnecessarily.
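A minimal sketch of that idea, assuming the records have already been reduced to a bin number plus a line of text. The handle_for helper, the @records sample data and the scratch directory are hypothetical and exist only for illustration. Each file is opened the first time its bin is seen and the handle is reused afterwards:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory so the sketch does not touch real data (an assumption).
my $dir = tempdir( CLEANUP => 1 );

my %fh;    # bin number => file handle

# Return the handle for a bin, opening its file only on first use.
sub handle_for {
    my ($bin) = @_;
    unless ( defined $fh{$bin} ) {
        my $name = "$dir/Z_${bin}_dump.all.lammpstrj.dat";
        # $fh{$bin} is undefined here, so open autovivifies a new handle
        open $fh{$bin}, '>', $name or die "Can't open $name: $!";
    }
    return $fh{$bin};
}

# Hypothetical records: [ bin number, text of the record ].
my @records = ( [ 3, "atom a" ], [ 1, "atom b" ], [ 3, "atom c" ] );
print { handle_for( $_->[0] ) } "$_->[1]\n" for @records;

close $fh{$_} or die "Can't close file for bin $_: $!" for keys %fh;
```

Only the bins that actually occur get a file, and the input is still read just once.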

I sincerely appreciate the help from Kjetil S. and sticky bit. I tested their suggestions and they work well. I also noticed that there is another way to write data to different files without changing the file handle: I reuse the same file handle and change the file name.
....
for ( my $i = 0; $i <= $max_number; $i++ )
{
$file="$i\_foo.dat";
open DAT,'>>',"$file";
......
}

Related

Call a subroutine defined as a variable

I am working on a program which uses different subroutines in separate files.
There are three parts
A text file with the name of the subroutine
A Perl program with the subroutine
The main program which extracts the name of the subroutine and launches it
The subroutine takes its data from a text file.
I need the user to choose the text file, the program then extracts the name of the subroutine.
The text file contains
cycle.name=cycle01
Here is the main program :
#!/usr/bin/perl -w
use strict;
use warnings;
use cycle01;
my $nb_cycle = 10;
# The user chooses a text file
print STDERR "\nfilename: ";
chomp(my $filename = <STDIN>);
# Extract the name of the cycle
my $cycleToUse;
open (my $fh, "<", "$filename.txt") or die "cannot open $filename";
while ( <$fh> ) {
if ( /cycle\.name/ ) {
(undef, $cycleToUse) = split /\s*=\s*/;
}
}
# I then try to launch the subroutine by passing variables.
# This fails because the subroutine is defined as a variable.
$cycleToUse($filename, $nb_cycle);
And here is the subroutine in another file
#!/usr/bin/perl
package cycle01;
use strict;
use warnings;
sub cycle01 {
# Get the total number of arguments passed
my ($filename, $nb_cycle) = @_;
print "$filename, $nb_cycle";
}
1;
Your code doesn't compile, because in the final call, you have mistyped the name of $nb_cycle. It's helpful if you post code that actually runs :-)
Traditionally, Perl module names start with a capital letter, so you might want to rename your package to Cycle01.
The quick and dirty way to do this is to use the string version of eval. But evaluating an arbitrary string containing code is dangerous, so I'm not going to show you that. The best way is to use a dispatch table - basically a hash where the keys are valid subroutine names and the values are references to the subroutines themselves. The best place to add this is in the Cycle01.pm file:
our %subs = (
cycle01 => \&cycle01,
);
Then, the end of your program becomes:
if (exists $Cycle01::subs{$cycleToUse}) {
$Cycle01::subs{$cycleToUse}->($filename, $nb_cycle);
} else {
die "$cycleToUse is not a valid subroutine name";
}
(Note that you'll also need to chomp() the lines as you read them in your while loop.)
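As a rough single-file sketch of the dispatch table together with the chomp fix: the @config array stands in for the lines of the user's text file, and the arguments 'myfile' and 10 are made-up values. In the real program the package would live in its own Cycle01.pm.

```perl
use strict;
use warnings;

package Cycle01;

sub cycle01 {
    my ( $filename, $nb_cycle ) = @_;
    return "$filename, $nb_cycle";
}

# Dispatch table: the only names a config file is allowed to call.
our %subs = ( cycle01 => \&cycle01 );

package main;

# Stand-in for the lines read from the user's text file.
my @config = ("cycle.name=cycle01\n");

my $cycleToUse;
for my $line (@config) {
    chomp $line;    # without this the name would end in "\n" and never match
    ( undef, $cycleToUse ) = split /\s*=\s*/, $line if $line =~ /cycle\.name/;
}

# Look the name up instead of executing it as code.
my $handler = $Cycle01::subs{$cycleToUse}
    or die "$cycleToUse is not a valid subroutine name";
my $result = $handler->( 'myfile', 10 );
print "$result\n";
```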
To build on Dave Cross' answer, I usually avoid the hash table, partly because, in perl, everything is a hash table anyway. Instead, I have all my entry-point subs start with a particular prefix, that prefix depends on what I'm doing, but here we'll just use ep_ for entry-point. And then I do something like this:
my $subname = 'ep_' . $cycleToUse;
if (my $func = Cycle01->can($subname))
{
$func->($filename, $nb_cycle);
}
else
{
die "$cycleToUse is not a valid subroutine name";
}
The can method in UNIVERSAL extracts the CODE reference for me from perl's hash tables, instead of me maintaining my own (and forgetting to update it). The prefix allows me to have other functions and methods in that same namespace that cannot be called by the user code directly, allowing me to still refactor code into common functions, etc.
If you want to have other namespaces as well, I would suggest having them all be in a single parent namespace, and potentially all prefixed the same way, and, ideally, don't allow :: or ' (single quote) in those names, so that you minimise the scope of what the user might call to only that which you're willing to test.
e.g.,
die "Invalid namespace $cycleNameSpaceToUse"
if $cycleNameSpaceToUse =~ /::|'/;
my $ns = 'UserCallable::' . $cycleNameSpaceToUse;
my $subname = 'ep_' . $cycleToUse;
if (my $func = $ns->can($subname))
# ... as before
There are definitely advantages to doing it the other way, such as being explicit about what you want to expose. The advantage here is in not having to maintain a separate list. I'm always horrible at doing that.

Reading/dumping a perl hash from shell

I have a read-only perl file with a huge hash defined in it. Is there any way for me to read this perl file and dump out the hash contents?
This is the basic structure of the hash within the file.
%hash_name = {
-files => [
'<some_path>',
],
-dirs => [
'<some_path>',
'<some_path>',
'<some_path>',
'<some_path>',
'<some_path>',
],
};
Ideally you'd copy the file so that you can edit it, then turn it into a module so that you can use it cleanly.
But if for some reason this isn't feasible here are your options.
If that hash is the only thing in the file, "load" it using do† and assign to a hash
use warnings;
use strict;
my $file = './read_this.pl'; # the file has *only* that one hash
my %hash = do $file;
This form of do executes the file (runs it as a script), returning the last expression that is evaluated. With only the hash in the file that last expression is the hash definition, precisely what you need.
If the hash is undeclared, so a global variable (or declared with our), then declare as our a hash with the same name in your program and again load the file with do
our %hash_name; # same name as in the file
do $file; # file has "%hash_name" or "our %hash_name" (not "my %hash_name")
Here we "pick up" the hash that is evaluated as do runs the file, by virtue of our.
If the hash is "lexical", declared as my %hash (as it should be!) ... well, this is bad. Then you need to parse the text of the file so to extract lines with the hash. This is in general very hard to do, as it amounts to parsing Perl. (A hash can be built using map, returned from a sub as a reference or a flat list ...) Once that is done you eval the variable which contains the text defining that hash.
However, if you know how the hash is built, as you imply, with no () anywhere inside
use warnings;
use strict;
my $file = './read_this.pl';
my $content = do { # "slurp" the file -- read it into a variable
local $/;
open my $fh, '<', $file or die "Can't open $file: $!";
<$fh>;
};
my ($hash_text) = $content =~ /\%hash_name\s*=\s*(\(.*?\))/s;
my %hash = eval $hash_text;
This simple shot leaves out a lot, assuming squarely that the hash is as shown. Also note that this form of eval carries real and serious security risks.
†
Files are also loaded using require. Apart from it doing a lot more than do, the important thing here is that even if it runs multiple times require still loads that file only once. This matters for modules in the first place, which shouldn't be loaded multiple times, and use indeed uses require.
On the other hand, do runs the file every time, which makes it suitable for loading files to be used as data, which presumably should be read every time. This is the recommended method. Note that require itself uses do to actually load the file.
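The difference can be seen with a small sketch. The generated data file and the $loads counter are assumptions made purely for the demonstration; the file bumps the counter each time it is executed. require is called first here because do also records the file in %INC, which would turn a later require into a no-op.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our $loads = 0;    # incremented by the data file every time it is executed

# Write a small stand-in for the read-only data file (hypothetical content).
my $file = tempdir( CLEANUP => 1 ) . '/read_this.pl';
open my $out, '>', $file or die "Can't write $file: $!";
print {$out} "\$main::loads++;\nmy %h = ( answer => 42 );\n%h;\n";
close $out or die "Can't close $file: $!";

require $file;    # first require executes the file
require $file;    # already recorded in %INC, so nothing happens
my $after_require = $loads;

my %first  = do $file;    # do executes the file ...
my %second = do $file;    # ... and does so again on every call
my $after_do = $loads;

print "after require x2: $after_require, after do x2: $after_do\n";
print "answer: $second{answer}\n";
```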
Thanks to Schwern for a comment.

How is a Perl filehandle a scalar if it can return multiple lines?

I have kind of fundamental question about scalars in Perl. Everything I read says scalars hold one value:
A scalar may contain one single value in any of three different
flavors: a number, a string, or a reference. Although a scalar may not
directly hold multiple values, it may contain a reference to an array
or hash which in turn contains multiple values.
--from perldoc
Was curious how the code below works
open( $IN, "<", "phonebook.txt" )
or die "Cannot open the file\n";
while ( my $line = <$IN> ) {
chomp($line);
my ( $name, $area, $phone ) = split /\|/, $line;
print "$name $area $phone\n";
}
close $IN;
Just to clarify the code above is opening a pipe delimited text file in the following format name|areacode|phone
It opens the file up and then it splits them into $name $area $phone; how does it go through the multiple lines of the file and print them out?
Going back to the perldoc quote from above "A scalar may contain a single value of a string, number, reference." I am assuming that it has to be a reference, but doesn't even really seem like a reference and if it is looks like it would a reference of a scalar? so I am wondering what is going on internally that allows Perl to iterate through all of the lines in the code?
Nothing urgent, just something I noticed and was curious about. Thanks.
It looks like Borodin zeroed in on the part you wanted, but I'll add to it.
There are variables, which store things for us, and there are operators, which do things for us. A file handle, the thing you have in $IN, isn't the file itself or the data in the file. It's a connection that the program uses to get information from the file.
When you use the line input operator, <>, you give it a file handle to tell it where to grab the next line from. By itself, it defaults to ARGV, but you can put any file handle in there. In this case, you have <$IN>. Borodin already explained the reference and bareword stuff.
So, when you use the line input operator, it looks at the connection you give it, then gets a line from that file and returns it. You might be able to grok this more easily with its function form:
my $line = readline( $IN );
The thing you get back doesn't come out of $IN, but the thing it points to. Along the way, $IN keeps track of where it is in the file. See seek and tell.
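A small sketch of that bookkeeping, assuming an in-memory string as a stand-in for phonebook.txt (the two sample records are invented). After each readline, tell reports how far into the data the handle has advanced:

```perl
use strict;
use warnings;

# An in-memory "file" (a handle opened on a string) stands in for phonebook.txt.
my $text = "alice|212|5550100\nbob|312|5550111\n";
open my $in, '<', \$text or die "Can't open in-memory file: $!";

my @seen;
while ( defined( my $line = readline($in) ) ) {
    chomp $line;
    # tell reports the handle's current byte offset into the data
    push @seen, [ $line, tell($in) ];
}
close $in;

printf "%-20s next read starts at byte %d\n", @$_ for @seen;
```

The scalar $in never holds the lines themselves; it only identifies the connection whose position moves forward with each read.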
Along the same lines are Perl's regexes. Many people call something like /foo.*bar/ a regular expression. They are slightly wrong. There's a regular expression inside the pattern match operator //. The pattern is the instructions, but it doesn't do anything by itself until the operator uses it.
I find in my classes if I emphasize the difference between the noun and verb parts of the syntax, people have a much easier time with this sort of stuff.
Old Answer
Through each iteration of the while loop, exactly one value is put into the scalar variables. When the loop is done with a line, everything is reset.
The value in $line is a single value: the entire line which you have not broken up yet. Perl doesn't care what that single value looks like. With each iteration, you deal with exactly one line and that's what's in $line. Remember, these are variables, which means you can modify and replace their values, so they can only hold one thing at a time, but there can be multiple times.
The scalars $name, $area, and $phone have single values, each produced by split. Those are lexical variables (my), so they are only visible inside the specific loop iteration where they are defined.
Beyond that, I'm not sure which scalar you might be confused about.
The old-fashioned way of opening files is to use a bare name for the file handle, like so
open IN, 'phonebook.txt'
A file handle is a special type of value, like scalar, hash, array etc. but it has no prefix symbol to differentiate it. (This isn't actually the full extent of the truth, but I am worried about confusing you if I add even more detail.)
Perl still works like this, but it is best avoided for a couple of reasons.
All such file handles are global, and there is no way to restrict access to them by scope
There is no way to pass the value to a subroutine or store it in a data structure
So Perl was enhanced several years ago so that you can use references to file handles. These can be stored in scalar variables, arrays, or hashes, and can be passed as subroutine parameters.
What happens now when you write
open my $in, '<', 'phonebook.txt'
is that perl autovivifies an anonymous file handle, and puts a reference to it in variable $in, so yes, you were right, it is a reference. (Another thing that was changed about the same time was the move to three-parameter open calls, which allow you to open a file called, say, >.txt for input.)
I hope that helps you to understand. It's an unnecessary level of detail, but it can often help you to remember the way Perl works to understand the underlying details.
Incidentally, it is best to keep to lower-case letters for lexical variables, even for file handle references. I often add fh to the end to indicate that the variable holds a file handle, like $in_fh. But there's no need to use capitals, which are generally reserved for global variables like Package::Names.
Update - The Rest of the Story
I thought I should add something to explain what I have missed out, for fear of misleading people who care about the gory detail.
Perl keeps a symbol table hash - a stash - that works very much like an ordinary Perl hash. There is one such stash for each package, including the default package main. Note that this hash has nothing to do with lexical variables - declared with my - which are stored entirely separately.
The keys of the stashes are the names of the package variables, without the initial symbol. So, for example, if you have
our $val;
our @val;
our %val;
our %val;
then the stash will have only a single element, with a key of val and a value which is a reference to an intermediate structure called a typeglob. This is another hash structure, with one element for each different type of variable that has been declared. In this case our val typeglob will have three elements, for the scalar, array, and hash varieties of the val variables.
One of these elements may also be an IO variable type, which is where file handles are kept. But, for historical reasons, the value that is passed around as a file handle is in fact a reference to the typeglob that contains it. That is why, if you write open my $in, '<', 'phonebook.txt' and then print $in you will see something like GLOB(0x269581c) - the GLOB being short for typeglob.
Apart from that, the account above is accurate. Perl autovivifies an anonymous typeglob in the current package, and uses only its IO slot for the file handle.
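This can be checked with a short sketch. The in-memory string is only a stand-in for a real file, and the exact class name of the IO object may differ between Perl versions:

```perl
use strict;
use warnings;

my $text = "one line\n";
open my $in, '<', \$text or die "Can't open in-memory file: $!";

my $kind = ref $in;       # the lexical handle is a reference to a typeglob
my $io   = *{$in}{IO};    # the IO slot inside that typeglob
my $line = readline $in;

print "handle is a $kind reference\n";
print "IO slot holds a ", ref($io), " object\n";
print "read: $line";
close $in;
```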
Scalars in Perl are denoted by a $, and they can indeed contain the types of value you mention in your question, but they can also contain a file handle. You can create file handles in Perl in two ways. One way is lexical
open my $filehandle, '>', '/path/to/file' or die $!;
and the other is global
open FILEHANDLE, '>', '/path/to/file' or die $!;
You should use the lexical version, which is what you're doing.
The while loop in your code uses the <> operator on your lexical filehandle which returns a line out of your file every time it's called, until it's out of lines (when End Of File is reached) in which case it returns false.
I went into a bit more detail on file handles as it seems it's a concept you're not completely clear on.

Parallel reading of input file with Parallel::Loops module

I often come across a scenario where I need to parse a very large input file and then process the lines for final output. With many of these files it can take a while to process.
Since it's usually the same process, and usually I want to store the processed data in a hash for the final manipulation, it seems that maybe something like Parallel::Loops would be helpful and speed the process up.
If I'm not thinking this through correctly, please let me know.
I've used Parallel::Loops before to process many files at a time with great results, but I can't figure out how to process many lines from one file as I don't know how to pass each line of the file in as a reference.
If I try to do this:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Parallel::Loops;
my $procs = 12;
my $pl = Parallel::Loops->new($procs);
my %data;
$pl->share(\%data);
my $input_file = shift;
open( my $in_fh, "<", $input_file ) || die "Can't open the file for reading: $!";
$pl->while( <$in_fh>, sub {
<some kind of munging and processing here>
});
I get the error:
Can't use string ("6334") as a subroutine ref while "strict refs" in use at /usr/local/share/perl/5.14.2/Parallel/Loops.pm line 518, <$in_fh> line 501.
I know that I need to pass a reference to the parallel object but I can't figure out how to make a reference to a readline element.
I also know that I can slurp the whole file in first and then pass an array reference of all of the lines, but for very large files that takes a lot of memory, and intuitively a lot more time as it technically needs to then read the file twice.
Is there a way to pass each line of a file into the Parallel::Loops object so that I can process many of the lines of a file at once?
I'm not in a position to test this as my laptop doesn't have Parallel::Loops installed and I have no consistent internet access.
However, from the documentation, the while method clearly takes two subroutine references as parameters, and you are passing <$in_fh> as the first. The method probably coerces its parameters to scalars using a prototype, so that means you are passing a simple string where a subroutine reference is expected.
Because of my situation I am far from certain, but you may get a result from
$pl->while(
sub {
scalar <$in_fh>;
},
sub {
# Process a line of data
}
);
I hope this helps. I will investigate further when I get home on Friday.

How do I delete a random value from an array in Perl?

I'm learning Perl and building an application that gets a random line from a file using this code:
use List::Util qw(shuffle);
open(my $random_name, "<", "out.txt") or die "Can't open out.txt: $!";
my @array = shuffle(<$random_name>);
chomp @array;
close($random_name) or die "Error when trying to close out.txt: $!";
print shift @array;
But now I want to delete this random name from the file. How I can do this?
shift already deletes a name from the array.
So does pop (shift removes from the beginning, pop from the end). I would suggest using pop, as it may be more efficient and, since the element is random anyway, you don't care which one you use.
Or do you need to delete it from a file?
If that's the case, you need to:
A. Get a count of the names in the file (if it is small, read it all into memory using File::Slurp; if it is large, either read it line by line and count, or simply run wc -l $filename via backticks).
B. Generate a random number from 1 to <number of lines> (say, $random_line_number).
C. Read the file line by line. Write every line you read to a temp file (use File::Temp to generate temp files), except do NOT write the line numbered $random_line_number.
D. Close the temp file and move it over your original file.
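Steps A to D could be sketched roughly as follows. The delete_line helper and the three sample names are hypothetical, and the demo works on a scratch copy rather than the real out.txt:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Copy qw(move);

# Remove line number $skip (1-based) from $file by rewriting it via a temp file.
sub delete_line {
    my ( $file, $skip ) = @_;
    open my $in, '<', $file or die "Can't read $file: $!";
    my ( $tmp_fh, $tmp_name ) = tempfile();
    while ( my $line = <$in> ) {
        print {$tmp_fh} $line unless $. == $skip;    # $. is the current line number
    }
    close $in;
    close $tmp_fh or die "Can't close $tmp_name: $!";
    move( $tmp_name, $file ) or die "Can't replace $file: $!";
}

# Demo on a scratch file (hypothetical content).
my $file = tempdir( CLEANUP => 1 ) . '/out.txt';
open my $out, '>', $file or die "Can't write $file: $!";
print {$out} "$_\n" for qw(anna ben cleo);
close $out or die "Can't close $file: $!";

# Step A: count the lines.
open my $count_fh, '<', $file or die "Can't read $file: $!";
my $count = () = <$count_fh>;
close $count_fh;

my $pick = 1 + int rand $count;    # step B: random line number, 1..$count
delete_line( $file, $pick );       # steps C and D
```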
If the list contains filenames and you need to delete the file itself (the random file), use unlink() function. Don't forget to process return code from unlink() and, like with any IO operation, print error message containing $! which will be the text of system error on failure.
Done.
D.
When you say "delete this … from the list" do you mean delete it from the file? If you simply mean remove it from @array then you've already done that by using shift. If you want it removed from the file, and the order doesn't matter, simply write the remaining names in @array back into the file. If the file order does matter, you're going to have to do something slightly more complicated, such as reopen the file, read the items in order, except for the one you don't want, and then write them all back out again. Either that, or take more notice of the order when you read the file.
If you need to delete a line from a file (it's not entirely clear from your question), one of the simplest and most efficient ways is to use Tie::File to manipulate a file as if it were an array. Otherwise perlfaq5 explains how to do it the long way.
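A minimal Tie::File sketch, again run against a scratch file with made-up names rather than the real out.txt. Removing an element from the tied array removes the matching line from the file:

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempdir);

# Scratch file with hypothetical content.
my $file = tempdir( CLEANUP => 1 ) . '/out.txt';
open my $out, '>', $file or die "Can't write $file: $!";
print {$out} "$_\n" for qw(anna ben cleo);
close $out or die "Can't close $file: $!";

# Each element of @lines is one line of the file, without its newline.
tie my @lines, 'Tie::File', $file or die "Can't tie $file: $!";
my $index = int rand @lines;                 # random index, 0 .. $#lines
my ($removed) = splice @lines, $index, 1;    # deleting the element rewrites the file
untie @lines;

print "removed: $removed\n";
```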