Reading/dumping a perl hash from shell

I have a read-only perl file with a huge hash defined in it. Is there any way for me to read this perl file and dump out the hash contents?
This is the basic structure of the hash within the file:
%hash_name = (
    -files => [
        '<some_path>',
    ],
    -dirs => [
        '<some_path>',
        '<some_path>',
        '<some_path>',
        '<some_path>',
        '<some_path>',
    ],
);

Ideally you'd copy the file so that you can edit it, then turn it into a module so that you can use it nicely.
But if for some reason this isn't feasible, here are your options.
If that hash is the only thing in the file, "load" it using do† and assign it to a hash:
use warnings;
use strict;
my $file = './read_this.pl'; # the file has *only* that one hash
my %hash = do $file;
This form of do executes the file (runs it as a script), returning the last expression that is evaluated. With only the hash in the file that last expression is the hash definition, precisely what you need.
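For example, read_this.pl might look like this (hypothetical contents; the paths are placeholders), with the hash definition as the last expression in the file:
# read_this.pl -- the assignment is the last expression evaluated,
# and in list context it yields the hash's key/value pairs
%hash_name = (
    -files => [ '/some/path' ],
    -dirs  => [ '/some/other/path' ],
);
With that in place, my %hash = do $file; fills %hash, and you can dump it with Data::Dumper.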
If the hash is undeclared, i.e. a global variable (or declared with our), then declare as our a hash with the same name in your program, and again load the file with do:
our %hash_name; # same name as in the file
do $file; # file has "%hash_name" or "our %hash_name" (not "my %hash_name")
Here we "pick up" the hash that is evaluated as do runs the file, by virtue of our.
If the hash is "lexical", declared as my %hash (as it should be!) ... well, this is bad. Then you need to parse the text of the file to extract the lines with the hash. This is in general very hard to do, as it amounts to parsing Perl. (A hash can be built using map, returned from a sub as a reference or a flat list ...) Once that is done, you eval the variable which contains the text defining that hash.
However, if you know how the hash is built, as you imply, with no () anywhere inside it:
use warnings;
use strict;
my $file = './read_this.pl';
my $content = do {   # "slurp" the file -- read it into a variable
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};
my ($hash_text) = $content =~ /\%hash_name\s*=\s*(\(.*?\))/s;
my %hash = eval $hash_text;
This simple approach leaves out a lot, assuming squarely that the hash is exactly as shown. Also note that this form of eval carries real and serious security risks.
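If you do have to eval text like this, the core Safe module can at least restrict what the evaluated code is allowed to do; a minimal sketch (this reduces, but does not eliminate, the risk):
use Safe;
my $compartment = Safe->new;                  # restricted opcode set and namespace
my %hash = $compartment->reval($hash_text);   # like eval, but inside the compartment
die "Couldn't evaluate hash text: $@" if $@;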
†
Files are also loaded using require. Apart from doing a lot more than do, the important thing here is that even if it is run multiple times, require still loads a given file only once. This matters for modules in the first place, which shouldn't be loaded multiple times, and use indeed uses require.
On the other hand, do does it every time, which makes it suitable for loading files to be used as data, which presumably should be read every time. This is the recommended method for that. Note that require itself uses do to actually load the file.
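To illustrate the difference, with a hypothetical data file config.pl:
my %config = do './config.pl';   # reads and runs the file
my %again  = do './config.pl';   # reads and runs it again -- fresh data
require './config.pl';           # runs the file and records it in %INC
require './config.pl';           # no-op: %INC says it is already loaded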
Thanks to Schwern for a comment.

Related

List all variables loaded by 'require' function

I have a config file with a bunch of data structures (arrays, hashes) and I load them into my perl script using
require '<config>';
I can use the variables from the config that I know of, but is there a way that I can list all the variables loaded by the require function? Ideally I would want them loaded into a hash variable and to refer to them through it, to avoid variable name conflicts.
Not easily, and this is why relying on global named variables is problematic. Instead, have your config file return a single data structure (like a hashref, so you can name parts of it) and load it with do into a lexical variable:
use strict;
use warnings;
my $file = '/path/to/foo.conf';
my $data = do $file;
die "Failed to parse $file: $#" if !defined $data and $#;
die "Failed to read $file: $!" if !defined $data;
Make sure either to pass an absolute path to the file (recommended, to avoid depending on what your current working directory happens to be) or prepend a relative path with ./, otherwise do (and require) will search @INC for the file, which since Perl 5.26 does not contain the current working directory. See Path::This for a way to get an absolute path relative to the current file.
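For illustration, a hypothetical foo.conf whose last expression is a single hashref:
# /path/to/foo.conf -- the hashref is the last expression,
# so it becomes the return value of do()
{
    name => 'example',
    dirs => [ '/path/one', '/path/two' ],
};
After the do above, $data holds that hashref, so you can write my @dirs = @{ $data->{dirs} }; with no global variables involved.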

Call a subroutine defined as a variable

I am working on a program which uses different subroutines in separate files.
There are three parts
A text file with the name of the subroutine
A Perl program with the subroutine
The main program which extracts the name of the subroutine and launches it
The subroutine takes its data from a text file.
I need the user to choose the text file, the program then extracts the name of the subroutine.
The text file contains
cycle.name=cycle01
Here is the main program :
#!/usr/bin/perl -w
use strict;
use warnings;
use cycle01;

my $nb_cycle = 10;

# The user chooses a text file
print STDERR "\nfilename: ";
chomp( my $filename = <STDIN> );

# Extract the name of the cycle
my $cycleToUse;
open( my $fh, "<", "$filename.txt" ) or die "cannot open $filename";
while ( <$fh> ) {
    if ( /cycle\.name/ ) {
        (undef, $cycleToUse) = split /\s*=\s*/;
    }
}

# I then try to launch the subroutine by passing variables.
# This fails because the subroutine name is held in a variable.
$cycleToUse($filename, $nb_cycle);
And here is the subroutine in another file
#!/usr/bin/perl
package cycle01;
use strict;
use warnings;

sub cycle01 {
    # Unpack the arguments passed
    my ($filename, $nb_cycle) = @_;
    print "$filename, $nb_cycle";
}
1;
Your code doesn't compile, because in the final call, you have mistyped the name of $nb_cycle. It's helpful if you post code that actually runs :-)
Traditionally, Perl module names start with a capital letter, so you might want to rename your package to Cycle01.
The quick and dirty way to do this is to use the string version of eval. But evaluating an arbitrary string containing code is dangerous, so I'm not going to show you that. The best way is to use a dispatch table - basically a hash where the keys are valid subroutine names and the values are references to the subroutines themselves. The best place to add this is in the Cycle01.pm file:
our %subs = (
    cycle01 => \&cycle01,
);
Then, the end of your program becomes:
if (exists $Cycle01::subs{$cycleToUse}) {
    $Cycle01::subs{$cycleToUse}->($filename, $nb_cycle);
}
else {
    die "$cycleToUse is not a valid subroutine name";
}
(Note that you'll also need to chomp() the lines as you read them in your while loop.)
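That is, the read loop would become something like:
while ( <$fh> ) {
    chomp;   # strip the newline, or $cycleToUse will end in "\n" and never match
    if ( /cycle\.name/ ) {
        (undef, $cycleToUse) = split /\s*=\s*/;
    }
}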
To build on Dave Cross' answer, I usually avoid the hash table, partly because, in perl, everything is a hash table anyway. Instead, I have all my entry-point subs start with a particular prefix, that prefix depends on what I'm doing, but here we'll just use ep_ for entry-point. And then I do something like this:
my $subname = 'ep_' . $cycleToUse;
if (my $func = Cycle01->can($subname)) {
    $func->($filename, $nb_cycle);
}
else {
    die "$cycleToUse is not a valid subroutine name";
}
The can method in UNIVERSAL extracts the CODE reference for me from perl's hash tables, instead of me maintaining my own (and forgetting to update it). The prefix allows me to have other functions and methods in that same namespace that cannot be called by the user code directly, allowing me to still refactor code into common functions, etc.
If you want to have other namespaces as well, I would suggest having them all be in a single parent namespace, and potentially all prefixed the same way, and, ideally, don't allow :: or ' (single quote) in those names, so that you minimise the scope of what the user might call to only that which you're willing to test.
e.g.,
die "Invalid namespace $cycleNameSpaceToUse"
if $cycleNameSpaceToUse =~ /::|'/;
my $ns = 'UserCallable::' . $cycleNameSpaceToUse;
my $subname = 'ep_' . $cycleToUse;
if (my $func = $ns->can($subname))
# ... as before
There are definitely advantages to doing it the other way, such as being explicit about what you want to expose. The advantage here is in not having to maintain a separate list. I'm always horrible at doing that.

How to use a variable instead of a file handle

I have a big data file dump.all.lammpstrj which I need to split/categorize into a series of files, such as Z_1_filename, Z_2_filename, Z_3_filename etc. based on the coordinates in each record.
The coordinates are saved in a disordered way, so my program reads each line and determines which file this record should be sent to.
I use a variable, $filehandle = "Z_${i}_DUMP", and I hope to open all of the possible files like this:
for ( my $i = 1; $i <= 100; $i++ ) {
    $filehandle = "Z_${i}_DUMP";
    open $filehandle, '>', "Z_${i}_dump.all.lammpstrj.dat";
    ...
}
But when running my program, I get a message
Can't use string ("Z_90_DUMP") as a symbol ref while "strict refs" in use at ...
I don't want to scan all the data for each output file, because dump.all.lammpstrj is so big that a scan would take a long time.
Is there any way to use a defined variable as a file handle?
To give you an idea of how this might be done, put the file handles in a hash (or perhaps an array, if they're indexed by numbers):
use strict;
use warnings;
my %fh;   # file handles
open $fh{$_}, '>', "Z_${_}_dump.all.lammpstrj.dat" for 1..100;   # open 100 files

for (1..10000) {                          # write 10000 lines across the 100 files
    my $random = int( 1 + rand(100) );    # pick a random file handle
    print { $fh{$random} } "something $_\n";
}

close $fh{$_} for 1..100;
Don't assign anything to $filehandle (or set it to undef) before you call open(). You get this error because you have assigned a string to $filehandle, which is of no use anyway.
Also see "open" in perldoc:
If FILEHANDLE is an undefined scalar variable (or array or hash element), a new filehandle is autovivified, meaning that the variable is assigned a reference to a newly allocated anonymous filehandle. Otherwise if FILEHANDLE is an expression, its value is the real filehandle. (This is considered a symbolic reference, so use strict "refs" should not be in effect.)
To have more file handles at a time and to conveniently map them to the file names consider using a hash with the file name (or whatever identifier suits you) as key to store them in. You can check if the key exists (see "exists") and the value is defined (see "defined") to avoid reopening the file unnecessarily.
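For example, a sketch of that open-on-demand pattern (the fh_for helper and the naming scheme are made up for illustration):
my %fh;
sub fh_for {
    my ($i) = @_;
    my $name = "Z_${i}_dump.all.lammpstrj.dat";
    # open each file only on first use; afterwards reuse the cached handle
    if ( !exists $fh{$name} or !defined $fh{$name} ) {
        open $fh{$name}, '>', $name or die "Can't open $name: $!";
    }
    return $fh{$name};
}
print { fh_for(90) } "a record destined for slice 90\n";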
I sincerely appreciate Kjetil S. and sticky bit. I tested it and their suggestions work well. And I noticed that there is another way to write data to different files without changing the file handle: I actually changed file names while using the same file handle.
....
for ( my $i = 0; $i <= $max_number; $i++ ) {
    $file = "${i}_foo.dat";
    open DAT, '>>', $file or die "Can't open $file: $!";
    ......
}

Parallel reading of input file with Parallel::Loops module

I often come across a scenario where I need to parse a very large input file and then process the lines for final output. With many of these files it can take a while to process.
Since it's usually the same process, and usually I want to store the processed data in a hash for the final manipulation, it seems that maybe something like Parallel::Loops would be helpful and would speed the process up.
If I'm not thinking this through correctly, please let me know.
I've used Parallel::Loops before to process many files at a time with great results, but I can't figure out how to process many lines from one file as I don't know how to pass each line of the file in as a reference.
If I try to do this:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Parallel::Loops;
my $procs = 12;
my $pl = Parallel::Loops->new($procs);
my %data;
$pl->share(\%data);
my $input_file = shift;
open( my $in_fh, "<", $input_file ) || die "Can't open the file for reading: $!";
$pl->while( <$in_fh>, sub {
<some kind of munging and processing here>
});
I get the error:
Can't use string ("6334") as a subroutine ref while "strict refs" in use at /usr/local/share/perl/5.14.2/Parallel/Loops.pm line 518, <$in_fh> line 501.
I know that I need to pass a reference to the parallel object but I can't figure out how to make a reference to a readline element.
I also know that I can slurp the whole file in first and then pass an array reference of all of the lines, but for very large files that takes a lot of memory and, intuitively, a lot more time, as it technically then needs to read the file twice.
Is there a way to pass each line of a file into the Parallel::Loops object so that I can process many of the lines of a file at once?
I'm not in a position to test this as my laptop doesn't have Parallel::Loops installed and I have no consistent internet access.
However, from the documentation, the while method clearly takes two subroutine references as parameters, and you are passing <$in_fh> as the first. The method probably coerces its parameters to scalars using a prototype, so that means you are passing a simple string where a subroutine reference is expected.
Because of my situation I am far from certain, but you may get a result from
$pl->while(
    sub {
        scalar <$in_fh>;
    },
    sub {
        # Process a line of data
    }
);
I hope this helps. I will investigate further when I get home on Friday.
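Building on that idea, one untested sketch: have the first sub read each line into a lexical that both closures share. The condition sub runs in the parent before each child is forked, so the child should see the line in its copy of memory:
my $line;
$pl->while(
    sub { defined( $line = <$in_fh> ) },   # parent: grab the next line
    sub {
        # child: inherited $line at fork time
        chomp( my $record = $line );
        # <munge $record and store results in the shared %data>
    }
);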

How to create a common array that can be used by several perl scripts?

I have an application in which scripts will be run that need to be able to access stored data. I want to run a script (main.pl) which will create an array. Later, if I run A.pl or B.pl, I want those scripts to be able to access the previously created array and change values within it. What do I need to code in main.pl, A.pl, and B.pl so I can achieve that?
Normally one perl instance cannot access the variables of another instance. The question then becomes: what can one do that is almost like sharing variables?
One approach is to store the data somewhere it can persist, such as in a database or a CSV file on disk. This means reading the data at the beginning of the program and writing or updating it at the end; it naturally leads to questions about race conditions, locking, etc., and greatly expands the scope that any possible answer would need to cover.
Another approach is to write your programs to use CSV or YAML or some other format easily read and written by libraries from CPAN, and use STDIN and STDOUT for input and output. This allows decoupling of storage, and also chaining several tools together with a pipe from the shell prompt.
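As a trivial sketch of that filter style (the comma-separated field layout here is invented), each script reads records on STDIN and writes them on STDOUT, so they can be chained as main.pl | A.pl | B.pl:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $line = <STDIN> ) {
    chomp $line;
    my @fields = split /,/, $line;
    $fields[1]++;                       # whatever "change values" means for this tool
    print join(',', @fields), "\n";
}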
For an in-memory solution for tying hashes to shared memory, you can check out IPC::Shareable:
http://metacpan.org/pod/IPC::Shareable
Perl memory structures can't be stored and then accessed later by other Perl scripts. However, you can write out those memory structures to a file. This can be done through hand-rolled code, or by using a wide variety of Perl modules. Storable is a standard Perl module and has been around for quite a while.
Since all you're storing is an array, you could have one program write the array to a file, and then have the other program read the array back.
use strict;
use warnings;
use autodie;
use feature 'say';

use constant {
    ARRAY_FILE => "$ENV{HOME}/perl_arry.txt",
};

my @array;
[...]   # Build the array

open my $output_fh, ">", ARRAY_FILE;
for my $item ( @array ) {
    say {$output_fh} $item;
}
close $output_fh;
Now, have your second program read in this array:
use strict;
use warnings;
use autodie;

use constant {
    ARRAY_FILE => "$ENV{HOME}/perl_arry.txt",
};

my @new_array;
open my $input_fh, "<", ARRAY_FILE;
while ( my $item = <$input_fh> ) {
    chomp $item;   # strip the newline added by say when writing
    push @new_array, $item;
}
close $input_fh;
More complex data can be stored with Storable, but it's pretty much the same thing: you need to write the Storable data to a physical file and then reopen that file to pull your data back in.
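For example, the Storable version of the write/read pair might look like this (the file name is arbitrary):
use Storable qw(store retrieve);

# in the writer
store \@array, "$ENV{HOME}/perl_array.stor";

# in the reader
my $array_ref = retrieve("$ENV{HOME}/perl_array.stor");
my @new_array = @$array_ref;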