I'm having an issue with a Perl script relating to the Weather Research Forecast (WRF) model configuration. The script in question is a part of the download located here (login required, simple signup). If you download the most recent WRF-NMM core, in the unzipped directory is arch/Config_new.pl. The error that I'm having lies somewhere within lines 262-303:
until ( $validresponse ) {
print "------------------------------------------------------------------------\n" ;
print "Please select from among the following supported platforms.\n\n" ;
$opt = 1 ;
open CONFIGURE_DEFAULTS, "< ./arch/configure_new.defaults"
or die "Cannot open ./arch/configure_new.defaults for reading" ;
while ( <CONFIGURE_DEFAULTS> )
{
for $paropt ( @platforms )
{
if ( substr( $_, 0, 5 ) eq "#ARCH"
&& ( index( $_, $sw_os ) >= 0 ) && ( index( $_, $sw_mach ) >= 0 )
&& ( index($_, $paropt) >= 0 ) )
{
$optstr[$opt] = substr($_,6) ;
$optstr[$opt] =~ s/^[ ]*// ;
$optstr[$opt] =~ s/#.*$//g ;
chomp($optstr[$opt]) ;
$optstr[$opt] = $optstr[$opt]." (".$paropt.")" ;
if ( substr( $optstr[$opt], 0,4 ) ne "NULL" )
{
print " %2d. %s\n",$opt,$optstr[$opt] ;
$opt++ ;
}
}
}
}
close CONFIGURE_DEFAULTS ;
$opt -- ;
print "\nEnter selection [%d-%d] : ",1,$opt ;
$response = <STDIN> ;
if ( $response == -1 ) { exit ; }
if ( $response >= 1 && $response <= $opt )
{ $validresponse = 1 ; }
else
{ printf("\nInvalid response (%d)\n",$response);}
}
Specifically, I am sent to an input line without any kind of prompting or list of what my options are. Only after I select a valid choice am I presented with the previous options. This is repeated a second time with another chunk of code further down (lines 478-528). What's got me confused is that, when I entered debugging mode, I inserted a break before the start of this portion of code. I ran p $validresponse and got the following:
0
If you REALLY want Grib2 output from WRF, modify the arch/Config_new.pl script.
Right now you are not getting the Jasper lib, from the environment, compiled into WRF.
This intrigues me, as the paragraph is from a printf from several lines before. In this particular script, it is the only printf that has run so far, but why the output was saved to the next created variable is beyond me. Any suggestions?
EDIT: After looking at choroba's suggestion, the same problem occurs with any type of redirection, whether piping, using tee, or stderr/stdout redirection. As such, I'm thinking it may be a problem with bash? That is, the only way I can run it is without any kind of logging (at least to my knowledge, which is admittedly quite limited).
You want to enable autoflushing, so that the Perl print buffer is flushed automatically after something is printed.
This is the default behavior when a Perl script outputs to a terminal window, but when the output is redirected in any way, the default is to buffer the output. Enabling autoflushing disables the buffering.
You can enable autoflushing by adding the following two lines to the top of the Perl script (below the Perl hashbang line, of course):
use IO::Handle qw();
STDOUT->autoflush(1);
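Alternatively, if you'd rather not load IO::Handle, the special variable $| does the same thing for the currently selected output handle (STDOUT by default):
$| = 1;   # enable autoflush on the currently selected handle (STDOUT)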
When you redirect with pipes or similar you are (normally) redirecting STDOUT. All of the print statements go to STDOUT, so when redirecting they will be sent to whatever process you are piping to. Without seeing the full command you are using I can't say exactly why you aren't seeing the STDOUT messages, but they are obviously being swallowed by the redirection. Whether or not that is actually a problem is for you to decide.
The line
$response = <STDIN> ;
causes the script to wait for input from STDIN, which is why you end up sitting at a bare input line. You are not piping anything into STDIN, so it just waits.
Related
I'm trying to write a Perl script to deal with some 3+ GB text files that are structured like:
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212123x534534534534xx4545454x232323xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
I want to perform two operations:
Count the number of delimiters per line and compare it to a static number (i.e. 5); those lines that exceed said number should be output to file.control.
Remove duplicates on the file by substr($line, 0, 7) - the first 7 numbers - but I want to preserve order. I want that output in file.output.
I have coded this as a simple shell script (just bash), but it took too long to process; the same script calling Perl one-liners was quicker, but I'm interested in a way to do this purely in Perl.
The code I have so far is:
open $file_hndl_ot_control, '>', $FILE_OT_CONTROL;
open $file_hndl_ot_out, '>', $FILE_OT_OUTPUT;
# INPUT.
open $file_hndl_in, '<', $FILE_IN;
while ($line_in = <$file_hndl_in>)
{
# Calculate n. of delimiters
my $delim_cur_line = $line_in =~ y/"$delimiter"//;
# print "$commas \n"
if ( $delim_cur_line != $delim_amnt_per_line )
{
print {$file_hndl_ot_control} "$line_in";
}
# Remove duplicates by substr(0,7) maintain order
my $substr_in = substr $line_in, 0, 11;
print if not $lines{$substr_in}++;
}
And I want the file.output file to look like:
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
and the file.control file to look like :
(assuming delimiter control number is 6)
4352342xx23232xxx345545x45454x23232xxx
Could someone assist me? Thank you.
Posting edits: here is the code I tried:
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
open(my $fh1, ">>", "outputcontrol.txt");
open(my $fh2, ">>", "outputoutput.txt");
while ( <> ) {
my $count = ($_ =~ y/x//);
print "$count \n";
# print $_;
if ( $count != $delim_amnt_per_line )
{
print fh1 $_;
}
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print fh2;
}
I don't know if I'm supposed to post new code in here. But I tried the above, based on your example. What baffles me (I'm still very new to Perl) is that it doesn't output to either filehandle, but if I redirected from the command line just as you said, it worked perfectly. The problem is that I need to output into 2 different files.
It looks like entries with the same seven-character prefix may appear anywhere in the file, so it's necessary to use a hash to keep track of which ones have already been encountered. With a 3GB text file this may result in your perl process running out of memory, in which case a different approach is necessary. Please give this a try and see if it comes in under the bar
The tr/// operator (the same as y///) doesn't accept variables for its character list, so I've used eval to create a subroutine delimiters() that will count the number of occurrences of $delimiter in $_
It's usually easiest to pass the input file as a parameter on the command line, and redirect the output as necessary. That way you can run your program on different files without editing the source, and that's how I've written this program. You should run it as
$ perl filter.pl my_input.file > my_output.file
use strict;
use warnings 'all';
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
eval "sub delimiters { tr/$delimiter// }";
while ( <> ) {
next if delimiters() == $delim_amnt_per_line;
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print;
}
output
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
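(Not part of the original answer, just a sketch addressing the edited question: when a handle is opened into a lexical variable such as $fh1, it must be printed to as print $fh1 ... or print {$fh1} ... - a bareword like print fh1 ... refers to a different, never-opened handle. The file names below are placeholders.)
use strict;
use warnings 'all';

my %seen;
my $delim_amnt_per_line = 5;

open my $fh_control, '>', 'file.control' or die "file.control: $!";
open my $fh_output,  '>', 'file.output'  or die "file.output: $!";

while ( <> ) {
    my $count = tr/x//;                                 # count delimiters on this line
    print {$fh_control} $_ if $count != $delim_amnt_per_line;

    my $prefix = substr $_, 0, 7;                       # first seven characters
    print {$fh_output} $_ unless $seen{$prefix}++;
}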
I am rather new to Perl and am trying to combine several .pm files into a single script. Most of the modules copy over just fine, but some have an error where the end of file is reached, but the script keeps printing. Here is an example of the code:
$copy_line = 0;
sysopen(FILE, $file_path, O_WRONLY | O_CREAT, 0711);
sysopen(MODULE, $module_path, O_RDONLY | O_EXCL);
while(<MODULE>)
{
my $line = $_;
if(($line ne "# START\n") and ($copy_line eq 0))
{
}
else
{
print FILE "$line";
$copy_line = 1;
}
}
close FILE;
close MODULE;
Each module has start and end tags so I do not copy any use statements, and so I know when to stop copying. An example of the module is
#!/usr/bin/perl
# START
some code to copy over
some more code to copy
even more code to copy
# END
What happens with some files is that I see the end tag, followed by repeated code from the module. The output looks something like
# START
some code to copy over
some more code to copy
even more code to copy
# END
code to copy
even more code to copy
# END
What might be causing this?
Thanks,
-rusty
There are various things wrong with your script:
You didn't show the whole script; constants like O_WRONLY don't exist by default.
Therefore it may be that you didn't use strict; use warnings; at the beginning of your script. This is necessary to get warned about errors or possible mistakes.
The strict mode requires you to declare all your variables. You can do so with the my keyword, e.g. my $copy_line = 0.
Never use sysopen, except when you fully understand how open works and why it wouldn't be the best choice for a given situation. Considering that I don't have that level of knowledge, I think we'll stick to the normal open.
The open takes a variable, a mode, and a filename, like
open my $file, "<", $filename;
I encourage you to use autodie for automatic error handling, otherwise you should do
open my $file, "<", $filename or die "Can't open $filename: $!";
where $! contains the reason for the error. You can specify various modes for open, which are modelled after shell redirection operators. Important are: < read, > write (or create), >> append, |- write pipe to command, -| read pipe from command.
The eq operator tests for string equality. If you want to test for numeric equality, use the == operator.
if (COND) {} else { STUFF } could rather be written as unless (COND) { STUFF }.
You have successfully implemented some twisted logic that starts copying at the START marker. However, you don't stop at the END. For stuff like this, the flip-flop operator .. can be used: it takes two operands, which are arbitrary expressions. The operator returns false until the first operand evaluates to true, and then remains true until after the second operand has evaluated to true. If one operand is a constant integer, it is interpreted as a line number. Thus, the script
while (<>) {
print if 5 .. 10;
}
prints lines 5–10 of the input, inclusive.
For your problem, you should probably use regexes that match the start and the end marker:
while (<>) {
print if /^ \s* # \s* START/x .. /^ \s* # \s* END/x
}
I'll assume here that you know regexes, but I can add explanations if needed.
If the readline operator <> is used without an operand, it takes the command line arguments of the script, opens them, and reads them in sequence. If no arguments were provided, it uses STDIN.
This allows for flexible little scripts. The code can be summarized in the command-line oneliner
$ perl -ne'print if /^\s*#\s*START/../^\s*#\s*END/' INPUT-FILE1 INPUT-FILE2 >OUTPUT
There are two issues with this:
It prints out the start/end markers as well
If a file doesn't contain an # END, the next files will be printed out in full until the next # END is found.
We can mitigate issue #2 by testing for the end of file in the termination condition:
print if /^\s*#\s*START/ .. (/^\s*#\s*END/ or eof);
Issue #1 is slightly more complex; I'd reintroduce a flag for that:
my $print_this = 0;
while (<>) {
if (/^\s*#\s*END/ or eof) {
$print_this = 0;
} elsif ($print_this) {
print;
} elsif (/^\s*#\s*START/) {
$print_this = 1;
}
}
Partial test case:
$ perl -e'
my $print_this = 0;
while (<>) {
if (/^\s*#\s*END/ or eof) { $print_this = 0 }
elsif ($print_this) { print }
elsif (/^\s*#\s*START/) { $print_this = 1 }
}' <<'__END__'
no a 1
no a 2
# START
yes b 1
yes b 2
yes b 3
#END
no c 1
no c 2
# START
yes d 1
# END
no e 1
__END__
Output:
yes b 1
yes b 2
yes b 3
yes d 1
If you're copying files without modifying their contents, you should look into File::Copy http://perldoc.perl.org/File/Copy.html
File::Copy is a standard module and is installed along with Perl. For a list of standard modules, see perldoc perlmodlib http://perldoc.perl.org/perlmodlib.html#Standard-Modules
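A minimal sketch of using it (the file names are placeholders):
use File::Copy qw(copy);

copy('Some/Module.pm', 'combined_script.pl')
    or die "Copy failed: $!";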
As per the solution provided in perldoc, I am trying to emulate tail -f, but it's not working as expected. The code below prints all existing lines the first time, but not the newly appended lines. Could you please point out if I am missing anything here?
#!/usr/bin/perl
open (LOGFILE, "aa") or die "could not open file reason $! \n";
for (;;)
{
seek(LOGFILE,0,1); ### clear EOF condition
for ($curpos = tell(LOGFILE); <LOGFILE>; $curpos = tell(LOGFILE))
{
print "$_ \n";
}
sleep 1;
seek(LOGFILE,$curpos,0); ### reposition to where the last read ended
}
Works fine for me. How are you updating "aa"?
You won't see the data immediately if it is a buffered write to "aa".
Can you try the following in a different terminal and check whether you see any update:
while ( 1 )
echo "test" >> aa
end
If you are using perl to update aa, check this section on buffering and how to disable.
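For instance, a minimal sketch of an unbuffered Perl writer (the loop and file name are only for demonstration):
use IO::Handle qw();

open my $out, '>>', 'aa' or die "Cannot open aa: $!";
$out->autoflush(1);            # flush after every print, so tail-like readers see it immediately

for my $i (1 .. 10) {
    print {$out} "test $i\n";
    sleep 1;
}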
Can you hook the opening of the DATA handle for a module while Perl is still compiling? And by that I mean is there a way that I can insert code that will run after Perl has opened the DATA glob for reading but before the compilation phase has ceased.
Failing that, can you at least see the raw text after __DATA__ before the compiler opens it up?
In response to Ikegami, on recent scripts that I have been working on, I have been using __DATA__ section + YAML syntax to configure the script. I've also been building up a vocabulary of YAML configuration handlers where the behavior is requested by use-ing the modules. And in some scripts that are quick-n-dirty, but not quite enough to forgo strict, I wanted to see if I could expose variables from the YAML specification.
It's been slightly annoying though just saving data in the import subs and then waiting for an INIT block to process the YAML. But it's been doable.
The file handle in DATA is none other than the handle the parser uses to read the code found before __DATA__. If that code is still being compiled, then __DATA__ hasn't been reached yet, so the handle hasn't been stored in DATA.
You could do something like the following instead:
open(my $data_fh, '<', \<<'__EOI__');
.
. Hunk of text readable via $data_fh
.
__EOI__
I don’t know where you want the hook. Probably in UNITCHECK.
use warnings;
sub i'm {
print "in #_\n";
print scalar <DATA>;
}
BEGIN { i'm "BEGIN" }
UNITCHECK { i'm "UNITCHECK" }
CHECK { i'm "CHECK" }
INIT { i'm "INIT" }
END { i'm "END" }
i'm "main";
exit;
__END__
Data line one.
Data line two.
Data line three.
Data line four.
Data line five.
Data line six.
Produces this when run:
in BEGIN
readline() on unopened filehandle DATA at /tmp/d line 5.
in UNITCHECK
Data line one.
in CHECK
Data line two.
in INIT
Data line three.
in main
Data line four.
in END
Data line five.
You can use any of the before runtime but after compilation blocks to change the *DATA handle. Here is a short example using INIT to change *DATA to uc.
while (<DATA>) {
print;
}
INIT { # after compile time, so DATA is opened, but before runtime.
local $/;
my $file = uc <DATA>;
open *DATA, '<', \$file;
}
__DATA__
hello,
world!
prints:
HELLO,
WORLD!
Which one of the blocks to use depends on other factors in your program. More detail about the various timed blocks can be found on the perlmod manpage.
I'm afraid not, if I got your question right. It's written in The Doc:
Note that you cannot read from the DATA filehandle in a BEGIN block:
the BEGIN block is executed as soon as it is seen (during
compilation), at which point the corresponding DATA (or END)
token has not yet been seen.
There's another way, though: read the file with DATA section as a normal text file, parse this section, then require the script file itself (which will be done at run-time). Don't know whether it'll be relevant in your case. )
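A minimal sketch of that idea (the wrapper and file name are hypothetical): a small launcher reads the real script as plain text, pulls out everything after __DATA__, parses it, and only then loads the script with require.
#!/usr/bin/perl
use strict;
use warnings;

my $script = 'real_script.pl';            # hypothetical name
open my $fh, '<', $script or die "Cannot open $script: $!";
my ($seen_data, $data) = (0, '');
while (<$fh>) {
    $data .= $_ if $seen_data;
    $seen_data = 1 if /^__DATA__$/;
}
close $fh;

# ... parse $data here (e.g. as YAML) ...

require "./$script";   # note: the required file must end with a true value (e.g. 1;)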
perlmod says:
CHECK code blocks are run just after the initial Perl compile phase ends and before the run time begins, in LIFO order.
Maybe you are looking for something like this?
CHECK {
say "Reading from <DATA> ...";
while (<DATA>) {
print;
$main::count++;
};
}
say "Read $main::count lines from <DATA>";
__DATA__
1
2
3
4
5
This produces the following output:
Reading from <DATA> ...
1
2
3
4
5
Read 5 lines from <DATA>
I found out that ::STDIN actually gives me access to the stream '-'. And that I can save the current location, through tell( $inh ) and then seek() it when I'm done.
By using that method, I could read the __DATA__ section in the import sub!
sub import {
my ( $caller, $file ) = ( caller 0 )[0,1];
my $yaml;
if ( $file eq '-' ) {
my $place = tell( ::STDIN );
local $RS;
$yaml = <::STDIN>;
seek( ::STDIN, $place, 0 );
}
else {
open( my $inh, '<', $file );
local $_ = '';
while ( defined() and !m/^__DATA__$/ ) { $_ = <$inh>; }
local $RS;
$yaml = <$inh>;
close $inh;
}
if ( $yaml ) {
my ( $config ) = YAML::XS::Load( $yaml );
no strict 'refs';
while ( my ( $n, $v ) = each %$config ) {
*{"$caller\::$n"} = ref $v ? $v : \$v;
}
}
return;
}
This worked on Strawberry Perl 5.16.2, so I don't know how portable this is. But right now, to me, this is working.
Just a background: I used to do a bit of programming with Windows Script Files. One thing I liked about the wsf format was that you could specify globally useful objects outside of the code: <object id="xl" progid="Excel.Application" />. I have always liked the look of programming by specification and letting some modular handler sort the data out. Now I can get a similar behavior through a YAML handler: excel: !ActiveX Excel.Application.
This works for me.
The test is here, in case you're interested:
use strict;
use warnings;
use English qw<$RS>;
use Test::More;
use data_mayhem; # <-- that's my module.
is( $k, 'Excel.Application' );
is( $l[1], 'two' );
{ local $RS;
my $data = <DATA>;
isnt( $data, '' );
say $data
}
done_testing;
__DATA__
---
k : !ActiveX Excel.Application
l :
- one
- two
- three
I'm a beginner in Perl and I have some trouble using the "system" call. Here is a little piece of code where I try to execute 2 shell commands :
# First command is :
# dot -Tpng $dottmpfile > $pngfile
# Second command is :
# rm $dottmpfile
if (!($pngfile eq "")) {
my #args = ("dot", "-Tpng", $dottmpfile, " > ", $pngfile);
system (join (' ' , #args ))
or die "system #args failed : $!";
unlink $dottmpfile;
}
EDIT : Here is my code now, and I still get an error :
system dot -Tpng toto.dot > toto.png failed : Inappropriate ioctl for device at /home/claferri/bin/fractal.pl line 79.
I've used system to produce this piece of code.
Looking at perldoc -f system, note:
If there is more than one argument in LIST, or if LIST is an array with more than one value, starts the program given by the first element of the list with arguments given by the rest of the list. If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing
You are invoking system LIST so the > ends up being passed to dot instead of being interpreted by the shell.
I would recommend that you keep using system LIST because it is a little safer than passing everything through the shell. According to the docs, you can specify the output file by using the -o option to dot, so do that.
If you really want to dot your i's and cross your t's (pun not intended), then you can use:
if ( defined $pngfile and $pngfile ne '') {
my @args = (dot => '-Tpng', $dottmpfile, "-o$pngfile");
if ( system @args ) {
warn "'system @args' failed\n";
my $reason = $?;
if ( $reason == -1 ) {
die "Failed to execute: $!";
}
elsif ( $reason & 0x7f ) {
die sprintf(
'child died with signal %d, %s coredump',
($reason & 0x7f),
($reason & 0x80) ? 'with' : 'without'
);
}
else {
die sprintf('child exited with value %d', $reason >> 8);
}
}
warn "'system #args' executed successfully\n";
unlink $dottmpfile;
}
You are using > to tell the shell to redirect output to a file, yet by invoking system LIST you are bypassing the shell. Therefore, you can use:
system ( join (' ' , @args ) );
or
system "#args";
system returns 0 on success and non-zero on "failure". It's contrary to the way most of these idioms look and a little counter-intuitive, but with system calls you should use an expression like:
system($command) and warn "system $command: failed $?\n"; # and not or
or
if (system($command) != 0) { ... handle error ... }
Is the "dot" executable in the PATH? Does it have executable permissions? Which specific error are you getting with this code?
It seems that this is correct according to perldoc -f system.