What does <DATA> mean in Perl error messages?

I'm trying to debug an error I am currently experiencing in Perl, and my first clues are the file names and line numbers stated in the message. However, I'm not sure what <DATA> is.
So what is it?

It means you had read 228 lines from the DATA file handle when the error occurred. It's unlikely to be relevant in this case.
It's even less likely to be relevant when the handle in question is DATA. DATA allows a program to read data from the end of its source file. It's usually used to store hard-coded data or part of the program itself, and it's usually read from start to finish early in the program's execution. But few bother to close the handle, so unrelated error messages end up tagged with the number of the last line of that data.
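A minimal sketch of that last point: reading DATA to the end and then closing it (assuming the data is only needed once) keeps later, unrelated errors from being tagged with a stale <DATA> line number.
while (my $line = <DATA>) {
    chomp $line;
    # ... consume the hard-coded data ...
}
close DATA;   # resets the line counter, so later errors aren't tagged "<DATA> line N"

# ... rest of the program ...

__DATA__
hard-coded line 1
hard-coded line 2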

<DATA> is the default filehandle for the __DATA__ or __END__ token in Perl.
What it means is that there should be a __DATA__ or __END__ section towards the end of the Perl script you are running. Whatever text follows that token is treated by the Perl interpreter as a file and is made available to the program through the <DATA> filehandle.
print while (<DATA>);
# End of Perl script. Whatever follows goes into <DATA> fh.
__DATA__
line 1
line 2
line 3
line 4
line 5
line 6

Related

perl print to a file and STDOUT which is the file

My program is trying to print to a file which, for it, is STDOUT.
That is, print "text here"; prints to a file x.log, while I am also trying to print to the same x.log using a filehandle, as in print FH1 "text here";. I notice that when the filehandle print statement comes first, followed by the STDOUT print, the second print can overwrite the first. I would like to know why this occurs.
This makes me think of a race condition, or that the filehandle print is relatively slow (if it goes through a buffer?) compared to the STDOUT print statements. I am not sure if that is the way Perl works. Perl version: 5.22.0
As far as I understand, your program basically looks like this:
open(my $fh,'>','foobar.txt');
print $fh "foo\n";
print "bar\n"; # prints to STDOUT
And then you run it in a way that STDOUT is redirected by the shell to the same file which is already opened in your program:
$ perl test.pl > foobar.txt
This will open two independent file handles to the same file: one within your program and the other within the shell where you start the program. Both file handles maintain their own file position for writing: each starts at position 0 and advances after every write.
Since these file handles are independent of each other, neither cares whether other file handles are currently dealing with the same file, no matter whether those handles are inside or outside the program. This means the writes will overwrite each other.
In addition to this there is internal buffering, i.e. each print first writes into an internal buffer and may only later result in a write to the underlying file. When the data actually reaches the file depends on the mode of the file handle, i.e. unbuffered, line-buffered, or buffered with a specific size. This makes the result somewhat unpredictable.
If you don't want this behavior but still want to write to the same file through multiple file handles, you should use append mode, i.e. open with >> instead of > in both the Perl code and the shell. This makes sure all data is appended to the current end of the file instead of being written at the position maintained by each file handle, so data will not get overwritten. Additionally you might want to make the file handles unbuffered, so that data ends up in the file in the same order the print statements were executed:
open(my $fh,'>>','foobar.txt');
$fh->autoflush(1); # make $fh unbuffered
$|=1; # make STDOUT unbuffered
print $fh "foo\n";
print "bar\n"; # prints to STDOUT
$ perl test.pl >> foobar.txt

Perl - error reading text file

I am trying to read a file in perl.
I just want to print the name on each line of the file players.txt.
I am using this code:
#!/usr/local/bin/perl
open (MYFILE, 'players.txt');
while () {
chomp;
print "$_\n";
}
close (MYFILE);
This produces this error:
./program5.pl: line 2: syntax error near unexpected token `MYFILE,'
./program5.pl: line 2: ` open (MYFILE, 'players.txt');'
I am using a Unix operating system to do this. I can't find a guide that works for reading in a file in Perl.
The errors you're getting are bash errors, which means that you're asking bash to process Perl code. That isn't going to work.
At a guess, your shebang line #!/usr/local/bin/perl doesn't start at the beginning of the very first line. The two characters #! must be the first two bytes of the file.
Alternatively you can drop the shebang line if you specify that perl should run the program:
perl program5.pl
You also have an error in while (), which is an endless loop and doesn't read from the file. You need
while ( <MYFILE> ) {
...
}
You have found a very poor and old-fashioned source of advice for learning Perl. I suggest you take a look at the Perl tag information, which has a list of many excellent tutorials and resources.
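For reference, here's a minimal modern version of the script (assuming players.txt sits in the current directory), using a lexical filehandle, the three-argument form of open, and error checking:
#!/usr/local/bin/perl
use strict;
use warnings;

open my $fh, '<', 'players.txt' or die "Can't open players.txt: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    print "$line\n";
}
close $fh;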

Error when redirecting data to file

Frequently [but not always] when I run a procedure that defines a filehandle, I get a strange error from an internal function which I don't understand how to debug.
In my Perl code I have the following line [111]:
open V_FILE_SEC, ">>$file/V_$file$dir.csvT" or die $!;
When I run the script [>myscript.pl DPX_*] I get:
"No such file or directory at myscript.pl line 111, line 18004."
What is the meaning of line 18004? How do I start debugging?
Thanks.
From perldoc -f die:
If the last element of LIST does not end in a newline, the current script line number and input line number (if any) are also printed, and a newline is supplied. [Emphasis added]
The "input line number" is the value in $., roughly the number of lines of input you have read from the most recent filehandle you accessed.
In your case, you could use that number to look at your program's input and see if there is anything unusual around line 18004 that your program wasn't expecting.
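A minimal sketch of the mechanism (demo.pl and input.txt are hypothetical names): a die message without a trailing newline automatically gets both the script line number and the input line number appended.
use strict;
use warnings;

open my $in, '<', 'input.txt' or die "Can't open input.txt: $!";
while (my $line = <$in>) {
    # $. now tracks how many lines have been read from $in
}

# This die has no trailing newline, so Perl appends both numbers,
# producing something like:
#   No such file or directory at demo.pl line 11, <$in> line 3.
open my $out, '>>', '/nonexistent/dir/out.csv' or die $!;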

Find data point generating Error in Perl code

I have a program in Perl that reads one line at a time from a data file and computes certain statistics for each line of data. Every now and then, while the program reads through my dataset, I get a warning about an ...uninitialized value... and I would like to know which line of data generates this warning.
Is there any way I can tell Perl to print (to screen or file) the data point that is generating the error?
If your script prints one line for each input line, it is simpler to see where the error occurs if you flush the standard output along with the standard error, making the warning appear at the "right" point relative to your normal output:
$| = 1;
That is, turn on Perl's autoflush feature, as discussed in How to flush output to the console?
What (auto)flushing does:
- Error messages are written to the predefined STDERR stream; normal prints go to the (default) predefined STDOUT.
- Data written to these streams is saved up by the system (under Perl's control) to be written out in chunks (called "buffers") to improve efficiency.
- The STDERR and STDOUT buffers are independent, and could be written line-by-line or a buffer at a time (many characters, not necessarily whole lines).
- Using autoflush tells Perl to modify its scheme for writing buffers so that their content is written via the operating system at the end of each print/printf call.
- Normally STDERR is already written line-by-line. The $| = 1 above tells Perl to enable this behavior for the current/default stream, i.e., STDOUT.
- Doing that makes both of them write line-by-line, so that messages sent close in time via either stream appear close together in the output of your script.
Perl usually includes the file handle and input line number in warnings by default, for example:
>echo hello | perl -lnwe 'print $x'
Name "main::x" used only once: possible typo at -e line 1.
Use of uninitialized value $x in print at -e line 1, <> line 1.
So if you're doing the computation while reading, you get the appropriate warning.
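If you want the offending data printed as well, one option (a sketch, not from the original answer; the tab-separated fields are assumed) is to install a __WARN__ handler that reports the current input line alongside each warning:
use strict;
use warnings;

# Report the current input line number ($.) and content with each warning.
$SIG{__WARN__} = sub {
    print STDERR $_[0];
    print STDERR "  while processing input line $.: $_\n" if defined $_;
};

while (<>) {
    chomp;
    my ($x, $y) = split /\t/;
    my $sum = $x + $y;   # warns "Use of uninitialized value" if $y is missing
    print "$sum\n";
}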

What is the fastest way to 'print' to file in perl?

I've been writing output from perl scripts to files for some time using code as below:
open( OUTPUT, ">:utf8", $output_file ) or die "Can't write new file: $!";
print OUTPUT "First line I want printed\n";
print OUTPUT "Another line I want printing\n";
close(OUTPUT);
This works, and is faster than my initial approach, which used "say" instead of print (thank you, NYTProf, for enlightening me to that!)
However, my current script is looping over hundreds of thousands of lines and is taking many hours to run using this method and NYTProf is pointing the finger at my thousands of 'print' commands. So, the question is... Is there a faster way of doing this?
Other Info that's possibly relevant...
Perl Version: 5.14.2 (On Ubuntu)
Background of the script in question...
A number of '|'-delimited flat files are being read into hashes; each file has some sort of primary key matching entries from one to another. I'm manipulating this data and then combining it into one file for import into another system.
The output file is around 3 million lines, and the program starts to noticeably slow down after writing around 30,000 lines to said file. (A little reading around seemed to point towards running out of write buffer in other languages, but I couldn't find anything about this with regard to Perl.)
EDIT: I've now tried adding the line below, just after the open() statement, to disable print buffering, but the program still slows around the 30,000th line.
OUTPUT->autoflush(1);
I think you need to redesign the algorithm your program uses. File output speed isn't influenced by the amount of data that has been output, and it is far more likely that your program is reading and processing data but not releasing it.
- Check the amount of memory used by your process to see if it increases inexorably.
- Beware of for (<$filehandle>) loops, which read whole files into memory at once (see the sketch below).
- As I said in my comment, disable the relevant print statements to see how performance changes.
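A minimal sketch of that second point (big.txt is a placeholder name): for evaluates the readline in list context and slurps the whole file before the first iteration, while while reads one line at a time.
open my $fh, '<', 'big.txt' or die "Can't open big.txt: $!";

# Slurps every line into a list up front -- memory grows with file size:
# for my $line (<$fh>) { ... }

# Streams one line at a time -- memory use stays flat:
while (my $line = <$fh>) {
    # process $line
}
close $fh;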
Have you tried concatenating all the individual prints into a single scalar and then printing that scalar all at once? I have a script that outputs an average of 20 lines of text for each input line. Using individual print statements, even with the output sent to /dev/null, took a long time. But when I packed all the output (for a single input line) together, using things like:
$output .= "...";
$output .= sprintf("%s...", $var);
then, just before leaving the line-processing subroutine, I print $output, emitting all the lines at once. The number of calls to print went from ~7.7M to about 386K, equal to the number of lines in the input data file. This shaved about 10% off my total execution time.
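A minimal sketch of this pattern (process_line and the record layout are made up for illustration):
use strict;
use warnings;

open my $out, '>:utf8', 'combined.txt' or die "Can't write combined.txt: $!";

while (my $line = <>) {
    chomp $line;
    my $output = '';
    # Accumulate this input line's ~20 output lines in one scalar...
    for my $record (process_line($line)) {
        $output .= sprintf "%s|%s\n", $record->{key}, $record->{value};
    }
    # ...then emit them with a single print call.
    print {$out} $output;
}
close $out;

sub process_line {
    my ($line) = @_;
    # Placeholder transformation standing in for the real hash merging.
    return map { { key => $_, value => $line } } 1 .. 20;
}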