Write to the Data Section in Perl - perl

It is quite easy to read from the data section __DATA__ in Perl.
The following code uses the preexisting DATA handle:
foreach (<DATA>) {
print("- $_");
}
__DATA__
1
2
The End of The End
produces:
- 1
- 2
- The End of The End
However I haven't found any neat way of writing data to the section.
For example I've tried the following:
use feature say;
say STDERR 'Some Error';
say STDOUT 'Some Message';
say DATA 'Some Data';
__DATA__
foo bar
The print to STDOUT and STDERR works as expected:
perl write.pl 2> /dev/null
Some Message
But the print to DATA does nothing at all:
tail -2 write.pl
__DATA__
foo bar
I also have looked at several modules from CPAN, but I always only found support for read, not write.
Thanks for any hints and help!

You can write to the DATA section with a text editor.
Everything else is dark magic. Store input and program state in a separate file (or a database).

There is probably no good reason to this, but here is how:
#!/usr/bin/env perl
use strict;
use warnings;
seek DATA, 0, 2; # go to the end of the source file
my $offset = tell(DATA); # remember where that is
open my $f, '>>', $0 # open source file for appending
or die $!; # of course, you could use DATA instead of $f here
seek $f, $offset, 0; # go to the previously recorded position
print $f "This is a test\n"; # put something there
__DATA__

If the idea is to add things to what DATA can output, you can close it, then reopen it on a reference to a variable containing additional text, like this:
#!/usr/bin/perl
use strict;use warnings;
while(<DATA>) {
print("- $_");
}
close DATA;
my $additional_text = 'test2';
open DATA, '<', \$additional_text;
$additional_text .= "\ntest3";
while(<DATA>) {
print("- $_");
}
__DATA__
test
But you need to use the originally data contained in the DATA section before, I think.

Writing to DATA would change the script. This is a bad idea because DATA is open for read on the script itself. This program illustrates why this is a bad idea.
#!/usr/bin/perl
use strict;
use warnings;
seek DATA, 0, 0;
while( <DATA> ){
print;
}
__DATA__
1
2
3

Related

Perl: How to print a line to a file from within a while loop?

I have a very basic perl script which prints the next line in a text file after matching a search pattern.
#ARGV = <dom_boot.txt>;
while ( <> ) {
print scalar <> if /name=sacux445/;
}
Which works, However I would like to capture the output into a file for further use, rather than printing it to STDOUT.
I'm just learning (slowly) so attempted this:
my $fh;
my $dom_bootdev = 'dom_bootdev.txt';
open ($fh, '>', $dom_bootdev) or die "No such file";
#ARGV = <dom_boot.txt>;
while ( <> ) {
print $fh <> if /name=sacux445/;
}
close $fh;
But I get a syntax error.
syntax error at try.plx line 19, near "<>"
I'm struggling to figure this out. I'm guessing it's probably very simple so any help would be appreciated.
Thanks,
Luke.
The Perl parser sometimes has problems with indirect notation. The canonical way to handle it is to wrap the handle into a block:
print {$fh} <> if /name=sacux445/;
Are you sure you want to remove scalar?
Simply fetch the next line within the loop and print it, if the line matches the pattern:
while (<>) {
next unless /name=sacux445/;
my $next = <>;
last unless defined $next;
print $fh $next;
}
Note, you need to check the return value of the diamond operator.
Input
name=sacux445 (1)
aaa
name=sacux445 (2)
bbb
name=sacux445 (3)
Output
aaa
bbb
One should learn to use state machines for parsing data. A state machine allows the input read to be in only one place in the code. Rewriting the code as a state machine:
use strict;
use warnings;
use autodie; # See http://perldoc.perl.org/autodie.html
my $dom_bootdev = 'dom_bootdev.txt';
open ( my $fh, '>', $dom_bootdev ); # autodie handles open errors
use File::Glob qw( :bsd_glob ); # Perl's default glob() does not handle spaces in file names
#ARGV = glob( 'dom_boot.txt' );
my $print_next_line = 0;
while( my $line = <> ){
if( $line =~ /name=sacux445/ ){
$print_next_line = 1;
next;
}
if( $print_next_line ){
print {$fh} $line;
$print_next_line = 0;
next;
}
}
When To Us a State Machine
If the data is context-free, it can be parsed using only regular expressions.
If the data has a tree structure, it can be parsed using a simple state machine.
For more complex structures, a least one state machine with a push-down stack is required. The stack records the previous state so that the machine can return to it when the current state is finished.
The most complex data structure in use is XML. It requires a state machine for its syntax and a second one with a stack for its semantics.

Find a string in one file and append part of it to matching line in another file

I have two files
first:
8237764738;00:78:9E:EE:CA:6F;FTTH;MULTI
8237764738;2C:39:96:52:47:82;FTTH;MULTI
0415535921;E8:BE:81:86:F1:6F;FTTH;MULTI
0415535921;2C:39:96:5B:12:C6;EZ;SINGLE
...etc
second:
00:78:9E:EE:CA:6F;2013/10/28 13:37:50
E8:BE:81:86:F1:6F;2013/11/05 13:38:30
00:78:9E:EC:4A:B0;2013/10/28 13:59:16
2C:E4:12:AA:F7:95;2013/10/31 13:57:55
...etc
and I have to take mac_address (second position) from the first file and find it in the second one
and append (if match) to first file the date at end from the second file.
output:
8237764738;00:78:9E:EE:CA:6F;FTTH;MULTI;2013/10/28 13:37:50
0415535921;E8:BE:81:86:F1:6F;FTTH;MULTI;2013/11/05 13:38:30
I write a simple script to find the mac_address
but I don't know how to put in the script to add the date.
my %iptv;
my #result;
open IN, "/home/terminals.csv";
while (<IN>) {
chomp;
#wynik = split(/;/,$_);
$iptv{$result[1]} = $result[0];
}
close IN;
open IN, "/home/reboots.csv";
open OUT, ">/home/out.csv";
while (<IN>) {
chomp;
my ($mac, $date) = split(/;/,$_);
if (defined $iptv{$mac})
{
print OUT "$date,$mac \n";
}
}
close IN;
close OUT;
Assuming that the first file lists each MAC number once and that you want an output line for each time the MAC appears in the second file, then:
#!/usr/bin/env perl
use strict;
use warnings;
die "Usage: $0 terminals reboots\n" unless scalar(#ARGV) == 2;
my %iptv;
open my $in1, '<', $ARGV[0] or die "Failed to open file $ARGV[0] for reading";
while (<$in1>)
{
chomp;
my #result = split(/;/, $_); # Fix array used here
$iptv{$result[1]} = $_; # Fix what's stored here
}
close $in1;
open my $in2, '<', $ARGV[1] or die "Failed to open file $ARGV[1] for reading";
while (<$in2>)
{
chomp;
my ($mac, $date) = split(/;/,$_);
print "$iptv{$mac};$date\n" if (defined $iptv{$mac});
}
close $in2;
This uses two file names on the command line and writes to standard output; it is a more general purpose program than your original. It also gets me around the problem that I don't have a /home directory.
For your sample inputs, the output is:
8237764738;00:78:9E:EE:CA:6F;FTTH;MULTI;2013/10/28 13:37:50
0415535921;E8:BE:81:86:F1:6F;FTTH;MULTI;2013/11/05 13:38:30
You were actually fairly close to this, but were making some silly little mistakes.
In your code, you either aren't showing everything or you aren't using:
use strict;
use warnings;
Perl experts use both to make sure they don't make silly mistakes; beginners should do so too. It would have pointed out that #wynik was not declared with my and was assigned to but not used, for example. You could have meant to write #result = split...;. You were not saving the correct data; you were not writing out the information from the $iptv{$mac} that you needed to.

'merging' 2 files into a third using perl

I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a perl script, called ileaf, which will linterleave the lines of a file with those of another file writing the result to a third file. If the files are a different length then the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while(($line1 = <file1>)||($line2 = <file2>)){
if($line1){
print $line1;
}
if($line2){
print $line2;
}
}
This sends the information to screen so I can immediately see the result. The final verson should "print file3 $line1;" I am getting all of file1 then all of file2 w/out and interleaving of the lines.
If I understand correctly, this is a function of the use of the "||" in my while loop. The while checks the first comparison and if it's true drops into the loop. Which will only check file1. Once file1 is false then the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;
while (my $f1 = readline ($file1)) {
print $file3 $f1; #line from file1
if (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
}
while (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally. After all, this is how you were taught. However, you should know the new Perl programming style because it helps eliminates all sorts of programming mistakes, and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible and better. For example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either a my or our. (NOTE: Don't declare Perl built in variables). 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
Use the three parameter syntax for the open statement: In the example below, I use the three parameter syntax. This way, if a file is called >myfile, I'll be able to read from it.
**Use locally defined file handles. Note that I use my $file_1_fh instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to understand, and their operator precedence is better. They're more likely not to cause problems.
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma which will kill your program if the open statements fail (which is probably what you want to do anyway.
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;
for (;;) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
no warnings qw(uninitialized);
print {$output_fh} $line_1 . $line_2;
use warnings;
}
In the above example, I read from both files even if they're empty. If there's nothing to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'll get. I know that either $line_1 or $line_3 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
print {$output_fh} $line_1 if defined $line_1;
print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warning and warning, but I need two separate print statements instead of combining them into one.
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my #file1 = read_file shift );
chomp( my #file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip #file1, #file2;
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;
open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);
my #file1_lines;
my #file2_lines;
while (<IN1>) {
push (#file1_lines, $_);
}
close IN1;
while (<IN2>) {
push (#file2_lines, $_);
}
close IN2;
my $file1_items = #file1_lines;
my $file2_items = #file2_lines;
open(OUT, ">", $ARGV[2]);
my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
if (defined($file1_lines[$i])) {
print OUT $file1_lines[$i];
}
if (defined($file2_lines[$i])) {
print OUT $file2_lines[$i];
}
$i++
}
close OUT;

Parsing the large files in Perl

I need to compare the big file(2GB) contains 22 million lines with the another file. its taking more time to process it while using Tie::File.so i have done it through 'while' but problem remains. see my code below...
use strict;
use Tie::File;
# use warnings;
my #arr;
# tie #arr, 'Tie::File', 'title_Nov19.txt';
# open(IT,"<title_Nov19.txt");
# my #arr=<IT>;
# close(IT);
open(RE,">>res.txt");
open(IN,"<input.txt");
while(my $data=<IN>){
chomp($data);
print"$data\n";
my $occ=0;
open(IT,"<title_Nov19.txt");
while(my $line2=<IT>){
my $line=$line2;
chomp($line);
if($line=~m/\b$data\b/is){
$occ++;
}
}
print RE"$data\t$occ\n";
}
close(IT);
close(IN);
close(RE);
so help me to reduce it...
Lots of things wrong with this.
Asides from the usual (lack of use strict, use warnings, use of 2-argument open(), not checking open() result, use of global filehandles), the specific problem in your case is that you are opening/reading/closing the second file once for every single line of the first. This is going to be very slow.
I suggest you open the file title_Nov19.txt once, read all the lines into an array or hash or something, then close it; and then you can open the first file, input.txt and walk along that once, comparing to things in the array so you don't have to reopen that second file all the time.
Futher I suggest you read some basic articles on style/etc.. as your question is likely to gain more attention if it's actually written in vaguely modern standards.
I tried to build a small example script with a better structure but I have to say, man, your problem description is really very unclear. It's important to not read the whole comparison file each time as #LeoNerd explained in his answer. Then I use a hash to keep track of the match count:
#!/usr/bin/env perl
use strict;
use warnings;
# cache all lines of the comparison file
open my $comp_file, '<', 'input.txt' or die "input.txt: $!\n";
chomp (my #comparison = <$comp_file>);
close $comp_file;
# prepare comparison
open my $input, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!\n";
my %count = ();
# compare each line
while (my $title = <$input>) {
chomp $title;
# iterate comparison strings
foreach my $comp (#comparison) {
$count{$comp}++ if $title =~ /\b$comp\b/i;
}
}
# done
close $input;
# output (sorted by count)
open my $output, '>>', 'res.txt' or die "res.txt: $!\n";
foreach my $comp (#comparison) {
print $output "$comp\t$count{$comp}\n";
}
close $output;
Just to get you started... If someone wants to further work on this: these were my test files:
title_Nov19.txt
This is the foo title
Wow, we have bar too
Nothing special here but foo
OMG, the last title! And Foo again!
input.txt
foo
bar
And the result of the program was written to res.txt:
foo 3
bar 1
Here's another option using memowe's (thank you) data:
use strict;
use warnings;
use File::Slurp qw/read_file write_file/;
my %count;
my $regex = join '|', map { chomp; $_ = "\Q$_\E" } read_file 'input.txt';
for ( read_file 'title_Nov19.txt' ) {
my %seen;
!$seen{ lc $1 }++ and $count{ lc $1 }++ while /\b($regex)\b/ig;
}
write_file 'res.txt', map "$_\t$count{$_}\n",
sort { $count{$b} <=> $count{$a} } keys %count;
Numerically-sorted output to res.txt:
foo 3
bar 1
An alternation regex which quotes meta characters (\Q$_\E) is built and used, so only one pass against the large file's lines is needed. The hash %seen is used to insure that the input words are only counted once per line.
Hope this helps!
Try this:
grep -i -c -w -f input.txt title_Nov19.txt > res.txt

Why is this printing twice in perl?

So I'm hoping somebody can just explain to my why when I run the following code, it prints ".link/output" at both the beginning and end of the line. I was trying to get it to print only at the end of the line. Any thoughts?
#!/usr/local/bin/perl
use warnings;
use strict;
my $logfiles = $ARGV[0]; #file containing the list of all the log file names
my #logf = ();
my $i;
open (F2, "<", $logfiles);
while(<F2>){
#logf = $_;
foreach $i(#logf){
print $_.".link/output";
}
}
close F2;
So for example, if the file I'm reading in is:
cat
dog
I want to see:
cat.link/output
dog.link/output
But isntead I am getting:
.link/outputcat.link/output
.link/outputdog.link/output
Could anybody please explain to me why this is happening and/or how to fix it? Thank you.
you have an empty element at the beginning of your list. simply shift #logf
I don't see what #logf does. Couldn't you just do this:
#!/usr/bin/env perl
use warnings;
use strict;
my $logfiles = $ARGV[0]; #file containing the list of all the log file names
#open(my $f2, "<", $logfiles);
# FOR TESTING, use above in your code
my $f2 = \*DATA;
# ===========
while(<$f2>){
chomp;
print "$_.link/output\n";
}
__DATA__
cat
dog