Read and write a textfile with Perl - perl

I am trying to open and read a textfile and then write the content of this file line per line into an HTML-File. So far, I've come up with this:
use strict;
use locale;
my (@datei, $i);
open (FHIN,"HSS_D.txt") || die "couldn't open file $!";
@datei= <in>;
close FHIN;
open (FHOUT, ">pz2.html");
print FHOUT "<HTML>\n";
print FHOUT "<HEAD>\n";
print FHOUT "<TITLE>pz 2</TITLE>\n";
print FHOUT '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">';
print FHOUT "\n</HEAD>\n";
print FHOUT "<BODY>\n";
for ($i = 0; $i < @datei; $i++) {
print FHOUT "<p>$datei[$i]</p>\n";
}
print FHOUT "</BODY></html>\n";
close (FHOUT);
However, I get a compilation error every time and I can't figure out what's wrong. Thanks for your help!

If you had enabled warnings via use warnings or use warnings qw(all)—which you should always do—you would have seen something like this:
Name "main::in" used only once: possible typo at foo.pl line 6.
That is, of course, this line:
@datei= <in>;
The root cause of the problem is that you opened a filehandle named FHIN, but you tried to read from a filehandle named in. However, the whole operation would be better written using lexical filehandles and the three-argument form of open, which is considered a best practice:
open(my $fh, '<', 'HSS_D.txt') or die "couldn't open file $!";
As an aside, I've voted to close this question as off-topic because it is about a problem that was caused by a simple typographical error.

Problem in your script
You are reading from the wrong filehandle, and that is your problem. Instead of @datei = <in>, it should be
@datei = <FHIN>;
Some other things you should know
1) Always put use warnings and use strict at the top of the program.
2) Don't store the whole file in an array; instead, process the file line by line.
while (my $line = <FHIN>)
{
    # Do your stuff here.
}
3) Use the three-argument form of open, as follows:
open my $fh, '<', "filename" or die "Couldn't open file: $!";
4) To access each element of an array, you can use Perl's foreach instead of a C-style loop.
for my $element (@array)
{
    # Use $element here.
}
If you want to loop over the indices instead, use the following format.
for my $index (0..$#array)
{
    # Use $array[$index] here.
}
Here .. is the range operator, and $#array gives the last index of @array.
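Putting these points together, the whole task could look roughly like the sketch below. It is only one way to do it: text_to_html is a made-up helper name, and the HTML escaping of &, < and > is my own addition, not something the question asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each line of $in_path into a <p> element
# of a small HTML page written to $out_path.
sub text_to_html {
    my ($in_path, $out_path, $title) = @_;

    open my $in,  '<', $in_path  or die "Couldn't open '$in_path': $!";
    open my $out, '>', $out_path or die "Couldn't open '$out_path': $!";

    print {$out} "<html>\n<head>\n<title>$title</title>\n</head>\n<body>\n";

    # Read one line at a time instead of slurping the file into an array.
    while (my $line = <$in>) {
        chomp $line;
        # Escape the characters that are special in HTML text (an assumption
        # about what the output should look like).
        $line =~ s/&/&amp;/g;
        $line =~ s/</&lt;/g;
        $line =~ s/>/&gt;/g;
        print {$out} "<p>$line</p>\n";
    }

    print {$out} "</body>\n</html>\n";

    close $in;
    close $out or die "Couldn't close '$out_path': $!";
    return;
}
```

With the file names from the question, the call would be text_to_html('HSS_D.txt', 'pz2.html', 'pz 2').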

Out of memory when serving a very big binary file over HTTP

The code below is the original code of a Perl CGI script we are using. Even for very big files it seems to be working, but not for really huge files.
The current code is :
$files_location = $c->{target_dir}.'/'.$ID;
open(DLFILE, "<$files_location") || Error('open', 'file');
@fileholder = <DLFILE>;
close (DLFILE) || Error ('close', 'file');
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
print @fileholder;
binmode $DLFILE;
If I understand the code correctly, it is loading the whole file in memory before "printing" it. Of course I suppose it would be a lot better to load and display it by chunks ? But after having read many forums and tutorials I am still not sure how to do it best, with standard Perl libraries...
Last question, why is "binmode" specified at the end ?
Thanks a lot for any hint or advice,
I have no idea what binmode $DLFILE is for. $DLFILE has nothing to do with the file handle DLFILE, and it's a bit late to set the binmode of the file now that it has been read to the end. It's probably just a mistake.
You can use this instead. It uses modern Perl best practices and reads and sends the file in 8K chunks.
The file name seems to be made from $ID, so I'm not sure that $name would be correct, but I can't tell.
Make sure to keep the braces, as the block makes Perl restore the old value of $/ and close the open file handle.
my $files_location = "$c->{target_dir}/$ID";
{
print "Content-Type: application/x-download\n";
print "Content-Disposition: attachment; filename=$name\n\n";
open my $fh, '<:raw', $files_location or Error('open', "file $files_location");
local $/ = \( 8 * 1024 );
print while <$fh>;
}
You're pulling the entire file at once into memory. Best to loop over the file line-by-line, which eliminates this problem.
Note also that I've modified the code to use the proper 3-arg open, and to use a lexical file handle instead of a global bareword one.
open my $fh, '<', $files_location or die $!;
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$name\n\n";
while (my $line = <$fh>){
print $line;
}
The binmode call appears to be useless in the context of what you've shown here, as $DLFILE doesn't appear to be a valid, in-use variable (add use strict; and use warnings; at the top of your script...)
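If you prefer the chunking to be explicit, the same idea can be sketched with read and a fixed-size buffer. This is only an illustration: copy_in_chunks is a hypothetical helper name, and the 8K default simply mirrors the chunk size used above. Both handles are put into binary mode before any data moves, which is where a binmode call belongs.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from $in_fh to $out_fh in
# fixed-size chunks, so only one chunk is held in memory at a time.
sub copy_in_chunks {
    my ($in_fh, $out_fh, $chunk_size) = @_;
    $chunk_size ||= 8 * 1024;

    # Binary mode on both sides: no newline translation, no encoding layers.
    binmode $in_fh;
    binmode $out_fh;

    my $total = 0;
    while (1) {
        my $got = read($in_fh, my $buffer, $chunk_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of file
        print {$out_fh} $buffer or die "write failed: $!";
        $total += $got;
    }
    return $total;    # number of bytes copied
}
```

In the CGI script you would print the two header lines first, open the file with a lexical handle, and then call copy_in_chunks($fh, \*STDOUT).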

PERL Net::DNS output to file

Completely new to Perl (in the process of learning) and need some help. Here is some code that I found which prints results to the screen great, but I want it printed to a file. How can I do this? When I open a file and send output to it, I get garbage data.
Here is the code:
use Net::DNS;
my $res = Net::DNS::Resolver->new;
$res->nameservers("ns.example.com");
my @zone = $res->axfr("example.com");
foreach $rr (@zone) {
$rr->print;
}
When I add:
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
.....
$rr -> $fh; #I get garbage.
Your @zone array contains a list of Net::DNS::RR objects, whose print method stringifies the object and prints it to the currently selected file handle.
To print the same thing to a different file handle, you will have to stringify the object yourself.
This should work:
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $_->string, "\n" for @zone;
When you're learning a new language, making random changes to code in the hope that they will do what you want is not a good idea. A far better approach is to read the documentation for the libraries and functions that you are using.
The original code uses $rr->print. The documentation for Net::DNS::Resolver says:
print
$resolver->print;
Prints the resolver state on the standard output.
The print() method there is named after the standard Perl print function which we can use to print data to any filehandle. There's a Net::DNS::Resolver method called string which is documented like this:
string
print $resolver->string;
Returns a string representation of the resolver state.
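The elements of @zone are Net::DNS::RR objects rather than resolvers, but that class documents the same pair of print and string methods. So it looks like $rr->print is equivalent to print $rr->string followed by a newline, and it's simple enough to change that to print to your new filehandle.
print $fh $rr->string, "\n";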
p.s. And, by the way, it's "Perl", not "PERL".

Read specific part of a filehandle in PERL

Hi I have a large file I would like to read. To save resource I want to read it slowly, one line at a time. However I'm wondering if there is a way to read specific line from a filehandle instead. For example, say I have a test.txt file containing a billion numbers starting with 1. Each number is on a separate line.
1
2
3
...
so now what I currently do to get say line 10 is this,
open (FILE, "< test.txt") or die "$!";
@reads = <FILE>;
print $reads[9];
however, is there a way I can access certain part of the FILE without reading everything into a big array, say I want line 10 instead.
something like FILE->[9]
thanks for helping in advance!
There are two methods. The first is to process the file line by line and skip to the desired line. You can use the input line number variable, $., to help:
use strict;
use warnings;
use autodie;
my $line10 = sub {
open my $fh, '<', 'text.txt';
while (<$fh>) {
return $_ if $. == 10;
}
}->();
Alternatively, you could use Tie::File, as you already noticed. However, while that interface is very convenient, and I'd recommend its use, it also will loop through the file behind the scenes.
use strict;
use warnings;
use autodie;
use Tie::File;
tie my @array, 'Tie::File', 'text.txt' or die "Can't open text.txt: $!";
print $array[9] // die "Line 10 does not exist";
For memory purposes large files should be read in using a while loop which will read the file line by line:
open my $fh, '<', 'somefile.txt' or die "Can't open somefile.txt: $!";
while ( my $line = <$fh> ) {
    # read in text line by line
}
Either way, to get at that line number you have to read the file up to that line. I would recommend using the while loop and a counter to print or save the line you are looking for.
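As a rough sketch of that while-loop approach, the lookup can be wrapped in a small function that stops reading as soon as the wanted line is found. The name nth_line is made up for this example, and it uses Perl's built-in line counter $. instead of a hand-written counter.

```perl
use strict;
use warnings;

# Hypothetical helper: return line number $n (1-based) of the file at
# $path without its newline, or undef if the file has fewer lines.
sub nth_line {
    my ($path, $n) = @_;

    open my $fh, '<', $path or die "Can't open '$path': $!";
    while (my $line = <$fh>) {
        next unless $. == $n;    # $. is the number of the line just read
        chomp $line;
        close $fh;
        return $line;            # stop here; the rest is never read
    }
    close $fh;
    return;                      # the file has fewer than $n lines
}
```

For the file in the question, print nth_line('test.txt', 10), "\n"; would read only the first ten lines. Memory use stays at one line at a time, but the lines before the target still have to be read.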

How to delete common lines from one of 2 files in Perl?

I have 2 files, a small one and a big one. The small file is a subset of the big one.
For instance:
Small file:
solar:1000
alexey:2000
Big File:
andrey:1001
solar:1000
alexander:1003
alexey:2000
I want to delete all the lines from Big.txt which are also present in Small.txt. In other words, I want to delete the lines in Big file which are common to the small File.
So, I wrote a Perl Script as shown below:
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
open(BIG, "<$big") || die("Couldn't read from the file: $big\n");
my #contents = <BIG>;
close (BIG);
open(SMALL, "<$small") || die ("Couldn't read from the file: $small\n");
while(<SMALL>)
{
chomp $_;
@contents = grep !/^\Q$_/, @contents;
}
close(SMALL);
open(OUTPUT, ">>$output") || die ("Couldn't open the file: $output\n");
print OUTPUT @contents;
close(OUTPUT);
However, this Perl Script does not delete the lines in Big.txt which are common to Small.txt
In this script, I first open the big file stream and copy the entire contents into the array, @contents. Then, I iterate over each entry in the small file and check for its presence in the bigger file. I filter the line from Big File and save it back into the array.
I am not sure why this script does not work? Thanks
Your script does NOT work because grep uses $_ and takes over (for the duration of grep) the old value of your $_ from the loop (e.g. the variable $_ you use in the regex is NOT the variable used for storing the loop value in the while block - they are named the same, but have different scopes).
Use a named variable instead (as a rule, NEVER use $_ for any code longer than 1 line, precisely to avoid this type of bug):
while (my $line=<SMALL>) {
chomp $line;
@contents = grep !/^\Q$line/, @contents;
}
However, as Oleg pointed out, a more efficient solution is to read small file's lines into a hash and then process the big file ONCE, checking hash contents (I also improved the style a bit - feel free to study and use in the future, using lexical filehandle variables, 3-arg form of open and IO error printing via $!):
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
use File::Slurp;
# Chomp the lines so they compare equal to the chomped lines of the big file
chomp(my @small = read_file($small));
my %small = map { ($_ => 1) } @small;
# Distinct names for the handles, so they don't mask the file name variables
open(my $big_fh, "<", $big) or die "Can not read $big: Error: $!\n";
open(my $out_fh, ">", $output) or die "Can not write to $output: Error: $!\n";
while (my $line = <$big_fh>) {
    chomp $line;
    next if $small{$line}; # Skip common
    print $out_fh "$line\n";
}
close($big_fh);
close($out_fh);
It doesn't work for several reasons. First, lines in @contents still have their newlines in. And second, when you grep, $_ in !/^\Q$_/ is set not to the last line from the small file, but to each element of the @contents array in turn, effectively making it: for each element in the list, return everything except this element, leaving you with an empty list at the end.
This isn't really the good way to do it - you're reading big file and then trying to reprocess it several times. First, read a small file and put every line in hash. Then read big file inside while(<>) loop, so you won't waste your memory reading it entirely. On each line, check if key exists in previously populated hash and if it does - go to next iteration, otherwise print the line.
Here is a small and efficient solution to your problem:
#!/usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
my %diffx;
open my $bfh, "<", $big or die "Couldn't read from the file $big: $!\n";
# load big file's contents
my @big = <$bfh>;
chomp @big;
# build a lookup table, a structured table for big file
@diffx{@big} = ();
close $bfh or die "$!\n";
open my $sfh, "<", $small or die "Couldn't read from the file $small: $!\n";
my @small = <$sfh>;
chomp @small;
# delete the elements that exist in small file from the lookup table
delete @diffx{@small};
close $sfh;
# print join "\n", keys %diffx;
open my $ofh, ">", $output or die "Couldn't open the file $output for writing: $!\n";
# what is left is unique lines from big file
print $ofh join "\n", keys %diffx;
close $ofh;
__END__
P.S. I learned this trick and many others from Perl Cookbook, 2nd Edition. Thanks
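One thing to keep in mind with the lookup-table version above: keys %diffx comes back in no particular order, and duplicate lines in the big file collapse into one. If the order of the big file matters, a sketch along the following lines keeps it, using only core Perl. The name remove_common_lines is made up for this example, and it assumes a line counts as common only when it matches a small-file line exactly.

```perl
use strict;
use warnings;

# Hypothetical helper: write every line of $big_path that does not occur
# in $small_path to $out_path, keeping the order of the big file.
# Returns the number of lines written.
sub remove_common_lines {
    my ($small_path, $big_path, $out_path) = @_;

    # Load the small file into a hash for constant-time lookups.
    open my $small_fh, '<', $small_path or die "Can't read '$small_path': $!";
    my %seen;
    while (my $line = <$small_fh>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $small_fh;

    # Stream the big file once; only one line is in memory at a time.
    open my $big_fh, '<', $big_path or die "Can't read '$big_path': $!";
    open my $out_fh, '>', $out_path or die "Can't write '$out_path': $!";
    my $kept = 0;
    while (my $line = <$big_fh>) {
        chomp $line;
        next if $seen{$line};    # skip lines that are also in the small file
        print {$out_fh} "$line\n";
        $kept++;
    }
    close $big_fh;
    close $out_fh or die "Can't close '$out_path': $!";
    return $kept;
}
```

It would be called as remove_common_lines($small, $big, $output) with the three command-line arguments from the question.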

How to open/join more than one file (depending on user input) and then use 2 files simultaneously

EDIT: Sorry for the misunderstanding, I have edited a few things, to hopefully actually request what I want.
I was wondering if there was a way to open/join two or more files to run the rest of the program on.
For example, my directory has these files:
taggedchpt1_1.txt, parsedchpt1_1.txt, taggedchpt1_2.txt, parsedchpt1_2.txt etc...
The program must call a tagged and parsed simultaneously. I want to run the program on both of chpt1_1 and chpt1_2, preferably joined together in one .txt file, unless it would be very slow to do so. For instance run what would be accomplished having two files:
taggedchpt1_1_and_chpt1_2 and parsedchpt1_1_and_chpt1_2
Can this be done through Perl? Or should I just combine the text files myself(or automate that process, making chpt1.txt which would include chpt1_1, chpt1_2, chpt1_3 etc...)
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
print "Please type in the chapter and section NUMBERS in the form chp#_sec#:\n"; ##So the user inputs 31_3, for example
chomp (my $chapter_and_section = "chpt".<>);
print "Please type in the search word:\n";
chomp (my $search_key = <>);
open(my $tag_corpus, '<', "tagged${chapter_and_section}.txt") or die $!;
open(my $parse_corpus, '<', "parsed${chapter_and_section}.txt") or die $!;
For the rest of the program to work, I need to be able to have:
my @sentences = <$tag_corpus>; ##right now this is one file, I want to make it more
my @typeddependencies = <$parse_corpus>; ##same as above
EDIT2: Really sorry about the misunderstanding. In the program, after the steps shown, I do 2 for loops. Reading through the lines of the tagged and parsed.
What I want is to accomplish this with more files from the same directory, without having to re-input the next files. (ie. I can run taggedchpt31_1.txt and parsedchpt31_1.txt...... I want to run taggedchpt31 and parsedchpt31 - which includes ~chpt31_1, ~chpt31_2, etc...)
Ultimately, it would be best if I joined all the tagged files and all the parsed files that have a common chapter (in the end still requiring only two files I want to run) but not have to save the joined file to the directory... Now that I put it into words, I think I should just save files that include all the sections.
Sorry and Thanks for all your time! Look at FMc's breakdown of my question for more help.
You could iterate over the file names, opening and reading each one in turn. Or you could produce an iterator that knows how to read lines from sequence of files.
sub files_reader {
# Takes a list of file names and returns a closure that
# will yield lines from those files.
my @handles = map { open(my $h, '<', $_) or die $!; $h } @_;
return sub {
shift @handles while @handles and eof $handles[0];
return unless @handles;
return readline $handles[0];
}
}
my $reader = files_reader('foo.txt', 'bar.txt', 'quux.txt');
while (defined(my $line = $reader->())) {
print $line;
}
Or you could use Perl's built-in iterator that can do the same thing:
local @ARGV = ('foo.txt', 'bar.txt', 'quux.txt');
while (my $line = <>) {
print $line;
}
Edit in response to follow-up questions:
Perhaps it would help to break your problem down into smaller sub-tasks. As I understand it, you have three steps.
Step 1 is to get some input from the user -- perhaps a directory name, or maybe a couple of file name patterns (taggedchpt and parsedchpt).
Step 2 is for the program to find all of the relevant file names. For this task, glob() or readdir() might be useful. There are many questions on StackOverflow related to such issues. You'll end up with two lists of file names, one for the tagged files and one for the parsed files.
Step 3 is to process the lines across all of the files in each of the two sets. Most of the answers you have received, including mine, will help you with this step.
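Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under a few assumptions: read_chapter is a made-up name, the files follow the taggedchptN_M.txt / parsedchptN_M.txt naming from the question, and the directory path contains no spaces (glob splits its pattern on whitespace). The sections are sorted numerically so that section 10 comes after section 2.

```perl
use strict;
use warnings;

# Hypothetical helper: return all lines of the files named
# "<prefix>chpt<chapter>_<section>.txt" in $dir, ordered by section number.
sub read_chapter {
    my ($dir, $prefix, $chapter) = @_;

    # Step 2: find the matching files and sort them by section number.
    my @files =
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { m/_(\d+)\.txt\z/ ? [ $1, $_ ] : () }
        glob("$dir/${prefix}chpt${chapter}_*.txt");

    # Step 3: read the lines of each file in turn.
    my @lines;
    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open '$file': $!";
        push @lines, <$fh>;
        close $fh;
    }
    return @lines;
}
```

With that in place, my @sentences = read_chapter('.', 'tagged', 31); and my @typeddependencies = read_chapter('.', 'parsed', 31); would give the two arrays the rest of the program expects, without saving a joined file to disk.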
No one has mentioned the @ARGV hack yet? Ok, here it is.
{
local @ARGV = ('taggedchpt1_1.txt', 'parsedchpt1_1.txt', 'taggedchpt1_2.txt',
'parsedchpt1_2.txt');
while (<ARGV>) {
s/THIS/THAT/;
print FH $_;
}
}
ARGV is a special filehandle that iterates through all the filenames in @ARGV, closing a file and opening the next one as necessary. Normally @ARGV contains the command-line arguments that you passed to perl, but you can set it to anything you want.
You're almost there... this is a bit more efficient than discrete opens on each file...
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
print "Please type in the chapter and section NUMBERS in the form chp#_sec#:\n";
chomp (my $chapter_and_section = "chpt".<>);
print "Please type in the search word:\n";
chomp (my $search_key = <>);
open(FH, '>output.txt') or die $!; # Open an output file for writing
foreach ("tagged${chapter_and_section}.txt", "parsed${chapter_and_section}.txt") {
open FILE, "<$_" or die $!; # Read a filename (from the array)
foreach (<FILE>) {
$_ =~ s/THIS/THAT/g; # Regex replace each line in the open file (use
# whatever you like instead of "THIS" &
# "THAT")
print FH $_; # Write to the output file
}
}