I'm trying to read a binary file with the following code:
open(F, "<$file") || die "Can't read $file: $!\n";
binmode(F);
$data = <F>;
close F;
open (D,">debug.txt");
binmode(D);
print D $data;
close D;
The input file is 16M; the debug.txt is only about 400k. When I look at debug.txt in emacs, the last two chars are ^A^C (SOH and ETX chars, according to notepad++) although that same pattern is present in the debug.txt. The next line in the file does have a ^O (SI) char, and I think that's the first occurrence of that particular character.
How can I read in this entire file?
If you really want to read the whole file at once, use slurp mode. Slurp mode can be turned on by setting $/ (which is the input record separator) to undef. This is best done in a separate block so you don't mess up $/ for other code.
my $data;
{
open my $input_handle, '<', $file or die "Cannot open $file for reading: $!\n";
binmode $input_handle;
local $/;
$data = <$input_handle>;
close $input_handle;
}
open $output_handle, '>', 'debug.txt' or die "Cannot open debug.txt for writing: $!\n";
binmode $output_handle;
print {$output_handle} $data;
close $output_handle;
Use my $data for a lexical and our $data for a global variable.
TIMTOWTDI.
File::Slurp is the shortest way to express what you want to achieve. It also has built-in error checking.
use File::Slurp qw(read_file write_file);
my $data = read_file($file, binmode => ':raw');
write_file('debug.txt', {binmode => ':raw'}, $data);
The IO::File API solves the global variable $/ problem in a more elegant fashion.
use IO::File qw();
my $data;
{
my $input_handle = IO::File->new($file, 'r') or die "could not open $file for reading: $!";
$input_handle->binmode;
$input_handle->input_record_separator(undef);
$data = $input_handle->getline;
}
{
my $output_handle = IO::File->new('debug.txt', 'w') or die "could not open debug.txt for writing: $!";
$output_handle->binmode;
$output_handle->print($data);
}
I don't think this is about using slurp mode or not, but about correctly handling binary files.
instead of
$data = <F>;
you should do
read(F, $buffer, 1024);
This will only read 1024 bytes, so you have to increase the buffer or read the whole file part by part using a loop.
Related
I'm incredibly new to Perl, and never have been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded, or multi-facted. Anyway, my question today is about a boggle I cannot get over when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of txt in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have issue making things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
print "$_";
print "$searchString\n";
if (($_ =~ /$searchString/) and ($count == "0")) {
++ $count;
print FILE2 $_;
} else {
print "This isn't working\n";
}
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness, is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
open my $in_fh, '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";
my %seen;
while (<$in_fh>) {
print $out_fh unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
If your file is not very large, you can store each line readed from the input file as a key in a hash variable. And then, print the hash keys (ordered). Something like that:
my %lines = ();
my $order = 1;
open my $fhi, "<", $file2 or die "Cannot open file: $!";
while( my $line = <$fhi> ) {
$lines {$line} = $order++;
}
close $fhi;
open my $fho, ">", $file3 or die "Cannot open file: $!";
#Sort the keys, only if needed
my #ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);
for my $key( #ordered_lines ) {
print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;
open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '<', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";
my %seen;
while (my $line = <$fh_in>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $fh_out $line;
}
# remember this line
$seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;
my %seen;
while (my $line = <DATA>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $line;
}
# remember this line
$seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows a faster than linear, but your data structure is very flat.
I dont know what exactly is wrong but everytime I execute this script i keep getting "No such file or directory at ./reprioritize line 35, line 1".
here is my script that is having an issue:
my $newresult = "home/user/newresults_percengtage_vs_pn";
sub pushval
{
my #fields = #_;
open OUTFILE, ">$newresult/fixedhomdata_030716-031316.csv" or die $!; #line 35
while(<OUTFILE>)
{
if($fields[5] >= 13)
{
print OUTFILE "$fields[0]", "$fields[1]","$fields[2]","$fields[3]","$fields[4]","$fields[5]", "0";
}
elsif($fields[5] < 13 && $fields[5] > 1)
{
print OUTFILE "$fields[0]", "$fields[1]","$fields[2]","$fields[3]","$fields[4]","$fields[5]", "1";
}
elsif($fields[5] <= 1)
{
print OUTFILE "$fields[0]", "$fields[1]","$fields[2]","$fields[3]","$fields[4]","$fields[5]", "2";
}
}
close (OUTFILE);
You may want to have a look at Perl's tutorial on opening files.
I simplify it a bit. There are basically three modes: open for reading, open for writing, and open for appending.
Reading
Opening for reading is indicated by either a < preceeding the filename or on its own, as a separate parameter to the open() call (preferred), i.e.:
my $fh = undef;
my $filename = 'fixedhomdata_030716-031316.csv';
open($fh, "<$filename") or die $!; # bad
open($fh, '<', $filename) or die $!; # good
while( my $line = <$fh> ) { # read one line from filehandle $fh
...
}
close($fh);
When you open the file this way, it must exist, else you get your error (No such file or directory at ...).
Writing
Opening for writing is indicated by a >, i.e.:
open($fh, ">$filename") or die $!; # bad
open($fh, '>', $filename) or die $!; # good
print $fh "some text\n"; # write to filehandle $fh
print $fh "more text\n"; # write to filehandle $fh
...
close($fh);
When you open the file this way, it is truncated (cleared) and overwritten if it existed. If it did not exist, it will get created.
Appending
Opening for appending is indicated by a >>, i.e.:
open($fh, ">>$filename") or die $!; # bad
open($fh, '>>', $filename) or die $!; # good
print $fh "some text\n"; # append to filehandle $fh
print $fh "more text\n"; # append to filehandle $fh
...
close($fh);
When you open the file this way and it existed, then the new lines will be appended to the file, i.e. nothing is lost. If the file did not
exist, it will be created (as if only > had been given).
Your error message doesn't match your code. You opened the file for writing (>) but got doesn't exist, which indicates that you actually opened it for reading.
This might have happened because you use OUTPUT as a filehandle instead of a scoped variable, e.g. $fh. OUTPUT is a global filehandle, i.e. if you open a file this way, then all of your code (no matter which function in) can use OUTPUT. Don't do that. From the docs:
An older style is to use a bareword as the filehandle, as
open(FH, "<", "input.txt")
or die "cannot open < input.txt: $!";
Then you can use FH as the filehandle, in close FH and and so on.
Note that it's a global variable, so this form is not recommended
in new code.
To summarize:
use scoped variables as filehandles ($fh instead of OUTPUT)
open your file in the right mode (> vs. <)
always use three-argument open (open($fh, $mode, $filename) vs. open($fh, "$mode$filename")
The comments explain that your two issues with the snippet are
The missing leading '/' in the $newresult declaration
You are treating your filehandle as both a read and a write.
The first is easy to fix. The second is not as easy to fix properly with knowing the rest of the script. I am making an assumption that pushval is called once per record in a Array of Arrays(?). This snippet below should get the result you want, but there is likely a better way of doing it.
my $newresult = "/home/user/newresults_percengtage_vs_pn";
sub pushval{
my #fields = #_;
open OUTFILE, ">>$newresult/fixedhomdata_030716-031316.csv" or die $!; #line 35
print OUTFILE "$fields[0]", "$fields[1]","$fields[2]","$fields[3]","$fields[4]","$fields[5]"
if($fields[5] >= 13) {
print OUTFILE "0\n";
} elsif($fields[5] < 13 && $fields[5] > 1) {
print OUTFILE "1\n";
} elsif($fields[5] <= 1) {
print OUTFILE "2\n";
}
close (OUTFILE);
I am trying to read a newline-delimited file into an array in Perl. I do NOT want the newlines to be part of the array, because the elements are filenames to read later. That is, each element should be "foo" and not "foo\n". I have done this successfully in the past using the methods advocated in Stack Overflow question Read a file into an array using Perl and Newline Delimited Input.
My code is:
open(IN, "< test") or die ("Couldn't open");
#arr = <IN>;
print("$arr[0] $arr[1]")
And my file 'test' is:
a
b
c
d
e
My expected output would be:
a b
My actual output is:
a
b
I really don't see what I'm doing wrong. How do I read these files into arrays?
Here is how I generically read from files.
open (my $in, "<", "test") or die $!;
my #arr;
while (my $line = <$in>) {
chomp $line;
push #arr, $line;
}
close ($in);
chomp will remove newlines from the line read. You should also use the three-argument version of open.
Put the file path in its own variable so that it can be easily
changed.
Use the 3-argument open.
Test all opens, prints, and closes for success, and if not, print the error and the file name.
Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# put file path in a variable so it can be easily changed
my $file = 'test';
open my $in_fh, '<', $file or die "could not open $file: $OS_ERROR\n";
chomp( my #arr = <$in_fh> );
close $in_fh or die "could not close $file: $OS_ERROR\n";
print "#arr[ 0 .. 1 ]\n";
A less verbose option is to use File::Slurp::read_file
my $array_ref = read_file 'test', chomp => 1, array_ref => 1;
if, and only if, you need to save the list of file names anyway.
Otherwise,
my $filename = 'test';
open (my $fh, "<", $filename) or die "Cannot open '$filename': $!";
while (my $next_file = <$fh>) {
chomp $next_file;
do_something($next_file);
}
close ($fh);
would save memory by not having to keep the list of files around.
Also, you might be better off using $next_file =~ s/\s+\z// rather than chomp unless your use case really requires allowing trailing whitespace in file names.
this
is just
an example.
Lets assume the above is out.txt. I want to read out.txt and write onto the same file.
<Hi >
<this>
<is just>
<an example.>
Modified out.txt.
I want to add tags in the beginning and end of some lines.
As I will be reading the file several times I cannot keep writing it onto a different file each time.
EDIT 1
I tried using "+<" but its giving an output like this :
Hi
this
is just
an example.
<Hi >
<this>
<is just>
<an example.>
**out.txt**
EDIT 2
Code for reference :
open(my $fh, "+<", "out.txt");# or die "cannot open < C:\Users\daanishs\workspace\CCoverage\out.txt: $!";
while(<$fh>)
{
$s1 = "<";
$s2 = $_;
$s3 = ">";
$str = $s1 . $s2 . $s3;
print $fh "$str";
}
The very idea of what you are trying to do is flawed. The file starts as
H i / t h i s / ...
If you were to change it in place, it would look as follows after processing the first line:
< H i > / i s / ...
Notice how you clobbered "th"? You need to make a copy of the file, modify the copy, the replace the original with the copy.
The simplest way is to make this copy in memory.
my $file;
{ # Read the file
open(my $fh, '<', $qfn)
or die "Can't open \"$qfn\": $!\n";
local $/;
$file = <$fh>;
}
# Change the file
$file =~ s/^(.*)\n/<$1>\n/mg;
{ # Save the changes
open(my $fh, '>', $qfn)
or die "Can't create \"$qfn\": $!\n";
print($fh $file);
}
If you wanted to use the disk instead:
rename($qfn, "$qfn.old")
or die "Can't rename \"$qfn\": $!\n";
open(my $fh_in, '<', "$qfn.old")
or die "Can't open \"$qfn\": $!\n";
open(my $fh_out, '>', $qfn)
or die "Can't create \"$qfn\": $!\n";
while (<$fh_in>) {
chomp;
$_ = "<$_>";
print($fh_out "$_\n");
}
unlink("$qfn.old");
Using a trick, the above can be simplified to
local #ARGV = $qfn;
local $^I = '';
while (<>) {
chomp;
$_ = "<$_>";
print(ARGV "$_\n");
}
Or as a one-liner:
perl -i -pe'$_ = "<$_>"' file
Read contents in memory and then prepare required string as you write to your file. (SEEK_SET to zero't byte is required.
#!/usr/bin/perl
open(INFILE, "+<in.txt");
#a=<INFILE>;
seek INFILE, 0, SEEK_SET ;
foreach $i(#a)
{
chomp $i;
print INFILE "<".$i.">"."\n";
}
If you are worried about amount of data being read in memory, you will have to create a temporary result file and finally copy the result file to original file.
You could use Tie::File for easy random access to the lines in your file:
use Tie::File;
use strict;
use warnings;
my $filename = "out.txt";
my #array;
tie #array, 'Tie::File', $filename or die "can't tie file \"$filename\": $!";
for my $line (#array) {
$line = "<$line>";
# or $line =~ s/^(.*)$/<$1>/g; # -- whatever modifications you need to do
}
untie #array;
Disclaimer: Of course, this option is only viable if the file is not shared with other processes. Otherwise you could use flock to prevent shared access while you modify the file.
Disclaimer-2 (thanks to ikegami): Don't use this solution if you have to edit big files and are concerned about performance. Most of the performance loss is mitigated for small files (less than 2MB, though this is configurable using the memory arg).
One option is to open the file twice: Open it once read-only, read the data, close it, process it, open it again read-write (no append), write the data, and close it. This is good practice because it minimizes the time you have the file open, in case someone else needs it.
If you only want to open it once, then you can use the +< file type - just use the seek call between reading and writing to return to the beginning of the file. Otherwise, you finish reading, are at the end of the file, and start writing there, which is why you get the behavior you're seeing.
Need to specify
use Fcntl qw(SEEK_SET);
in order to use
seek INFILE, 0, SEEK_SET;
Thanks user1703205 for the example.
I need to create Perl code which allows counting paragraphs in text files. I tried this and doesn't work:
open(READFILE, "<$filename")
or die "could not open file \"$filename\":$!";
$paragraphs = 0;
my($c);
while($c = getc(READFILE))
{
if($C ne"\n")
{
$paragraphs++;
}
}
close(READFILE);
print("Paragraphs: $paragraphs\n");
See perlfaq5: How can I read in a file by paragraphs?
local $/ = ''; # enable paragraph mode
open my $fh, '<', $file or die "can't open $file: $!";
1 while <$fh>;
my $count = $.;
Have a look at the Beginning Perl book at http://www.perl.org/books/beginning-perl/. In particular, the following chapter will help you: http://docs.google.com/viewer?url=http%3A%2F%2Fblob.perl.org%2Fbooks%2Fbeginning-perl%2F3145_Chap06.pdf
If you're determining paragraphs by a double-newline ("\n\n") then this will do it:
open READFILE, "<$filename"
or die "cannot open file `$filename' for reading: $!";
my #paragraphs;
{local $/; #paragraphs = split "\n\n", <READFILE>} # slurp-split
my $num_paragraphs = scalar #paragraphs;
__END__
Otherwise, just change the "\n\n" in the code to use your own paragraph separator. It may even be a good idea to use the pattern \n{2,}, just in case someone went crazy on the enter key.
If you are worried about memory consumption, then you may want to do something like this (sorry for the hard-to-read code):
my $num_paragraphs;
{local $/; $num_paragraphs = #{[ <READFILE> =~ /\n\n/g ]} + 1}
Although, if you want to keep using your own code, you can change if($C ne"\n") to if($c eq "\n").