Read and write to a file in Perl

Hi
this
is just
an example.
Let's assume the above is out.txt. I want to read out.txt and write onto the same file.
<Hi >
<this>
<is just>
<an example.>
Modified out.txt.
I want to add tags at the beginning and end of some lines.
As I will be reading the file several times, I cannot keep writing it to a different file each time.
EDIT 1
I tried using "+<", but it's giving output like this:
Hi
this
is just
an example.
<Hi >
<this>
<is just>
<an example.>
out.txt
EDIT 2
Code for reference :
open(my $fh, "+<", "out.txt"); # or die "cannot open < C:\Users\daanishs\workspace\CCoverage\out.txt: $!";
while (<$fh>) {
    $s1 = "<";
    $s2 = $_;
    $s3 = ">";
    $str = $s1 . $s2 . $s3;
    print $fh "$str";
}

The very idea of what you are trying to do is flawed. The file starts as (using / to mark a newline):
H i / t h i s / ...
If you were to change it in place, it would look as follows after processing the first line:
< H i > / i s / ...
Notice how you clobbered "th"? You need to make a copy of the file, modify the copy, then replace the original with the copy.
The simplest way is to make this copy in memory.
my $file;
{   # Read the file
    open(my $fh, '<', $qfn)
        or die "Can't open \"$qfn\": $!\n";
    local $/;
    $file = <$fh>;
}

# Change the file
$file =~ s/^(.*)\n/<$1>\n/mg;

{   # Save the changes
    open(my $fh, '>', $qfn)
        or die "Can't create \"$qfn\": $!\n";
    print($fh $file);
}
If you wanted to use the disk instead:
rename($qfn, "$qfn.old")
    or die "Can't rename \"$qfn\": $!\n";
open(my $fh_in, '<', "$qfn.old")
    or die "Can't open \"$qfn.old\": $!\n";
open(my $fh_out, '>', $qfn)
    or die "Can't create \"$qfn\": $!\n";
while (<$fh_in>) {
    chomp;
    $_ = "<$_>";
    print($fh_out "$_\n");
}
unlink("$qfn.old");
Using a trick, the above can be simplified to
local @ARGV = ($qfn);
local $^I = '';
while (<>) {
    chomp;
    $_ = "<$_>";
    print "$_\n";   # under $^I, print goes to ARGVOUT, the replacement file
}
Or as a one-liner:
perl -i -ple'$_ = "<$_>"' file
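If you want to keep a backup of the original while experimenting, -i accepts an extension for a backup copy:
perl -i.bak -ple'$_ = "<$_>"' file   # original kept as file.bak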

Read the contents into memory and then prepare the required string as you write back to the file. (A seek to the zeroth byte with SEEK_SET is required between the read and the write.)
#!/usr/bin/perl
open(INFILE, "+<in.txt");
@a = <INFILE>;
seek INFILE, 0, SEEK_SET;
foreach $i (@a) {
    chomp $i;
    print INFILE "<" . $i . ">" . "\n";
}
If you are worried about the amount of data being read into memory, you will have to create a temporary result file and finally copy it over the original file.
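A minimal sketch of that temp-file approach, with illustrative names; tempfile() comes from the core File::Temp module, and rename() only replaces the original atomically when both files are on the same filesystem:
use File::Temp qw(tempfile);

my $qfn = 'out.txt';                    # hypothetical filename
open my $in, '<', $qfn or die "Can't open \"$qfn\": $!";
my ($out, $tmp) = tempfile(DIR => '.'); # temp file in the same directory
while (<$in>) {
    chomp;
    print $out "<$_>\n";
}
close $in;
close $out or die "Can't write \"$tmp\": $!";
rename $tmp, $qfn or die "Can't replace \"$qfn\": $!";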

You could use Tie::File for easy random access to the lines in your file:
use Tie::File;
use strict;
use warnings;

my $filename = "out.txt";
my @array;
tie @array, 'Tie::File', $filename or die "can't tie file \"$filename\": $!";
for my $line (@array) {
    $line = "<$line>";
    # or $line =~ s/^(.*)$/<$1>/g; # -- whatever modifications you need to do
}
untie @array;
Disclaimer: Of course, this option is only viable if the file is not shared with other processes. Otherwise you could use flock to prevent shared access while you modify the file.
Disclaimer-2 (thanks to ikegami): Don't use this solution if you have to edit big files and are concerned about performance. Most of the performance loss is mitigated for small files (less than 2MB, though this is configurable using the memory arg).
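Regarding the first disclaimer: Tie::File exposes a flock method on the tied object. A minimal sketch, assuming a reasonably recent Tie::File that supports it:
use strict;
use warnings;
use Fcntl ':flock';
use Tie::File;

my $filename = "out.txt";
tie my @array, 'Tie::File', $filename or die "can't tie file \"$filename\": $!";
(tied @array)->flock(LOCK_EX);   # hold an exclusive lock while editing
$_ = "<$_>" for @array;
untie @array;                    # untying closes the file and releases the lock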

One option is to open the file twice: Open it once read-only, read the data, close it, process it, open it again read-write (no append), write the data, and close it. This is good practice because it minimizes the time you have the file open, in case someone else needs it.
If you only want to open it once, then you can use the +< open mode - just use a seek call between reading and writing to return to the beginning of the file. Otherwise, you finish reading, are at the end of the file, and start writing there, which is why you get the behavior you're seeing.
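A minimal sketch of that single-open approach, assuming the whole file fits in memory; the truncate is my addition, to guard against the new content being shorter than the old:
use Fcntl qw(SEEK_SET);

open my $fh, '+<', 'out.txt' or die "Can't open out.txt: $!";
my @lines = <$fh>;        # read everything first
seek $fh, 0, SEEK_SET;    # rewind before writing
chomp @lines;
print $fh "<$_>\n" for @lines;
truncate $fh, tell $fh;   # discard any leftover old bytes
close $fh;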

You need to specify
use Fcntl qw(SEEK_SET);
in order to use
seek INFILE, 0, SEEK_SET;
Thanks user1703205 for the example.

Recursive search in Perl?

I'm incredibly new to Perl and have never been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded, or multi-faceted. Anyway, my question today is about a problem I cannot get past when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of text in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have trouble keeping things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
    print "$_";
    print "$searchString\n";
    if (($_ =~ /$searchString/) and ($count == "0")) {
        ++ $count;
        print FILE2 $_;
    } else {
        print "This isn't working\n";
    }
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";

open my $in_fh,  '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";

my %seen;
while (<$in_fh>) {
    print $out_fh $_ unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;

my %seen;
while (<>) {
    print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
If your file is not very large, you can store each line read from the input file as a key in a hash variable, and then print the hash keys (ordered). Something like this:
my %lines = ();
my $order = 1;

open my $fhi, "<", $file2 or die "Cannot open file: $!";
while (my $line = <$fhi>) {
    $lines{$line} //= $order++;   # remember the position of the first occurrence only
}
close $fhi;

open my $fho, ">", $file3 or die "Cannot open file: $!";
# Sort the keys, only if needed
my @ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys %lines;
for my $key (@ordered_lines) {
    print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;
open my $fh_in,  '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '>', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";

my %seen;
while (my $line = <$fh_in>) {
    # check if we have already seen this line
    if (not $seen{$line}) {
        print $fh_out $line;
    }
    # remember this line
    $seen{$line}++;
}
To test it, here is the same logic reading from the DATA handle.
use strict;
use warnings;

my %seen;
while (my $line = <DATA>) {
    # check if we have already seen this line
    if (not $seen{$line}) {
        print $line;
    }
    # remember this line
    $seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows faster than linearly, but your data structure is very flat.
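If the lines are long and memory matters, one common variation (not used in the answers above) is to key the hash on a fixed-size digest of each line rather than the line itself. A sketch using the core Digest::MD5 module; collisions are theoretically possible but vanishingly unlikely for this use:
use Digest::MD5 qw(md5);

my %seen;
while (<>) {
    print unless $seen{ md5($_) }++;   # keys are 16-byte digests, not whole lines
}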

What is wrong with the syntax that does not give me an outfile?

I don't know exactly what is wrong, but every time I execute this script I keep getting "No such file or directory at ./reprioritize line 35, line 1".
Here is the script that is having an issue:
my $newresult = "home/user/newresults_percengtage_vs_pn";

sub pushval
{
    my @fields = @_;
    open OUTFILE, ">$newresult/fixedhomdata_030716-031316.csv" or die $!; # line 35
    while (<OUTFILE>) {
        if ($fields[5] >= 13) {
            print OUTFILE "$fields[0]", "$fields[1]", "$fields[2]", "$fields[3]", "$fields[4]", "$fields[5]", "0";
        }
        elsif ($fields[5] < 13 && $fields[5] > 1) {
            print OUTFILE "$fields[0]", "$fields[1]", "$fields[2]", "$fields[3]", "$fields[4]", "$fields[5]", "1";
        }
        elsif ($fields[5] <= 1) {
            print OUTFILE "$fields[0]", "$fields[1]", "$fields[2]", "$fields[3]", "$fields[4]", "$fields[5]", "2";
        }
    }
    close (OUTFILE);
}
You may want to have a look at Perl's tutorial on opening files.
Let me simplify it a bit. There are basically three modes: open for reading, open for writing, and open for appending.
Reading
Opening for reading is indicated by either a < preceding the filename or on its own, as a separate parameter to the open() call (preferred), i.e.:
my $fh = undef;
my $filename = 'fixedhomdata_030716-031316.csv';
open($fh, "<$filename") or die $!;   # bad
open($fh, '<', $filename) or die $!; # good
while (my $line = <$fh>) {           # read one line from filehandle $fh
    ...
}
close($fh);
When you open the file this way, it must exist, else you get your error (No such file or directory at ...).
Writing
Opening for writing is indicated by a >, i.e.:
open($fh, ">$filename") or die $!; # bad
open($fh, '>', $filename) or die $!; # good
print $fh "some text\n"; # write to filehandle $fh
print $fh "more text\n"; # write to filehandle $fh
...
close($fh);
When you open the file this way, it is truncated (cleared) and overwritten if it existed. If it did not exist, it will get created.
Appending
Opening for appending is indicated by a >>, i.e.:
open($fh, ">>$filename") or die $!; # bad
open($fh, '>>', $filename) or die $!; # good
print $fh "some text\n"; # append to filehandle $fh
print $fh "more text\n"; # append to filehandle $fh
...
close($fh);
When you open the file this way and it existed, then the new lines will be appended to the file, i.e. nothing is lost. If the file did not
exist, it will be created (as if only > had been given).
Your error message doesn't match your code. You opened the file for writing (>), but got a "No such file or directory" error, which indicates that either the failing open was actually a read, or that part of the directory path doesn't exist (open for writing creates the file, but not any missing directories).
This might have happened because you use OUTFILE as a filehandle instead of a scoped variable, e.g. $fh. OUTFILE is a global filehandle, i.e. if you open a file this way, then all of your code (no matter which function it is in) can use OUTFILE. Don't do that. From the docs:
An older style is to use a bareword as the filehandle, as
open(FH, "<", "input.txt")
or die "cannot open < input.txt: $!";
Then you can use FH as the filehandle, in close FH and so on.
Note that it's a global variable, so this form is not recommended
in new code.
To summarize:
use scoped variables as filehandles ($fh instead of OUTFILE)
open your file in the right mode (> vs. <)
always use three-argument open (open($fh, $mode, $filename) instead of open($fh, "$mode$filename"))
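Putting those three points together, a minimal sketch of what the open in your snippet might look like; the path is your original one with the leading slash restored, and the CSV line is illustrative:
my $newresult = "/home/user/newresults_percengtage_vs_pn";

open my $fh, '>>', "$newresult/fixedhomdata_030716-031316.csv"
    or die "Can't open output file: $!";
print $fh "some,csv,data\n";   # one record per call, appended
close $fh;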
The comments explain that your two issues with the snippet are
The missing leading '/' in the $newresult declaration
You are treating your filehandle as both a read and a write.
The first is easy to fix. The second is not as easy to fix properly without knowing the rest of the script. I am making the assumption that pushval is called once per record in an array of arrays(?). The snippet below should get the result you want, but there is likely a better way of doing it.
my $newresult = "/home/user/newresults_percengtage_vs_pn";

sub pushval {
    my @fields = @_;
    open OUTFILE, ">>$newresult/fixedhomdata_030716-031316.csv" or die $!; # line 35
    print OUTFILE "$fields[0]", "$fields[1]", "$fields[2]", "$fields[3]", "$fields[4]", "$fields[5]";
    if ($fields[5] >= 13) {
        print OUTFILE "0\n";
    } elsif ($fields[5] < 13 && $fields[5] > 1) {
        print OUTFILE "1\n";
    } elsif ($fields[5] <= 1) {
        print OUTFILE "2\n";
    }
    close (OUTFILE);
}

Why does this corrupt my yaml file?

When I run the script below, I get a corrupt YAML file like so:
---
1:
name1: abc
name2: abc
---
me3: abc
---
Question
Can anyone see what I am doing wrong?
#!/usr/bin/perl
use strict;
use YAML::Syck;
use Fcntl ':flock', 'SEEK_SET';
use warnings;
use Data::Dumper;
my $acc;
my $acc_fh;
$acc->{1}{name1} = "abc";
unlink 'test.yaml';
# write initial
open F, '>', 'test.yaml';
print F YAML::Syck::Dump($acc);
close F;
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
$acc->{1}{name2} = "abc";
$acc->{1}{name3} = "abc";
write_yaml_with_lock($acc, $acc_fh);
($acc, $acc_fh) = read_yaml_with_lock('test.yaml');
delete $acc->{1}{name3};
write_yaml_with_lock($acc, $acc_fh);
sub read_yaml_with_lock {
    my ($file) = @_;
    open my $fh, '+<', $file or die $!;
    flock($fh, LOCK_EX) or die $!;
    my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
    return ($obj, $fh);
}

sub write_yaml_with_lock {
    my ($obj, $fh) = @_;
    my $yaml = YAML::Syck::Dump($obj);
    $YAML::Syck::ImplicitUnicode = 1;
    seek $fh, 0, SEEK_SET; # seek back to the beginning of the file
    print $fh $yaml . "---\n";
    close $fh;
}
You write to the same file twice. During the second time, the YAML code you're writing is shorter than the first time because you delete that hash key in between the calls. However, you neither unlink the file after the first write nor truncate it after the second one. So what you see as corruption is the part of the file that was written the first time but hasn't been overwritten the second time.
The "me3" part is what is left of " name3", which gets partially overwritten by "---\n" (4 characters). When you write the first time, you have more data. Then you rewind the file handle position and write a shorter data, which does not overwrite all of the old.
I think the solution "should" be to stop passing a file handle around and rewinding it, and instead use the appropriate open for each subroutine. E.g.:
sub read_yaml {
    my $file = shift;
    open my $fh, '<', $file or die $!;
    ...
    close $fh;
}

sub write_yaml {
    my ($file, $obj) = @_;
    open my $fh, '>', $file or die $!;
    ...
    close $fh;
}
Keeping the file handle open in between operations is not really that useful or efficient, and it introduces some difficulties.
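That said, if you do want to keep the single '+<' handle (for instance, to hold the lock), the minimal fix is to truncate after the shorter write. A sketch of the adjusted subroutine; I have dropped the trailing "---\n", since Dump() already emits a document marker and the extra one contributed to the garbage:
sub write_yaml_with_lock {
    my ($obj, $fh) = @_;
    my $yaml = YAML::Syck::Dump($obj);
    seek $fh, 0, SEEK_SET;    # rewind to the beginning of the file
    print $fh $yaml;
    truncate $fh, tell $fh;   # cut off any leftover bytes from the longer, older contents
    close $fh;
}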

Problems reading a binary file with ActivePerl?

I'm trying to read a binary file with the following code:
open(F, "<$file") || die "Can't read $file: $!\n";
binmode(F);
$data = <F>;
close F;
open (D,">debug.txt");
binmode(D);
print D $data;
close D;
The input file is 16M; debug.txt is only about 400k. When I look at debug.txt in Emacs, the last two chars are ^A^C (SOH and ETX chars, according to Notepad++), although that same pattern also appears earlier in debug.txt. The next line in the file does have a ^O (SI) char, and I think that's the first occurrence of that particular character.
How can I read in this entire file?
If you really want to read the whole file at once, use slurp mode. Slurp mode can be turned on by setting $/ (which is the input record separator) to undef. This is best done in a separate block so you don't mess up $/ for other code.
my $data;
{
    open my $input_handle, '<', $file or die "Cannot open $file for reading: $!\n";
    binmode $input_handle;
    local $/;
    $data = <$input_handle>;
    close $input_handle;
}

open my $output_handle, '>', 'debug.txt' or die "Cannot open debug.txt for writing: $!\n";
binmode $output_handle;
print {$output_handle} $data;
close $output_handle;
Use my $data for a lexical and our $data for a global variable.
TIMTOWTDI.
File::Slurp is the shortest way to express what you want to achieve. It also has built-in error checking.
use File::Slurp qw(read_file write_file);
my $data = read_file($file, binmode => ':raw');
write_file('debug.txt', {binmode => ':raw'}, $data);
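As a side note, Path::Tiny from CPAN offers the same one-liners these days; a sketch, assuming that module is installed:
use Path::Tiny qw(path);

my $data = path($file)->slurp_raw;      # read whole file as raw bytes
path('debug.txt')->spew_raw($data);     # write it back out unchanged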
The IO::File API solves the global variable $/ problem in a more elegant fashion.
use IO::File qw();

my $data;
{
    my $input_handle = IO::File->new($file, 'r') or die "could not open $file for reading: $!";
    $input_handle->binmode;
    $input_handle->input_record_separator(undef);
    $data = $input_handle->getline;
}
{
    my $output_handle = IO::File->new('debug.txt', 'w') or die "could not open debug.txt for writing: $!";
    $output_handle->binmode;
    $output_handle->print($data);
}
I don't think this is about using slurp mode or not, but about correctly handling binary files.
Instead of
$data = <F>;
you should do
read(F, $buffer, 1024);
This will only read 1024 bytes, so you have to increase the buffer size or read the whole file piece by piece in a loop.
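A sketch of the loop variant, copying a hypothetical $file in fixed-size chunks:
open my $in, '<', $file or die "Can't read $file: $!\n";
binmode $in;
open my $out, '>', 'debug.txt' or die "Can't write debug.txt: $!\n";
binmode $out;

my $buffer;
while (read($in, $buffer, 64 * 1024)) {   # 64 KB per read; returns 0 at EOF
    print $out $buffer;
}
close $in;
close $out;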

Can I find a filename from a filehandle in Perl?

open(my $fh, '>', $path) || die $!;
my_sub($fh);
Can my_sub() somehow extrapolate $path from $fh?
A filehandle might not even be connected to a file but instead to a network socket or a pipe hooked to the standard output of a child process.
If you want to associate handles with paths your code opens, use a hash and the fileno operator, e.g.,
my %fileno2path;

sub myopen {
    my ($path) = @_;
    open my $fh, "<", $path or die "$0: open: $!";
    $fileno2path{fileno $fh} = $path;
    $fh;
}

sub myclose {
    my ($fh) = @_;
    delete $fileno2path{fileno $fh};
    close $fh or warn "$0: close: $!";
}

sub path {
    my ($fh) = @_;
    $fileno2path{fileno $fh};
}
For anyone looking for a better way to find the file name from a filehandle or file descriptor:
I would prefer find -inum, if available.
Or use the following approach, which works anywhere you have a /proc filesystem (Linux, for instance):
my $filename='/tmp/tmp.txt';
open my $fh, '>', $filename;
my $fd = fileno $fh;
print readlink("/proc/$$/fd/$fd");
You can call stat or IO::Handle::stat on a filehandle -- that will give you the device and inode of the file that you have opened. With that and a little operating system wizardry you can find the filename. OK, maybe a lot of operating system wizardry.
The find command has an -inum option to find a file with a specified inode number. This is probably not going to be as efficient as caching the path when you open the file, as gbacon recommends.
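A sketch of that combination, assuming a Unix system; /some/mount/point is a placeholder for the filesystem you expect the file to live on (inode numbers are only unique per device, and the scan can be slow on big trees):
my ($dev, $ino) = stat $fh or die "stat failed: $!";
# -xdev keeps find from crossing onto other filesystems, where the inode would be meaningless
my $path = `find /some/mount/point -xdev -inum $ino 2>/dev/null`;
chomp $path;
print "$path\n";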