Are there any gotchas with open(my $f, '<:encoding(UTF-8)', $n) - perl

I am having a problem that I am unable to reproduce in a manner suitable for Stack Overflow, although it is reproducible in my production environment.
The problem occurs in a Perl script that, among other things, iterates over a file that looks like this:
abc-4-9|free text, possibly containing non-ascii characters|
cde-3-8|hällo wörld|
# comment
xyz-9-1|and so on|
qrs-2-8|and so forth|
I can verify the correctness of the file with this Perl script:
use warnings;
use strict;
open (my $f, '<:encoding(UTF-8)', 'c:\path\to\file') or die "$!";
while (my $s = <$f>) {
chomp($s);
next unless $s;
next if $s =~ m/^#/;
$s =~ m!(\w+)-(\d+)-(\d+)\|([^|]*)\|! or die "\n>$s<\n didn't match on line $.";
}
print "Ok\n";
close $f;
When I run this script, it does not die on line 10 and consequently prints Ok.
Now, I use essentially the same construct in a huge Perl script (hence not reproducible for Stack Overflow), and it dies on line 2199 of the input file.
If I change the first line (which is completely unrelated to line 2199) from something like
www-1-1|A line with some words|
to
www-1-1|x|
the script will process line 2199 (but fail later).
Interestingly, this behaviour was introduced when I changed
open (my $f, '<', 'c:\path\to\file') or die "$!";
to
open (my $f, '<:encoding(UTF-8)', 'c:\path\to\file') or die "$!";
Without the :encoding(UTF-8) directive, the script does not fail. Of course, I need the encoding directive since the file contains non-ASCII characters.
BTW, the same script runs without problems on Linux.
On Windows, where it fails, I use Strawberry Perl 5.24.

I do not have a full and correct explanation of why this is necessary, but you can try opening the file with
'<:unix:encoding(UTF-8)'
This may be related to my question "Why is CRLF set for the unix layer on Windows?", which I came across while investigating a problem that I never managed to resolve.
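A minimal sketch of that suggestion, assuming a small temporary file stands in for the real input (the sample records and the @records variable are made up for illustration). It opens the file with the :unix:encoding(UTF-8) layers and applies the same checks as the verification script in the question. The sketch also strips a trailing carriage return itself, in case the layer stack leaves one in place on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data in the question's format.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':encoding(UTF-8)');
print {$tmp_fh} "cde-3-8|h\x{e4}llo w\x{f6}rld|\n# comment\n\nxyz-9-1|and so on|\n";
close($tmp_fh) or die "Can't close $tmp_name: $!";

# The layer string suggested in the answer.
open(my $in, '<:unix:encoding(UTF-8)', $tmp_name)
    or die "Can't open $tmp_name: $!";

my @records;
while (my $line = <$in>) {
    $line =~ s/\r?\n\z//;                      # drop LF or CRLF line endings
    next if $line eq '' or $line =~ /^#/;      # skip blank lines and comments
    $line =~ /^(\w+)-(\d+)-(\d+)\|([^|]*)\|/
        or die "\n>$line<\n didn't match on line $.";
    push @records, { key => "$1-$2-$3", text => $4 };
}
close($in);

printf "%d records parsed\n", scalar @records;
```

If the failure disappears with this layer string in the real script, that points at the layer stack rather than at the data.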

Related

How to write to an existing file in Perl?

I want to open an existing file on my desktop and write to it, but for some reason I can't do it in Ubuntu. Maybe I'm not writing the path correctly?
Is it possible without modules, etc.?
open(WF,'>','/home/user/Desktop/write1.txt');
$text = "I am writing to this file";
print WF $text;
close(WF);
print "Done!\n";
To add to an existing file without losing its contents, open it in append (>>) mode; the > mode also writes to the file, but truncates it first.
(Use the modern way to write to a file, with a lexical filehandle.)
Here is the code snippet (tested on Ubuntu 20.04.1 with Perl v5.30.0):
#!/usr/bin/perl
use strict;
use warnings;
my $filename = '/home/vkk/Scripts/outfile.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
print $fh "Write this line to file\n";
close $fh;
print "done\n";
For more info, refer to these links: open, or appending-to-files by Gabor.
Please see the following code sample. It demonstrates some aspects of correct usage of open and environment variables, and reports an error if a file cannot be opened for writing.
Note: Run a search in Google for Perl bookshelf
#!/bin/env perl
#
# vim: ai ts=4 sw=4
#
use strict;
use warnings;
use feature 'say';
my $fname = $ENV{HOME} . '/Desktop/write1.txt';
my $text = 'I am writing to this file';
open my $fh, '>', $fname
    or die "Can't open $fname: $!";
say $fh $text;
close $fh;
say 'Done!';
Documentation quote
About modes
When calling open with three or more arguments, the second argument -- labeled MODE here -- defines the open mode. MODE is usually a literal string comprising special characters that define the intended I/O role of the filehandle being created: whether it's read-only, or read-and-write, and so on.
If MODE is <, the file is opened for input (read-only). If MODE is >, the file is opened for output, with existing files first being truncated ("clobbered") and nonexisting files newly created. If MODE is >>, the file is opened for appending, again being created if necessary.
You can put a + in front of the > or < to indicate that you want both read and write access to the file; thus +< is almost always preferred for read/write updates--the +> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have variable-length records. See the -i switch in perlrun for a better approach. The file is created with permissions of 0666 modified by the process's umask value.
These various prefixes correspond to the fopen(3) modes of r, r+, w, w+, a, and a+.
Documentation: open, close
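The difference between the modes quoted above can be seen in a short sketch. It assumes a scratch file in a temporary directory; the file name modes.txt and the slurp helper are made up for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and file name are placeholders for this sketch.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/modes.txt";

# Hypothetical helper: return the whole file as one string.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't read $file: $!";
    local $/;
    my $content = <$fh>;
    close($fh);
    return $content;
}

# '>' creates the file, or truncates it if it already exists.
open(my $out, '>', $path) or die "Can't write $path: $!";
print {$out} "first\n";
close($out) or die "Can't close $path: $!";

# '>>' keeps the existing content and adds to the end.
open(my $app, '>>', $path) or die "Can't append to $path: $!";
print {$app} "second\n";
close($app) or die "Can't close $path: $!";
my $after_append = slurp($path);

# '+<' opens for reading and writing without truncating.
open(my $rw, '+<', $path) or die "Can't update $path: $!";
my $first_line = <$rw>;
close($rw);

# A second '>' clobbers what was there.
open(my $again, '>', $path) or die "Can't write $path: $!";
print {$again} "third\n";
close($again) or die "Can't close $path: $!";
my $after_truncate = slurp($path);

print "after append:\n$after_append", "after truncate:\n$after_truncate";
```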

How to modify content of a file using single file handle

I'm trying to modify content of a file using Perl.
The following script works fine.
#!/usr/bin/perl
use strict;
use warnings;
open(FH,"test.txt") || die "not able to open test.txt $!";
open(FH2,">","test_new.txt")|| die "not able to open test_new.txt $!";
while(my $line = <FH>)
{
$line =~ s/perl/python/i;
print FH2 $line;
}
close(FH);
close(FH2);
The content of test.txt:
im learning perl
im in File handlers chapter
The output in test_new.txt:
im learning python
im in File handlers chapter
If I try to use the same file handle for modifying the content of the file, I don't get the expected output. The following is the script that attempts to do this:
#!/usr/bin/perl
use strict;
use warnings;
open(FH,"+<","test.txt") || die "not able to open test.txt $!";
while(my $line = <FH>)
{
$line =~ s/perl/python/i;
print FH $line;
}
close(FH);
Incorrect output in test.txt:
im learning perl
im learning python
chapter
chapter
How do I modify the file contents using a single file handle?
You can't delete from a file (except at the end).
You can't insert characters into a file (except at the end).
You can replace a character in a file.
You can append to a file.
You can shorten a file.
That's it.
You're imagining you can simply replace "Perl" with "Python" in the file. Those aren't of the same length, so it would require inserting characters into the file, and you can't do that.
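By contrast, a replacement of the same length does work in place. The following is a minimal sketch of that case, assuming a temporary file with the question's sample text; it overwrites "perl" with "PERL" (both four bytes) through a single +< handle.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Temporary stand-in for test.txt.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "im learning perl\nim in File handlers chapter\n";
close($tmp_fh) or die "Can't close $path: $!";

open(my $fh, '+<', $path) or die "Can't open $path: $!";
my $content = do { local $/; <$fh> };

# The data is plain ASCII, so the character index equals the byte offset.
my $offset = index($content, 'perl');
die "'perl' not found" if $offset < 0;

# Same length, so the bytes can simply be overwritten where they are.
seek($fh, $offset, SEEK_SET) or die "Can't seek: $!";
print {$fh} 'PERL' or die "Can't write: $!";
close($fh) or die "Can't close $path: $!";

open(my $check, '<', $path) or die "Can't read $path: $!";
my $result = do { local $/; <$check> };
close($check);
print $result;
```

The seek between the read and the write is needed because switching from reading to writing on the same handle requires repositioning.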
You can effectively insert characters into a file by loading the rest of the file into memory and writing it back out two characters further. But doing this gets tricky for very large files. It's also very slow since you end up copying a (possibly very large) portion of the file every time you want to insert characters.
The other problem with in-place modifications is that you can't recover from an error. If something happens, you'll be left with an incomplete or corrupted file.
If the file is small and you're ok with losing the data if something goes wrong, the simplest approach is to load the entire file into memory.
use Fcntl qw( SEEK_SET );
open(my $fh, '+<', $qfn)
    or die("Can't open \"$qfn\": $!\n");
my $file = do { local $/; <$fh> };
$file =~ s/Perl/Python/g;
seek($fh, 0, SEEK_SET)
    or die $!;
print($fh $file)
    or die $!;
truncate($fh, tell($fh))
    or die $!;
A safer approach is to write the data to a new file, then rename the file when you're done.
my $new_qfn = $qfn . ".tmp";
open(my $fh_in, '<', $qfn)
    or die("Can't open \"$qfn\": $!\n");
open(my $fh_out, '>', $new_qfn)
    or die("Can't create \"$new_qfn\": $!\n");
while (<$fh_in>) {
    s/Perl/Python/g;
    print($fh_out $_);
}
close($fh_in);
close($fh_out);
rename($new_qfn, $qfn)
    or die $!;
The downside of this approach is it might change the file's permissions, and hardlinks will point to the old content instead of the new file. You also need permissions to create a file.
As @Сухой27 answered, this is a typical situation where a Perl one-liner is convenient.
perl -i -pe 's/perl/python/i'
perl takes the following options:
-p : loop over the input line by line (each line is assigned to $_ and printed after the code has been evaluated)
-e : evaluate the given code inside that loop (the regex uses $_ as its default operand)
-i : edit the file in place (if you pass an argument to -i, perl preserves the original file with that extension)
If you run the command below
perl -i.bak -pe 's/perl/python/i' test.txt
you will get the modified test.txt
im learning python
im in File handlers chapter
and the original text in a file named test.txt.bak
im learning perl
im in File handlers chapter
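The same in-place edit can also be written inside a script. The sketch below assumes a temporary copy of test.txt and uses the $^I variable, which is the variable behind the -i switch; localizing it together with @ARGV limits the in-place mode to the block.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary stand-in for test.txt.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/test.txt";
open(my $setup, '>', $path) or die "Can't create $path: $!";
print {$setup} "im learning perl\nim in File handlers chapter\n";
close($setup) or die "Can't close $path: $!";

{
    # $^I holds the backup extension; @ARGV lists the files that <> reads.
    local ($^I, @ARGV) = ('.bak', $path);
    while (<>) {
        s/perl/python/i;
        print;    # goes to the rewritten file while in-place editing is active
    }
}

# Read the result back to show what changed.
open(my $check, '<', $path) or die "Can't read $path: $!";
my $edited = do { local $/; <$check> };
close($check);
print $edited;
```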

Irregular Behavior with Perl print

I'm attempting to print out to two different files. For some reason, print statements work fine for one file, but not for the other. When I run this program, filter2.out consists of a single line that reads "Beginning". filter2.err remains empty.
open(OUTPUT, "> Filter2/filter2.out");
open(ERROR, "> Filter2/filter2.err");
print OUTPUT "Beginning\n";
print ERROR "Beginning\n";
UPDATE: So I was running this at the beginning of a larger program and realized that it only updates the ERROR file in batches or when the file is closed. Any idea why this occurs?
Consider adding
use strict;
use warnings;
to the top of your script. These statements will help catch errors that are otherwise silently ignored by Perl. In addition, consider adding error checking to your open calls: in all likelihood, it's not actually opening. I'd write it like this:
use strict;
use warnings;
open(OUTPUT, "> Filter2/filter2.out")
or die "Can't open filter2.out: $!";
open(ERROR, "> Filter2/filter2.err")
or die "Can't open filter2.err: $!";
print OUTPUT "Beginning\n";
print ERROR "Beginning\n";
For example, just by adding strict and warnings I got:
print() on closed filehandle OUTPUT at .\printer.pl line 6.
print() on closed filehandle ERROR at .\printer.pl line 7.
Hmm...!
By adding error checking, I got:
PS C:\dev> perl .\printer.pl
Can't open filter2.out: No such file or directory at .\printer.pl line 4.
Aah! Looking, I didn't have the folder. After I added the folder, everything ran fine. You'll probably find something similar.
Finally, you should probably also use the modern, lexical file handles. This helps catch other errors (like re-used handle names.) Thus, the final script would look like:
use strict;
use warnings;
open(my $output, ">", "Filter2/filter2.out")
or die "Can't open filter2.out: $!";
open(my $error, ">", "Filter2/filter2.err")
or die "Can't open filter2.err: $!";
print $output "Beginning\n";
print $error "Beginning\n";
Voilà! Now you can see exactly where the program fails, as it fails, and make sure that other libraries or code you write later can't accidentally interfere with your file handles.
You need to check that your files were properly opened. Also, it's better to use lexical variables as file handles instead of barewords:
open( my $err, "> Filter2/filter2.err") or die "Couldn't open error: $!";
print $err "Beginning\n";
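The batching described in the question's UPDATE is consistent with ordinary output buffering: data printed to a file handle is held in a buffer and written out when the buffer fills or the handle is closed. A minimal sketch, assuming a temporary directory in place of Filter2, that turns on autoflush so each print reaches the file immediately:

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush method on file handles
use File::Temp qw(tempdir);

# Temporary stand-in for the Filter2 directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/filter2.err";

open(my $error, '>', $path) or die "Can't open $path: $!";
$error->autoflush(1);           # flush after every print on this handle

print {$error} "Beginning\n";

# The data is already on disk even though the handle is still open.
my $size_before_close = -s $path;
print "size before close: $size_before_close\n";

close($error) or die "Can't close $path: $!";
```

Autoflush costs some performance on handles that receive many small writes, so it is usually enabled only for logs and error files that need to be watched while the program runs.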

Atomic open of non-existing file in Perl

I want to write something to a file whose name is in the variable $filename.
I don't want to overwrite it, so I check first if it exists and then open it:
#stage1
if(-e $filename)
{
print "file $filename exists, not overwriting\n";
exit 1;
}
#stage2
open(OUTFILE, ">", $filename) or die $!;
But this is not atomic. Theoretically, someone can create this file between stage1 and stage2. Is there some variant of the open command that does both of these things atomically, so that it fails to open a file for writing if the file already exists?
Here is an atomic way of opening files:
#!/usr/bin/env perl
use strict;
use warnings qw(all);
use Fcntl qw(:DEFAULT :flock);
my $filename = 'test';
my $fh;
# this is "atomic open" part
unless (sysopen($fh, $filename, O_CREAT | O_EXCL | O_WRONLY)) {
print "file $filename exists, not overwriting\n";
exit 1;
}
# flock() isn't required for "atomic open" per se
# but useful in real world usage like log appending
flock($fh, LOCK_EX);
# use the handle as you wish
print $fh scalar localtime;
print $fh "\n";
# unlock & close
flock($fh, LOCK_UN);
close $fh;
Debug session:
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ perl sysopen.pl
file test exists, not overwriting
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
If you're concerned about multiple Perl scripts modifying the same file, just use the flock() function in each one to lock the file you're interested in.
If you're worried about external processes, which you probably don't have control over, you can use the sysopen() function. According to the Programming Perl book (which I highly recommend, by the way):
To fix this problem of overwriting, you’ll need to use sysopen, which
provides individual controls over whether to create a new file or
clobber an existing one. And we’ll ditch that -e file existence test
since it serves no useful purpose here and only increases our exposure
to race conditions.
They also provide this sample block of code:
use Fcntl qw/O_WRONLY O_CREAT O_EXCL/;
open(FH, "<", $file)
|| sysopen(FH, $file, O_WRONLY | O_CREAT | O_EXCL)
|| die "can't create new file $file: $!";
In this example, they first pull in a few constants (to be used in the sysopen call). Next, they try to open the file with open, and if that fails, they then try sysopen. They continue on to say:
Now even if the file somehow springs into existence between when open
fails and when sysopen tries to open a new file for writing, no harm
is done, because with the flags provided, sysopen will refuse to open
a file that already exists.
So, to make things clear for your situation, remove the file test completely (no more stage 1), and only do the open operation using code similar to the block above. Problem solved!
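For the original goal (create the file only if it does not exist yet), the sysopen call can be wrapped as in the sketch below. The helper name open_new_file and the temporary directory are assumptions made for this illustration; the flags are the ones used in the answers above.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Hypothetical helper: returns a write handle for a new file,
# or nothing (undef) if the file already exists.
sub open_new_file {
    my ($filename) = @_;
    if (sysopen(my $fh, $filename, O_WRONLY | O_CREAT | O_EXCL)) {
        return $fh;
    }
    return if $!{EEXIST};                   # someone else created it first
    die "Can't create $filename: $!";       # any other failure is fatal
}

my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/test";

# First attempt creates the file.
my $first = open_new_file($filename);
print {$first} scalar(localtime), "\n";
close($first) or die "Can't close $filename: $!";

# Second attempt is refused because the file now exists.
my $second = open_new_file($filename);
print defined $second
    ? "created again\n"
    : "file $filename exists, not overwriting\n";
```

Checking $!{EEXIST} separates the "already exists" case from other failures such as a missing directory or insufficient permissions.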

filehandle - won't write to a file

I cannot get the script below to write to the file data.txt using a filehandle. Both files are in the same folder, so that's not the issue. Since I started with Perl, I have noticed that to run scripts I have to use a full path (c:\programs\scriptname.pl), and the same goes for input files. I thought that could be the issue and tried the syntax below, but that didn't work either...
open(WRITE, ">c:\programs\data.txt") || die "Unable to open file data.txt: $!";
Here is my script. I have checked the syntax until it drove me crazy and cannot see an issue. Any help would be greatly appreciated! I'm also puzzled why the die function hasn't kicked in.
#!c:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
#warnings
#obtain info in variables to be written to data.txt
print("What is your name?");
$name = <STDIN>;
print("How old are you?");
$age = <STDIN>;
print("What is your email address?");
$email = <STDIN>;
#data.txt is in the same file as this file.
open(WRITE, ">data.txt") || die "Unable to open file data.txt: $!";
#print information to data.txt
print WRITE "Hi, $name, you are \s $age and your email is \s $email";
#close the connection
close(WRITE);
How I solved this problem.
I have Strawberry Perl (perl.exe) installed on the c: drive via the installer, with my scripts in a folder that is also on c:. With that setup I couldn't read from or write to a file (with redirection or with functions such as open), and I always had to use full paths to launch a script. I solved the problem after a suggestion to leave the interpreter installed where it was and move my scripts folder to the desktop (the interpreter line at the top of the script stays as it is, since the interpreter is still in the same place). Now I can run the scripts with one click, and read, write, and append to files from the CMD prompt and with Perl functions with ease.
Backslashes have a special meaning in double-quoted strings. Try escaping the backslashes.
open(WRITE, ">c:\\programs\\data.txt") || die ...;
Or, as you're not interpolating variables, switch to single quotes.
open(WRITE, '>c:\programs\data.txt') || die ...;
It's also worth using the three-argument version of open and lexical filehandles.
open(my $write_fh, '>', 'c:\programs\data.txt') || die ...;
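A short sketch of what the quoting does to the path from the question. The variable names are made up for illustration; the "Unrecognized escape" warnings that Perl would normally emit for the unescaped version are silenced so the output stays readable.

```perl
use strict;
use warnings;

# Single quotes: backslashes before ordinary characters are kept as they are.
my $single = 'c:\programs\data.txt';

# Double quotes with escaped backslashes give the same string.
my $escaped = "c:\\programs\\data.txt";

# Double quotes without escaping: \p and \d are not valid string escapes,
# so the backslashes are dropped.
my $broken = do {
    no warnings 'misc';    # silence "Unrecognized escape" for this demo
    "c:\programs\data.txt";
};

print "single:  $single\n";
print "escaped: $escaped\n";
print "broken:  $broken\n";
```

With the unescaped double-quoted form, open is asked to create c:programsdata.txt, a path relative to the current directory on drive c:. That would also explain why die did not fire in the question: the open succeeded, just not on the intended file.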
You should use "/" as the path separator to ensure portability, so: open(WRITE, ">c:/programs/data.txt")
Note: I assume that the c:/programs folder exists.
You may want to try FindBin.
use strict;
use warnings;
use autodie; # open will now die on failure
use FindBin;
use File::Spec::Functions 'catfile';
my $filename = catfile $FindBin::Bin, 'data.txt';
#obtain info in variables to be written to data.txt
print("What is your name?"); my $name = <STDIN>;
print("How old are you?"); my $age = <STDIN>;
print("What is your email address?"); my $email = <STDIN>;
{
open( my $fh, '>', $filename );
print {$fh} "Hi, $name, you are $age, and your email is $email\n";
close $fh;
}
If you have an access problem when you try to print to data.txt, you can change that line to:
print WRITE "Hi, $name, you are \s $age and your email is \s $email" or die $!;
to get more information. A read only file will cause this error message:
Unable to open file data.txt: Permission denied at perl.pl line 12, <STDIN> line 3.