Perl's open() sometimes fails when the file name ends with whitespace

I'm facing a problem with Perl's open() function, related to files whose names end with whitespace. If I use open() with 2 arguments (filehandle and filename) and the filename ends with whitespace, open() fails. The error message says that the file cannot be found, although the file exists. This does not happen when the opening mode is specified, e.g., if I state explicitly that the file is opened for reading. Here is some sample code:
use warnings;
use strict;
my $file = '/tmp/test_with_ending_space ';
open WRITE, ">", $file or die "open with mode got error: $!";
print WRITE "my open() test\n";
close WRITE;
# open() with mode
open READ, "<", $file or die "open with mode got error: $!";
while (<READ>) {
print;
}
close READ;
# open() without mode
open READ1, $file or die "open without mode got error: $!";
while (<READ1>) {
print;
}
close READ1;
And here is the output from such code:
marius@mariusm-PC:~/perl$ ./test.pl
my open() test
open without mode got error: No such file or directory at ./test.pl line 21.
This does not happen with "usual" filenames, i.e., when the filename ends with some other character.
Is this a known problem? If so, is there a way to work around it?
And just in case, before you start telling me "be nice, specify the mode and tell your open() how to open the file": unfortunately, this issue is present in some core modules, e.g., IO::File::open() (that's where I originally got stuck). The last call in this function is open($fh, $file), i.e., it calls the native open() without any particular mode.

This is documented in perldoc -f open:
The filename passed to the one- and two-argument forms of
open() will have leading and trailing whitespace deleted
Read the following paragraphs there for more details.
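A minimal sketch of the difference, assuming a Unix-like system and a scratch directory from File::Temp (the file name is made up for illustration). The three-argument form takes the name literally, while the two-argument form strips the trailing space and ends up looking for a different file:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (hypothetical file name, removed on exit).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/test_with_ending_space ";

# Create the file; three-argument open uses the name exactly as given.
open(my $out, '>', $file) or die "cannot create '$file': $!";
print {$out} "my open() test\n";
close($out) or die "cannot close '$file': $!";

# Three-argument open: the trailing space is preserved, so this succeeds.
my $three_arg_ok = open(my $in3, '<', $file) ? 1 : 0;

# Two-argument open: trailing whitespace is stripped, so Perl looks for
# "$dir/test_with_ending_space" (no space), which does not exist.
my $two_arg_ok = open(my $in2, $file) ? 1 : 0;

print "three-argument open: ", ($three_arg_ok ? "ok" : "failed"), "\n";
print "two-argument open:   ", ($two_arg_ok   ? "ok" : "failed: $!"), "\n";
```

So wherever you control the call, passing the mode as a separate argument sidesteps the whitespace stripping entirely.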

@choroba gave the "why", but you also asked for a workaround.
Well, this is VERY kludgey, but if you're desperate and can't change the open() calls, this will work. First, detect if the filename ends with whitespace (I assume you can handle that). If it does, create a temp symlink to the file (without trailing whitespace!), and open the symlink.
Works for me on my (old) Solaris 2.6 box.
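A rough sketch of that symlink workaround, assuming a platform where symlink() is available (so not stock Windows). The helper name two_arg_safe_name and the link name are hypothetical, and the two-argument open at the end stands in for the call you cannot change:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: returns a name that is safe for two-argument open().
# If $file ends with whitespace, a symlink without it is created in $link_dir.
sub two_arg_safe_name {
    my ($file, $link_dir) = @_;
    return $file unless $file =~ /\s\z/;

    my $link = File::Spec->catfile($link_dir, 'link_without_space');
    symlink(File::Spec->rel2abs($file), $link)
        or die "cannot symlink '$file': $!";
    return $link;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/test_with_ending_space ";

open(my $out, '>', $file) or die "cannot create '$file': $!";
print {$out} "my open() test\n";
close($out) or die "cannot close '$file': $!";

# Two-argument open, as in code you cannot change, now goes through the link.
my $name = two_arg_safe_name($file, $dir);
open(my $in, $name) or die "cannot open '$name': $!";
my $first_line = <$in>;
close($in);

print $first_line;
```

Remember to remove the temporary link afterwards; here the CLEANUP option on the scratch directory takes care of it.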


backtick vs native way of doing things in PERL

Consider these 2 snippets :
#!/usr/bin/perl
open(DATA,"<input.txt");
while(<DATA>)
{
print($_) ;
}
and
$abcd = `cat input.txt`;
print $abcd;
Both will print the content of file input.txt as output
Question: Is there any standard as to which one (backticks or the native method) should be preferred in a particular case, or are both always equal?
The reason I am asking is that I find the cat method easier than opening a file the native Perl way, so I am in doubt: if I can achieve something through backticks, should I go with that or prefer the native way of doing it?
I checked this thread too: What's the difference between Perl's backticks, system, and exec? But it went a different route than my question.
Use builtin functions wherever possible:
They are more portable: open works on Windows, while `cat input.txt` will not.
They have less overhead: Using backticks forks a child process, which (depending on the command) may exec a shell to parse the command before the cat program is exec'ed. This can load up to two extra programs just to read a file. This is in contrast to open, which is a builtin Perl function.
They make error handling easier. The open function will return a false value on error, which allows you to take different actions, e.g. like terminating the program with an error message:
open my $fh, "<", "input.txt" or die "Couldn't open input.txt: $!";
They are more flexible. For example, you can add encoding layers if your data isn't Latin-1 text:
open my $fh, "<:encoding(UTF-8)", "input.txt" or die "Couldn't open input.txt: $!";
open my $fh, "<:raw", "input.bin" or die "Couldn't open input.bin: $!";
If you want a “just read this file into a scalar” function, look at the File::Slurp module:
use File::Slurp;
my $data = read_file "input.txt";
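If installing a CPAN module is not an option, the same thing can be sketched with core Perl only. The helper name slurp_file is made up here, and a temporary file stands in for input.txt:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one scalar using core Perl only.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Couldn't open $path: $!";
    local $/;                 # undef record separator: read everything at once
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Demo on a temporary file standing in for input.txt.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line 1\nline 2\n";
close($tmp_fh) or die "close failed: $!";

my $data = slurp_file($tmp_name);
print $data;
```

Unlike the backtick version, a missing or unreadable file stops the program with a clear message instead of silently yielding an empty string.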
Using the back tick operators to call cat is highly inefficient, because:
It spawns a separate process (or maybe more than one if a shell is used) which does nothing more than read the file, which perl could do itself.
You are reading the whole file into memory instead of processing it one line at a time. OK for a small file, not so good for a large one.
The back tick method is ok for a quick and dirty script but I would not use it for anything serious.

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Can't open file \n";
@lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(@lines){
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT, the program does not write the last print, print "file changed". Why is this?
*Edited* I want the message printed on the console, not written to the file.
I suppose it is because print's default filehandle is STDOUT, which is already closed at that point. You could reopen it, or print to another filehandle, for example STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
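A minimal sketch of the lexical-filehandle approach, assuming a temporary file in place of the question's $new_file and made-up sample lines. STDOUT is never reopened, so the final message still reaches the console:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for $new_file from the question.
my ($tmp_fh, $new_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "xsl one xsl two\nno match here\n";
close($tmp_fh) or die "close failed: $!";

# Read all lines, then close the input handle.
open(my $in, '<', $new_file) or die "Can't open $new_file: $!";
my @lines = <$in>;
close($in);

# Write through a lexical handle; STDOUT is never touched.
open(my $out, '>', $new_file) or die "Can't open $new_file: $!";
for (@lines) {
    s/(.*?xsl.*?)xsl/$1xslt/;
    print {$out} $_;
}
close($out) or die "Can't close $new_file: $!";

# STDOUT is still the console, so this message appears there.
print "file changed\n";
```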
A print statement writes the string to STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and then try to print the message, which will not work. Reopen the STDOUT filehandle or do not close it. When the script ends, the file handles are closed automatically.
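open OLDOUT, ">&", \*STDOUT or die "Can't dup STDOUT: $!"; # save the original STDOUT
close STDOUT;
open(STDOUT, ">", $new_file) or die "Can't open $new_file: $!\n";
...
close(STDOUT);
open(STDOUT, ">&", \*OLDOUT) or die "Can't restore STDOUT: $!"; # restore it
print "file changed";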
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwriting (the original file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you do not know how print works, i.e. that you can supply a file handle to print: print FILEHANDLE "text". Or you did not know that STDOUT is a pre-defined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file into a variable) unless you really need to, because this is inefficient and slow, and for huge files it can cause your program to crash due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default file handle for print, read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. Lexical file handles are automatically closed when they go out of scope, and all file handles are closed when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. Otherwise, meta-characters in the file name (such as a leading > or a | character) can change what your open statement does.
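The last point is easy to demonstrate. In this sketch (the file name is hypothetical and lives in a scratch directory), a "file name" that starts with > makes two-argument open create a file, while three-argument open treats the same string as a literal name:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical "file name" supplied by a user; it starts with a mode character.
my $name = ">$dir/created_by_accident";

# Two-argument open interprets the leading ">" as "open for writing",
# so this call creates (or truncates) a file instead of reading one.
my $two_arg_ok = open(my $fh2, $name) ? 1 : 0;
close($fh2) if $two_arg_ok;
my $file_was_created = (-e "$dir/created_by_accident") ? 1 : 0;

# Three-argument open treats the whole string as a literal file name.
# No file with that literal name exists, so the open simply fails.
my $three_arg_ok = open(my $fh3, '<', $name) ? 1 : 0;

print "two-argument open created a file: $file_was_created\n";
print "three-argument open succeeded:    $three_arg_ok\n";
```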
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
-p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line, after your code is executed.
-i switch tells perl to do inplace-edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from @ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from @ARGV -- arguments
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is
while (<$fh>) { # read lines from org file
s/(.*?xsl.*?)xsl/$1xslt/;
print $tmp $_; # print line to tmp file
}
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong. Unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) option instead, because variable length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
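To check that the rewrites above really are equivalent, here is a small sketch that runs all three substitutions on the same made-up input line. It assumes Perl 5.10 or later, since that is when \K was introduced:

```perl
use strict;
use warnings;

my $input = "a xsl b xsl c\n";

# Original form: capture everything before the second "xsl" and put it back.
(my $with_capture = $input) =~ s/(.*?xsl.*?)xsl/$1xslt/;

# \K keeps what was matched so far; only the second "xsl" is replaced.
(my $with_k = $input) =~ s/xsl.*?\Kxsl/xslt/;

# \K after the second "xsl": nothing is removed, a "t" is just inserted.
(my $insert_only = $input) =~ s/xsl.*?xsl\K/t/;

print $with_capture, $with_k, $insert_only;
```

All three variables end up holding "a xsl b xslt c".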
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

How to append to a file?

I am trying to append some text to the end of a file with a .conf extension on Mac OS X. I am using the following code to do that:
open NEW , ">>$self->{natConf}";
print NEW "$hostPort = $vmIP";
where
$self->{natConf} = \Library\Preferences\VMware Fusion\vmnet8\nat.conf
So basically this is a .conf file. The code does not return any error, but it is not appending anything to the end of the file. I checked all the permissions, and read-write privileges have been granted. Is there anything I am missing here?
First of all use strict and use warnings. This would have thrown errors and warnings for your code.
On Mac OS the delimiter in a path is /, like in other Unix-like systems, not \.
To assign a string to a variable, use quotation marks.
Do not use the two-argument form of open; use the three-argument form instead. Using bareword filehandles is also considered bad practice.
use strict;
use warnings;
# your code here
$self->{natConf} = '/Library/Preferences/VMware Fusion/vmnet8/nat.conf';
# more code here
open my $fh, '>>', $self->{natConf} or die "open failed: $!\n";
print $fh "$hostPort = $vmIP";
close $fh;
# rest of code here
Suffering from buffering? Call close NEW when you are done writing to it, or call (*NEW)->autoflush(1) on it after you open it (with use IO::Handle; loaded first on older Perls) to force Perl to flush the output after every print.
Also check the return values of the open and print calls. If either of these functions fails, it will return false and set the $! variable.
And I second the recommendation about using strict and warnings.
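Putting the advice from both answers together, a minimal sketch might look like this. A temporary file stands in for nat.conf, and the values of $hostPort and $vmIP are made up:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for nat.conf; the values below are made up.
my ($tmp_fh, $conf) = tempfile(SUFFIX => '.conf', UNLINK => 1);
print {$tmp_fh} "# existing settings\n";
close($tmp_fh) or die "close failed: $!";

my ($hostPort, $vmIP) = (8080, '192.168.0.10:80');

# Append, checking every step; close() flushes the buffer to disk.
open(my $fh, '>>', $conf) or die "open failed: $!";
print {$fh} "$hostPort = $vmIP\n" or die "print failed: $!";
close($fh) or die "close failed: $!";

# Read the file back to confirm the line landed at the end.
open(my $in, '<', $conf) or die "open failed: $!";
my @lines = <$in>;
close($in);
print @lines;
```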

Can someone explain this Perl code snippet?

This little piece of code has been a staple in a bunch of my scripts, but I took the syntax from another working script that someone else wrote and adapted it to fit my needs. I'm not even sure that the syntax used here is the best or most common way to open a filehandle.
The code is:
$fh = \*STAT_FILE;
open ($fh,">>".$stat_file) or die "Can't open $stat_file: $!\n";
my $print_flag = ( -z $stat_file );
I don't fully understand the first line and also the last line of the code above. Specifically, \*STAT_FILE and -z, respectively.
I know that, for the most part, the second line will open a file for appending or quit and throw an error. But again, I don't understand what purpose the $! serves in that line either.
Can someone explain this Perl code to me, line by line, in pseudocode? Also, if the method above is not the preferred method, then what is?
Thanks in advance
Before perl 5.6, file handles could only be globs (bare words) or references to globs (which is what \*STAT_FILE is). Also, it's better to use 3-argument open (See the docs. Also see perlopentut). So you can now do:
open(my $fh, ">>", $stat_file) or die "Failed to open $stat_file: $!";
and forget about \*STAT_FILE.
-z is one of the file test operators (it takes a file name or file handle as an argument) and tests whether the file has zero size.
$! is one of the Special Variables and contains the most recent system error message (in this case why you can not open the file, perhaps permission issues, or a directory in the path to the file does not exist, etc.).
You should learn to use perldoc, all of this is in perldoc:
perldoc perlfunc (specifically perldoc -f open and perldoc -f -X)
perldoc perlvar
The first line assigns to the variable a reference (the backslash sign) to the typeglob (a full symbol table entry) STAT_FILE. This has long been an idiomatic Perl construct for passing filehandles around, as described in, for example, Larry Wall's "Programming Perl". The $! variable contains the error message returned by the operating system.
So the whole meaning is:
line 1. Put a filehandle in the $fh variable;
line 2. Open the file for appending, reporting the system error message should a fault happen;
line 3. Set a flag variable indicating whether the file has zero length.
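A minimal sketch of the same three steps with a lexical filehandle. The path is hypothetical, and writing a header only when the file is still empty is just a guess at what $print_flag is used for:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir       = tempdir(CLEANUP => 1);
my $stat_file = "$dir/stats.txt";    # hypothetical path

# Open for appending; the file is created if it does not exist yet.
open(my $fh, '>>', $stat_file) or die "Can't open $stat_file: $!\n";

# -z is true when the file exists and has zero size (nothing written yet).
my $print_flag = (-z $stat_file) ? 1 : 0;

print {$fh} "name,value\n" if $print_flag;   # e.g. write a header only once
print {$fh} "requests,42\n";
close($fh) or die "Can't close $stat_file: $!\n";

# After the data has been flushed by close(), the file is no longer empty.
my $still_empty = (-z $stat_file) ? 1 : 0;

print "print_flag=$print_flag still_empty=$still_empty\n";
```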

How can I pass a filehandle to Perl Expect's log_file function?

I feel stupid for asking this, but I've tried a couple things and I'm not sure where to go with it.
From the Expect.pm documentation:
$object->log_file("filename" | $filehandle | \&coderef | undef)
Log session to a file. All characters send to or received from
the spawned process are written to the file.
I'd like to pass the $filehandle to log_file. However, when I tried this:
open (LOG, ">>" .$opt{l});
my $sess = Expect->spawn("telnet $ip");
$sess->log_file(LOG)
I get a file named 'LOG' in the directory that I'm running the script out of. After some investigation, I tried this:
open (LOG, ">>" .$opt{l});
my $sess = Expect->spawn("telnet $ip");
my $fh = *LOG;
$sess->log_file($fh)
Now, I get a file named *main::LOG in the directory. I do have another file as well, named whatever I specified on the -l option, but it only contains the lines that I send to print LOG.
I'm not sure if the filehandling functionality is hosed in the function, or if I'm doing something wrong.
If you have a bareword filehandle named LOG, you can pass it to a function by saying \*LOG (you can read more about this in perldoc perldata), but don't do that. Bareword filehandles are a very old style and should no longer be used. Try using a lexical filehandle and the three argument version of open:
open my $log, ">>", $opt{l}
or die "could not open $opt{l}: $!";
You can use $log anywhere you used LOG in the past.
You should also be using the strict and warnings pragmas.
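The file names seen in the question fit with how globs stringify. The sketch below uses only core Perl (no Expect) and a temporary log file; that log_file treats any non-reference argument as a file name is an assumption based on the behaviour described, not something checked against the Expect source:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);
open(LOG, '>>', $path) or die "could not open $path: $!";

# A plain glob is not a reference; as a string it reads "*main::LOG".
my $glob           = *LOG;
my $glob_as_string = "$glob";
my $glob_ref_type  = ref($glob);      # empty string: not a reference

# A reference to the glob is a real reference and works as a filehandle.
my $glob_ref     = \*LOG;
my $ref_ref_type = ref($glob_ref);    # "GLOB"

print {$glob_ref} "written through \\*LOG\n";
close(LOG) or die "close failed: $!";

print "glob as string: $glob_as_string\n";
print "ref type of \\*LOG: $ref_ref_type\n";
```

A lexical filehandle from open my $log, ... is already a reference, which is why it can be passed around without any of this.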
Try using a lexical filehandle (and the three-argument open, and die) to begin with:
open my $logfh, ">>", $opt{l} or die "Could not open log file $opt{l}: $!\n";
$sess->log_file( $logfh );
LOG is incredibly generic and could be clobbering (or be clobbered by) another filehandle somewhere in your code. Using a lexical filehandle helps to prevent confusion. And you should always check the return status of open() (or use autodie) in case you can't actually open the file.
It might be a better idea to return a filehandle using log_file by passing it the filename instead.
From the Expect documentation:
$object->log_file("filename" | $filehandle | \&coderef | undef)
Log session to a file. All characters
send to or received from the spawned
process are written to the file.
Normally appends to the logfile, but
you can pass an additional mode of "w"
to truncate the file upon open():
$object->log_file("filename", "w");
Returns the logfilehandle.
So you should be able to achieve the same functionality using the following:
my $sess = Expect->spawn("telnet $ip");
$sess->log_file($opt{l}); # Or my $fh = $sess->log_file...
# if that filehandle is needed
Now all your session activity will be logged to the file. Append mode is the default.