Should I pop after each binmode? - perl

When using binmode, should I pop the layers from a possibly previous used binmode?
#!/usr/bin/env perl
use warnings;
use 5.012;
use autodie;
open my $tty, '>:encoding(utf8)', '/dev/tty'; # ...
close $tty;
open $tty, '>:encoding(latin1)', '/dev/tty'; # ...
close $tty;
open $tty, '>:encoding(utf8)', '/dev/tty'; # ...
close $tty;
open $tty, '>:encoding(latin1)', '/dev/tty'; # ...
close $tty;
open $tty, '>:bytes', '/dev/tty';
say "#{[ PerlIO::get_layers( $tty ) ]}"; # unix perlio
close $tty;
say "----------------------------------------";
binmode STDOUT, ':encoding(utf8)'; # ...
binmode STDOUT, ':encoding(latin1)'; # ...
binmode STDOUT, ':encoding(utf8)'; # ...
binmode STDOUT, ':encoding(latin1)'; # ...
binmode STDOUT, ':bytes';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix perlio encoding(utf8) /
# utf8 encoding(iso-8859-1) utf8 encoding(utf8) utf8 encoding(iso-8859-1)
binmode STDOUT, ':pop:pop:pop:pop:bytes';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix perlio
.
#!/usr/bin/env perl
use warnings;
use 5.012;
use autodie;
open my $tty, '>:encoding(utf8)', '/dev/tty'; # ...
close $tty;
open $tty, '>:raw', '/dev/tty';
say "#{[ PerlIO::get_layers( $tty ) ]}"; # unix
close $tty;
say "----------------------------------------";
binmode STDOUT, ':encoding(utf8)'; # ...
binmode STDOUT, ':raw';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix perlio
binmode STDOUT, ':pop:raw';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix

:pop is required to pop real layers, such as :encoding(...). So yes, if you want to replace a real layer by another one, then you'll have to :pop.
But note that pushing :raw actually results in a series of pop... And :perlio automatically inserts :unix underneath. So the exact number of pops really depends on the current layers.
As the documentation says itself:
A more elegant (and safer) interface is needed.

Related

Perl and utf8 output from file [duplicate]

This question already has answers here:
How can I output UTF-8 from Perl?
(6 answers)
Closed 6 months ago.
I have a problem with perl output : the french word "préféré" is sometimes outputted "pr�f�r�" :
The sample script :
devel#k0:~/tmp$ cat 02.pl
#!/usr/bin/env perl
use strict;
use warnings;
print "préféré\n";
open( my $fh, '<:encoding(UTF-8)', 'text' ) ;
while ( <$fh> ) { print $_ }
close $fh;
exit;
The execution :
devel#k0:~/tmp$ ./02.pl
préféré
pr�f�r�
devel#k0:~/tmp$ cat text
préféré
devel#k0:~/tmp$ file text
text: UTF-8 Unicode text
Can please someone help me ?
Decode your inputs, encode your outputs. You have two bugs related to failure to properly decode and encode.
Specifically, you're missing
use utf8;
use open ":std", ":encoding(UTF-8)";
Details follow.
Perl source code is expected to be ASCII (with 8-bit clean string literals) unless you use use utf8 to tell Perl it's UTF-8.
I believe you have a UTF-8 terminal. We can conclude from the fact that cat 02.pl works that your source code is encoded using UTF-8. This means Perl sees the equivalent of this:
print "pr\x{C3}\x{A9}f\x{C3}\x{A9}r\x{C3}\x{A9}\n"; # C3 A9 = é encoded using UTF-8
You should be using use utf8; so Perl sees the equivalent of
print "pr\x{E9}f\x{E9}r\x{E9}\n"; # E9 = Unicode Code Point for é
You correctly decode the file you read.
The file presumably contains
70 72 C3 A9 66 C3 A9 72 C3 A9 0A # préféré␊ encoded using UTF-8
Because of the encoding layer you add, you are effectively doing
$_ = decode( "UTF-8", "\x{70}\x{72}\x{C3}\x{A9}\x{66}\x{C3}\x{A9}\x{72}\x{C3}\x{A9}\x{0A}" );
or
$_ = "pr\x{E9}f\x{E9}r\x{E9}\n";
This is correct.
Finally, you fail to encode your outputs.
The following does what you want:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
BEGIN {
binmode( STDIN, ":encoding(UTF-8)" ); # Well, not needed here.
binmode( STDOUT, ":encoding(UTF-8)" );
binmode( STDERR, ":encoding(UTF-8)" );
}
print "préféré\n";
open( my $fh, '<:encoding(UTF-8)', 'text' ) or die $!;
while ( <$fh> ) { print $_ }
close $fh;
But the open pragma makes it a lot cleaner.
The following does what you want:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use open ":std", ":encoding(UTF-8)";
print "préféré\n";
open( my $fh, '<', 'text' ) or die $!;
while ( <$fh> ) { print $_ }
close $fh;
UTF-8 is an interesting problem. First, your Perl itself will print correctly, because you don't do any UTF-8 Handling. You have an UTF-8 String, but Perl itself don't really know that it is UTF-8, and it will also print it, as-is.
So an an UTF-8 Terminal. Everything looks fine. Even that's not the case.
When you add use utf8; to your source-code. You will see, that your print now will produce the same garbage. But if you have string containing UTF-8. That's what you should do.
use utf8;
# Now also prints garbage
print "préféré\n";
open my $fh, '<:encoding(UTF-8)', 'text';
while ( <$fh> ) {
print $_;
}
close $fh;
Next. For every input you do from the outside, you need to do an decode, and for every output you do. You need todo an encode.
use utf8;
use Encode qw(encode decode);
# Now correct
print encode("UTF-8", "préféré\n");
open my $fh, '<:encoding(UTF-8)', 'text';
while ( <$fh> ) {
print encode("UTF-8", $_);
}
close $fh;
This can be tedious. But you can enable Auto-Encoding on a FileHandle with binmode
use utf8;
# Activate UTF-8 Encode on STDOUT
binmode STDOUT, ':utf8';
print "préféré\n";
open my $fh, '<:encoding(UTF-8)', 'text';
while ( <$fh> ) {
print $_;
}
close $fh;
Now everything is UTF-8! You also can activate it on STDERR. Remember that if you want to print binary data on STDOUT (for whatever reason) you must disable the Layer.
binmode STDOUT, ':raw';

Why is the utf8 IO layer stripped when spawning a daemon process?

The following code works as expected:
use feature qw(say);
use strict;
use warnings;
use open qw/:std IN :encoding(utf-8) OUT :utf8/;
say join ' ', (PerlIO::get_layers(\*STDOUT));
my $pid = fork();
die "fork() failed: $!" unless defined $pid;
if ($pid == 0) {
say join ' ', (PerlIO::get_layers(\*STDOUT));
}
Output:
unix perlio utf8
unix perlio utf8
But if I use a daemon process instead of a regular fork:
use feature qw(say);
use strict;
use warnings;
use open qw/:std IN :encoding(utf-8) OUT :utf8/;
use Cwd qw(getcwd);
use Proc::Daemon;
my $work_dir = getcwd;
my $daemon = Proc::Daemon->new(
work_dir => $work_dir,
child_STDOUT => 'stdout.txt',
child_STDERR => 'stderr.txt',
pid_file => 'pid.txt',
);
my $pid = $daemon->Init();
if ( $pid == 0 ) {
say join ' ', (PerlIO::get_layers(\*STDOUT));
}
The output to the file stdout.txt is:
unix perlio
so the utf8 IO layer has been stripped off.
Consider this example:
Foo.pm:
package Foo;
use warnings;
use strict;
sub test {
close STDIN;
open \*STDIN, "<", "/dev/null";
}
1;
test.pl:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use open qw/:std IN :encoding(UTF-8) OUT :encoding(UTF-8)/;
use lib qw/./;
use Foo;
$, = ' ';
say "Original STDIN:", PerlIO::get_layers(\*STDIN);
close STDIN;
open \*STDIN, "<", "/dev/null";
say "Reopened STDIN:", PerlIO::get_layers(\*STDIN);
Foo::test();
say "Reopened STDIN in different package:", PerlIO::get_layers(\*STDIN);
Results:
$ perl test.pl
Original STDIN: unix perlio encoding(utf-8-strict) utf8
Reopened STDIN: unix perlio encoding(utf-8-strict) utf8
Reopened STDIN in different package: unix perlio
Looks like use open is like other pragmas and only applies to the file it's in it. So when Proc::Daemon closes standard input, output, and error, and reopens them, it naturally doesn't see the extra layers.

Trying to improve Encode::decode warning message: Segfault in $SIG{__WARN__} handler

I am trying to improve the warning message issued by Encode::decode(). Instead of printing the name of the module and the line number in the module, I would like it to print the name of the file being read and the line number in that file where the malformed data was found. To a developer, the origial message can be useful, but to an end user not familiar with Perl, it is probably quite meaningless. The end user would probably rather like to know which file is giving the problem.
I first tried to solve this using a $SIG{__WARN__} handler (which is probably not a good idea), but I get a segfault. Probably a silly mistake, but I could not figure it out:
#! /usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
use Encode ();
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';
my $fn = 'test.txt';
write_test_file( $fn );
# Try to improve the Encode::FB_WARN fallback warning message :
#
# utf8 "\xE5" does not map to Unicode at <module_name> line xx
#
# Rather we would like the warning to print the filename and the line number:
#
# utf8 "\xE5" does not map to Unicode at line xx of file <filename>.
my $str = '';
open ( my $fh, "<:encoding(utf-8)", $fn ) or die "Could not open file '$fn': $!";
{
local $SIG{__WARN__} = sub { my_warn_handler( $fn, $_[0] ) };
$str = do { local $/; <$fh> };
}
close $fh;
say "Read string: '$str'";
sub my_warn_handler {
my ( $fn, $msg ) = #_;
if ( $msg =~ /\Qdoes not map to Unicode\E/ ) {
recover_line_number_and_char_pos( $fn, $msg );
}
else {
warn $msg;
}
}
sub recover_line_number_and_char_pos {
my ( $fn, $err_msg ) = #_;
chomp $err_msg;
$err_msg =~ s/(line \d+)\.$/$1/; # Remove period at end of sentence.
open ( $fh, "<:raw", $fn ) or die "Could not open file '$fn': $!";
my $raw_data = do { local $/; <$fh> };
close $fh;
my $str = Encode::decode( 'utf-8', $raw_data, Encode::FB_QUIET );
my ($header, $last_line) = $str =~ /^(.*\n)([^\n]*)$/s;
my $line_no = $str =~ tr/\n//;
++$line_no;
my $pos = ( length $last_line ) + 1;
warn "$err_msg, in file '$fn' (line: $line_no, pos: $pos)\n";
}
sub write_test_file {
my ( $fn ) = #_;
my $bytes = "Hello\nA\x{E5}\x{61}"; # 2 lines ending in iso 8859-1: åa
open ( my $fh, '>:raw', $fn ) or die "Could not open file '$fn': $!";
print $fh $bytes;
close $fh;
}
Output:
utf8 "\xE5" does not map to Unicode at ./p.pl line 27
, in file 'test.txt' (line: 2, pos: 2)
Segmentation fault (core dumped)
Here is another way to locate where the warning fires, with un-buffered sysread
use warnings;
use strict;
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';
my $file = 'test.txt';
open my $fh, "<:encoding(UTF-8)", $file or die "Can't open $file: $!";
$SIG{__WARN__} = sub { print "\t==> WARN: #_" };
my $char_cnt = 0;
my $char;
while (sysread($fh, $char, 1)) {
++$char_cnt;
print "$char ($char_cnt)\n";
}
The file test.txt was written by the posted program, except that I had to add to it to reproduce the behavior -- it runs without warnings on v5.10 and v5.16. I added \x{234234} to the end. The line number can be tracked with $char =~ /\n/.
The sysread returns undef on error. It can be moved into the body of while (1) to allow reads to continue and catch all warnings, breaking out on 0 (returned on EOF).
This prints
H (1)
e (2)
l (3)
l (4)
o (5)
(6)
A (7)
å (8)
a (9)
==> WARN: Code point 0x234234 is not Unicode, may not be portable at ...
(10)
While this does catch the character warned about, re-reading the file using Encode may well be better than reaching for sysread, in particular if sysread uses Encode.
However, Perl is utf8 internally and I am not sure that sysread needs Encode.
Note. The page for sysread supports its use on data with encoding layers
Note that if the filehandle has been marked as :utf8 Unicode
characters are read instead of bytes (the LENGTH, OFFSET, and the
return value of sysread are in Unicode characters). The
:encoding(...) layer implicitly introduces the :utf8 layer.
See binmode, open, and the open pragma.
Note Apparently, things have moved on and after a certain version sysread does not support encoding layers. The link above, while for an older version (v5.10 for one) indeed shows what is quoted, with a newer version tells us that there'll be an exception.

Trouble with IPC::Open3

I am writing a simple script using IPC::Open3. The script produces no output to either stdout or stderr, while I would expect output to both.
The complete source code:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use IPC::Open3;
pipe my $input, my $output or die $!;
my $pid = open3(\*STDIN, $output, \*STDERR, 'dd', 'if=/usr/include/unistd.h') or die $!;
while(<$input>) {
print $_."\n";
}
waitpid $pid, 0;
I am fairly certain that I am using IPC::Open3 incorrectly. However, I am still confused as to what I should be doing.
It's the pipe. Without knowing why it's there I can't say more. This works fine.
my $reader;
my $pid = open3(\*STDIN, $reader, \*STDERR, 'dd', 'if=/usr/include/unistd.h') or die $!;
while(<$reader>) {
print $_."\n";
}
waitpid $pid, 0;
I realize it's probably just an example, but in case it's not... this is complete overkill for what you're doing. You can accomplish that with backticks.
print `dd if=/usr/include/unistd.h`
IPC::Open3 is a bit overcomplicated. There are better modules such as IPC::Run and IPC::Run3.
use strict;
use warnings;
use IPC::Run3;
run3(['perl', '-e', 'warn "Warning!"; print "Normal\n";'],
\*STDIN, \*STDOUT, \*STDERR
);
Your program suffers from the following problems:
\*STDIN (open STDIN as a pipe tied to the child's STDIN) should be <&STDIN (use the parent's STDIN as the child's STDIN).
\*STDERR (open STDERR as a pipe tied to the child's STDERR) should be >&STDERR (use the parent's STDERR as the child's STDERR).
The value you place in $output is being ignored and overwritten. Fortunately, it's being overwritten with a correct value!
You use print $_."\n";, but $_ is already newline-terminated. Either chomp first, or don't add a newline.
open3 isn't a system call, so it doesn't set $!.
open3 doesn't return false on error; it throws an exception.
So we get something like:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw( say );
use IPC::Open3;
my $pid = open3('<&STDIN', my $output, '>&STDERR',
'dd', 'if=/usr/include/unistd.h');
while (<$output>) {
chomp;
say "<$_>";
}
waitpid($pid, 0);

Perl using `IO::Handle` or `IO::File` when not reading actual files

I like using IO::File to open and read files rather than the built in way.
Built In Way
open my $fh, "<", $flle or die;
IO::File
use IO::File;
my $fh = IO::File->new( $file, "r" );
However, what if I am treating the output of a command as my file?
The built in open function allows me to do this:
open my $cmd_fh, "-|", "zcat $file.gz" or die;
while ( my $line < $cmd_fh > ) {
chomp $line;
}
What would be the equivalent of IO::File or IO::Handle?
By the way, I know can do this:
open my $cmd_fh, "-|", "zcat $file.gz" or die;
my $cmd_obj = IO::File-> new_from_fd( fileno( $cmd_fh ), 'r' );
But then why bother with IO::File if there's already a file handle?
First of all, your snippet fails if $file contains spaces or other special characters.
open my $cmd_fh, "-|", "zcat $file.gz" or die $!;
should be
open my $cmd_fh, "-|", "zcat", "$file.gz" or die $!;
or
use String::ShellQuote qw( shell_quote );
open my $cmd_fh, "-|", shell_quote("zcat", "$file.gz") or die $!;
or
use String::ShellQuote qw( shell_quote );
open my $cmd_fh, shell_quote("zcat", "$file.gz")."|" or die $!;
I mention the latter variants because passing a single arg to IO::File->open boils down to passing that arg to open($fh, $that_arg), so you could use
use String::ShellQuote qw( shell_quote );
IO::File->open(shell_quote("zcat", "$file.gz")."|") or die $!;
If all you want is to use IO::File's methods, you don't need to use IO::File->open.
use IO::Handle qw( ); # Needed on older versions of Perl
open my $cmd_fh, "-|", "zcat", "$file.gz" or die $!;
$cmd_fh->autoflush(1); # Example.
$cmd_fh->print("foo"); # Example.
You can open them just like in open, because that's exactly what IO::File does - it initializes IO::Handle object and links it to file opened with Perl's native open.
use IO::File;
if (my $fh = new IO::File('dmesg|')) {
print <$fh>;
$fh->close;
}
IO::File is really just a pretty wrapper. If what it does not complex enough for you, you can just go and init IO::Handle from any FD you like yourself. You need rest of IO::* OO functionality, I suppose, so who cares how does initializer looks?
If you want use pipes with IO::Handle, you can use IO::Pipe module.