Why is the utf8 IO layer stripped when spawning a daemon process? - perl

The following code works as expected:
use feature qw(say);
use strict;
use warnings;
use open qw/:std IN :encoding(utf-8) OUT :utf8/;
say join ' ', (PerlIO::get_layers(\*STDOUT));
my $pid = fork();
die "fork() failed: $!" unless defined $pid;
if ($pid == 0) {
say join ' ', (PerlIO::get_layers(\*STDOUT));
}
Output:
unix perlio utf8
unix perlio utf8
But if I use a daemon process instead of a regular fork:
use feature qw(say);
use strict;
use warnings;
use open qw/:std IN :encoding(utf-8) OUT :utf8/;
use Cwd qw(getcwd);
use Proc::Daemon;
my $work_dir = getcwd;
my $daemon = Proc::Daemon->new(
work_dir => $work_dir,
child_STDOUT => 'stdout.txt',
child_STDERR => 'stderr.txt',
pid_file => 'pid.txt',
);
my $pid = $daemon->Init();
if ( $pid == 0 ) {
say join ' ', (PerlIO::get_layers(\*STDOUT));
}
The output to the file stdout.txt is:
unix perlio
so the utf8 IO layer has been stripped off.

Consider this example:
Foo.pm:
package Foo;
use warnings;
use strict;
sub test {
close STDIN;
open \*STDIN, "<", "/dev/null";
}
1;
test.pl:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use open qw/:std IN :encoding(UTF-8) OUT :encoding(UTF-8)/;
use lib qw/./;
use Foo;
$, = ' ';
say "Original STDIN:", PerlIO::get_layers(\*STDIN);
close STDIN;
open \*STDIN, "<", "/dev/null";
say "Reopened STDIN:", PerlIO::get_layers(\*STDIN);
Foo::test();
say "Reopened STDIN in different package:", PerlIO::get_layers(\*STDIN);
Results:
$ perl test.pl
Original STDIN: unix perlio encoding(utf-8-strict) utf8
Reopened STDIN: unix perlio encoding(utf-8-strict) utf8
Reopened STDIN in different package: unix perlio
Looks like use open is like other pragmas and only applies to the file it's in it. So when Proc::Daemon closes standard input, output, and error, and reopens them, it naturally doesn't see the extra layers.

Related

Perl and utf8 output from file [duplicate]

This question already has answers here:
How can I output UTF-8 from Perl?
(6 answers)
Closed 6 months ago.
I have a problem with perl output : the french word "préféré" is sometimes outputted "pr�f�r�" :
The sample script :
devel#k0:~/tmp$ cat 02.pl
#!/usr/bin/env perl
use strict;
use warnings;
print "préféré\n";
open( my $fh, '<:encoding(UTF-8)', 'text' ) ;
while ( <$fh> ) { print $_ }
close $fh;
exit;
The execution :
devel#k0:~/tmp$ ./02.pl
préféré
pr�f�r�
devel#k0:~/tmp$ cat text
préféré
devel#k0:~/tmp$ file text
text: UTF-8 Unicode text
Can please someone help me ?
Decode your inputs, encode your outputs. You have two bugs related to failure to properly decode and encode.
Specifically, you're missing
use utf8;
use open ":std", ":encoding(UTF-8)";
Details follow.
Perl source code is expected to be ASCII (with 8-bit clean string literals) unless you use use utf8 to tell Perl it's UTF-8.
I believe you have a UTF-8 terminal. We can conclude from the fact that cat 02.pl works that your source code is encoded using UTF-8. This means Perl sees the equivalent of this:
print "pr\x{C3}\x{A9}f\x{C3}\x{A9}r\x{C3}\x{A9}\n"; # C3 A9 = é encoded using UTF-8
You should be using use utf8; so Perl sees the equivalent of
print "pr\x{E9}f\x{E9}r\x{E9}\n"; # E9 = Unicode Code Point for é
You correctly decode the file you read.
The file presumably contains
70 72 C3 A9 66 C3 A9 72 C3 A9 0A # préféré␊ encoded using UTF-8
Because of the encoding layer you add, you are effectively doing
$_ = decode( "UTF-8", "\x{70}\x{72}\x{C3}\x{A9}\x{66}\x{C3}\x{A9}\x{72}\x{C3}\x{A9}\x{0A}" );
or
$_ = "pr\x{E9}f\x{E9}r\x{E9}\n";
This is correct.
Finally, you fail to encode your outputs.
The following does what you want:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
BEGIN {
binmode( STDIN, ":encoding(UTF-8)" ); # Well, not needed here.
binmode( STDOUT, ":encoding(UTF-8)" );
binmode( STDERR, ":encoding(UTF-8)" );
}
print "préféré\n";
open( my $fh, '<:encoding(UTF-8)', 'text' ) or die $!;
while ( <$fh> ) { print $_ }
close $fh;
But the open pragma makes it a lot cleaner.
The following does what you want:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use open ":std", ":encoding(UTF-8)";
print "préféré\n";
open( my $fh, '<', 'text' ) or die $!;
while ( <$fh> ) { print $_ }
close $fh;
UTF-8 is an interesting problem. First, your Perl itself will print correctly, because you don't do any UTF-8 Handling. You have an UTF-8 String, but Perl itself don't really know that it is UTF-8, and it will also print it, as-is.
So an an UTF-8 Terminal. Everything looks fine. Even that's not the case.
When you add use utf8; to your source-code. You will see, that your print now will produce the same garbage. But if you have string containing UTF-8. That's what you should do.
use utf8;
# Now also prints garbage
print "préféré\n";
open my $fh, '<:encoding(UTF-8)', 'text';
while ( <$fh> ) {
print $_;
}
close $fh;
Next. For every input you do from the outside, you need to do an decode, and for every output you do. You need todo an encode.
use utf8;
use Encode qw(encode decode);
# Now correct
print encode("UTF-8", "préféré\n");
open my $fh, '<:encoding(UTF-8)', 'text';
while ( <$fh> ) {
print encode("UTF-8", $_);
}
close $fh;
This can be tedious. But you can enable Auto-Encoding on a FileHandle with binmode
use utf8;
# Activate UTF-8 Encode on STDOUT
binmode STDOUT, ':utf8';
print "préféré\n";
open my $fh, '<:encoding(UTF-8)', 'text';
while ( <$fh> ) {
print $_;
}
close $fh;
Now everything is UTF-8! You also can activate it on STDERR. Remember that if you want to print binary data on STDOUT (for whatever reason) you must disable the Layer.
binmode STDOUT, ':raw';

Trouble with IPC::Open3

I am writing a simple script using IPC::Open3. The script produces no output to either stdout or stderr, while I would expect output to both.
The complete source code:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use IPC::Open3;
pipe my $input, my $output or die $!;
my $pid = open3(\*STDIN, $output, \*STDERR, 'dd', 'if=/usr/include/unistd.h') or die $!;
while(<$input>) {
print $_."\n";
}
waitpid $pid, 0;
I am fairly certain that I am using IPC::Open3 incorrectly. However, I am still confused as to what I should be doing.
It's the pipe. Without knowing why it's there I can't say more. This works fine.
my $reader;
my $pid = open3(\*STDIN, $reader, \*STDERR, 'dd', 'if=/usr/include/unistd.h') or die $!;
while(<$reader>) {
print $_."\n";
}
waitpid $pid, 0;
I realize it's probably just an example, but in case it's not... this is complete overkill for what you're doing. You can accomplish that with backticks.
print `dd if=/usr/include/unistd.h`
IPC::Open3 is a bit overcomplicated. There are better modules such as IPC::Run and IPC::Run3.
use strict;
use warnings;
use IPC::Run3;
run3(['perl', '-e', 'warn "Warning!"; print "Normal\n";'],
\*STDIN, \*STDOUT, \*STDERR
);
Your program suffers from the following problems:
\*STDIN (open STDIN as a pipe tied to the child's STDIN) should be <&STDIN (use the parent's STDIN as the child's STDIN).
\*STDERR (open STDERR as a pipe tied to the child's STDERR) should be >&STDERR (use the parent's STDERR as the child's STDERR).
The value you place in $output is being ignored and overwritten. Fortunately, it's being overwritten with a correct value!
You use print $_."\n";, but $_ is already newline-terminated. Either chomp first, or don't add a newline.
open3 isn't a system call, so it doesn't set $!.
open3 doesn't return false on error; it throws an exception.
So we get something like:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw( say );
use IPC::Open3;
my $pid = open3('<&STDIN', my $output, '>&STDERR',
'dd', 'if=/usr/include/unistd.h');
while (<$output>) {
chomp;
say "<$_>";
}
waitpid($pid, 0);

Should I pop after each binmode?

When using binmode, should I pop the layers from a possibly previous used binmode?
#!/usr/bin/env perl
use warnings;
use 5.012;
use autodie;
open my $tty, '>:encoding(utf8)', '/dev/tty'; # ...
close $tty;
open $tty, '>:encoding(latin1)', '/dev/tty'; # ...
close $tty;
open $tty, '>:encoding(utf8)', '/dev/tty'; # ...
close $tty;
open $tty, '>:encoding(latin1)', '/dev/tty'; # ...
close $tty;
open $tty, '>:bytes', '/dev/tty';
say "#{[ PerlIO::get_layers( $tty ) ]}"; # unix perlio
close $tty;
say "----------------------------------------";
binmode STDOUT, ':encoding(utf8)'; # ...
binmode STDOUT, ':encoding(latin1)'; # ...
binmode STDOUT, ':encoding(utf8)'; # ...
binmode STDOUT, ':encoding(latin1)'; # ...
binmode STDOUT, ':bytes';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix perlio encoding(utf8) /
# utf8 encoding(iso-8859-1) utf8 encoding(utf8) utf8 encoding(iso-8859-1)
binmode STDOUT, ':pop:pop:pop:pop:bytes';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix perlio
.
#!/usr/bin/env perl
use warnings;
use 5.012;
use autodie;
open my $tty, '>:encoding(utf8)', '/dev/tty'; # ...
close $tty;
open $tty, '>:raw', '/dev/tty';
say "#{[ PerlIO::get_layers( $tty ) ]}"; # unix
close $tty;
say "----------------------------------------";
binmode STDOUT, ':encoding(utf8)'; # ...
binmode STDOUT, ':raw';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix perlio
binmode STDOUT, ':pop:raw';
say "#{[ PerlIO::get_layers( *STDOUT ) ]}"; # unix
:pop is required to pop real layers, such as :encoding(...). So yes, if you want to replace a real layer by another one, then you'll have to :pop.
But note that pushing :raw actually results in a series of pop... And :perlio automatically inserts :unix underneath. So the exact number of pops really depends on the current layers.
As the documentation says itself:
A more elegant (and safer) interface is needed.

How to know whether there are pipelines input to a perl program

I can use <> to loop there the pipeline input to a perl program. However how can I decide whether there are pipelined input, if there is no pipelined input I will use environment variable to load a file. I am trying to use:
my #lines = (<>);
if ($#lines == -1) {
use setenv;
open FILE, "$ENV{'ART_FILE_LIST'}" or die $!;
#lines = <FILE>;
}
Obviously it doesn't work, because the program will waiting at the first line
use 5.010_000;
use utf8;
use strict;
use autodie;
use warnings qw< FATAL all >;
use open qw< :std :utf8 >;
END {
close(STDOUT)
|| die "can't close stdout: $!";
}
if (#ARGV == 0 && -t STDIN) {
# NB: This is magic open, so the envariable
# could hold a pipe, like 'cat -n /some/file |'
#ARGV = $ENV{ART_FILE_LIST}
|| die q(need $ART_FILE_LIST envariable set);
}
while (<>) {
# blah blah blah
}
You can use the -t operator to see if you are a terminal, i.e., not a pipeline:
if (-t STDIN) { print "Terminal\n" }
else { print "Not a terminal\n" }
Use Getopt::Long
perl -Mylib -e 'Mylib::do_stuff' --i_am_pipe_lined
One of the things about UNIX pipelines is that they achieve their usefulness by not caring what's before them or after them. They just have a job to do and they do it. They do one thing, simply, but they all have switches to do their simple job with a little more customization.

How can I fake STDIN in Perl?

I am unit testing a component that requires user input. How do I tell Test::More to use some input that I predefined so that I don't need to enter it manually?
This is what I have now:
use strict;
use warnings;
use Test::More;
use TestClass;
*STDIN = "1\n";
foreach my $file (#files)
{
#this constructor asks for user input if it cannot find the file (1 is ignore);
my $test = TestClass->new( file=> #files );
isa_ok( $test, 'TestClass');
}
done_testing;
This code does press enter but the function is retrieving 0 not 1;
If the program reads from STDIN, then just set STDIN to be the open filehandle you want it to be:
#!perl
use strict;
use warnings;
use Test::More;
*STDIN = *DATA;
my #a = <STDIN>;
is_deeply \#a, ["foo\n", "bar\n", "baz\n"], "can read from the DATA section";
my $fakefile = "1\n2\n3\n";
open my $fh, "<", \$fakefile
or die "could not open fake file: $!";
*STDIN = $fh;
my #b = <STDIN>;
is_deeply \#b, ["1\n", "2\n", "3\n"], "can read from a fake file";
done_testing;
__DATA__;
foo
bar
baz
You may want to read more about typeglobs in perldoc perldata and more about turning strings into fake files in the documentation for open (look for "Since v5.8.0, perl has built using PerlIO by default.") in perldoc perlfunc.
The following minimal script seems to work:
#!/usr/bin/perl
package TestClass;
use strict;
use warnings;
sub new {
my $class = shift;
return unless <STDIN> eq "1\n";
bless {} => $class;
}
package main;
use strict;
use warnings;
use Test::More tests => 1;
{
open my $stdin, '<', \ "1\n"
or die "Cannot open STDIN to read from string: $!";
local *STDIN = $stdin;
my $test = TestClass->new;
isa_ok( $test, 'TestClass');
}
Output:
C:\Temp> t
1..1
ok 1 - The object isa TestClass