UTF8 output from Strawberry Perl - perl

I have a text file test.txt with the UTF8 encoded content äöü (these are German umlauts and just an example, the file size is 6 Bytes). I also have a Cygwin terminal on a Windows 10 PC with the correct LANG settings:
$ cat test.txt
äöü
I'd like to print the content of this file with a Perl script, but can't get it to work.
open my $fh, '<', 'test.txt';
print <$fh>;
close $fh;
results in
$ perl test.pl
├ñ├Â├╝
I tried all variations I found at How can I output UTF-8 from Perl? - none of them solved my problem. What's wrong?
EDIT per request:
$ file test.txt
test.txt: UTF-8 Unicode text, with no line terminators
$ echo $LANG
I also tried setting LANG to de_DE.UTF-8.
EDIT to narrow down the problem: If I try this with the Perl version 5.32.1 included in Cygwin, it works as expected. It still doesn't work in Strawberry Perl version 5.32.1. So it's probably neither a general Perl problem nor a Windows problem nor something to do with language or encoding settings; it's a Strawberry Perl problem.

If you are in a cmd.exe window or in PowerShell, you can change the codepage to 65001:
chcp 65001
If you do not want to change the codepage, find out what chcp (or "cp".Win32::GetConsoleOutputCP()) returns and encode your output to that encoding.
use Encode;
open my $fh, '<:utf8','test.txt';
while(<$fh>){
print encode('cp850',$_); # needs a recent Encode to support cp850
};
close $fh;
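To see why the output comes out as ├ñ├Â├╝, and what encoding to the console's code page changes, here is a minimal sketch. It assumes the console is on code page 850 (as the cp850 example above does); encoding_for_codepage is a hypothetical helper for illustration, not part of Encode or Win32.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The three umlauts as Perl characters (code points, not bytes).
my $text = "\x{E4}\x{F6}\x{FC}";

# What test.txt holds on disk: 6 bytes of UTF-8.
my $utf8_bytes = encode('UTF-8', $text);

# How a cp850 console interprets those 6 bytes if they are printed unchanged.
my $mojibake = decode('cp850', $utf8_bytes);

# Hypothetical helper: turn a code page number (e.g. from chcp) into an
# Encode name. Assumes Encode knows the code page under the name "cpNNN".
sub encoding_for_codepage {
    my ($cp) = @_;
    return $cp == 65001 ? 'UTF-8' : "cp$cp";
}

# The bytes a cp850 console needs in order to display the umlauts.
my $cp850_bytes = encode(encoding_for_codepage(850), $text);

print 'UTF-8 bytes:   ', unpack('H*', $utf8_bytes), "\n";
print 'seen as cp850: ',
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $mojibake), "\n";
print 'cp850 bytes:   ', unpack('H*', $cp850_bytes), "\n";
```

The code points on the second line are those of the characters ├ñ├Â├╝ from the question, which suggests the script itself is fine and only the console's interpretation of the bytes differs.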
If you are in cygwin bash, you can call chcp with system() like so:
use strict;
use warnings;
use Encode;
system("chcp 65001 > NUL");
open my $fh, '<:utf8','test.txt';
while(<$fh>){
print encode('UTF-8',$_); # the console is now on code page 65001 (UTF-8)
};
close $fh;

It seems you are missing the LANG setting
$ export LANG=de_DE.UTF-8
$ echo $LANG
de_DE.UTF-8
$ cat test.txt
äöü
$ perl test.pl
äöü
$ file test.txt
test.txt: UTF-8 Unicode text
$ od -c test.txt
0000000 303 244 303 266 303 274 \n
0000007
$ which perl
/usr/bin/perl

$ "$( cygpath 'C:\progs\sp5302-x64\perl\bin\perl.exe' )" -M5.010 -e'
use Win32;
BEGIN {
Win32::SetConsoleCP(65001);
Win32::SetConsoleOutputCP(65001);
}
use open ":std", ":encoding(UTF-8)";
say chr(0x2660);
'
♠
(BEGIN { `chcp 65001` } would also have done the trick.)

You may explicitly define encoding of input and output.
open( my $fh, '<:utf8', 'test.txt');
binmode(STDOUT,':utf8');
print <$fh>;
close $fh;

Related

awk usage in perl scripting

Hi, I am writing a script which needs to extract the 6th column of the output using the awk command, but I am getting other output.
What is the exact syntax in Perl to extract the 6th column using awk?
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my $filesystem=`df -h |grep -i dev|grep -vE '^Filesystem|proc|none|udev|tmpfs'`;
print "(\"$filesystem\"|awk '{print \$6}')"
Output :
7831c1c4be8c% ./test.pl
("/dev/disk1 112Gi 43Gi 69Gi 39% 11227595 18084674 38% /
devfs 183Ki 183Ki 0Bi 100% 634 0 100% /dev
"|awk '{print $6}')%
I am trying to remove the %. How can it be done?
7831c1c4be8c% cat test.pl
#!/usr/bin/perl
use warnings;
use strict;
open my $FS, q(df -h |) or die $!;
while (<$FS>) {
print +(split)[4], "\n"
if /dev/i and not /devfs/;
}
7831c1c4be8c% ./test.pl
40%
You don't need awk inside Perl.
#!/usr/bin/perl
use warnings;
use strict;
open my $FS, '-|', q(df -h) or die $!;
while (<$FS>) {
print +(split)[5], "\n"
if /dev/i and not /^Filesystem|proc|none|udev|tmpfs/;
}
As the previous answer says, you don't need awk or grep system calls in Perl. However, one reason your code isn't working is that you never actually ran awk: print only prints the string, it does not execute it. You would have to use system() (or backticks) to run the command.
Anyway, FWIW, you can also do what you want in a one-liner like so:
df -h | perl -lnae 'next if $F[0] =~ /regex/; print $F[5]'
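The same column extraction can be sketched in pure Perl on text that has already been captured (for example with backticks), with no awk involved. dev_column is a hypothetical helper name, and the sample text is copied from the output shown in the question; column numbering is 0-based, so the 6th column is index 5.

```perl
use strict;
use warnings;

# Hypothetical helper: return the given (0-based) column of every line in
# $text that mentions "dev", skipping devfs lines.
sub dev_column {
    my ($text, $index) = @_;
    my @values;
    for my $line (split /\n/, $text) {
        next unless $line =~ /dev/i;
        next if $line =~ /devfs/;
        my @fields = split ' ', $line;    # split on runs of whitespace
        push @values, $fields[$index] if defined $fields[$index];
    }
    return @values;
}

# Sample text copied from the question's output.
my $sample = <<'END';
/dev/disk1 112Gi 43Gi 69Gi 39% 11227595 18084674 38% /
devfs 183Ki 183Ki 0Bi 100% 634 0 100% /dev
END

print "$_\n" for dev_column($sample, 5);
```

In a real script the sample would presumably be replaced by something like my $text = `df -h`; which columns df prints varies between systems, so the index may need adjusting.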

How to replace characters in a file utf8 in perl?

I have something like this (it works):
perl -C -MText::Unidecode -n -i -e'print unidecode( $_)' unicode_text.txt
and now i want to do the same in the script:
#!/usr/bin/perl -w -CSA
use utf8;
use Text::Unidecode;
while(<>)
{
print unidecode($_);
}
but it doesn't work.
You should have got the error message
Too late for "-CSA" option
The -C option is what makes the program read the input file as UTF-8-encoded, but by the time perl parses the #! line it is too late to apply it.
Instead you need to put
use open qw( :std :utf8 );
before the while loop, which does the same as -CS on the command line, i.e. it sets the STDIN, STDOUT and STDERR handles to UTF-8 encoding.
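What a UTF-8 layer changes can be sketched with an in-memory file, so nothing here depends on Text::Unidecode or on real files. first_line_length is a hypothetical helper; the bytes are the UTF-8 encoding of äöü followed by a newline.

```perl
use strict;
use warnings;

# UTF-8 encoded "äöü" plus a newline: 7 bytes in total.
my $bytes = "\xC3\xA4\xC3\xB6\xC3\xBC\n";

# Hypothetical helper: read the first line through the given PerlIO layer
# and report how many characters Perl sees in it.
sub first_line_length {
    my ($layer) = @_;
    open my $fh, "<$layer", \$bytes
        or die "Cannot open in-memory file: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return length $line;
}

print 'without decoding: ', first_line_length(':raw'), "\n";
print 'with UTF-8 layer: ', first_line_length(':encoding(UTF-8)'), "\n";
```

Without a decoding layer Perl sees six separate bytes; with the layer it sees three characters, which is the form a function such as unidecode needs to work on.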

Use sed to read file into middle of line

File A contains this text (assume that "alpha" and "bravo" are arbitrarily long chunks on a single line):
alpha {FOO} bravo
File B contains an arbitrary amount of text, including all sorts of wacky characters.
I want to replace the string "{FOO}" in file A with the contents of file B. Using sed's 'r' command as follows doesn't work because it inserts the content of file B after that line:
cat A | sed -e "/{FOO}/r B"
Is there any way, using sed, to end up with a file that consists of:
alpha [the contents of B] bravo
? If it would be easier to do this with, say, perl, that's fine too. But I know even less about perl than I do about sed. ;)
Short perl solution:
FOO="$( cat replacement.txt )" perl -pe's/\{FOO\}/$ENV{FOO}/g'
This will work with any character except 00 NUL. If you have to deal with binary files, you can use:
perl -pe'
BEGIN {
open(my $fh, "<", shift(@ARGV)) or die $!;
local $/;
$FOO = <$fh>;
}
s/\{FOO\}/$FOO/g
' replacement.txt
Usage:
perl -i~ -pe'...' file # In-place edit with backup
perl -i -pe'...' file # In-place edit without backup
perl -pe'...' file.in >file.out # Read from named file(s)
perl -pe'...' <file.in >file.out # Read from STDIN
If this is Bash on Linux, this seems to work:
sed -i "s/{FOO}/$(cat B.txt)/g" A.txt
This will directly edit the file A.txt - they don't have to be .txt files, I just added those in to make it more obvious.
As @ikegami points out, this will have problems if there are any / in the file - also any \ will probably be ignored.
So in an attempt to solve that, you should be able to use:
sed -i "s/\//%2F/g" B.txt
sed -i "s/{FOO}/$(cat B.txt)/g" A.txt
sed -i "s/%2F/\//g" A.txt B.txt
The placeholder does not have to be %2F; any string that does not otherwise occur in either file will do.
#!/usr/bin/perl
use strict;
use warnings;
open my $fh_a, '<', $ARGV[0] or die "Failed to open $ARGV[0] for reading";
open my $fh_b, '<', $ARGV[1] or die "Failed to open $ARGV[1] for reading";
my $a;
my $b;
{
local $/;
$a = <$fh_a>;
$b = <$fh_b>;
}
close $fh_a;
close $fh_b;
$a =~ s/{FOO}/$b/;
print $a;
As long as your files both fit in memory twice, this should be fine. The local $/; puts the I/O system into 'slurp' mode, reading the whole file in a single operation.
Usage:
perl replace_foo.pl fileA fileB
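The reason the Perl versions cope with "wacky characters" is that the replacement is inserted by value and not re-read as a pattern or as sed syntax. A minimal sketch, with fill_placeholder as a hypothetical helper name and made-up sample strings:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal {FOO} in $template with
# $replacement. The replacement is inserted as-is, so characters such as
# / \ & and $1 in it are not treated specially.
sub fill_placeholder {
    my ($template, $replacement) = @_;
    (my $result = $template) =~ s/\{FOO\}/$replacement/g;
    return $result;
}

my $template    = "alpha {FOO} bravo\n";
my $replacement = 'a/b\\c & $1';    # single quotes: no interpolation here

print fill_placeholder($template, $replacement);
```

With sed the same replacement text would need escaping first, because sed parses / \ and & in the replacement part of s///.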

Perl backticks: flags do not exist

When executing the following segment of code,
sub list {
my($self)=@_;
my $file = $self->{P_Dir}."/".$self->{Name};
print `ls –l $file`;
}
I get this error:
ls: cannot access –l: No such file or directory
I am not really sure what is causing that, since if I manually type ls -l into the command line, I do not see that error.
That – that you've thankfully copy & pasted is a Unicode en dash character (U+2013) and not the ASCII hyphen character - (U+002D).
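A look-alike character like this is easy to miss by eye. As a rough sketch, a command string can be scanned for anything outside ASCII before it is run; non_ascii_chars is a hypothetical helper, and the en dash is written as an escape so the example does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: list the non-ASCII characters in a string as U+XXXX.
sub non_ascii_chars {
    my ($string) = @_;
    return map { sprintf 'U+%04X', ord($_) } $string =~ /([^\x00-\x7F])/g;
}

# "\x{2013}" is the en dash from the failing command, written as an escape.
my $bad  = "ls \x{2013}l file";
my $good = "ls -l file";

print "bad:  @{[ non_ascii_chars($bad) ]}\n";
print "good: @{[ non_ascii_chars($good) ]}\n";
```

If the script source is read as bytes (no use utf8), a pasted en dash would probably show up as several byte values instead of a single U+2013, but it would still be flagged.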
Hmmm... It works for me though:
$ cat test.pl
#!/usr/bin/perl -w
use strict;
my $file = "rpm.pl";
print `ls -l $file`;
$ perl test.pl
-rw-r--r-- 1 dheeraj dheeraj 922 2012-10-22 19:56 rpm.pl

Perl - One liner file edit: "perl -n -i.bak -e "print unless /^$id$,/" $filetoopena;" Not working

I cannot get this to work.
#!/usr/bin/perl -w
use strict;
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
my $id='123456';
my $filetoopen = '/home/user/public/somefile.txt';
file contains:
123456
234564
364899
437373
So...
A bunch of other subs and code
if(-s $filetoopen){
perl -n -i.bak -e "print unless /^$id$,/" $filetoopen;
}
I need to remove the line that matches $id from file $filetoopen
But, I don't want script to "crash" if $id is not in $filetoopen either.
This is in a .pl scripts sub, not being run from command line.
I think I am close but, after reading for hours here, I had to resort to posting the question.
Will this even work in a script?
I tried TIE with success but, I need to know alternatively how to do this without TIE::FILE.
When I tried I got the error:
syntax error at mylearningcurve.pl line 456, near "bak -e "
Thanks for teaching this old dog...
First of all (this is not the cause of your problem) $, (aka $OUTPUT_FIELD_SEPARATOR) defaults to undef, I'm not sure why you are using it in the regex. I have a feeling the comma was a typo.
It's unclear if you are calling this from a shell script or from Perl?
If from Perl, you should not call a nested Perl interpreter at all.
If the file is small, slurp it in and print:
use File::Slurp;
my @lines = read_file($filename);
write_file($filename, grep { ! /^$id$/ } @lines);
If the file is large, read line by line as a filter.
use File::Copy;
move($filename, "$filename.old") or die "Can not rename: $!\n";
open(my $fh_old, "<", "$filename.old") or die "Can not open $filename.old: $!\n";
open(my $fh, ">", $filename) or die "Can not open $filename: $!\n";
while (my $line = <$fh_old>) {
next if $line =~ /^$id$/; # skip the line that matches $id
print $fh $line;
}
close($fh_old);
close($fh);
If from a shell script, this worked for me:
$ cat x1
123456
234564
364899
437373
$ perl -n -i.bak -e "print unless /^$id$/" x1
$ cat x1
234564
364899
437373
if(-s $filetoopen){
perl -n -i.bak -e "print unless /^$id$,/" $filetoopen;
}
I'm not at all sure what you expect this to do. You can't just put a command line program in the middle of Perl code. You need to use system to call an external program. And Perl is just an external program like any other.
if(-s $filetoopen){
# Pass each switch as its own list element; double quotes let $id interpolate.
system('perl', '-n', '-i.bak', '-e', "print unless /^$id\$/", $filetoopen);
}
The functionality of the -i command line argument can be accessed via $^I.
local @ARGV = $filetoopen;
local $^I = '.bak';
local $_;
while (<>) {
print if !/^$id$/;
}
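Wrapped in a sub, the $^I approach might look like the sketch below. remove_id_line is a hypothetical name; \Q...\E is added on the assumption that $id should be matched literally and not as a regex. A file that does not contain $id is simply rewritten unchanged, so nothing crashes in that case.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every line equal to $id from $file,
# keeping the original contents as "$file.bak".
sub remove_id_line {
    my ($file, $id) = @_;
    local @ARGV = ($file);    # <> reads from the files listed in @ARGV
    local $^I   = '.bak';     # same effect as -i.bak on the command line
    local $_;
    while (<>) {
        # While $^I is set, print writes to the new version of the file.
        print unless /^\Q$id\E$/;
    }
    return;
}

# Example call (paths as in the question):
# remove_id_line('/home/user/public/somefile.txt', '123456');
```

Because @ARGV and $^I are localized, the rest of the script keeps its own arguments and ordinary print behaviour after the sub returns.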