How to replace characters in a UTF-8 file in Perl?

I have something like this (it works):
perl -C -MText::Unidecode -n -i -e'print unidecode( $_)' unicode_text.txt
Now I want to do the same in a script:
#!/usr/bin/perl -w -CSA
use utf8;
use Text::Unidecode;
while(<>)
{
print unidecode($_);
}
but it doesn't work.

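You should have got the error message
Too late for "-CSA" option
-C is the switch that makes Perl treat its I/O as UTF-8, but since Perl 5.10.1 a -C switch on the #! line must also be given on the command line (the standard streams are already set up by the time the #! line is read), so the script dies instead.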
Instead, put
use open qw( :std :utf8 );
before the while loop. It makes UTF-8 the default layer for filehandles opened in its lexical scope, and the :std part applies the same layer to STDIN, STDOUT and STDERR, so it does roughly what -CSD does on the command line.
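A quick way to see what the pragma changes is to read back a UTF-8 file under it. The sketch below uses only core modules: it writes the six UTF-8 bytes of "äöü" to a scratch file (a stand-in for unicode_text.txt, not the asker's data) and then opens it without an explicit layer, so the default set by use open applies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use open qw( :std :utf8 );    # UTF-8 on STDIN/STDOUT/STDERR and as the default layer for open() below

# Scratch file holding the six UTF-8 bytes of "äöü" (written raw, no layers).
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp, ':raw';
print {$tmp} "\xC3\xA4\xC3\xB6\xC3\xBC";
close $tmp or die "Cannot close $path: $!";

# No layer given here, so the default from "use open" decodes the input.
open my $fh, '<', $path or die "Cannot open $path: $!";
my $line = <$fh>;
close $fh;

# Decoded input: three characters rather than six bytes.
printf "%d characters\n", length $line;
```

Without the use open line the same script reports 6, which is the byte count; that difference is what lets unidecode see real characters.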

Related

Is there a way to execute a file and one line of program in perl?

I want to run some code before the script itself executes (to redirect STDERR to STDOUT):
perl -e "BEGIN {open STDERR, '>&STDOUT'}" perl.pl
But when -e is given, the file is not executed; it just ends up in @ARGV. I know that $Config{sitelib}/sitecustomize.pl can pre-execute some code, and that the -f option disables it, but that approach is inflexible: in most cases I do not need the extra code, and I don't want to add -f every time.
I cannot use the shell to redirect. I want to set org-babel-perl-command in Emacs Org mode so that STDOUT and STDERR are printed in the same place, instead of STDERR being shown in another window. org-babel-perl-command should be something like perl.
For example, org-babel-python-command can be set to python -i -c "import sys; sys.stderr = sys.stdout".
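perl -e'
   open( STDERR, ">&STDOUT" );
   do( shift( @ARGV ) );
' ./perl.pl
(Error handling needed. Note also that do searches @INC for a relative path that does not start with ./ or ../, and the current directory is no longer in @INC since Perl 5.26, so pass ./perl.pl rather than perl.pl.)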
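One way to add the missing error handling is to move the -e code into a small wrapper script. This is a sketch, not a tested org-babel setup; the file name perl-merged.pl and the helper run_script are hypothetical. do FILE reports compile and run-time errors through $@, so the wrapper rethrows them, and it rewrites bare relative names to ./name so that do does not search @INC.

```perl
#!/usr/bin/perl
# Hypothetical wrapper (e.g. saved as perl-merged.pl), used as
#   perl perl-merged.pl perl.pl [args...]
use strict;
use warnings;
use File::Spec;
use IO::Handle;

# Send STDERR wherever STDOUT goes, and flush both right away so that
# warnings and normal output stay in the order they were produced.
open( STDERR, '>&', \*STDOUT ) or die "Cannot dup STDOUT: $!";
STDOUT->autoflush(1);
STDERR->autoflush(1);

sub run_script {
    my ($file) = @_;

    # do FILE searches @INC for relative names not starting with ./ or ../,
    # and '.' is not in @INC since Perl 5.26, so make the path explicit.
    $file = "./$file"
        unless File::Spec->file_name_is_absolute($file) or $file =~ m{\A\.\.?[/\\]};

    -f $file or die "No such file: $file\n";
    my $rv = do $file;
    die "Error in $file: $@" if $@;    # compile-time or run-time error
    return $rv;
}

if (@ARGV) {
    my $file = shift @ARGV;    # the script sees the remaining arguments in @ARGV
    run_script($file);
}
```

Invoked as perl perl-merged.pl perl.pl, it behaves like the -e version but reports a missing file or a syntax error in perl.pl instead of silently doing nothing.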
For the case in question, the following would suffice:
perl perl.pl 2>&1
Maybe even
./perl.pl 2>&1
You could just make a wrapper for perl. For example:
#!/bin/bash
exec perl "$@" 2>&1
Then make it executable and use it instead of perl in your org-babel-perl-command. Make sure it can be found in your PATH, or use its full path.

UTF8 output from Strawberry Perl

I have a text file test.txt with the UTF-8-encoded content äöü (German umlauts, just as an example; the file is 6 bytes). I also have a Cygwin terminal on a Windows 10 PC with the correct LANG settings:
$ cat test.txt
äöü
I'd like to print the content of this file with a Perl script, but can't get it to work.
open my $fh, '<', 'test.txt';
print <$fh>;
close $fh;
results in
$ perl test.pl
├ñ├Â├╝
I tried all the variations I found at How can I output UTF-8 from Perl?, but none of them solved my problem. What's wrong?
EDIT per request:
$ file test.txt
test.txt: UTF-8 Unicode text, with no line terminators
$ echo $LANG
I also tried setting LANG to de_DE.UTF-8.
EDIT to narrow down the problem: with the Perl 5.32.1 included in Cygwin, it works as expected. It still doesn't work with Strawberry Perl 5.32.1. So it's probably not a Perl problem, nor a Windows problem, nor something with language or encoding settings; it's a Strawberry Perl problem.
If you are in a cmd.exe window or in PowerShell, you can change the codepage to 65001:
chcp 65001
If you do not want to change the code page, find out what chcp (or "cp".Win32::GetConsoleOutputCP()) returns and encode to that encoding:
use Encode;
open my $fh, '<:encoding(UTF-8)', 'test.txt' or die "Cannot open test.txt: $!";
while (<$fh>) {
    print encode('cp850', $_); # needs a recent Encode to support cp850
}
close $fh;
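Instead of hard-coding cp850, the code page can be looked up at run time with Win32::GetConsoleOutputCP() and applied once as an output layer. A sketch, assuming a UTF-8 terminal whenever $^O is not MSWin32; the helper console_encoding is hypothetical, and a literal "äöü" stands in for the contents of test.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( find_encoding );

# Turn the console's active output code page into an Encode name.
# Outside Windows this simply assumes a UTF-8 terminal (an assumption, not detection).
sub console_encoding {
    return 'UTF-8' unless $^O eq 'MSWin32';
    require Win32;
    my $cp = Win32::GetConsoleOutputCP();
    return 'UTF-8' if !$cp or $cp == 65001;    # fall back to UTF-8 if no code page was reported
    return "cp$cp";                            # e.g. 850 -> "cp850"
}

my $enc = console_encoding();
find_encoding($enc) or die "Encode does not know '$enc'\n";

# Every character printed to STDOUT is now encoded into that code page.
binmode( STDOUT, ":encoding($enc)" ) or die "binmode failed: $!";
print "\x{E4}\x{F6}\x{FC}\n";    # "äöü" as Unicode characters
```

Reading test.txt with '<:encoding(UTF-8)' and printing its lines works the same way, since the layer on STDOUT does the encoding for every print.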
If you are in cygwin bash, you can call chcp with system() like so:
use strict;
use warnings;
use Encode;
system("chcp 65001 > NUL");
open my $fh, '<:encoding(UTF-8)', 'test.txt' or die "Cannot open test.txt: $!";
while (<$fh>) {
    print encode('UTF-8', $_); # the console now uses code page 65001 (UTF-8)
}
close $fh;
It seems you are missing the LANG setting
$ export LANG=de_DE.UTF-8
$ echo $LANG
de_DE.UTF-8
$ cat test.txt
äöü
$ perl test.pl
äöü
$ file test.txt
test.txt: UTF-8 Unicode text
$ od -c test.txt
0000000 303 244 303 266 303 274 \n
0000007
$ which perl
/usr/bin/perl
$ "$( cygpath 'C:\progs\sp5302-x64\perl\bin\perl.exe' )" -M5.010 -e'
use Win32;
BEGIN {
Win32::SetConsoleCP(65001);
Win32::SetConsoleOutputCP(65001);
}
use open ":std", ":encoding(UTF-8)";
say chr(0x2660);
'
♠
(BEGIN { `chcp 65001` } would also have done the trick.)
You can explicitly set the encoding of both input and output:
open( my $fh, '<:utf8', 'test.txt');
binmode(STDOUT,':utf8');
print <$fh>;
close $fh;

What is the use -I option in perl

In a Perl program I found this line: #!/usr/bin/perl -I Directory_name
I searched the web for the -I option in Perl but did not find a clear explanation. Can anyone tell me what this option is for?
There is a problem with your question: not that it is easily researchable, but that you specify file_name in:
#!/usr/bin/perl -I file_name
That is incorrect, because -I adds a directory to the search path for modules and you are specifying a file name. The switches below both work, but each does something different:
#!/usr/bin/perl -i.bak
#!/usr/bin/perl -I DIR
There is a difference between -i and -I in perl.
So short answer:
-i[extension]: edits the files read via <> in place. If an extension is given, the original is kept as a backup with that extension. Handy for modifying files without the {copy, delete-original, rename} process.
-I directory: directories specified by -I are prepended to the search path for modules (@INC).
For more detail, see perlrun.
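To see the prepending with your own eyes, the sketch below starts a child perl with -I Foo and prints its first @INC entry. Foo is just the example directory name and does not need to exist; the sketch assumes no PERL5OPT settings that add their own library paths.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start a child perl ($^X is the running interpreter) with "-I Foo"
# and let it print the first entry of its @INC.
open my $child, '-|', $^X, '-I', 'Foo', '-e', 'print $INC[0]'
    or die "Cannot start $^X: $!";
my $first = <$child>;
close $child;

print "First \@INC entry: $first\n";
```

The -I directory comes before both PERL5LIB and the built-in library paths, which is why a module in Foo wins over an installed module of the same name.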
-I adds a directory to the list searched for modules. See perldoc perlrun.
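Here is an example; notice that we write use Bar instead of use Foo::Bar. The directory layout is:
test.pl
Foo/Bar.pm
(-I Foo is resolved relative to the current directory, so run the script from the directory that contains Foo.)
test.pl: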
#!/usr/bin/perl -I Foo
use warnings;
use strict;
use Bar qw( frobnicate ); # Foo/Bar.pm
my $output = frobnicate('hello');
print $output . "\n";
Foo/Bar.pm:
package Bar;
use strict;
use warnings;
require Exporter;
our @ISA       = qw(Exporter);
our @EXPORT_OK = qw(munge frobnicate);    # symbols to export on request
sub munge {
    my ($txt) = @_;
    return join '*', sort split //, $txt;
}
sub frobnicate {
    my ($txt) = @_;
    return join '....', sort split //, $txt;
}
1;
Output
./test.pl
e....h....l....l....o
You can also get a short description of Perl's command-line options with the command below, so there is no need to install perldoc:
perl --help
It prints a one-line summary of each option.

awk usage in perl scripting

Hi, I am writing a script that needs to extract the 6th column of the output using awk, but I am getting other output.
What is the exact syntax in perl to extract 6th column using awk?
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my $filesystem=`df -h |grep -i dev|grep -vE '^Filesystem|proc|none|udev|tmpfs'`;
print "(\"$filesystem\"|awk '{print \$6}')"
Output :
7831c1c4be8c% ./test.pl
("/dev/disk1 112Gi 43Gi 69Gi 39% 11227595 18084674 38% /
devfs 183Ki 183Ki 0Bi 100% 634 0 100% /dev
"|awk '{print $6}')%
I am trying to remove the %. How can it be done?
7831c1c4be8c% cat test.pl
#!/usr/bin/perl
use warnings;
use strict;
open my $FS, q(df -h |) or die $!;
while (<$FS>) {
print +(split)[4], "\n"
if /dev/i and not /devfs/;
}
7831c1c4be8c% ./test.pl
40%
You don't need awk inside Perl.
#!/usr/bin/perl
use warnings;
use strict;
open my $FS, '-|', q(df -h) or die $!;
while (<$FS>) {
print +(split)[5], "\n"
if /dev/i and not /^Filesystem|proc|none|udev|tmpfs/;
}
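For the follow-up question about removing the %, the field can be cleaned with a substitution after splitting. A sketch assuming, as in the edit above, that the Use% value is the fifth field (index 4) of each df -h line; the helper name use_percent is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 5th whitespace-separated field ("Use%" in df -h output)
# with its trailing percent sign removed.
sub use_percent {
    my ($line) = @_;
    my $pct = ( split ' ', $line )[4];
    return unless defined $pct;
    $pct =~ s/%\z//;    # "39%" -> "39"
    return $pct;
}

open my $FS, '-|', 'df -h' or die "Cannot run df: $!";
while ( my $line = <$FS> ) {
    next unless $line =~ /dev/i and $line !~ /devfs/;    # same filter as in the edit above
    my $pct = use_percent($line);
    print "$pct\n" if defined $pct;
}
close $FS;
```

Applied to the line "/dev/disk1 112Gi 43Gi 69Gi 39% ..." from the question's output, use_percent returns 39.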
As the previous answer says, you don't need to call awk or grep from Perl. However, one reason your code isn't working is that you never actually run awk: print only prints the string, it does not execute it as a command. You would have to use system() (or backticks) to execute it.
Anyway, FWIW, you can also do what you want in a one-liner like so:
df -h | perl -lnae 'next if $F[0] =~ /regex/; print $F[5]'

How to delete first line of file in perl script

How can I remove the first line of a text file in a Perl script?
`sed "1d" filename.txt`
doesn't work.
You can use Tie::File:
use Tie::File;
tie my @array, 'Tie::File', $filename or die "Cannot tie $filename: $!";
shift @array;    # removes the first line from the file itself
untie @array;
`sed 1d filename.txt > newfile.txt`
should work. If you don't redirect it to a file, it will just write the whole file minus the first line to stdout.
That sends the output to sed's STDOUT (which Perl then captures into a variable) instead of writing it back to filename.txt. You want sed's -i option:
sed -i "1d" filename.txt
Since there's no output to capture, it makes no sense to use backticks. You want system.
system('sed -i "1d" filename.txt');
Better: (Avoids launching another shell)
system('sed', '-i', '1d', 'filename.txt');
Best: (Does error checking for you)
use IPC::System::Simple qw( systemx );
systemx('sed', '-i', '1d', 'filename.txt');
Because you're doing this with an inline sed, here is a Perl equivalent:
perl -ne'$.==1?next:print' <(seq 1 10)
Where the options mean,
-n assume "while (<>) { ... }" loop around program
-e program one line of program (several -e's allowed, omit programfile)
Other notes,
$. is the variable for the current line number.
<() is Bash voodoo for generating a FIFO in the background.
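Inside a Perl script, the same edit can be done without launching sed at all by using in-place editing through $^I, the variable behind -i. A sketch; the helper delete_first_line is hypothetical, and on Windows an empty backup extension may not be accepted by older perls, so '.bak' is the safer choice there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delete the first line of each given file in place without calling sed;
# the script form of: perl -i -ne 'print unless $. == 1' file
sub delete_first_line {
    my @files = @_;
    local $^I   = '';        # in-place edit, no backup (use e.g. '.bak' to keep one)
    local @ARGV = @files;    # <> reads these files; print writes their new contents
    local $_;
    while (<>) {
        print unless $. == 1;
        close ARGV if eof;   # reset $. so each file loses its own first line
    }
    return;
}

delete_first_line(@ARGV) if @ARGV;
```

Called as delete_first_line('filename.txt'), it rewrites filename.txt without its first line, which is what sed -i "1d" does.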