Perl substitute all Numbers to Alphabet
abc4xyz5u
to
abcdxyzeu
I tried this, but it does not work:
echo 'abc4xyz5u' | perl -pe'@n=1..9;@a=a..j;@h{@n}=@a;s#$n[$_]#$h{$&}#g for 0..$#n'
I know y/[1-9]/[a-j]/, but I want to use a substitute.
Your issue is within
s#$n[$_]#$h{$&}#g for 0..$#n
You expect $_ to be your input (so that s### is applied on it), but also $n[$_] to use the $_ from the for loop (0 to $#n). If you were to add a print, you'd notice that $_'s value within this loop is 0 to $#n, rather than your input.
What you could do instead to fix it is something like:
$r=$_; $r=~s#$n[$_]#$h{$&}#g for 0..$#n; $_=$r
But that's much more complicated than it has to be. I would instead do:
s#([1-9])#$h{$1}#g
Or, without using %h (since, let's face it, a hash with 0 => a, 1 => b etc. should be an array):
perl -pe '@a="a".."j"; s#([1-9])#$a[$1-1]#g'
Or, without requiring an array at all (I'll let you decide if you find it easier or harder to read; personally I'm fine with it),
perl -pe 's/([1-9])/chr(ord("a")+$1-1)/ge'
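For comparison, here is a minimal sketch of the hash lookup and the chr/ord variants from this answer, written as a small script instead of a one-liner. The helper names digits_via_hash and digits_via_chr and the %letter_for table are made up for this illustration; they are not part of the original one-liners.

```perl
use strict;
use warnings;

# Lookup table built with a hash slice: 1 => a, 2 => b, ..., 9 => i.
my %letter_for;
@letter_for{1 .. 9} = ('a' .. 'i');

# Hypothetical helper: replace each digit 1-9 through the hash lookup.
sub digits_via_hash {
    my ($text) = @_;
    $text =~ s/([1-9])/$letter_for{$1}/g;
    return $text;
}

# Hypothetical helper: compute the letter from the digit, no table needed.
# The /e flag makes Perl evaluate the replacement as code.
sub digits_via_chr {
    my ($text) = @_;
    $text =~ s/([1-9])/chr(ord('a') + $1 - 1)/ge;
    return $text;
}

print digits_via_hash('abc4xyz5u'), "\n";   # abcdxyzeu
print digits_via_chr('abc4xyz5u'),  "\n";   # abcdxyzeu
```

Both helpers capture the digit in $1, so there is no clash between the input held in $_ and a loop variable.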
I would suggest writing it properly as a Perl script.
The one-liner you mentioned is a little hard to understand.
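use strict;
use warnings;
my @alphabets = ("a".."z");
my $input = $ARGV[0];
# Replace each digit 1-9 with the letter at that position (arrays are 0-indexed, hence the -1).
# [1-9] rather than \d, so that a 0 is left alone instead of becoming $alphabets[-1], i.e. "z".
$input =~ s/([1-9])/$alphabets[$1 - 1]/g;
print "$input\n";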
Run -
perl substitute.pl abc4xyz5u
Output -
abcdxyzeu
I am searching for each digit in the string and replacing it with the letter at the same position in the 'alphabets' array (remember that arrays start at index 0, hence 'position - 1').
Related
There is a file which contains more than 20 million records. I need to use perl to aggregate the numbers and print the TOTAL on the last line. The numbers that I am supposed to aggregate are very big numbers and they could be positive or negative. I am using bignum module of perl to aggregate the numbers. However, it is not showing the correct results. Please advise.
sample.txt (in reality, this file contains more than 20 million records):
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
Expected output (my perl one liner is showing incorrect TOTAL on the last line):
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
TOTAL=<<total_should_be_printed>>
The perl one liner I am using:
perl -Mbignum -ne 'BEGIN{my $sum=0;} s/\r?\n$//; $sum=$sum+$_; print "$_\n"; END{print "TOTAL=$sum"."\n";}' sample.txt
The perl one-liner is showing the TOTAL as 40648913273951600.00 and this is INCORRECT.
EDIT: Following one-liner is showing 40648913273951631.2469 as answer. Now it is really getting weird......
perl -Mbignum -e 'my $num1=Math::BigFloat->new("12345678910111213.00"); my $num2=Math::BigFloat->new("14151617181920212.12345"); my $num3=Math::BigFloat->new("23242526272829301.54321"); my $num4=Math::BigFloat->new("32333435363738394.23456"); my $num5=Math::BigFloat->new("-41424344454647489.65432"); my $sum=$num1+$num2+$num3+$num4+$num5; print $sum."\n";'
Please verify calculation based on Math::BigFloat module.
use strict;
use warnings;
use feature 'say';
use Math::BigFloat;
my $sum = Math::BigFloat->new(0);
$sum->precision(-5);
while( <DATA> ) {
my $x = Math::BigFloat->new($_);
$sum->badd($x);
say $x;
}
say "\nSUM: $sum";
exit 0;
__DATA__
12345678910111213.00
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
Output
12345678910111213
14151617181920212.12345
23242526272829301.54321
32333435363738394.23456
-41424344454647489.65432
SUM: 40648913273951631.24690
The main job of the bignum pragma is to turn literal numbers into Math::BigInt objects. Once assigned to a variable, that variable will also be an object, and any arithmetic operations carried out using it will be done using Math::BigInt operator overloading.
Since you are reading values from a file, they won't automatically be converted into Math::BigInt values. So you need something else to be the object, in this case $sum. By initialising to the literal 0 value as you have done, $sum becomes an object. Unfortunately you declare my $sum within the scope of the BEGIN block. Outside of this scope, $sum refers to a different package variable, which hasn't been initialised into an object.
So you need to declare the variable outside of the BEGIN, or add a literal zero to it to coerce it into an object:
perl -Mbignum -lne' $sum += 0+$_; END {print $sum}'
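The same idea can be sketched without relying on the pragma at all, by turning every line into a Math::BigFloat object explicitly. The sub name sum_lines is hypothetical, and the sample lines are the ones from the question; for the real file the lines would come from a filehandle instead of a list.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: add up numeric strings with arbitrary precision.
sub sum_lines {
    my @lines = @_;                       # copy, so the caller's data is not modified
    my $sum = Math::BigFloat->new(0);     # $sum is an object from the start
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;             # strip a Unix or Windows line ending
        next unless length $line;         # skip empty lines
        $sum += Math::BigFloat->new($line);
    }
    return $sum;
}

my $total = sum_lines(
    "12345678910111213.00\n",
    "14151617181920212.12345\n",
    "23242526272829301.54321\n",
    "32333435363738394.23456\n",
    "-41424344454647489.65432\n",
);
print "TOTAL=$total\n";
```

Because both operands of every addition are objects, no value passes through a native floating-point number on the way to the total.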
I have a file which has a lot of floating point numbers like this:
4.5268e-06 4.5268e-08 4.5678e-01 4.5689e-04...
I need to check if there is at least one number with an exponent of -1. So, I wrote this short snippet with the regex. The regex works because I checked and it does. But what I am getting in the output is all 1s. I know I am missing something very basic. Please help.
#!/usr/local/bin/perl
use strict;
use warnings;
my $i;
my @values;
open(WPR,"test.txt")||die "couldnt open $!";
while(<WPR>)
{
chomp();
push @values,(/\d\.\d\d\d\de+[+-][0][1]/);
}
foreach $i (@values){
print "$i\n";}
close(WPR);
The regular expression match operator m (which you have omitted) returns true if it matches. True in Perl is usually returned as 1. (Note that most stuff is true, though).
If you want to stick with the short syntax, do this:
push @values, $1 if /(\d\.\d\d\d\de+[+-][0][1])/;
If I move the parentheses, it works fine:
push @values,/(\d\.\d\d\d\de+[+-][0][1])/;
If there's going to be more than one match on the line, I'd add a g at the end.
If you have capture groups, and a list context, then match returns a list of capture results.
If you want to take this to its insane conclusion then:
my @values = map { /(\d\.\d\d\d\de+[+-][0][1])/g } <WPR> ;
Yes, you can use <WPR> in a list context too.
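The difference between matching with and without a capture group in list context can be seen in a small sketch. The sample line is shortened from the question, and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

my $line = '4.5268e-06 4.5678e-01';

# No capture group: a successful match in list context yields the list (1).
my @without_group = $line =~ /\d\.\d{4}e-01/;

# With a capture group: list context yields the captured text instead.
my @with_group = $line =~ /(\d\.\d{4}e-01)/;

print "without group: @without_group\n";   # without group: 1
print "with group:    @with_group\n";      # with group:    4.5678e-01
```

This is why the original push collected a 1 for every matching line.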
BTW, while your regex works, it probably isn't exactly what you meant. For example e+ matches one or more es. A little simpler might be:
/\d\.\d{4}e[+-]01/ ;
Which is still going to have other issues like matching x.xxxxe+01 as well.
You could try with this one:
/\d+\.\d+e-01/
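As a rough sketch of how that pattern could answer the original "is there at least one" question, the line can be split on whitespace and filtered with grep. The ^ and $ anchors are an addition for this example, so that each whole token has to match; the sample data is taken from the question.

```perl
use strict;
use warnings;

my $line   = '4.5268e-06 4.5268e-08 4.5678e-01 4.5689e-04';
my @values = split ' ', $line;

# Keep only the tokens whose exponent is exactly -01.
my @hits = grep { /^\d+\.\d+e-01$/ } @values;

print @hits ? "found: @hits\n" : "none\n";   # found: 4.5678e-01
```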
In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (written 10 years ago by people that don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usually remove the quotes so my editor colors them as variables, not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straight forward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI objects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in @INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case the quotes are completely useless. We can even say that they are wrong because this is not idiomatic, as others wrote.
However, quoting a variable may sometimes be necessary: it explicitly triggers stringification of the value of the variable. Stringification may give a different result for some values if those values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
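The other case, a blessed value with overloaded stringification, can be sketched in a similar way. The Celsius class below is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

{
    package Celsius;    # hypothetical class, for illustration only
    use overload '""' => sub { my ($self) = @_; return $self->{degrees} . ' C' };
    sub new {
        my ($class, $degrees) = @_;
        return bless { degrees => $degrees }, $class;
    }
}

my $temp = Celsius->new(21);
my $text = "$temp";    # the quotes call the overloaded "" operator

print ref($temp) ? "reference\n" : "plain string\n";   # reference
print ref($text) ? "reference\n" : "plain string\n";   # plain string
print "$text\n";                                       # 21 C
```

Here $temp is still an object, while $text is an ordinary string holding whatever the overloaded operator returned.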
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code more efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.
A common 'Perlism' is generating a list as something to loop over in this form:
for($str=~/./g) { print "the next character from \"$str\"=$_\n"; }
In this case the global match regex returns a list that is one character in turn from the string $str, and assigns that value to $_
Instead of a regex, split can be used in the same way or 'a'..'z', map, etc.
I am investigating unpack to generate a field by field interpretation of a string. I have always found unpack to be less straightforward to the way my brain works, and I have never really dug that deeply into it.
As a simple case, I want to generate a list that is one character in each element from a string using unpack (yes -- I know I can do it with split(//,$str) and /./g but I really want to see if unpack can be used this way...)
Obviously, I can use a field list for unpack that is unpack("A1" x length($str), $str) but is there some other way that kinda looks like globbing? i.e., can I call unpack(some_format,$str) either in list context or in a loop such that unpack will return the next group of characters in the format group until $str is exhausted?
I have read the Perl 5.12 pack pod, the Perl 5.12 pack tutorial, and the PerlMonks tutorial.
Here is the sample code:
#!/usr/bin/perl
use warnings;
use strict;
my $str=join('',('a'..'z', 'A'..'Z')); #the alphabet...
$str=~s/(.{1,3})/$1 /g; #...in groups of three
print "str=$str\n\n";
for ($str=~/./g) {
print "regex: = $_\n";
}
for(split(//,$str)) {
print "split: \$_=$_\n";
}
for(unpack("A1" x length($str), $str)) {
print "unpack: \$_=$_\n";
}
pack and unpack templates can use parentheses to group things much like regexps can. The group can be followed by a repeat count. * as a repeat count means "repeat until you run out of things to pack/unpack".
for(unpack("(A1)*", $str)) {
print "unpack: \$_=$_\n";
}
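A short sketch of the grouped template with different group sizes, using a plain alphabet string as sample data. One detail worth knowing when choosing the format letter: on unpack, A strips trailing spaces from each field, so a lone space comes back as an empty string, while a returns the data unchanged.

```perl
use strict;
use warnings;

my $str = join '', 'a' .. 'z';

# One character per element, repeated until the string runs out.
my @chars = unpack '(A1)*', $str;

# The same template with a wider field gives groups of three;
# the last group is shorter when the length is not a multiple of three.
my @triples = unpack '(A3)*', $str;

# "A" strips trailing spaces from each field, "a" keeps the data as is.
my @stripped = unpack '(A1)*', 'a b';
my @raw      = unpack '(a1)*', 'a b';

print scalar(@chars), " single characters\n";   # 26 single characters
print "@triples\n";
print join('|', @stripped), "\n";
print join('|', @raw), "\n";
```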
You'd have to run a benchmark to find out which of these is the fastest.
I am just a beginner in Perl and need some help in filtering columns using a Perl script.
I have about 10 columns separated by commas in a file and I need to keep 5 columns in that file and get rid of all the other columns. How do we achieve this?
Thanks a lot for anybody's assistance.
cheers,
Neel
Have a look at Text::CSV (or Text::CSV_XS) to parse CSV files in Perl. It's available on CPAN or you can probably get it through your package manager if you're using Linux or another Unix-like OS. In Ubuntu the package is called libtext-csv-perl.
It can handle cases like fields that are quoted because they contain a comma, something that a simple split command can't handle.
CSV is an ill-defined, complex format (weird issues with quoting, commas, and spaces). Look for a library that can handle the nuances for you and also give you conveniences like indexing by column names.
Of course, if you're just looking to split a text file by commas, look no further than @Pax's solution.
Use split to pull the line apart then output the ones you want (say every second column), create the following xx.pl file:
while(<STDIN>) {
chomp;
@fields = split (",",$_);
print "$fields[1],$fields[3],$fields[5],$fields[7],$fields[9]\n"
}
then execute:
$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10
If you are talking about CSV files in Windows (e.g., generated from Excel), you will need to take care of fields that contain commas themselves but are enclosed by quotation marks.
In this case, a simple split won't work.
Alternatively, you could use Text::ParseWords, which is in the standard library. Add
use Text::ParseWords;
to the top of Pax's example above, and then substitute
my @fields = parse_line(q{,}, 0, $_);
for the split.
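Put together, a minimal sketch of this approach might look as follows. The sub name keep_columns and the sample line are made up for this illustration. Note that parse_line treats single quotes as quoting characters too, so data containing apostrophes may need a real CSV module.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split one CSV line and return the wanted columns (0-based).
sub keep_columns {
    my ($line, @wanted) = @_;
    # Second argument 0: the quotes around a field are removed from the result.
    my @fields = parse_line(q{,}, 0, $line);
    return @fields[@wanted];
}

my @kept = keep_columns('1,"two, with comma",3,4,5', 0, 1, 4);
print join('|', @kept), "\n";   # 1|two, with comma|5
```

The quoted field survives as one column, which is exactly the case where a plain split on commas goes wrong.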
You can use some of Perl's built in runtime options to do this on the command line:
$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}'
1,4
The above will -a(utosplit) using the -F(ield) of a comma. It will then join the fields you are interested in and print them back out (with a line separator). This assumes simple data without nested commas. I was doing this with an unprintable field separator (\x1d) so this wasn't an issue for me.
See http://perldoc.perl.org/perlrun.html#Command-Switches for more details.
I went looking and didn't find a nice CSV-compliant filter program that's flexible enough to be useful for more than just a one-off, so I wrote one. Enjoy.
Basic usage is:
bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Text::CSV;
my $always_quote=0;
my @remove;
if ( ! GetOptions('remove:s'=> \@remove,
'quote-always'=>sub {$always_quote=1;}) ) {
die "$0:invalid option (use --remove [--quote-always])";
}
my @cols2remove;
sub filter(@)
{
my @fields=@_;
my @r;
my $i=0;
for my $c (@cols2remove) {
my $p;
#if ( $i $i ) {
push(@r, splice(@fields, $i));
}
return @r;
}
# create just one of these
my $csvOut=new Text::CSV({always_quote=>$always_quote});
sub printLine(@)
{
my @fields=@_;
my $combined=$csvOut->combine(filter(@fields));
my $str=$csvOut->string();
if ( length($str) ) {
print "$str\n";
}
}
my $csv = Text::CSV->new();
my $od;
open($od, "| cat") || die "output:$!";
while (<>) {
$csv->parse($_);
if ( $. == 1 ) {
my $failures=0;
my @cols=$csv->fields;
for my $rm (@remove) {
for (my $c=0; $c$b} @cols2remove);
}
printLine($csv->fields);
}
exit(0);
In addition to what people here said about processing comma-separated files, I'd like to note that one can extract the even (or odd) array elements using an array slice and/or map:
@myarray[map { $_ * 2 } (0 .. 4)]
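A small sketch of that slice on sample data. The array contents and variable names are made up for this illustration.

```perl
use strict;
use warnings;

my @myarray = (1 .. 10);

# Indices 0, 2, 4, 6, 8: the 1st, 3rd, 5th, ... elements.
my @even_index = @myarray[ map { $_ * 2 } 0 .. 4 ];

# Indices 1, 3, 5, 7, 9: the 2nd, 4th, 6th, ... elements.
my @odd_index = @myarray[ map { $_ * 2 + 1 } 0 .. 4 ];

print "@even_index\n";   # 1 3 5 7 9
print "@odd_index\n";    # 2 4 6 8 10
```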
Hope it helps.
My personal favorite way to do CSV is using the AnyData module. It seems to make things pretty simple, and removing a named column can be done rather easily. Take a look on CPAN.
This answers a much larger question, but seems like a good relevant bit of information.
The unix cut command can do what you want (and a whole lot more). It has been reimplemented in Perl.