Is there an analogue of Ruby gsub method in Perl? [duplicate] - perl

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How do I perform a Perl substitution on a string while keeping the original?
How do I do one line replacements in Perl without modifying the string itself? I also want it to be usable inside expressions, much like I can do p s.gsub(/from/, 'to') in Ruby.
All I can think of is
do {my $r = $s; $r =~ s/from/to/; $r}
but sure there is a better way?

Starting on the day you feel comfortable writing use 5.14.0 at the top of all of your programs, you can use the s/foo/bar/r variant of the s/// operator, which returns the changed string instead of modifying the original in place (added in perl 5.13.2).

The solution you found with do is not bad, but you can shorten it a little:
do {(my $r = $s) =~ s/from/to/; $r}
It still reveals the mechanics though. You can hide the implementation, and also apply substitutions to lists by writing a subroutine. In most implementations, this function is called apply which you could import from List::Gen or List::MoreUtils or a number of other modules. Or since it is so short, just write it yourself:
sub apply (&#) { # takes code block `&` and list `#`
my ($sub, #ret) = #_; # shallow copy of argument list
$sub->() for #ret; # apply code to each copy
wantarray ? #ret : pop #ret # list in list context, last elem in scalar
}
apply creates a shallow copy of the argument list, and then calls its code block, which is expected to modify $_. The block's return value is not used. apply behaves like the comma , operator. In list context, it returns the list. In scalar context, it returns the last item in the list.
To use it:
my $new = apply {s/foo/bar/} $old;
my #new = apply {s/foo/bar/} qw( foot fool fooz );

From Perl's docs: Regexp-like operators:
($foo = $bar) =~ s/this/that/g; # copy first, then change would match gsub, while
$bar =~ s/this/that/g; # change would match gsub!

Related

Perl trim spaces from scalar or array

This is a very basic Perl question but I just want to make sure the actual good practice to it.
Consider I have built a function to trim spaces from strings and I will pass to it either single scalar as string or array of strings, I have this basic working example:
sub trim_spaces {
my (#out) = #_;
for (#out) {
s/\s+//g;
}
return (scalar #out >1)? #out : $out[0];
}
this works in the following calls:
trim_spaces(" These Spaces Are All Removed");
and
#str = (" Str Number 1 ", " Str Number 2 ", " Str Number 3 ");
trim_spaces(#str);
What I am trying to do and understand is the shortest version of this function like this:
sub trim_spaces {
s/\s+//g for (#_);
return #_;
}
This works only if I pass an array:
trim_spaces(#str);
but it does not work if I pass a scalar string:
trim_spaces(" These Spaces Are All Removed");
I understand it should be converted from scalar ref to array, how this can be done in the short version.
Trying to understand the best practices of Perl.
The strict best practice answer to this is to always unpack the contents of #_ into lexical variables, first thing. Perl Best Practices provides the following (paraphrased) arguments:
It's not self-documenting to directly access #_. $_[0], $_[1], and so on tell you nothing about what these parameters are for.
The aliasing behavior of #_ is easily forgotten and can be a source of hard-to-find bugs in a program. Whenever possible, avoid spooky action at a distance.
You can verify each argument while unpacking the #_ array.
And one argument not from PBP:
Seeing my $self = shift; at the beginning of a subroutine clearly marks it as an OO method instead of an ordinary sub.
Sources: Perl Best Practices (Conway 2005), Perl::Critic's relevant policy from PBP.
The elements in #_ are aliases to the original values, which means modifying them inside the subroutine will change them outside as well. The array you're returning is ignored in your examples.
If you store the string in a variable this would work:
my $string = ' These Spaces Are Removed ';
trim_spaces($string); # $string is now 'TheseSpacesAreRemoved'
Or you could use non-destructive substitution and assign the results created by this:
sub trim_spaces { return map { s/\s+//gr } #_ }
my #trimmed = trim_spaces('string one', ' string two');
my ($trimmed_scalar) = trim_spaces('string three');
map will create a list of the values returned by the substitution with the /r flag. The parens around $trimmed_scalar are necessary; see the last example for a version where it isn't.
Alternatively, you could copy the parameters inside the subroutine into lexical variables to avoid action at a distance, which is generally better practice than directly modifying the elements of #_:
sub trim_spaces
{
my #strings = #_;
s/\s+//g for #strings;
return #strings;
}
Personally, I find it nicer when the subroutine returns a value without side effects, and the /r flag saves me the trouble of thinking of a better name for a lexical copy. We can use wantarray to make it smarter in regards to the calling context:
sub trim_spaces
{
return if not defined wantarray;
return map { s/\s+//gr } #_ if wantarray;
return shift =~ s/\s+//gr;
}
On a side note, trim_spaces would be better named remove_whitespace or something similar. Trimming usually means to remove leading and trailing whitespace, and the \s character class matches tabs, newlines, form feeds, and carriage returns in addition to spaces. Use tr/ //dcr to remove just spaces instead if that's what you wanted.

How do you override substitution operations?

I'm playing around with Perl and creating a string object. I know that this is a very bad idea to do in the real world. I'm doing it purely for fun.
I'm using overload to overload standard Perl string operators with the standard operators you would find in most other languages.
use strict;
use warnings;
use feature qw(say);
my $obj_string1 = Object::String->new("foo");
my $obj_string2 = Object::String->new("bar");
my $reg_string1 = "foobar";
my $reg_string2 = "barfu";
# Object::String "stringifies" correctly inside quotes
say "$obj_string1 $obj_string2";
# Use "+" for concatenations
say $obj_string1 + $obj_string2; # Works
say $obj_string1 + $reg_string1 + $reg_string2 # Works
say $reg_string1 + $obj_string1 # Still works!
say $reg_string1 + $obj_string1 + $reg_string2; # Still works!
say $reg_string1 + $reg_string2 + $obj_string1; # Does't work, of course.
# Overload math booleans with their string boolean equivalents
my $forty = Object::String(40);
my $one_hundred = "100";
if ( $forty > $one_hundred ) { # Valid
say "$forty is bigger than $one_hundred (in strings!)";
}
if ( $one_hundred < $forty ) { # Also Valid
say "$one_hundred is less than $forty (In strings!)";
}
# Some standard "string" methods
say $forty->length # Prints 5
say $forty->reverse; # Prints "ytrof"
say $forty; # Prints "ytrof"
Now comes the hard part:
my $string = Object::String("I am the best programmer around!");
say $string; # Prints "I am the best programmer around"
say $string->get_value; # Prints "I am the best programmer around" with get_value Method
# But, it's time to speak the truth...
$string =~ s/best programer/biggest liar/;
say $string; # Prints "I am the biggest liar around"
say $string->get_value; # Whoops, no get_value method on scalar strings
As you can see, when I do my substitution, it works correctly, but returns a regular scalar string instead of an Object::String.
I am trying to figure out how to override the substitution operation. I've looked in the Perldoc, and I've gone through various Perl books (Advance Perl Programming, Intermediate Perl Programming, Perl Cookbook, etc.), but haven't found a way to override the substitution operation, so it returns an Object::String.
How do I override the substitution operation?
Unfortunately Perl's overload support isn't very universal in the area of strings. There's many operations that overloading isn't party to; and s/// is one of them.
I have started a module to fix this; overload::substr but as yet it's incomplete. It allows you to overload the substr() function for your object, but so far it doesn't yet have power to apply to m// or s///.
You might however, be able to use lvalue (or 4-argument) substr() on your objects as a way to cheat this; if the objects at least stringify into regular strings that can be matched upon, the substitution can be done using the substr()-rewrite trick.
Turn
$string =~ s/pattern/replacement/;
into
$string =~ m/pattern/ and substr($string, $-[0], $+[0]-$-[0]) = "replacement";
and then you'll have some code which will respect a substr() overload on the $string object, if you use my module above.
At some point of course it would be nice if overload::substr can perform that itself; I just haven't got around to writing it yet.

Perl - How to create commands that users can input in console?

I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
Its rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the #ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of #ARGV as arguments
# and print the return with a trailing newline
print $command->(#ARGV);
print "\n";
sub add {
my ($x, $y) = #_;
return $x + $y;
}
sub subtract {
my ($x, $y) = #_;
return $x - $y;
}
This script (say its named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command line arguments. A space serves as the delimiter, so just do a ./add.pl 2 3 Something like this:
$num1=$ARGV[0];
$num2=$ARGV[1];
print $num1 + $num2;
will print 5
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are seperated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 is a minimum version, it also defines the say builtin (like a print but appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
next if /^\s*\#/; # jump comment lines
# parse the line. We get a destination $var, a $command, and any number of #args
my ($var, $command, #args) = parse($_);
# Execute the anonymous sub specified by $command with the #args
my $value = $functions{ $command }->(#args);
# Store the return value if a destination $var was specified
$env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flags allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion. This is where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in out string to being reset.
sub parse {
my ($line) = #_; # get the $line, which is a argument
my ($var, $command, #args); # declare variables to be filled
# Test if this statement has a variable declaration
if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
$var = $1; # assign first capture if successful
}
# Parse the function of this statement.
if ($line =~ m/\G\s* (\w+) \s*/xgc) {
$command = $1;
# Test if the specified function exists in our %functions
if (not exists $functions{$command}) {
die "The command $command is not known\n";
}
} else {
die "Command required\n"; # Throw fatal exception on parse error.
}
# As long as our matches haven't consumed the whole string...
while (pos($line) < length($line)) {
# Try to match variables
if ($line =~ m/\G \$(\w+) \s*/xgc) {
die "The variable $1 does not exist\n" if not exists $env{$1};
push #args, $env{$1};
}
# Try to match strings
elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
push #args, $1;
}
# Try to match ints or floats
elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
push #args, 0+$1;
}
# Throw error if nothing matched
else {
die "Didn't understand that line\n";
}
}
# return our -- now filled -- vars.
return $var, $command, #args;
}
Perl arrays can be handled like linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Out little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved in my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).

Where is $_ being modified in this perl code?

The following perl code generates a warning in PerlCritic (by Activestate):
sub natural_sort {
my #sorted;
#sorted = grep {s/(^|\D)0+(\d)/$1$2/g,1} sort grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
}
The warning generated is:
Don't modify $_ in list functions
More info about that warning here
I don't understand the warning because I don't think I'm modifying $_, although I suppose I must be.
Can someone explain it to me please?
Both of your greps are modifying $_ because you're using s//. For example, this:
grep {s/(^|\D)0+(\d)/$1$2/g,1}
is the same as this:
grep { $_ =~ s/(^|\D)0+(\d)/$1$2/g; 1 }
I think you'd be better off using map as you are not filtering anything with your greps, you're just using grep as an iterator:
sub natural_sort {
my $t;
return map { ($t = $_) =~ s/(^|\D)0+(\d)/$1$2/g; $t }
sort
map { ($t = $_) =~ s/(\d+)/sprintf"%06.6d",$1/ge; $t }
#_;
}
That should do the same thing and keep critic quiet. You might want to have a look at List::MoreUtils if you want some nicer list operators than plain map.
You are doing a substitution ( i.e. s/// ) in the grep, which modifies $_ i.e. the list being grepped.
This and other cases are explained in perldoc perlvar:
Here are the places where Perl will
assume $_ even if you don't use it:
The following functions:
abs, alarm, chomp, chop, chr, chroot,
cos, defined, eval, exp, glob, hex,
int, lc, lcfirst, length, log, lstat,
mkdir, oct, ord, pos, print,
quotemeta, readlink, readpipe, ref,
require, reverse (in scalar context
only), rmdir, sin, split (on its
second argument), sqrt, stat, study,
uc, ucfirst, unlink, unpack.
All file tests (-f , -d ) except for -t , which defaults to STDIN.
See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when
used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is
supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a operation's result
is tested by itself as the sole
criterion of a while test. Outside a
while test, this will not happen.
Many people have correctly answered that the s operator is modifying $_, however in the soon to be released Perl 5.14.0 there will be the new r flag for the s operator (i.e. s///r) which rather than modify in-place will return the modified elements. Read more at The Effective Perler . You can use perlbrew to install this new version.
Edit: Perl 5.14 is now available! Announcement Announcement Delta
Here is the function suggested by mu (using map) but using this functionality:
use 5.14.0;
sub natural_sort {
return map { s/(^|\D)0+(\d)/$1$2/gr }
sort
map { s/(\d+)/sprintf"%06.6d",$1/gre }
#_;
}
The VERY important part that other answers have missed is that the line
grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
Is actually modifying the arguments passed into the function, and not copies of them.
grep is a filtering command, and the value in $_ inside the code block is an alias to one of the values in #_. #_ in turn contains aliases to the arguments passed to the function, so when the s/// operator performs its substitution, the change is being made to the original argument. This is shown in the following example:
sub test {grep {s/a/b/g; 1} #_}
my #array = qw(cat bat sat);
my #new = test #array;
say "#new"; # prints "cbt bbt sbt" as it should
say "#array"; # prints "cbt bbt sbt" as well, which is probably an error
The behavior you are looking for (apply a function that modifies $_ to a copy of a list) has been encapsulated as the apply function in a number of modules. My module List::Gen contains such an implementation. apply is also fairly simple to write yourself:
sub apply (&#) {
my ($sub, #ret) = #_;
$sub->() for #ret;
wantarray ? #ret : pop #ret
}
With that, your code could be rewritten as:
sub natural_sort {
apply {s/(^|\D)0+(\d)/$1$2/g} sort apply {s/(\d+)/sprintf"%06.6d",$1/ge} #_
}
If your goal with the repeated substitutions is to perform a sort of the original data with a transient modification applied, you should look into a Perl idiom known as the Schwartzian transform which is a more efficient way of achieving that goal.

How would I do the equivalent of Prototype's Enumerator.detect in Perl with the least amount of code?

Lately I've been thinking a lot about functional programming. Perl offers quite a few tools to go that way, however there's something I haven't been able to find yet.
Prototype has the function detect for enumerators, the descriptions is simply this:
Enumerator.detect(iterator[, context]) -> firstElement | undefined
Finds the first element for which the iterator returns true.
Enumerator in this case is any list while iterator is a reference to a function, which is applied in turn on each element of the list.
I am looking for something like this to apply in situations where performance is important, i.e. when stopping upon encountering a match saves time by disregarding the rest of the list.
I am also looking for a solution that would not involve loading any extra module, so if possible it should be done with builtins only. And if possible, it should be as concise as this for example:
my #result = map function #array;
You say you don't want a module, but this is exactly what the first function in List::Util does. That's a core module, so it should be available everywhere.
use List::Util qw(first);
my $first = first { some condition } #array;
If you insist on not using a module, you could copy the implementation out of List::Util. If somebody knew a faster way to do it, it would be in there. (Note that List::Util includes an XS implementation, so that's probably faster than any pure-Perl approach. It also has a pure-Perl version of first, in List::Util::PP.)
Note that the value being tested is passed to the subroutine in $_ and not as a parameter. This is a convenience when you're using the first { some condition} #values form, but is something you have to remember if you're using a regular subroutine. Some more examples:
use 5.010; # I want to use 'say'; nothing else here is 5.10 specific
use List::Util qw(first);
say first { $_ > 3 } 1 .. 10; # prints 4
sub wanted { $_ > 4 }; # note we're using $_ not $_[0]
say first \&wanted, 1 .. 10; # prints 5
my $want = \&wanted; # Get a subroutine reference
say first \&$want, 1 .. 10; # This is how you pass a reference in a scalar
# someFunc expects a parameter instead of looking at $_
say first { someFunc($_) } 1 .. 10;
Untested since I don't have Perl on this machine, but:
sub first(\&#) {
my $pred = shift;
die "First argument to "first" must be a sub" unless ref $pred eq 'CODE';
for my $val (#_) {
return $val if $pred->($val);
}
return undef;
}
Then use it as:
my $first = first { sub performing test } #list;
Note that this doesn't distinguish between no matches in the list and one of the elements in the list being an undefined value and having that match.
Just since its not here, a Perl function definition of first that localizes $_ for its block:
sub first (&#) {
my $code = shift;
for (#_) {return $_ if $code->()}
undef
}
my #array = 1 .. 10;
say first {$_ > 5} #array; # prints 6
While it will work fine, I don't advocate using this version, since List::Util is a core module (installed by default), and its implementation of first will usually use the XS version (written in C) which is much faster.