What does =~ do in Perl?
In a Perl program I am examining (namely plutil.pl), I see a lot of =~ in the XML parser portion. For example, here is UnfixXMLString (lines 159 to 167 in 1.7):
sub UnfixXMLString {
my ($s) = @_;
$s =~ s/&lt;/</g;
$s =~ s/&gt;/>/g;
$s =~ s/&amp;/&/g;
return $s;
}
From what I can tell, it's taking a string, modifying it with the =~ operator, then returning that modified string, but what exactly is it doing?
=~ is the Perl binding operator. It's generally used to apply a regular expression to a string; for instance, to test if a string matches a pattern:
if ($string =~ m/pattern/) {
Or to extract components from a string:
my ($first, $rest) = $string =~ m{^(\w+):(.*)$};
Or to apply a substitution:
$string =~ s/foo/bar/;
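The three uses above can be put into one small runnable sketch that also shows what each form gives back. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for this illustration.
my $string = "key:some value";

# 1. Test: in scalar (boolean) context a match yields true or false.
my $matched = ($string =~ m/^key/) ? "yes" : "no";

# 2. Extract: in list context a match returns the captured groups.
my ($first, $rest) = $string =~ m{^(\w+):(.*)$};

# 3. Substitute: s/// changes $string in place and returns the
#    number of replacements it made.
my $count = ($string =~ s/e/E/g);

print "matched: $matched\n";
print "first: $first, rest: $rest\n";
print "replacements: $count, string is now: $string\n";
```

In every case the operand on the left of =~ is the string being examined or modified, and the operand on the right is the match or substitution applied to it.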
=~ is the Perl binding operator. It can be used to determine whether a regular expression match occurred (true or false):
$sentence = "The river flows slowly.";
if ($sentence =~ /river/)
{
print "Matched river.\n";
}
else
{
print "Did not match river.\n";
}
I want the following code to print out "bye", and then print out "hello". However, when I run it, it prints out "bye" and then perl tells me that $str2 has not been initialized.
my $item = "hello/bye";
if($item =~ m/.*(bye)/g){
my $str1 = $1;
print "$str1\n";
my $str2 = ($item =~ m/(hello).*/g)[0];
print "$str2\n";
}
I think that there is probably something I do not understand about the m//g part, but I am having trouble finding my answer in the perldoc page for perlre.
When you do
if($item =~ m/.*(bye)/g)
that does not reset the match iterator (we are in scalar context). The "position" remains at the character after the bye substring. So the following m//g picks up from where the previous one left off.
You can verify this yourself:
if ($item =~ /(bye)/g) {
printf "pos \$item = %d\n", pos $item;
...
}
which will print pos $item = 9.
Incidentally $item =~ /.*(bye)/ is better written as $item =~ /(bye)/ (assuming you don't care if you match the first or the last bye substring, just that $item has bye somewhere). Similarly, $item =~ /(hello).*/ is better written as $item =~ /(hello)/.
#!/usr/bin/env perl
use strict;
use warnings;
my $item = "hello/bye";
if ($item =~ /(bye)/) {
my $str1 = $1;
print "$str1\n";
my $str2 = ($item =~ /(hello)/g)[0];
print "$str2\n";
}
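If the /g on the first match has to stay, the match position can also be cleared by hand. The following is a minimal sketch of that idea, using the same string as the question; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $item = "hello/bye";

# A successful scalar-context //g match leaves pos() just past the match.
$item =~ /(bye)/g or die "no match";
my $pos_after_first = pos $item;

# This //g search starts at that position, so it cannot see "hello".
my @first_try = $item =~ /(hello)/g;

# Assigning undef to pos() makes the next //g search start at the
# beginning again. (A failed //g match resets pos() too; the explicit
# assignment just makes the reset visible and also works after a
# successful match.)
pos($item) = undef;
my @second_try = $item =~ /(hello)/g;

printf "pos after first match: %d\n", $pos_after_first;
printf "first try: %d match(es), second try: %d match(es)\n",
    scalar @first_try, scalar @second_try;
```

Dropping the unneeded /g, as in the script above, is still the simpler fix; the explicit reset is only worth it when the first match really needs /g.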
What is the simplest method in Perl to convert the special symbols "&'<> to the entities &quot; &amp; &apos; &lt; &gt;? It is easy to write functions like this, but I think this problem has been solved a lot of times and there is no need to write your own functions.
sub add_entities {
my ($text) = @_;
$text =~ s/&/&amp;/g;
$text =~ s/"/&quot;/g;
$text =~ s/'/&apos;/g;
$text =~ s/</&lt;/g;
$text =~ s/>/&gt;/g;
return $text;
}
sub remove_entities {
my ($text) = @_;
$text =~ s/&quot;/"/g;
$text =~ s/&amp;/&/g;
$text =~ s/&apos;/'/g;
$text =~ s/&lt;/</g;
$text =~ s/&gt;/>/g;
return $text;
}
You should never ever need remove_entities. Your parser shouldn't return any entities. Seems you have a horribly broken parser. I recommend XML::LibXML.
Same goes for add_entities. The XML writing library will handle all of that for you. You could use XML::LibXML for this too, but XML::Writer is much simpler to use for this task.
Note that both of your routines are horribly broken. add_entities doesn't consider character set. remove_entities doesn't handle numerical entities or entities outside of the base XML spec.
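For the rare case where no XML library is available, here is a hedged sketch of a hand-rolled pair that at least avoids one more pitfall of chained substitutions: decoding &amp; before the other entities turns &amp;lt; into < instead of &lt;. The names escape_xml and unescape_xml are hypothetical, and the sketch covers only the five predefined XML entities; numeric entities and character sets are still not handled, so it is not a replacement for a real parser or writer.

```perl
use strict;
use warnings;

# Only the five entities predefined by XML; nothing else is handled.
my %entity_for = (
    '&' => '&amp;',
    '"' => '&quot;',
    "'" => '&apos;',
    '<' => '&lt;',
    '>' => '&gt;',
);
my %char_for = reverse %entity_for;

# Hypothetical helper: one pass over the text, so no character is
# escaped twice.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/([&"'<>])/$entity_for{$1}/g;
    return $text;
}

# Hypothetical helper: one pass again, so "&amp;lt;" becomes "&lt;"
# and is not decoded a second time into "<".
sub unescape_xml {
    my ($text) = @_;
    $text =~ s/(&(?:amp|quot|apos|lt|gt);)/$char_for{$1}/g;
    return $text;
}

print escape_xml(q{a<b & "c"}), "\n";
print unescape_xml('&amp;lt;'), "\n";
```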
I am using Perl to replace all instances of
../../../../../../abc' and
in a string with
/ and , respectively.
The method I am using looks like this:
sub encode
{
my $result = $_[0];
$result =~ s/..\/..\/..\/..\/..\/..\//\//g;
$result =~ s/ / /g;
return $result;
}
Is this correct?
Essentially, yes, although the first regex has to be written differently: because . matches any character, we have to escape it as \. or put it in its own character class [.]. The first regex can also be written more cleanly as
...;
$result =~ s{ (?: [.][.]/ ){6} }
{/}gx;
...;
We look for the literal pattern ../ repeated 6 times and then replace it. Because I use curly braces as a delimiter I don't have to escape the slash. Because I use the /x modifier I can have these spaces inside the regex improving readability.
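To see why the unescaped dots matter, here is a small sketch comparing the two patterns on an input that contains no dots at all. The sample strings and variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical input with no ".." in it anywhere.
my $not_dots = "ab/cd/ef/gh/ij/kl/rest";

# Unescaped dots: any two characters followed by a slash, six times.
my $loose_hit = $not_dots =~ m{ (?: ../ ){6} }x ? 1 : 0;

# Bracketed dots: only a literal "../", six times.
my $strict_hit = $not_dots =~ m{ (?: [.][.]/ ){6} }x ? 1 : 0;

# On a real run of six "../" the strict pattern does its job.
my $path = "../../../../../../abc";
(my $cleaned = $path) =~ s{ (?: [.][.]/ ){6} }{/}gx;

print "loose pattern matched:  $loose_hit\n";
print "strict pattern matched: $strict_hit\n";
print "cleaned path: $cleaned\n";
```

The loose pattern matches the dot-free string, which is exactly the false positive that the escaping prevents.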
Try this. It will print /foo bar/baz.
#!/usr/bin/perl -w
use strict;
my $result = "../../../../../../foo bar/baz";
#$result =~ s/(\.\.\/)+/\//g; #for any number of ../
$result =~ s/(\.\.\/){6}/\//g; #for 6 exactly
$result =~ s/ / /g;
print $result . "\n";
You forgot the abc, I think:
sub encode
{
my $result = $_[0];
$result =~ s/(?:\.\.\/){6}abc/\//g;
$result =~ s/ / /g;
return $result;
}
In Perl, I have to determine whether user input is a palindrome or not, and it must display like this:
Enter in 7 characters: ghghghg #one line here #
Palindrome! #second line answer#
But instead this is what it does:
Enter in 7 characters: g #one line#
h #second line#
g #third line#
h #fourth line#
g #fifth line#
h #sixth line#
g Palindrome! #seventh line#
My problem seems to be in the chomp lines with all the variables, but I just can't figure out what to do and I've been at it for hours. I need a simple solution, but I have not progressed to arrays yet, so I need something simple to fix this. Thanks
And here is what I have so far. The formula seems to work, but it keeps printing a new line for each character:
use strict;
use warnings;
my ($a, $b, $c, $d, $e, $f, $g);
print "Enter in 7 characters:";
chomp ($a = <>); chomp ($b = <>); chomp ($c = <>); chomp ($d = <>); chomp ($e = <>); chomp ($f = <>); chomp ($g = <>);
if (($a eq $g) && ($b eq $f) && ($c eq $e) && ($d eq $d) && ($e eq $c) && ($f eq $b) && ($g eq $a))
{print "Palindrome! \n";}
else
{print "Not Palindrome! \n";}
If you're going to determine if a word is the same backwards, may I suggest using reverse and lc?
chomp(my $word = <>);
my $reverse = reverse $word;
if (lc($word) eq lc($reverse)) {
print "Palindrome!";
} else {
print "Not palindrome!";
}
Perl is famous for its TIMTOWTDI. Here are two more ways of doing it:
use strict;
use warnings;
use feature 'say';
print "Enter 7 characters: ";
chomp(my $i = <STDIN>);
say "reverse: ", pal_reverse($i) ? "yes" : "no";
say "regex: ", pal_regex($i) ? "yes" : "no";
sub pal_reverse {
my $i = (@_ ? shift : $_);
return $i eq reverse $i;
}
sub pal_regex {
return (@_ ? shift() : $_) =~ /^(.?|(.)(?1)\2)$/ + 0;
}
use strict;
use warnings;
use feature 'say';
print "Please enter 7 characters : ";
my $input = <>; # Read in input
chomp $input; # To remove trailing "\n"
# Season with input validation
warn 'Expected 7 characters, got ', length $input, ' instead'
unless length $input == 7;
# Determine if it's palindromic or not (the parentheses stop reverse
# from taking the whole ?: expression as its argument)
say $input eq reverse($input)
? 'Palindrome'
: 'Not palindrome' ;
TIMTOWTDI for the recursion-prone:
sub is_palindrome {
my ($s) = @_;
return 1 if length $s < 2; # Whole string is palindromic
return if substr($s, 0, 1) ne substr($s, -1, 1); # Outer chars differ
@_ = ( substr $s, 1, -1 ); # Strip the outer chars and check the rest
goto &is_palindrome; # Tail call without growing the stack
}
say is_palindrome( 'ghghghg' ) ? 'Palindromic' : 'Not palindromic' ;
And perldoc perlretut for those who aren't :)
Recursive patterns
This feature (introduced in Perl 5.10) significantly extends the power
of Perl's pattern matching. By referring to some other capture group
anywhere in the pattern with the construct (?group-ref), the pattern
within the referenced group is used as an independent subpattern in
place of the group reference itself. Because the group reference may
be contained within the group it refers to, it is now possible to
apply pattern matching to tasks that hitherto required a recursive
parser.
To illustrate this feature, we'll design a pattern that matches if a
string contains a palindrome. (This is a word or a sentence that,
while ignoring spaces, interpunctuation and case, reads the same
backwards as forwards). We begin by observing that the empty string or
a string containing just one word character is a palindrome. Otherwise
it must have a word character up front and the same at its end, with
another palindrome in between.
/(?: (\w) (?...Here be a palindrome...) \g{-1} | \w? )/x
Adding \W* at either end to eliminate what is to be ignored, we
already have the full pattern:
my $pp = qr/^(\W* (?: (\w) (?1) \g{-1} | \w? ) \W*)$/ix;
for $s ( "saippuakauppias", "A man, a plan, a canal: Panama!" ){
print "'$s' is a palindrome\n" if $s =~ /$pp/;
}
I have strings similar to this
INSERT INTO `log_action` VALUES (1,'a',1,4),(2,'a',1,1),(3,'a',4,4),(4,'a',1,1),(5,'a',6,4);
where I would like to add a number to each of the first values, so it becomes (when the value is 10)
INSERT INTO `log_action` VALUES (11,'a',1,4),(12,'a',1,1),(13,'a',4,4),(14,'a',1,1),(15,'a',6,4);
I have tried this
#!/usr/bin/perl -w
use strict;
my $input;
if ($#ARGV == 0) {
$input = $ARGV[0];
} else {
print "Usage: test.pl filename\n\n";
die "Wrong number of arguments.\n";
}
my $value;
$value = 10;
open(FILE, '<', $input) or die $!;
foreach my $line (<FILE>) {
if ($line =~ m/^INSERT INTO \`log_action\` VALUES/) {
$line =~ s/\((\d+),/\($1+$value,/ge;
print $line . "\n";
}
}
close FILE;
It fails because of the \($1+$value,. The \( and , are there because the search consumes them.
Any suggestions how to solve it?
You were almost there, but the part you put in the replacement side of s///e needs to be valid Perl. You are evaluating Perl code:
my $string =<<HERE;
INSERT INTO `log_action` VALUES
(1,'a',1,4),(2,'a',1,1),(3,'a',4,4),(4,'a',1,1),(5,'a',6,4);
HERE
my $value = 10;
$string =~ s/\((\d+),/ '(' . ($1+$value) . ',' /ge;
print "$string\n";
The Perl code that /e evaluates is just a string concatenation:
'(' . ($1+$value) . ','
However, when I want to match parts of the string that I don't want to replace, I use lookarounds so those parts aren't part of the replacement:
my $string =<<HERE;
INSERT INTO `log_action` VALUES
(1,'a',1,4),(2,'a',1,1),(3,'a',4,4),(4,'a',1,1),(5,'a',6,4);
HERE
my $value = 10;
$string =~ s/ (?<=\() (\d+) (?=,) / $1+$value /xge;
print "$string\n";