How to save the pattern you search for in Perl

if ($_=~m/^[\w](.+)\n/)
{
$seq.= $1;
}
I am using this pattern to recognise a character sequence but I also want to include the first character (the [\w])

If you want to include the first character, move it inside the capturing parentheses of the regex:
$seq .= $1 if /^(\w.+)\n/;

The parens determine what ends up in $1, so you want
if ($_=~m/^([\w].+)\n/)
{
$seq.= $1;
}
This simplifies to
$seq .= $1 if /^(\w.+)\n/;
You probably meant .* (0 or more non-linefeed) instead of .+ (1 or more non-linefeed).
$seq .= $1 if /^(\w.*)\n/;
I'd write that as follows:
chomp;
$seq .= $_ if /^\w/;
This last one is not strictly equivalent:
It doesn't check if the second character of $_ is a non-linefeed.
It doesn't check if $_ contains a line feed.
If $_ contains a line feed, it's expected to be the last character of the string.
$_ is modified.
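As a minimal sketch of the capturing approach above: the helper name collect_seq and the sample lines are assumptions made for illustration, not part of the original question. It accumulates every line that starts with a word character, keeps that first character, and drops the trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: concatenates each line that starts with a word
# character, including that first character but not the newline.
sub collect_seq {
    my @lines = @_;
    my $seq = '';
    for my $line (@lines) {
        # The parentheses enclose \w too, so $1 holds the whole line body.
        $seq .= $1 if $line =~ /^(\w.*)\n/;
    }
    return $seq;
}

print collect_seq(">header\n", "ACGT\n", "TTGA\n"), "\n";   # prints ACGTTTGA
```

The first sample line is skipped because ">" is not a word character.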

Related

m operator of Regular Expression in Perl

What is the difference between m[] and m{} regular expressions in Perl? How do $1, $2, etc. match the pattern "(((Cu)(Na))(Hg))"?
thanks
There is no difference between m{} and m[]. Perl lets you change the delimiters of regexes to make them easier to read in a given context.
$var =~ m/.*\.zip/ and $var =~ m{.*\.zip} and $var =~ m[.*\.zip] and $var =~ m#.*\.zip# all match the same way.
Capture groups are numbered by the position of their opening parenthesis, counting from left to right, so for your example:
my $foo = 'CuNaHg';
if ( $foo =~ m{(((Cu)(Na))(Hg))} ) {
print $1; # CuNaHg
print $2; # CuNa
print $3; # Cu
print $4; # Na
print $5; # Hg
}
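The numbering can also be checked by collecting all captures at once. This is a small sketch; the helper name nested_captures is hypothetical. It relies on a match in list context returning ($1, $2, ...).

```perl
use strict;
use warnings;

# Hypothetical helper: returns every capture of the nested pattern as a list.
sub nested_captures {
    my ($text) = @_;
    # In list context, a successful match returns ($1, $2, ...),
    # ordered by the position of each opening parenthesis.
    return $text =~ m{(((Cu)(Na))(Hg))};
}

my @groups = nested_captures('CuNaHg');
print join(', ', @groups), "\n";   # prints CuNaHg, CuNa, Cu, Na, Hg
```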

How to regex one word from escaped and closed parenthesis?

I am trying to get "loginuser" value from this line. Please suggest
my $ln = CN=xuser\\,user(loginuser),OU=Site-Omg,OU=Accounts_User,OU
if (/ln: (\S.*\S)\s*$/)
{ print $1; }
This will work
use strict;
use warnings;
my $ln = qq{CN=xuser\\,user(loginuser),OU=Site-Omg,OU=Accounts_User,OU};
print $1 . "\n" if $ln =~ /\(([^)]*)/;
Things to note
I have used strict and warnings to show any errors in the script (this would have been very useful for your original)
I have used qq{...} to quote the original string
I have ended the line with ;
I have performed the regex match on $ln instead of $_ using $ln =~ ...
I have written a regex that matches the text: \( matches the literal opening parenthesis and ([^)]*) captures everything up to the closing one.
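The same idea can be wrapped in a small function. This is a sketch only; the name extract_login is an assumption, and it additionally requires the closing parenthesis to be present.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the first pair of
# parentheses, or undef when there is no such pair.
sub extract_login {
    my ($dn) = @_;
    return $dn =~ /\(([^)]*)\)/ ? $1 : undef;
}

# In single quotes, \\ is a single literal backslash.
my $ln = 'CN=xuser\\,user(loginuser),OU=Site-Omg,OU=Accounts_User,OU';
print extract_login($ln), "\n";   # prints loginuser
```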

Unmatched ) in reg when using lc function

I am trying to run the following code:
$lines = "Enjoyable )) DAY";
$lines =~ lc $lines;
print $lines;
It fails on the second line where I get the error mentioned in the title. I understand the brackets are causing the trouble. I think I could use "quotemeta", but the thing is that my string contains info that I go on to process later, so I would like to keep the string intact as far as possible and not tamper with it too much.
You have two problems here.
1. =~ is used to execute a specific set of operations
The =~ operator is used to either match with //, m//, qr// or a string; or to substitute with s/// or tr///.
If all you want to do is lowercase the contents of $lines then you should use = not =~.
$lines = "Enjoyable )) DAY";
$lines = lc $lines;
print $lines;
2. Regular expressions have special characters which must be escaped
If you want to match $lines against a lower-case version of $lines, which should return true if $lines was already entirely lower case and false otherwise, then you need to escape the ")" characters.
#!/usr/bin/env perl
use strict;
use warnings;
my $lines = "enjoyable )) day";
if ($lines =~ lc quotemeta $lines) {
print "lines is lower case\n";
}
print $lines;
Note this is a toy example trying to find a reason for doing $lines =~ lc $lines - It would be much better (faster, safer) to solve this with eq as in $lines eq lc $lines.
See perldoc -f quotemeta or http://perldoc.perl.org/functions/quotemeta.html for more details on quotemeta.
=~ is used for regular expressions. "lc" is not part of regex, it's a function like this: $new = lc($old);
I don't recall the regex operator for lowercase, because I use lc() all the time.
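To make the two cases concrete, here is a short sketch; the helper name is_lower_case and the $needle variable are assumptions for illustration. It lowercases with plain assignment, tests case with eq, and uses \Q...\E only where a literal regex match is actually wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string is already entirely lower case.
sub is_lower_case {
    my ($text) = @_;
    return $text eq lc($text);
}

my $lines = "Enjoyable )) DAY";
print is_lower_case($lines) ? "lower\n" : "not lower\n";   # prints not lower

# Plain assignment, no regex involved, so the parentheses cause no trouble.
my $lowered = lc $lines;
print "$lowered\n";                                         # prints enjoyable )) day

# \Q...\E escapes the metacharacters in $needle for a literal match.
my $needle = '))';
print "found\n" if $lowered =~ /\Q$needle\E/;               # prints found
```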

perl find and replace ../ and  

I am using Perl to replace all instances of
../../../../../../abc' and  
in a string with
/ and , respectively.
The method I am using looks like this:
sub encode
{
my $result = $_[0];
$result =~ s/..\/..\/..\/..\/..\/..\//\//g;
$result =~ s/ / /g;
return $result;
}
Is this correct?
Essentially, yes, although the first regex has to be written in a different way: because . matches any character, we have to escape it as \. or put it in its own character class [.]. The first regex can also be written more cleanly as
...;
$result =~ s{ (?: [.][.]/ ){6} }
{/}gx;
...;
We look for the literal pattern ../ repeated 6 times and then replace it. Because I use curly braces as a delimiter I don't have to escape the slash. Because I use the /x modifier I can have these spaces inside the regex improving readability.
Try this. It will print /foo bar/baz.
#!/usr/bin/perl -w
use strict;
my $result = "../../../../../../foo bar/baz";
#$result =~ s/(\.\.\/)+/\//g; #for any number of ../
$result =~ s/(\.\.\/){6}/\//g; #for 6 exactly
$result =~ s/ / /g;
print $result . "\n";
You forgot the abc, I think:
sub encode
{
my $result = $_[0];
$result =~ s/(?:\.\.\/){6}abc/\//g;
$result =~ s/ / /g;
return $result;
}
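A self-contained sketch of the first substitution only, using the brace-delimited /x form with the dots matched literally. The function name collapse_parent_dirs and the sample paths are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each run of exactly six "../" with "/".
sub collapse_parent_dirs {
    my ($path) = @_;
    # [.] matches a literal dot; /x lets the pattern contain spaces.
    $path =~ s{ (?: [.][.]/ ){6} }{/}gx;
    return $path;
}

print collapse_parent_dirs('../../../../../../foo/baz'), "\n";   # prints /foo/baz
print collapse_parent_dirs('ab/cd/ef/gh/ij/kl/foo'), "\n";       # unchanged
```

The second call shows why the escaping matters: with unescaped dots, that string would also match, because each "ab/" is two arbitrary characters followed by a slash.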

Perl, pattern matching and metacharacters

I am trying to match two things which are both full of metacharacters that need to be used as literals in my match pattern. \Q is supposed to quote all metacharacters in a string until \E, but it doesn't work.
What's up with that?
this is the line that gives me trouble : if (/\Q$prev\E/ !~ /\Q$ww[0]\E/) {
Absent the use of =~ or !~,
/.../
is short for
$_ =~ m/.../
so
/\Q$prev\E/ !~ /\Q$ww[0]\E/
is short for
($_ =~ /\Q$prev\E/) !~ /\Q$ww[0]\E/
which is equivalent to one of the following depending on whether the left regex match succeeds or not:
"" !~ /\Q$ww[0]\E/
"1" !~ /\Q$ww[0]\E/
You simply want:
$prev !~ /\Q$ww[0]\E/ # $prev doesn't contain $ww[0]
If you actually want
$prev !~ /^\Q$ww[0]\E\z/ # $ww[0] isn't equal to $prev
then you can simplify that to
$prev ne $ww[0] # $ww[0] isn't equal to $prev
By the way, always use use strict; use warnings;. It may have identified a problem here (but not necessarily, depending on the value of $_).
It looks like you want to compare a string in $prev to one in $ww[0]. If this is the case, a regex match should look like this:
$result = $prev !~ /\Q$ww[0]\E/;
$result will be true if $prev does not contain the string in $ww[0], with any metacharacters in $ww[0] treated literally.
However if that is all you wanted to do, you might as well use ne:
if ($prev ne $ww[0]){
#do this if $prev and $ww[0] are not the same
}
Also, as @toolic has mentioned, add the following line to the top of your script:
use warnings;
This will give you some feedback on possible problems in your scripts.
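As a rough illustration of the difference between a literal substring test and plain equality, the sketch below uses a hypothetical helper contains_literal and made-up sample strings.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $needle occurs literally in $haystack, else 0.
sub contains_literal {
    my ($haystack, $needle) = @_;
    # \Q...\E quotes every metacharacter in the interpolated $needle.
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

print contains_literal('cost: $5.00 (approx)', '$5.00 (approx)'), "\n";   # prints 1
print contains_literal('cost: $5x00', '$5.00'), "\n";                     # prints 0

# For whole-string comparison, no regex is needed at all.
my ($prev, $current) = ('a.b(c)', 'a.b(c)');
print "same\n" unless $prev ne $current;                                  # prints same
```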