In Perl, I want to mask/cut off X number of characters at the end of a string (X can be one of a set of character strings) - perl

I have two strings, XXXXXXnumber and XXXXXXdate, and I want to strip the suffix from each string so that only XXXXXX remains. The actual number of characters represented by XXXXXX can vary. The suffixes 'number' and 'date' are constant. XXXXXXnumber and XXXXXXdate should become XXXXXX.

my ($prefix) = ($string =~ /\A (.+?) (?:date|number) \z/x);
Alternatively:
$string =~ s/ (?:date|number) \z//x;

I would use a regular expression like $line =~ s/(number|date)$// for that task, where $line can be either string.
If your line has additional characters after number or date, they must be filtered out, too. An alternative approach would be using an expression like ($num) = ($line =~ /^(.*)(number|date).*$/);

Use regexes:
($newvar = $oldvar) =~ s/^(.*)(number|date)$/$1/;
If you have no more use for $oldvar's original value (including the Xes), this simplifies to
$oldvar =~ s/^(.*)(number|date)$/$1/;

A simple substitution takes care of it:
$str =~ s/(?:number|date)\z//;
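The answers above all come down to anchoring the alternation at the end of the string. A minimal sketch of that idea follows; the helper name strip_suffix is hypothetical, and the /r modifier (Perl 5.14 or later) is assumed so that the original string is left untouched:

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing "number" or "date" from a string.
# Assumes Perl 5.14+ for the non-destructive /r modifier.
sub strip_suffix {
    my ($string) = @_;
    return $string =~ s/(?:number|date)\z//r;
}

# Strings without either suffix are returned unchanged.
print strip_suffix($_), "\n" for qw(ABC123number XYZdate nosuffix);
```

On older Perls, the copy-then-substitute form ($newvar = $oldvar) =~ s/...// shown earlier should give the same result.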

Related

In a string replacement, how do we use the '/r' modifier?

I need to increment a numeric value in a string:
my $str = "tool_v01.zip";
(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1++);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1+1);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ $1=~s{(\d+)}{$1+1}/r; /eri;
print $newstr;
Expected output is tool_v02.zip
Note: the version number 01 may contain any number of leading zeroes
I don't think this question has anything to do with the /r modifier, but rather how to properly format the output. For that, I'd suggest sprintf:
my $newstr = $str =~ s{ _v (\d+) \.zip$ }
{ sprintf("_v%0*d.zip", length($1), $1+1 ) }xeri;
Or, replacing just the number with zero-width Lookaround Assertions:
my $newstr = $str =~ s{ (?<= _v ) (\d+) (?= \.zip$ ) }
{ sprintf("%0*d", length($1), $1+1 ) }xeri;
Note: With either of these solutions, something like tool_v99.zip would be altered to tool_v100.zip because the new sequence number cannot be expressed in two characters. If that's not what you want then you need to specify what alternative behaviour you require.
The bit you're missing is sprintf which works the same way as printf except rather than outputting the formatted string to stdout or a file handle, it returns it as a string. Example:
sprintf("%02d",3)
generates a string 03
Putting this into your regex, you can do the following. Rather than using /r, you can use a zero-width lookahead ((?=...)) to match the file suffix and replace just the matched number with the new value:
s/(\d+)(?=\.zip$)/sprintf("%02d",$1+1)/ei
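To illustrate how sprintf's * width keeps the original zero padding, here is a small sketch built on the lookaround approach from the answers. The helper name bump_version is hypothetical, and Perl 5.14 or later is assumed for /r:

```perl
use strict;
use warnings;

# Hypothetical helper: increment the version number in names like
# "tool_v01.zip" while keeping the original zero-padded width.
# Assumes Perl 5.14+ for the /r modifier.
sub bump_version {
    my ($name) = @_;
    return $name =~ s{ (?<= _v ) (\d+) (?= \.zip \z ) }
                     { sprintf("%0*d", length($1), $1 + 1) }xeir;
}

# The last name shows the width growing when the number overflows.
print bump_version($_), "\n" for qw(tool_v01.zip tool_v0009.zip tool_v99.zip);
```

The width is taken from length($1), so a four-digit field stays four digits wide, and a value that no longer fits simply gets one character longer.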

How to remove date identifier and special character from string

I want to remove the date identifier and the * from a string.
$string = "*102015 Supplied air hood";
$output = "Supplied air hood";
I have used
$string =~ s/[#\%&\""*+]//g;
$string =~ s/^\s+//;
What should I use to get the string value "Supplied air hood"?
Thanks in advance
To remove everything from the string up to the first space, you can write
$str =~ s/^\S*\s+//;
Your pattern doesn't contain numbers. It would remove the *, but nothing else. If you want to remove a * followed by six digits and a blank at the beginning of the string, do it like this:
$string =~ s/^\*\d{6} //;
However, if that string always contains a pattern like this, you don't need a regular expression substitution. You can simply take a substring.
my $output = substr $string, 8;
That will assign to $output the content of $string starting from the 9th character.
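A short side-by-side sketch of the two approaches, using the sample string from the question. The variable names $by_regex and $by_substr are only illustrative, and the substr version assumes the prefix is always exactly eight characters long:

```perl
use strict;
use warnings;

my $string = "*102015 Supplied air hood";

# Regex: strip a literal "*", six digits and one space from the start.
(my $by_regex = $string) =~ s/^\*\d{6} //;

# substr: skip the first 8 characters ("*" + 6 digits + 1 space).
# Assumes the prefix always has exactly this length.
my $by_substr = substr $string, 8;

print "$by_regex\n$by_substr\n";
```

The regex leaves the string unchanged when the prefix is missing, whereas substr cuts eight characters regardless of what they are.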
The script below does what you want, assuming that the date always appears at the beginning of the line and that it is followed by exactly one space.
use strict;
use warnings;
while (<DATA>)
{
# skip one or more characters not a space
# then skip exactly one space
# then capture all remaining characters
# and assign them to $s
my ($s) = $_ =~ /[^ ]+ (.*)/;
print $s, "\n";
}
__DATA__
*110115 first date
*110115 second date
*110315 third date
Output is:
first date
second date
third date

Issue in matching regexp in Perl

I have the following code
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
Getting output as
and PLNA
Why is it getting PLNA as output? I believe it should stop at the first \n. I expected the output to be OTNPKT0553 04-02-03 21:43:46
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal new lines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?) ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2}) ([^;^\n]+)/s;
So basically, the regex is looking for two new lines followed by 3 spaces.
There are three spaces before OTNPKT0553, but there is only a single new line, so it won't match.
The next three spaces are before PLNA which IS preceded by two new lines, and so matches.
You have a whole lot of newlines in there - some literal and some encoded as \n. I'm not clear how you were thinking. Did you think \n matched a number maybe? A \d matches a digit, and will also match many Unicode characters that are digits in other languages. However for simple ASCII text it works fine.
What you need is something like this
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
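Note that the scalar assignment to $val only records whether the match succeeded. As a sketch, the same pattern can be used in list context to return the captures directly. The helper name parse_header is hypothetical, and the sample text below omits any indentation the original input may have had:

```perl
use strict;
use warnings;

# Hypothetical helper: return (identifier, timestamp) from the first line
# that looks like "WORD dd-dd-dd hh:mm:ss", or an empty list on failure.
sub parse_header {
    my ($text) = @_;
    return $text =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
}

# Sample input; leading indentation is not reproduced here (assumption).
my $str = "\nOTNPKT0553 04-02-03 21:43:46\nM X DENY\nPLNA\n"
        . "/*Privilege, Login Not Active*/\n;";

my ($id, $timestamp) = parse_header($str);
print defined $id ? "$id and $timestamp\n" : "no match\n";
```

Checking defined $id before printing avoids relying on stale $1 and $2 values when the pattern does not match.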
You have an extra line feed, change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;

Pattern matching consecutive characters

I have a list of strings that I would like to search through and ignore any that contain A or G characters that occur more than 4 times consecutively. For instance, I would like to ignore strings such as TCAAAATC or GCTGGGGAA.
I've tried:
unless ($string =~ m/A{4,}?/g || m/G{4,}?/g)
{
Do something;
}
But I get an error message "Use of uninitialized value in pattern match (m//)".
Any suggestions would be appreciated.
By writing
|| m/G{4,}?/g
you are implicitly testing $_ against this regex. But $_ is not initialized, so you get the "uninitialized value" warning.
Write
unless ($string =~ m/A{4}/ || $string =~ m/G{4}/)
instead (note the simplifications made to the regex), or, as a single expression,
unless ($string =~ m/A{4}|G{4}/)
You need to avoid the implicit comparison with $_, which you can do by writing:
unless ($string =~ m/A{4}/ || $string =~ m/G{4}/)
This looks for a run of four A's or four G's anywhere in the string; once four are found, it doesn't matter whether more follow.
You can reduce it to a single regular expression by using:
unless ($string =~ m/([AG])\1{3}/)
which looks for an A or G followed by 3 more of the same character.
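As a sketch of how the backreference version could be used to filter a list, the following keeps only the strings without such a run. The helper name has_long_run and the sample sequences (beyond the two from the question) are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains four or more
# consecutive A's or four or more consecutive G's.
sub has_long_run {
    my ($seq) = @_;
    return $seq =~ /([AG])\1{3}/ ? 1 : 0;
}

# The first two sequences come from the question; the others are made up.
my @sequences = qw(TCAAAATC GCTGGGGAA ACGTACGT AAGGAAGG);
my @kept = grep { !has_long_run($_) } @sequences;

print "@kept\n";
```

Because \1 must repeat whatever ([AG]) captured, a mixed run such as AAGG does not count as four in a row.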

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string would always have only one continuous stretch of numbers, like 1234 in this case. Rest all will be either alphabets or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
This translates every character in the complement of 0-9 (the c flag) to nothing, i.e. deletes it (the d flag).
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
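The difference between the capturing match and the delete-everything-else approach can be seen in a small sketch. The helper name first_number and the second sample string are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of ASCII digits,
# or undef if the string contains none.
sub first_number {
    my ($text) = @_;
    my ($num) = $text =~ /([0-9]+)/;
    return $num;
}

# Sample string from the question: exactly one group of digits.
my $str = 'abcd*%1234$sdfsd..#d';
print first_number($str), "\n";

# With several digit groups, tr///cd concatenates them all.
(my $all_digits = 'abc 123 x456') =~ tr/0-9//cd;
print "$all_digits\n";
```

With only one group of digits in the input, as the question guarantees, both approaches yield the same number.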
Personally, I would do it like this:
$s =~ /([0-9]+)/;
print $1;
$1 will contain the first group matched by the given regular expression (the part in round brackets).