use perl to extract a substring between two delimiters and store as a new string - perl

I am working on a Perl script, and I want to split a string between two different variables.
This is my string
<p>Hello my server number is 1221.899999 , please select an option</p>
I want to be able to extract the server number, so I want to split the string after <p>Hello my server number is and before the following space, so my end string would print as
1221.899999
Is regex the best solution for this, rather than using split?

I would just use a regex.
my $str = 'Hello my server number is 1221.899999 , please select an option';
my ($num) = $str =~ /Hello my server number is (\d+\.\d+) ,/;
$num will be undefined if the match didn't succeed.

How about:
$str = 'Hello my server number is 1221.899999 , please select an option';
$str =~ s/^.*\b(\d+\.\d+)\b.*$/$1/;
say $str;
or
$str =~ s/^Hello my server number is (\d+\.\d+)\s.*$/$1/;
if the beginning of the string is always the same.
output:
1221.899999

I would use regex. How about this:
$str = 'Hello my server number is 1221.899999 , please select an option';
print $1 if $str =~ /is (.*) ,/;

As long as you are sure that there is always a space before the comma, the proper answer is something similar to this
my $string = '<p>Hello my server number is 1221.899999 , please select an option</p>';
my ($server) = $string =~ /server number is (\S+)/;
print $server;
output
1221.899999
If the comma could appear immediately after the end of the server number, then you would need to modify it slightly to this
my ($server) = $string =~ /server number is ([^\s,]+)/;
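A minimal sketch of the capture approach above, wrapped in a sub so the failed-match case is visible. The name extract_server_number and the third test string are made up for illustration; the pattern is the one from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the token that follows "server number is ",
# or undef when the phrase is not present.
sub extract_server_number {
    my ($text) = @_;
    # List context: the match returns its captures, or an empty list on failure.
    my ($server) = $text =~ /server number is ([^\s,]+)/;
    return $server;
}

for my $text (
    '<p>Hello my server number is 1221.899999 , please select an option</p>',
    '<p>Hello my server number is 1221.899999, please select an option</p>',
    '<p>No number here</p>',    # made-up input that should not match
) {
    my $server = extract_server_number($text);
    print defined $server ? "$server\n" : "no match\n";
}
```

Checking defined before printing avoids an "uninitialized value" warning when the input does not have the expected shape.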

Related

How to remove a date identifier and special character from a string

I want to remove the date identifier and the * from a string.
$string = "*102015 Supplied air hood";
$output = "Supplied air hood";
I have used
$string =~ s/[#\%&\""*+]//g;
$string =~ s/^\s+//;
What should I use to get the string value "Supplied air hood"?
Thanks in advance
To remove everything from the string up to the first space, you can write
$str =~ s/^\S*\s+//;
Your pattern doesn't contain numbers. It would remove the *, but nothing else. If you want to remove a * followed by six digits and a blank at the beginning of the string, do it like this:
$string =~ s/^\*\d{6} //;
However, if that string always contains a pattern like this, you don't need a regular expression substitution. You can simply take a substring.
my $output = substr $string, 8;
That will assign the content of $string starting from the 9th character (offset 8) to $output.
The script below does what you want, assuming that the date always appears at the beginning of the line and that it is followed by exactly one space.
use strict;
use warnings;
while (<DATA>)
{
# skip one or more characters not a space
# then skip exactly one space
# then capture all remaining characters
# and assign them to $s
my ($s) = $_ =~ /[^ ]+ (.*)/;
print $s, "\n";
}
__DATA__
*110115 first date
*110115 second date
*110315 third date
Output is:
first date
second date
third date
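For comparison, here is a small sketch of the anchored substitution suggested earlier, packaged as a sub. The name strip_date_prefix is hypothetical, and I use \s+ instead of a single blank, which is an assumption about how tolerant you want to be.

```perl
use strict;
use warnings;

# Hypothetical helper: removes a leading "*", six digits and the
# whitespace after them; leaves any other string untouched.
sub strip_date_prefix {
    my ($text) = @_;
    (my $rest = $text) =~ s/^\*\d{6}\s+//;
    return $rest;
}

print strip_date_prefix('*102015 Supplied air hood'), "\n";   # Supplied air hood
print strip_date_prefix('Supplied air hood'), "\n";           # Supplied air hood
```

Unlike a fixed substr offset of 8, the substitution does nothing when the prefix is missing, so a line without the date is not truncated.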

removing trailing words end with : from a string using perl

I have a question on how to remove a specific set of words that end with : from a string using Perl.
For instance,
lunch_at_home: start at 1pm.
I want to get only "start at 1 pm" after discarding "lunch_at_home:".
Note that lunch_at_home is just an example. It can be any string of any length, but it always ends with ":".
This should do the job.
my $string = "lunch_at_home: start at 1pm.";
$string =~ s/^.*:\s*//;
It removes every character up to and including the last : in the string, plus any whitespace that follows it.
If you want to remove a specific set of words that are set apart from the data you want:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b(lunch_at_home|breakfast_at_work):\s*//;
That would leave you with start at 1pm. and you can expand the list as needed.
If you just want to remove any "words" (we'll use the term loosely) that end with a colon:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b\S+:\s*//;
You'd end up with the same thing in this case.
take
my $string = "lunch_at_home: start at 1pm.";
to remove everything up to the last ":" and the period at the end of the entry as in your question:
$string =~ s/.*: (.*)\./$1/;
to remove everything up to the first ":"
$string =~ s/.*?: (.*)\./$1/;
split on : and discard the first part:
my (undef, $value) = split /:\s*/, $string, 2;
The final argument (2), ensures this works correctly if the trailing string contains a :.
You can use split function to achieve this:
my $string = "lunch_at_home: start at 1pm.";
$string = (split /:\s*/, $string)[1];
print "$string\n";
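The two split variants above behave differently once the trailing text contains its own colon. A quick sketch, using a made-up input with a time in it; the sub names are hypothetical.

```perl
use strict;
use warnings;

# With a limit of 2, everything after the first colon stays in one piece.
sub after_colon_limited {
    my ($text) = @_;
    my (undef, $value) = split /:\s*/, $text, 2;
    return $value;
}

# Without a limit, the string is split on every colon and only the
# second field is kept.
sub after_colon_unlimited {
    my ($text) = @_;
    return (split /:\s*/, $text)[1];
}

my $string = 'meeting: start at 1:30pm.';    # hypothetical input
print after_colon_limited($string), "\n";    # start at 1:30pm.
print after_colon_unlimited($string), "\n";  # start at 1
```

So the version without the limit is fine for the sample in the question, but it silently drops data as soon as a second colon shows up.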

issue in matching regexp in perl

I have the following code
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
Getting output as
and PLNA
Why is it giving PLNA as output? I believe it should stop at the first \n. I assumed the output would be OTNPKT0553 04-02-03 21:43:46
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal new lines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?) ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2}) ([^;^\n]+)/s;
So basically, the regex is looking for two new lines followed by 3 spaces.
There are three spaces before OTNPKT0553, but there is only a single new line, so it won't match.
The next three spaces are before PLNA which IS preceded by two new lines, and so matches.
You have a whole lot of newlines in there - some literal and some encoded as \n. I'm not clear how you were thinking. Did you think \n matched a number maybe? A \d matches a digit, and will also match many Unicode characters that are digits in other languages. However for simple ASCII text it works fine.
What you need is something like this
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
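One thing worth adding: in scalar context the match only returns true or false, so $1 and $2 should not be used unless it succeeded. A sketch that takes the captures in list context and tests them instead; parse_header is a hypothetical name, and the sample text is written with explicit \n escapes (the exact spacing is my assumption).

```perl
use strict;
use warnings;

# Hypothetical helper: returns (id, timestamp), or (undef, undef)
# when the text does not contain that pair.
sub parse_header {
    my ($text) = @_;
    my ($id, $timestamp) = $text =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
    return ($id, $timestamp);
}

my $sample = "\n   OTNPKT0553 04-02-03 21:43:46\nM  X DENY\n   PLNA\n";
my ($id, $timestamp) = parse_header($sample);
print defined $id ? "$id and $timestamp\n" : "no match\n";
```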
You have an extra line feed, change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;

Extracting alphanumeric phrase from a string

Trying to extract the alphanumeric characters from this string:
A_phase_I-II,_open-req_project_id_PX15RAD001
The problem is: the term PX15RAD001 can occur anywhere in the string.
I am trying to extract the alphanumeric part using the expression below, but it returns the entire string. I thought Alnum was a valid keyword for alphanumerics. Is that not the case?
(my $string = $line ) =~ s/\P{Alnum}//g;
print $string;
How can I extract the alphanumeric part of the aforementioned string?
Thanks in advance.
-simak
At the end as per your input:
> echo "A_phase_I-II,_open-req_project_id_PX15RAD001"|perl -lne 'print $1 if(/id_([A-Z0-9]*)/)'
PX15RAD001
In the middle:
> echo "A_phase_I-II,_open-req_id_PX15RAD001_project" | perl -lne 'print $1 if(/id_([A-Z0-9]*)/)'
PX15RAD001
or in your terms:
$line=~m/id_([A-Z0-9]*)/g;
print $1;
Here are some test cases, produced from the comments on @Vijay's answer:
my @line = (
'A_phase_I-II,_open-req_project_id_PX15RAD001',
'_PX15RAD001_A_phase_I-II,_open-req_project_id',
'A_pha3333se_I-II,_ope_PX15RAD001_n-req_project',
'A_phase_I-II,_PX15RAD001_open-req_projec123123123t_id',
'A_phase_I-II_PX15RAD001_roject_id'
);
foreach my $string ( @line ) {
$string =~ m{_([^_]{10})_?}g;
print $1 . "\n" if $1;
}
These kinds of questions are hard to answer because there is not enough information. What information we have is:
You say your target string is "alphanumeric", but the entire input string is alphanumeric, except for some punctuation, so that really doesn't tell us anything.
You say it is 12 characters long, but the sample you show is 10 characters long.
You seem to think that "alphanumeric" does not include underscore.
So, the reliable information I can sense from you is:
Target string is always delimited by underscore _
Target string is 10-12 characters, all alphanumeric except underscore.
The "reliable" solution based on this rather skimpy information is:
my $str = "A_phase_I-II,_open-req_project_id_PX15RAD001";
for my $field (split /_/, $str) {
if (length($field) <= 12 and
length($field) >= 10 and # field is 10-12 characters
$field !~ /\W/) { # and contains no non-alphanumerics
# do something
}
}
By splitting on underscore, we can easily isolate each field in the string and perform simpler tests on it, such as the ones above.
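To make the "do something" part concrete, here is one possible sketch that collects every field passing those tests. The sub name candidate_ids is hypothetical, and I spell the character class out as ASCII letters and digits, which is a slightly stricter assumption than the \W test above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every underscore-delimited field that is
# 10-12 characters long and consists only of ASCII letters and digits.
sub candidate_ids {
    my ($text) = @_;
    return grep { /\A[A-Za-z0-9]{10,12}\z/ } split /_/, $text;
}

for my $line (
    'A_phase_I-II,_open-req_project_id_PX15RAD001',
    'A_phase_I-II,_PX15RAD001_open-req_projec123123123t_id',
) {
    print join(',', candidate_ids($line)), "\n";   # PX15RAD001 both times
}
```

Returning a list rather than the first hit makes it obvious when a line contains more than one field that fits the description.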

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string will always have only one continuous run of digits, like 1234 in this case. Everything else will be letters or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
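A small sketch of the difference, assuming an input that contains a non-ASCII digit. The sample string and the sub names are made up for illustration.

```perl
use strict;
use warnings;

# \D is "not a digit" in the Unicode sense, so non-ASCII digits survive.
sub strip_non_digits {
    my ($text) = @_;
    (my $copy = $text) =~ s/\D//g;
    return $copy;
}

# [^0-9] keeps ASCII digits only.
sub strip_non_ascii_digits {
    my ($text) = @_;
    (my $copy = $text) =~ s/[^0-9]//g;
    return $copy;
}

# Hypothetical input: "id", ARABIC-INDIC DIGIT THREE (U+0663), then ASCII 7.
my $mixed = "id\x{0663}7";

print length(strip_non_digits($mixed)), "\n";   # 2
print strip_non_ascii_digits($mixed), "\n";     # 7
```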
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
The c flag complements the 0-9 set and the d flag deletes every matched character that has no replacement, so everything that is not an ASCII digit is removed.
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
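To illustrate the caveat, a short sketch comparing the two on a string with two groups of digits. The input and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Deletes every non-digit, so separate groups run together.
sub digits_only {
    my ($text) = @_;
    (my $copy = $text) =~ tr/0-9//cd;
    return $copy;
}

# Captures the first run of digits only; undef if there is none.
sub first_number {
    my ($text) = @_;
    my ($num) = $text =~ /(\d+)/;
    return $num;
}

my $str = 'abc 123 x456xy';        # made-up input with two groups
print digits_only($str), "\n";     # 123456
print first_number($str), "\n";    # 123
```

For the string in the question, which has a single run of digits, both give the same answer.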
Personally, I would do it like this:
$s =~ /([0-9]+)/;
print $1;
$1 will contain the first group matched by the given regular expression (the part in round brackets).