Extracting text between strings in Perl

In Perl, how do I extract the text in a string if I knew the pre and post parts of the text?
Example:
Input: www.google.com/search?size=1&make=BMW&model=2000
I would like to extract the word 'BMW', which always sits between "&make=" and the next "&".

Don't use a regular expression. Use URI and URI::QueryParam, like so:
use strict;
use warnings;
use URI;
use URI::QueryParam;
my $u = URI->new('http://www.google.com/search?size=1&make=BMW&model=2000');
print $u->query_param('make');
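If URI isn't installed, a rough sketch using only core Perl is to split the query string yourself. This is a shortcut with stated limitations: it does not percent-decode values (so `%20` or `+` stay as-is), and a repeated key keeps only its last value. The `%param` hash is just an illustrative name.

```perl
use strict;
use warnings;

my $url = 'www.google.com/search?size=1&make=BMW&model=2000';

# Everything after the first "?" is the query string (empty if there is none).
my ($query) = $url =~ /\?(.*)/;
$query //= '';

# Split into key=value pairs; a pair without "=" gets an empty value.
# NOTE: unlike URI::QueryParam, no percent-decoding happens here.
my %param;
for my $pair (split /&/, $query) {
    my ($key, $value) = split /=/, $pair, 2;
    $param{$key} = $value // '';
}

print $param{make}, "\n";    # BMW
```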

Use a regular expression:
my ($captured_string) = $link =~ /\&make=(\w+)\&/;
My regex assumes that you want to capture whatever appears in the make field. \w matches upper- and lowercase letters, as well as digits and the underscore. If you want to capture something else, you can use a character class: [\w\s]+, for example, matches one or more word characters and whitespace characters. You can put any characters between the [ ] and they will be matched in any order.
The ( ) is what actually does the capturing. If you remove it, the regex will only test for a match (and you should use it in an if statement). If you want to capture more than one string (say you want the model as well), then based on your example you could use a second set of parentheses like this:
my ($make, $model) = $link =~ /\&make=(\w+)\&model=([A-Za-z0-9]+)/;
Hope that helps!
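The pattern above needs a literal & after the value, so it finds nothing when make happens to be the last parameter (e.g. ...&model=2000&make=BMW). A slightly more forgiving sketch, assuming values never contain a raw &, anchors on ? or & before the name and captures everything up to the next & or the end of the string. The sample URLs are made up for illustration.

```perl
use strict;
use warnings;

my @links = (
    'www.google.com/search?size=1&make=BMW&model=2000',
    'www.google.com/search?size=1&model=2000&make=BMW',   # make is last
    'www.google.com/search?make=Land Rover&size=1',       # make is first, has a space
);

my @makes;
for my $link (@links) {
    # [?&] requires "make" to start a parameter, so "remake=" would not match;
    # [^&]* takes every character up to the next "&" (or the end of the string).
    if ( my ($make) = $link =~ /[?&]make=([^&]*)/ ) {
        push @makes, $make;
        print "$make\n";
    }
}
```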

Related

Perl s/$array[1]\b/$array[1] won't replace

I would like to replace the string $array[1] with the actual value of the variable.
The \b doesn't seem to work.
Does anyone know how to replace the array variable? What's the delimiter?
s/$array[1]\b/$array[1]
The [ ... ] has a special meaning in regular expressions (it defines a "character class"). If you want to use [ to mean a [, then you need to escape it with a \.
s/\$array\[1]/$array[1]/
Update: Added escape to $. Removed \b.
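Instead of escaping each metacharacter by hand, you can let Perl do it with \Q...\E, which applies quotemeta to the interpolated text. A minimal sketch, assuming the placeholder is kept in a single-quoted string so that $array[1] is not interpolated there; $placeholder and $text are illustrative names.

```perl
use strict;
use warnings;

my @array = ('zero', 'one', 'two');
my $text  = 'value: $array[1]; again: $array[1]';

# Single quotes: this is the literal text "$array[1]", not the element.
my $placeholder = '$array[1]';

# \Q...\E escapes $, [ and ] in the interpolated placeholder, so the
# pattern matches the literal text; the replacement side interpolates
# the real element $array[1].
$text =~ s/\Q$placeholder\E/$array[1]/g;

print "$text\n";    # value: one; again: one
```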
I would recommend a real templating engine for this kind of substitution. It lets you extend the approach to things that don't look exactly like $array[1] without making it more complicated, but you will need to change your input to the syntax the templating engine expects. One option is Text::Template:
use strict;
use warnings;
use Text::Template 'fill_in_string';
my $input = 'foo {$array[1]} bar';
my @array = 1..10;
my $rendered = fill_in_string $input, HASH => {array => \@array};
print $rendered, "\n"; # foo 2 bar

How to combine two regex pattern in perl?

I want to combine two regex pattern to split string and get a table of integers.
Here is an example:
$string= "1..1188,1189..14,14..15";
$first_pattern = /\../;
$second_pattern = /\,/;
I want to get an array like this:
[1,1188,1189,14,14,15]
Use | to join alternatives. Also, use qr// to create regex objects; a plain /.../ matches against $_ and assigns the result of that match to $first_pattern and $second_pattern.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = '1..1188,1189..14,14..15';
my $first_pattern = qr/\.\./;
my $second_pattern = qr/,/;
my @integers = split /$first_pattern|$second_pattern/, $string;
say for @integers;
You probably need \.\. to match two dots, as \.. matches a dot followed by anything but a newline. Also, there's no need to backslash a comma.
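Since only the numbers are wanted, you could also skip split and pull every run of digits out with a global match in list context. A small sketch of both forms, collecting into an array reference to match the [1,1188,...] shape in the question:

```perl
use strict;
use warnings;

my $string = '1..1188,1189..14,14..15';

# One alternation inside the split pattern: two literal dots OR a comma.
my @by_split = split /\.\.|,/, $string;

# Alternative: match every run of digits; /g in list context returns all captures.
my @by_match = $string =~ /(\d+)/g;

my $table = [@by_match];
print '[', join(',', @$table), "]\n";    # [1,1188,1189,14,14,15]
```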

How to insert a colon between word and number

I want to insert a colon between each word and the number(s) after it, then add a newline after the numbers.
For example:
"cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
my expected output:
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
My attempt:
$Bday=~ /^([a-z]||\_)/:/^([0-9])/
print "\n";
#!/usr/bin/perl
use warnings;
use strict;
my $str = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
$str =~ s/\s*([a-z_]+) (\d+(?: \d+)*)/$1:$2\n/g;
print $str;
produces your desired output from your sample input.
Edit: Note the use of the s operator for regular expression substitution. One of the many problems with your code is that you're not using it (if your intent is to modify the string in place rather than extract bits of it for further processing).
One more variant -
> cat test_perl.pl
#!/usr/bin/perl
use strict;
use warnings;
while ( "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011" =~ m/([a-z_]+)\s+([0-9]+(?: [0-9]+)*)/g )
{
print "$1:$2\n";
}
> test_perl.pl
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
>
The original code $Bday=~ /^([a-z]||\_)/:/^([0-9])/ doesn't make much sense. Apart from missing a semicolon and having too many delimiters (matching patterns have the form /.../ or m/.../, and substitutions the form s/.../.../), it could never match anything.
([a-z]||\_) would match:
one lowercase ASCII letter (a through z);
an empty string (the empty alternative between the two |s); or
one underscore (escaping it with a backslash is superfluous).
To get it (or the corresponding subexpression for numbers) to match a sequence of one or more of these characters, you need to follow it with a +.
^([0-9]) would fail to match unless it was at the beginning of the string. There it would match a single digit.
My solution (taking into account the later comments by the OP about having input such as cat[1] or dog3):
use strict;
use warnings;
my $bday = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011 cat[1] 01012018 dog3 02012018";
# capture groups:
# $1------------------------\ $2-------------\
$bday =~ s/([A-Za-z][A-Za-z0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;
print $bday;
will print out:
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
cat[1]:01012018
dog3:02012018
Breakdown:
[A-Za-z]: Begin with a letter.
[A-Za-z0-9_\[\]]*: Follow with zero or more letters, numbers, underscores and square brackets.
\h+: Separate with one or more horizontal whitespace.
\d+(?:\h+\d+)*: One sequence of digits (\d+) followed by zero or more sequences of horizontal whitespace and digits.
(?!\S): Can't be followed by non-whitespace.
\s*: Consume the following whitespace, including line feeds. This allows the input to be split across multiple lines, as long as no single entry is itself spread over several lines; to allow that too, replace every \h+ with \s+.
With the /g modifier, the substitution is applied repeatedly along the source string for as long as it matches, putting each heading-date record on its own line before moving on to the rest of the string.
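The same substitution can carry its own breakdown if you write it with the /x modifier, which ignores unescaped whitespace and allows # comments inside the pattern. This is only a reformatting of the regex above, not a different technique. One caveat with /x: variables are still interpolated inside the comments, so the comments below deliberately avoid $ and @.

```perl
use strict;
use warnings;

my $bday = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011 cat[1] 01012018 dog3 02012018";

$bday =~ s/
    ( [A-Za-z] [A-Za-z0-9_\[\]]* )   # capture 1: a letter, then letters, digits, _ [ ]
    \h+                               # horizontal whitespace before the dates
    ( \d+ (?: \h+ \d+ )* )            # capture 2: space-separated runs of digits
    (?!\S)                            # not followed by a non-space character
    \s*                               # eat trailing whitespace, newlines included
/$1:$2\n/gx;

print $bday;
```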
Note that if your headers (dog etc.) might contain non-ASCII letters, use \pL or \p{XPosixAlpha} instead of [A-Za-z] (and make sure the input has been decoded into characters):
$bday =~ s/(\pL[\pL0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;

Perl string in Quote Word?

Seems like my daily roadblock. Is this possible? A string in qw?
#!/usr/bin/perl
use strict;
use warnings;
print "Enter Your Number\n";
my $usercc = <>;
##split number
$usercc =~ s/(\w)(?=\w)/$1 /g;
print $usercc;
## string in qw, hmm..
my @ccnumber = qw($usercc);
I get Argument "$usercc" isn't numeric in multiplication (*) at
Thanks
No.
From: http://perlmeme.org/howtos/perlfunc/qw_function.html
How it works
qw() extracts words out of your string using embedded whitespace as the delimiter and returns the words as a list. Note that this happens at compile time, which means that the call to qw() is replaced with the list before your code starts executing.
Additionally, no interpolation is possible in the string you pass to qw().
Instead, use
my @ccnumber = split /\s+/, $usercc;
This does what you probably want: it splits $usercc on whitespace.
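To see the difference concretely: qw() hands back the literal characters you typed, while split works on the variable's value at run time. A short sketch; the input is hard-coded here as a stand-in for <>, and it uses split ' ' (a special case that also skips leading whitespace) so the trailing newline from <> doesn't matter.

```perl
use strict;
use warnings;

# Stand-in for "my $usercc = <>;" after the spacing substitution.
my $usercc = "1 2 3 4\n";

# qw() never interpolates: this is the one-element list ('$usercc').
my @literal = qw($usercc);

# split ' ' splits on runs of whitespace and skips leading whitespace;
# the trailing newline only yields an empty final field, which split drops.
my @ccnumber = split ' ', $usercc;

print scalar(@literal), " vs ", scalar(@ccnumber), "\n";    # 1 vs 4
```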

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string will always contain exactly one continuous run of digits, like 1234 in this case. All the other characters will be letters or other special characters.
How can I extract the number (1234 in this case) and store it back in $str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex with subpatterns (parentheses). In list context, a match returns the substrings captured by those subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if digits from other scripts may be present (Unicode digits other than 0-9, which \D treats as digits and therefore keeps), a better solution is:
$str =~ s/[^0-9]//g;
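The difference only shows up with non-ASCII digits: on a character string, \d (and therefore \D) covers any Unicode decimal digit, while [0-9] means just the ASCII ones. A sketch using U+0663 ARABIC-INDIC DIGIT THREE as an assumed stray digit; the /r modifier (Perl 5.14+) returns a modified copy instead of changing the variable.

```perl
use strict;
use warnings;

# "ab", ASCII "12", then U+0663 (ARABIC-INDIC DIGIT THREE), then "cd".
my $str = "ab12\x{0663}cd";

my $keep_unicode_digits = $str =~ s/\D//gr;      # \D spares U+0663
my $keep_ascii_digits   = $str =~ s/[^0-9]//gr;  # [^0-9] removes it

print length($keep_unicode_digits), " ", length($keep_ascii_digits), "\n";    # 3 2
```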
If you want to do it the destructive way, this is probably the fastest way to do it:
$str =~ tr/0-9//cd;
This translates every character in the complement of 0-9 (the /c flag) to nothing and deletes it (the /d flag).
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
In list context, the match returns a list of captures. The parentheses around $str simply put the assignment in list context, so the first capture is assigned to $str.
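One thing to watch with that idiom: if the string contains no digits at all, the match returns an empty list and $str ends up undef. A hedged sketch that wraps it in a small helper (extract_number is a made-up name, not a library function) and returns undef explicitly so callers can test for it:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first run of ASCII digits, or undef if none.
sub extract_number {
    my ($s) = @_;
    my ($num) = $s =~ /([0-9]+)/;    # list context: first capture, or nothing
    return $num;                     # undef when there was no match
}

for my $input ('abcd*%1234$sdfsd..#d', 'no digits here') {
    my $num = extract_number($input);
    print defined $num ? "$num\n" : "no number in '$input'\n";
}
```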
Personally, I would do it like this:
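if ($str =~ /([0-9]+)/) {
    print $1;
}
$1 contains the text matched by the first capture group (the part in round brackets). Check that the match succeeded before using it, because after a failed match $1 still holds whatever an earlier successful match left there.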