how to find multiple regex patterns in a single way using Perl - perl

Question Updated
I have list of (few more) regex patterns like: (Note: Sequence is very Important)
([a-z]+)(\d+)
\}([a-z]+)
([a-z]+)(\+|\-)
([0-9])\](\+|\-)
...
...
my input file like :
\ce{CO2}
\ce{2CO}
\ce{H2O}
\ce{Sb2O3}
...
...
In my code I am finding the each and every regex patterns like
if($string=~m/([a-z]+)(\d+)/g) { my statements ... }
if($string=~m/\}([a-z]+)/g) { my statements ... }
if($string=~m/([a-z]+)(\+|\-)/g) { my statements ... }
if($string=~m/([0-9])\](\+|\-)/g) { my statements ... }
Instead of doing the above code Is there any other way to simplify the code?
Could you someone please share your thoughts for my improvement for better coding.

Disclaimer: Your question is very hard to read, so this is pretty much guesswork. I am not sure I understand what you want to do.
When you are processing data in a dynamic way, a typical approach is to use a dispatch table. We can do something similar here. Often a hash or hash reference is used for that, but since we want a specific order, I will be using an array instead.
my #dispatch = (
{
pattern => qr/f(o)(o)/,
callback => sub {
my ($one, $two) = #_;
print "Found $one and $two\n";
},
},
{
pattern => qr/(bar)/,
callback => sub {
my $capture = shift;
print "Saw $capture";
},
},
);
This basically is a list of search patterns and associated instructions. Each pattern has a callback, which is a code reference. I decided it would make sense to pass in the capture variables, because your patterns have capture groups.
Now in order to call them, we iterate over the dispatch array, match the pattern and then call the associated callback, passing in all the captures.
my $text = "Foo bar foo bar baz.";
foreach my $search (#dispatch) {
if ($text =~ $search->{pattern}) {
$search->{callback}->(#{^CAPTURE}); # this requires Perl 5.26
}
}
Please note that I am using #{^CAPTURE}, which was added to Perl in version 5.25.7, so you would require at least the stable Perl 5.26 release to use it. (On an older Perl, my #capture = $t =~ $search->{pattern} and $search->{callback}->(#capture) will behave similarly).
This is way more elegant than having a list of if () {} statement because it's very easy to extend. The dispatch table could be created on the fly, based on some input, or entirely read from disk.
When we run this code, it creates the following output
Found o and o
Saw bar
This is not very spectacular, but you should be able to adapt it to your patterns. On the other hand I don't know what you are actually trying to do. If you wanted to modify the string instead of only matching, you might need additional arguments for your callbacks.
If you want to learn more about dispatch tables, I suggest you read the second chapter of Mark Jason Dominus' excellent book Higher Order Perl, which is available for free as a PDF on his website.

Your question is hard to read, mainly because you have the /g at the end of your regex searches (which returns a list), however, you only check if it matches once.
I'm making the following assumptions
All matches are required
the code can be a single or double match
both groups captured in one line
i think you want
while ( $string =~ /(([a-z]+)(\d+)|\}([a-z]+)|([a-z]+)(\+|\-)|([0-9])\](\+|\-))/g )
{
#$1 has the whole match
#$2 has the first group if defined
#$3 has the second group if defined
}
However, I prefer the method below. this will capture in one line
while ($string =~ /([a-z]+\d+|\}[a-z]+|[a-z]+\+|\-|[0-9]\]\+|\-)/g )
{
# in here split the match if required
}
I recommend you use regex comments to make this clearer.
if you just want a single match, use
if(
$string=~m/([a-z]+)(\d+)/ ||
$string=~m/\}([a-z]+)/ ||
$string=~m/([a-z]+)(\+|\-)/ ||
$string=~m/([0-9])\](\+|\-)/
)
{
#some code
}

Related

Perl sub returns a subroutine

I haven't used Perl for around 20 years, and this is confusing me. I've g******d for it, but I obviously haven't used a suitable search string because I haven't found anything relating to this...
Why would I want to do the following? I understand what it's doing, but the "why" escapes me. Why not just return 0 or 1 to begin with?
I'm working on some code where a sub uses "return sub"; here's a very truncated example e.g.
sub test1 {
$a = shift #_;
if ($a eq "cat") {
return sub {
print("cat test OK\n");
return 0;
}
}
# default if "cat" wasn't the argument
return sub {
print("test for cat did not work\n");
return 1;
}
}
$c = test1("cat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
$c = test1("bat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
In your code there is no reason to return a sub. However, with a little tweak
sub test1 {
my $animal = shift #_;
if ($animal eq "cat" || $animal eq "dog") {
return sub {
print("$animal test OK\n");
return 0;
};
}
# default if "cat" or "dog" wasn't the argument
return sub {
print("test for cat or dog did not work\n");
return 1;
};
}
We now have a closure around $animal this saves memory as the test for cat and dog share the same code. Note that this only works with my variables. Also note that $a and $b are slightly special to Perl, they are used in the block of code that you can pass to the sort function and bypass some of the checks on visibility so it's best to avoid them for anything except sort.
You probably want to search "perl closures".
There are many reasons that you'd want to return a code reference, but it's not something I can shortly answer in a StackOverflow question. Mark Jason Dominus's Higher Order Perl is a good way to expand your mind, and we cover a little of that in Intermediate Perl.
I wrote File::Find::Closures as a way to demonstrate this is class. Each subroutine in that module returns two code references—one for the callback to File::Find and the other as a way to access the results. The two share a common variable which nothing else can access.
Notice in your case, you aren't merely calling a subroutine to "get a zero". It's doing other things. Even in your simple example there's some output. Some behavior is then deferred until you actually use the result for something.
Having said that, we have no chance of understanding why the programmer who wrote your particular code did it. One plausible guess was that the system was set up for more complex situations and you're looking at a trivial example that fits into that. Another plausible guess was that the programmer just learned that feature and fell in love with it then used it everywhere for a year. There's always a reason, but that doesn't mean there's always a good reason.

TreeModelFilter in GTK/Perl - Question about set_visible_func

I am trying to filter a liststore using the GTK2::TreeModelFilter. I can't seem to find an example online that uses perl and I am getting syntax errors. Can someone help me with the syntax below? The $unfiltered_store is a liststore.
$filtered_store = Gtk2::TreeModeFilter->new($unfiltered_store);
$filtered_store->set_visible_func(get_end_products, $unfiltered_store);
$combobox = Gtk2::ComboBoxEntry->new($filtered_store,1);
Then somewhere below:
sub get_end_products {
my ($a, $b) = #_;
warn(Dumper(\$a));
warn(Dumper(\$b));
return true; # Return all rows for now
}
Ultimately what I want to do is look at column 14 of the listore ($unfiltered_store) and if it is a certain value, then it filtered into the $filtered_store.
Can someone help me with the syntax on this? I checked a bunch of sites, but they're in other languages and using different syntax (like 'new_filter' -- doesn't exist with Perl GTK).
This is the most elegant solution to a fix I need to make and I would prefer to learn how to use this rather than using a brute force method of pulling and saving the filtered data.
The set_visible_func method of the filtered store should get a sub reference as the first argument, but you are not passing a sub reference here:
$filtered_store->set_visible_func(get_end_products, $unfiltered_store);
This will instead call the sub routine get_end_products and then pass on its return value (which is not a sub reference). To fix it add the reference operator \& in front of the sub name:
$filtered_store->set_visible_func(\&get_end_products, $unfiltered_store);
Regarding your other question in the comments:
According to the documentation the user data parameter is passed as the third parameter to get_end_products, so you should define it like this:
sub get_end_products {
my ($model, $iter, $user_data) = #_;
# Do something with $user_data
return TRUE;
}
If for some reason $unfiltered_store is not passed on to get_end_products, you can try pass it using an anonymous sub instead, like this:
$filtered_store->set_visible_func(
sub { get_end_products( $unfiltered_store) });

Trigger array element calculation on access

Is there any Perl API that would allow me to execute code on the reading of an array element? I'm thinking something like (or maybe it can?) Variable::Magic and how would I do it? The end objective would be to essentially recalculate the element value on any access (lazy evaluation), but I don't want to constrain functions like grep,map,natatime to be unusable.
There are several modules on CPAN for lazy arrays. Data::Lazy, Variable::Lazy, Tie::Array::Lazy and Variable::Magic.
Data::Lazy and Tie::Array::Lazy both tie. Tying is very slow, about 10 times slower than a normal array, and about 3 times slower than an object. Tying may kill the performance benefits of laziness.
Variable::Lazy is different. Its actually replacing the variable with a chunk of code at compile time using Devel::Declare magic. Unfortunately it appears to only work on scalars. :-/
Variable::Magic is... magic. Its more for annotating variables than controlling them.
I would suggest instead inverting the problem. Write the thing as an object which can be as lazy as it likes. This is faster, more flexible and potentially more featureful and less buggy than a tie. For grep, map and the like, provide an overload it so it can be used as an array ref. The overload won't be lazy, but grep and map must work on the whole list anyway and tie isn't going to do you any better. And object may be able to provide more efficient search and transform methods.
Lazy lists is one of List::Gen's fortés.
You might find this article from brian d foy useful: http://www.effectiveperlprogramming.com/blog/300 . In particular this code does lazy evaluation on and (infinite) tied array.
use 5.012;
{
package Tie::Array::InfiniteSquares;
use parent qw(Tie::Array);
use Carp qw(carp);
use Config qw(%Config);
# mandatory methods
sub TIEARRAY {
bless {}, $_[0];
}
sub FETCH {
my( $self, $index ) = #_;
$index ** 2;
}
sub FETCHSIZE { 0x7F_FF_FF_FF } # still problems here
sub STORE { carp "You can't touch this!" }
sub STORESIZE { carp "You can't touch this!" }
sub EXISTS { 1 }
sub DELETE { carp "You can't touch this!" }
}
tie my #array, 'Tie::Array::InfiniteSquares';
while( my( $index, $value ) = each #array )
{
say "Item is $value at index $index";
}
Now assuming you actual dataset isn't infinite, then when you construct your tied class correctly you can each to do lazy evaluation. map, grep, for etc will evalulate the whole list before acting, but they will still work.

Is there something like `last` for `map`?

In Perl, is it possible to arbitrarily end a map execution, e.g. something equivalent to last in a loop?
It would be a bit like this:
map {
if (test == true) { last; } dosomething
} #myarray
Nope. You can't last, next, etc. out of a map, grep or sort blocks because they are not loops.
Having said that, the code snippet in the question is poor practice because map should never be called in void context. They are supposed to be used to transform one list into another.
Here's a better way to write it if you really must inline it (modified after ysth's comment):
$_ == 10 && last, print for 1..15; # prints '123456789'
No. Use an ordinal foreach loop and the last statement.
Since 5.8.1, map is context aware - in void context, no lists are constructed.
Anyway, map is generally used to get a list from another list, evaluating expr for each element of the original list.
You could use a do-block with a for statement modifier:
do {
last if test;
dosomething;
} for (#myarray);
Though using a foreach block would probably be clearer, and future maintainers of your code will thank you.
foreach (#myarray) {
last if test;
dosomething;
}
You can use a long jump (eval/die pair) to bail out of any nested construct that doesn't directly support it:
eval { map{ die if test; dosomething } #myarray };
But as Zaid said, using a for/foreach loop in this case is better because you are not using the return value of map.
You want a for loop:
foreach ( #myarray ) {
last if test;
...
}
It does the same thing. map is for transforming one list into other lists.
There are map-like constructs that do exactly what you want to do. Take a look at List::Util and List::MoreUtils (conveniently also packaged together as List::AllUtils):
use List::MoreUtils 'first';
# get first element with a {foo} key
my $match = map { $_->{foo} eq 'some string' } #elements;
If you don't want to extract an element(s) from the list, then use foreach, as per the previous answers.
Try goto LABEL. However I do not know how safe is that.

What does "Useless use of a variable in void context" mean in this Perl script?

The following script gives me what I want but Perl also throws me a warning saying "Useless use of a variable in void context". What does it mean?
use strict;
use warnings;
my $example = 'http\u003a//main\u002egslb\u002eku6\u002ecom/c0/q7LmJPfV4DfXeTYf/1260269522170/93456c39545857a15244971e35fba83a/1279582254980/v632/6/28/a14UAJ0CeSyi3UTEvBUyMuBxg\u002ef4v\u002chttp\u003a//main\u002egslb\u002eku6\u002ecom/c1/q7LmJPfV4DfXeTYf/1260269522170/3cb143612a0050335c0d44077a869fc0/1279582254980/v642/10/20/7xo2MJ4tTtiiTOUjEpCJaByg\u002ef4v\u002chttp\u003a//main\u002egslb\u002eku6\u002ecom/c2/q7LmJPfV4DfXeTYf/1260269522170/799955b45c8c32c955564ff9bc3259ea/1279582254980/v652/32/4/6pzkCf4iqTSUVElUA5A3PpMAoA\u002ef4v\u002chttp\u003a//main\u002egslb\u002eku6\u002ecom/c3/q7LmJPfV4DfXeTYf/1260269522170/cebbb619dc61b3eabcdb839d4c2a4402/1279582254980/v567/36/19/MBcbnWwkSJu46UoYCabpvArA\u002ef4v\u002chttp\u003a//main\u002egslb\u002eku6\u002ecom/c4/q7LmJPfV4DfXeTYf/1260269522170/1365c39355424974dbbe4ae8950f0e73/1279582254980/v575/17/15/EDczAa0GTjuhppapCLFjtaQ\u002ef4v';
my #raw_url = $example =~ m{(http\\u003a.+?f4v)}g;
my #processed_url = map {
s{\\u003a}{:}g,$_;
s{\\u002e}{.}g,$_;
s{\\u002d}{#}g,$_;
} #raw_url;
print join("\n",#processed_url);
And why this map thing doesn't work if I omit those dollar underscores like so?
my #processed_url = map {
s{\\u003a}{:}g;
s{\\u002e}{.}g;
s{\\u002d}{#}g;
} #raw_url;
When I omit those dollar underscores, I get nothing except for a possibly success flag "1". What am I missing? Any ideas? Thanks like always :)
What you want is...
my #processed_url = map {
s{\\u003a}{:}g;
s{\\u002e}{.}g;
s{\\u002d}{#}g;
$_;
} #raw_url;
A map block returns the value composed of the last statement evaluated as its result. Thats why we pass the $_ as the last statement. The substitution operator s{}{} returns the number of substitutions made.
In your prior setup, you had by itself the following statement. Which is pretty much meaningless and that is what Perl is warning about.
s{\\u003a}{:}g, $_;
You already have the answer you were looking for, but I wanted to point out a subtlety about using the substitution operator inside a map block: your original array is also being modified. If you want to preserve the original array, one way to do it is to make a copy of the array, then modify only the copy:
my #processed_url = #raw_url;
for (#processed_url) {
s{\\u003a}{:}g;
s{\\u002e}{.}g;
s{\\u002d}{#}g;
}
Or, if you only need one array, and you want the original to be modified:
for (#raw_url) {
s{\\u003a}{:}g;
s{\\u002e}{.}g;
s{\\u002d}{#}g;
}