Check if a given string matches one of a set of prefixes, efficiently - Perl

What algorithm can check whether a given string matches one of a set of prefixes, and report which prefix from that set matched?
A variation: given a path and a set of directories, how do I check whether the path lies in one of those directories (assuming that there are no symbolic links, or that they do not matter)?
I'm interested in a description or the name of an algorithm, or in a Perl module which solves this (or can be used to solve it).
Edit
Bonus points for a solution which also efficiently finds the 'is prefix of' relation within a set of strings (a set of directories).
For example, given the set of directories foo, foo/bar, foo/baz, quux, baz/quux, baz/quux/plugh, the algorithm should find that foo is a prefix of foo/bar and foo/baz, and that baz/quux is a prefix of baz/quux/plugh... hopefully without O(n^2) time.

The efficient way to do this would be using a Trie:
http://en.wikipedia.org/wiki/Trie
There is a package for it on CPAN:
https://metacpan.org/pod/Tree::Trie
(never used that package myself though)
You need to consider which operations need to be the most efficient. Lookup is very cheap in a trie, but if you build the trie for only one lookup, it might not be the fastest way...
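As a rough illustration of the idea, here is a minimal hand-rolled trie built from nested hashes. It does not use Tree::Trie; the function names build_trie and longest_prefix are made up for this sketch, and it assumes Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Build a character trie from nested hashes. The multi-character key
# 'end' cannot collide with the single-character keys used for edges.
sub build_trie {
    my @prefixes = @_;
    my %trie;
    for my $prefix (@prefixes) {
        my $node = \%trie;
        for my $char (split //, $prefix) {
            $node = $node->{$char} //= {};
        }
        $node->{end} = $prefix;    # a complete prefix ends at this node
    }
    return \%trie;
}

# Walk the trie along $string and return the longest stored prefix
# of it, or undef if none matches.
sub longest_prefix {
    my ($trie, $string) = @_;
    my $node  = $trie;
    my $match = $trie->{end};      # defined only if '' was stored
    for my $char (split //, $string) {
        $node = $node->{$char} or last;
        $match = $node->{end} if exists $node->{end};
    }
    return $match;
}

my $trie = build_trie(qw(foo foo/bar baz/quux));
for my $path ('foo/bar/file.txt', 'quux/file.txt') {
    my $prefix = longest_prefix($trie, $path);
    print "$path: ", $prefix // 'no match', "\n";
}
```

A lookup costs time proportional to the length of the string, regardless of how many prefixes are stored. Note that this matches character by character, so foo would also count as a prefix of foobar; for the directory variant you would probably split on / and use path components as the edge keys instead.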

The first function in the List::Util Core module can find if a prefix matches a string. It searches through the list of prefixes, and returns as soon as it finds a match. It does not search through the whole list if it is not necessary:
first returns the first element where the result from BLOCK is a true value. If BLOCK never returns true or LIST was empty then undef is returned.

You pose an interesting question, but as I went out to look for such a thing (in List::MoreUtils for example), I kept coming back to, how is this any different than a grep. So here it is, my basic implementation based on grep. If you don't mind searching the whole list, or want all the matches here is an example:
#!/usr/bin/perl
use strict;
use warnings;
my @prefixes = qw/ pre1 pre2 pre3 /;
my $test = 'pre1fixed';
# \Q...\E quotes any regex metacharacters in the prefix
my @found = grep { $test =~ /^\Q$_\E/ } @prefixes;
print "$_ is a prefix of $test\n" for @found;
I also imagine that there must be some way to use the smart-match operator ~~ to do this in a short-circuiting way. Also, as toolic points out, the List::Util function first could be used for this too; it stops the search as soon as a match is found.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/first/;
my @prefixes = qw/ pre1 pre2 pre3 /;
my $test = 'pre1fixed';
my $found = first { $test =~ /^\Q$_\E/ } @prefixes;
print "$found is the prefix of $test\n";
The only algorithm I am aware of is the Aho-Corasick though I will leave it as an exercise to the reader (i.e. I don't know) to see if this will help you. I see that there is a module (Algorithm::AhoCorasick). I also believe I have read somewhere that this and trie structures are implemented in Perl's matching under certain circumstances. Perhaps someone knows where I read that? Edit: found it in SO question on matching alternatives
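To try the alternation route mentioned above, one possible sketch is to join the quoted prefixes into a single precompiled regex and let the regex engine do the matching. Whether Perl actually compiles this particular pattern to a trie internally is an assumption I have not verified; the prefix list and paths are sample data.

```perl
use strict;
use warnings;

my @prefixes = qw(foo foo/bar baz/quux);

# Longest prefixes first, so the alternation prefers the most specific one.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @prefixes;
my $prefix_re = qr/^($alternation)/;

for my $path (qw(foo/bar/a.txt baz/quux/plugh other/dir)) {
    if ($path =~ $prefix_re) {
        print "$path starts with $1\n";
    }
    else {
        print "$path matches no prefix\n";
    }
}
```

The regex is compiled once and reused for every path, which suits the case of many lookups against a fixed set of prefixes.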


Change ref of hash in Perl

I ran into this and couldn't find the answer. I am trying to see if it is possible to "change" the reference of a hash. In other words, I have a hash, and a function that returns a hashref, and I want to make my hash point to the location in memory specified by this ref, instead of copying the contents of the hash it points to. The code looks something like this:
%hash = $h->hashref;
My obvious guess was that it should look like this:
\%hash = $h->hashref;
but that gives the error:
Can't modify reference constructor in scalar assignment
I tried a few other things, but nothing worked. Is what I am attempting actually possible?
An experimental feature which would seemingly allow you to do exactly what you're describing has been added to Perl 5.21.5, which is a development release (see "Aliasing via reference").
It sounds like you want:
use Data::Alias;
alias %hash = $h->hashref;
Or if %hash is a package variable, you can instead just do:
*hash = $h->hashref;
But either way, this should almost always be avoided; simply use the hash reference.
This question is really old, but Perl now allows this sort of thing as an experimental feature:
use v5.22;
use experimental qw(refaliasing);
my $first = {
foo => 'bar',
baz => 'quux',
};
\my %hash = $first;
Create named variable aliases with ref aliasing
Mix assignment and reference aliasing with declared_refs
Yes, but…
References in Perl are scalars. You are trying to alias the return value. This actually is possible, but you should not do this, since it involves messing with the symbol table. Furthermore, this only works for globals (declared with our): If you assign a hashref to the glob *hash it will assign to the symbol table entry %hash:
#!/usr/bin/env perl
use warnings;
use strict;
sub a_hashref{{a => "one", b => "two"}}
our %hash;
*hash = a_hashref;
printf "%3s -> %s\n", $_, $hash{$_} foreach keys %hash;
This is bad style! It isn't in PBP (directly, but consider section 5.1: “non-lexicals should be avoided”) and won't be reported by perlcritic, but you shouldn't pollute the package namespace for a little syntactic fanciness. Furthermore it doesn't work with lexical variables (which is what you might want to use most of the time, because they are lexically scoped, not package wide).
Another problem is that if the $h->hashref method changes its return type, you'll suddenly assign to another symbol table entry! (So if $h->hashref changes its return type to an arrayref, you assign to @hash; good luck detecting that.) You could circumvent that by checking whether $h->hashref really returns a hashref with 'HASH' eq ref $h->hashref, but that would defeat the purpose.
What is the problem with just keeping the reference? If you get a reference, just store it in a scalar:
$hash = $h->hashref
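For illustration, working through the reference looks roughly like this. The a_hashref sub is a stand-in for $h->hashref, which is not shown in the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $h->hashref: returns a reference to a hash.
sub a_hashref { return { a => 'one', b => 'two' } }

my $hash = a_hashref();        # no copy: $hash refers to the returned hash

print "a is $hash->{a}\n";     # element access through the reference
$hash->{c} = 'three';          # modifies the hash the reference points to

# Dereference with %$hash wherever a real hash is expected.
printf "%s -> %s\n", $_, $hash->{$_} for sort keys %$hash;
```

The only syntactic cost is the arrow for element access and %$hash where a whole hash is needed.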
To read more about the global symbol table, take a look at perlmod and consider perlref for the *FOO{THING} syntax, which sadly isn't for lvalues.
To achieve what you want, you could check out the several aliasing modules on CPAN. Data::Alias or Lexical::Alias seem to fit your purpose. Also, if you are interested in tie semantics and/or don't want to use XS modules, Tie::Alias might be worth a shot.

Perl: Grep in an array

I have an array in the below format
array
Link-IF-A<->IF-B
Link-IF-C<->IF-D
Link-IF-E<->IF-F
Link-IF-G<->IF-H
Link-IF-I<->IF-J
I am trying to search for the interface "IF-D", but the result always shows as 0.
I want to see 1 when it matches, else 0. I have tried all the methods below, but every time the result is 0.
$link = IF-D
method1 :
my $result = grep /$link/,@array;
method 2:
my $result = grep /^$link,/,@array;
method3 :
my $result = grep(/^$link$/, @array)
Thanks
Your second and third methods can never match, as none of your target strings begin with, or contain only IF-D. The second method uses ^, anchoring to the beginning of your target string, and the third method contains both ^ and $ mandating that the pattern match the entire target string, not just some portion of it. So those will always fail (it appears that you're just trying things at random to see if they work, and especially in the case of regular expressions, that's not a good way to accomplish the goal.)
The first example will match one time; the 2nd element, because the pattern IF-D matches at the end of the target string Link-IF-C<->IF-D. However, it's only going to work if your target string and your pattern are what you think they are. In the example code you showed us, the pattern string wasn't wrapped in quotes. It must be.
So, for example, this will do what you seem to want:
my $link = "IF-D";
my @array = qw(
Link-IF-A<->IF-B
Link-IF-C<->IF-D
Link-IF-E<->IF-F
Link-IF-G<->IF-H
Link-IF-I<->IF-J
);
my $found = grep /\Q$link/, @array;
print "$found\n"; # 1
The \Q isn't strictly necessary for the pattern you've demonstrated. That construct forces the contents of $link to be treated literally, rather than possibly as metasymbols. Your example pattern doesn't contain any metasymbols, but if it accidentally did, the \Q would de-meta them.
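A small sketch of the difference \Q makes, using a made-up interface name that contains a metasymbol (the dot) and made-up array contents:

```perl
use strict;
use warnings;

my @array = qw(Link-IF-A<->IF-B Link-IF-CxD<->IF-E);
my $link  = 'IF-C.D';   # hypothetical name containing a metasymbol

my $unquoted = grep { /$link/ }   @array;   # "." matches any character
my $quoted   = grep { /\Q$link/ } @array;   # "." must be a literal dot

print "without \\Q: $unquoted, with \\Q: $quoted\n";
```

Without \Q the dot matches the x in IF-CxD, so the count is 1; with \Q nothing matches and the count is 0.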
If you think you've implemented something semantically equal to this example, and yet it's not working, then you've found exactly why people ask that those asking questions post a small self-contained snippet of code that demonstrates the behavior they're describing. If my example code doesn't clear up the problem, boil your code down to a single snippet that demonstrates the problem, and add it as an update to your question so that we can run it ourselves and see exactly what you're talking about.
You must use double quote or single quote in your assignment:
$link = "IF-D";
instead of $link = IF-D.
use strict;
use warnings;
my @array = qw(
Link-IF-A<->IF-B
Link-IF-C<->IF-D
Link-IF-E<->IF-F
Link-IF-G<->IF-H
Link-IF-I<->IF-J
);
my $link = "IF-D";
print scalar grep /$link/, @array;

To match for a certain number

I have a file which has a lot of floating point numbers like this:
4.5268e-06 4.5268e-08 4.5678e-01 4.5689e-04...
I need to check if there is at least one number with an exponent of -1. So, I wrote this short snippet with the regex. The regex works because I checked and it does. But what I am getting in the output is all 1s. I know I am missing something very basic. Please help.
#!usr/local/bin/perl
use strict;
use warnings;
my $i;
my @values;
open(WPR,"test.txt")||die "couldnt open $!";
while(<WPR>)
{
chomp();
push @values,(/\d\.\d\d\d\de+[+-][0][1]/);
}
foreach $i (@values){
print "$i\n";}
close(WPR);
The regular expression match operator m (which you have omitted) returns true if it matches. True in Perl is usually returned as 1. (Note that most stuff is true, though).
If you want to stick with the short syntax, do this:
push @values, $1 if /(\d\.\d\d\d\de+[+-][0][1])/;
If I move the parentheses, it works fine:
push @values,/(\d\.\d\d\d\de+[+-][0][1])/;
If there's going to be more than one match on the line, I'd add a g at the end.
If you have capture groups, and a list context, then match returns a list of capture results.
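The three behaviours can be compared side by side. This is only a sketch with a made-up input line and a simplified pattern:

```perl
use strict;
use warnings;

my $line = '4.5268e-06 4.5678e-01 1.2345e-01';

my @no_group = $line =~ /\d\.\d{4}e-01/;      # no captures: list is (1)
my @one      = $line =~ /(\d\.\d{4}e-01)/;    # first captured match only
my @all      = $line =~ /(\d\.\d{4}e-01)/g;   # every captured match

print "no group: @no_group\n";
print "one:      @one\n";
print "all:      @all\n";
```

The first form is what the original code does, which is why the output was all 1s.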
If you want to take this to its insane conclusion then:
my @values = map { /(\d\.\d\d\d\de+[+-][0][1])/g } <WPR> ;
Yes, you can use <WPR> in a list context too.
BTW, while your regex works, it probably isn't exactly what you meant. For example e+ matches one or more es. A little simpler might be:
/\d\.\d{4}e[+-]01/ ;
Which is still going to have other issues like matching x.xxxxe+01 as well.
You could try with this one:
/\d+\.\d+e-01/

Perl extract matches from list

I'm fairly new to perl but not to scripting languages. I have a file, and I'm trying to extract just one portion of each line that matches a regex. For example, given the file:
FLAG(123)
FLAG(456)
Not a flag
FLAG(789)
I'd like to extract the list [123, 456, 789]
The regex is obviously /^FLAG\((\w+)/. My question is, what's an easy way to extract this data in perl?
It's obviously not hard to set up a loop and do a bunch of =~ matches, but I've heard quite a bit about perl's terseness and how it has an operator for everything, so I'm wondering if there's a slick, simple way to do this.
Also, can you point me towards a good perl reference where I can find out slick ways to do other things like this when the opportunity next arises? There are many perl resources on the web, but 90% of them are too simple and the other 10% I seem to lose the signal in the noise.
Thanks!
Here's how I would do it... Did you learn anything new and/or helpful?
my $file_name = "somefile.txt";
open my $fh, '<', $file_name or die "Could not open file $file_name: $!";
my @list;
while (<$fh>)
{
push @list, $1 if /^FLAG\((\w+)/;
}
Things worth pointing out:
In a while loop condition (and ONLY in a while loop condition), reading from a filehandle will set the value to $_ and check that the file was read automatically.
A statement can be modified by attaching an if, unless, for, foreach, while, or until to the end of it. Then it works as a conditional or loop on that one statement.
You probably know that regex capture groups are stored in $1, $2, etc., but you might not have known that they can be used even in a statement with an if suffix. The if is evaluated first, so push @list, $1 if /some_regex/ makes sense: it does the match first, assigning to $1 before it is needed in the push statement.
Assuming that you have all of the data together in a single string:
my @matches = $data =~ /^FLAG\((\w+)/mg;
The /g modifier means to match as many times as possible, the /m makes ^ match after any newline (not only at the beginning of the string) and a match in list context returns all of the captures for all of those matches.
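Here is a self-contained version using the sample lines from the question, with the same match repeated without /m to show what that modifier contributes. The variable names are only for this sketch.

```perl
use strict;
use warnings;

# The question's sample file, held in a single string.
my $data = join "\n", 'FLAG(123)', 'FLAG(456)', 'Not a flag', 'FLAG(789)';

my @matches    = $data =~ /^FLAG\((\w+)/mg;   # ^ matches at every line start
my @first_only = $data =~ /^FLAG\((\w+)/g;    # ^ matches only at string start

print "with /m:    @matches\n";
print "without /m: @first_only\n";
```

With /m all three flag values are captured; without it only 123 is found, because ^ can match only at the very beginning of the string.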
If you're reading the data in line-by-line then Platinum Azure's solution is the one you want.
map is your friend here.
use strict;
use warnings;
use File::Slurp;
my @matches = map { /^FLAG\((\w+)/ } read_file('file.txt');

Why can't I set $LIST_SEPARATOR in Perl?

I want to set the LIST_SEPARATOR in perl, but all I get is this warning:
Name "main::LIST_SEPARATOR" used only once: possible typo at ldapflip.pl line 7.
Here is my program:
#!/usr/bin/perl -w
@vals;
push @vals, "a";
push @vals, "b";
$LIST_SEPARATOR='|';
print "@vals\n";
I am sure I am missing something obvious, but I don't see it.
Thanks
Only the mnemonic is available
$" = '|';
unless you
use English;
first.
As described in perlvar. Read the docs, please.
The following names have special meaning to Perl. Most punctuation names have reasonable mnemonics, or analogs in the shells. Nevertheless, if you wish to use long variable names, you need only say
use English;
at the top of your program. This aliases all the short names to the long names in the current package. Some even have medium names, generally borrowed from awk. In general, it's best to use the
use English '-no_match_vars';
invocation if you don't need $PREMATCH, $MATCH, or $POSTMATCH, as it avoids a certain performance hit with the use of regular expressions. See English.
perlvar is your friend:
• $LIST_SEPARATOR
• $"
This is like $, except that it applies to array and slice values interpolated into a double-quoted string (or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)
$LIST_SEPARATOR is only available if you use English;. If you don't want to use English; in all your programs, use $" instead. Same variable, just with a more terse name.
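A short sketch of the long name in use, with the array from the question. The do block and the $joined variable are only there to keep the change local to one interpolation.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

my @vals = qw(a b);

my $joined = do {
    local $LIST_SEPARATOR = '|';   # same variable as $"
    "@vals";
};

print "$joined\n";   # a|b
print "@vals\n";     # a b (default separator restored)
```

Using local restores the default separator when the block ends, so other interpolations elsewhere in the program are unaffected.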
Slightly off-topic (the question is already well answered), but I don't get the attraction of English.
Cons:
A lot more typing
Names not more obvious (ie, I still have to look things up)
Pros:
?
I can see the benefit for other readers - especially people who don't know Perl very well at all. But in that case, if it's a question of making code more readable later, I would rather this:
{
local $" = '|'; # Set interpolated list separator to '|'
# fun stuff here...
}
You SHOULD use the strict pragma:
use strict;
You might also want to use the diagnostics pragma to get additional hints about the warnings (which you already have enabled with the -w flag):
use diagnostics;