To match for a certain number - perl

I have a file which has a lot of floating point numbers like this:
4.5268e-06 4.5268e-08 4.5678e-01 4.5689e-04...
I need to check whether there is at least one number with an exponent of -1. So I wrote this short snippet with a regex. I have checked the regex and it does match. But all I get in the output is 1s. I know I am missing something very basic. Please help.
#!usr/local/bin/perl
use strict;
use warnings;
my $i;
my @values;
open(WPR,"test.txt")||die "couldnt open $!";
while(<WPR>)
{
chomp();
push @values,(/\d\.\d\d\d\de+[+-][0][1]/);
}
foreach $i (@values){
print "$i\n";}
close(WPR);

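You have omitted the m, but /.../ is still the regular expression match operator. With no capture groups in the pattern, a successful match in list context (which push supplies) returns the list (1), and a failed match returns the empty list. That is why you get a 1 for every line that matches.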
If you want to stick with the short syntax, do this:
push @values, $1 if /(\d\.\d\d\d\de+[+-][0][1])/;

If I move the parentheses, it works fine:
push @values,/(\d\.\d\d\d\de+[+-][0][1])/;
If there's going to be more than one match on the line, I'd add a g at the end.
If you have capture groups, and a list context, then match returns a list of capture results.
If you want to take this to its insane conclusion then:
my @values = map { /(\d\.\d\d\d\de+[+-][0][1])/g } <WPR> ;
Yes, you can use <WPR> in a list context too.
BTW, while your regex works, it probably isn't exactly what you meant. For example, e+ matches one or more e characters, not an e followed by a plus sign. A little simpler might be:
/\d\.\d{4}e[+-]01/ ;
Which is still going to have other issues like matching x.xxxxe+01 as well.
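To see the difference the capture group makes, here is a minimal sketch that uses an inline sample line instead of test.txt (the sample line and variable names are assumptions for illustration). It also restricts the exponent to -01, which avoids the e+01 issue mentioned above.

```perl
use strict;
use warnings;

# Hypothetical sample line standing in for one line of test.txt.
my $line = '4.5268e-06 4.5268e-08 4.5678e-01 4.5689e-04';

# No capture group: a successful match in list context returns (1).
my @flags = $line =~ /\d\.\d{4}e-01/;

# Capture group plus /g: returns every captured number on the line.
my @numbers = $line =~ /(\d\.\d{4}e-01)/g;

print "without captures: @flags\n";      # prints "without captures: 1"
print "with captures:    @numbers\n";    # prints "with captures:    4.5678e-01"
```

If only a yes/no answer is needed, testing whether @numbers is non-empty is enough.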

You could try with this one:
/\d+\.\d+e-01/

Related

Perl: Grep in an array

I have an array in the below format
array
Link-IF-A<->IF-B
Link-IF-C<->IF-D
Link-IF-E<->IF-F
Link-IF-G<->IF-H
Link-IF-I<->IF-J
I am trying to search for the interface "IF-D", but the value always shows as 0.
I want to see 1 when it matches, else 0. I have tried all of the methods below, but every time the result is 0.
$link = IF-D
method 1:
my $result = grep /$link/,@array;
method 2:
my $result = grep /^$link,/,@array;
method 3:
my $result = grep(/^$link$/, @array)
Thanks
Your second and third methods can never match, as none of your target strings begin with, or contain only IF-D. The second method uses ^, anchoring to the beginning of your target string, and the third method contains both ^ and $ mandating that the pattern match the entire target string, not just some portion of it. So those will always fail (it appears that you're just trying things at random to see if they work, and especially in the case of regular expressions, that's not a good way to accomplish the goal.)
The first example will match one time; the 2nd element, because the pattern IF-D matches at the end of the target string Link-IF-C<->IF-D. However, it's only going to work if your target string and your pattern are what you think they are. In the example code you showed us, the pattern string wasn't wrapped in quotes. It must be.
So, for example, this will do what you seem to want:
my $link = "IF-D";
my @array = qw(
Link-IF-A<->IF-B
Link-IF-C<->IF-D
Link-IF-E<->IF-F
Link-IF-G<->IF-H
Link-IF-I<->IF-J
);
my $found = grep /\Q$link/, @array;
print "$found\n"; # 1
The \Q isn't strictly necessary for the pattern you've demonstrated. That construct forces the contents of $link to be treated literally, rather than possibly as metasymbols. Your example pattern doesn't contain any metasymbols, but if it accidentally did, the \Q would de-meta them.
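As an illustration of what \Q protects against, here is a small sketch with made-up interface names (both the names and the variables are assumptions, not taken from the question) where the search string contains a regex metacharacter.

```perl
use strict;
use warnings;

# Hypothetical interface names; the '.' in $link is a regex metacharacter.
my @array = qw( Link-IF-A<->IFxD Link-IF-C<->IF.D );
my $link  = 'IF.D';

# Unquoted: '.' matches any character, so both elements match.
my $loose = grep { /$link/ } @array;

# With \Q: '.' is taken literally, so only the second element matches.
my $exact = grep { /\Q$link\E/ } @array;

print "unquoted: $loose, quoted: $exact\n";    # prints "unquoted: 2, quoted: 1"
```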
If you think you've implemented something semantically equal to this example, and yet it's not working, then you've found exactly why people ask that those asking questions post a small self-contained snippet of code that demonstrates the behavior they're describing. If my example code doesn't clear up the problem, boil your code down to a single snippet that demonstrates the problem, and add it as an update to your question so that we can run it ourselves and see exactly what you're talking about.
You must use double quote or single quote in your assignment:
$link = "IF-D";
instead of $link = IF-D.
use strict;
use warnings;
my @array = qw(
Link-IF-A<->IF-B
Link-IF-C<->IF-D
Link-IF-E<->IF-F
Link-IF-G<->IF-H
Link-IF-I<->IF-J
);
my $link = "IF-D";
print scalar grep /$link/, @array;

How to get rid of use of an uninitialized value within an 'if' construct using a Perl regex

How do I get rid of use of an uninitialized value within an if construct using a Perl regex?
When using the code below, I get use of uninitialized value messages.
if($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/)
When using the code below, I get no output.
if(defined($arrayOld[$i]) =~ /-(.*)/ || defined($arrayOld[$i]) =~ /\#(.*)/)
What is the proper way to check if a variable has a value given the code above?
Try:
if(defined $arrayOld[$i] && $arrayOld[$i] =~ /[-#](.*)/)
This first checks that $arrayOld[$i] is defined before running a regex against it.
(I have also combined the two patterns into a single character class, so $1 is set whichever character matched.)
From the error message in your comment, you're accessing an element of @arrayOld that isn't defined. Without seeing the rest of the code, this could indicate a bug in your program, or it could just be expected behavior.
If you understand why $arrayOld[$i] is undef, and you want to allow that without getting a warning, there's a couple of things you can do. Perl 5.10.0 introduced the defined-or operator //, which you can use to substitute the empty string for undef:
use 5.010;
...
if(($arrayOld[$i] // '') =~ /-(.*)/ || ($arrayOld[$i] // '') =~ /\#(.*)/)
Or, you can just turn off the warning:
if (do { no warnings 'uninitialized';
$arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/ })
Here, I'm using do to limit the time the warning is disabled. However, turning off the warning also suppresses the warning you'd get if $i were undef. Using // allows you to specify exactly what is allowed to be undef, and exactly what value should be used instead of undef.
Note: defined($arrayOld[$i]) =~ /-(.*)/ is running a pattern match on the result of the defined function, which is just going to be a true/false value; not the string you want to test.
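The difference can be seen in a short sketch (the three sample elements are assumptions chosen for illustration): the first test matches against the boolean returned by defined, the second tests definedness and then matches the element itself.

```perl
use strict;
use warnings;

# Hypothetical data: a matching line, an undefined slot, a non-matching line.
my @arrayOld = ('- ccc', undef, 'plain');
my (@wrong, @right);

for my $i (0 .. $#arrayOld) {
    # defined() returns a boolean, so the pattern is applied to that
    # true/false value, never to the element itself.
    push @wrong, (defined($arrayOld[$i]) =~ /-(.*)/) ? 1 : 0;

    # Check definedness first, then match the element.
    push @right, (defined $arrayOld[$i] && $arrayOld[$i] =~ /-(.*)/) ? 1 : 0;
}

print "wrong: @wrong\n";    # prints "wrong: 0 0 0"
print "right: @right\n";    # prints "right: 1 0 0"
```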
To answer your question narrowly, you can prevent undefined-value warnings in that line of code with
if (defined $i && defined $arrayOld[$i]
&& ($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/))
{
...;
}
That is, evaluating either $i or the expression $arrayOld[$i] may result in an undefined value. Note the additional layer of parentheses that are necessary as written above because of the difference in precedence between && and ||, with the former binding more tightly. For the particular patterns in your question, you could sidestep this precedence issue by combining your patterns into one regex, but this can be tricky to do in the general case.
I recommend against using the unpleasing code above. Read on to see an elegant solution to your problem that has Perl do the work for you and is much easier to read.
Looking back
From the slightly broader context of your earlier question, $i is a loop variable and by construction will certainly be defined, so testing $i is overkill. Your code blindly pulls elements from @arrayOld, and Perl happily obliges. In cases where nothing is there, you get the undefined value.
This sort of one-by-one peeking and poking is common in C programs, but in Perl, it is almost always a red flag that you could express your algorithm more elegantly. Consider the complete, working example below.
Working demonstration
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # given/when
*FILEREAD = *DATA; # for demo only
my @interesting_line = (qr/-(.*)/, qr/\#(.*)/);
$/ = ""; # paragraph mode
while(<FILEREAD>) {
  chomp;
  my @arrayOld = split /\n/;
  my @arrayNewLines;
  for (1 .. @arrayOld) {
    given (shift @arrayOld) {
      push @arrayNewLines, $_ when @interesting_line;
      push @arrayOld, $_;
    }
  }
  print "\@arrayOld:\n", map("$_\n", @arrayOld), "\n",
        "\@arrayNewLines:\n", map("$_\n", @arrayNewLines);
}
__DATA__
#SCSI_test # put this line into @arrayNewLines
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
- ccccccccccccccc # put this line into @arrayNewLines
Front matter
The line
use 5.10.0;
enables Perl’s given/when switch statement, and this makes for a nice way to decide which array gets a given line of input.
As the comment indicates
*FILEREAD = *DATA; # for demo only
is for the purpose of this Stack Overflow demonstration. In your real code, you have open FILEREAD, .... Placing the input from your question into Perl’s DATA filehandle allows presenting code and input in one self-contained unit, and then we alias FILEREAD to DATA so the rest of the code will drop into yours with no fuss.
The main event
The core of the processing is
for (1 .. @arrayOld) {
  given (shift @arrayOld) {
    push @arrayNewLines, $_ when @interesting_line;
    push @arrayOld, $_;
  }
}
Notice that there are no defined checks or even explicit regex matches! There’s no $i or $arrayOld[$i]! What’s going on?
You start with @arrayOld containing all the lines from the current paragraph and want to end with the interesting lines in @arrayNewLines and everything else staying in @arrayOld. The code above takes the next line out of @arrayOld with shift. If the line is interesting, we push it onto the end of @arrayNewLines. Otherwise, we put it back on the end of @arrayOld.
The statement modifier when @interesting_line performs an implicit smart-match with the topic from given. As explained in “Smart matching in detail,” when smart matching against an array, Perl implicitly loops over it and stops on the first match. In this case, the array @interesting_line contains compiled regexes that match lines you want to move to @arrayNewLines. If the current line (in $_ thanks to given) does not match any of those patterns, it goes back in @arrayOld.
We do the preceding process exactly scalar @arrayOld times, that is, once for each line in the current paragraph. This way, we process everything exactly once and do not have to worry about fussy bookkeeping over where the current array index is. Whatever is left in @arrayOld after that many shifts must be the lines we pushed back onto it, which are the uninteresting lines in the order that they occurred in the input.
Sample output
For the input in your question, the output is
@arrayOld:
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
@arrayNewLines:
#SCSI_test # put this line into @arrayNewLines
- ccccccccccccccc # put this line into @arrayNewLines
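given/when and the smartmatch operator have been flagged as experimental since Perl 5.18, so the demonstration above may emit warnings depending on the perl version. As an alternative, here is a sketch of the same partitioning done with a plain loop and grep. This is a swapped-in technique, not the approach shown above, and the @lines array is a hypothetical stand-in for one paragraph of the input file.

```perl
use strict;
use warnings;

# Same patterns as in the demonstration above.
my @interesting_line = (qr/-(.*)/, qr/\#(.*)/);

# Hypothetical input standing in for one paragraph of the file.
my @lines = (
    '#SCSI_test',
    'kdkdkdkdkdkdkdkd',
    'dkdkdkdkdkdkdkdkd',
    '- ccccccccccccccc',
);

my (@arrayOld, @arrayNewLines);
for my $line (@lines) {
    # In a condition, grep returns how many patterns matched this line.
    if (grep { $line =~ $_ } @interesting_line) {
        push @arrayNewLines, $line;
    }
    else {
        push @arrayOld, $line;
    }
}

print "\@arrayOld:\n",      map("$_\n", @arrayOld), "\n",
      "\@arrayNewLines:\n", map("$_\n", @arrayNewLines);
```

Because the loop iterates over values rather than indices, there is still no $arrayOld[$i] that could be undefined.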

Perl extract matches from list

I'm fairly new to perl but not to scripting languages. I have a file, and I'm trying to extract just one portion of each line that matches a regex. For example, given the file:
FLAG(123)
FLAG(456)
Not a flag
FLAG(789)
I'd like to extract the list [123, 456, 789]
The regex is obviously /^FLAG\((\w+)/. My question is, what's an easy way to extract this data in perl?
It's obviously not hard to set up a loop and do a bunch of =~ matches, but I've heard quite a bit about perl's terseness and how it has an operator for everything, so I'm wondering if there's a slick, simple way to do this.
Also, can you point me towards a good perl reference where I can find out slick ways to do other things like this when the opportunity next arises? There are many perl resources on the web, but 90% of them are too simple and the other 10% I seem to lose the signal in the noise.
Thanks!
Here's how I would do it... Did you learn anything new and/or helpful?
my $file_name = "somefile.txt";
open my $fh, '<', $file_name or die "Could not open file $file_name: $!";
my @list;
while (<$fh>)
{
push @list, $1 if /^FLAG\((\w+)/;
}
Things worth pointing out:
In a while loop condition (and ONLY in a while loop condition), reading from a filehandle will set the value to $_ and check that the file was read automatically.
A statement can be modified by attaching an if, unless, for, foreach, while, or until to the end of it. Then it works as a conditional or loop on that one statement.
You probably know that regex capture groups are stored in $1, $2, etc., but you might not have known that they can be used in a statement that has an if suffix. The if is evaluated first, so push @list, $1 if /some_regex/ makes sense and will do the match first, assigning to $1 before it is needed in the push statement.
Assuming that you have all of the data together in a single string:
my @matches = $data =~ /^FLAG\((\w+)/mg;
The /g modifier means to match as many times as possible, the /m makes ^ match after any newline (not only at the beginning of the string) and a match in list context returns all of the captures for all of those matches.
If you're reading the data in line-by-line then Platinum Azure's solution is the one you want.
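The single-string approach can be tried with the sample data from the question. In this sketch the file contents are written inline as $data, which is an assumption made so the example is self-contained.

```perl
use strict;
use warnings;

# The sample file contents from the question, held in one string.
my $data = "FLAG(123)\nFLAG(456)\nNot a flag\nFLAG(789)\n";

# /m lets ^ match after each newline; /g in list context returns
# the capture from every match.
my @matches = $data =~ /^FLAG\((\w+)/mg;

print join(', ', @matches), "\n";    # prints "123, 456, 789"
```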
map is your friend here.
use strict;
use warnings;
use File::Slurp;
my @matches = map { /^FLAG\((\w+)/ } read_file('file.txt');

Check if given string matches one of set of prefixes, effectively

What algorithm to use to check if a given string matches one of set of prefixes, and which prefix from that set?
Other variation: given path and a set of directories, how to check if path is in one of set of directories (assuming that there are no symbolic links, or they do not matter)?
I'm interested in description or name of algorithm, or Perl module which solves this (or can be used to solve this).
Edit
Bonus points for solution which allow to effectively find 'is prefix of' relation between set of strings (set of directories)
For example, given set of directories: foo, foo/bar, foo/baz, quux, baz/quux, baz/quux/plugh the algorithm is to find that foo is prefix of foo/bar and foo/baz, and that baz/quux is prefix of baz/quux/plugh... hopefully without O(n^2) time.
The efficient way to do this would be using a Trie:
http://en.wikipedia.org/wiki/Trie
There is a package for it on CPAN:
https://metacpan.org/pod/Tree::Trie
(never used that package myself though)
You need to consider which operations need to be the most efficient. The lookup is very cheap in a Trie, but if you only build the trie for one lookup, it might not be the fastest way...
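For the directory variant, a trie can be sketched with nested hashes, one level per path component. This is a hand-rolled illustration, not the Tree::Trie API; the helper name prefixes_of and the use of the empty string as an end marker are assumptions of this sketch, and it presumes relative paths with no empty components.

```perl
use strict;
use warnings;

# Directories from the question.
my @dirs = qw( foo foo/bar foo/baz quux baz/quux baz/quux/plugh );

# Build the trie: one nested hash level per path component.
# The key '' marks "a directory from the set ends here"; it cannot
# collide with a real component for the relative paths used here.
my %trie;
for my $dir (@dirs) {
    my $node = \%trie;
    $node = $node->{$_} //= {} for split m{/}, $dir;
    $node->{''} = $dir;
}

# Return every directory in the set that is a prefix of (or equal to)
# $path, shortest first. Hypothetical helper for this sketch.
sub prefixes_of {
    my ($path) = @_;
    my $node = \%trie;
    my @found;
    for my $part (split m{/}, $path) {
        $node = $node->{$part} or last;    # no such branch: stop walking
        push @found, $node->{''} if exists $node->{''};
    }
    return @found;
}

print join(', ', prefixes_of('baz/quux/plugh/file.txt')), "\n";
print join(', ', prefixes_of('foo/bar')), "\n";
```

Each lookup walks at most one hash level per component of the path, regardless of how many directories are in the set. Running prefixes_of on each directory in turn also yields the "is prefix of" relation from the edit without comparing every pair.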
The first function in the List::Util Core module can find if a prefix matches a string. It searches through the list of prefixes, and returns as soon as it finds a match. It does not search through the whole list if it is not necessary:
first returns the first element where the
result from BLOCK is a true value. If
BLOCK never returns true or LIST was
empty then undef is returned.
You pose an interesting question, but as I went out to look for such a thing (in List::MoreUtils for example), I kept coming back to, how is this any different than a grep. So here it is, my basic implementation based on grep. If you don't mind searching the whole list, or want all the matches here is an example:
#!/usr/bin/perl
use strict;
use warnings;
my #prefixes = qw/ pre1 pre2 pre3 /;
my $test = 'pre1fixed';
my @found = grep { $test =~ /^$_/ } @prefixes;
print "$_ is a prefix of $test\n" for @found;
I also imagine that there must be some way to use the smart-match operator ~~ to do this in a short-circuiting way. Also, as toolic points out, the List::Util function first could be used for this too. This stops the search after a match is found.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/first/;
my #prefixes = qw/ pre1 pre2 pre3 /;
my $test = 'pre1fixed';
my $found = first { $test =~ /^$_/ } @prefixes;
print "$found is the prefix of $test\n";
The only algorithm I am aware of is the Aho-Corasick though I will leave it as an exercise to the reader (i.e. I don't know) to see if this will help you. I see that there is a module (Algorithm::AhoCorasick). I also believe I have read somewhere that this and trie structures are implemented in Perl's matching under certain circumstances. Perhaps someone knows where I read that? Edit: found it in SO question on matching alternatives

Can I use unpack to split a string into characters in Perl?

A common 'Perlism' is generating a list as something to loop over in this form:
for($str=~/./g) { print "the next character from \"$str\"=$_\n"; }
In this case the global match regex returns a list that is one character in turn from the string $str, and assigns that value to $_
Instead of a regex, split can be used in the same way or 'a'..'z', map, etc.
I am investigating unpack to generate a field by field interpretation of a string. I have always found unpack to be less straightforward to the way my brain works, and I have never really dug that deeply into it.
As a simple case, I want to generate a list that is one character in each element from a string using unpack (yes -- I know I can do it with split(//,$str) and /./g but I really want to see if unpack can be used this way...)
Obviously, I can use a field list for unpack that is unpack("A1" x length($str), $str) but is there some other way that kinda looks like globbing? ie, can I call unpack(some_format,$str) either in list context or in a loop such that unpack will return the next group of characters in the format group until $str is exhausted?
I have read The Perl 5.12 Pack pod and the Perl 5.12 pack tutorial and the PerlMonks tutorial
Here is the sample code:
#!/usr/bin/perl
use warnings;
use strict;
my $str=join('',('a'..'z', 'A'..'Z')); #the alphabet...
$str=~s/(.{1,3})/$1 /g; #...in groups of three
print "str=$str\n\n";
for ($str=~/./g) {
print "regex: = $_\n";
}
for(split(//,$str)) {
print "split: \$_=$_\n";
}
for(unpack("A1" x length($str), $str)) {
print "unpack: \$_=$_\n";
}
pack and unpack templates can use parentheses to group things much like regexps can. The group can be followed by a repeat count. * as a repeat count means "repeat until you run out of things to pack/unpack".
for(unpack("(A1)*", $str)) {
print "unpack: \$_=$_\n";
}
You'd have to run a benchmark to find out which of these is the fastest.
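One detail worth checking before relying on the A1 template: when unpacking, A strips trailing spaces and nulls, so a space character comes back as an empty string, whereas a returns the data unchanged. A minimal sketch with a made-up five-character string:

```perl
use strict;
use warnings;

# Hypothetical input containing one space.
my $str = 'ab cd';

# "A" strips trailing spaces (and nulls), so the lone space becomes ''.
my @stripped = unpack '(A1)*', $str;

# "a" returns the data verbatim, so the space survives.
my @verbatim = unpack '(a1)*', $str;

print join('|', @stripped), "\n";    # prints "a|b||c|d"
print join('|', @verbatim), "\n";    # prints "a|b| |c|d"
```

For the sample string in the question, which has a space after every three letters, (a1)* is the template that matches what split(//, $str) returns.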