Regex for matching any number between 0 to 100? - perl

I need a regex to match any number between 0 and 100, including decimal numbers. For example,
my expression should match 1, 2, 2.3, 40, 40.12, 100, 100.00 and so on. Thanks in advance.

Assuming you have to allow for a leading sign, you are best off writing
if ( /(?<![-+.\d])([-+]?\d+(?:\.\d*)?)(?![-+.\d])/ and $1 >= 0 and $1 <= 100 ) { .. }
But if you are forced into using a regex, then you need
if ( /(?<![-+.\d])(\+?(?:100(?:\.0*)?|\d\d?(?:\.\d*)?))(?![-+.\d])/ ) { .. }
These patterns may well be more complex than necessary because they allow for the number appearing anywhere in the string. If you are simply checking an entire string to see if it matches the criteria, then the pattern could be much shorter.
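For the whole-string case, a minimal sketch might look like the following. It is written in Python rather than Perl purely for illustration, and the names `in_range_by_compare` and `in_range_by_regex` are made up for this example. The first function checks the format and then compares numerically; the second encodes the range in the pattern itself.

```python
import re

# A plain decimal number: optional sign, digits, optional fraction.
NUMBER = re.compile(r"[-+]?[0-9]+(?:\.[0-9]*)?")

# Range encoded in the pattern: 100 (optionally 100.000...) or 0-99 with
# an optional fraction. An optional leading "+" is allowed, "-" is not.
ZERO_TO_100 = re.compile(r"\+?(?:100(?:\.0*)?|[0-9]{1,2}(?:\.[0-9]*)?)")


def in_range_by_compare(text: str) -> bool:
    """Check the format with a regex, then compare numerically."""
    if NUMBER.fullmatch(text) is None:
        return False
    return 0 <= float(text) <= 100


def in_range_by_regex(text: str) -> bool:
    """Check format and range with a single pattern over the whole string."""
    return ZERO_TO_100.fullmatch(text) is not None
```

The two functions agree on the examples from the question, but not everywhere: the numeric version accepts inputs such as "007", while the pattern-only version rejects anything with more than two digits before the point unless it is exactly 100.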

This would work:
(100(\.0+)?)|([0-9]{1,2}(\.[0-9]+)?)
This matches either "100" (optionally followed by a dot and one or more zeroes) or one or two digits, optionally followed by a dot and at least one digit. Anchor it with ^ and $ if you are checking a whole string.

EDITED!!!
This problem was much more difficult than I initially realized. With some amount of effort, I have produced a new regex that is without error. Enjoy.
/(?<!\d)(?<!\.)(100(?:(?!\.)|(?:\.0*+|\.))(?=\D)|[0-9]?[0-9](?:\.|\.[0-9]*+)?(?=[\D]))/
This pattern captures the matched number in $1.

Can I write a PCRE conditional that only needs the no-match part?

I am trying to create a regular expression to determine if a string contains a number for an SQL statement. If the value is numeric, then I want to add 1 to it. If the value is not numeric, I want to return a 1. More or less. Here is the SQL:
SELECT
field,
CASE
WHEN regexp_like(field, '^ *\d*\.?\d* *$') THEN dec(field) + 1
ELSE 1
END nextnumber
FROM mytable
This actually works, and returns something like this:
INVALID 1
00000 1
00001E 1
00379 380
00013 14
99904 99905
But to push the envelope of understanding, what if I wanted to cover negative numbers, or those with a positive sign. The sign would have to immediately precede or follow the number, but not both, and I would not want to allow white space between the sign and the number.
I came up with a conditional expression with a capture group to capture the sign on the front of the number to determine if a sign was allowed on the end, but it seems a little awkward to handle given I don't really need a yes-pattern.
Here is the modified regex: ^ *([+-])?\d*\.?\d*(?(1) *|[+-]? *)$
This works at regex101.com, but in order for it to work I need to have something before the pipe, so I have to duplicate the next pattern in both the yes-pattern and the no-pattern.
All that background for this question: How can I avoid that duplication?
EDIT: DB2 for i uses International Components for Unicode to provide regular expression processing. It turns out that this library does not support conditionals the way PCRE does, so I changed the tags on this question. The answer given by Wiktor Stribiżew provides a working alternative to the conditional by using a negative lookahead.
You do not have to duplicate the end pattern, just move it outside the conditional:
^ *([+-])?\d*\.?\d*(?(1)|[+-]?) *$
Here the yes-part is empty, and the no-part holds the optional trailing sign.
You may also solve it with a mere negative lookahead:
^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$
Here, ([+-](?!.*[-+]))? optionally matches a + or - that is not followed, anywhere later in the string, by another + or -.
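The lookahead variant can be tried outside the database. The following is a small sketch in Python (not DB2 or ICU, so engine differences may apply); `is_signed_number` is a made-up helper name, and `re.ASCII` is added so that `\d` only means 0-9.

```python
import re

# A leading sign is accepted only if no other sign follows later in the
# string; otherwise an optional trailing sign is accepted instead.
SIGNED = re.compile(r"^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$", re.ASCII)


def is_signed_number(text: str) -> bool:
    """Return True if text matches the sign-before-or-after pattern."""
    return SIGNED.match(text) is not None


for sample in ["00379", "-12", "12-", "-12-", "- 12", "INVALID"]:
    print(repr(sample), is_signed_number(sample))
```

Like the original pattern, this also accepts strings with no digits at all (an empty string, for example), because every part of the pattern is optional.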

finding a comma in string

One value is [23567,0,0,0,0,0] and another is [452221,0,0,0,0,0]; about 100 such values are displayed continuously. I want to display only the sensor value, i.e. 23567 for the first sample and 452221 for the second. For that I have written this code:
value = str2double(str(2:7));
So I want to find the first comma in the output and display only the value before it.
As proposed in a comment by excaza, MATLAB has dedicated functions, such as sscanf for such purposes.
sscanf(str,'[%d')
which matches but ignores the first [, and returns the next (i.e. the first) number as a double variable, and not as a string.
Still, I like the idea of using regular expressions to match the numbers. Instead of matching all zeros and commas, and replacing them by '' as proposed by Sardar_Usama, I would suggest directly matching the numbers using regexp.
You can return all numbers in str (still as string!) with
nums = regexp(str,'\d*','match')
and convert the first number to a double variable with
str2double(nums{1})
To match only the first number in str, we can use the regexp
nums = regexp(str,'\[(\d*),','tokens')
which finds a [, then takes an arbitrary number of digits (0-9), and stops when it finds a ,. By enclosing the \d* in parentheses, only the parenthesized part is returned, i.e. only the number without [ and ,.
Final Note: if you continue working with strings, you could/should consider the regexp solution. If you convert it to a double anyways, using sscanf is probably faster and easier.
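For comparison, the same "capture the digits between [ and the first comma" idea can be sketched in Python (not MATLAB). The function name `first_value` is hypothetical, and the input format is assumed to be exactly the one shown in the question.

```python
import re


def first_value(sample: str) -> int:
    """Return the integer between the opening '[' and the first comma."""
    match = re.match(r"\[([0-9]+),", sample)
    if match is None:
        raise ValueError(f"unexpected format: {sample!r}")
    return int(match.group(1))


print(first_value("[23567,0,0,0,0,0]"))
print(first_value("[452221,0,0,0,0,0]"))
```

Unlike indexing with a fixed range such as str(2:7), this does not depend on the number having a particular count of digits.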
You can use regexprep as follows:
str='[23567,0,0,0,0,0]' ;
required=regexprep(str(2:end-1),',0','')
%Taking str(2:end-1) to exclude brackets, and then removing all ,0
If there can be values other than 0 after , , you can use the following more general approach instead:
required=regexprep(str(2:end-1),',[-+]?\d*\.?\d*','')

Progress 4gl Matches Queries

I'm having a problem with Progress, our usual programmer for this is out for the holidays and I have no real knowledge of the system. I need to get a list of Branches that are not one of these codes ["AXD","BOD","CLA","CNA","CTS","NOB","OFF","ONA","PRJ","WVL"].
I tried for each removals where r-brchdisplay not(matches ["AXD","BOD","CLA","CNA","CTS","NOB","OFF","ONA","PRJ","WVL"]).
display rpid.
but that syntax is obviously wrong.
Thanks
The square brackets are not a correct part of the syntax.
Matches matches one string against another -- not a set of options. I.e.
not ( r-brchdisplay matches "axd" or r-brchdisplay matches "bod" or ... )
Using MATCHES is also kind of silly since these are equality comparisons without wild-cards. MATCHES is typically used when wild-cards are involved.
MATCHES is also generally a very, very bad idea in a WHERE clause as it all but guarantees a table scan.
Alternative ways to write your WHERE clause:
not ( r-brchdisplay = "axd" or r-brchdisplay = "bod" or ... )
or
r-brchdisplay <> "axd" and r-brchdisplay <> "bod" and ...
LOOKUP() is much closer to what you probably need:
for each removals no-lock where
lookup( r-brchdisplay, "axd,bod,cla,cna,cts,nob,off,ona,prj,wvl" ) = 0:
/* do something... */
end.
(The "= 0 " means that LOOKUP did NOT find the target string...)
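To make the "position in a list, or 0" behaviour concrete, here is a rough emulation in Python (not Progress). The function `lookup` below is a stand-in written for this example, not the real built-in; it assumes a comma-delimited list and case-insensitive comparison, and the branch codes in `branches` are made-up sample data.

```python
def lookup(expression: str, entries: str, delimiter: str = ",") -> int:
    """Return the 1-based position of expression in a delimited list, or 0."""
    wanted = expression.lower()
    for position, entry in enumerate(entries.split(delimiter), start=1):
        if entry.lower() == wanted:
            return position
    return 0


EXCLUDED = "axd,bod,cla,cna,cts,nob,off,ona,prj,wvl"
branches = ["AXD", "LON", "wvl", "NYC"]  # made-up sample data

# Keep only the branches that are NOT in the excluded list.
kept = [b for b in branches if lookup(b, EXCLUDED) == 0]
print(kept)
```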

Regular expression repetition: how to match expressions of variable lengths?

Essentially, here's what I want to do:
if ($expression =~ /^\d{num}\w{num}$/)
{
#doSomething
}
where num is not an identifier, but could stand for any integer greater than 0 (\d and \w were arbitrarily chosen). I want to match a string iff it contains two groups of related characters, one group immediately followed by the other, and the number of characters in each group is the same.
For this example, 123abc and 021202abcdef would match, but 43abc would not, neither would 12ab3c or 1234acbcde.
Don’t think of the string as growing from left to right, but rather from the outside in:
xy
x(xy)y
xx(xy)yy
Your regex would then be something like:
/^(x(?1)?y)$/
Where (?1) is a reference to the outer pair of parentheses. ? makes it optional in order to give a “base case” of sorts to the recursive match. This is probably the simplest example of how regexes can be used to match context-free grammars—though it’s generally easier to get right with a parser generator or parser combinator library.
Well, there's
if ($expression =~ /^(\d+)([[:alpha:]]+)$/ && length($1)==length($2))
{
#doSomething
}
A regex isn't always the best option.
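The capture-then-compare-lengths approach carries over to other languages directly. A minimal sketch in Python (the name `balanced` is made up, and the character classes are assumed to be ASCII digits followed by ASCII letters):

```python
import re

# One run of digits immediately followed by one run of letters.
PAIR = re.compile(r"([0-9]+)([A-Za-z]+)")


def balanced(text: str) -> bool:
    """True if text is N digits followed by exactly N letters."""
    match = PAIR.fullmatch(text)
    return match is not None and len(match.group(1)) == len(match.group(2))


for sample in ["123abc", "021202abcdef", "43abc", "12ab3c", "1234acbcde"]:
    print(sample, balanced(sample))
```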

validate 32bit integer with regex

I'm trying to come up with a regex that will match anything that is not a 32bit integer. My eventual goal is to match lines that are not in the following format
Integer\tInteger\tInteger\tInteger\tInteger\tInteger\tInteger
(7 32bit integers and 1 tab in between each integer)
So far I've come up with this
#!/usr/bin/perl -w
use strict;
while ( my $line = <> ) {
if ( $line =~ /^(429496729[0-6]|42949672[0-8]\d|4294967[01]\d{2}|429496[0-6]\d{3}|42949[0-5]\d{4}|4294[0-8]\d{5}|429[0-3]\d{6}|42[0-8]\d{7}|4[01]\d{8}|[1-3]\d{9}|[1-9]\d{8}|[1-9]\d{7}|[1-9]\d{6}|[1-9]\d{5}|[1-9]\d{4}|[1-9]\d{3}|[1-9]\d{2}|[1-9]\d|\d)$/ ) {
print "Match at line $.\n";
print "$line"
}
}
But I can't even get to the first step of having the regex match a 32bit numbers (once I tackle that problem I can tackle having the tabs be the way they need to be)
Am I solving this problem the right way? Any thoughts?
"Am I solving this problem the right way?"
Assuming validation is actually needed, my first approach would be to split on tabs, check the number of fields, check each field but not by using a regex. Doing a range check in a regex is silly! (Padding using sprintf then doing a string compare would solve overflow problems.)
Other issues:
\d matches far more than just 0-9. Use /\d/a or /[0-9]/ if you want to match just 0-9.
What about negative numbers? A signed 32-bit integer stores -2147483648..2147483647.
What about leading zeros and leading plus or minus signs?
What about thousand separators?
Is 10.0 an integer? Mathematically speaking, it is. Perl would also store that as an integer.
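The pad-then-compare-as-strings idea mentioned above can be sketched as follows. This is in Python, where integers do not overflow, so it only illustrates the technique; `fits_uint32` is a hypothetical name, and accepting leading zeros is a choice made for this example.

```python
UINT32_MAX = "4294967295"


def fits_uint32(digits: str) -> bool:
    """True if a string of ASCII digits is <= 2**32 - 1, by string compare."""
    if not digits or not digits.isascii() or not digits.isdigit():
        return False
    stripped = digits.lstrip("0") or "0"
    if len(stripped) > len(UINT32_MAX):
        return False
    # Equal-length digit strings compare the same way as the numbers do.
    return stripped.zfill(len(UINT32_MAX)) <= UINT32_MAX
```

Because both strings have the same length after padding, lexicographic order matches numeric order and no conversion to a number is needed.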
I would say no, this is not the correct way - it's very hard to try and follow that regex; while it can be done, consider if it'll make sense tomorrow. Or how hard it will be to alter if the range changes or a slight variation to the format is required :)
Here are my suggestions:
Read Is it a Number? to find out how to tell if a value is a number and, if so, extract it as one. That is, get a real numeric value, not a string. Additional checks can be done at this stage if desired to restrict what "valid" numbers are; don't restrict the range, just the format.
Use a simple range check for the extracted number - between 0 and 2^32-1 in this case?
You could do it all in a regex, but it's better to treat them as numbers and use math.
# Split it into fields (chomp first so the last field has no trailing newline).
chomp $line;
my @fields = split /\t/, $line;
# Scan for fields which do not look like integers
# or are outside the unsigned 32 bit integer range
my $valid_line = !grep { /[^0-9]/ || ($_ < 0) || (2**32-1 < $_) } @fields;
All the caveats in the other answers about "what is a 32 bit integer" still apply. Is "+10" valid? "10.0"? Can't answer that without knowing why you're filtering for these numbers, adjust the logic as necessary.
And just to throw in a perl5i plug...
use perl5i::2;
my $valid_line = !grep { !$_->is_integer || ($_ < 0) || (2**32-1 < $_) } @fields;
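Putting the split, field-count and range checks together, a whole-line validator might look roughly like this. It is a Python sketch rather than Perl; `valid_line` is a made-up name, and it assumes unsigned values with no sign, separators or decimal point, which may not match what you actually need.

```python
def valid_line(line: str, expected_fields: int = 7) -> bool:
    """True if line is exactly seven tab-separated unsigned 32-bit integers."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected_fields:
        return False
    for field in fields:
        # Format check: ASCII digits only (rejects "", "-1", "+10", "10.0").
        if not (field.isascii() and field.isdigit()):
            return False
        # Range check done as plain arithmetic.
        if int(field) > 2**32 - 1:
            return False
    return True
```

Lines for which `valid_line` returns False are the ones the original question wanted to report.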