validate 32bit integer with regex - perl

I'm trying to come up with a regex that will match anything that is not a 32bit integer. My eventual goal is to match lines that are not in the following format
Integer\tInteger\tInteger\tInteger\tInteger\tInteger\tInteger
(7 32bit integers and 1 tab in between each integer)
So far I've come up with this
#!/usr/bin/perl -w
use strict;
while ( my $line = <> ) {
if ( $line =~ /^(429496729[0-6]|42949672[0-8]\d|4294967[01]\d{2}|429496[0-6]\d{3}|42949[0-5]\d{4}|4294[0-8]\d{5}|429[0-3]\d{6}|42[0-8]\d{7}|4[01]\d{8}|[1-3]\d{9}|[1-9]\d{8}|[1-9]\d{7}|[1-9]\d{6}|[1-9]\d{5}|[1-9]\d{4}|[1-9]\d{3}|[1-9]\d{2}|[1-9]\d|\d)$/ ) {
print "Match at line $.\n";
print "$line"
}
}
But I can't even get past the first step of having the regex match a 32-bit number (once I solve that problem, I can work on requiring the tabs to be where they need to be).
Am I solving this problem the right way? Any thoughts?

Am I solving this problem the right way?
Assuming validation is actually needed, my first approach would be to split on tabs, check the number of fields, check each field but not by using a regex. Doing a range check in a regex is silly! (Padding using sprintf then doing a string compare would solve overflow problems.)
Other issues:
\d matches far more than just 0-9. Use /\d/a or /[0-9]/ if you want to match just 0-9.
What about negative numbers? 32-bit integers can also be used to store -2147483648..2147483647.
What about leading zeros and leading plus or minus signs?
What about thousand separators?
Is 10.0 an integer? Mathematically speaking, it is. Perl would also store that as an integer.
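The split-then-check approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name is_valid_line is made up, it accepts only unsigned values with no sign and no leading zeros, and it pads with the x operator rather than sprintf before doing the string compare.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): true if $line holds exactly
# 7 tab-separated unsigned 32-bit integers, with no sign and no leading zeros.
sub is_valid_line {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty fields, so "1\t2\t" is not miscounted.
    my @fields = split /\t/, $line, -1;
    return 0 unless @fields == 7;

    for my $field (@fields) {
        # Format check only: [0-9] rather than \d, at most 10 digits.
        return 0 unless $field =~ /\A(?:0|[1-9][0-9]{0,9})\z/;

        # Range check as a zero-padded string compare, so a huge value
        # can never overflow a numeric type.
        my $padded = ( '0' x ( 10 - length $field ) ) . $field;
        return 0 if $padded gt '4294967295';
    }
    return 1;
}
```

To print the offending lines, the loop from the question would become something like: print "Bad line $.: $line" unless is_valid_line($line);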

I would say no, this is not the correct way - it's very hard to try and follow that regex; while it can be done, consider if it'll make sense tomorrow. Or how hard it will be to alter if the range changes or a slight variation to the format is required :)
Here are my suggestions:
Read Is it a Number? to find out how to tell if a value is a number and, if so, extract it as one. That is, get a real numeric value, not a string. Additional checks can be done at this stage if desired to restrict what "valid" numbers are; don't restrict the range, just the format.
Use a simple range check for the extracted number - between 0 and 2**32-1 in this case?

You could do it all in a regex, but it's better to treat them as numbers and use math.
# Split it into fields.
my @fields = split /\t/, $line;
# Scan for fields which do not look like integers
# or are outside the unsigned 32 bit integer range
my $valid_line = !grep { /[^0-9]/ || ($_ < 0) || (2**32-1 < $_) } @fields;
All the caveats in the other answers about "what is a 32 bit integer" still apply. Is "+10" valid? "10.0"? I can't answer that without knowing why you're filtering for these numbers, so adjust the logic as necessary.
And just to throw in a perl5i plug...
use perl5i::2;
my $valid_line = !grep { !$_->is_integer || ($_ < 0) || (2**32-1 < $_) } @fields;


Transform data to array with Perl

How do I transform my data to an array with Perl?
Here is my data:
my $data =
"203.174.38.128203.174.38.129203.174.38.1" .
"30203.174.38.131203.174.38.132203.174.38" .
".133203.174.38.134173.174.38.135203.174." .
"38.136203.174.38.137203.174.38.142";
And I want to transform it to be array like this
my @array = (
"203.174.38.128",
"203.174.38.129",
"203.174.38.130",
"203.174.38.131",
"203.174.38.132",
"203.174.38.133",
"203.174.38.134",
"173.174.38.135",
"203.174.38.136",
"203.174.38.137",
"203.174.38.142"
);
Anyone know how to do that with Perl?
If the first part of IP logged is always 203, it's kinda easy:
my @arr = split /(?<=\d)(?=203\.)/, $data;
In the example given it's not, but the first part is always 3-digit, and the second part is always 174, so it's enough to do...
my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;
... to get the correct result.
But please understand that it's close to impossible to give a more generic (and bulletproof) solution here - when these 'marker' parts are... too dynamic. For example, take this string...
11.11.11.22222.11.11.11
The question is, where to split it? Should it be 11.11.11.22; 222.11.11.11? Or 11.11.11.222; 22.11.11.11? Both are quite valid IPs, if you ask me. And it could get even worse, with trying to split '2222' part (can be '2; 222', '22; 22' and even '222; 2').
You can, for example, make a rule: "split each sequence of > 3 digits followed by a dot sign so that the second part of this split would always start from 3 digits":
my @arr = split /(?<=\d)(?=\d{3}\.)/, $data;
... but this will obviously fail to work properly in the ambiguous cases mentioned earlier IF there are IPs with two- or even one-digit first octet in your datastring.
If you write a regex that will match any valid value for one of the numbers in the quartet then you can just search for them all and recombine them in sets of four. This
/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/
matches 200-255 or 100-199 or 10-99 or 0-9, and a program to use it is shown below.
There is no way to know which option to take if there is more than one way to split the string, and this solution assigns the longest value to the first of the two ip addresses. For instance, 1.1.1.1234.1.1.1 will split as 1.1.1.123 and 4.1.1.1
use strict;
use warnings;
use feature 'say';
my $data =
"203.174.38.128203.174.38.129203.174.38.1" .
"30203.174.38.131203.174.38.132203.174.38" .
".133203.174.38.134173.174.38.135203.174." .
"38.136203.174.38.137203.174.38.142";
my $byte = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/;
my @bytes = $data =~ /($byte)/g;
my @addresses;
push @addresses, join('.', splice(@bytes, 0, 4)) while @bytes;
say for @addresses;
output
203.174.38.128
203.174.38.129
203.174.38.130
203.174.38.131
203.174.38.132
203.174.38.133
203.174.38.134
173.174.38.135
203.174.38.136
203.174.38.137
203.174.38.142
Using your sample, it looks like you have 3 digits for the first and last node. That would prompt using this pattern:
/(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/
Add that with a /g switch and it will pull every one.
However, if you have a larger and divergent set of data than what you show for your sample, somebody should have separated the ips before dumping them into this string. If they are separate data points, they should have some separation.
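As a rough sketch of the /g suggestion above: in list context, a global match returns every captured address at once. The variable name @ips is only illustrative, and the pattern still relies on the assumption that the first and last octets always have exactly three digits.

```perl
use strict;
use warnings;

my $data =
    "203.174.38.128203.174.38.129203.174.38.1" .
    "30203.174.38.131203.174.38.132203.174.38" .
    ".133203.174.38.134173.174.38.135203.174." .
    "38.136203.174.38.137203.174.38.142";

# In list context, /g returns the text captured by every successive match.
# Assumes 3-digit first and last octets, as in the sample data.
my @ips = $data =~ /(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/g;

print "$_\n" for @ips;
```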

Regex for matching any number between 0 to 100?

I need a regex to match any number from 0 to 100, including decimal numbers. For example,
my expression should match 1, 2, 2.3, 40, 40.12, 100, 100.00 and so on. Thanks in advance.
Assuming you have to allow for a leading sign, you are best off writing
if ( /(?<![-+.\d])([-+]?\d+(?:\.\d*)?)(?![-+.\d])/ and $1 >= 0 and $1 <= 100 ) { .. }
But if you are forced into using a regex, then you need
if ( /(?<![-+.\d])(\+?(?:100(?:\.0*)?|\d\d?(?:\.\d*)?))(?![-+.\d])/ ) { .. }
These patterns may well be more complex than necessary because they allow for the number appearing anywhere in the string. If you are simply checking an entire string to see if it matches the criteria, then the pattern could be much shorter.
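For the whole-string case, a shorter check might look something like the sketch below. Both helper names are invented for illustration, and both assume that signs, exponents and a bare leading dot (".5") should be rejected; loosen the patterns if your input allows them.

```perl
use strict;
use warnings;

# Hypothetical helper: check the format with an anchored regex,
# then let Perl do the range comparison numerically.
sub in_range_numeric {
    my ($s) = @_;
    return 0 unless $s =~ /\A[0-9]+(?:\.[0-9]*)?\z/;
    return $s <= 100 ? 1 : 0;
}

# Hypothetical helper: the same check done purely with a regex.
# Either exactly 100 (optionally followed by ".000..."),
# or one or two digits with an optional fractional part.
sub in_range_regex {
    my ($s) = @_;
    return $s =~ /\A(?:100(?:\.0*)?|[0-9]{1,2}(?:\.[0-9]*)?)\z/ ? 1 : 0;
}
```

The numeric version is easier to adapt if the upper bound changes; the regex-only version has to be rewritten for every new range.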
This would work:
(100(\.0+)?)|([0-9]{1,2}(\.[0-9]+)?)
match either "100" (with optional dot plus one or more zeroes) or one or two digits, optionally followed by a dot and at least one digit.
EDITED!!!
This problem was much more difficult than I initially realized. With some amount of effort, I have produced a new regex that is without error. Enjoy.
/(?<!\d)(?<!\.)(100(?:(?!\.)|(?:\.0*+|\.))(?=\D)|[0-9]?[0-9](?:\.|\.[0-9]*+)?(?=[\D]))/
This pattern will capture in $1

Can you explain the bits I'm getting from unpack?

I'm relatively inexperienced with Perl, but my question concerns the unpack function when getting the bits for a numeric value. For example:
my $bits = unpack("b*", 1);
print $bits;
This results in 10001100 being printed, which is 140 in decimal. In the reverse order it's 49 in decimal. Any other values I've tried seem to give the incorrect bits.
However, when I run $bits through pack, it produces 1 again. Is there something I'm missing here?
It seems that I jumped to conclusions when I thought my problem was solved. Maybe I should briefly explain what it is I'm trying to do.
I need to convert an integer value that could be as big as 24 bits long (the point being that it could be bigger than one byte) into a bit string. This much can be accomplished using unpack and pack as suggested by @ikegami, but I also need to find a way to convert that bit string back into its original integer (not a string representation of it).
As I mentioned, I'm relatively inexperienced with Perl, and I've been trying with no success.
I found what seems to be an optimal solution:
my $bits = sprintf("%032b", $num);
print "$bits\n";
my $orig = unpack("N", pack("B32", substr("0" x 32 . $bits, -32)));
print "$orig\n";
This might be obvious, but the other answers haven't pointed it out explicitly: The second argument in unpack("b*", 1) is being typecast to the string "1", which has an ASCII value of 31 in hex (with the most significant nibble first).
The corresponding binary would be 00110001, which is reversed to 10001100 in your output because you used "b*" instead of "B*". These correspond to the opposite "endian" forms of the binary representation. "Endian-ness" is just whether the most-significant bits go at the start or the end of the binary representation.
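A small sketch that makes the explanation above concrete: it shows the character code of the string "1" and both bit orders side by side. The variable names are only illustrative, and an ASCII (non-EBCDIC) platform is assumed.

```perl
use strict;
use warnings;

my $char = '1';                  # what unpack actually sees when handed the number 1
my $code = ord $char;            # 49 decimal, 0x31 hex on ASCII systems

my $msb_first = unpack 'B*', $char;   # descending bit order: 00110001
my $lsb_first = unpack 'b*', $char;   # ascending bit order:  10001100

printf "ord=%d hex=%x B*=%s b*=%s\n", $code, $code, $msb_first, $lsb_first;
```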
Yes, you're missing that different machines support different "endianness". And Perl is treating 1 like '1' so ( 0x31 ). So, you're seeing 1 -> 1000 (in ascending order) and 3 -> 1100.
"Wrong" depends on perspective and whether or not you gave Perl enough information to know what encoding and endianness you wanted.
From pack:
b A bit string (ascending bit order inside each byte, like vec()).
B A bit string (descending bit order inside each byte).
I think this is what you want:
unpack( 'B*', chr(1))
You're trying to convert an integer to binary and then back. While you can do that with pack and then unpack, the better way is to use sprintf or printf with the %b format:
my $int = 5;
my $bits = sprintf "%024b", $int;
print "$bits\n";
To go the other way (converting a string of 0s & 1s to an integer), the best way is to use the oct function with a 0b prefix:
my $orig = oct("0b$bits");
print "$orig\n";
As the others explained, unpack expects a string to unpack, so if you have an integer, you first have to pack it into a string. The %b format expects an integer to begin with.
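For comparison, here is roughly what the pack-first route looks like when the integer is packed into a string before unpacking. The helper names are made up, and the sketch assumes a 32-bit big-endian layout ("N"), which comfortably holds a 24-bit value.

```perl
use strict;
use warnings;

# Hypothetical helpers, assuming unsigned values that fit in 32 bits.

# Integer -> 32-character bit string, most significant bit first.
sub int_to_bits32 {
    my ($n) = @_;
    return unpack( 'B32', pack( 'N', $n ) );
}

# 32-character bit string -> integer (a real number, not a string of bits).
sub bits32_to_int {
    my ($bits) = @_;
    return unpack( 'N', pack( 'B32', $bits ) );
}
```

A shorter bit string would need to be left-padded with zeros to 32 characters before being handed to bits32_to_int, since pack fills missing bits on the right.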
If you need to do a lot of this on bytes, and speed is crucial, you could build a lookup table:
my @binary = map { sprintf '%08b', $_ } 0 .. 255;
print $binary[$int]; # Assuming $int is between 0 and 255
The ord(1) is 49. You must want something like sprintf("%064b", 1), although that does seem like overkill.
You didn't specify what you expect. I'm guessing you're expecting 00000001.
That's the correct bits for the byte you provided, at least on non-EBCDIC systems. Remember, the input of unpack is a string (mostly strings of bytes). Perhaps you wanted
unpack('b*', pack('C', 1))
Update: As others have pointed out, the above gives 10000000. For 00000001, you'd use
unpack('B*', pack('C', 1)) # 00000001
You want "B" instead of "b".
$ perl -E'say unpack "b*", "1"'
10001100
$ perl -E'say unpack "B*", "1"'
00110001

(3 lines) from bash to perl?

I have these three lines in bash that work really nicely. I want to add them to some existing perl script but I have never used perl before ....
could somebody rewrite them for me? I tried to use them as they are and it didn't work
note that $SSH_CLIENT is a run-time parameter you get if you type set in bash (linux)
users[210]=radek # where 210 is the last octet from my mac's IP
octet=($SSH_CLIENT) # split the value on spaces
somevariable=${users[${octet[0]##*.}]} # extract the last octet from the ip address
These might work for you. I noted my assumptions with each line.
my %users = ( 210 => 'radek' );
I assume that you wanted a sparse array. Hashes are the standard implementation of sparse arrays in Perl.
my @octet = split ' ', $ENV{SSH_CLIENT}; # split the value on spaces
I assume that you still wanted to use the environment variable SSH_CLIENT
my ( $some_var ) = $octet[0] =~ /\.(\d+)$/;
You want the last set of digits from the '.' to the end.
The parens around the variable put the assignment into list context.
In list context, a match creates a list of all the "captured" sequences.
Because the assignment is in list context, only as many captures as there are scalars on the left-hand side are assigned from the list; here that is just the first capture.
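The difference the parentheses make can be seen in a short sketch. The sample address and the variable names are invented for illustration.

```perl
use strict;
use warnings;

my $ip = '10.10.10.210';    # made-up sample address

# List context (parentheses on the left): receives the captured text.
my ($last_octet) = $ip =~ /\.(\d+)$/;

# Scalar context (no parentheses): receives only the success flag.
my $matched = $ip =~ /\.(\d+)$/;

print "list context:   $last_octet\n";
print "scalar context: $matched\n";
```

Forgetting the parentheses is a common slip: the hash lookup would then use 1 as the key instead of the octet.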
As for your question in the comments, you can get the variable out of the hash, by:
$db = $users{ $some_var };
# OR--this one's kind of clunky...
$db = $users{ [ $octet[0] =~ /\.(\d+)$/ ]->[0] };
Say you have already gotten your IP in a string,
$macip = "10.10.10.123";
@s = split /\./ , $macip;
print $s[-1]; #get last octet
If you don't know Perl and you are required to use it for work, you will have to learn it. Surely you are not going to come to SO and ask every time you need it in Perl right?

How do I insert a lot of whitespace in Perl?

I need to buff out a line of text with a varying but large amount of whitespace. I can figure out a janky way of doing it with a loop that adds whitespace to $foo and then splices that into the text, but it is not an elegant solution.
I need a little more info. Are you just appending to some text or do you need to insert it?
Either way, one easy way to get repetition is perl's 'x' operator, eg.
" " x 20000
will give you 20K spaces.
If you have an existing string ($s say) and you want to pad it out to 20K, try
$s .= (" " x (20000 - length($s)))
BTW, Perl has an extensive set of operators - well worth studying if you're serious about the language.
UPDATE: The question as originally asked (it has since been edited) asked about 20K spaces, not a "lot of whitespace", hence the 20K in my answer.
If you always want the string to be a certain length you can use sprintf:
For example, to pad out $var with white space so it is 20,000 characters long, use:
$var = sprintf("%-20000s",$var);
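When the target width varies, both approaches can take it as a parameter. The following is a minimal sketch; the helper names pad_with_x and pad_with_sprintf are invented, and both leave a string unchanged if it is already at least as long as the requested width.

```perl
use strict;
use warnings;

# Hypothetical helper: pad on the right using the repetition operator.
sub pad_with_x {
    my ( $text, $width ) = @_;
    my $missing = $width - length $text;
    return $missing > 0 ? $text . ( ' ' x $missing ) : $text;
}

# Hypothetical helper: the same result using sprintf, where "*" takes
# the field width from the argument list and "-" left-justifies.
sub pad_with_sprintf {
    my ( $text, $width ) = @_;
    return sprintf( '%-*s', $width, $text );
}
```

For example, pad_with_x($foo, 20000) returns $foo followed by however many spaces are needed to reach 20,000 characters.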
use the 'x' operator:
print ' ' x 20000;