Parse negative numbers from string in perl - perl

How do I parse a negative number from a string in perl? I have this piece of code:
print 3 - int("-2");
It gives me 5, but I need to have 3. How do I do it?

Perl will automatically convert between strings and numbers as needed; no need for an int() operation unless you actually want to convert a floating point number (whether stored as a number or in a string) to an integer. So you can just do:
my $string = "-2";
print 3 - $string;
and get 5 (because 3 minus negative 2 is 5).

Well, 3 - (-2) really is 5. I'm not really sure what you want to achieve, but if you want to filter out negative values, why not do something like this:
$i = int("-2")
$i = ($i < 0 ? 0 : $i);
This will turn your negative values to 0 but lets the positive numbers pass.

It seems to be parsing it correctly.
3 - (-2) is 5.
If it was mistakenly parsing -2 as 2 then it would have output 3 - 2 = 1.
No matter how you add/subtract 2 from 3, you will never get 3.

You are probably thinking of some other function instead of 'int'.
try:
use List::Util qw 'max';
...
print 3 - max("-2", 0);
if you want to get 3 as result.
Regards
rbo

Related

Perl - Forming a value from bits lying between two (16 bit) fields (data across word boundaries)

I have am reading some data into 16 bit data words, and extracting VALUES from parts of the 16 bit words. Some of the values I need straddles the word boundaries.
I need to take the bits from the first word and some from the second word and join them to form a value.
I am thinking of the best way to do this. I could bit shift stuff all over the place and compose the data that way, but I am thinking there must be perhaps an easier/better way because I have many cases like this and the values are in some case different sizes (which I know since I have a data map).
For instance:
[TTTTTDDDDPPPPYYY] - 16 bit field
[YYYYYWWWWWQQQQQQ] - 16 bit field
TTTTT = 5 bit value, easily extracted
DDDD = 4 bit value, easily extracted
WWWWW = 5 bit value, easily extracted
QQQQQQ = 6 bit value, easily extracted
YYYYYYYY = 8 bit value, which straddles the word boundaries. What is the best way to extract this? In my case I have a LOT of data like this, so elegance/simplicity in a solution is what I seek.
Aside - In Perl what are the limits of left shifting? I am on a 32 bit computer, am I right to guess that my (duck) types are 32 bit variables and that I can shift that far, even though I unpacked the data as 16 bits (unpack with type n) into a variable? This situation came up in the case of trying to extract a 31 bit variable that lies between two 16 bit fields.
Lastly (someone may ask), reading/unpacking the data into 32 bit words does not help me as I still face the same issue - Data is not aligned on word boundaries but crosses it.
The size of your integers are given (in bytes) by perl -V:ivsize or programatically using use Config qw( %Config ); $Config{ivsize}. They'll have 32 bit in a 32-bit build (since they are guaranteed to be large enough to hold a pointer). That means you can use
my $i = ($hi << 16 | $lo); # TTTTTDDDDPPPPYYYYYYYYWWWWWQQQQQQ
my $q = ($i >> 0) & (2**6-1);
my $w = ($i >> 6) & (2**5-1);
my $y = ($i >> 11) & (2**8-1);
my $p = ($i >> 19) & (2**4-1);
my $d = ($i >> 23) & (2**4-1);
my $t = ($i >> 27) & (2**5-1);
If you wanted to stick to 16 bits, you could use the following:
my $y = ($hi & 0x7) << 5 | ($lo >> 11);
00000[00000000YYY ]
[ YYYYY]WWWWWQQQQQQ
------------------
[00000000YYYYYYYY]

Printing a 2500 x 2500 dimensional matrix using Perl

I am very new to Perl. Recently I wrote a code to calculate the coefficient of correlation between the atoms between two structures. This is a brief summary of my program.
for($i=1;$i<=2500;$i++)
{
for($j=1;$j<=2500;$j++)
{
calculate the correlation (Cij);
print $Cij;
}
}
This program prints all the correlations serially in a single column. But I need to print the correlations in the form of a matrix, something like..
Atom1 Atom2 Atom3 Atom4
Atom1 0.5 -0.1 0.6 0.8
Atom2 0.1 0.2 0.3 -0.5
Atom3 -0.8 0.9 1.0 0.0
Atom4 0.3 1.0 0.8 -0.8
I don't know, how it can be done. Please help me with a solution or suggest me how to do it !
Simple issue you're having. You need to print a NL after you finish printing a row. However, while i have your attention, I'll prattle on.
You should store your data in a matrix using references. This way, the way you store your data matches the concept of your data:
my #atoms; # Storing the data in here
my $i = 300;
my $j = 400;
my $value = ...; # Calculating what the value should be at column 300, row 400.
# Any one of these will work. Pick one:
my $atoms[$i][$j] = $value; # Looks just like a matrix!
my $atoms[$i]->[$j] = $value; # Reminds you this isn't really a matrix.
my ${$atoms[$1]}[$j] = $value; # Now this just looks ridiculous, but is technically correct.
My preference is the second way. It's just a light reminder that this isn't actually a matrix. Instead it's an array of my rows, and each row points to another array that holds the column data for that particular row. The syntax is still pretty clean although not quite as clean as the first way.
Now, let's get back to your problem:
my #atoms; # I'll store the calculated values here
....
my $atoms[$i]->[$j] = ... # calculated value for row $i column $j
....
# And not to print out my matrix
for my $i (0..$#atoms) {
for my $j (0..$#{ $atoms[$i] } ) {
printf "%4.2f ", $atoms[$i]->[$j]; # Notice no "\n".
}
print "\n"; # Print the NL once you finish a row
}
Notice I use for my $i (0..$#atoms). This syntax is cleaner than the C style three part for which is being discouraged. (Python doesn't have it, and I don't know it will be supported in Perl 6). This is very easy to understand: I'm incrementing through my array. I also use $#atom which is the length of my #atoms array -- or the number of rows in my Matrix. This way, as my matrix size changes, I don't have to edit my program.
The columns [$j] is a bit tricker. $atom[$i] is a reference to an array that contains my column data for row $i, and doesn't really represent a row of data directly. (This is why I like $atoms[$i]->[$j] instead of $atoms[$i][$j]. It gives me this subtle reminder.) To get the actual array that contains my column data for row $i, I need to dereference it. Thus, the actual column values are stored in row $i in the array array #{$atoms[$i]}.
To get the last entry in an array, you replace the # sigil with $#, so the last index in my
array is $#{ $atoms[$i] }.
Oh, another thing because this isn't a true matrix: Each row could have a different numbers of entries. You can't have that with a real matrix. This makes using an Array of Arrays in Perl a bit more powerful, and a bit more dangerous. If you need a consistent number of columns, you have to manually check for that. A true matrix would automatically create the required columns based upon the largest $j value.
Disclaimer: Pseudo Code, you might have to take care of special cases and especially the headers yourself.
for($i=1;$i<=2500;$i++)
{
print "\n"; # linebreak here.
for($j=1;$j<=2500;$j++)
{
calculate the correlation (Cij);
printf "\t%4f",$Cij; # print a tab followed by your float giving it 4
# spaces of room. But no linebreak here.
}
}
This is of course a very crude and quick and dirty solution. But if you save the output into a .csv file, most csv-able spreadsheet programs (OpenOfice) should easily be able to read it into a proper table. If the spreadsheet viewer of your choice can not understand tabs as delimeter, you could easily add ; or / or whatever it can use into the printf string.

Why does substr not return undef at the end of a string?

I'm not sure whether this is defined behaviour or not. I have the following code:
use strict;
use warnings;
use Data::Dumper;
my $string = 'aaaaaa0aaaa';
my $char = substr($string, length($string), 1);
my $char2 = substr($string, length($string)+1, 1);
print Dumper($char);
print Dumper($char2);
Besides getting one warning about substr() past the end of a string, I'm confused about the output:
$VAR1 = '';
$VAR1 = undef;
Perldoc says about substr:
substr EXPR,OFFSET,LENGTH
If OFFSET and LENGTH specify a substring that is partly outside the string, only the part within the string is returned. If the substring is beyond either end of the string, substr() returns the undefined value and produces a warning.
Both length($string) and length($string) + 1 are beyond the (zero-indexed) end of the string, so I don't know why substr returns the empty string in one case and undef in the other. Does it have to do with the NULL character that C uses for string termination and that is somehow returned by substr in the first case, so that there is an "invisible" last character to this string that is not counted by length? Am I missing something obvious here?
There are a couple of issues here. Firstly you should consider the substr offset to indicate position between characters thus:
S T R I N G
0 1 2 3 4 5 6
so you can see that offset 6 - the length of the string - is at the end of the string, not beyond it.
Secondly the length parameter of substr serves as an upper limit to the number of characters returned, not a requirement. That is what the documentation means by only the part within the string is returned.
Putting these together, a call like substr 'STRING', 6, 1 - asking for a maximum of one character at the end of the string - returns the empty string, while asking for anything beyond the end of the string (or before its start) gives undef.
substr($string, length($string), 1)
This gave you an empty string because, substr considers the offset between 0 to len(str), and anything beyond that range is undef.
So, substr("aa", 2, 1); -> will give you the empty string after last a
and,substr("aa", 3, 1); -> Will give you undef (Substring completely outside range)
Similarly: -
substr("aa", 2, 2); -> Will give you the empty string after last
a (Substring partly outside the range)
Now, for the second one: -
substr($string, length($string) + 1, 1)
This is already past the last allowed offset. So it returns undef value.
Suppose: -
$str = "abcd";
Then, the index will look like: -
a b c d undef
0 1 2 3 len(str) len(str) + 1
UPDATE: -
So, as #Borodin explained in his post, the character d comes between the offsets - 3 and len(str) in the above example.
But, if we try to access anything beyond len(str) including len(str), we will get an empty string, as in the documentation, which says that -
If OFFSET and LENGTH specify a substring that is partly outside the
string, only the part within the string is returned.
Also, if we try to access anything beyond len(str) excluding the len(str), we will get undef value, as in docs: -
If the substring is beyond either end of the string, substr() returns
the undefined value and produces a warning.

Trouble using 'while' loop to evaluate multiple lines, Perl

Thank you in advance for indulging an amateur Perl question. I'm extracting some data from a large, unformatted text file, and am having trouble combining the use of a 'while' loop and regular expression matching over multiple lines.
First, a sample of the data:
01-034575 18/12/2007 258,750.00 11,559.00 36 -2 0 6 -3 2 -2 0 2 1 -1 3 0 5 15
-13 -44 -74 -104 -134 -165 -196 -226 -257 -287 -318 -349 -377 -408 -438
-469 -510 -541 -572 -602 -633 -663
Atraso Promedio ---> 0.94
The first sequence, XX-XXXXXX is a loan ID number. The date and the following two numbers aren't important. '36' is the number of payments. The following sequence of positive and negative numbers represent how late/early this client was for this loan at each of the 36 payment periods. The '0.94' following 'Atraso Promedio' is the bank's calculation for average delay. The problem is it's wrong, since they substitute all negative (i.e. early) payments in the series with zeros, effectively over-stating how risky a client is. I need to write a program that extracts ID and number of payments, and then dynamically calculates a multi-line average delay.
Here's what I have so far:
#Create an output file
open(OUT, ">out.csv");
print OUT "Loan_ID,Atraso_promedio,Atraso_alt,N_payments,\n";
open(MYINPUTFILE, "<DATA.txt");
while(<MYINPUTFILE>){
chomp($_);
if($ID_select != 1 && m/(\d{2}\-\d{6})/){$Loan_ID = $1, $ID_select = 1}
if($ID_select == 1 && m/\d{1,2},\d{1,3}\.00\s+\d{1,2},\d{1,3}\.00\s+(\d{1,2})/) {$N_payments = $1, $Payment_find = 1};
if($Payment_find == 1 && $ID_select == 1){
while(m/\s{2,}(\-?\d{1,3})/g){
$N++;
$SUM = $SUM + $1;
print OUT "$Loan_ID,$1\n"; #THIS SHOWS ME WHAT NUMBERS THE CODE IS GRABBING. ACTUAL OUTPUT WILL BE WRITTEN BELOW
print $Loan_ID,"\n";
}
if(m/---> *(\d*.\d*)/){$Atraso = $1, $Atraso_select = 1}
if($ID_select == 1 && $Payment_find == 1 && $Atraso_select == 1){
...
There's more, but the while loop is where the program is breaking down. The problem is with the pattern modifier, 'g,' which performs a global search of the string. This makes the program grab numbers that I don't want, such as the '1' in loan ID and the '36' for the number of payments. I need the while loop to start from wherever the previous line in the code left off, which should be right after it has identified the number of loans. I've tried every pattern modifier that I've been able to look up, and only 'g' keeps me out of an infinite loop. I need the while loop to go to the end of the line, then start on the next one without combing over the parts of the string already fed through the program.
Thoughts? Does this make sense? Would be immensely grateful for any help you can offer. This work is pro-bono, unpaid: just trying to help out some friends in a micro-lending institution conduct a risk analysis.
Cheers,
Aaron
The problem is probably easier using split, for instance something like this:
use strict;
use warnings;
open DATA, "<DATA.txt" or die "$!";
my #payments;
my $numberOfPayments;
my $loanNumber;
while(<DATA>)
{
if(/\b\d{2}-\d{6}\b/)
{
($loanNumber, undef, undef, undef, $numberOfPayments, #payments) = split;
}
elsif(/Atraso Promedio/)
{
my (undef, undef, undef, $atrasoPromedio) = split;
# Calculate average of payments and print results
}
else
{
push(#payments, split);
}
}
If the data's clean enough, I might approach it by using split instead of regular expressions. The first line is identifiable if field[0] matches the form of a loan number and field[1] matches the format of a date; then the payment dates are an array slice of field[5..-1]. Similarly testing the first field of each line tells you where you are in the data.
Peter van her Heijden's answer is a nice simplification for a solution.
To answer the OP's question about getting the regexp to continue where it left off, see Perl operators - regexp-quote-like operators, specifically the section "Matching in list context" and the "\G assertion" section just after that.
Essentially, you can use m//gc along with the \G assertion to use regexps match where previous matches left off.
The example in the "\G assertion" section about lex-like scanners would seem to apply to this question.

How can I modify specific part of a binary scalar in Perl?

I adopt select(), sysread(), syswrite() mechanism to handle socket messages where messages are sysread() into $buffer (binary) before they are syswritten.
Now I want to change two bytes of the message, which denote the length of the whole message. At first, I use following code:
my $msglen=substr($buffer,0,2); # Get the first two bytes
my $declen=hex($msglen);
$declen += 3;
substr($buffer,0,2,$declen); # change the length
However, it doesn't work in this way. If the final value of $declen is 85, then the modified $buffer will be "0x35 0x35 0x00 0x02...". I insert digital number to $buffer but finally got ASCII!
I also tried this way:
my $msglen=substr($buffer,0,2); # Get the first two bytes,binary
$msglen += 0b11; # Or $msglen += 3;
my $msgbody=substr($buffer,2); # Get the rest part of message, binary
$buffer=join("", $msglen, $msgbody);
Sadly, this method also failed. The result is such as"0x33 0x 0x00 0x02..." I just wonder why two binary scalar can't be joined into a binary scalar?
Can you help me? Thank you!
my $msglen=substr($buffer,0,2); # Get the first two bytes
my $number = unpack("S",$msglen);
$number += 3;
my $number_bin = pack("S",$number);
substr($buffer,0,2,$number_bin); # change the length
Untested, but I think this is what you are trying to do... convert a string with two bytes representing a short int into an actual int object and then back again.
I have found another workable way -- using vec
vec($buffer, 0, 16) += 3;
You cannot join two binary buffers in Perl directly all you have to do is call pack to get an ASCII and then join it and call unpack on it to get back.