Assigning a string to Perl substr? - perl

I am looking at a Perl script written by someone else, and I found this:
$num2 = '000000';
substr($num2, length($num2)-length($num), length($num)) = $num;
my $id_string = $text."_".$num2;
Forgive my ignorance, but to an untrained Perl programmer the second line looks as if the author is assigning the string $num to the result of the function substr. What exactly does this line do?

Exactly what you think it would do:
$ perldoc -f substr
You can use the substr() function as an lvalue, in which case
EXPR must itself be an lvalue. If you assign something shorter
than LENGTH, the string will shrink, and if you assign
something longer than LENGTH, the string will grow to
accommodate it. To keep the string the same length, you may
need to pad or chop your value using "sprintf".

In Perl (unlike, say, Python, where strings and tuples cannot be modified in place), strings can be modified in situ. That is what substr is doing here: it modifies only a part of the string. Instead of this syntax, you can use the more cryptic four-argument form:
substr($num2, length($num2)-length($num), length($num),$num);
which accomplishes the same thing. You can further stretch it. Imagine you want to replace all instances of foo by bar in a string, but only within the first 50 characters. Perl will let you do it in a one-liner:
substr($target,0,50) =~ s/foo/bar/g;
Great, isn't it?
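To make the shrink and grow behaviour from the perldoc excerpt concrete, here is a minimal sketch. The variable names and sample strings are made up for illustration; only substr and s/// come from the answer above.

```perl
use strict;
use warnings;

my $s = 'abcdef';

# Assigning something shorter than LENGTH shrinks the string.
substr($s, 1, 3) = 'X';
print "$s\n";    # aXef

# Assigning something longer than LENGTH grows it.
substr($s, 1, 1) = '12345';
print "$s\n";    # a12345ef

# Substitution limited to the first 8 characters of the target.
my $target = 'foo foo foo';
substr($target, 0, 8) =~ s/foo/bar/g;
print "$target\n";    # bar bar foo
```

The last foo survives because it lies outside the window selected by substr.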

"Exactly", you ask?
Normally, substr returns a boring string (PV with POK).
$ perl -MDevel::Peek -e'$_="abcd"; Dump("".substr($_, 1, 2));'
SV = PV(0x99f2828) at 0x9a0de38
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x9a12510 "bc"\0
CUR = 2
LEN = 12
However, when substr is evaluated where an lvalue (assignable value) is expected, it returns a magical scalar (PVLV with GMG (get magic) and SMG (set magic)).
$ perl -MDevel::Peek -e'$_="abcd"; Dump(substr($_, 1, 2));'
SV = PVLV(0x8941b90) at 0x891f7d0
REFCNT = 1
FLAGS = (TEMP,GMG,SMG)
IV = 0
NV = 0
PV = 0
MAGIC = 0x8944900
MG_VIRTUAL = &PL_vtbl_substr
MG_TYPE = PERL_MAGIC_substr(x)
TYPE = x
TARGOFF = 1
TARGLEN = 2
TARG = 0x8948c18
FLAGS = 0
SV = PV(0x891d798) at 0x8948c18
REFCNT = 2
FLAGS = (POK,pPOK)
PV = 0x89340e0 "abcd"\0
CUR = 4
LEN = 12
This magical scalar holds the parameters passed to substr (TARG, TARGOFF and TARGLEN). You can see the scalar pointed to by TARG (the original scalar passed to substr) repeated at the end (the SV at 0x8948c18 you see at the bottom).
Any read of this magical scalar results in an associated function being called instead. Similarly, a write calls a different associated function. These functions cause the selected part of the string passed to substr to be read or modified.
perl -E'
$_ = "abcde";
my $ref = \substr($_, 1, 3); # $$ref is magical
say $$ref; # bcd
$$ref = '123';
say $_; # a123e
'

Looks to me like it's overwriting the last length($num) characters of $num2 with the contents of $num in order to get a '0' filled number.
I imagine most folks would accomplish this same task w/ sprintf()
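For comparison, a small sketch of both approaches side by side. The value 42 is a hypothetical input, and this assumes $num is a non-negative integer of at most six digits; outside that range the two approaches no longer give the same result.

```perl
use strict;
use warnings;

my $num = 42;    # hypothetical input

# The approach from the question: overwrite the tail of a template.
my $num2 = '000000';
substr($num2, length($num2) - length($num), length($num)) = $num;

# The sprintf approach: zero-pad to a width of six.
my $via_sprintf = sprintf '%06d', $num;

print "$num2 $via_sprintf\n";    # 000042 000042
```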

Related

In perl, when assigning a subroutine's return value to a variable, is the data duplicated in memory?

sub foo {
my @return_value = (1, 2);
}
my @receiver = foo();
Is this assignment like any other assignment in Perl? Is the array duplicated in memory? I doubt it, because the array held by the subroutine is disposable, so a duplication is totally redundant. It would make sense to just 'link' the array to @receiver as an optimization.
by the way, I noticed a similar question Perl: function returns reference or copy? but didn't get what I want.
and I'm talking about Perl5
ps. any books or materials on such sort of topics about perl?
The scalars returned by :lvalue subs aren't copied.
The scalars returned by XS subs aren't copied.
The scalars returned by functions (named operators) aren't copied.
The scalars returned by other subs are copied.
But that's before any assignment comes into play. If you assign the returned values to a variable, you will be copying them (again, in the case of a normal Perl sub).
This means my $y = sub { $x }->(); copies $x twice!
But that doesn't really matter because of optimizations.
Let's start with an example of when they aren't copied.
$ perl -le'
sub f :lvalue { my $x = 123; print \$x; $x }
my $r = \f();
print $r;
'
SCALAR(0x465eb48) # $x
SCALAR(0x465eb48) # The scalar on the stack
But if you remove :lvalue...
$ perl -le'
sub f { my $x = 123; print \$x; $x }
my $r = \f();
print $r;
'
SCALAR(0x17d0918) # $x
SCALAR(0x17b1ec0) # The scalar on the stack
Worse, one usually follows up by assigning the scalar to a variable, so a second copy occurs.
$ perl -le'
sub f { my $x = 123; print \$x; $x }
my $r = \f(); # \
print $r; # > my $y = f();
my $y = $$r; # /
print \$y;
'
SCALAR(0x1802958) # $x
SCALAR(0x17e3eb0) # The scalar on the stack
SCALAR(0x18028f8) # $y
On the plus side, assignment is optimized to minimize the cost of copying strings.
XS subs and functions (named operators) typically return mortal ("TEMP") scalars. These are scalars "on death row". They will be automatically destroyed if nothing steps in to claim a reference to them.
In older versions of Perl (<5.20), assigning a mortal string to another scalar will cause ownership of the string buffer to be transferred to avoid having to copy the string buffer. For example, my $y = lc($x); doesn't copy the string created by lc; simply the string pointer is copied.
$ perl -MDevel::Peek -e'my $s = "abc"; Dump($s); $s = lc($s); Dump($s);'
SV = PV(0x1705840) at 0x1723768
REFCNT = 1
FLAGS = (PADMY,POK,IsCOW,pPOK)
PV = 0x172d4c0 "abc"\0
CUR = 3
LEN = 10
COW_REFCNT = 1
SV = PV(0x1705840) at 0x1723768
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x1730070 "abc"\0 <-- Note the change of address from stealing
CUR = 3 the buffer from the scalar returned by lc.
LEN = 10
In newer versions of Perl (≥5.20), the assignment operator never[1] copies the string buffer. Instead, newer versions of Perl use a copy-on-write ("COW") mechanism.
$ perl -MDevel::Peek -e'my $x = "abc"; my $y = $x; Dump($x); Dump($y);'
SV = PV(0x26b0530) at 0x26ce230
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x26d68a0 "abc"\0 <----+
CUR = 3 |
LEN = 10 |
COW_REFCNT = 2 +-- Same buffer (0x26d68a0)
SV = PV(0x26b05c0) at 0x26ce248 |
REFCNT = 1 |
FLAGS = (POK,IsCOW,pPOK) |
PV = 0x26d68a0 "abc"\0 <----+
CUR = 3
LEN = 10
COW_REFCNT = 2
Ok, so far, I've only talked about scalars. Well, that's because subs and functions can only return scalars[2].
In your example, the scalars assigned to @return_value would be returned[3], copied, then copied a second time into @receiver by the assignment.
You could avoid all of this by returning a reference to the array.
sub f { my @fizbobs = ...; \@fizbobs }
my $fizbobs = f();
The only thing copied there is a reference, the simplest non-undefined scalar.
[1] Ok, maybe not never. I think there needs to be a free byte in the string buffer to hold the COW count.
[2] In list context, they can return 0, 1 or many of them, but they can only return scalars.
[3] The last operator of your sub is a list assignment operator. In list context, the list assignment operator returns the scalars to which its left-hand side (LHS) evaluates. See Scalar vs List Assignment Operator for more info.
The subroutine returns the result of the last operation if you don't specify an explicit return.
@return_value is created separately from @receiver, the values are copied, and the memory used by @return_value is released when it goes out of scope at subroutine exit.
So yes - the memory used is duplicated.
If you desperately want to avoid this, you can create an anonymous array once, and 'pass' a reference to it around:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $anon_array_ref = [ 1, 2 ];
return $anon_array_ref;
}
my $results_from_foo = foo();
print Dumper $results_from_foo;
This will usually be premature optimisation though, unless you know you're dealing with really big data structures.
Note - you should probably include an explicit return; in your sub after the assignment, as it's good practice to make clear what you're doing.
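The difference between returning a list and returning a reference can be observed directly. This is only a sketch: @store, as_list and as_ref are hypothetical names, and the array lives outside the subs purely so that it can be inspected afterwards.

```perl
use strict;
use warnings;

my @store = (1, 2);

sub as_list { return @store }     # caller receives copies of the elements
sub as_ref  { return \@store }    # caller receives a reference to the same array

my @copy = as_list();
$copy[0] = 99;
print "@store\n";    # 1 2  (the copy was modified, not the original)

my $ref = as_ref();
$ref->[0] = 99;
print "@store\n";    # 99 2 (the original was modified through the reference)
```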

Is the value returned by refaddr permanent?

According to Scalar::Util's documentation, refaddr works like this:
my $addr = refaddr( $ref );
If $ref is a reference, the internal memory address of the referenced value is returned as a plain integer. Otherwise undef is returned.
However, this doesn't tell me if $addr is permanent. Could the refaddr of a reference change over time? In C, for example, running realloc could change the location of something stored in dynamic memory. Is this analogous for Perl 5?
I'm asking because I want to make an inside-out object, and I'm wondering whether refaddr($object) would make a good key. It seems simplest when programming in XS, for example.
First of all, don't reinvent the wheel; use Class::InsideOut.
It is permanent. It must be, or the following would fail:
my $x;
my $r = \$x;
... Do something with $x ...
say $$r;
Scalars have a "head" at a fixed location. If the SV needs an upgrade (e.g. to hold a string), it's a second memory block known as the "body" that will change. The string buffer is yet a third memory block.
$ perl -MDevel::Peek -MScalar::Util=refaddr -E'
my $x=4;
my $r=\$x;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Upgrade SV:";
$x='abc';
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Increase PV size:";
$x="x"x20;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
'
refaddr=0x2e1db58
SV = IV(0x2e1db48) at 0x2e1db58 <-- SVt_IV variables can't hold strings.
REFCNT = 2
FLAGS = (PADMY,IOK,pIOK)
IV = 4
Upgrade SV:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58 <-- Scalar upgrade to SVt_PVIV.
REFCNT = 2 New body at new address,
FLAGS = (PADMY,POK,IsCOW,pPOK) but head still at same address.
IV = 4
PV = 0x2e86f20 "abc"\0 <-- The scalar now has a string buffer.
CUR = 3
LEN = 10
COW_REFCNT = 1
Increase PV size:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58
REFCNT = 2
FLAGS = (PADMY,POK,pPOK)
IV = 4
PV = 0x2e5d7b0 "xxxxxxxxxxxxxxxxxxxx"\0 <-- Changing the address of the string buffer
REFCNT = 2 doesn't change anything else.
CUR = 20
LEN = 22
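For the inside-out use case from the question, a minimal hand-rolled sketch might look like the following. The class name Counter and the hash %count_of are hypothetical, and in practice Class::InsideOut covers cases this ignores (threads, for one). The DESTROY method matters: the address is stable for the lifetime of the object, but once the object is freed the same address can be handed to a new scalar.

```perl
package Counter;
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my %count_of;    # per-object data, keyed by the object's address

sub new {
    my ($class) = @_;
    my $self = bless \do { my $anon }, $class;    # empty scalar as the object
    $count_of{ refaddr $self } = 0;
    return $self;
}

sub increment { my ($self) = @_; return ++$count_of{ refaddr $self } }
sub value     { my ($self) = @_; return $count_of{ refaddr $self } }

# Remove the entry so a later object reusing the address starts clean.
sub DESTROY { my ($self) = @_; delete $count_of{ refaddr $self } }

package main;

my $counter = Counter->new;
$counter->increment for 1 .. 3;
print $counter->value, "\n";    # 3
```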

Difference between $var = 500 and $var = '500'

In Perl, what is the difference between
$status = 500;
and
$status = '500';
Not much. They both assign five hundred to $status. The internal format used will be different initially (IV vs PV,UTF8=0), but that's of no importance to Perl.
However, there are things that behave differently based on the choice of storage format even though they shouldn't. Based on the choice of storage format,
JSON decides whether to use quotes or not.
DBI guesses the SQL type it should use for a parameter.
The bitwise operators (&, | and ^) guess whether their operands are strings or not.
open and other file-related builtins encode the file name using UTF-8 or not. (Bug!)
Some mathematical operations return negative zero or not.
As @ikegami already said, not much. But remember that there is a BIG difference between
$ perl -E '$v=0500; say $v'
prints 320 (decimal value of 0500 octal number), and
$ perl -E '$v="0500"; say $v'
which prints
0500
and
$ perl -E '$v=0900; say $v'
which dies with the error:
Illegal octal digit '9' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.
And
perl -E '$v="0300";say $v+1'
prints
301
but
perl -E '$v="0300";say ++$v'
prints
0301
The same applies to hexadecimal literals (0x followed by hex digits), e.g.:
$v = 0x900;
$v = "0x900";
There is only a difference if you then use $var with one of the few operators that has different flavors when operating on a string or a number:
$string = '500';
$number = 500;
print $string & '000', "\n";
print $number & '000', "\n";
output:
000
0
To provide a bit more context on the "not much" responses, here is a representation of the internal data structures of the two values via the Devel::Peek module:
user@foo ~ $ perl -MDevel::Peek -e 'print Dump 500; print Dump "500"'
SV = IV(0x7f8e8302c280) at 0x7f8e8302c288
REFCNT = 1
FLAGS = (PADTMP,IOK,READONLY,pIOK)
IV = 500
SV = PV(0x7f8e83004e98) at 0x7f8e8302c2d0
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,pPOK)
PV = 0x7f8e82c1b4e0 "500"\0
CUR = 3
LEN = 16
Here is a dump of Perl doing what you mean:
user@foo ~ $ perl -MDevel::Peek -e 'print Dump ("500" + 1)'
SV = IV(0x7f88b202c268) at 0x7f88b202c270
REFCNT = 1
FLAGS = (PADTMP,IOK,READONLY,pIOK)
IV = 501
The first is a number (the integer between 499 and 501). The second is a string (the characters '5', '0', and '0'). It's not true that there's no difference between them. It's not true that one will be converted immediately to the other. It is true that strings are converted to numbers when necessary, and vice-versa, and the conversion is mostly transparent, but not completely.
The answer When does the difference between a string and a number matter in Perl 5 covers some of the cases where they're not equivalent:
Bitwise operators treat numbers numerically (operating on the bits of the binary representation of each number), but they treat strings character-wise (operating on the bits of each character of each string).
The JSON module will output a string as a string (with quotes) even if it's numeric, but it will output a number as a number.
A very small or very large number might stringify differently than you expect, whereas a string is already a string and doesn't need to be stringified. That is, if $x = 1000000000000000 and $y = "1000000000000000" then $x might stringify to 1e+15. Since using a variable as a hash key is stringifying, that means that $hash{$x} and $hash{$y} may be different hash slots.
The smart-match (~~) and given/when operators treat number arguments differently from numeric strings. Best to avoid those operators anyway.
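The JSON point in the list above is easy to reproduce with the core JSON::PP module. A minimal sketch, with variable names chosen for illustration; the second call shows the usual way of forcing one form or the other.

```perl
use strict;
use warnings;
use JSON::PP;

my $json   = JSON::PP->new;
my $number = 500;
my $string = '500';

# The encoder looks at how each value is stored.
my $as_stored = $json->encode([ $number, $string ]);
print "$as_stored\n";    # [500,"500"]

# Adding 0 yields a number; interpolating into a string yields a string.
my $coerced = $json->encode([ $string + 0, "$number" ]);
print "$coerced\n";      # [500,"500"]
```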
They are different internally :)
($_ ^ $_) ne '0' ? print "$_ is string\n" : print "$_ is numeric\n" for (500, '500');
output:
500 is numeric
500 is string
I think this perfectly demonstrates what is going on.
$ perl -MDevel::Peek -e 'my ($a, $b) = (500, "500");print Dump $a; print Dump $b; $a.""; $b+0; print Dump $a; print Dump $b'
SV = IV(0x8cca90) at 0x8ccaa0
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 500
SV = PV(0x8acc20) at 0x8ccad0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x8c5da0 "500"\0
CUR = 3
LEN = 16
SV = PVIV(0x8c0f88) at 0x8ccaa0
REFCNT = 1
FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
IV = 500
PV = 0x8d3660 "500"\0
CUR = 3
LEN = 16
SV = PVIV(0x8c0fa0) at 0x8ccad0
REFCNT = 1
FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
IV = 500
PV = 0x8c5da0 "500"\0
CUR = 3
LEN = 16
Each scalar (SV) can have a string (PV) and/or a numeric (IV) representation. Once you use a variable that has only a string representation in a numeric operation, or one that has only a numeric representation in a string operation, it holds both representations. To be precise, there is also a second numeric representation, floating point (NV), so a scalar value has three possible representations.
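The same thing can be checked from code instead of reading Dump output. This sketch uses the core B module to test the private flags (pIOK, pNOK, pPOK) that Devel::Peek prints; the helper name representations is made up for this example.

```perl
use strict;
use warnings;
use B ();

# Report which representations a scalar currently holds.
sub representations {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @found;
    push @found, 'IV' if $flags & B::SVp_IOK;
    push @found, 'NV' if $flags & B::SVp_NOK;
    push @found, 'PV' if $flags & B::SVp_POK;
    return "@found";
}

my ($n, $s) = (500, '500');
print representations(\$n), ' / ', representations(\$s), "\n";    # IV / PV

my $as_string = $n . '';    # string operation on the number
my $as_number = $s + 0;     # numeric operation on the string
print representations(\$n), ' / ', representations(\$s), "\n";    # IV PV / IV PV
```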
There are many answers to this question already, but I'll give it a shot for the confused newbie:
my $foo = 500;
my $bar = '500';
As they are, for practical purposes they are the "same". The interesting part is when you use operators.
For example:
print $foo + 0;
output: 500
The '+' operator sees a number at its left and a number at its right, both decimals, hence the answer is 500 + 0 => 500
print $bar + 0;
output: 500
Same output, the operator sees a string that looks like a decimal integer at its left, and a zero at its right, hence 500 + 0 => 500
But where are the differences?
It depends on the operator used. Operators decide what's going to happen. For example:
my $foo = '128hello';
print $foo + 0;
output: 128
In this case it behaves like atoi() in C. It takes the longest numeric prefix of the string and uses it as a number. If the string does not start with a number, it is treated as 0.
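If you would rather detect such strings than rely on the silent prefix conversion, Scalar::Util's looks_like_number can be checked first. A short sketch; the sample values are arbitrary.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @values = ('500', '0900', '128hello', 'hello');

# Record, for each string, whether Perl would treat it as a clean number.
my %is_numeric = map { $_ => (looks_like_number($_) ? 1 : 0) } @values;

for my $value (@values) {
    printf "%-10s %s\n", "'$value'",
        $is_numeric{$value} ? 'looks like a number' : 'does not look like a number';
}
```

Only '500' and '0900' pass the check; '128hello' would still evaluate to 128 in numeric context, but with a warning under use warnings.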
How to deal with this in conditionals?
my $foo = '0900';
my $bar = 900;
if( $foo == $bar)
{print "ok!"}
else
{print "not ok!"}
output: ok!
== compares the numerical value in both variables.
If you use warnings, Perl will complain when a string that doesn't look like a number (such as '128hello') is used with ==, but it will still coerce it.
my $foo = '0900';
my $bar = 900;
if( $foo eq $bar)
{print "ok!"}
else
{print "not ok!"}
output: not ok!
eq compares strings for equality.
You can try "^" operator.
my $str = '500';
my $num = 500;
if ($num ^ $num)
{
print "haha\n";
}
if ($str ^ $str)
{
print "hehe\n";
}
$str ^ $str is different from $num ^ $num so you will get "hehe".
ps, "^" will change the arguments, so you should do
my $temp = $str;
if ($temp ^ $temp )
{
print "hehe\n";
}
I usually use this operator to tell the difference between num and str in perl.

View Perl Variables as Bytes/Bits

Disclaimer: It's been ages since I've done any perl, so if I'm asking/saying something stupid please correct me.
Is it possible to view a byte/bit representation of a perl variable? That is, if I say something like
my $foo = 'a';
I know (think?) the computer sees $foo as something like
0b1100001
Is there a way to get perl to print out the binary representation of a variable?
(Not asking for any practical purpose, just tinkering around with a old friend and trying to understand it more deeply than I did in 1997)
Sure, using unpack:
print unpack "B*", $foo;
Example:
% perl -e 'print unpack "B*", "bar";'
011000100110000101110010
The perldoc pages for pack and perlpacktut give a nice overview about converting between different representations.
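As a small illustration of converting in both directions, the sketch below unpacks a string into bits, splits it per byte with a (B8)* group, and packs the bit string back. The variable names are only for this example.

```perl
use strict;
use warnings;

my $bits  = unpack 'B*', 'bar';       # one long bit string
my @bytes = unpack '(B8)*', 'bar';    # one 8-bit group per byte
my $text  = pack 'B*', $bits;         # back to the original string

print "$bits\n";     # 011000100110000101110010
print "@bytes\n";    # 01100010 01100001 01110010
print "$text\n";     # bar
```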
The place to start if you want the actual internals is a document called "perlguts". Either perldoc perlguts or read it here: http://perldoc.perl.org/perlguts.html
After seeing the way that Andy interpreted your question, I can follow up by saying that Devel::Peek has a Dump function which can show the internal representation of a variable. It won't take it to the binary level, but if what you are interested in is the internals, you might look at this.
$ perl -MDevel::Peek -e 'my $foo="a";Dump $foo';
SV = PV(0x7fa8a3004e78) at 0x7fa8a3031150
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x7fa8a2c06190 "a"\0
CUR = 1
LEN = 16
$ perl -MDevel::Peek -e 'my %bar=(x=>"y",a=>"b");Dump \%bar'
SV = IV(0x7fbc5182d6e8) at 0x7fbc5182d6f0
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x7fbc51831168
SV = PVHV(0x7fbc5180c268) at 0x7fbc51831168
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x7fbc5140f9f0 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "a" HASH = 0xca2e9442
SV = PV(0x7fbc51804f78) at 0x7fbc51807340
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc5140fa60 "b"\0
CUR = 1
LEN = 16
Elt "x" HASH = 0x9303a5e5
SV = PV(0x7fbc51804e78) at 0x7fbc518070d0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc514061a0 "y"\0
CUR = 1
LEN = 16
And one more way:
printf "%v08b\n", 'abc';
output:
01100001.01100010.01100011
(The v flag is a perl-only printf/sprintf feature and also works with numeric formats other than b.)
This differs from the unpack suggestion where there are characters greater than "\xff": unpack will only return the 8 low bits (with a warning), printf '%v...' will show all the bits:
$ perl -we'printf "%vX\n", "\cA\13P\x{1337}"'
1.B.50.1337
You can use ord to return the numeric value of a character, and printf with a %b format to display that value in binary.
printf "%08b\n", ord 'a';
output
01100001
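The same idea extends to a whole string by applying ord and sprintf to each character. A minimal sketch, assuming every character fits in 8 bits (wider characters simply produce more digits):

```perl
use strict;
use warnings;

my $string = 'abc';

# One zero-padded binary group per character.
my $binary = join ' ', map { sprintf '%08b', ord($_) } split //, $string;

print "$binary\n";    # 01100001 01100010 01100011
```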

Converting strings to floats

Could someone help me with the following condition, please?
I'm trying to compare $price and $lsec.
if( (sprintf("%.2f", ($price*100+0.5)/100)*1 != $lsec*1) )
{
print Dumper($price,$lsec)
}
Sometimes Dumper prints the same numbers (as strings), yet the condition is still true.
I thought that multiplying by 1 would make floats out of them...
Here dumper output:
$VAR1 = '8.5';
$VAR2 = '8.5';
What am I doing wrong?
Thank you,
Greetings and happy easter.
There is a difference between what is stored in a Perl variable and how it is used. You are correct that multiplying by 1 forces a variable to be used as a number. It also causes the number to be stored in the SV data structure that represents the variable to the interpreter. You can use the Devel::Peek module to see what Perl has stored in each variable:
use Devel::Peek;
my $num = "8.5";
Dump $num;
outputs:
SV = PV(0xa0a46d8) at 0xa0c3f08
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xa0be8c8 "8.5"\0
CUR = 3
LEN = 4
continuing...
my $newnum = $num * 1;
Dump $num;
Dump $newnum;
outputs:
SV = PVNV(0xa0a46d8) at 0xa0c3f08
REFCNT = 1
FLAGS = (PADMY,NOK,POK,pIOK,pNOK,pPOK)
IV = 8
NV = 8.5
PV = 0xa0be8c8 "8.5"\0
CUR = 3
LEN = 4
SV = NV(0x9523660) at 0x950df20
REFCNT = 1
FLAGS = (PADMY,NOK,pNOK)
NV = 8.5
The attributes we are concerned with are PV (string pointer), NV (floating-point number), and IV (integer). Initially, $num only has the string value, but using it as a number (e.g. in multiplication) causes it to store the numeric values. However, $num still "remembers" that it is a string, which is why Data::Dumper treats it like one.
For most purposes, there is no need to explicitly force the use of a string as a number, since operators and functions can use them in the most appropriate form. The == and != operators, for example, coerce their operands into numeric form to do numeric comparison. Using eq or ne instead forces a string comparison. This is one more reason to always use warnings in your Perl scripts, since trying to compare a non-numeric string with == will garner this warning:
Argument "asdf" isn't numeric in numeric eq (==) at -e line 1.
You are correct to say that multiplying a string by 1 will force it to be evaluated as a number, but the numeric != comparator will do the same thing. This is presumably a technique you have acquired from other languages as Perl will generally do the right thing and there is no need to force a cast of either operand.
Let's take a look at the values you're comparing:
use strict;
use warnings;
use Data::Dumper;
my $price = '8.5';
my $lsec = '8.5';
my $rounded_price = sprintf("%.2f", ($price * 100 + 0.5) / 100);
print "$rounded_price <=> $lsec\n";
if ( $rounded_price != $lsec ) {
print Dumper($price,$lsec);
}
output
8.51 <=> 8.5
$VAR1 = '8.5';
$VAR2 = '8.5';
So Perl is correctly saying that 8.51 is unequal to 8.5.
I suspect that your
($price * 100 + 0.5) / 100
is intended to round $price to two decimal places, but all it does in fact is to increase $price by 0.005. I think you meant to write
int($price * 100 + 0.5) / 100
but you also put the value through sprintf which is another way to do the same thing.
Either
$price = int($price * 100 + 0.5) / 100
or
$price = sprintf "%.2f", $price
but both is overkill!
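Put together, a comparison at two decimal places could look like this sketch. It uses the values from the question; the integer-cents variant is an alternative I am adding for illustration, and as written it assumes non-negative prices.

```perl
use strict;
use warnings;

my $price = '8.5';
my $lsec  = '8.5';

# Round with sprintf only, then compare numerically.
my $rounded = sprintf '%.2f', $price;    # "8.50"
my $same    = $rounded == $lsec ? 'equal' : 'different';
print "$rounded vs $lsec: $same\n";      # 8.50 vs 8.5: equal

# Alternative: compare whole cents as integers.
my $price_cents = int($price * 100 + 0.5);
my $lsec_cents  = int($lsec * 100 + 0.5);
print "cents match\n" if $price_cents == $lsec_cents;
```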
This part:
($price*100+0.5)/100
If you put in 8.5, you get back 8.505. Which naturally is not equal to 8.5. Since you do not change $price, you do not notice any difference.
Perl handles conversion automatically, so you do not need to worry about that.
my $x = "8.5";
my $y = 8.5;
print "Equal" if $x == $y; # Succeeds
The nature of the comparison, == or in your case != converts the arguments to numeric, whether they are numeric or not.
You're doing nothing wrong. Perl converts it to a string before dumping it. For comparisons, use == and != for numeric comparisons and eq and ne for string comparisons. Perl converts to strings and numbers as needed.
Example:
$ perl -MData::Dumper -e 'my $a=3.1415; print Dumper($a);'
$VAR1 = '3.1415';