XS typemap for intptr_t - perl

I'm trying to return an intptr_t type from some XS code:
intptr_t
my_func( self )
myObjPtr self
CODE:
RETVAL = (intptr_t) self;
OUTPUT:
RETVAL
My typemap doesn't have anything about intptr_t, so of course dmake fails with Could not find a typemap for C type 'intptr_t'. I'm not sure if Perl even works with integers as big as intptr_t can be. If there's no good way to return this to Perl as a number, I'll just stringify it.

IV, Perl's signed integer format, is guaranteed to be large enough to hold a pointer. intptr_t is C's version of what Perl has had for a long time. (In fact, a ref is just a pointer stored in a scalar's IV slot with a flag indicating it's a reference.)
But you don't want to cast directly to an IV as that can result in a spurious warning. As Sinan Ünür points out, use PTR2IV instead.
IV
my_func()
myObjPtr self
CODE:
self = ...;
RETVAL = PTR2IV(self);
OUTPUT:
RETVAL
INT2PTR(myObjPtr, iv) does the inverse operation.
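For anyone who wants to see the round trip outside of XS, here is a minimal sketch in plain C. It uses intptr_t from <stdint.h> rather than Perl's macros (PTR2IV and INT2PTR need perl.h), and my_obj, obj_to_int and int_to_obj are made-up names for illustration only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the object behind myObjPtr. */
typedef struct { int id; } my_obj;

/* Pointer -> integer: the direction PTR2IV covers in XS code. */
intptr_t obj_to_int(my_obj *p) {
    return (intptr_t)(void *)p;
}

/* Integer -> pointer: the direction INT2PTR covers in XS code. */
my_obj *int_to_obj(intptr_t n) {
    return (my_obj *)(void *)n;
}
```

Converting a valid pointer to intptr_t and back yields a pointer that compares equal to the original, which is the property the XS code relies on when it hands the number to Perl and gets it back later.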

This thread suggests:
The existing mechanism which does work everywhere are the macros INT2PTR and PTR2IV (in perl.h)
From perldoc perlguts:
Because pointer size does not necessarily equal integer size, use the follow macros to do it right.
PTR2UV(pointer)
PTR2IV(pointer)
PTR2NV(pointer)
INT2PTR(pointertotype, integer)


Perl variable assignment side effects

I'll be the first to admit that Perl is not my strong suit. But today I ran across this bit of code:
my $scaledWidth = int($width1x * $scalingFactor);
my $scaledHeight = int($height1x * $scalingFactor);
my $scaledSrc = $Media->prependStyleCodes($src, 'SX' . $scaledWidth);
# String concatenation makes this variable into a
# string, so we need to make it an integer again.
$scaledWidth = 0 + $scaledWidth;
I could be missing something obvious here, but I don't see anything in that code that could make $scaledWidth turn into a string. Unless somehow the concatenation in the third line causes Perl to permanently change the type of $scaledWidth. That seems ... wonky.
I searched a bit for "perl assignment side effects" and similar terms, and didn't come up with anything.
Can any of you Perl gurus tell me if that commented line of code actually does anything useful? Does using an integer variable in a concatenation expression really change the type of that variable?
It is only a little bit useful.
Perl can store a scalar value as a number or a string or both, depending on what it needs.
use Devel::Peek;
Dump($x = 42);
Dump($x = "42");
Outputs:
SV = PVIV(0x139a808) at 0x178a0b8
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42
PV = 0x178d9e0 "0"\0
CUR = 1
LEN = 16
SV = PVIV(0x139a808) at 0x178a0b8
REFCNT = 1
FLAGS = (POK,pPOK)
IV = 42
PV = 0x178d9e0 "42"\0
CUR = 2
LEN = 16
The IV and IOK tokens refer to how the value is stored as a number and whether the current integer representation is valid, while PV and POK indicate the string representation and whether it is valid. Using a numeric scalar in a string context can change the internal representation.
use Devel::Peek;
$x = 42;
Dump($x);
$y = "X" . $x;
Dump($x);
SV = IV(0x17969d0) at 0x17969e0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42
SV = PVIV(0x139aaa8) at 0x17969e0
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 42
PV = 0x162fc00 "42"\0
CUR = 2
LEN = 16
Perl will seamlessly convert one to the other as needed, and there is rarely a need for the Perl programmer to worry about the internal representation.
I say rarely because there are some known situations where the internal representation matters.
Perl variables are not typed. Any scalar can be either a number or a string depending how you use it. There are a few exceptions where an operation is dependent on whether a value seems more like a number or string, but most of them have been either deprecated or considered bad ideas. The big exception is when these values must be serialized to a format that explicitly stores numbers and strings differently (commonly JSON), so you need to know which it is "supposed" to be.
The internal details are that an SV (scalar value) contains any of the values that have been relevant to its usage during its lifetime. So your $scaledWidth first contains only an IV (integer value) as the result of the int function. When it is concatenated, that uses it as a string, so it generates a PV (pointer value, used for strings). From then on the variable contains both; it is not one type or the other. So when something like a JSON encoder needs to determine whether it's supposed to be a number or a string, it sees both in the internal state.
There have been three strategies that JSON encoders have taken to resolve this situation. Originally, JSON::PP and JSON::XS would simply consider it a string if it contains a PV, or in other words, if it's ever been used as a string; and as a number if it only has an IV or NV (double). As you alluded to, this leads to an inordinate amount of false positives.
Cpanel::JSON::XS, a fork of JSON::XS that fixes a large number of issues, along with more recent versions of JSON::PP, use a different heuristic. Essentially, a value will still be considered a number if it has a PV but the PV matches the IV or NV it contains. This, of course, still results in false positives (example: you have the string '5', and use it in a numerical operation), but in practice it is much more often what you want.
The third strategy is the most useful if you need to be sure what types you have: be explicit. You can do this by reassigning every value to explicitly be a number or string as in the code you found. This assigns a new SV to $scaledWidth that contains only an IV (the result of the addition operation), so there is no ambiguity. Another method of being explicit is using an encoding method that allows specifying the types you want, like Cpanel::JSON::XS::Type.
The details of course vary if you're not talking about the JSON format, but that is where this issue has been most deliberated. This distinction is invisible in most Perl code, where the operation, not the value, determines the type.
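To make the flag bookkeeping concrete, here is a toy model in C. It is not Perl's actual SV implementation, just a sketch under the assumption that a scalar is an integer slot, a string slot and two validity flags; every name in it (toy_sv, TOY_IOK, TOY_POK and the functions) is invented for illustration.

```c
#include <stdio.h>

enum { TOY_IOK = 1, TOY_POK = 2 };  /* toy flags, loosely named after IOK/POK */

typedef struct {
    unsigned flags;
    long iv;        /* integer slot */
    char pv[32];    /* string slot */
} toy_sv;

/* Assigning an integer leaves only the integer slot valid. */
void toy_set_iv(toy_sv *sv, long n) {
    sv->flags = TOY_IOK;
    sv->iv = n;
    sv->pv[0] = '\0';
}

/* Using the value as a string caches the string form next to the integer. */
const char *toy_as_string(toy_sv *sv) {
    if (!(sv->flags & TOY_POK)) {
        snprintf(sv->pv, sizeof sv->pv, "%ld", sv->iv);
        sv->flags |= TOY_POK;
    }
    return sv->pv;
}

/* First heuristic from the text: anything with a valid string slot is a string. */
int toy_encodes_as_string(const toy_sv *sv) {
    return (sv->flags & TOY_POK) != 0;
}

/* The "0 + $x" fix: store a fresh integer result, dropping the cached string. */
void toy_renumify(toy_sv *sv) {
    toy_set_iv(sv, 0 + sv->iv);
}
```

In this model, calling toy_as_string once is enough to flip toy_encodes_as_string from 0 to 1, and toy_renumify flips it back. That is the same sequence as the concatenation followed by 0 + $scaledWidth in the question.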

How does dereference work C++

I have trouble understanding what happens when calling &*pointer
int j=8;
int* p = &j;
When I print in my compiler I get the following
j = 8 , &j = 00EBFEAC p = 00EBFEAC , *p = 8 , &p = 00EBFEA0
&*p= 00EBFEAC
cout << &*p gives &*p = 00EBFEAC which is p itself
& and * have the same operator precedence. I thought &*p would translate to &(*p) --> &(8) and expected a compiler error.
How does the compiler deduce this result?
You are stumbling over something interesting: variables, strictly speaking, are not values but refer to values. 8 is an integer value. After int i=8, i refers to an integer value. The difference is that it could refer to a different value.
In order to obtain the value, i must be dereferenced, i.e. the value stored in the memory location which i stands for must be obtained. This dereferencing is performed implicitly in C whenever a value of the type which the variable references is requested: i=8; printf("%d", i) results in the same output as printf("%d", 8). That is funny because variables are essentially aliases for addresses, while numeric literals are aliases for immediate values. In C these very different things are syntactically treated identically. A variable can stand in for a literal in an expression and will be automatically dereferenced. The resulting machine code makes that very clear. Consider the two functions below. Both have the same return type, int. But f has a variable in the return statement which must be dereferenced so that its value can be returned (in this case, it is returned in a register):
int i = 1;
int g(){ return 1; } // literal
int f(){ return i; } // variable
If we ignore the housekeeping code, the functions each translate into a single machine instruction. The corresponding assembler (from icc) is for g:
movl $1, %eax #5.17
That's pretty straightforward: Put 1 in the register eax.
By contrast, f translates to
movl i(%rip), %eax #4.17
This puts the value at the address in register rip plus offset i in the register eax. It's refreshing to see how a variable name is just an address (offset) alias to the compiler.
The necessary dereferencing should now be obvious. It would be more logical to write return *i in order to return 1, and write return i only for functions which return references — or pointers.
In your example it is indeed illogical to a degree that
int j=8;
int* p = &j;
printf("%d\n", *p);
prints 8 (i.e, p is actually dereferenced twice); but that &(*p) yields the address of the object pointed to by p (which is the address value stored in p), and is not interpreted as &(8). The reason is that in the context of the address operator a variable (or, in this case, the L-value obtained by dereferencing p) is not implicitly dereferenced the way it is in other contexts.
When the attempt was made to create a logical, orthogonal language (Algol68), int i=8 indeed declared an alias for 8. In order to declare a variable, the long form would have been ref int m = loc int := 3. Consequently what we call a pointer or reference would have had the type ref ref int because actually two dereferences are needed to obtain an integer value.
j is an int with value 8 and is stored in memory at address 00EBFEAC.
&j gives the memory address of variable j (00EBFEAC).
int* p = &j defines a variable p of type int *, that is, a variable holding the address of a location in memory where an int can be found. You assign it &j, which is the address of an int, so the types match.
*p gives you the value associated with the address stored in p.
The address stored in p points to an int, so *p gives you the value of that int, namely 8.
&p is the address where the variable p itself is stored.
&*p gives you the address of the object that the address stored in p points to, which is the value of p again: &(*p) -> &(j) -> 00EBFEAC
Think about &j itself (or even &(j)). According to your logic, shouldn't j evaluate to 8 and result in &8, as well? Dereferencing a pointer or evaluating a variable results in an lvalue, which is a value that you can assign to or take the address of.
The L in "lvalue" refers to the left in "left hand side of the assignment", such as j = 10 or *p = 12. There are also rvalues, such as j + 10, or 8, which obviously cannot be assigned to.
That's just a basic explanation. In C++ there's a lot more to it, with various classes of values (but that thread might be too advanced for your current needs).
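A small sketch of the lvalue point, written in C (the same lines compile as C++). The function names are made up for the example. In C the standard even says that when & is applied to the result of unary *, neither operator is evaluated, so &*p is simply p.

```c
/* Returns 1 when &*p and p are the same address. */
int addr_of_deref_is_pointer(int *p) {
    return &*p == p;
}

/* *p designates the object itself (an lvalue), so it can be assigned through. */
void store_through(int *p, int v) {
    *p = v;
}
```

With int j = 8; int *p = &j;, addr_of_deref_is_pointer(p) returns 1, and after store_through(p, 12) the variable j holds 12. At no point is *p replaced by the literal 8.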

Perl SV value from pointer without copy

How can I create an SV from a NUL-terminated string without copying it? Like newSVpv(const char*, STRLEN), but without the copy and with ownership moving to Perl (so Perl must release the string's memory). I need this to avoid a huge memory allocation and copy.
I found the following example:
SV *r = sv_newmortal();
SvPOK_on(r);
sv_usepvn_mg(r, string, strlen(string) + 1);
But I don't have deep knowledge of XS internals and have some doubts.
If you want Perl to manage the memory block, it needs to know how to reallocate it and deallocate it. The only memory it knows how to reallocate and deallocate is memory allocated using its allocator, Newx. (Otherwise, it would have to associate a reallocator and deallocator with each memory block.)
If you can't allocate the memory block using Newx, then your best option might be to create a read-only SV with SvLEN set to zero. That tells Perl that it doesn't own the memory. That SV could be blessed into a class that has a destructor that will deallocate the memory using the appropriate deallocator.
If you can allocate the memory block using Newx, then you can use the following:
SV* newSVpvn_steal_flags(pTHX_ const char* ptr, STRLEN len, const U32 flags) {
#define newSVpvn_steal_flags(a,b,c) newSVpvn_steal_flags(aTHX_ a,b,c)
    SV* sv;
    assert(!(flags & ~(SVf_UTF8|SVs_TEMP|SV_HAS_TRAILING_NUL)));
    sv = newSV(0);
    sv_usepvn_flags(sv, ptr, len, flags & SV_HAS_TRAILING_NUL);
    if ((flags & SVf_UTF8) && SvOK(sv)) {
        SvUTF8_on(sv);
    }
    SvTAINT(sv);
    if (flags & SVs_TEMP) {
        sv_2mortal(sv);
    }
    return sv;
}
Note: ptr should point to memory that was allocated by Newx, and it must point to the start of the block returned by Newx.
Note: Accepts flags SVf_UTF8 (to specify that ptr is the UTF-8 encoding of the string to be seen in Perl), SVs_TEMP (to have sv_2mortal called on the SV) and SV_HAS_TRAILING_NUL (see below).
Note: Some code expects the string buffer of scalars to have a trailing NUL (even though the length of the buffer is known and even though the buffer can contain NULs). If the memory block you allocated has a trailing NUL beyond the end of the data (e.g. a C-style NUL-terminated string), then pass the SV_HAS_TRAILING_NUL flag. If not, the function will attempt to extend the buffer and add a NUL.
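The owned-versus-borrowed distinction can be sketched in plain C without any Perl headers. This is a toy model, not the Perl API: toy_buf and its functions are hypothetical, malloc/free stand in for Perl's allocator, and the len == 0 convention only mimics the SvLEN-is-zero idea described above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *ptr;   /* start of the string data */
    size_t cur;   /* bytes in use, not counting the trailing NUL */
    size_t len;   /* allocated size; 0 means "borrowed, do not free" */
} toy_buf;

/* Take ownership of a malloc'ed, NUL-terminated block without copying it. */
toy_buf toy_buf_steal(char *block) {
    size_t n = strlen(block);
    toy_buf b = { block, n, n + 1 };
    return b;
}

/* Wrap memory owned by someone else; len = 0 records that we must not free it. */
toy_buf toy_buf_borrow(char *block) {
    toy_buf b = { block, strlen(block), 0 };
    return b;
}

/* Release the buffer only if it is owned. Returns 1 if memory was freed. */
int toy_buf_release(toy_buf *b) {
    int owned = b->len != 0;
    if (owned)
        free(b->ptr);
    b->ptr = NULL;
    b->cur = b->len = 0;
    return owned;
}
```

The stealing variant only works because the block came from the same allocator that the release function uses, which is the same constraint as requiring Newx-allocated memory for sv_usepvn_flags.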

Does 'mixing' the result of a 32-bit hash to create a 64-bit hash have any value?

For example, if you're programming in Java, and you want to create a 64-bit hash function for an arbitrary object, does it make sense to apply something like murmurHash3's 'finalizer' to the result of Object.hashCode()?
Specifically, is the following hash function
long Mix(int i)
{
long result = i;
return result ^ (result << 32) ^ (result << 33); // Or some 'better' way of mixing up the bits of i.
}
long Hash(Object o)
{
return Mix(o.hashCode());
}
better than simply doing
long Hash(Object o)
{
return o.hashCode();
}
(I'm well aware that the second one gives you nothing over a 32-bit hash)
The hash is going to be used to implement (recursive) hash-join, and the buckets are going to be determined by doing hash % prime. A concern is that it's going to be hard to make a good sequence of independent hash functions for the 'recursive' part if we only have 32-bits to start out with.
I'm thinking the answer is 'no', and that you really need to start out with a 64-bit hash which was computed directly from the value of the object.
I guess a side question is whether you actually need a 64-bit hash in the first place for the purposes of hash-join.
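One thing that can be checked without any theory: mixing is deterministic, so two objects with the same 32-bit hashCode() end up with the same 64-bit value no matter how the bits are shuffled. Below is a sketch in C rather than Java, using the 64-bit finalizer from MurmurHash3 (fmix64). The widen and level_hash helpers and the idea of a per-level seed are my own assumptions for illustration, not something from a particular hash-join implementation.

```c
#include <stdint.h>

/* 64-bit finalizer from MurmurHash3 (fmix64). */
uint64_t fmix64(uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

/* Widen a 32-bit hash code; equal inputs always give equal outputs. */
uint64_t widen(uint32_t h32) {
    return fmix64((uint64_t)h32);
}

/* Hypothetical per-level hash for recursive partitioning:
   fold a level seed in before mixing. */
uint64_t level_hash(uint32_t h32, uint64_t level_seed) {
    return fmix64((uint64_t)h32 ^ level_seed);
}
```

A seed per recursion level changes which bucket a given 32-bit value lands in at each level, but keys whose 32-bit hashes collide stay together at every level. Only a hash computed from the object's actual value can separate them.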

Perl XS and C++ passing pointer to buffer

I know almost no C++ so that's not helping, and my XS isn't much better. I'm creating an XS interface for a C++ library and I have almost all my methods working except one.
The method in Perl should look like this:
$return_data = $obj->readPath( $path );
The method is defined like this in the .h file:
int readPath(const char* path, char* &buffer, bool flag=true);
The "buffer" will get allocated if it's passed in NULL.
There are two additional versions of readPath with different signatures, but they are not the ones I want. (And interestingly, when I try to compile, it tells me the "candidates" are the two I don't want.) Is that because it's not understanding the "char * &"?
Can someone help with the xsub I need to write?
I'm on Perl 5.14.2.
BTW -- I've also used a typemap "long long int" to T_IV. I cannot find any documentation on how to correctly typemap long long. Any suggestions how I should typemap long long?
Thanks,
I've never dealt with C++ from C or XS. If it was C, it would be:
void
readPath(SV* sv_path)
PPCODE:
{
char* path = SvPVbyte_nolen(sv_path);
char* buffer = NULL;
if (!readPath(path, &buffer, 0))
XSRETURN_UNDEF;
ST(0) = sv_2mortal(newSVpv(buffer, 0));
free(buffer);
XSRETURN(1);
}
Hopefully, that works or you can adjust it to work.
I assumed:
readPath returns true/false for success/failure.
buffer isn't allocated on failure.
The deallocator for buffer is free.
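Those three assumptions describe a fairly common C calling convention, sketched below with a stand-in function. read_path_stub is hypothetical and simply copies the path into the buffer. The real library call may behave differently, in particular about who frees the buffer and with which deallocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library call. On success it mallocs *buffer,
   fills it and returns 1. On failure it returns 0 and leaves *buffer alone. */
int read_path_stub(const char *path, char **buffer) {
    size_t n;
    if (path == NULL || path[0] == '\0')
        return 0;
    if (*buffer != NULL)
        return 0;               /* this sketch only covers "allocate it for me" */
    n = strlen(path);
    *buffer = malloc(n + 1);
    if (*buffer == NULL)
        return 0;
    memcpy(*buffer, path, n + 1);   /* pretend the data read is the path itself */
    return 1;
}
```

The caller starts with char *buf = NULL, passes &buf, copies the result wherever it needs it (the xsub does this with newSVpv), and then calls free(buf). If the real library allocates with new[], the matching deallocator would be delete[] instead.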
Second part of the question is TYPEMAP for long long (or long long int).
long is at least 32 bits and long long is at least 64. The default typemap for long is T_IV, and the closest Perl equivalent for long long is T_IV, too. Keep in mind that an IV is only 64 bits wide if your perl was built with 64-bit integers; otherwise large long long values won't fit.
But sometimes you want to reduce cast warnings, so you can use T_LONG for long. T_LONG is equivalent to T_IV but explicitly casts the return value to type long. The typemap entry for T_LONG is described in $PERLLIB/ExtUtils/typemap.
With this knowledge you can write your own TYPEMAP for long long int:
TYPEMAP: <<TYPEMAPS
long long int    T_LONGLONG
INPUT
T_LONGLONG
    $var = (long long int)SvIV($arg)
OUTPUT
T_LONGLONG
    sv_setiv($arg, (IV)$var);
TYPEMAPS