Perl SV value from pointer without copy - perl

How can I create an SV from a null-terminated string without copying it? I want something like newSVpv(const char*, STRLEN), but without the copy and with ownership transferred to Perl (so Perl must release the string's memory). I need this to avoid a huge memory allocation and copy.
I found the following example:
SV *r = sv_newmortal();
SvPOK_on(r);
sv_usepvn_mg(r, string, strlen(string) + 1);
But I don't have deep knowledge of XS internals and have some doubts.

If you want Perl to manage the memory block, it needs to know how to reallocate it and deallocate it. The only memory it knows how to reallocate and deallocate is memory allocated using its allocator, Newx. (Otherwise, it would have to associate a reallocator and deallocator with each memory block.)
If you can't allocate the memory block using Newx, then your best option might be to create a read-only SV with SvLEN set to zero. That tells Perl that it doesn't own the memory. That SV could be blessed into a class that has a destructor that will deallocate the memory using the appropriate deallocator.
If you can allocate the memory block using Newx, then you can use the following:
SV* newSVpvn_steal_flags(pTHX_ char* ptr, STRLEN len, const U32 flags) {
#define newSVpvn_steal_flags(a,b,c) newSVpvn_steal_flags(aTHX_ a,b,c)
    SV* sv;
    assert(!(flags & ~(SVf_UTF8|SVs_TEMP|SV_HAS_TRAILING_NUL)));
    sv = newSV(0);
    /* Perl takes ownership of ptr here, so it must not be const. */
    sv_usepvn_flags(sv, ptr, len, flags & SV_HAS_TRAILING_NUL);
    if ((flags & SVf_UTF8) && SvOK(sv)) {
        SvUTF8_on(sv);
    }
    SvTAINT(sv);
    if (flags & SVs_TEMP) {
        sv_2mortal(sv);
    }
    return sv;
}
Note: ptr should point to memory that was allocated by Newx, and it must point to the start of the block returned by Newx.
Note: Accepts flags SVf_UTF8 (to specify that ptr is the UTF-8 encoding of the string to be seen in Perl), SVs_TEMP (to have sv_2mortal called on the SV) and SV_HAS_TRAILING_NUL (see below).
Note: Some code expects the string buffer of scalars to have a trailing NUL (even though the length of the buffer is known and even though the buffer can contain NULs). If the memory block you allocated has a trailing NUL beyond the end of the data (e.g. a C-style NUL-terminated string), then pass the SV_HAS_TRAILING_NUL flag. If not, the function will attempt to extend the buffer and add a NUL.

Related

Different offset in libc's backtrace_symbols() and libunwind's unw_get_proc_name()

I make a stack trace at some point in my program. Once with libc's backtrace_symbols() function and once with unw_get_proc_name() from libunwind.
backtrace_symbols() output:
/home/jj/test/mylib.so(+0x97004)[0x7f6b47ce9004]
unw_get_proc_name() output:
ip: 0x7f6b47ce9004, offset: 0x458e4
Here you see that the instruction pointer address (0x7f6b47ce9004) is the same and correct. The function offset 0x97004 from backtrace_symbols() is also correct but not the one I get from unw_get_proc_name() (0x458e4).
Does somebody have a clue what's going on here and what might cause this difference in offsets?
Both methods use code similar to the following examples:
backtrace():
void *array[10];
size_t size;
size = backtrace(array, 10);
backtrace_symbols_fd(array, size, STDERR_FILENO);
libunwind:
unw_cursor_t cursor;
unw_context_t context;
unw_getcontext(&context);
unw_init_local(&cursor, &context);
while (unw_step(&cursor) > 0) {
unw_word_t offset, pc;
char fname[64];
unw_get_reg(&cursor, UNW_REG_IP, &pc);
fname[0] = '\0';
(void) unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);
printf ("%p : (%s+0x%lx) [%p]\n", (void *) pc, fname, (unsigned long) offset, (void *) pc);
}
I think unw_get_proc_name computes the offset from an unnamed internal frame.
For example:
void f() {
int i;
while (...) {
int j;
}
}
Notice that there is a variable declaration inside the loop block. In this case (and depending on the optimization level), the compiler may create a frame (and related unwind information) for the loop. Consequently, unw_get_proc_name computes the offset from the start of this loop instead of from the beginning of the function.
This is explained in the unw_get_proc_name man page:
Note that on some platforms there is no reliable way to distinguish
between procedure names and ordinary labels. Furthermore, if symbol
information has been stripped from a program, procedure names may be
completely unavailable or may be limited to those exported via a
dynamic symbol table. In such cases, unw_get_proc_name() may return
the name of a label or a preceeding (nearby) procedure.
Try the test again without stripping your binary (since unw_get_proc_name is not able to find the function's name, I think your binary is stripped).
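One way to sanity-check the two numbers from the question is to work out which address each offset is measured from. The sketch below does only that arithmetic. It assumes (this is not stated in the thread) that glibc prints the `(+0x...)` form relative to the shared object's load address when it has no symbol name for the frame; `offset_origin` and the constant names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the output shown in the question. */
static const uint64_t IP               = 0x7f6b47ce9004ULL;
static const uint64_t BACKTRACE_OFFSET = 0x97004ULL;  /* backtrace_symbols() */
static const uint64_t LIBUNWIND_OFFSET = 0x458e4ULL;  /* unw_get_proc_name() */

/* Hypothetical helper: the address an offset is measured from,
   i.e. origin = ip - offset. */
uint64_t offset_origin(uint64_t ip, uint64_t offset)
{
    assert(offset <= ip);
    return ip - offset;
}
```

With the question's values, `offset_origin(IP, BACKTRACE_OFFSET)` is 0x7f6b47c52000, which is page-aligned and so would be consistent with a module load address, while `offset_origin(IP, LIBUNWIND_OFFSET)` is 0x7f6b47ca3720, an address inside the module. If the assumption holds, the two offsets are simply relative to different origins rather than one of them being wrong.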

XS typemap for intptr_t

I'm trying to return an intptr_t type from some XS code:
intptr_t
my_func( self )
myObjPtr self
CODE:
RETVAL = (intptr_t) self;
OUTPUT:
RETVAL
My typemap doesn't have anything about intptr_t, so of course dmake fails with Could not find a typemap for C type 'intptr_t'. I'm not sure if Perl even works with integers as big as intptr_t can be. If there's no good way to return this to Perl as a number, I'll just stringify it.
IV, Perl's signed integer format, is guaranteed to be large enough to hold a pointer. intptr_t is C's version of what Perl has had for a long time. (In fact, a ref is just a pointer stored in a scalar's IV slot with a flag indicating it's a reference.)
But you don't want to cast directly to an IV as that can result in a spurious warning. As Sinan Ünür points out, use PTR2IV instead.
IV
my_func()
myObjPtr self
CODE:
self = ...;
RETVAL = PTR2IV(self);
OUTPUT:
RETVAL
INT2PTR(myObjPtr, iv) does the inverse operation.
This thread suggests:
The existing mechanism which does work everywhere are the macros INT2PTR and PTR2IV (in perl.h)
From perldoc perlguts:
Because pointer size does not necessarily equal integer size, use the follow macros to do it right.
PTR2UV(pointer)
PTR2IV(pointer)
PTR2NV(pointer)
INT2PTR(pointertotype, integer)
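For comparison, the same round trip can be written in plain C with intptr_t. This is only an analogy to PTR2IV and INT2PTR, not how the Perl macros are defined; the function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a pointer in a pointer-sized signed integer. */
intptr_t ptr_to_int(void *p)
{
    return (intptr_t) p;
}

/* Hypothetical helper: recover the pointer from the integer. */
void *int_to_ptr(intptr_t v)
{
    return (void *) v;
}

/* Round-trips a pointer through an integer and checks that it survived.
   Returns the integer form so a caller could hand it to other code. */
intptr_t check_roundtrip(void *p)
{
    intptr_t as_int = ptr_to_int(p);
    assert(int_to_ptr(as_int) == p);
    return as_int;
}
```

The C standard guarantees this round trip for intptr_t where that type exists, which is the same guarantee the answer relies on for IV.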

Perl XS and C++ passing pointer to buffer

I know almost no C++ so that's not helping, and my XS isn't much better. I'm creating an XS interface for a C++ library and I have almost all my methods working except one.
The method in Perl should look like this:
$return_data = $obj->readPath( $path );
The method is defined like this in the .h file:
int readPath(const char* path, char* &buffer, bool flag=true);
The "buffer" will get allocated if it's passed in NULL.
There are two additional versions of readPath with different signatures, but they are not the ones I want. (And interestingly, when I try to compile, it tells me the "candidates" are the two I don't want.) Is that because it doesn't understand the "char * &"?
Can someone help with the xsub I need to write?
I'm on Perl 5.14.2.
BTW -- I've also used a typemap "long long int" to T_IV. I cannot find any documentation on how to correctly typemap long long. Any suggestions how I should typemap long long?
Thanks,
I've never dealt with C++ from C or XS. If it was C, it would be:
void
readPath(SV* sv_path)
    PPCODE:
        {
            /* SvPVbyte_nolen takes only the SV; no length is returned. */
            char* path = SvPVbyte_nolen(sv_path);
            char* buffer = NULL;
            if (!readPath(path, &buffer, 0))
                XSRETURN_UNDEF;
            ST(0) = sv_2mortal(newSVpv(buffer, 0));
            free(buffer);
            XSRETURN(1);
        }
Hopefully, that works or you can adjust it to work.
I assumed:
readPath returns true/false for success/failure.
buffer isn't allocated on failure.
The deallocator for buffer is free.
Second part of the question is TYPEMAP for long long (or long long int).
Normally long is at least 32 bits and long long is at least 64. The default typemap for long is T_IV. Perl's equivalent for long long is T_IV, too.
But sometimes you want to reduce cast warnings, so you can use T_LONG for long. T_LONG is equivalent to T_IV but explicitly casts the return value to type long. The typemap entry for T_LONG is described in $PERLLIB/ExtUtils/typemap.
With this knowledge you can write your own TYPEMAP for long long int:
TYPEMAP: <<TYPEMAPS
long long int    T_LONGLONG
INPUT
T_LONGLONG
    $var = (long long int)SvIV($arg)
OUTPUT
T_LONGLONG
    sv_setiv($arg, (IV)$var);
TYPEMAPS

How to reverse a string without allocating memory

I was asked this question on how to reverse a string without allocating memory. Any takers?
You cannot reverse an NSString, with or without allocating memory, because an NSString is immutable.
You cannot reverse an NSMutableString in place without allocating memory, because the only methods that NSMutableString provides to replace its contents require the new characters to be specified in an NSString, which you would have to allocate.
CFMutableString has the same “problem”.
void reverseStringBetter(char* str)
{
    int i, j;
    i=j=0;
    j=strlen(str)-1;
    for (i=0; i<j; i++, j--)
    {
        str[i] ^= str[j] ;
        str[j] ^= str[i] ;
        str[i] ^= str[j] ;
    }
}
It is not possible with NSString since they are immutable and the only way is to create a new string.
Though this might not be what you are looking for, you can convert the NSString to a normal c-string, and edit that in-place. You are still allocating memory, but you'll at least get half of what you want by being able to modify the string in place.
I'm not sure what your use case is for not wanting to allocate memory, or if this is simply a hypothetical.
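For the C-string route mentioned above, the reversal itself needs only a constant amount of extra storage. Here is a minimal sketch that swaps through a temporary instead of using XOR and uses size_t indices; `reverse_in_place` is a name chosen for this example, and it reverses bytes, so it is not suitable for multi-byte encodings such as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reverse a NUL-terminated byte string in place.
   Uses O(1) extra storage (two indices and one temporary char). */
void reverse_in_place(char *str)
{
    assert(str != NULL);

    size_t len = strlen(str);
    if (len < 2)
        return;                 /* empty or single-char: nothing to do */

    size_t i = 0;
    size_t j = len - 1;
    while (i < j) {
        char tmp = str[i];      /* plain swap through a temporary */
        str[i] = str[j];
        str[j] = tmp;
        i++;
        j--;
    }
}
```

The early return for short strings avoids computing `len - 1` on an empty string, which would wrap around with an unsigned index.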

Using memcpy/memset

When using memset or memcpy within an Obj-C program, will the compiler optimise the setting (memset) or copying (memcpy) of data into 32-bit writes or will it do it byte by byte?
You can see the libc implementations of these methods in the Darwin source. In 10.6.3, memset works at the word level. I didn't check memcpy, but probably it's the same.
You are correct that it's possible for the compiler to do the work inline instead of calling these functions. I suppose I'll let someone who knows better answer what it will do, though I would not expect a problem.
Memset will come as part of your standard C library so it depends on the implementation you are using. I would guess most implementations will copy in blocks of the native CPU size (32/64 bits) and then the remainder byte-by-byte.
Here is glibc's version of memcpy for an example implementation:
void *
memcpy (dstpp, srcpp, len)
     void *dstpp;
     const void *srcpp;
     size_t len;
{
  unsigned long int dstp = (long int) dstpp;
  unsigned long int srcp = (long int) srcpp;
  /* Copy from the beginning to the end. */
  /* If there not too few bytes to copy, use word copy. */
  if (len >= OP_T_THRES)
    {
      /* Copy just a few bytes to make DSTP aligned. */
      len -= (-dstp) % OPSIZ;
      BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);
      /* Copy whole pages from SRCP to DSTP by virtual address manipulation,
         as much as possible. */
      PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len);
      /* Copy from SRCP to DSTP taking advantage of the known alignment of
         DSTP. Number of bytes remaining is put in the third argument,
         i.e. in LEN. This number may vary from machine to machine. */
      WORD_COPY_FWD (dstp, srcp, len, len);
      /* Fall out and copy the tail. */
    }
  /* There are just a few bytes to copy. Use byte memory operations. */
  BYTE_COPY_FWD (dstp, srcp, len);
  return dstpp;
}
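So you can see it copies a few bytes first to get the destination aligned, then copies in words, and finally copies the remaining tail in bytes again. Where the platform supports it, it also copies whole pages by virtual address manipulation (the PAGE_COPY_FWD_MAYBE step) before the word copy.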
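The same head/words/tail structure can be sketched in portable C. This is not glibc's code: `sketch_memcpy` is a name invented here, the threshold is an arbitrary assumption, the page-copy step is left out, and the fixed-size memcpy calls stand in for single word loads and stores so the sketch stays within C's aliasing rules (compilers typically turn them into one load and one store).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: byte copy until dst is word-aligned,
   then whole words, then the remaining tail bytes. */
void *sketch_memcpy(void *dstpp, const void *srcpp, size_t len)
{
    assert(dstpp != NULL && srcpp != NULL);

    unsigned char *dst = dstpp;
    const unsigned char *src = srcpp;
    const size_t word = sizeof(unsigned long);

    if (len >= 2 * word) {      /* assumed threshold, not glibc's OP_T_THRES */
        /* Head: bytes needed to bring dst up to a word boundary. */
        size_t head = (size_t) (-(uintptr_t) dst % word);
        len -= head;
        while (head--)
            *dst++ = *src++;

        /* Body: one word per iteration. */
        while (len >= word) {
            unsigned long w;
            memcpy(&w, src, word);   /* word-sized load  */
            memcpy(dst, &w, word);   /* word-sized store */
            dst += word;
            src += word;
            len -= word;
        }
    }

    /* Tail: whatever is left, byte by byte. */
    while (len--)
        *dst++ = *src++;

    return dstpp;
}
```

Only the destination is aligned here, as in the glibc version quoted above; the source may still be unaligned, which is why the load goes through memcpy rather than a pointer cast.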