Moving binary data to/from Perl using SWIG - perl

I'm trying to make it easy for me to move binary data between Perl and my C++ library.
I created a c++ struct to hand the binary_data:
struct binary_data {
unsigned long length;
unsigned char *data;
};
In my SWIG interface file for I have the following:
%typemap(in) binary_data * (binary_data temp) {
STRLEN len;
unsigned char *outPtr;
if(!SvPOK($input))
croak("argument must be a scalar string");
outPtr = (unsigned char*) SvPV($input, len);
printf("set binary_data '%s' [%d] (0x%X)\n", outPtr, len, $input);
temp.data = outPtr;
temp.length = len;
$1 = &temp;
}
%typemap(out) binary_data * {
SV *obj = sv_newmortal();
if ($1 != 0 && $1->data != 0 && $1->length > 0) {
sv_setpvn(obj, (const char*) $1->data, $1->length);
printf("get binary_data '%s' [%d] (0x%X)\n", $1->data, $1->length, obj);
} else {
sv_setsv(obj, &PL_sv_undef);
printf("get binary_data [set to undef]\n");
}
if( !SvPOK(obj) )
croak("The result is not a scalar string");
$result = obj;
}
I build my Perl module via "ExtUtils::MakeMaker" and it's all good.
I then run the following perl test script to ensure the binary data is being
set/get from a perl string correctly.
my $fr = ObjectThatContainsBinaryData->new();
my $data = "1234567890";
print ">>>PERL:swig_data_set\n";
$fr->swig_data_set($data);
print "<<<PERL:swig_data_set\n";
print ">>>PERL:swig_data_get\n";
my $rdata = $fr->swig_data_get();
print "<<<PERL: swig_data_get\n";
print "sent :" . \$data . " len=" . length($data). " '$data'\n"
."recieved:". \$rdata. " len=" . length($rdata). " '$rdata'\n";
Now the combined C++ and Perl printf stdout is:
>>>PERL:swig_data_set
set binary_data '1234567890' [10] (0x12B204D0)
<<<PERL:swig_data_set
>>>PERL:swig_data_get
get binary_data '1234567890' [10] (0x1298E4E0)
<<<PERL: swig_data_get
sent :SCALAR(0x12b204d0) len=10 '1234567890'
recieved:SCALAR(0x12bc71c0) len=0 ''
So why does it look like the perl call to sv_setpvn is failing or not working?
I don't know why when I print the returned binary data in perl, it shows as an empty scalar, but it looks fine within the SWIG C++ embedded typemap.
I'm using:
Perl v5.8.8 built for x86_64-linux-thread-multi
SWIG 2.0.1
gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)

If you replace the following line of in your %typemap(out):
$result = obj;
With
$result = obj; argvi++; //This is a hack to get the hidden stack pointer to increment before the return
The SWIG Generated code will now look like:
...
ST(argvi) = obj; argvi++;
}
XSRETURN(argvi);
}
And your test script will return the Perl String as expected.
SV = PV(0x1eae7d40) at 0x1eac64d0
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x1eb25870 "1234567890"\0
CUR = 10
LEN = 16
<<<PERL: swig_data_get
sent :SCALAR(0x1ea64530) len=10 '1234567890'
recieved:SCALAR(0x1eac64d0) len=10 '1234567890'
You should have read the SWIG 2.0 documentation on typemaps in Perl more closely:
"
30.8.2 Return values
Return values are placed on the argument stack of each wrapper function. The current value of the argument stack pointer is contained in a variable argvi. Whenever a new output value is added, it is critical that this value be incremented. For multiple output values, the final value of argvi should be the total number of output values.
"

What if you don't make it mortal? I was doing testing with Inline::C (since I've never used SWIG), and setting the SV to mortal caused problems since Inline::C was doing it for me. Perhaps SWIG uses a similar design?
Both
SV* obj = newSV(0);
sv_setpvn(obj, "abc", 3);
and
SV* obj = newSVpvn("abc", 3);
worked with Inline::C.

swig provides a module named cdata.i.
You should include this in the interface definition file.
Once you include this, it gives two functions cdata() and memmove(). Given a void * and the length of the binary data, cdata() converts it into a string type of the target language.
memmove() is the reverse. given a string type, it will copy the contents of the string(including embedded null bytes) into the C void* type.
Handling binary data becomes much simple with this module.
I hope this is what you need.

On the Perl side, could you add
use Devel::Peek;
Dump($fr->swig_data_get());
and provide the output? Thanks.

Related

Writing value 0 to a binary file

I am generating a binary file from a SystemVerilog simulation environment. Currently, I'm doing the following:
module main;
byte arr[] = {0,32, 65, 66, 67};
initial begin
int fh=$fopen("/home/hagain/tmp/h.bin","w");
for (int idx=0; idx<arr.size; idx++) begin //{
$fwrite(fh, "%0s", arr[idx]);
end //}
$fclose(fh);
$system("xxd /home/hagain/tmp/h.bin | tee /home/hagain/tmp/h.txt");
end
endmodule : main
The problem is, that when b has the value of 0, nothing is written to the file. xxd output is:
0000000: 2041 4243 ABC
Same result when casting to string as follows:
$fwrite(fh, string'(arr[idx]));
I tried to change the write command to:
$fwrite(fh, $sformatf("%0c", arr[idx]));
And then I got the same value for the first two bytes ('d0 and 'd32):
0000000: 2020 4142 43 ABC
Any idea on how to generate this binary file?
You cannot have a null(0) character in the middle of a string, it is used to terminate the string.
You should use the %u format specifier for unformated data.
module main;
byte arr[] = {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9};
int fh, c, tmp;
initial begin
fh = $fopen("h.bin","wb");
for (int idx=0; idx<arr.size; idx+=4) begin
tmp = {<<8{arr[idx+:4]}};
$fwrite(fh, "%u", tmp);
end
$fclose(fh);
fh = $fopen("h.bin","r");
while ((c = $fgetc(fh)) != -1)
$write("%d ",c[7:0]);
$display;
end
endmodule : main
Note that %u writes a 32-bit value in least-significant bytes first order, so I reversed the bytes being written with the streaming operator {<<8{arr[idx+:4]}}. If the number of bytes is not divisible by 4, it will just pad the file with null bytes.
If you need the exact number of bytes, the you will have to use some DPI C code
#include <stdio.h>
#include "svdpi.h"
void DPI_fwrite(const char* filename,
const svOpenArrayHandle buffer){
int size = svSize(buffer,1);
char *buf = (char *)svGetArrayPtr(buffer);
FILE *fp = fopen(filename,"wb");
fwrite(buf,1,size,fp);
}
And then import it with
import "DPI-C" function void DPI_fwrite(input string filename, byte buffer[]);
...
DPI_fwrite("filename", arr);

Is there a String size limit when sending strings back to BPF code and back to userspace?

I am sending this sentence through my BPF code through a BPF Char Array here:
jmommyijsadifjasdijfa, hello, world
And when I print out my output, I only seem to get this output
jmommyij
I seem to be hitting some kind of String size limit. Is there any way to go over this string size limit and print the entire string?
Here is what my BPF code looks like:
#include <uapi/linux/bpf.h>
#define ARRAYSIZE 512
BPF_ARRAY(lookupTable, char**, ARRAYSIZE);
int helloworld2(void *ctx)
{
int k = 0;
//print the values in the lookup table
#pragma clang loop unroll(full)
for (int i = 0; i < sizeof(lookupTable); i++) {
//need to use an intermiate variable to hold the value since the pointer will not increment correctly.
k = i;
char *key = lookupTable.lookup(&k);
// if the key is not null, print the value
if (key != NULL && sizeof(key) > 1) {
bpf_trace_printk("%s\n", key);
}
}
return 0;
}
Here is my py file:
import ctypes
from bcc import BPF
b = BPF(src_file="hello.c")
lookupTable = b["lookupTable"]
#add hello.csv to the lookupTable array
f = open("hello.csv","r")
file_contents = f.read()
#append file contents to the lookupTable array
b_string1 = file_contents.encode('utf-8')
print(b_string1)
lookupTable[ctypes.c_int(0)] = ctypes.create_string_buffer(b_string1, len(b_string1))
#print(file_contents)
f.close()
# This attaches the compiled BPF program to a kernel event of your choosing,
#in this case to the sys_clone syscall which will cause the BPF program to run
#everytime the sys_clone call occurs.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="helloworld2")
# Capture and print the BPF program's trace output
b.trace_print()
You're creating an array of 512 char** (basically u64). So you're just storing the first 8 bytes of your string the rest is discarded.
What you need is an array of 1 holding a 512 byte value:
struct data_t {
char buf[ARRAYSIZE];
};
BPF_ARRAY(lookupTable, struct data_t, ARRAYSIZE);
Also see https://github.com/iovisor/bpftrace/issues/1957

libpqxx C Aggregate Extension returns wrong data?

I am learning how to create C aggregate extensions and using libpqxx with C++ on the client side to process the data.
My toy aggregate extension has one argument of type bytea, and the state is also of type bytea. The following is the simplest example of my problem:
Server side:
PG_FUNCTION_INFO_V1( simple_func );
Datum simple_func( PG_FUNCTION_ARGS ){
bytea *new_state = (bytea *) palloc( 128 + VARHDRSZ );
memset(new_state, 0, 128 + VARHDRSZ );
SET_VARSIZE( new_state,128 + VARHDRSZ );
PG_RETURN_BYTEA_P( new_state );
}
Client side:
std::basic_string< std::byte > buffer;
pqxx::connection c{"postgresql://user:simplepassword#localhost/contrib_regression"};
pqxx::work w(c);
c.prepare( "simple_func", "SELECT simple_func( $1 ) FROM table" );
pqxx::result r = w.exec_prepared( "simple_func", buffer );
for (auto row: r){
cout << " Result Size: " << row[ "simple_func" ].size() << endl;
cout << "Raw Result Data: ";
for( int jj=0; jj < row[ "simple_func" ].size(); jj++ ) printf( "%02" PRIx8, (uint8_t) row[ "simple_func" ].c_str()[jj] ) ;
cout << endl;
}
The result on the client side prints :
Result Size: 258
Raw Result Data: 5c783030303030303030303030303030...
Where the 30 pattern repeats until the end of the string and the printed string in hex is 512 bytes.
I expected to receive an array of length 128 bytes where every byte is set to zero. What am I doing wrong?
The libpqxx version is 7.2 and PostgreSQL 12 on Ubuntu 20.04.
Addendum
Installation of the extesion sql statement;
CREATE OR REPLACE FUNCTION agg_simple_func( state bytea, arg1 bytea)
RETURNS bytea
AS '$libdir/agg_simple_func'
LANGUAGE C IMMUTABLE STRICT;
CREATE OR REPLACE AGGREGATE simple_func( arg1 bytea)
(
sfunc = agg_simple_func,
stype = bytea,
initcond = "\xFFFF"
);
The answer appears to be that the bytea type data on the client side must be retrieved as follows in the libpqxx library as of 7.0 (Not tested in earlier versions):
row[ "simple_func" ].as<std::basic_string<std::byte>>()
This retrieves the right bytea data without any conversions, string idiosyncrasies or unexpected behavior like I was seeing.
I recommend that you tackle these things one by one: first get the function to work, testing it with psql in interactive queries, then write the client code (or vice versa).
I can't speak about libpqxx, but I have to complain about your function: what you presented won't even compile, because you wrote DATUM in upper case and forgot headers and other important stuff.
This function will compile and run as you expect:
#include "postgres.h"
#include "fmgr.h"
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(simplest_func);
Datum simplest_func(PG_FUNCTION_ARGS) {
bytea *new_state = (bytea *) palloc(128 + VARHDRSZ);
memset(new_state, 0, 128 + VARHDRSZ);
SET_VARSIZE(new_state, 128 + VARHDRSZ);
PG_RETURN_BYTEA_P(new_state);
}
The memset will work that way, but the better and more idiomatic and robust way to set the value of a varlena is
memset(VARDATA(new_state), 0, 128);
I have no idea, how you got your result, but since the code you presented doesn't compile, I don't know how your function really looks.

What is RMAGICAL?

I'm trying to understand some XS code that I inherited. I've been trying to add comments to a section that invokes Perl magic stuff, but I can't find any documentation to help me understand this line:
SvRMAGICAL_off((SV *) myVar);
What is RMAGICAL for? When should one turn in on or off when working with Perl magic variables?
Update
Perlguts Illustrated is very interesting and has a little bit of info on RMAGICAL (the 'R' is for 'random'), but it doesn't say when to mess with it: http://cpansearch.perl.org/src/RURBAN/illguts-0.42/index.html
It's a flag that indicates whether a variable has "clear" magic, magic that should be called when the variable is cleared (e.g. when it's destroyed). It's used by mg_clear which is called when one attempts to do something like
undef %hash;
delete $a[4];
etc
It's derived information calculated by mg_magical that should never be touched. mg_magical will be called to update the flag when magic is added to or removed from a variable. If any of the magic attached to the scalar has a "clear" handler in its Magic Virtual Table, the scalar gets RMAGICAL set. Otherwise, it gets turned off. Effectively, this caches the information to save Perl from repeatedly checking all the magic attached to a scalar for this information.
One example use of clear magic: When a %SIG entry is cleared, the magic removes the signal handler for that signal.
Here's mg_magical:
void
Perl_mg_magical(pTHX_ SV *sv)
{
const MAGIC* mg;
PERL_ARGS_ASSERT_MG_MAGICAL;
PERL_UNUSED_CONTEXT;
SvMAGICAL_off(sv);
if ((mg = SvMAGIC(sv))) {
do {
const MGVTBL* const vtbl = mg->mg_virtual;
if (vtbl) {
if (vtbl->svt_get && !(mg->mg_flags & MGf_GSKIP))
SvGMAGICAL_on(sv);
if (vtbl->svt_set)
SvSMAGICAL_on(sv);
if (vtbl->svt_clear)
SvRMAGICAL_on(sv);
}
} while ((mg = mg->mg_moremagic));
if (!(SvFLAGS(sv) & (SVs_GMG|SVs_SMG)))
SvRMAGICAL_on(sv);
}
}
The SVs_RMG flag (which is what SvRMAGICAL tests for and SvRMAGICAL_on/SvRMAGICAL_off sets/clears) means that the variable has some magic associated with it other than a magic getter method (which is indicated by the SVs_GMG flag) and magic setter method (indicated by SVs_SMG).
I'm getting out of my depth, here, but examples of variables where RMAGIC is on include most of the values in %ENV (the ones that are set when the program begins, but not ones you define at run-time), the values in %! and %SIG, and stash values for named subroutines (i.e., in the program
package main;
sub foo { 42 }
$::{"foo"} is RMAGICAL and $::{"bar"} is not). Using Devel::Peek is a little bit, but not totally enlightening about what this magic might be:
$ /usr/bin/perl -MDevel::Peek -e 'Dump $ENV{HOSTNAME}'
SV = PVMG(0x8003e910) at 0x800715f0
REFCNT = 1
FLAGS = (SMG,RMG,POK,pPOK)
IV = 0
NV = 0
PV = 0x80072790 "localhost"\0
CUR = 10
LEN = 12
MAGIC = 0x800727a0
MG_VIRTUAL = &PL_vtbl_envelem
MG_TYPE = PERL_MAGIC_envelem(e)
MG_LEN = 8
MG_PTR = 0x800727c0 "HOSTNAME"
Here we see that the scalar held in $ENV{HOSTNAME} has an MG_TYPE and MG_VIRTUAL that give you the what, but not the how and why of this variable's magic. On a "regular" magical variable, these are usually (always?) PERL_MAGIC_sv and &PL_vtbl_sv:
$ /usr/bin/perl -MDevel::Peek -e 'Dump $='
SV = PVMG(0x8008e080) at 0x80071de8
REFCNT = 1
FLAGS = (GMG,SMG)
IV = 0
NV = 0
PV = 0
MAGIC = 0x80085aa8
MG_VIRTUAL = &PL_vtbl_sv
MG_TYPE = PERL_MAGIC_sv(\0)
MG_OBJ = 0x80071d58
MG_LEN = 1
MG_PTR = 0x80081ad0 "="
There is one place in the perl source where SvRMAGICAL_off is used -- in perlio.c, in the XS(XS_io_MODIFY_SCALAR_ATTRIBUTES).
XS(XS_io_MODIFY_SCALAR_ATTRIBUTES)
{
dXSARGS;
SV * const sv = SvRV(ST(1));
AV * const av = newAV();
MAGIC *mg;
int count = 0;
int i;
sv_magic(sv, MUTABLE_SV(av), PERL_MAGIC_ext, NULL, 0);
SvRMAGICAL_off(sv);
mg = mg_find(sv, PERL_MAGIC_ext);
mg->mg_virtual = &perlio_vtab;
mg_magical(sv);
Perl_warn(aTHX_ "attrib %" SVf, SVfARG(sv));
for (i = 2; i < items; i++) {
STRLEN len;
const char * const name = SvPV_const(ST(i), len);
SV * const layer = PerlIO_find_layer(aTHX_ name, len, 1);
if (layer) {
av_push(av, SvREFCNT_inc_simple_NN(layer));
}
else {
ST(count) = ST(i);
count++;
}
}
SvREFCNT_dec(av);
XSRETURN(count);
}
where for some reason (again, I'm out of my depth), they want that magic turned off during the mg_find call.

Perl Win32::API() call() function

Dear all,
I am trying to get the value of char pointer or string in the return of call() function for my dll.
my dll is having a function RandomDec(long , int*) and returns a string. so what will be my call using Win32::API(). I have tried this and didn't succeed. plz help
use Win32::API;
my #lpBuffer = " " x 20;
my $pp= \#lpBuffer;
my $xy=0;
my $ff= \$xy;
my $fun2 = new Win32::API('my.dll','RandomDec','NP','**P**')or die $^E;
$pp = $fun2->Call(4,$ff);
how to get using $pp ?
There are multiple errors in your code.
my #lpBuffer = " " x 20; my $pp= \#lpBuffer;
=> my $pp = " " x 20;
You are mixing arrays with strings, and you don't need a perl ref for a c ptr.
Similar for the int*.
N is for number not long. L would be unsigned, you need signed, so l.
use Win32::API;
my $pp = " " x 20; # alloc a string
my $xy = 0; # alloc an int
my $fun2 = new Win32::API('my.dll','RandomDec','lP','P') or die $^E;
$pp = $fun2->Call(4,$xy);
I haven't check if Win32::API can do lvalue assignment to char*. Normally not, so $pp will be a foreign pointer to some string after the call, and the prev. PV slot for $pp will be lost, and inaccessible from perl.
With FFI's and the WinAPI also you usually return int, not strings.
Strings only via sideeffects, as function arg.