STL map in Perl using SWIG - perl

This is duplicate of my question on SWIG mailing list.
I am trying to use stl containers in my SWIG bindings. Everything works perfectly except for stl map handling in Perl. On C++ side, I have
std::map<std::string, std::string> TryMap(const std::map<std::string, std::string> &map) {
std::map<std::string, std::string> modified(map);
modified["7"] = "!";
return modified;
}
SWIG config look like this
%module stl
%include "std_string.i"
%include "std_map.i"
%template(StringStringMap) std::map<std::string, std::string>;
%{
#include "stl.h"
%}
%include "stl.h"
In my Python script I can call TryMap this way
print dict(stl.TryMap({'a': '4'}))
and get beautiful output
{'a': '4', '7': '!'}
but in Perl I call
print Dumper stl::TryMap({'a' => '4'});
and get an error
TypeError in method 'TryMap', argument 1 of type 'std::map< std::string,std::string > const &' at perl.pl line 7.
I can actually do something like
my $map = stl::TryMap(stl::StringStringMap->new());
print $map->get('7');
and get '!', but this is not an option because there is a lot of legacy code using "TryMap" that expects normal Perl hash as its output.
I believe there is a way work this out because SWIG solves this particular problem nicely in Python and even in Perl if I use stl vectors and strings but not maps.
Is there any way to handle stl map with Perl in SWIG? I am using latest SWIG 2.0.7
UPDATE Maybe there is something wrong with perl5/std_map.i. It is too short =)
$ wc -l perl5/std_map.i python/std_map.i
74 perl5/std_map.i
305 python/std_map.i

I put your C++ function into header file as an inline function for testing.
I was then able to construct a SWIG interface that does what you are looking for. It has two key parts. Firstly I wrote a typemap that will allow either a std::map, or a perl hash to be given as input to C++ functions that expect a std::map. In the case of the latter it builds a temporary map from the perl hash to use as the argument. (Which is convenient but potentially slow). The typemap picks the correct behaviour by checking what it was actually passed in.
The second part of the solution is to map some of the C++ map's member functions onto the special functions that perl uses for overloading operations on hashes. Most of these are implemented simply with %rename where the C++ function and perl functions are compatible however FIRSTKEY and NEXTKEY don't map well onto C++'s iterators, so these were implemented using %extend and (internally) another std::map to store the iteration state of the maps we're wrapping.
There are no special typemaps implemented here for returning the maps, however there is extra behaviour via the special operations that are now implemented.
The SWIG interface looks like:
%module stl
%include <std_string.i>
%include <exception.i>
%rename(FETCH) std::map<std::string, std::string>::get;
%rename(STORE) std::map<std::string, std::string>::set;
%rename(EXISTS) std::map<std::string, std::string>::has_key;
%rename(DELETE) std::map<std::string, std::string>::del;
%rename(SCALAR) std::map<std::string, std::string>::size;
%rename(CLEAR) std::map<std::string, std::string>::clear;
%{
#include <map>
#include <string>
// For iteration support, will leak if iteration stops before the end ever.
static std::map<void*, std::map<std::string, std::string>::const_iterator> iterstate;
const char *current(std::map<std::string, std::string>& map) {
std::map<void*, std::map<std::string, std::string>::const_iterator>::iterator it = iterstate.find(&map);
if (it != iterstate.end() && map.end() == it->second) {
// clean up entry in the global map
iterstate.erase(it);
it = iterstate.end();
}
if (it == iterstate.end())
return NULL;
else
return it->second->first.c_str();
}
%}
%extend std::map<std::string, std::string> {
std::map<std::string, std::string> *TIEHASH() {
return $self;
}
const char *FIRSTKEY() {
iterstate[$self] = $self->begin();
return current(*$self);
}
const char *NEXTKEY(const std::string&) {
++iterstate[$self];
return current(*$self);
}
}
%include <std_map.i>
%typemap(in,noblock=1) const std::map<std::string, std::string>& (void *argp=0, int res=0, $1_ltype tempmap=0) {
res = SWIG_ConvertPtr($input, &argp, $descriptor, %convertptr_flags);
if (!SWIG_IsOK(res)) {
if (SvROK($input) && SvTYPE(SvRV($input)) == SVt_PVHV) {
fprintf(stderr, "Convert HV to map\n");
tempmap = new $1_basetype;
HV *hv = (HV*)SvRV($input);
HE *hentry;
hv_iterinit(hv);
while ((hentry = hv_iternext(hv))) {
std::string *val=0;
// TODO: handle errors here
SWIG_AsPtr_std_string SWIG_PERL_CALL_ARGS_2(HeVAL(hentry), &val);
fprintf(stderr, "%s => %s\n", HeKEY(hentry), val->c_str());
(*tempmap)[HeKEY(hentry)] = *val;
delete val;
}
argp = tempmap;
}
else {
%argument_fail(res, "$type", $symname, $argnum);
}
}
if (!argp) { %argument_nullref("$type", $symname, $argnum); }
$1 = %reinterpret_cast(argp, $ltype);
}
%typemap(freearg,noblock=1) const std::map<std::string, std::string>& {
delete tempmap$argnum;
}
%template(StringStringMap) std::map<std::string, std::string>;
%{
#include "stl.h"
%}
%include "stl.h"
I then adapted your sample perl to test:
use Data::Dumper;
use stl;
my $v = stl::TryMap(stl::StringStringMap->new());
$v->{'a'} = '1';
print Dumper $v;
print Dumper stl::TryMap({'a' => '4'});
print Dumper stl::TryMap($v);
foreach my $key (keys %{$v}) {
print "$key => $v->{$key}\n";
}
print $v->{'7'}."\n";
Which I was able to run successfully:
Got map: 0x22bfb80
$VAR1 = bless( {
'7' => '!',
'a' => '1'
}, 'stl::StringStringMap' );
Convert HV to map
a => 4
Got map: 0x22af710
In C++ map: a => 4
$VAR1 = bless( {
'7' => '!',
'a' => '4'
}, 'stl::StringStringMap' );
Got map: 0x22bfb20
In C++ map: 7 => !
In C++ map: a => 1
$VAR1 = bless( {
'7' => '!',
'a' => '1'
}, 'stl::StringStringMap' );
7 => !
a => 1
!
You can also tie this object to a hash, for example:
use stl;
my $v = stl::TryMap(stl::StringStringMap->new());
print "$v\n";
tie %foo, "stl::StringStringMap", $v;
print $foo{'a'}."\n";
print tied(%foo)."\n";
In theory you can write an out typemap to set up this tie automatically on return from every function call, but so far I've not succeeded in writing a typemap that works with both the tying and the SWIG runtime type system.
It should be noted that this isn't production ready code. There's a thread safety issue for the internal map and some error handling missing too that I know of. I've also not fully tested all of hash operations work from the perl side beyond what you see above. It would also be nice to make it more generic, by interacting with the swig_map_common macro. Finally I'm not a perl guru by any means and I've not used the C API much so some caution in that area would be in order.

Related

Get blob uploaded data with pure Perl

In Javascript, I am sending a blob using XHR by the following code:
var v=new FormData();
v.append("EFD",new Blob([...Uint8Array...]));
var h=new XMLHttpRequest();
h.setRequestHeader("Content-type","multipart/form-data; charset=utf-8");
h.open("POST","...url...");
h.send(v);
In the server, I have created in Perl the following function, that suppose to implement CGI->param and CGI->upload:
# QS (Query String) receive in argument string for single parameter or array of many required parameters.
# If string been supplied: Return the value of the parameter or undef if missing.
# If array been supplied, a hash will be returned with keys for param names and their corresponding values.
# If the first argument is undef, then return hash with ALL available parameters.
sub QS {
my $b=$ENV{'QUERY_STRING'};
if($ENV{'REQUEST_METHOD'} eq "POST") {
read(STDIN,$b,$ENV{'CONTENT_LENGTH'}) or die "E100";
}
my $e=$_[0]; my $t=&AT($e); my $r={}; my #q=split(/&/,$b);
my %p=(); if($t eq "A") { %p=map { $_=>1 } #{$e}; }
foreach my $i(#q) {
my ($k,$s)=split(/=/,$i); $s=~tr/+//; $s=~s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
if($t eq "") { $r->{$k}=$s; }
elsif($t eq "A") { if($p{$k}) { $r->{$k}=$s; } }
elsif($k eq $_[0]) { return $s; }
}
return $r;
}
# AT is a function for determining type of an object, and also a quck way to distinguish between just a string and a number.
sub AT {
if(!defined $_[0]) { return ""; } my $v=ref($_[0]);
if($v eq "") { return ($_[0]*1 eq $_[0])?"N":"S"; }
my $k={"ARRAY"=>"A","HASH"=>"H"};
return $k->{$v}||$_[0]->{_obt}||$v;
}
So in the main program it will be called as:
my $EFD=&FW::QS("EFD"); # FW = The module name where QS and AT are.
When I issuing the POST from the client, the script in the server does not pop-up any errors, and does not terminates - it continues to run and run and run.... Endlessly.... Consuming 100% CPU time and 100% memory - without any explanation.
I have these in the beginning of the script, though:
use strict;
use warnings;
use diagnostics;
but it still behave in such a way that I need to kill the script in order to terminate it...
Anyone know what I did wrong...? No infinite loop here, as far as I know... If I change the Blob to regular classic way of "...url...?EFD=dhglkhserkhgoi" then it works just fine, but I want a Blob....
Thanks a lot
This QS function is only usable for POSTs with an application/x-www-urlencoded body, which yours isn't.

How can I turn an op address into the right kind of B::OP?

In a running Perl program if I have an Op address (either by B::Concise, Devel::Callsite or via mysterious other ways) is there a simple way to cast that into the right kind of B::OP, short of walking an Opcode tree?
To try to make this clearer, here's some code:
use Devel::Callsite;
use B::Concise qw(set_style);
use B;
sub testing
{
sub foo { callsite() };
my $op_addr = foo;
printf "Op address is 0x%x\n", $op_addr;
# I can get OPs by walking and looking for $op_addr,
# but I don't want to do that.
my $walker = B::Concise::compile('-terse', '-src', \&testing);
B::Concise::walk_output(\my $buf);
$walker->(); # walks and renders into $buf;
print $buf;
}
testing();
When this is run you'll see something like:
$ perl /tmp/foo.pl
Op address is 0x2026940
B::Concise::compile(CODE(0x1f32b18))
UNOP (0x1f40fd0) leavesub [1]
LISTOP (0x20aa870) lineseq
# 8: my $op_addr = foo;
COP (0x1f7cd80) nextstate
BINOP (0x20aba80) sassign
UNOP (0x20ad200) entersub [2]
UNOP (0x1f39b80) null [148]
OP (0x1fd14f0) pushmark
UNOP (0x1f397c0) null [17]
SVOP (0x1f39890) gv GV (0x1fa0968) *foo
OP (0x2026940) padsv [1]
^^^^^^^^^^
....
So 0x2026940 is the address of a B::OP and and which according to this has next(), sibling(), name() methods. If the address were say 0x20aa870 that would be the address of a LISTOP which has in addition a children() method.
I added B::Concise just to show what's going on. In practice I don't want to walk the optree, because I'm assuming/hoping that the address is in fact where that listop resides.
So perhaps there are two parts, first casting an address to B::Op which I believe is the parent class, but after that I'd like to know which kind of Op, (UNOP, BINOP, LISTOP) we are then talking about.
If I can get the cast part done, the second part is probably easy: all B::OP's have a name() method, so from that I can figure out what subclass of OP I have.
EDIT:
ikegami's solution is now part of Devel::Callsite version 1.0.1 al though it isn't quite right.
This duplicates B's internal make_op_object.
use B qw( );
use Inline C => <<'__EOS__';
static const char * const opclassnames[] = {
"B::NULL",
"B::OP",
"B::UNOP",
"B::BINOP",
"B::LOGOP",
"B::LISTOP",
"B::PMOP",
"B::SVOP",
"B::PADOP",
"B::PVOP",
"B::LOOP",
"B::COP",
"B::METHOP",
"B::UNOP_AUX"
};
SV *make_op_object(IV o_addr) {
const OP *o = INT2PTR(OP*, o_addr);
SV *opsv = newSV(0);
sv_setiv(newSVrv(opsv, opclassnames[op_class(o)]), o_addr);
return opsv;
}
__EOS__
Example use:
use Devel::Callsite qw( callsite );
my $site = sub { return callsite() };
my $addr = $site->();
my $op = make_op_object($addr);
say $op->name;

Using Number::Phone to validate and format

I'm trying to use Number::Phone from CPAN to accomplish 2 tasks:
Validate a Phone Number; and
Format the number in E.164 Notation.
However, I'm unable to figure out how it works. My sample code is:
#!/usr/bin/perl -w
use strict;
use warnings;
use Number::Phone;
foreach my $fnum ( '17888888', '97338888888', '00923455555333', '+97366767777' , '38383838') {
my $phone = Number::Phone->new($fnum);
my $norm = "";
eval {
$norm = $phone->format_using('E123'); # or 'Raw'
print "E164 => '$norm'\n";
} or do {
print STDERR "Unable to parse '$fnum'\n";
}
}
Expected output:
E164 => '+97317888888'
E164 => '+97338888888'
E164 => '+923455555333'
E164 => '+97366767777'
E164 => '+97338383838'
But the results were incorrect. I tried using Number::Phone::Normalize, but still not successful:
#!/usr/bin/perl -w
use strict;
use warnings;
use Number::Phone::Normalize;
my %params = (
'CountryCode'=>'973',
'IntlPrefix' =>'00',
'CountryCodeOut'=>'973',
'IntlPrefixOut' => '+',
);
my $nlz = Number::Phone::Normalize->new( %params );
foreach my $number ('17888888', '97338888888', '00923455555333', '+97366767777' , '38383838') {
my $e164 = $nlz->intl( $number );
print "E164 => '$e164'\n";
}
with the same expected output of:
E164 => '+97317888888'
E164 => '+97338888888'
E164 => '+923455555333'
E164 => '+97366767777'
E164 => '+97338383838'
However, this produced the wrong results too. The snippet Java code below works perfectly, and it's what I'm trying to achieve in Perl.
// Uses libphonenumber: http://code.google.com/p/libphonenumber/
// setenv CLASSPATH .:libphonenumber-8.5.2.jar
// libphonenumber
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;
import com.google.i18n.phonenumbers.NumberParseException;
import com.google.i18n.phonenumbers.PhoneNumberUtil.PhoneNumberFormat;
public class ValidateList {
public static void main(String[] args) {
try {
if (args.length != 1) {
throw new IllegalArgumentException("Invalid number of arguments.");
}
String file = args[0];
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
try (java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(file))) {
String line = null;
while ((line = br.readLine()) != null) {
try {
PhoneNumber phoneNumber = phoneUtil.parse(line, "BH");
boolean isValid = phoneUtil.isValidNumber(phoneNumber);
if (isValid) {
System.out.println( "E164 => " + phoneUtil.format(phoneNumber, PhoneNumberFormat.E164) );
}
else {
System.err.println( "Invalid => " + line);
}
}
catch (NumberParseException e) {
System.err.println("NumberParseException for ("+line+"): " + e.toString());
}
}
}
}
catch (Exception e) {
System.err.println(e);
System.err.println("Usage: java ValidateList <fileNameWithPhoneNumbers>");
}
}
}
% cat input.txt
17888888
97338888888
00923455555333
+97366767777
38383838
% javac -cp libphonenumber-8.5.2.jar ValidateList.java
% java -cp .:libphonenumber-8.5.2.jar ValidateList input.txt
E164 => +97317888888
E164 => +97338888888
E164 => +923455555333
E164 => +97366767777
E164 => +97338383838
Your input is greatly appreciated.
When I run the first example code for the numbers, two of those fail to be parsed:
17888888 - this is obvious, when calling Number::Phone without a country code, this will not be parsed as it's unclear what country this is from
00923455555333 - 923 is, according to a quick google search, the country code for Pakistan. The Wikipedia page for dialing codes in Pakistan shows no 455, leading me to think that this is not a known area code to either Number::Phone or Wikipedia. I suspect it is an invalid number.
So for the first Number: specify which country this is supposed to be from.
If you are certain the other number is correct, you know more about that than the developer of Number::Phone currently and I'm sure he'd be happy to receive your input in the form of a more complete Number::Phone localized package.
The fact that your Java code accepts the (probably) invalid number does not necessarily mean it is more correct, just that it is less picky in what it declares to be a correct number.
Edit:
Asking Phone::Number to parse the input '+923455555333' instead of '00923455555333' leads to the desired output.
Looking at the source of Phone::Number:
# ... processing input arguments
$number = "+$number" unless($number =~ /^\+/);
It becomes clear that the 00 is interpreted as '+00' and then rejected as being an invalid number.
View some discussion on that here
It seems to me you will have to handle this yourself.
One way may be to simply replace leading 00 with '+' - preferably only if parsing failed.
The other number can be parsed if you make it clear what country it should belong to.
Perhaps like so:
my $phone = Number::Phone->new($fnum);
unless ($phone){
$phone = Number::Phone->new('BH',$fnum);
if ( !$phone && $fnum =~ s/^00/+/ ){
# You should probably check the discussion I linked.
# There may well be problems with this approach!
$phone = Number::Phone->new($fnum);
}
}

SendMessageTimeout API in Perl

I am trying to replace SendMessage API with SendMessageTimeout in an installation script (that refreshes the environment -- registry and stuff). After replacing the installer is crashing. I have compiled the sub routine separately and it works alright.
Is it because SendMessageTimeout is in a different module -- Win32::GUI? I am unable to find a source to download this module. Apologies for the naivety, I am totally new to Perl and this is the only change I had to make.
use Win32::API;
sub refreshEnvironment()
{
use constant WM_WININICHANGE => 0x001A;
use constant HWND_BROADCAST => 0xffff;
use constant SMTO_ABORTIFHUNG => 0x0002;
print("Refreshing the environment.\n");
my $sm = new Win32::API(
"user32",
"SendMessageTimeout",
['N', 'N', 'I', 'P', 'N','I', 'P'], 'N'
);
if (! defined ($sm)) {
print("SendMessage api did not initialize.\n");
return;
}
my $buffer = "Environment";
my $res = $sm->Call(HWND_BROADCAST, WM_WININICHANGE, 0, $buffer, SMTO_ABORTIFHUNG, 2000, NULL);
print("SendMessage refresh environment done\n");
}
refreshEnvironment();

Writing simple parser in Perl: having lexer output, where to go next?

I'm trying to write a simple data manipulation language in Perl (read-only, it's meant to transform SQL-inspired queries into filters and properties to use with vSphere Perl API: http://pubs.vmware.com/vsphere-60/topic/com.vmware.perlsdk.pg.doc/viperl_advancedtopics.5.1.html_)
I currently have something similar to lexer output if I understand it properly - a list of tokens like this (Data::Dumper prints array of hashes):
$VAR1 = {
'word' => 'SHOW',
'part' => 'verb',
'position' => 0
};
$VAR2 = {
'part' => 'bareword',
'word' => 'name,',
'position' => 1
};
$VAR3 = {
'word' => 'cpu,',
'part' => 'bareword',
'position' => 2
};
$VAR4 = {
'word' => 'ram',
'part' => 'bareword',
'position' => 3
};
Now what I'd like to do is to build a syntax tree. The documentation I've seen so far is mostly on using modules and generating grammars from BNF, but at the moment I can't wrap my head around it.
I'd like to tinker with relatively simple procedural code, probably recursive, to make some ugly implementation myself.
What I'm currently thinking about is building a string of $token->{'part'}s like this:
my $parts = 'verb bareword bareword ... terminator';
and then running a big and ugly regular expression against it, (ab)using Perl's capability to embed code into regular expressions: http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression:
$parts =~ /
^verb(?{ do_something_smart })\s # Statement always starts with a verb
(bareword\s(?{ do_something_smart }))+ # Followed by one or more barewords
| # Or
# Other rules duct taped here
/x;
Whatever I've found so far requires solid knowledge of CS and/or linguistics, and I'm failing to even understand it.
What should I do about lexer output to start understanding and tinker with proper parsing? Something like 'build a set of temporary hashes representing smaller part of statement' or 'remove substrings until the string is empty and then validate what you get'.
I'm aware of the Dragon Book and SICP, but I'd like something lighter at this time.
Thanks!
As mentioned in a couple of comments above, but here again as a real answer:
You might like Parser::MGC. (Disclaimer: I'm the author of Parser::MGC)
Start by taking your existing (regexp?) definitions of various kinds of token, and turn them into "token_..." methods by using the generic_token method.
From here, you can start to build up methods to parse larger and larger structures of your grammar, by using the structure-building methods.
As for actually building an AST - it's possibly simplest to start with to simply emit HASH references with keys containing named parts of your structure. It's hard to tell a grammatical structure from your example given in the question, but you might for instance have a concept of a "command" that is a "verb" followed by some "nouns". You might parse that using:
sub parse_command
{
my $self = shift;
my $verb = $self->token_verb;
my $nouns = $self->sequence_of( sub { $self->token_noun } );
# $nouns here will be an ARRAYref
return { type => "command", verb => $verb, nouns => $nouns };
}
It's usually around this point in writing a parser that I decide I want some actual typed objects instead of mere hash references. One easy way to do this is via another of my modules, Struct::Dumb:
use Struct::Dumb qw( -named_constructors );
struct Command => [qw( verb nouns )];
...
return Command( verb => $verb, nouns => $nouns );