What is a recommended R interface for Perl integration?

I have never dealt with R, so I was wondering whether anyone can recommend (either from personal experience or from reviews/comparisons) which of the several Perl/R integration modules are considered "best practice". Ideally, it would be something that qualifies as production-ready.
A Google search turns up several different modules, but I am not sure how to evaluate the options, having zero previous R or statistics experience. (The question came from a co-worker who is interested in using R.)

Yes, it looks like Statistics::R is probably your best bet. It's been updated recently, Brian Cassidy is a competent developer, and it's passing its CPAN smoke tests.
There is also Statistics::useR. It has been updated relatively recently, but it doesn't seem to be compliant with CPAN's smoke-testing system, which makes me a bit nervous.
That said, I haven't used either of these.

I've not personally used it, but Statistics::R looks interesting. It's got a 3-star review on CPAN Ratings and is currently getting a facelift from a new maintainer.
/I3az/

What are your actual requirements in terms of:
the OS that R is running on
the OS that the Perl clients are running on
the type of queries you plan: 'canned' or interactive
and so on?
I have long been a fan of Rserve as a headless R backend, but I can't recall whether there is a Perl client.

If you want to just read R data files, my module Statistics::R::IO would fit the bill. It's a pure Perl implementation that reads both RDS and RData files.
Starting with version 0.4, released last week, you can also use it as an Rserve client.

I've just released Statistics::NiceR. It has support for pretty much all R data types including data.frames.
It's an early release, so I'd like feedback. This is what it looks like:
#!/usr/bin/env perl
use v5.16;
use Statistics::NiceR;
use Data::Frame::Rlike;
my $r = Statistics::NiceR->new;
my $iris = $r->get('iris');
say "Subset of Iris data set";
say $iris->subset( sub { # like a SQL WHERE clause
( $_->('Sepal.Length') > 6.0 )
& ( $_->('Petal.Width') < 2 )
})->select_rows(0, 34); # grab the first and last rows
which outputs
Subset of Iris data set
-----------------------------------------------------------------------
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
-----------------------------------------------------------------------
51 7 3.2 4.7 1.4 versicolor
147 6.3 2.5 5 1.9 virginica
-----------------------------------------------------------------------

I have recently added Statistics::RserveClient to CPAN. It allows Perl applications to interact with a (possibly remote) Rserve server via a connection-oriented binary protocol. You send R code to the server as strings, and the results are returned as Perl data structures.
There are a number of shortcomings: we don't support long packets yet, nor do we properly handle certain heterogeneous structures. But the code is under active development, and it works quite nicely for our basic applications.
The code is GPL, hosted at https://github.com/djun-kim/Statistics--RserveClient

Related

Convert MIndiGolog fluents to the IndiGolog causes_val format

I am using Eclipse (version: Kepler Service Release 1) with the Prolog Development Tool (PDT) plug-in for Prolog development in Eclipse. I used these installation instructions: http://sewiki.iai.uni-bonn.de/research/pdt/docs/v0.x/download.
I am working with Multi-Agent IndiGolog (MIndiGolog) 0 (the preliminary Prolog version of MIndiGolog), downloaded from here: http://www.rfk.id.au/ramblings/research/thesis/. I want to use MIndiGolog because it represents the time and duration of actions very nicely (I want to do temporal planning), and it supports planning for multiple agents (including concurrency).
MIndiGolog is a high-level programming language based on the situation calculus. Everything in the language follows the situation calculus exactly. However, this does not fit the project I'm working on.
Another high-level programming language, Incremental Deterministic (Con)Golog (IndiGolog) (download from here: http://sourceforge.net/p/indigolog/code/ci/master/tree/), also written in Prolog, is also (loosely) based on the situation calculus, but it uses fluents in a very different way. It uses causes_val predicates to denote which action changes which fluent in what way, and it does not include the situation in the fluent!
However, this is what the rest of the team actually wants. I need to rewrite MIndiGolog so that it is still an offline planner, with the nice representation of time and duration of actions, but with the causes_val predicate of IndiGolog to change the values of the fluents.
I find this extremely hard to do, as my knowledge in Prolog and of situation calculus only covers the basics, but they see me as the expert. I feel like I'm in over my head and could use all the help and/or advice I can get.
I have already removed the situations from my fluents, made a planning domain with causes_val predicates, and tried to add IndiGolog code into MIndiGolog, but with no luck. Running the planner just returns "false." I can make little sense of the trace, even when I use the GUI tracer of the SWI-Prolog debugger or place spy points as strategically as possible.
Thanks in advance,
Best, PJ
If you are still interested (it sounds like you might not be): this isn't actually very hard.
If you look at Reiter's book, you will find that causes_val predicates are just effect axioms, while fluents that mention the situation are usually successor state axioms. There is a deterministic way to convert the former into the latter, and the correct interpretation of causes_val is done in the implementation of regression. This is always the same, so you can just copy that part of the Prolog code from IndiGolog into your variant.

Where is the documentation for Perl's builtin `Internals::` package?

When using keys %:: to get a list of the currently loaded root namespaces, the Internals:: package is loaded by default (along with UNIVERSAL:: and a few others). However, I haven't found any documentation for the functions in Internals::
keys %{Internals::} returns SvREFCNT hv_clear_placeholders hash_seed SvREADONLY HvREHASH rehash_seed
All of these can probably be looked up in Perl's C API docs, but is there any Perl level documentation for them? Is the package stable? It's used by several core modules (Hash::Util for one), so I imagine it is, but the lack of documentation is a bit troubling.
I didn't see Internals.pm in the Perl distribution (different name maybe?), and it is not the Internals module up on CPAN.
Note: I fully understand that the functions in Internals:: are potentially dangerous, and I do not have any particular use in mind. I was reading through Hash::Util's source and came across it.
IIRC, the code is not in Internals.pm but in libinternals.c. It looks like the functions used to be in universal.c in Perl 5.8 but were migrated out.
As of March 2009 (Perl 5.10), they were not documented, according to this PerlMonks thread.
Also, in the same thread, ysth states:
Undocumented things in universal.c should not be depended on; they should only be used by core modules. They aren't documented on purpose, to allow them to be changed whenever and however necessary. For those purposes, the code is good enough documentation.
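If what you want is behavior that a core module builds on top of Internals::, the safer route is to call that module's documented interface instead. As a minimal sketch (the hash and its keys are purely illustrative), Hash::Util's lock_keys, which the question mentions as a user of Internals::, restricts a hash to a fixed set of keys:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

# Illustrative configuration hash; the keys are arbitrary.
my %config = (host => 'localhost', port => 8080);

# Restrict %config to the keys it has right now.
lock_keys(%config);

# Changing the value of an existing key is still allowed.
$config{port} = 9090;

# Storing under a key outside the allowed set dies, so a typo such as
# 'prot' is caught instead of silently creating a new entry.
my $stored = eval { $config{prot} = 1; 1 };
my $error  = $@;

my $message = $stored ? "typo was stored\n" : "typo rejected: $error";
print $message;

# Lift the restriction again when it is no longer needed.
unlock_keys(%config);
```

This relies only on Hash::Util's documented exports, so it keeps working even if the undocumented Internals:: functions underneath change.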

What is a good pure Perl on-line or streaming statistics package?

Are there any pre-rolled streaming statistics libraries for Perl à la: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm
I haven't found anything on CPAN yet and I really don't want to have to code one myself.
You want Statistics::Descriptive. The regular "sparse" version of the module (Statistics::Descriptive::Sparse, as opposed to Statistics::Descriptive::Full) provides the statistics that can be computed without storing the entire dataset; variance is one of them.
You can use the RSPerl module, which interfaces with the R statistics package.
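For reference, the on-line algorithm linked in the question (Welford's method) is short enough to write by hand in pure Perl. The sketch below illustrates that algorithm only; it is not Statistics::Descriptive's code, the OnlineStats package name is hypothetical, and variance() returns the sample variance (dividing by n - 1):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical package: a minimal sketch of Welford's on-line algorithm,
# which updates the mean and variance one value at a time without
# keeping the data set in memory.
package OnlineStats;

sub new {
    my ($class) = @_;
    # n: count, mean: running mean, m2: running sum of squared deviations
    return bless { n => 0, mean => 0, m2 => 0 }, $class;
}

sub add {
    my ($self, @values) = @_;
    for my $x (@values) {
        $self->{n}++;
        my $delta = $x - $self->{mean};
        $self->{mean} += $delta / $self->{n};
        # Second factor uses the updated mean, as in Welford's recurrence.
        $self->{m2} += $delta * ($x - $self->{mean});
    }
    return $self;
}

sub count { $_[0]{n} }
sub mean  { $_[0]{mean} }

# Sample variance; undefined for fewer than two values.
sub variance {
    my ($self) = @_;
    return undef if $self->{n} < 2;
    return $self->{m2} / ($self->{n} - 1);
}

package main;

my $stats = OnlineStats->new;
$stats->add($_) for 2, 4, 4, 4, 5, 5, 7, 9;    # values could come from a stream
printf "n=%d mean=%.3f variance=%.3f\n", $stats->count, $stats->mean, $stats->variance;
```

Memory use stays constant no matter how many values are added, which is the property the question is after.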

What are the popular, contemporary uses for Perl?

Edit: I should have been more specific. I was wondering more about what people are using Perl for on a large (popular) scale, rather than what it could be used for at the individual level.
As a glue language, as the system administrator's language, and now it's back to taking over the Internet with Catalyst.
At my university, Perl is widely used for bioinformatics tasks: automatically changing the format of a protein data file, checking it against a database, transforming the results back, and so on.
So it's mostly changing file formats, regular expressions, and parsing huge datasets.
The same as ever: Making the impossible, possible. ;-)
Along with Python, the system administrators in my company love it for driving automation tasks. "If something is worth doing, it's worth automating" seems to be a mantra, and if they can do it in five lines, all the better.
The problem with this question is that Perl is a very versatile language. Between code golf and its similarity to awk/sed, it is still widely used as a glue language and a quick go-to language for sysadmin tasks.
With CPAN, lots of very useful and more advanced things can be written quickly.
It interfaces well with databases and there are tons of frameworks for web design. It works quite well with Ajax, as I've noticed through my own use of it.
Get into best practices, and you've got a system that is quite good at doing very large programming tasks. Heck, the whole of CPAN is a testament to Perl's reusability and encapsulation.
See the skills employers are seeking at http://jobs.perl.org/.
Somewhat confused by the question. For coding.
I think it would be better framed as: what isn't Perl used for? To which I'd answer: writing device drivers. Anyone got any more?
It's used for GUI apps (see Padre), Internet apps (Catalyst), other networking/sockets (POE), accessing databases (DBI), cryptography (Crypt namespace), web services (SOAP), handling binary formats (pack/unpack)...
And of course all manner of text processing.
And that's just the stuff I've used it for... recently.
Amazon and IMDB use Perl, more specifically Mason, IIANM.
I am currently using Perl to write an automated testing suite for my company's websites (using WWW::Mechanize and WWW::Selenium). One of my co-workers is doing the same for other types of servers. We also use it for our monitoring software (Nagios). And I use Perl daily as a command-line tool to help with basic sysadmin tasks.
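As a small illustration of the binary-format item, pack and unpack convert between Perl values and fixed binary layouts. The record layout below (a 16-bit big-endian ID, a 32-bit big-endian length, and a 4-byte tag) is made up for the example:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical record layout: n = unsigned 16-bit big-endian,
# N = unsigned 32-bit big-endian, a4 = 4-byte null-padded string.
my $template = 'n N a4';

# Build a binary record from Perl values.
my $record = pack $template, 42, 1024, 'DATA';
printf "record is %d bytes: %s\n", length $record, unpack('H*', $record);

# Decode it again with the same template.
my ($id, $len, $tag) = unpack $template, $record;
print "id=$id length=$len tag=$tag\n";
```

Using one template string for both directions keeps the encoder and decoder from drifting apart.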
I wrote a short, simple script to parse some data out of a log file recently. I find it pretty easy and useful for quick scripting tasks.
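A quick-scripting task of that kind might look like the sketch below, which tallies HTTP status codes. The Apache common log format and the sample lines are assumptions made for illustration; a real script would read from a file or STDIN:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines in (assumed) Apache common log format; a real script
# would read these from a log file or from <STDIN> instead.
my @lines = (
    '127.0.0.1 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2020:13:55:40 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.5 - - [10/Oct/2020:13:56:01 +0000] "POST /login HTTP/1.1" 200 128',
);

my %count_by_status;
for my $line (@lines) {
    # The status code is the three-digit field right after the quoted request.
    next unless $line =~ /"[^"]*" (\d{3}) /;
    $count_by_status{$1}++;
}

printf "%s: %d\n", $_, $count_by_status{$_} for sort keys %count_by_status;
```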
Try running this with the terminal size set to at least 120x50 and you will be enlightened ;).
#
sub j(\$){($
P,$V)= @_;while($$P=~s:^
([()])::x){ $V+=('('eq$1)?-32:31
}$V+=ord( substr( $$P,0,1,""))-74} sub a{
my($I,$K,$ J,$L)=@_ ;$I=int($I*$M/$Z);$K=int(
$K*$M/$Z);$J=int($J*$M /$Z);$L=int($L*$M/$Z); $G=$
J-$I;$F=$L-$K;$E=(abs($ G)>=abs($F))?$G:$F;($E<0) and($
I,$K)=($J,$L);$E||=.01 ;for($i=0;$i<=abs$E;$i++ ){ $D->{$K
+int($i*$F/$E) }->{$I+int($i*$G/$E)}=1}}sub p{$D={};$
Z=$z||.01;map{ $H=$_;$I=$N=j$H;$K=$O=j$H;while($H){$q=ord
substr($H,0,1,"" );if(42==$q){$J=j$H;$L=j$H}else{$q-=43;$L =$q
%9;$J=($q-$L)/9;$L=$q-9*$J-4;$J-=4}$J+=$I;$L+=$K;a($I,$K,$J,$ L);
($I,$K)=($J,$L)}a($I,$K,$N,$O)}@_;my$T;map{$y=$_;map{ $T.=$D->{$y}
->{$_}?$\:' '}(-59..59);$T.="\n"}(-23..23);print"\e[H$T"}$w= eval{
require Win32::Console::ANSI};$b=$w?'1;7;':"";($j,$u,$s,$t,$a,$n,$o
,$h,$c,$k,$p,$e,$r,$l,$C)=split/}/,'Tw*JSK8IAg*PJ[*J@wR}*JR]*QJ[*J'.
'BA*JQK8I*JC}KUz]BAIJT]*QJ[R?-R[e]\RI'.'}Tn*JQ]wRAI*JDnR8QAU}wT8KT'.
']n*JEI*EJR*QJ]*JR*DJ@IQ[}*JSe*JD[n]*JPe*'.'JBI/KI}T8@?PcdnfgVCBRcP'.
'?ABKV]]}*JWe*JD[n]*JPe*JC?8B*JE};Vq*OJQ/IP['.'wQ}*JWeOe{n*EERk8;'.
'J*JC}/U*OJd[OI@*BJ*JXn*J>w]U}CWq*OJc8KJ?O[e]U/T*QJP?}*JSe*JCnTe'.
'QIAKJR}*JV]wRAI*J?}T]*RJcJI[\]3;U]Uq*PM[wV]W]WCT*DM*SJ'. 'ZP[Z'.
'PZa[\]UKVgogK9K*QJ[\]n[RI@*EH@IddR[Q[]T]T]T3o[dk*JE'. '[Z\U'.
'{T]*JPKTKK]*OJ[QIO[PIQIO[[gUKU\k*JE+J+J5R5AI*EJ00'. 'BCB*'.
'DMKKJIR[Q+*EJ0*EK';sub h{$\ = qw(% & @ x)[int rand
4];map{printf "\e[$b;%dm",int(rand 6)+101-60* ($w
||0);system( "cls")if$w ;($A,$S)= ($_[1], $
_[0]);($M, @,)= split '}';for( $z=256
;$z>0; $z -=$S){$S*= $A;p @,} sleep$_
[2];while ($_[3]&&($ z+=$ S) <=256){
p@,}}("". "32}7D$j" ."}AG". "$u}OG"
."$s}WG" ."$t","" ."24}(" ."IJ$a"
."}1G$n" ."}CO$o" ."}GG$t" ."}QC"
."$h}" ."^G$e" ."})IG" ."$r",
"32}?" ."H$p}FG$e}QG$r". "}ZC"
."$l", "28}(LC" ."" ."".
"$h}:" ."J$a}EG". "$c"
."}M" ."C$k}ZG". "$e"
."}" ."dG$r","18" ."}("
."D;" ."$C" )}{h(16 ,1,1,0
);h(8, .98,0,0 );h(16 ,1,1,1)
;h(8.0 ,0.98,0, 1); redo}###
#written 060204 by
#liverpole #######
############
You can find out quite a bit about what people are currently doing with Perl by taking a look at the posts submitted to the Enlightened Perl Iron Man Challenge.
Personally, I'm currently using it to build the site for (yet another) AJAX-enabled, Twitterfied, etc., etc. social networking startup.
Web sites, data processing/extraction, system administration, task automation, even GUI programming. Mathematics, bioinformatics, chemistry, geology programs.
At my company we used to use Perl to run hundreds of regexes to transform random publisher files into SGML to make electronic books. Alas, those days are over now that we've updated our systems to XML books.
I use Perl for what it was designed for: a Practical way of Extracting useful information from raw data and presenting it in human-readable Reports. This is a very nice Language for this task.

Perl aids for regression testing

Is there a Perl module that allows me to view diffs between actual and reference output of programs (or functions)? The test fails if there are differences.
Also, in case there are differences but the output is OK (because the functionality has changed) I want to be able to commit the actual output as future reference output.
Perl has excellent utilities for doing testing. The most commonly used module is probably Test::More, which provides all the infrastructure you're likely to need for writing regression tests. The prove utility provides an easy interface for running test suites and summarizing the results. The Test::Differences module (which can be used with Test::More) might be useful to you as well. It formats differences as side-by-side comparisons. As for committing the actual output as the new reference material, that will depend on how your code under test provides output and how you capture it. It should be easy if you write to files and then compare them. If that's the case you might want to use the Text::Diff module within your test suite.
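One way to get the "accept the new output as the reference" workflow with core modules only is a small helper around Test::More. This is a sketch under assumptions: the is_reference_output helper and the UPDATE_REFERENCE environment variable are hypothetical names, not part of Test::More:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Test::More;

# Hypothetical helper: compare $actual with the saved reference output in
# $ref_file. When the (hypothetical) UPDATE_REFERENCE environment variable
# is set, the actual output is accepted and written as the new reference.
sub is_reference_output {
    my ($actual, $ref_file, $name) = @_;

    if ($ENV{UPDATE_REFERENCE}) {
        open my $out, '>', $ref_file or die "Cannot write $ref_file: $!";
        print {$out} $actual;
        close $out or die "Cannot close $ref_file: $!";
        return pass("$name (reference updated)");
    }

    open my $in, '<', $ref_file
        or return fail("$name (no reference file: $ref_file)");
    my $expected = do { local $/; <$in> };
    close $in;

    # is() reports both values on failure; Test::Differences' eq_or_diff
    # could be swapped in here for a side-by-side diff.
    return is($actual, $expected, $name);
}

# Demo: record a reference once, then check later output against it.
my $dir = tempdir(CLEANUP => 1);
my $ref = File::Spec->catfile($dir, 'report.txt');

{
    local $ENV{UPDATE_REFERENCE} = 1;
    is_reference_output("total: 3\n", $ref, 'record reference output');
}
is_reference_output("total: 3\n", $ref, 'report matches reference');

done_testing();
```

Run the test normally to compare against the stored file; run it once with UPDATE_REFERENCE=1 set (for example UPDATE_REFERENCE=1 prove t/report.t, a hypothetical path) to accept intentionally changed output as the new reference, then review and commit the updated file.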
As mentioned, Test::Differences is one of the standard ways of accomplishing this, but I need to mention PerlUnit: please do not use it. It's "abandonware" and does not integrate with standard Perl testing tools. Thus, for every new test module that comes out, you would have to port its functionality if you wanted to use it. (If someone has picked up maintenance of this abandoned module, drop me a line. I need to talk to them, as I maintain core testing tools that I'd like to help integrate with PerlUnit.)
Disclaimer: while I didn't write it, I currently maintain Test::Differences, so I might be biased.
I tend to use more of the Test::Simple and Test::More functionality. I looked at PerlUnit, and it seems to provide much of the functionality that is already built into the standard Test::Simple and Test::More libraries.
I question those of you who recommend the use of PerlUnit. It hasn't had a release in 3 years. If you really want xUnit-style testing, have a look at Test::Class, it does the same job, but in a more Perlish way. The fact that it's still maintained and has regular releases doesn't hurt either.
Just make sure that it makes sense for your project. Maybe good old Test::More is all you need (it usually is for me). I recommend reading the "Why you should [not] use Test::Class" sections in the docs.
The community standard workhorses are Test::Simple (for getting started with testing) and Test::More (for once you want more than Test::Simple can do for you). Both are built around the concept of expected versus actual output, and both will show you differences when they occur. The perldoc for these modules will get you on your way.
You might also want to check out the Perl QA wiki, and if you're really interested in perl testing, the perl-qa mailing list might be worth looking into -- though it's generally more about creation of testing systems for Perl than using those systems within the language.
Finally, using the module-starter tool (from Module::Starter) will give you a really nice "CPAN standard" layout for new work -- or for dropping existing code into -- including a readymade test harness setup.
For testing the output of a program, there is Test::Command. It makes it easy to verify the stdout, stderr, and exit value of programs, e.g.:
use Test::Command tests => 3;
my $echo_test = Test::Command->new( cmd => 'echo out' );
$echo_test->exit_is_num(0, 'exit normally');
$echo_test->stdout_is_eq("out\n", 'echoes out');
$echo_test->stderr_unlike( qr/something went (wrong|bad)/, 'nothing went bad' );
The module also has a functional interface, if that's more to your liking.