What does the DumpXS in Perl's Data::Dumper do? - perl

I have gone through the source code of Data::Dumper. In this package I didn't understand what's going on with DumpXS. What is the use of this DumpXS?
I have searched about this and I read that, it is equal to the Dump function and it is faster than Dump. But I didn't understand it.

The XS language is a glue between normal Perl and C. When people want to squeeze every last bit of performance out of an operation, they try to write it as close to the C code as possible. Python and Ruby have similar mechanisms for the same reason.
Some Perl modules have an XS implementation to improve performance. However, you need a C compiler to install it. Not everyone is in a position to install compiled modules, so the modules also come in a "PurePerl" or "PP" version that does the same thing just a bit slower. If you don't have the XS implementation, a module such as Data::Dumper can automatically use the pure Perl implementation. In this case, Data::Dumper also lets you choose which one you want to use.
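As a minimal sketch of that choice: Data::Dumper documents a $Data::Dumper::Useperl configuration variable that forces the pure Perl implementation even when the XS one is installed. The data structure and variable names below are made up for illustration, and the comparison at the end is only expected to hold for simple data like this.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data, only for illustration.
my $data = { name => 'example', list => [ 1, 2, 3 ] };

# Sort hash keys so both runs emit the keys in the same order.
local $Data::Dumper::Sortkeys = 1;

# Default: Data::Dumper uses the XS implementation when it is available.
my $default_output = Dumper($data);

# Force the pure Perl implementation from here on.
local $Data::Dumper::Useperl = 1;
my $pure_perl_output = Dumper($data);

print $pure_perl_output;
print "Outputs identical: ",
    ( $default_output eq $pure_perl_output ? "yes" : "no" ), "\n";
```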

A lot of Perl modules have "XS" versions, like JSON::XS. The XS in the name means that it partly uses C in order to increase the speed or other efficiency of the module. I don't know this particular case, but it is probably that.

And if you want a bit more info on XS go to http://perldoc.perl.org/perlxs.html
But I am curious what led you to this question.

Related

Exporting symbols without Exporter

I am learning how to use packages and objects in Perl.
It has been suggested that I use Exporter in order to use the functions and variables of a module that have the package directive.
I was curious to know whether there is a way to export symbols without using Exporter? In other words, can Exporter be emulated in some way?
The reason I ask is due to my assumption that Exporter carries extra overhead for functionality that small, simple scripts that must run as fast as possible don't need, and could be avoided by including that functionality with a few simple lines of code.
Maybe a simple illustration of what I mean might help.
Say I have a module which only does this
my $my_string = "my_print";
sub my_print {
print "$my_string: ", @_;
}
which would allow for a lot of small scripts to use my_print instead of print, with just a simple require with the filename of my module (and very little overhead).
Then, I wanted to use this in another module that has a package declaration, and this no longer works, so now I must use a package declaration and therefore Exporter in my simple module just to get this to work in the new module.
Having used Perl for quite a while, I am used to the fact that almost everything is quite simple, straightforward, and low overhead, so I just feel that there could be such a solution for this. If not, then I would accept an answer that explains exactly why Exporter is the only way.
What Exporter does is quite simple. Here's how you can do it without using Exporter:
In Foo.pm:
package Foo;
use strict;
use warnings;
sub answer { 42 }
sub import {
no strict 'refs';
my $caller = caller;
*{$caller . "::answer"} = \&answer;
}
1;
In script.pl:
use Foo;
print answer(), "\n";
However, as others have said, you really needn't worry about the overhead of using Exporter. It's a fairly small and efficient module, and has been bundled with every version of Perl since 5.0. What's more, chances are you're already loading it somewhere anyway -- many of the core Perl modules (such as Carp, Scalar::Util, List::Util, etc.) use it.
Update: in an earlier version of the code above, I forgot the no strict 'refs';. This is necessary for *{$some_string} to work.
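For comparison, here is a sketch of the same Foo module written with Exporter instead of a hand-written import. It is squeezed into a single file so it can be run as-is, which is why import is called by hand at run time; in a real Foo.pm you would keep the package in its own file and simply write use Foo; in the script.

```perl
use strict;
use warnings;

{
    package Foo;
    use Exporter 'import';      # borrow Exporter's import method
    our @EXPORT = ('answer');   # symbols exported by default
    sub answer { 42 }
}

package main;

# With a separate Foo.pm, "use Foo;" would do this at compile time.
# In this one-file sketch @EXPORT is only filled in at run time,
# so import is called explicitly after the block above has run.
Foo->import;

print answer(), "\n";
```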
Exporter is definitely low overhead. There is also a list of alternative modules in the See Also section.
If you're curious, you can do a search on CPAN for the module and view the source yourself. You'll notice that the majority of the "code" is just documentation. However, honestly if you're this new to perl, you shouldn't be worrying about streamlining your code for being light weight as much as you should be aiming to take advantage of as many resources as possible to make coding quicker and easier.
Just my $.02
It sounds very much like you're trying to find an excuse to avoid Exporter? Have you really had programs that start up too slowly? Exporter is loaded and executed only in the compilation phase.
If you are genuinely worried about compilation speed then you should write
BEGIN { require MyPackage }
and then later
MyPackage::myprint($myparam)
which involves no overhead from Exporter at all, or even from the equivalent in-line code.
Yes, the code that actually exports symbols from one package to another is just a single line in the code of Exporter.pm and you could duplicate it. But wouldn't you much rather just add
use MyPackage;
at the start of your program and know that, from any context, the symbols from MyPackage would be correctly exported?
If you have "small simple scripts that must be run as fast as possible" then you should investigate leaving them running as daemons rather than recompiling the Perl each time the program is run.

How to guess minimum perl version a particular script is written for?

I have a bunch of scripts that I wrote at times when I did not realize how use v1.2.3; can be useful. So some of them may be using features from later versions of perl, some of them may be OK with, say, perl 5.8.
Now I would like to get that into some order and add proper uses where there is need for them, just to be able to sleep better. :-)
How should I do that? Is there any tool that could help me make an educated guess?
Perl::MinimumVersion
Find a minimum required version of perl for Perl code
The most reliable way is 1) to write a decent test suite, then 2) to run your tests using each version of Perl.
You've surely already done the first part (!), and the second part is actually pretty easy to do using perlbrew.

Is there any current review of statistical modules for Perl?

I would like to know the current status of the statistical modules on CPAN. Does anyone know of a recent review, or could anyone comment on their likes and dislikes with those modules?
I have used the classical ones: Statistics::Descriptive, Statistics::Distributions, and some others contained in Bundle::Math::Statistics
Some of the modules have not been updated for a long time. I don't know if this is because they are rock solid or because they have been overtaken by better modules.
Does someone know any current review similar to this old one:
Using Perl for Statistics: Data Processing and Statistical Computing
NB (for the people that will suggest to use R ;-)):
All my code is mainly in perl, but I use R a lot for statistics and plotting. I usually prepare the dataframes with perl, write the R scripts in the perl modules as templates, save them to a file and execute them from perl. But sometimes you have small data sets where efficiency is not an issue (well, I am using perl, aren't I ;-)) and you want to add some statistics and histograms to your report produced with perl.
PDL, the Perl Data Language, is alive and thriving, so it's worth taking a look at that.
And I think the other stats modules you mention are OK. For example, Statistics::Descriptive is up-to-date and has been used in answers to a few questions here on Stackoverflow.
NB. There is also a Perl to R bridge called Statistics::R which looks interesting.
/I3az/

Is it okay to use modules from within subroutines?

Recently I started playing with OO Perl, and I've been creating quite a bunch of new objects for a new project that I'm working on. I'm unfamiliar with any best practices regarding OO Perl, and we're in kind of a tight rush to get it done :P
I'm putting a lot of this kind of code into each of my functions:
sub funcx{
use ObjectX; # i don't declare this on top of the pm file
# but inside the function itself
my $obj = new ObjectX;
}
I was wondering if this will have any negative impact compared with putting the use ObjectX line at the top of the Perl module, outside of any function scope.
I was doing this because I feel it's cleaner in case I need to shift the function around.
The other thing I have noticed is that when I run a test.pl script that tests my objects on the unix server itself, it is slow as heck. But when the same code is run through CGI connected to an apache server, the web page doesn't load as slowly.
Where to put use?
use occurs at compile time, so it doesn't matter where you put it. At least from a purely pragmatic, 'will it work', point of view. Because it happens at compile time use will always be executed, even if you put it in a conditional. Never do this: if( $foo eq 'foo' ) { use SomeModule }
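A small sketch of why the conditional form does not do what it appears to, and of the run-time alternative (require, optionally followed by an explicit import call). The flag name is made up for illustration, and List::Util and Text::Wrap are used only because they are core modules that strict and warnings do not load themselves.

```perl
use strict;
use warnings;

my $wanted = 0;    # hypothetical run-time flag; false on purpose

# "use" runs at compile time, so List::Util is loaded
# even though this block is never entered at run time.
if ($wanted) {
    use List::Util ();
}
my $loaded_by_use = exists $INC{'List/Util.pm'} ? 1 : 0;

# "require" runs at run time, so it honours the condition.
# (Follow it with Text::Wrap->import(...) if you need the imports.)
if ($wanted) {
    require Text::Wrap;
}
my $loaded_by_require = exists $INC{'Text/Wrap.pm'} ? 1 : 0;

print "loaded by use:     $loaded_by_use\n";
print "loaded by require: $loaded_by_require\n";
```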
In my experience, it is best to put all your use statements at the top of the file. It makes it easy to see what is being loaded and what your dependencies are.
Update:
As brian d foy points out, things compiled before the use statement will not be affected by it. So, the location can matter. For a typical module, location does not matter, however, if it does things that affect compilation (for example it imports functions that have prototypes), the location could matter.
Also, Chas Owens points out that it can affect compilation. Modules that are designed to alter compilation are called pragmas. Pragmas are, by convention, given names in all lower-case. These effects apply only within the scope where the module is used. Chas uses the integer pragma as an example in his answer. You can also disable a pragma or module over a limited scope with the keyword no.
use strict;
use warnings;
my $foo;
print $foo; # Generates a warning
{ no warnings 'uninitialized'; # turn off warnings for working with uninitialized values.
print $foo; # No warning here
}
print $foo; # Generates a warning
Indirect object syntax
In your example code you have my $obj = new ObjectX;. This is called indirect object syntax, and it is best avoided as it can lead to obscure bugs. It is better to use this form:
my $obj = ObjectX->new;
Why is your test script slow on the server?
There is no way to tell with the info you have provided.
But the easy way to find out is to profile your code and see where the time is being consumed. NYTProf is another popular profiling tool you may want to check out.
Best practices
Check out Perl Best Practices, and the quick reference card. This page has a nice run down of Damian Conway's OOP advice from PBP.
Also, you may wish to consider using Moose. If the long script startup time is acceptable in your usage, then Moose is a huge win.
question 1
It depends on what the module does. If it has lexical effects, then it will only affect the scope it is used in:
my $x;
{
use integer;
$x = 5/2; #$x is now 2
}
my $y = 5/2; #$y is now 2.5
If it is a normal module then it makes no difference where you use it, but it is common to use all of those modules at the top of the program.
question 2
Things that can affect the speed of a program between machines
speed of the processor
version of modules installed (some modules have XS versions that are much faster)
version of Perl
number of entries in PERL5LIB
speed of the drive
daotoad and Chas. Owens already answered the part of your question pertaining to the position of use statements. Let me remark on something else here:
"I was doing this so that I feel it's cleaner in case I need to shift the function around."
Personally, I find it much cleaner to have all the used modules in one place at the top of the file. You won't have to search for use statements to see what other modules are being used and a quick glance will tell you what is being used and even what is not being used.
Regarding your performance problem: with Apache and mod_perl the Perl interpreter will have to parse and compile your used modules only once. The next time the script is run, execution should be much faster. On the command line, however, a second run doesn't get this benefit.

What's the modern way of declaring which version of Perl to use?

When it comes to saying what version of Perl we need for our scripts, we've got options, oh, brother, we've got options:
use 5.010;
use 5.010_001;
use 5.10.0;
use v5.10;
use v5.10.0;
All seem to work. perlcritic complains about all but the first two. (It's unfortunate that the v strings seem to have such flaws, since Perl 6 expects you to do use v6; for your Perl 6 scripts...)
So, what should we be doing to indicate that we want to use a particular version of perl?
There are really only two options: decimal numbers and v-strings. Which form to use depends in part on which versions of Perl you want to "support" with a meaningful error message instead of a syntax error. (The v-string syntax was added in Perl 5.6.) The accepted best practice -- which is what perlcritic enforces -- is to use decimal notation. You should specify the minimum version of Perl that's required for your script to behave properly. Normally that means declaring a dependency on language features added in a major release, such as using the say function added in 5.10. You should include the patch level if it's important for your script to behave properly. For example, some of my code specifies use 5.008001 because it depends on the fix for a bug that 5.8.0 had which was fixed in 5.8.1.
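To make the relationship between the two notations concrete, here is a short sketch using the core version module to convert between them. The variable names are illustrative only; the point is that the decimal form packs each component into three digits.

```perl
use strict;
use warnings;
use version;

# Decimal notation converted to the dotted (v-string style) form.
my $as_dotted = version->parse('5.008001')->normal;    # v5.8.1

# Dotted notation converted to the decimal form.
my $as_decimal = version->parse('v5.10.1')->numify;    # 5.010001

print "5.008001 is $as_dotted\n";
print "v5.10.1 is $as_decimal\n";

# $] holds the version of the running perl in decimal notation.
print "running perl is $]\n";
```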
I just use something like 5.010_001. I've grown weary of dealing with version string problems for something that should be mind-numbingly simple.
Since I mostly deal with build systems, I have the constant struggle of Module::Build's internal version.pm which is out of sync with the version.pm on CPAN. I think that's mostly better now, but I have better things to think about.
The best practice should always be to do the thing that commands the least of your attention, and certainly not take more attention than the value it gives back. In my opinion, v-strings and dotted decimals were a huge distraction with no additional benefit, wasting a lot of valuable programmer time just to get back to the starting point.
I should also note that Perl::Critic has often pushed questionable practices for the higher purpose of reducing the number of ways that people do things. However, those practices often cause problems, which makes them un-best. This is one of those cases. A more realistic best practice is to not make Perl::Critic compliance your goal. Use it where it is useful, but in cases like this, don't waste mental time on it.
The "modern" way is to use the forms starting with v. However, that may not necessarily be what you really want to do.
Critic complains because older versions of Perl won't understand and play nicely with the forms that start with v. However, if your version of Perl supports it, v is nicer to read because you can say:
use v5.10.1;
... rather than ...
use 5.010_001;
So, in the documentation for use, the following workaround is offered:
use 5.006; use v5.6.1;
NB: I think the documentation is in error here, as the v is omitted from the example at perldoc use.
Since the versions of Perl that don't support the v syntax will fail at the first use, they won't get to the second more specific and readable one.