When using Perl script as mapper & reducer in Hadoop streaming, how we can manage perl module dependencies.
I want to use "Net::RabbitMQ" in my perl mapper & reducer script.
Is there any standard way in perl/hadoop streaming to handle dependencies similar to the DistributedCache (for Hadoop java MR)
There are a couple of ways to handle dependencies including specifying a custom library path or creating a packed binary of your Perl application with PAR::Packer. There are some examples of how to accomplish these tasks in the Examples section of the Hadoop::Streaming POD, and the author includes a good description of the process, as well as some considerations for the different ways to handle dependencies. Note that the suggestions provided in the Hadoop::Streaming documentation about handling Perl dependencies are not specific to that module.
Here is an excerpt from the documentation for Hadoop::Streaming (there are detailed examples therein, as previously mentioned):
All perl modules must be installed on each hadoop cluster machine. This proves to be a challenge for large installations. I have a local::lib controlled perl directory that I push out to a fixed location on all of my hadoop boxes (/apps/perl5) that is kept up-to-date and included in my system image. Previously I was producing stand-alone perl files with PAR::Packer (pp), which worked quite well except for the size of the jar with the -file option. The standalone files can be put into hdfs and then included with the jar via the -cacheFile option.
Related
I am working on perl module that I would like to submit in CPAN.
But I have a small query in regards to the directory structure of module.
As per the perlmonk article the module code directory structure should be as below:
Foo-Bar-0.01/Bar.pm
Foo-Bar-0.01/Makefile.PL
Foo-Bar-0.01/MANIFEST
Foo-Bar-0.01/Changes
Foo-Bar-0.01/test.pl
Foo-Bar-0.01/README
But when I am using the command, the structure is generated as below
h2xs -AX Foo::Bar
Writing Foo-Bar/lib/Foo/Bar.pm
Writing Foo-Bar/Makefile.PL
Writing Foo-Bar/README0
Writing Foo-Bar/t/Foo-Bar.t
Writing Foo-Bar/Changes
Writing Foo-Bar/MANIFEST
The article in question is advocating a considerably-older module structure. It certainly could be used, but it loses a lot of the advancements that have been put into place as far as good testing, building, and distribution practices.
To break down the differences:
modules have moved from the top level to the lib/ directory. This unifies the location where your module "lives" (i.e., the place where you work on the code and create the baseline modules to be tested and eventually distributed). It also makes it easier to set up any hierarchy that you need (e.g. subclasses, or helper modules); the newer setup will just pick these up. The older one may but I'm not familiar enough with it to say yes or no.
Makefile.PL in the newer setup will, when "make" is run. create a library called "blib", the *b*uild *lib*rary - this is where the code is built for actual testing. It will pretty much be a copy of lib/ unless you have XS code, in which case this is where the compiled XS code ends up. This makes the process of building and testing the code simpler; if you update a file in lib/, the Makefile will rebuild the file into blib before trying to test it.
the t/ directory replaces test.pl; "make test" will execute all the *.t files in t/, as opposed to you having to put all your tests in test.pl. This makes it far easier to write tests, as you can be sure you have a consistent state at the beginning of each test.
MANIFEST and Changes are the same in both: MANIFEST (built by "make manifest") is used to determine which files in the build library should be redistributed when the module is packaged for upload, and used to verify that a package is complete when it's downloaded and unpacked for building. Changes is simply a changelog, which you edit by hand to record the changes made in each distributed version.
As recommended in the comments on your question, using Module::Starter or Dist::Zilla (be warned that Dist::Zilla is Moose-based and will install a lot of prereqs) is a better approach to building modules in a more modern way. Of the two, the h2xs version is closer to modern packaging standards, but you're really better off using one of the recommended package starter options (and probably Module::Build, which uses a Build Perl script instead of a Makefile to build the code).
I am a very new to porting.
I was trying to port perl to a netbsd system. Since its a custom made build, we wont be able to run configure or make on the target netbsd system. So we are trying to cross-compile it in a host pc and copy the binary over target machine. And in order to do so, we have to make a makefile from scratch, since the format for the makefile in our build is different.
I have some basic doubts regarding this,
Firstly, In order to create a perl makefile for my custom build, what are the basic things will come. Such as ccflags, library paths etc.,?
There are some files like DynaLoader, uudmap.h, myConfig, Config.pm which gets generated while "make". How can i generate them using custum makefile.
How to set various library paths and what are they ?
The #INC, shows the perl search paths, how can i create it ?
Where exactly Perl modules get installed and when it happens?
A perl build normally involves building a stripped down version of perl named miniperl, which is then used extensively in the remainder of the process of building perl and the bundled modules.
There are two basic approaches to cross-compiling: to build miniperl for the target machine and build the modules, etc., there, or to build miniperl for the host and use it to build perl and modules for the target.
The WinCE port uses the latter approach; the rudimentary (last I knew, anyway) support for a -Dusecrosscompile switch to Configure uses the former.
I recommend you ask for advice and help on the perl5-porters mailing list: http://lists.perl.org/list/perl5-porters.html
And be prepared for hard work.
NetBSD's pkgsrc system has perl in it already and has the ability to generate binary packages that you can then install on a target machine.
I posted this question looking for something similar to Buildout for Perl. I think Shipwright is what I'm looking for but I'm not really sure. I've played around with it and I created a project, imported all of my source and dependencies and I've exported everything to a vessel then the documentation sort of just stopped. What do I do with a shipyard vessel? Do I do my actual development work in the vessel, or do I do my development in the Shipyard? I'm assuming that the vessel is only for deployment, but how do I actually deploy a vessel to a web server (say I'm using linux, apache and just running straight cgi).
Is Shipwright the right thing for what I'm trying to accomplish or is there something else that would be more appropriate? Ideally I could use Shipwright similar to how I use Buildout. I use Buildout to create a nice isolated environment for my development, and also I use Buildout when deploying to live servers to manage all of my application's dependencies.
EDIT: Here are the highlights of what I can do with Buildout that I would like to be able to do in Perl.
With Buildout, I have a file in my codebase that lists dependencies (which for Perl would either be CPAN modules or other source repositories). I can run a bootstrap script that will fetch all of those dependencies and drop them into a directory within my project and NOT install them at a system level. Buildout also creates utility scripts which can do anything you want (run tests, other command line tools, anything really) and those scripts explicitly add the dependencies to the path so that as my scripts are running all of my dependencies are available to be imported.
What this really does very well is that it allows me to manage my dependencies without having to ever install anything at a system level. Which makes changing from one version to another very easy. Also, it allows me to have multiple Buildout projects running on the same system using different versions of the same module. Finally, one huge benefit is that with Buildout's directory structure, I can just commit the dependencies to source control and to deploy to a new machine I just need to do a checkout and all of my dependencies are already satisfied without having to touch anything installed at a system level.
I don't think you'll find anything exactly like Buildout in Perl, but you could put together a couple of things that would do the trick.
You could use a standard Build.PL script for Module::Build for managing your dependencies and having commands to run tests, etc.
Then you could use cpanminus to do the installation of those dependencies into a local (non-system) directory.
Then you might be able to use Shipwright to do the bundling and deployment of the project with these now-local dependencies.
I am trying to set up an application dependant on few Perl modules, but the server I am installing to, does not have Internet connection. I read about offline module installs via ppd files, however I would have to resolve all the dependencies one by one.. All the more tedious considering I don't have direct internet connection.
I am hoping to find a solution, where I install ActivePerl on my PC and install all the libraries that I want and then copy paste the directories to my server. If it is just a matter of fixing some environment variables, that would be fine. Just want to know the definitive list of variables to modify. Not sure whether it is mandatory to install the perl libraries on the computer in which it is intended to run? (One is 32 bit platform and other one is 64 bit, but the server is already running various 32 bit applications so I hope it is not a major problem) For best compatibility, I plan to install ActivePerl on both the systems and merge the library directories to be identical.
The answer was on Perl FAQ, my bad didn't go through it properly.
I copied the perl binary from one machine to another, but scripts don't work.
That's probably because you forgot libraries, or library paths differ.
You really should build the whole distribution on the machine it will
eventually live on, and then type "make install". Most other approaches
are doomed to failure.
One simple way to check that things are in the right place is to print
out the hard-coded #INC that perl looks through for libraries:
% perl -le 'print for #INC'
If this command lists any paths that don't exist on your system, then
you may need to move the appropriate libraries to these locations, or
create symbolic links, aliases, or shortcuts appropriately. #INC is also
printed as part of the output of
% perl -V
You might also want to check out "How do I keep my own module/library
directory?" in perlfaq8.
From this link
Occasionally, you will not be able to
use any of the methods to install
modules. This may be the case if you
are a particularly under-privileged
user - perhaps you are renting web
space on a server, where you are not
given rights to do anything.
It is possible, for some modules, to
install the module without compiling
anything, and so you can just drop the
file in place and have it work.
Without going into a lot of the
detail, some Perl modules contain a
portion written in some other language
(such as C or C++) and some are
written in just in Perl. It is the
latter type that this method will work
for. How will you know? Well, if there
are no files called something.c and
something.h in the package, chances
are that it is a module that contains
only Perl code.
In these cases, you can just unpack
the file, and then copy just the *.pm
files to a directory from which you
will run the modules. Two examples of
this should suffice to illustrate how
this is done.
IniConf.pm is a wonderful little
module that allows you to read
configuration information out of a
.ini-style config file. IniConf.pm is
written only in Perl, and has no C
portion. When you unpack the .tar.gz
file that you got from CPAN, you will
find several files in there, and one
of them is called IniConf.pm. This is
the only file that you are actually
interested in. Copy that file to the
directory where you have the Perl
programs that will be using this
module. You can then use the module as
you would if it was installed
``correctly,'' with just the line:
use IniConf;
Time::CTime is another very handy
module that lets you print times in
any format that strikes your fancy. It
is written just in Perl, without a C
component. You will install it just
the same way as you did with IniConf,
except that the file, called CTime.pm,
must be placed in a subdirectory
called Time. The colons, as well as
indicating an organization of modules,
also indicates a directory structure
on your file system.
I'm an absolute beginner at Perl, and am trying to use some non-core modules on my shared Linux web host. I have no command line access, only FTP.
Host admins will consider installing modules on request, but the ones I want to use are updated frequently (DateTime::TimeZone for example), and I'd prefer to have control over exactly which version I'm using.
By experimentation, I've found some modules can be installed by copying files from the module's lib directory to a directory on the host, and using
use lib "local_path";
in my script, i.e. no compiling is required to install (DateTime and DateTime::TimeZone again).
How can I tell whether this is the case for a particular module? I realise I'll have to resolve dependencies myself.
Additionally: if I wanted to be able to install any module, including those which require compiling, what would I be looking for in terms of hosting?
I'm guessing at the moment I share a VM with several others and the minimum provision I'd need would be a dedicated VM with shell access?
See perldoc perlxs.
You can probably inspect the module's source for DynaLoader or something like this. This way you can find out if a module uses any C code.
If you use a unix-like OS, you can use a package manager to see what files/libraries a package (perl module) installs.
You can use
use lib "your_local_path" ,
In this case , you can have module in your local path.