Perl: Smartest and fastest way for (dividing) large script? - perl

I have a Perl Script, that has (as of now) 10 subs and growing..
Each sub is making a different LWP-Call and is working with a variable I set in the first sub.
As each sub takes some time, I'm looking for a smart way to make the script run faster.
What would you recommend?
Should I:
Put each sub into a seperate script?
Call the subs (except the first) at once?
Use a different solution (that I don't know of)?
Thanks in advance!
EDIT:
The script is fetching some xml- and soap-data from the web, then extracting the necesary parts.

You probably want check the LWP::Parallel module.
ParallelUserAgent is an extension to the existing libwww module. It
allows you to take a list of URLs (it currently supports HTTP, FTP,
and FILE URLs. HTTPS might work, too) and connect to all of them in
parallel, then wait for the results to come in.

You dont have to split up the file if you dont want to. Just take a list of things to run on STDIN, then execute your script multiple times putting each process in the background or use cron if its an on going thing.
I would recommend using the Operating System's processes until you have a good reason not to do so anymore.

I've had success using Coro and there is already a module to do LWP requests.
http://metacpan.org/pod/LWP::Protocol::Coro::http
Coro is cooperative routines so not multithreaded. There is also an AnyEvent variation

Related

Executing system commands safely while coding in Perl

Should one really use external commands while coding in Perl? I see several disadvantages of it. It's not system independent plus security risks might also be there. What do you think? If there is no way and you have to use the shell commands from Perl then what is the safest way to execute that particular command (like checking pid, uid etc)?
It depends on how hard it is going to be to replicate the functionality in Perl. If I needed to run the m4 macro processor on something, I'd not think of trying to replicate that functionality in Perl myself, and since there's no module on http://search.cpan.org/ that looks suitable, it would appear others agree with me. In that case, then, using the external program is sensible. On the other hand, if I needed to read the contents of a directory, then the combination of readdir() et al plus stat() or lstat() inside Perl is more sensible than futzing with the output of ls.
If you need to execute commands, think very carefully about how you invoke them. In particular, you probably want to avoid the shell interpreting the arguments, so use the array form of system (see also exec), etc, rather than a single string for the command plus arguments (which means the shell is used to process the command line).
Executing external commands can be expensive simply because it involves forking new process and watching for its output if you need it.
Probably more importantly, should external process fail for any reason, it may be difficult to understand what happened by means of your script. Worse still, surprisingly often external process can be stuck forever, so will be your script. You can use special tricks like opening pipe and watching for output in loop, but this itself is error-prone.
Perl is very capable of doing many things. So, if you stick to using only Perl native constructs and modules to accomplish your tasks, not only it will be faster because you never fork, but it will be more reliable and easier to catch errors by looking at native Perl objects and structures returned by library routines. And of course, it will be automatically portable to different platforms.
If your script runs under elevated permissions (like root or under sudo), you should be very careful as to what external programs you execute. One of the simple ways to ensure basic security is to always specify commands by full name, like /usr/bin/grep (but still think twice and just do grep by Perl itself!). However, even this may not be enough if attacker is using LD_PRELOAD mechanism to inject rogue shared libraries.
If you are willing to go very secure, it is suggested to use tainted check by using -T flag like this:
#!/usr/bin/perl -T
Taint flag will be also enabled by Perl automatically if your script was determined to have different real and effective user or group ids.
Tainted mode will severely limit your ability to do many things (like system() call) without Perl complaining - see more at http://perldoc.perl.org/perlsec.html#Taint-mode, but it will give you much higher security confidence.
Should one really use external commands while coding in Perl?
There's no single answer to this question. It all depends on what you are doing within the wide range of potential uses of Perl.
Are you using Perl as a glorified shell script on your local machine, or just trying to find a quick-and-dirty solution to your problem? In that case, it makes a lot of sense to run system commands if that is the easiest way to accomplish your task. Security and speed are not that important; what matters is the ability to code quickly.
On the other hand, are you writing a production program? In that case, you want secure, portable, efficient code. It is often preferable to write the functionality in Perl (or use a module), rather than calling an external program. At least, you should think hard about the benefits and drawbacks.

How to split long Perl code into several files without too much manual editing?

How do I split a long Perl script into two or more different files that can all access the same variables - without having to rename all shared variables from e.g. $count to $::count (or $main::count which is the same)?
In other words, what's the best and simplest way to split the Perl script into several files without having to import a lot of variables/functions and/or do a lot of manual editing.
I assume it has something to do with making the code part of the same package/scope/namespace, but my experiments so far have failed.
I am not sure it makes a difference, but the script is used for web/CGI purposes and will be running under mod_perl.
EDIT - Background:
I kind of knew I would get that response. The reason I want to split up the file is the following:
Currently I have a single very old and very long Perl file. I know it is not following Perl best practices but it works.
The problem is, I need to distribute the data files it uses between different web servers, first of all for performance reasons. There will be one "master" server and one or several "slaves".
About 20% of the mentioned Perl file contains shared functions, 40% has the code need to run on the master server and 40% on the slave servers. Therefore, I would like to split the code into three files: 1. shared, 2. master-only, 3. slave-only. On the master server, 1 and 2 will be loaded, on the slaves, 1 and 3 will be loaded.
I assume this approach would use less process RAM and, more importantly, I would minimize the risk of not splitting the code correctly (e.g. a slave process calling a master data file). I don't see a great need for modularization, as the system works and the code does not need a lot of changes or exchanges with other projects.
EDIT 2 - Solution:
Found the solution I was looking for here:
http://www.perlmonks.org/?node_id=95813
In cases where the main package is in ownership of the variable, the
actual word 'main' can be ommitted to yield something like: $::var
It is possible to get around having to fully qualify variable names
when strict is in use. Applying a simply use vars to your script, with
the variable names as it arguments will get around explicit package
names.
Actually, I ended up repeating the our ($count, etc...) statement for the needed variables instead of use vars ();
Do let me know if I am missing something vital - apart from not going with modules! :)
#Axeman, Thanks, I will accept your answer, both for your effort and for sending me in the right direction.
Unless you put different package statements in their files, they will all be treated as if they had package main; at the top. So assuming that the scripts use package variables, you shouldn't have to do anything. If you have declared them with my (that is, if they are lexically scoped variables) then you would have to make sure that all references to the variables are in the same file.
But splitting scripts up for length is a rotten substitute for modularization. Yes, modularization helps keep code length down, but modularization if the proper way to keep code length down--for all the reasons that you would want to keep code-length down, modularization does it best.
If chopping the files by length could really work for you, then you could create a script like this:
do '/path/to/bin/part1.pl';
do '/path/to/bin/part2.pl';
do '/path/to/bin/part3.pl';
...
But I kind of suspect that if the organization of this code is as bad as you're--sort of--indicating, it might suffer from some of the same re-inventing the wheel that I've seen in Perl-ignorant scripts. Just offhand (I might be wrong) but I'm thinking you would be surprised how much could be chopped from the length by simply substituting better-tested Perl library idioms than for-looping and while-ing everything.

How do I daemonize a perl script from within a perl script?

I have a perl script which calls another perl script using backticks. I want to instead call this script and have it daemonize. How do I go about doing this?
edit:
I dont care to communicate back with the process/daemon. I'll most likely just stick it in an sqlite3 table or something.
You refer to backticks, thus I suppose that you want to communicate with the daemon after it's started? Since daemons does not use STDOUT, you will have to think of some other way of passing information to and from it.
The Perl interprocess communication man page (perlipc) has several good examples of this, especially the section "Complete dissociation of child from parent".
The Proc::Daemon contains convenient functions for daemonizing a process.

How can I control an interactive Unix application programmatically through Perl?

I have inherited a 20-year-old interactive command-line unix application that is no longer supported by its vendor. We need to automate some tasks in this application.
The most troublesome of these is creating thousands of new records with slightly different parameters (e.g. different identifiers, different names). The records have to be created in sequence, one at a time, which would take many months (and therefore dollars) to do manually. In most cases, creating a record has a very predictable pattern of keying in commands, reading responses, keying in further commands, etc. However, some record creation operations will result in error conditions ('record with this identifier already exists') that require a different set of commands to be exit gracefully.
I can see a few different ways to do this:
Named pipes. Write a Perl script that runs the target application with STDIN and STDOUT set to named pipes then sends the target application the sequence of commands to create a record with the required parameters, and then instructs the target application to exit and shut down. We then run the script as many times as required with different parameters.
Application. Find another Unix tool that can be used to script interactive programs. The only ones I have been able to find though are expect, but this does not seem top be maintained; and chat, which I recall from ages ago, and which seems to do more-or-less what I want, but appears to be only for controlling modems.
One more potential complication: I think the target application was written for a VT100 terminal and it uses some sort of escape sequences to do things like provide highlighting.
My question is what approach should I take? One of these, or something completely different? I quite like the idea of using named pipes and then having a Perl script that opens the FIFOs and reads and writes as required, as it provides a lot of flexibility, but from what I have read it seems like there's a lot of potential problems if I go down this path.
Thanks in advance.
I'd definitely stick to Perl for the extra flexibility, as chaos suggested. Are you aware of the Expect perl module? It's a lot nicer than the named pipe approach.
Note also with named pipes, you can't force the output coming back from your legacy application to be unbuffered, which could be annoying. I think Expect.pm uses pseudo-ttys to get around this problem, but I'm not sure. See the discussion in perlipc in the section "Bidirectional Communication with Another Process" for more details.
expect is a lot more solid than you're probably giving it credit for, but if I were you I'd still go with the Perl option, wanting to have a full and familiar programming language for managing the process and having confidence that whatever weird issues arise, there will be ways of addressing them.
Expect, either with the Tcl or Perl implementations, would be my first attempt. If you are seeing odd sequences in the output because it's doing odd terminal things, just filter those from the output before you do your matching.
With named pipes, you're going to end up reinventing Expect anyway.

How can I force unload a Perl module?

Hi I am using a perl script written by another person who is no longer in the company.
If I run the script as a stand alone, then the output are as expected. But when I call the script from another code repeatedly, the output is wrong except for the first time.
I suspect some variables are not initialised properly. When it is called standalone, each time it exits and all the variable values are initialised to defaults. But when called from another perl script, the modules and the variable values are probably carried over to the next call of the script.
Is there any way to flush out the called script from memory before I call it next time?
I tried enabling warning and it was throwing up 1000s of lines of warning...!
EDIT: How I am calling the other script:
The code looks like this:
do "processing.pl";
...
...
...
process(params); #A function in processing.pl
...
...
...
If you want to force the module to be reloaded, delete its entry from %INC and then reload it.
For example:
sub reload_module {
delete $INC{'Your/Silly/Module.pm'};
require Your::Silly::Module;
Your::Silly::Module->import;
}
Note that if this module relies on globals in other modules being set, those may need to be reloaded as well. There's no easy way to know without taking a peak at the code.
Hi I am using a perl script written by another person who is no longer in the company.
I tried enabling warning and it was throwing up 1000s of lines of warning...!
There's your problem right there. The script was not written properly, and should be rewritten.
Ask yourself this question: if it has 1000s of warnings when you enable strict checking, how can you be sure that it is doing the right thing? How can you be sure that it is not clobbering files, trashing data sets, making a mess of your filesystem? Chances are it is doing all of these things, either deliberately or accidentally.
I wouldn't trust running an error-filled script written by someone no longer with the company. I'd rewrite it and be sure that it was doing what I needed it to do.
Unloading a module is a more difficult task than simply removing the %INC entry of the module. Take a look at Class::Unload from CPAN.
If you don't want to rewrite/fix the script, I suggest calling the script via exec() or one of its varieties. While it is not very elegant to do, it will definitely fix your problem.
Are you sure that you need to reload the module? By using do, you are reading the source every time and executing it. What happens if you change that to require, which will only read and evaluate the source once?
Another possibility (just thinking aloud here) could be to do with the local directory? Are they running from the same place. Probably wouldn't work the first time though.
Another option is to use system ('doprocessing.pl');. Lazily, we do this with a few scripts to force re-initialisation of a number of classes/variables etc. And to force the log files to rotate properly.
edit: I have just re-read your question, and it would appear that you are not calling it like this.