Proper way to use Ack inside Perl scripts? - perl

I've begun to use ack because it seems to be a nice way to implement unix rgrep in a Perl script, easily. For now, though, I use it through a Perl back-quoted system command:
# Searching for 'use' something
my $pattern = "^use ";
my $path = "./path/to/libs/";
#Only pm files
my $parameters = "--perl";
my @uses = `ack $parameters $pattern $path`;
I know ack is pure Perl, but it seems to be designed for command-line use, not as an API or a Perl module. Is there a way to use it from a Perl script? I mean, a module that makes it possible to call ack like a Perl function:
# My fantasy
use ack;
my @uses = ack($parameters, $pattern, $path);
... or another method, timtowtdi.

You can't.
Maybe I should put "No user-serviceable parts inside" in the modules. It's really not meant to be used programmatically. It really only uses modules for ease of maintainability for those working on it, and for ease of installing through the CPAN shell.
The file finding part is pretty simple to do. It's just calls to an iterator from File::Next.
Why do you want to call ack things from inside another Perl program? What's your use case?
Maybe this would be better taken up on the ack-users mailing list over on Google Groups.

The module behind ack is App::Ack, but as you can see from the documentation, it's not really any kind of programmatic interface at all, and not meant for public consumption. You could talk to Andy about making it better, but short of that you're probably best off using the command line or re-inventing your own.
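Since ack's file finding is built on File::Next (as mentioned above), re-inventing the part the question needs is short. Here is a minimal sketch; ack_like() and its .pm filter are hypothetical stand-ins of mine, not part of ack or File::Next:
use strict;
use warnings;
use File::Next;

# Hypothetical stand-in for the wished-for ack() function: walk a
# directory tree, keep only .pm files, and return the matching lines.
sub ack_like {
    my ($pattern, $path) = @_;
    my $iter = File::Next::files($path);   # the same iterator ack uses
    my @hits;
    while (defined(my $file = $iter->())) {
        next unless $file =~ /\.pm\z/;
        open my $fh, '<', $file or next;
        while (my $line = <$fh>) {
            push @hits, "$file:$line" if $line =~ $pattern;
        }
        close $fh;
    }
    return @hits;
}

my @uses = ack_like(qr/^use /, "./path/to/libs/");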

Related

Running nested, dependent perl scripts

I have two perl scripts that I need to run together.
The first script defines a number of common functions and a main method.
Script 1 (one.pl) example:
#!/usr/bin/perl
sub getInformation
{
    my $serverMode = $_[0];
    my $port = $_[1];
    return "You\nneed\nto\nparse\nme\nright\nnow\n!\n";
}
#main
&parseInformation(&getInformation($ARGV[0], $ARGV[1]));
The second script calls the first one after defining the parsing function.
Script 2 (two.pl) example:
#!/usr/bin/perl
sub parseInformation
{
    my $parsedInformation = $_[0];
    #omitted
    print "$parsedInformation";
}
my $parameterOne = $ARGV[0];
my $parameterTwo = $ARGV[1];
do "./one.pl $parameterOne $parameterTwo";
Command line usage:
> ./two.pl bayside 20
I have attempted this, and the script seems to run; however, when I run it under the debugger (perl -d two.pl) I get no information from the debugger about the other script.
I have done some research and read about system, capture, require and do. If I use the system function to run the script, how will I be able to export the functions defined in script two?
Questions:
1. Is there any way to do this in Perl?
2. If so, how exactly do I achieve it?
I fully understand that Perl is Perl, not another programming language. Unfortunately, when transitioning, one tends to bring along what one already knew. My apologies.
References:
How to run a perl script from within a perl script
Perl documentation for require function
Generally speaking, that is not the way you should write reusable common functions in Perl. Instead, you should put the bulk of your code into Perl modules, and write just short scripts that act as wrappers for the modules. These short scripts should basically just grab and validate command-line arguments, pass those arguments to the modules for the real work, then format and output the results.
I really wish I could recommend perldoc perlmod to learn about writing modules, but it seems to mostly concentrate on the minutiae rather than a high-level overview of how to write and use a Perl module. Gabor Szabo's tutorial is perhaps a better place to start.
Here's a simple example, creating a script that outputs the Unix timestamp. This is the module:
# This file is called "lib/MyLib/DateTime.pm"
use strict;
use warnings;
package MyLib::DateTime;
use parent "Exporter";
our @EXPORT_OK = qw( get_timestamp );
sub get_timestamp {
    my $ts = time;
    return $ts;
}
1;
And this is the script that uses it:
#!/usr/bin/env perl
use strict;
use warnings;
use lib "/path/to/lib"; # path to the module we want, but
# excluding the "MyLib/DateTime.pm" part
use MyLib::DateTime qw( get_timestamp ); # import the function we want
# Here we might deal with input; e.g. @ARGV
# but as get_timestamp doesn't need any input, we don't
# have anything to do.
# Here we'll call the function we defined in the module.
my $result = get_timestamp();
# And here we'll do the output
print $result, "\n";
Now, running the script should output the current Unix timestamp. Another script that was doing something more complex with timestamps could also use MyLib::DateTime.
More importantly, another module which needed to do something with timestamps could use MyLib::DateTime. Putting logic into modules, and having those modules use each other is really the essence of CPAN. I've been demonstrating a really basic date and time library, but the king of datetime manipulation is the DateTime module on CPAN. This in turn uses DateTime::TimeZone.
The ease of re-using code, and the availability of a large repository of free, well-tested, and (mostly) well-documented modules on CPAN, is one of the key selling points of Perl.
Exactly.
Running 2 separate scripts at the same time won't give either script access to the other's functions at all. They are 2 completely separate processes. You need to use modules. The point of modules is so that you don't repeat yourself, often called "DRY" (Don't Repeat Yourself) programming. A simple rule of thumb is:
If you are going to use a block of code more than once put it into a subroutine in the current script.
If you are going to use the same block in several programs put it in a module.
Also remember that common problems usually already have a module on CPAN.
That should be enough to get you going. Then, if you're going to do much Perl programming, you should buy the book "Programming Perl" by Larry Wall if you've programmed in other languages, or "Learning Perl" by Randal Schwartz if you're new to programming. I'm old fashioned so I have both books in print, but you can also get them as ebooks. Also check out Perl.org, as you're not alone.

What is the preferred cross-platform IPC Perl module?

I want to create a simple IO object that represents a pipe opened to another program so that I can periodically write to the other program's STDIN as my app runs. I want it to be bullet-proof (in that it catches all errors) and cross-platform. The best options I can find are:
open
sub io_read {
local $SIG{__WARN__} = sub { }; # Silence warning.
open my $pipe, '|-', @_ or die "Cannot exec $_[0]: $!\n";
return $pipe;
}
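For context, a hypothetical usage (cat -n is just a stand-in child program; anything printed to $pipe becomes its STDIN):
my $pipe = io_read('cat', '-n');
print {$pipe} "hello\n";
close $pipe or warn "close failed: $!";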
Advantages:
Cross-platform
Simple
Disadvantages:
No $SIG{PIPE} to catch errors from the piped program
Are other errors caught?
IO::Pipe
sub io_read {
IO::Pipe->reader(@_);
}
Advantages:
Simple
Returns an IO::Handle object for OO interface
Supported by the Perl core.
Disadvantages:
Still no $SIG{PIPE} to catch errors from the piped program
Not supported on Win32 (or, at least, its tests are skipped)
IPC::Run
There is no interface for writing to a file handle in IPC::Run, only appending to a scalar. This seems…weird.
IPC::Run3
No file handle interface here, either. I could use a code reference, which would be called repeatedly to spool to the child, but looking at the source code, it appears that it actually writes to a temporary file, and then opens it and spools its contents to the piped command's STDIN. Wha?
IPC::Cmd
Still no file handle interface.
What am I missing here? It seems as if this should be a solved problem, and I'm kind of stunned that it's not. IO::Pipe comes closest to what I want, but the lack of $SIG{PIPE} error handling and the lack of support for Windows is distressing. Where is the piping module that will JDWIM?
Thanks to guidance from @ikegami, I have found that the best choice for interactively reading from and writing to another process in Perl is IPC::Run. However, it requires that the program you are reading from and writing to have a known output when it is done writing to its STDOUT, such as a prompt. Here's an example that executes bash, has it run ls -l, and then prints that output:
use v5.14;
use IPC::Run qw(start timeout new_appender new_chunker);
my @command = qw(bash);
# Connect to the other program.
my ($in, @out);
my $ipc = start \@command,
    '<' => new_appender("echo __END__\n"), \$in,
    '>' => new_chunker, sub { push @out, @_ },
    timeout(10) or die "Error: $?\n";
# Send it a command and wait until it has received it.
$in .= "ls -l\n";
$ipc->pump while length $in;
# Wait until our end-of-output string appears.
$ipc->pump until @out && $out[-1] =~ /__END__\n/m;
pop @out;
say @out;
Because it is running as an IPC (I assume), bash does not emit a prompt when it is done writing to its STDOUT. So I use the new_appender() function to have it emit something I can match to find the end of the output (by calling echo __END__). I've also used an anonymous subroutine after a call to new_chunker to collect the output into an array, rather than a scalar (just pass a reference to a scalar to '>' if you want that).
So this works, but it sucks for a whole host of reasons, in my opinion:
There is no generally useful way to know that an IPC-controlled program is done printing to its STDOUT. Instead, you have to use a regular expression on its output to search for a string that usually means it's done.
If it doesn't emit one, you have to trick it into emitting one (as I have done here—god forbid if I should have a file named __END__, though). If I was controlling a database client, I might have to send something like SELECT 'IM OUTTA HERE';. Different applications would require different new_appender hacks.
The writing to the magic $in and $out scalars feels weird and action-at-a-distance-y. I dislike it.
One cannot do line-oriented processing on the scalars as one could if they were file handles. They are therefore less efficient.
The ability to use new_chunker to get line-oriented output is nice, if still a bit weird. That regains a bit of the efficiency on reading output from a program, though, assuming it is buffered efficiently by IPC::Run.
I now realize that, although the interface for IPC::Run could potentially be a bit nicer, overall the weaknesses of the IPC model in particular make it tricky to deal with at all. There is no generally useful IPC interface, because one has to know too much about the specifics of the particular program being run to get it to work. This is okay, maybe, if you know exactly how it will react to inputs, can reliably recognize when it is done emitting output, and don't need to worry much about cross-platform compatibility. But that was far from sufficient for my need: a generally useful way to interact with various database command-line clients in a CPAN module that could be distributed to a whole host of operating systems.
In the end, thanks to packaging suggestions in comments on a blog post, I decided to abandon the use of IPC for controlling those clients, and to use the DBI instead. It provides an excellent API; it is robust, stable, and mature, and suffers none of the drawbacks of IPC.
My recommendation for those who come after me is this:
If you just need to execute another program and wait for it to finish, or collect its output when it is done running, use IPC::System::Simple. Otherwise, if what you need to do is to interactively interface with something else, use an API whenever possible. And if it's not possible, then use something like IPC::Run and try to make the best of it—and be prepared to give up quite a bit of your time to get it "just right."
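For that first case, a minimal sketch with IPC::System::Simple (ls -l is just a stand-in command):
use strict;
use warnings;
use IPC::System::Simple qw(capture system);

# capture() runs the command and returns its output, dying with a
# descriptive error on failure -- no manual $? checking needed.
my @lines = capture('ls', '-l');
print scalar(@lines), " lines of output\n";

# This system() is the module's drop-in replacement for the builtin;
# it also dies on failure rather than returning quietly.
system('ls', '-l');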
I've done something similar to this, although it depends on the parent program and what you are trying to pipe. My solution was to spawn a child process (leaving $SIG{PIPE} functional) and write failures to the log, or handle the error in whatever way you see fit. I use POSIX to handle my child process and am able to utilize all the functionality of the parent. However, if you're trying to have the child communicate back to the parent, then things get difficult. Do you have an example of the main program and what you're trying to pipe?

Having a perl script make use of one among several secondary scripts

I have a main program mytool.pl to be run from the command line. There are several auxiliary scripts special1.pl, special2.pl, etc., each of which contains a couple of subroutines and a hash, all identically named across scripts. Let's suppose these are named MySpecialFunction(), AnotherSpecialFunction() and %SpecialData.
I'd like for mytool to include/use/import the contents of one of the special*.pl files, only one, according to a command line option. For example, the user will do:
bash> perl mytool.pl --specialcase=5
and mytool will use MySpecialFunction() from special5.pl, and ignore all other special*.pl files.
Is this possible and how to do it?
It's important to note that the selection of which special file to use is made at runtime, so adding a "use" at the top of mytool.pl probably isn't the right thing to do.
Note I am a long-time C programmer, not a perl expert; I may be asking something obvious.
This is for a one-off project that will turn to dust in only a month. Neither mytool.pl nor special?.pl (nor perl itself) will be of interest beyond the end of this short project. Therefore, we don't care for solutions that are elaborate or require learning some deep magic. Quick and dirty preferred. I'm guessing that Perl's module mechanism is overkill for this, but have no idea what the alternatives are.
You can use a hash or array to map values of specialcase to .pl files and require or do them as needed.
#!/usr/bin/env perl
use strict; use warnings;
my @handlers = qw(one.pl two.pl);
my ($case) = @ARGV;
$case = 0 unless defined $case;
# check that $case is within range
do $handlers[$case];
print special_function(), "\n";
When you use a module, Perl just requires the module in a BEGIN block (and imports the module's exported items). Since you want to change which script you load at runtime, call require yourself.
if ($special_case_1) {
    require 'special1.pl';
    # and go about your business
}
Here's a good reference on when to use use vs. require.
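Tying that back to the question's command line, here is a hypothetical dispatcher keyed on --specialcase (Getopt::Long is in core; this sketch assumes each special*.pl file ends with a true value, e.g. 1;):
use strict;
use warnings;
use Getopt::Long;

my $case = 1;
GetOptions('specialcase=i' => \$case) or die "Bad options\n";

# Build the file name at runtime; do() reloads on every call, while
# require() would load a given file only once.
my $file = "special$case.pl";
do $file or die "Could not load $file: ", $@ || $!, "\n";

# MySpecialFunction() now resolves to whichever file was loaded.
print MySpecialFunction(), "\n";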

What Perl module can I use to test CGI output for common errors?

Is there a Perl module which can test the CGI output of another program? E.g. I have a program
x.cgi
(this program is not in Perl) and I want to run it from program
test_x_cgi.pl
So, e.g. test_x_cgi.pl is something like
#!perl
use IPC::Run3;
run3(["x.cgi"], ...);
So in test_x_cgi.pl I want to automatically check that the output of x.cgi doesn't do stupid things like printing messages before the HTTP header is fully output. In other words, I want to have a kind of "browser" in Perl which processes the output. Before I try to create such a thing myself, is there any module on CPAN which does this?
Please note that x.cgi here is not a Perl script; I am trying to write a test framework for it in Perl. So, specifically, I want to test a string of output for ill-formedness.
Edit: Thanks
I have already written a module which does what I want, so feel free to answer this question for the benefit of other people, but any further answers are academic as far as I'm concerned.
There's CGI::Test, which looks like what you're looking for. It specifically mentions the ability to test non-Perl CGI programs. It hasn't been updated for a while, but neither has the CGI spec.
There is Test::HTTP. I have not used it, but seems to have an interface that fits your requirements.
$test->header_is($header_name, $value [, $description]);
    Compares the response header $header_name with the value $value using Test::Builder->is.
$test->header_like($header_name, $regex [, $description]);
    Compares the response header $header_name with the regex $regex using Test::Builder->like.
Look at the examples in Chapter 16 of the Perl Cookbook:
16.9. Controlling the Input, Output, and Error of Another Program
It uses IPC::Open3.
From the Perl Cookbook (possibly modified by me); see below.
Example 16.2
cmd3sel - control all three of the kid's in, out, and error.
use IPC::Open3;
use IO::Select;
$cmd = "grep vt33 /none/such - /etc/termcap";
my $pid = open3(*CMD_IN, *CMD_OUT, *CMD_ERR, $cmd);
$SIG{CHLD} = sub {
    print "REAPER: status $? on $pid\n" if waitpid($pid, 0) > 0;
};
#print CMD_IN "test test 1 2 3 \n";
close(CMD_IN);
my $selector = IO::Select->new();
$selector->add(*CMD_ERR, *CMD_OUT);
while (my @ready = $selector->can_read) {
    foreach my $fh (@ready) {
        if (fileno($fh) == fileno(CMD_ERR)) { print "STDERR: ", scalar <CMD_ERR>; }
        else                                { print "STDOUT: ", scalar <CMD_OUT>; }
        $selector->remove($fh) if eof($fh);
    }
}
close(CMD_OUT);
close(CMD_ERR);
If you want to check that the output of x.cgi is properly formatted HTML/XHTML/XML/etc, why not run it through the W3 Validator?
You can download the source and find some way to call it from your Perl test script. Or, you might be able to leverage this Perl interface to calling the W3 Validator on the web.
If you want to write a testing framework, I'd suggest taking a look at Test::More from CPAN as a good starting point. It's powerful but fairly easy to use and is definitely going to be better than cobbling something together as a one-off.
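As a sketch of that starting point, here is a hypothetical test that runs x.cgi through IPC::Run3 and applies the kind of header checks described above:
use strict;
use warnings;
use Test::More tests => 2;
use IPC::Run3;

# Run the CGI program and capture its STDOUT.
run3(['./x.cgi'], \undef, \my $output, \my $errors);

# A CGI response must begin with a header block ended by a blank line.
like($output, qr/\AContent-Type:/i,
    'output starts with a Content-Type header');
like($output, qr/\r?\n\r?\n/,
    'header block is terminated by a blank line');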

How can I get one Perl script to see variables in another Perl script?

I have a Perl script that is getting big, so I want to break it out into multiple scripts. Namely, I want to take out some large hash declarations and put them into another file. How do I get the original script to see and use the variables that are now declared in the other file?
This is driving me nuts because I haven't used Perl in a while and for the life of me can't figure this out
Use a module:
package Literature;
our %Sidekick = (
    Batman => "Robin",
    Bert   => "Ernie",
    Don    => "Sancho",
);
1;
For example:
#! /usr/bin/perl
use Literature;
foreach my $name (keys %Literature::Sidekick) {
    print "$name => $Literature::Sidekick{$name}\n";
}
Output:
$ ./prog
Bert => Ernie
Batman => Robin
Don => Sancho
You use modules. Or modulinos.
Make it a module that exports (optionally!) some variables or functions. Look up how to use the Exporter module.
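A minimal sketch of that, reusing the Literature example from above (exporting a variable works just like exporting a function; list it, sigil and all, in @EXPORT_OK):
package Literature;
use strict;
use warnings;
use Exporter 'import';

# Callers must request the variable explicitly:
#   use Literature qw(%Sidekick);
our @EXPORT_OK = ('%Sidekick');

our %Sidekick = (
    Batman => "Robin",
    Bert   => "Ernie",
);

1;
A caller then writes use Literature qw(%Sidekick); and can refer to %Sidekick without the package prefix.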
Yet another suggestion to use a module.
Modules are not hard to write or use. They just seem hard until you write one. After the first time, it will be easy. Many, many good things come from using modules--encapsulation, ease of testing, and easy code reuse to name a few.
See my answer to a similar question for an example module, with exported functions.
Also, some very smart people in the Perl community like modules so much that they advocate writing apps as modules--they call them modulinos. The technique works well.
So, in conclusion, try writing a module today!
As an addition to the other "use a module" suggestions: if you plan on reusing the module a lot, you will want to get it installed into your site library (usually under ...(perl install folder)...\site\lib).
If not (perhaps it has limited reusability outside the script), you can keep it in the directory with your script and import it like so:
use lib './lib'; # where ./lib is replaced with wherever the module actually is.
use MyModule;