How can I quickly find the user's terminal PID in Perl? - perl

The following snippet of code is used to find the PID of a user's terminal, by using ptree and grabbing the third PID from the results it returns. All terminal PID's are stored in a hash with the user's login as the key.
## If process is a TEMINAL.
## The command ptree is used to get the terminal's process ID.
## The user can then use this ID to peek the user's terminal.
if ($PID =~ /(\w+)\s+(\d+) .+basic/) {
$user = $1;
if (open(PTREE, "ptree $2 |")) {
while ($PTREE = <PTREE>) {
if ($PTREE =~ /(\d+)\s+-pksh-ksh/) {
$terminals{$user} = $terminals{$user} . " $1";
last;
}
next;
}
close(PTREE);
}
next;
}
Below is a sample ptree execution:
ares./home_atenas/lmcgra> ptree 29064
485 /usr/lib/inet/inetd start
23054 /usr/sbin/in.telnetd
23131 -pksh-ksh
26107 -ksh
29058 -ksh
29064 /usr/ob/bin/basic s=61440 pgm=/usr/local/etc/logon -q -nr trans
412 sybsrvr
I'd like to know if there is a better way to code this. This is the part of the script that takes longest to run.
Note: this code, along with other snippets, are inside a loop and are executed a couple of times.

I think the main problem is that this code is in a loop. You don't need to run ptree and parse the results more than once! You need to figure out a way to run ptree once and put it into a data structure that you can use later. Probably be some kind of simple hash will suffice. You may even be able to just keep around your %terminals hash and keep reusing it.
Some nitpicks...
Both of your "next" statements seem
unnecessary to me... you should be
able to just remove them.
Replace
$terminals{$user} = $terminals{$user} . " $1";
with:
$terminals{$user} .= " $1";
Replace the bareword PTREE which you
are using as a filehandle with
$ptreeF or some such... using
barewords became unnecessary for
filehandles about 10 years ago :)
I don't know why your $PID variable
is all caps... it could be confusing
to readers of your code because it
looks like there is something
special about that variable, and
there isn't.

I think you'll get the best performance improvement by avoiding the overhead of repeatedly executing an external command (ptree, in this case). I'd look for a CPAN module that provides a direct interface to the data structures that ptree is reading. Check the Linux:: namespace, maybe? (I'm not sure if ptree is setuid; that may complicate things.)
The above advice aside, some additional style and robustness notes based on the posted snippet only (forgive me if the larger code invalidates them):
I'd start by using strict, at the very least. Lexical filehandles would also be a good idea.
You appear to be silently ignoring the case when you cannot open() the ptree command. That could happen for many reasons, some of which I can't imagine you wanting to ignore, such as…
You're not using the full path to the ptree command, but rather assuming it's in your path—and that the one in your path is the right one.

How many users are on the system? Can you invert this? List all -pksh-ksh processes in the system along with their EUIDs, and build the map from that - that might be only one execution of ps/ptree.

I was thinking of using ps to get the parents pid, but I would need to loop this to get the great-grandparent's pid. That's the one I need. Thanks. – lamcro
Sorry, there are many users and each can have up to three terminals open. The whole script is used to find those terminals that are using a file. I use fuser to find the processes that use a file. Then use ptree to find the terminal's pid. – lamcro
If you have (or can get) a list of PIDs using a file, and just need all of the grand-parents of that PID, there's an easier way, for sure.
#!perl
use warnings;
use strict;
#***** these PIDs are gotten with fuser or some other method *****
my($fpids) = [27538, 31812, 27541];
#***** get all processes, assuming linux PS *****
my($cmd) = "ps -ef";
open(PS, "$cmd |") || die qq([ERROR] Cannot open pipe from "$cmd" - $!\n);
my($processlist) = {};
while (<PS>) {
chomp;
my($user, $pid, $ppid, $rest) = split(/ +/, $_, 4);
$processlist->{$pid} = $ppid;
}
close PS;
#***** lookup grandparent *****
foreach my $fpid (#$fpids) {
my($parent) = $processlist->{$fpid} || 0;
my($grandparent) = $processlist->{$parent} || 0;
if ($grandparent) {
#----- do something here with grandparent's pid -----
print "PID:GRANDPID - $fpid:$grandparent\n";
}
else {
#----- some error condition -----
print "ERROR - Cannot determine GrandPID: $fpid ($parent)\n";
}
}
Which for me produces:
ERROR - Cannot determine GrandPID: 27538 (1)
PID:GRANDPID - 31812:2804
PID:GRANDPID - 27541:27538

Have you considered using 'who -u' to tell you which process is the login shell for a given tty instead of using ptree? This would simplify your search - irrespective of the other changes you should also make.

I just did some trivial timings here based on your script (calling "cat ptree.txt" instead of ptree itself) and confirmed my thoughts that all of your time is spent creating new sub-processes and running ptree itself. Unless you can factor away the need to call ptree (maybe there's a way to open up the connection once and reuse it, like with nslookup), you won't see any real gains.

Related

Perl: Pass one byte plus STDIN to another command

I would like to do this efficiently:
my $buf;
my $len = read(STDIN,$buf,1);
if($len) {
# Not empty
open(OUT,"|-", "wc") || die;
print OUT $buf;
# This is the line I want to do faster
print OUT <STDIN>;
exit;
}
The task is to start wc only if there is any input. If there is no input, the program should just exit.
wc is just an example here. It will be substituted with a much more complex command.
The input can be of several TB of data, so I would really like to not touch that data at all (not even with a sysread). I tried doing:
pipe(STDIN,OUT);
But that doesn't work. Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN? Maybe some open(">=&2") gymnastics combined with exec?
The FIONREAD ioctl, mentioned in the Perl Cookbook, can tell you how many bytes are pending on a file descriptor without consuming them. In perlish terms:
use strict;
use warnings;
use IO::Select qw( );
BEGIN { require 'sys/ioctl.ph'; }
sub fionread {
my $sz = pack('L', 0);
return unless ioctl($_[0], FIONREAD, $sz);
return unpack('L', $sz);
}
# Wait until it's known whether the handle has data to read or has reached EOF.
IO::Select->new(\*STDIN)->can_read();
if (fionread(\*STDIN)) {
system('wc');
# Check for errors
}
This should be very widely portable to UNIX and UNIX-like platforms.
A child process is always given duplicates of its parent's file handles, so simply starting wc - either with backticks or with a call to system or exec - will cause it to read from the same place as the Perl process's STDIN.
As for starting wc only when there is something to read, it looks like you need IO::Select, which will allow you either to check whether a file handle has something to read, or to block until it does have something.
This program will check whether STDIN has any data waiting, and run wc and print its output if so.
use strict;
use warnings;
use IO::Select;
my $select = IO::Select->new(\*STDIN);
if ( $select->can_read(0) ) {
print `wc`;
}
The parameter to can_read is a timeout in seconds. Passing a value of zero makes it return immediately, reporting true (actually it returns the file handle itself) if there is data waiting, or false (undef) if not.
If you don't pass a parameter then can_read will wait forever until there is something to read, so you can suspend your program and wait for data for wc by writing just
$select->can_read;
print `wc`;
or you could combine the construction of the object to make it even more concise
IO::Select->new(\*STDOUT)->can_read;
print `wc`;
Note also that IO::Select works fine with file descriptors too, and as the fileno for STDIN is zero, you could write
my $select = IO::Select(0)
but that isn't very descriptive and would need a comment to make sense
The specific solution in which you're interested is impossible.
As you surely discovered already, you can't determine if a file handle has reached EOF without reading from it. [Apparently, you can] select(2) will get you close. It will tell you that a handle has reached EOF or has data waiting, but it won't tell you which. This is why you're looking into alternate solutions. Unfortunately, the one you're looking into is just as impossible.
Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN?
No. OUT isn't code; it doesn't read anything. It's a variable. Furthermore, it's a variable in the parent. Changing a variable in the parent isn't going to affect the child.
Maybe you meant to ask: Can one tell the child program to start reading from a second handle?
No, generally speaking. You can't go and edit another program's variables. The program would have to be specifically written to accept two file handles and read from one after the other.
Then again, it's possible to obtain a file name for an arbitrary file handle, so all we need is a program that is specifically written to accept two file names and read from one after the other, and that's quite common.
$ echo abcdef | perl -MFcntl -e'
if (sysread(STDIN, $buf, 1)) {
pipe(my $r, my $w);
my $pid = fork();
if (!$pid) {
close($w);
# Clear close-on-exec flag.
my $flags = fcntl($r, Fcntl::F_GETFD, 0);
fcntl($r, Fcntl::F_SETFD, $flags & ~Fcntl::FD_CLOEXEC);
exec("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
die $!;
}
close($r);
print($w $buf);
close($w);
waitpid($pid, 0);
}
'
abcdef
(Lots of error checking needed.)
Above, cat was used an example where your program would be used, but that presents another solution: Why not just use cat? The overhead of cat should be quite minor for an IO-bound program.
use String::ShellQuote qw( shell_quote );
my $cmd1 = shell_quote("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
my $cmd2 = ...
exec("$cmd1 | $cmd2");

What is the preferable way to get the I/O of another console program in Perl?

It seems there are 2 ways to get the I/O of another program in Perl, one is surround the program with arguments by `, the other is using open like below, any difference between these 2 and which one is preferred?
# 2 options to get the I/O of another program
1. $output = `program`;
2. open(PIPE, 'program |');
Your 2nd case appears to be piping to 'program', rather than receiving input from it.
If you mean (and I think you do):
open(PIPE, "program |")
then the advantage of piping in is that you can process record-by-record, and you don't consume the whole of the process's output in one go (as occurs in the first scenario). That would be better from a memory perspective, and possbily from a time perspective if the program runs over a long period.
A possibility is to pipe from stdin, and then you can organise your pipes on the command line, and provide additional filtering etc. if required, and not hardcode a dependency on a particular binary.
Pipes are useful when you want to read output from a process over time. Let's say you want to monitor a log file.
You can, for example, write:
open(PIPE, 'tail -f log.txt |');
while (<PIPE>) {
chomp;
print("NEW LINE: $_\n");
}
and do some action on every new line added to the log.
If you were to write:
$output = `tail -f log.txt`;
println($output);
Then your program would never actually get to the println part since a command in backticks will wait until the command has returned.
There are a lot of different ways of communicating between processes in perl - so many that there's a whole section of documentation on it perlipc
For simple requirements, it doesn't really matter what you use. I would say - use whatever's most obvious given the context.
If you're just running a command and grabbing the output as a block:
my $df_out = `df -h`;
Then I consider backticks most appropriate.
If you're wanting to do line by line parsing - backticks will work, but generally I'll prefer an open.
E.g.
open ( my $ssh_data, "-|", "ssh -n $hostname df" );
my %size_of;
while ( <$ssh_data> ) {
my ( $fs, $total, $used, $avail ) = split;
$size_of{$fs} = $total;
}
close ( $ssh_data );
IPC::Open2 and IPC::Open3 are also options to allow bidirectional communication.
One caveat though - I'd strongly recommend not using 2 argument open, and using lexical filehandles. It's a good habit to train yourself in - even if it doesn't matter now getting used to 3-arg open is worth doing.

Perl script getting stuck in terminal for no apparent reason

I have a Perl script which reads three files and writes new files after reading each one of them. Everything is one thread.
In this script, I open and work with three text files and store the contents in a hash. The files are large (close to 3 MB).
I am using a loop to go through each of the files (open -> read -> Do some action (hash table) -> close)
I am noticing that the whenever I am scanning through the first file, the Perl terminal window in my Cygwin shell gets stuck. The moment I hit the enter key I can see the script process the rest of the files without any issues.
It's very odd as there is no read from STDIN in my script. Moreover, the same logic applies to all the three files as everything is in the same loop.
Has anyone here faced a similar issue? Does this usually happen when dealing with large files or big hashes?
I can't post the script here, but there is not much in it to post anyway.
Could this just be a problem in my Cygwin shell?
If this problem does not go away, how can I circumvent it? Like providing the enter input when the script is in progress? More importantly, how can I debug such a problem?
sub read_set
{
#lines_in_set = ();
push #lines_in_set , $_[0];
while (<INPUT_FILE>)
{ $line = $_;
chomp($line);
if ($line=~ /ENDNEWTYPE/i or $line =~ /ENDSYNTYPE/ or eof())
{
push #lines_in_set , $line;
last;
}
else
{
push #lines_in_set , $line;
}
}
return #lines_in_set;
}
--------> I think i found the problem :- or eof() call was ensuring that the script would be stuck !! Somehow happening only at the first time. I have no idea why though
The eof() call is the problem. See perldoc -f eof.
eof with empty parentheses refers to the pseudo file accessed via while (<>), which consists of either all the files named in #ARGV, or to STDIN if there are none.
And in particular:
Note that this function actually reads a character and then "ungetc"s it, so isn't useful in an interactive context.
But your loop reads from another handle, one called INPUT_FILE.
It would make more sense to call eof(INPUT_FILE). But even that probably isn't necessary; your outer loop will terminate when it reaches the end of INPUT_FILE.
Some more suggestions, not related to the symptoms you're seeing:
Add
use strict;
use warnings;
near the top of your script, and correct any error messages this produces (perl -cw script-name does a compile-only check). You'll need to declare your variables using my (perldoc -f my). And use consistent indentation; I recommend the same style you'll find in most Perl documentation.

exit code of system() call with a single scalar argument in Perl

There is a system() call in a Perl script with multiple pipes, using a single scalar argument. The call looks more or less like this:
system("zcat /foo.gz | grep '^.{6}X|Y|Z' | awk '{print $2,$3,$4,$6}' | bzip2 > /foo.processed.bz2");
The file in question (foo.gz) is quite large, about 2GB compressed in size. I guess that's why it was originally done via a system call.
Questions:
The problem now is, that this system call always seem to return 0, whether one of the system commands fail or not. I assume this is because it gets invoked via sh -c '...'. Is that correct?
Is there a way to check if a system() call was successful if only a single scalar argument is passed?
Is there a better way to process a large file like this, in a way thats equally or more efficient (in terms of speed mainly)?
Thanks for any hints as I am not really familiar with Perl.
Two things:
When you do a system call, the value returned is the last value in the pipeline. Thus, you're getting the status code of the bzip2 command.
The reason the program is doing this is because the people who wrote the program probably didn't know any better. I've seen Perl programs use system calls for finding the basename of the file, doing a find, and even doing a copy/rename/move. These are all things that can be done faster and easier inside the Perl program. And, you don't have the whole Windows/Unix compatibility issues.
You're always better off using Perl modules for things like this. In this case, I bet the Perl modules will be even faster than the shell pipeline, and you'll have more control over the entire operation.
There's a set called IO::Compress that can handle both Zip and BZip2.
I use Archive::Zip which is a great module, but you want to use the Bzip2 compression algorithm, and Archive::Zip can't handle that.
system() returns what the /bin/sh shell returns. When multiple commands are pipelined, the shell forks a new process for each of them and the status code of the last command in the chain is returned, in this case bzip2.
Based on your comments and answers, I'd do it like that now:
$infile =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
open(OUTFH, "| /bin/bzip > $outfile") or die "Can't open $outfile: $!";
open(INFH, $infile) or die "Can't open $infile: $!";
while (my $line = <INFH>) {
if ($line =~ /^.{6}X|Y|Z) {
# TODO: the awk part...
print OUTFH $line;
}
}
close(INFH);
close(OUTFH);
Please feel free to comment and vote up/down.
You'd be better doing the text processing from within perl itself - that's what perl's for :)
system() only ever returns 0 or 1. To capture actual output, try calling it via backticks: `command` rather than system('command')

How do I run a Perl script from within a Perl script?

I've got a Perl script that needs to execute another Perl script. This second script can be executed directly on the command line, but I need to execute it from within my first program. I'll need to pass it a few parameters that would normally be passed in when it's run standalone (the first script runs periodically, and executes the second script under a certain set of system conditions).
Preliminary Google searches suggest using backticks or a system() call. Are there any other ways to run it? (I'm guessing yes, since it's Perl we're talking about :P ) Which method is preferred if I need to capture output from the invoked program (and, if possible, pipe that output as it executes to stdout as though the second program were invoked directly)?
(Edit: oh, now SO suggests some related questions. This one is close, but not exactly the same as what I'm asking. The second program will likely take an hour or more to run (lots of I/O), so I'm not sure a one-off invocation is the right fit for this.)
You can just do it.
{
local #ARGV = qw<param1 param2 param3>;
do '/home/buddy/myscript.pl';
}
Prevents the overhead of loading in another copy of perl.
The location of your current perl interpreter can be found in the special variable $^X. This is important if perl is not in your path, or if you have multiple perl versions available but which to make sure you're using the same one across the board.
When executing external commands, including other Perl programs, determining if they actually ran can be quite difficult. Inspecting $? can leave lasting mental scars, so I prefer to use IPC::System::Simple (available from the CPAN):
use strict;
use warnings;
use IPC::System::Simple qw(system capture);
# Run a command, wait until it finishes, and make sure it works.
# Output from this program goes directly to STDOUT, and it can take input
# from your STDIN if required.
system($^X, "yourscript.pl", #ARGS);
# Run a command, wait until it finishes, and make sure it works.
# The output of this command is captured into $results.
my $results = capture($^X, "yourscript.pl", #ARGS);
In both of the above examples any arguments you wish to pass to your external program go into #ARGS. The shell is also avoided in both of the above examples, which gives you a small speed advantage, and avoids any unwanted interactions involving shell meta-characters. The above code also expects your second program to return a zero exit value to indicate success; if that's not the case, you can specify an additional first argument of allowable exit values:
# Both of these commands allow an exit value of 0, 1 or 2 to be considered
# a successful execution of the command.
system( [0,1,2], $^X, "yourscript.pl", #ARGS );
# OR
capture( [0,1,2, $^X, "yourscript.pl", #ARGS );
If you have a long-running process and you want to process its data while it's being generated, then you're probably going to need a piped open, or one of the more heavyweight IPC modules from the CPAN.
Having said all that, any time you need to be calling another Perl program from Perl, you may wish to consider if using a module would be a better choice. Starting another program carries quite a few overheads, both in terms of start-up costs, and I/O costs for moving data between processes. It also significantly increases the difficulty of error handling. If you can turn your external program into a module, you may find it simplifies your overall design.
All the best,
Paul
I can think of a few ways to do this. You already mentioned the first two, so I won't go into detail on them.
backticks: $retVal = `perl somePerlScript.pl`;
system() call
eval
The eval can be accomplished by slurping the other file into a string (or a list of strings), then 'eval'ing the strings. Heres a sample:
#!/usr/bin/perl
open PERLFILE, "<somePerlScript.pl";
undef $/; # this allows me to slurp the file, ignoring newlines
my $program = <PERLFILE>;
eval $program;
4 . do: do 'somePerlScript.pl'
You already got good answers to your question, but there's always the posibility to take a different point of view: maybe you should consider refactoring the script that you want to run from the first script. Turn the functionality into a module. Use the module from the first and from the second script.
If you need to asynchronously call your external script -you just want to launch it and not wait for it to finish-, then :
# On Unix systems, either of these will execute and just carry-on
# You can't collect output that way
`myscript.pl &`;
system ('myscript.pl &');
# On Windows systems the equivalent would be
`start myscript.pl`;
system ('start myscript.pl');
# If you just want to execute another script and terminate the current one
exec ('myscript.pl');
Use backticks if you need to capture the output of the command.
Use system if you do not need to capture the output of the command.
TMTOWTDI: so there are other ways too, but those are the two easiest and most likely.
See the perlipc documentation for several options for interprocess communication.
If your first script merely sets up the environment for the second script, you may be looking for exec.
#!/usr/bin/perl
use strict;
open(OUTPUT, "date|") or die "Failed to create process: $!\n";
while (<OUTPUT>)
{
print;
}
close(OUTPUT);
print "Process exited with value " . ($? >> 8) . "\n";
This will start the process date and pipe the output of the command to the OUTPUT filehandle which you can process a line at a time. When the command is finished you can close the output filehandle and retrieve the return value of the process. Replace date with whatever you want.
I wanted to do something like this to offload non-subroutines into an external file to make editing easier. I actually made this into a subroutine. The advantage of this way is that those "my" variables in the external file get declared in the main namespace. If you use 'do' they apparently don't migrate to the main namespace. Note the presentation below doesn't include error handling
sub getcode($) {
my #list;
my $filename = shift;
open (INFILE, "< $filename");
#list = <INFILE>;
close (INFILE);
return \#list;
}
# and to use it:
my $codelist = [];
$codelist = getcode('sourcefile.pl');
eval join ("", #$codelist);