How to find a Solaris process with ___ status - perl

I made the following script, which searches for certain processes, runs pflags on each one and prints its output, and stops when it finds one containing the word "pause":
!cat find_pause
#!/usr/bin/perl -W
use warnings;
use strict;
if (open(WCF,
"ps -ef | grep '/transfile' | cut -c10-15 | xargs -n1 pflags 2>&1 |"
)) {
while (<WCF>) {
next if ($_ =~ /cannot/);
print $_;
last if ($_ =~ /pause/);
}
close(WCF);
}
It works, but I wonder if there is a better way to do this.
Update
pause is a low-level system call. Like read, nanosleep, waitid, etc.
With this script I want to find processes that are stuck in the pause call. We are trying to find a bug in our system, and we think it might be related to this.

I don't know what you'd consider a "better way" in this case, but I can offer some technique guidance for the approach you already have:
grep '/[t]ransfile'
A grep against ps output often runs the risk of matching the grep process itself, which is almost never desired. An easy protection against this is simply to introduce a character class of one member in the grep pattern argument.
awk '/\/[t]ransfile/{ print $2 }'
grep + cut, that is, field extraction following a pattern match, is an easy task for a single awk command.
Don't refer to $_
Tighter, more idiomatic perl would omit explicit use of $_. Try next if /cannot/ and the like.
open(my $wcf, ...)
Please use lexical filehandles, otherwise you'll be chided by those old enough to remember when we couldn't use them. :)
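Putting those suggestions together, a minimal untested sketch of the same loop might look like this (it keeps your original pipeline, just swapping grep + cut for awk and using a lexical filehandle):
#!/usr/bin/perl
use warnings;
use strict;

# awk does the pattern match and the PID extraction in one step; the [t]
# character class keeps the pipeline's own processes out of the match.
my $cmd = q{ps -ef | awk '/\/[t]ransfile/{ print $2 }' | xargs -n1 pflags 2>&1};

open(my $wcf, '-|', $cmd) or die "cannot run pipeline: $!";
while (<$wcf>) {
    next if /cannot/;    # pflags complains if a process exited in the meantime
    print;
    last if /pause/;
}
close($wcf);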

There are two possible improvements to this, depending on:
Do you actually need to print the exact output of the pflags command, or just some information from it (e.g. a list of PIDs and flags)?
What does "pause" in the pflags output mean? It's nowhere in the "proc" or "pflags" man pages, and all the actual flags are upper case. Depending on its meaning, it might be available through the native Perl interface to "/proc" - Proc::ProcessTable::Process.
For example, that Process object contains all the flags (in a bit vector) and process status (my suspicion is that "pause" might be a process status).
If the answer to those questions is "Proc::ProcessTable::Process contains enough info for my needs", then a better solution is to use that:
#!/usr/bin/perl -W
use warnings;
use strict;
use Proc::ProcessTable;
my $t = Proc::ProcessTable->new;
foreach my $p ( @{$t->table} ) {
    my $flags = $p->flags; # This is an integer containing a bit vector.
    # Somehow process $flags or $p->status to find "if the process is paused"
    print "$flags\n";
    last if paused($p); # No clue how to do that without more info from you
    # Maybe: last if $p->status =~ /paused/;
}
However, if the native Perl interface does not have enough info for you (unlikely but possible), OR if you actually want to print the exact pflags output as-is for some reason, the best optimization is to construct the list of PIDs for pflags natively in Perl - not as big a win, but you still avoid a bunch of extra forked-off processes. Something like this:
#!/usr/bin/perl -W
use warnings;
use strict;
use Proc::ProcessTable;
my $t = Proc::ProcessTable->new;
my $pids = join " ", map { $_->pid } @{$t->table};
if (open(WCF, "pflags 2>&1 $pids|")) {
while (<WCF>) {
next if ($_ =~ /cannot/);
print $_;
last if ($_ =~ /pause/);
}
close(WCF);
}

Related

perl text-processing (in particular when loading files)

Loading files and sorting columns is usually easy in shell with a combination of grep, cut, sed, awk & so on.
However, when I have to do it in Perl, I often end up doing long and painful things using many splits one after another, plus regexes, and the result is dirty code that looks something like this:
open $FH, "<", $file;
@file = <$FH>;
close $FH;
foreach $line (@file) {
    ( $foo, $bar, $some, $thing) = ( split(/,/, $line) )[3,8,9,15];
    ( $new_some ) = (split(/-/, $some))[2];
    ($new_foo = $foo) =~ s/xx//;
    $uc_bar = uc($bar);
    # and so on.....
}
Isn't there a more elegant way of doing such things (splitting fields, replacing patterns, etc.)? Or a quicker way (not necessarily elegant)?
Also, is there a way to load just the required part of the file at load time (without having to load everything into memory, i.e. filtering before loading)?
Elegance is subjective, but I can answer at least one of your questions, and suggest some things that might shorten or improve your code.
"is there a way to load just the required part of the file at loading time" - in the code you showed, I don't see the need to load the entire file into memory. The typical pattern for processing files line-by-line, and the equivalent of what Perl's -n and -p switches do, is this pattern:
open my $fh, '<', $file or die "$file: $!";
while (<$fh>) {          # reads line into $_
    my @fields = split;  # splits $_ on whitespace, like awk
    my ($foo, $bar, $some, $thing) = @fields[3,8,9,15];
    ...
}
close $fh;
I consider that fairly elegant, but based on what you're writing I guess you're comparing that to oneliners of piped commands that fit within maybe 100 characters. Perl can do that too: as the comments have already mentioned, have a look at the switches -n, -p, -a, -F, and -i. If you show some concrete examples of things you want to do, you'll probably get some replies showing how to do it shorter with Perl.
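For instance, a rough one-liner version of your loop might look like this (a sketch only; it assumes comma-separated input, the field indices from your example, and a hypothetical input.csv):
perl -F, -lane '
    my ($foo, $bar, $some, $thing) = @F[3,8,9,15];
    (my $new_foo = $foo) =~ s/xx//;
    print join ",", $new_foo, uc($bar), (split /-/, $some)[2];
' input.csv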
But if you're going to be doing more, then it's usually better to expand that into a script like the one above. IMHO putting things into a script gives you more power: it's not ephemeral like the command-line history, it's more easily extensible, and it's easier to use modules; you can add command-line options, process multiple files, and so on. Just as an example, with the following snippet you get all the power of Text::CSV - support for quoting, escaping, multiline strings, etc.:
use Text::CSV;
my $csv = Text::CSV->new({binary=>1, auto_diag=>2, eol=>$/});
open my $fh, '<', $file or die "$file: $!";
while ( my $row = $csv->getline($fh) ) {
...
$csv->print(select, $row);
}
$csv->eof or $csv->error_diag;
close $fh;
You might also want to check out that module's csv function, which provides a lot of functionality in a single call. If you still think that's all too "painful" and "dirty" and you'd rather do stuff with less code, then there are a few shortcuts you could take, for example to slurp a whole file into memory, my $data = do { local (@ARGV, $/) = $file; <> };, or to do the same as the -i command-line switch:
local ($^I, @ARGV) = ('.bak', $file);
while (<>) {
    # s///; or @F=split; or whatever
    print; # prints $_ back out
}
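And the csv function mentioned above packs much of that into one call; a minimal sketch (assuming the same $file as before, with purely illustrative field positions):
use Text::CSV qw(csv);

# Read the whole file as an array of arrayrefs, one per record.
my $rows = csv(in => $file) or die Text::CSV->error_diag;

for my $row (@$rows) {
    my ($foo, $bar) = @{$row}[3, 8];   # field positions are just an example
    ...
}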
One thing I like about Perl is that it lets you express yourself in lots of different ways - whether you want to hack together a really short script to take care of a one-time task, or write a big OO project, TIMTOWTDI 🙂

How do you check for the existence of file names with a specific string in Perl

I am rewriting a Bash script in Perl in order to learn the latter.
The script creates a file using the current date in a custom format and a ".txt" extension but checks first to make sure no file with the date in question already exists.
In Bash, I accomplish this with ls | grep $customDate as a condition. That is, if ls | grep $customDate succeeds, a warning is issued and no file is created, while if it fails, the file gets created with the custom date plus a ".txt" extension.
How can I mimic this in Perl?
For testing purposes, I wrote the code below but it does not print out anything - even when I have created a file that meets the condition:
use POSIX qw( strftime );
$customDate = strftime "%Y_%m%b_%d%a", localtime;
opendir(DIR, ".") or die "$!";
my @FILES = grep { /${customDate}*/ } readdir(DIR);
closedir(DIR);
print "$_\n" for #FILES;
I apologize if my question is unclear
"I am rewriting a Bash script in Perl in order to learn the latter."
I think you're taking the wrong approach to learning Perl, or to learning any language.
While there are always a lot of similarities between procedural languages, it is always wrong to focus on those above the differences. Programming languages must be learned from scratch if you hope ever to be able to read and write them well
I regularly see Perl code on Stack Overflow that has clearly been written by someone with the wrong head on. For instance, the clearest signs of a C programmer are
Declaring everything in one block at the top of a source file
Over-use of scalar and parentheses
Under-use of the default variable $_ and regular expressions
Using the C-style for loop, which usually looks something like this in Perl
my $i;
for ($i=0; $i<=scalar(@data); $i++)
{
process($data[$i])
}
Apart from ignoring perlstyle completely, the author is grasping for something familiar instead of embracing the new language. In idiomatic Perl that should look like
process($_) for @data
Reaching further, it is easy to become complacent about the consequences of phrases you may be writing glibly in the shell
You need to be aware that your shell statement
ls |grep $customDate
is starting new processes to run /bin/ls and /bin/grep, and piping information between them and back to the shell process. The Linux shell and its supporting utilities are designed to get trivial jobs done easily, but I believe they are being used too much with elaborate shell script one-liners that are opaque and beyond debugging
It's very hard to tell exactly what it is that you need your program to do, but it looks like you want to discover whether any file names in the current directory contain a string like 2016_05May_30Mon
I have to say that's a horrible date-time format, and I'm struggling to believe that it's what you want, but I would prefer Perl's core Time::Piece module over POSIX any time
In this instance I would also make use of Perl's regular expressions, the -X file test operators, and Perl's glob operator instead of opendir, readdir, and closedir. None of those have a direct equivalent in any shell language
So, assuming that my guesses about your intention are correct, I would write this
use strict;
use warnings 'all';
use feature 'say';
use Time::Piece;
my $dtime = localtime()->strftime('%Y_%m%b_%d%a');
say for grep { -f and /$dtime/ } glob '*.txt';
which isn't remotely like your translation from shell to Perl
The reason you're not getting what you expect is that the * in the grep pattern matches the last character of $customDate repeated any number of times (including zero), which is not what * means in a shell glob.
If your files have names like "somedata.txt", you should update the code as follows, which will look for your date string, then any number of characters, followed by ".txt":
$customDate = strftime "%Y_%m%b_%d%a", localtime;
opendir(DIR, ".") or die "$!";
my @FILES = grep { /${customDate}.*\.txt/ } readdir(DIR);
closedir(DIR);
print "$_\n" for #FILES;

What does -l $_ do in Perl, and how does it work?

What is the meaning of the following code?
foreach (@items)
{
if (-l $_) ## this is what I don't understand: the meaning of -l
{
...
}
}
Thanks for any help.
Let's look at each thing:
foreach (@items) {
...
}
This for loop (foreach and for are the same command in Perl) is taking each item from the #items list, and setting it to $_. The $_ is a special variable in Perl that is used as sort of a default variable. The idea is that you could do things like this:
foreach (@items) {
s/foo/bar/;
uc;
print;
}
And each of those commands would operate on that $_ variable! If you simply say print with nothing else, it prints whatever is in $_. If you say uc without naming a variable, it operates on whatever is in $_.
This is now discouraged for several reasons. First, $_ is global, so there might be side effects that are not intended. For example, imagine you call a subroutine that mucks with the value of $_. You would suddenly be surprised that your program doesn't work.
The other piece, -l, is a file test operator. This operator checks whether the given file is a symbolic link. I've linked to the Perldoc that explains all of the file test operators.
If you're not knowledgeable in Unix or BASH/Korn/Bourne shell scripting, having a command that starts with a dash just looks weird. However, much of Perl's syntax was stolen... I mean borrowed from Unix shell and awk commands. In Unix, there's a command called test which you can use like this:
if test -L $FILE
then
....
fi
In Unix, that -L is a parameter to the test command, and in Unix, most parameters to commands start with dashes. Perl simply borrowed the same syntax dash and all.
Interestingly, if you read the Perldoc for these test commands, you will notice that like the foreach loop, the various test commands will use the $_ variable if you don't give it a variable or file name. Whoever wrote that script could have written their loop like this:
foreach (@items)
{
if (-l) ## Notice no mention of the `$_` variable
{
...
}
}
Yeah, that's soooo much clearer!
Just for your information, the modern way, as recommended by many Perl experts (cough Damian Conway cough), is to avoid the $_ variable whenever possible, since it doesn't really add clarity and can cause problems. He also recommends just saying for and forgetting foreach, and using curly braces on the same line:
for my $file (@items) {
if ( -l $file ) {
...
}
}
That might not help with the -l command, but at least you can see you're dealing with files, so you might suspect that -l has something to do with files.
Unfortunately, the Perldoc puts all of these file tests under the -X section and alphabetized under X, so if you're searching the Perldoc for a -l command, or any command that starts with a dash, you won't find it unless you know. However, at least you know now for the future where to look when you see something like this: -s $file.
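For reference, a small sketch of a few of these file test operators in use (the file name is just a placeholder):
use strict;
use warnings;

my $file = 'example.txt';              # hypothetical file name

print "symbolic link\n" if -l $file;   # true for symlinks (uses lstat internally)
print "plain file\n"    if -f $file;   # true for regular files
print "exists\n"        if -e $file;
my $size = -s $file;                   # size in bytes, undef if the file can't be stat'ed
print "size: $size bytes\n" if defined $size;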
It's an operator that checks if a file is a symbolic link.
The -l filetest operator checks whether a file is a symbolic link.
The way -l works under the hood resembles the code below.
#! /usr/bin/env perl
use strict;
use warnings;
use Fcntl ':mode';
sub is_symlink {
    my ($path) = @_;
    my $mode = (lstat $path)[2];
    die "$0: lstat $path: $!" unless defined $mode;
    return S_ISLNK $mode;
}
my @items = @ARGV;
foreach (@items) {
    if (is_symlink $_) {
        print "$0: link: $_\n";
    }
}
Sample output:
$ ln -s foo/bar/baz quux
$ ./flag-links flag-links quux
./flag-links: link: quux
Note the call to lstat and not stat because the latter would attempt to follow symlinks but never identify them!
To understand how Unix mode bits work, see the accepted answer to “understanding and decoding the file mode value from stat function output.”
From perldoc:
-l File is a symbolic link.

sed / perl regex extremely slow

So, I've got a file called cracked.txt, which contains thousands (80 million+) of lines like this:
dafaa15bec90fba537638998a5fa5085:_BD:zzzzzz12
a8c2e774d406b319e33aca8b38540063:2JB:zzzzzz999
d6d24dfcef852729d10391f186da5b08:WNb:zzzzzzzss
2f1c72ccc940828b5daf4ab98e0f8731:#]9:zzzzzzzz
3b7633b6c19d79e5ab76bdb9cce4fd42:#A9:zzzzzzzz
a3dc9c03ff845776b485fa8337c9625a:yQ,:zzzzzzzz
ade1d43b29674814a16e96098365f956:FZ-:zzzzzzzz
ba93090dfa64d964889f521788aca889:/.g:zzzzzzzz
c3bd6861732affa3a437df46a6295810:m}Z:zzzzzzzz
b31d9f86c28bd1245819817e353ceeb1:>)L:zzzzzzzzzzzz
and my output.txt contains 80 million lines like this:
('chen123','45a36afe044ff58c09dc3cd2ee287164','','','','f+P',''),
('chen1234','45a36afe044ff58c09dc3cd2ee287164','','','','f+P',''),
('chen125','45a36afe044ff58c09dc3cd2ee287164','','','','f+P',''),
(45a36afe044ff58c09dc3cd2ee287164 and f+P change every line)
What I've done is created a simple bash script to match the cracked.txt to output.txt and join them.
cat './cracked.txt' | while read LINE; do
pwd=$(echo "${LINE}" | awk -F ":" '{print $NF}' | sed -e 's/\x27/\\\\\\\x27/g' -e 's/\//\\\x2f/g' -e 's/\x22/\\\\\\\x22/g' )
hash=$(echo "${LINE}" | awk -F ":" '{print $1}')
lines=$((lines+1))
echo "${lines} ${pwd}"
perl -p -i -e "s/${hash}/${hash} ( ${pwd} ) /g" output.txt
#sed -u -i "s/${hash}/${hash} ( ${pwd} ) /g" output.txt
done
As you can see by the comment, I've tried sed, and perl.
perl seems to be a tad faster than sed
I'm getting something like one line per second.
I've never used perl, so I've got no idea how to use that to my advantage (multi threading, etc)
What would be the best way to speed up this process?
Thanks
edit:
I got a suggestion that it would be better to use something like this:
while IFS=: read pwd seed hash; do
...
done < cracked.txt
But because a : could appear between the first and the last occurrence of : (that is, between what awk '{print $1}' and awk '{print $NF}' extract), it would corrupt the data.
I could use it just for the "hash", but not for the "pwd".
edit again:
The above wouldn't work, because I would have to name all the other data, which of course would be a problem.
The problem with bash scripting is that, while very flexible and powerful, it creates new processes for nearly everything, and forking is expensive. In each iteration of the loop, you spawn 3×echo, 2×awk, 1×sed and 1×perl. Restricting yourself to one process (and thus, one programming language) will boost performance.
Then, you are re-reading output.txt each time in the call to perl. IO is always slow, so buffering the file would be more efficient, if you have the memory.
Multithreading would work if there were no hash collisions, but is difficult to program. Simply translating to Perl will get you a greater performance increase than transforming Perl to multithreaded Perl.[citation needed]
You would probably write something like
#!/usr/bin/perl
use strict; use warnings;
open my $cracked, "<", "cracked.txt" or die "Can't open cracked";
my @data = do {
    open my $output, "<", "output.txt" or die "Can't open output";
    <$output>;
};
while (<$cracked>) {
    my ($hash, $seed, $pwd) = split /:/, $_, 3;
    # transform $hash here like "$hash =~ s/foo/bar/g" if really necessary
    # say which line we are at
    print "at line $. with pwd=$pwd\n";
    # do substitutions in @data
    s/\Q$hash\E/$hash ( $pwd )/ for @data;
    # the \Q...\E makes any characters in between non-special,
    # so they are matched literally.
    # (`C++` would match many `C`s, but `\QC++\E` matches the character sequence)
}
# write @data to the output file
(not tested or anything, no guarantees)
While this would still be an O(n²) solution, it would perform better than the bash script. Do note that it can be reduced to O(n) by organizing @data into a hash, indexed by hash codes:
my %data = map { do magic here to parse the lines, and return a key-value pair } @data;
...;
$data{$hash} =~ s/\Q$hash\E/$hash ( $pwd )/; # instead of evil for-loop
In reality, you would store a reference to an array containing all the lines that contain the hash code, so the previous lines would instead be
my %data;
for my $line (@data) {
    my $key = parse_line($line);
    push @{ $data{$key} }, $line;
}
...;
s/\Q$hash\E/$hash ( $pwd )/ for @{ $data{$hash} }; # is still faster!
On the other hand, a hash with 8E7 elems might not exactly perform well. The answer lies in benchmarking.
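To make that concrete, here is a rough, untested sketch of the O(n) variant. It assumes the output.txt format shown in the question (the 32-character hex hash is the second quoted field), keeps all of output.txt in memory, and writes the result to a hypothetical output.new.txt:
#!/usr/bin/perl
use strict;
use warnings;

# Read output.txt once and remember which line numbers mention which hash.
open my $out_in, '<', 'output.txt' or die "output.txt: $!";
my @output = <$out_in>;
close $out_in;

my %lines_for_hash;    # hash string => arrayref of indices into @output
for my $i (0 .. $#output) {
    if ($output[$i] =~ /^\('[^']*','([0-9a-f]{32})'/) {
        push @{ $lines_for_hash{$1} }, $i;
    }
}

# One pass over cracked.txt; only the matching output lines are touched.
open my $cracked, '<', 'cracked.txt' or die "cracked.txt: $!";
while (<$cracked>) {
    chomp;
    my ($hash, $seed, $pwd) = split /:/, $_, 3;
    next unless $lines_for_hash{$hash};
    $output[$_] =~ s/\Q$hash\E/$hash ( $pwd )/ for @{ $lines_for_hash{$hash} };
}
close $cracked;

# Write the (possibly modified) lines back out in their original order.
open my $out, '>', 'output.new.txt' or die "output.new.txt: $!";
print {$out} @output;
close $out;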
When parsing logs at work, I do this: split the file into N parts (N = number of processors), aligning the split points to \n boundaries, then start N threads to work on each part. It runs really fast, but the hard drive is the bottleneck.

Is there a simple way to do bulk file text substitution in place?

I've been trying to code a Perl script to substitute some text on all source files of my project. I'm in need of something like:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" *.{cs,aspx,ascx}
But one that parses all the files of a directory recursively.
I just started a script:
use File::Find::Rule;
use strict;
my @files = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # In-place file editing, or something like that
    }
}
But now I'm stuck. Is there a simple way to edit all files in place using Perl?
Please note that I don't need to keep a copy of every modified file; I have 'em all subversioned =)
Update: I tried this on Cygwin,
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" {*,*/*,*/*/*}.{cs,aspx,ascx}
But it looks like my argument list exploded to the maximum size allowed. In fact, I'm getting very strange errors on Cygwin...
If you assign @ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.
use File::Find::Rule;
use strict;
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak'; # or set `-i` in the #! line or on the command-line
while (<>) {
    s/thisgoesout/thisgoesin/gi;
    print;
}
This should do exactly what you want.
If your pattern can span multiple lines, add an undef $/; before the <> so that Perl operates on a whole file at a time instead of line-by-line.
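For illustration, the same sketch with slurp mode added might look like this (untested; the multi-line pattern is hypothetical):
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak';
undef $/;             # slurp each file whole so the pattern can cross line boundaries
while (<>) {
    s/this\s+goes\s+out/thisgoesin/gi;   # hypothetical multi-line pattern
    print;
}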
You may be interested in File::Transaction::Atomic or File::Transaction
The SYNOPSIS for F::T::A looks very similar to what you're trying to do:
# In this example, we wish to replace
# the word 'foo' with the word 'bar' in several files,
# with no risk of ending up with the replacement done
# in some files but not in others.
use File::Transaction::Atomic;
my $ft = File::Transaction::Atomic->new;
eval {
    foreach my $file (@list_of_file_names) {
        $ft->linewise_rewrite($file, sub {
            s#\bfoo\b#bar#g;
        });
    }
};
if ($@) {
    $ft->revert;
    die "update aborted: $@";
}
else {
    $ft->commit;
}
Couple that with the File::Find you've already written, and you should be good to go.
You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).
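A minimal, untested sketch of that approach, reusing the File::Find::Rule list from the question, might look like this:
use strict;
use warnings;
use File::Find::Rule;
use Tie::File;

my @files = File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.');

for my $f (@files) {
    tie my @lines, 'Tie::File', $f or die "Can't tie $f: $!";
    s/thisgoesout/thisgoesin/gi for @lines;   # each modified element is written back to the file
    untie @lines;
}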
Change
foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # in-place file editing, or something like that
    }
}
To
foreach my $f (@files){
    open my $in, '<', $f;
    open my $out, '>', "$f.out";
    while (my $line = <$in>){
        chomp $line;
        $line =~ s/thisgoesout/thisgoesin/gi;
        print $out "$line\n";
    }
}
This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).
The chomp isn't actually necessary, I've just been bitten by lines that weren't chomped one too many times (if you drop the chomp, change print $out "$line\n"; to print $out $line;).
Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '>', undef; to open a temporary file and then copy that file back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file, and verify the contents.
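A variation on the write-to-a-new-file approach (a sketch, not tested) is to write to a temporary file next to the original and rename it over the original only after the rewrite finishes; note that, unlike -i.bak, this keeps no backup copy. It assumes the same @files list as above:
for my $f (@files) {
    my $tmp = "$f.tmp.$$";                  # temporary name in the same directory
    open my $in,  '<', $f   or die "$f: $!";
    open my $out, '>', $tmp or die "$tmp: $!";
    while (my $line = <$in>) {
        $line =~ s/thisgoesout/thisgoesin/gi;
        print {$out} $line;
    }
    close $in;
    close $out or die "$tmp: $!";
    rename $tmp, $f or die "rename $tmp over $f: $!";
}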
Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.
You could use find:
find . -name '*.{cs,aspx,ascx}' | xargs perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"
This will list all the filenames recursively, then xargs will read its stdin and run the remainder of the command line with the filenames appended on the end. One nice thing about xargs is it will run the command line more than once if the command line it builds gets too long to run in one go.
Note that I'm not sure whether find completely understands all the shell methods of selecting files, so if the above doesn't work then perhaps try:
find . | grep -E '(cs|aspx|ascx)$' | xargs ...
When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.
It just occurred to me that although you didn't say so, you're probably on Windows due to the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's possible to write a Perl script to do the same thing, as you started to do, but you'll have to do the in-place editing yourself because you can't take advantage of the -i switch in that situation.
Thanks to ephemient on this question and on this answer, I got this:
use File::Find::Rule;
use strict;
sub ReplaceText {
    my $regex   = shift;
    my $replace = shift;
    @ARGV = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
    $^I = '.bak';
    while (<>) {
        s/$regex/$replace->()/gie;
        print;
    }
}
ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };
Now I can even loop through a hash containing regexp=>subs entries!