Find available space in /tmp on a remote machine - perl

I want to find out the available space in /tmp on a remote machine. I can do it with the following command from my machine:
ssh host-name df /tmp | awk '{ print $4 }' | tail +2`
IT works and gives an output something like this :9076656, which the available space in KB.
But when I put this command inside a perl program, I get the error message about the use of uninitialized value for $4.
Use of uninitialized value $4 in concatenation
Here is how I am doing it in the perl code:
my $output = `ssh $server df /tmp | awk '{ print $4 }' | tail +2`;
Any ideas how can I resolve this issue ?

The problem is that $4 is interpolated by Perl, just like you expect $server to be interpolated and not literal. The immediate solution is to escape the dollar sign: \$4. However, it is as Brian Agnew says, pretty redundant to use awk inside perl to do something that perl excels at.
use strict;
use warnings;
use Data::Dumper;
my #output = `ssh $server df /tmp`; # capture output in array
#output = map { ( split )[3] } #output; # take only 4th field
print Dumper \#output; # display data
Then you can use the various array tools to trim #output to your liking, e.g. pop, push, shift, unshift, splice, using slices and subscripts. For example, taking all but the first two lines:
print #output[2 .. $#output];
Removing the first two lines:
splice #output, 0, 2;

Using Perl to spawn off an awk script seems a little redundant (and heavyweight) given Perl's ability to read files, parse out fields etc. I would rather investigate reading a process' output via Perl and splitting the output lines.

Related

Awk's output in Perl doesn't seem to be working properly

I'm writing a simple Perl script which is meant to output the second column of an external text file (columns one and two are separated by a comma).
I'm using AWK because I'm familiar with it.
This is my script:
use v5.10;
use File::Copy;
use POSIX;
$s = `awk -F ',' '\$1==500 {print \$2}' STD`;
say $s;
The contents of the local file "STD" is:
CIR,BS
60,90
70,100
80,120
90,130
100,175
150,120
200,260
300,500
400,600
500,850
600,900
My output is very strange and it prints out the desired "850" but it also prints a trailer of the line and a new line too!
ka#man01:$ ./test.pl
850
ka#man01:$
The problem isn't just printing. I need to use the variable generated by awk "i.e. the $s variable) but the variable is also being reserved with a long string and a new line!
Could you guys help?
Thank you.
I'd suggest that you're going down a dirty road by trying to inline awk into perl in the first place. Why not instead:
open ( my $input, '<', 'STD' ) or die $!;
while ( <$input> ) {
s/\s+\z//;
my #fields = split /,/;
print $fields[1], "\n" if $fields[0] == 500;
}
But the likely problem is that you're not handling linefeeds, and say is adding an extra one. Try using print instead, or chomp on the resultant string.
perl can do many of the things that awk can do. Here's something similar that replaces your entire Perl program:
$ perl -naF, -le 'chomp; print $F[1] if $F[0]==500' STD
850
The -n creates a while loop around your argument to -e.
The -a splits up each line into #F and -F lets you specify the separator. Since you want to separate the fields on a comma you use -F,.
The -l adds a newline each time you call print.
The -e argument is the program to run (with the added while from -n). The chomp removes the newline from the output. You get a newline in your output because you happen to use the last field in the line. The -l adds a newline when you print; that's important when you want to extract a field in the middle of the line.
The reason you get 2 newlines:
the backtick operator does not remove the trailing newline from the awk output. $s contains "850\n"
the say function appends a newline to the string. You have say "850\n" which is the same as print "850\n\n"

Extract everything between first and last occurence of the same pattern in single iteration

This question is very much the same as this except that I am looking to do this as fast as possible, doing only a single pass of the (unfortunately gzip compressed) file.
Given the pattern CAPTURE and input
1:.........
...........
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
...........
1000:......
Print:
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
Can this be accomplished with a regular expression?
I vaguely remember that this kind of grammar cannot be captured by a regular expression but not quite sure as regular expressions these days provide look aheads,etc.
You can buffer the lines until you see a line that contains CAPTURE, treating the first occurrence of the pattern specially.
#!/usr/bin/env perl
use warnings;
use strict;
my $first=1;
my #buf;
while ( my $line = <> ) {
push #buf, $line unless $first;
if ( $line=~/CAPTURE/ ) {
if ($first) {
#buf = ($line);
$first = 0;
}
print #buf;
#buf = ();
}
}
Feed the input into this program via zcat file.gz | perl script.pl.
Which can of course be jammed into a one-liner, if need be...
zcat file.gz | perl -ne '$x&&push#b,$_;if(/CAPTURE/){$x||=#b=$_;print#b;#b=()}'
Can this be accomplished with a regular expression?
You mean in a single pass, in a single regex? If you don't mind reading the entire file into memory, sure... but this is obviously not a good idea for large files.
zcat file.gz | perl -0777ne '/((^.*CAPTURE.*$)(?s:.*)(?2)(?:\z|\n))/m and print $1'
I would write
gunzip -c file.gz | sed -n '/CAPTURE/,$p' | tac | sed -n '/CAPTURE/,$p' | tac
Find the first CAPTURE and look back for the last one.
echo "/CAPTURE/,?CAPTURE? p" | ed -s <(gunzip -c inputfile.gz)
EDIT: Answer to comment and second (better?) solution.
When your input doesn't end with a newline, ed will complain, as shown by these tests.
# With newline
printf "1,$ p\n" | ed -s <(printf "%s\n" test)
# Without newline
printf "1,$ p\n" | ed -s <(printf "%s" test)
# message removed
printf "1,$ p\n" | ed -s <(printf "%s" test) 2> /dev/null
I do not know the memory complications this will give for a large file, but you would prefer a streaming solution.
You can use sed for the next approach.
Keep reading lines until you find the first match. During this time only remember the last line read (by putting it in a Hold area).
Now change your tactics.
Append each line to the Hold area. You do not know when to flush until the next match.
When you have the next match, recall the Hold area and print this.
I needed some tweeking for preventing the second match to be printed twice. I solved this by reading the next line and replacing the HOLD area with that line.
The total solution is
gunzip -c inputfile.gz | sed -n '1,/CAPTURE/{h;n};H;/CAPTURE/{x;p;n;h};'
When you don't like the sed holding space, you can implemnt the same approach with awk:
gunzip -c inputfile.gz |
awk '/CAPTURE/{capt=1} capt==1{a[i++]=$0} /CAPTURE/{for(j=0;j<i;j++) print a[j]; i=0}'
I don't think regex will be faster than double scan...
Here is an awk solution (double scan)
$ awk '/pattern/ && NR==FNR {a[++f]=NR; next} a[1]<=FNR && FNR<=a[f]' file{,}
Alternatively if you have any a priori information on where the patterns appear on the file you can have heuristic approaches which will be faster on those special cases.
Here is one more example with regex (the cons is that if files are large, it will consume a large memory)
#!/usr/bin/perl
{
local $/ = undef;
open FILE, $ARGV[0] or die "Couldn't open file: $!";
binmode FILE;
$string = <FILE>;
close FILE;
}
print $1 if $string =~ /([^\n]+(CAPTURE).*\2.*?)\n/s;
Or with one liner:
cat file.tmp | perl -ne '$/=undef; print $1 if <STDIN> =~ /([^\n]+(CAPTURE).*\2.*?)\n/s'
result:
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
This might work for you (GNU sed):
sed '/CAPTURE/!d;:a;n;:b;//ba;$d;N;bb' file
Delete all lines until the first containing the required string. Print the line containing the required string. Replace the pattern space with the next line. If this line contains the required string, repeat the last two previous sentences. If it is the last line of the file, delete the pattern space. Otherwise, append the next line and repeat the last three previous sentences.
Having studied the test files used for haukex's benchmark, it would seem that sed is not the tool to extract this file. Using a mixture of csplit, grep and sed presents a reasonably fast solution as follows:
lines=$(grep -nTA1 --no-group-separator CAPTURE oldFile |
sed '1s/\t.*//;1h;$!d;s/\t.*//;H;x;s/\n/ /')
csplit -s oldFile $lines && rm xx0{0,2} && mv xx01 newFile
Split the original file into three files. A file preceding the first occurrence of CAPTURE, a file from the first CAPTURE to the last CAPTURE and a file containing of the remainder. The first and third files are discarded and the second file renamed.
csplit can use line numbers to split the original file. grep is extremely fast at filtering patterns and can return the line numbers of all patterns that match CAPTURE and the following context line. sed can manipulate the results of grep into two line numbers which are supplied to the csplit command.
When run against the test files (as above) I get timings around 10 seconds.
While posting this question, the problem I had at hand was that I had several huge gzip compressed log files generated by a java application.
The log lines were of the following format:
[Timestamp] (AppName) {EventId} [INFO]: Log text...
[Timestamp] (AppName) {EventId} [EXCEPTION]: Log text...
at com.application.class(Class.java:154)
caused by......
[Timestamp] (AppName) {EventId} [LogLevel]: Log text...
Given an EventId, I needed to extract all the lines corresponding to the event from these files. The problem became unsolvable with a trivial grep for EventId just due to the fact that the exception lines could be of arbitrary length and do not contain the EventId.
Unfortunately I forgot to consider the edge case where the last log line for an EventId could be the exception and the answers posted here would not print the stacktrace lines. However it wasn't hard to modify haukex's solution to cover these cases as well:
#!/usr/bin/env perl
use warnings;
use strict;
my $first=1;
my #buf;
while ( my $line = <> ) {
push #buf, $line unless $first;
if ( $line=~/EventId/ or ($first==0 and $line!~/\(AppName\)/)) {
if ($first) {
#buf = ($line);
$first = 0;
}
print #buf;
#buf = ();
}
else {
$first = 1;
}
}
I am still wondering if the faster solutions(mainly walter's sed solution or haukex's in-memory perl solution) could be modified to do the same.

Unix - filename and string result on same line

I need to search a directory that has hundreds or thousands of files, each containing XML with one or more instances of a specific string (begin/end tag with data).
I can get all the instances of the string by doing
grep -ho '<mytagname>..............<\/mytagname>' /home/xyzzy/mydata/*.XML > /home/mydata/tagvalues.txt
then a few sed commands to strip off the tags, so I wind up with a file just containing a list of values:
value001
value002
value003
(etc)
Ideally though, I'd like to have each line of the file to also include the filename so I can import into a database for analysis.
So my result would be something like this
fileAAA value001
fileAAA value002
fileAAA value003
fileBBB value004
Exact formatting of the above is flexible - could have spaces or other separator, it could even still include the begin/end tags.
The closest I've been able to get is with grep -o
fileAAA:value001
value002
value003
fileBBB:value004
A perl one-liner would seem ideal but I'm new enough to that, that I have no clue how to begin.
Could be done using a one-liner like so:
perl -lne 'print "$ARGV $1" if /<mytagname>(.*?)<\/mytagname>/' *.xml
However, I'd strongly recommend that you use an actual XML parser like XML::Twig or XML::LibXML
use strict;
use warnings;
use XML::LibXML;
for my $file (</home/xyzzy/mydata/*.XML>) {
my $doc = XML::LibXML->load_xml(location => $file);
for my $node ($doc->findnodes("//mytagname")) {
print "$file " . $node->textContent() . "\n";
}
}
What about awk?
awk -F'</?mytagname>' '$2 {print FILENAME,$2}' /home/xyzzy/mydata/*.XML
Explanation:
-F regex - set field delimiter must be a separate argument thus enclosed in its own quotes
$2 - if second field has a value
{print FILENAME,$2} - print filename SPACE the value of second field

sed / perl regex extremly slow

So, I've got a file called
cracked.txt, which contains thousands(80million+) lines of this:
dafaa15bec90fba537638998a5fa5085:_BD:zzzzzz12
a8c2e774d406b319e33aca8b38540063:2JB:zzzzzz999
d6d24dfcef852729d10391f186da5b08:WNb:zzzzzzzss
2f1c72ccc940828b5daf4ab98e0f8731:#]9:zzzzzzzz
3b7633b6c19d79e5ab76bdb9cce4fd42:#A9:zzzzzzzz
a3dc9c03ff845776b485fa8337c9625a:yQ,:zzzzzzzz
ade1d43b29674814a16e96098365f956:FZ-:zzzzzzzz
ba93090dfa64d964889f521788aca889:/.g:zzzzzzzz
c3bd6861732affa3a437df46a6295810:m}Z:zzzzzzzz
b31d9f86c28bd1245819817e353ceeb1:>)L:zzzzzzzzzzzz
and in my output.txt 80 million lines like this:
('chen123','45a36afe044ff58c09dc3cd2ee287164','','','','f+P',''),
('chen1234','45a36afe044ff58c09dc3cd2ee287164','','','','f+P',''),
('chen125','45a36afe044ff58c09dc3cd2ee287164','','','','f+P',''),
(45a36afe044ff58c09dc3cd2ee287164 and f+P change every line)
What I've done is created a simple bash script to match the cracked.txt to output.txt and join them.
cat './cracked.txt' | while read LINE; do
pwd=$(echo "${LINE}" | awk -F ":" '{print $NF}' | sed -e 's/\x27/\\\\\\\x27/g' -e 's/\//\\\x2f/g' -e 's/\x22/\\\\\\\x22/g' )
hash=$(echo "${LINE}" | awk -F ":" '{print $1}')
lines=$((lines+1))
echo "${lines} ${pwd}"
perl -p -i -e "s/${hash}/${hash} ( ${pwd} ) /g" output.txt
#sed -u -i "s/${hash}/${hash} ( ${pwd} ) /g" output.txt
done
As you can see by the comment, I've tried sed, and perl.
perl seems to be a tad faster than sed
I'm getting something like one line per second.
I've never used perl, so I've got no idea how to use that to my advantage (multi threading, etc)
What would the best way to speed up this process?
Thanks
edit:
I got a suggestion that it would be better to use something like this:
while IFS=: read pwd seed hash; do
...
done < cracked.txt
But because inbetween the first and last occurance of : (awk '{print $1}' awk '{print $NF}', : could appear inbetween there, it would make it bad(corrupt it)
I could use it just for the "hash", but not for the "pwd".
edit again;
The above wouldn't work, because I would have to name all the other data, which ofc will be a problem.
The problem with bash scripting is that, while very flexible and powerful, it creates new processes for nearly anything, and forking is expensive. In each iteration of the loop, you spawn 3×echo, 2×awk, 1×sed and 1×perl. Restricting yourself to one process (and thus, one programming language) will boost performance.
Then, you are re-reading output.txt each time in the call to perl. IO is always slow, so buffering the file would be more efficient, if you have the memory.
Multithreading would work if there were no hash collisions, but is difficult to program. Simply translating to Perl will get you a greater performance increase than transforming Perl to multithreaded Perl.[citation needed]
You would probably write something like
#!/usr/bin/perl
use strict; use warnings;
open my $cracked, "<", "cracked.txt" or die "Can't open cracked";
my #data = do {
open my $output, "<", "output.txt" or die "Can't open output";
<$output>;
};
while(<$cracked>) {
my ($hash, $seed, $pwd) = split /:/, $_, 3;
# transform $hash here like "$hash =~ s/foo/bar/g" if really neccessary
# say which line we are at
print "at line $. with pwd=$pwd\n";
# do substitutions in #data
s/\Q$hash\E/$hash ( $pwd )/ for #data;
# the \Q...\E makes any characters in between non-special,
# so they are matched literally.
# (`C++` would match many `C`s, but `\QC++\E` matches the character sequence)
}
# write #data to the output file
(not tested or anything, no guarantees)
While this would still be an O(n²) solution, it would perform better than the bash script. Do note that it can be reduced to O(n), when organizing #data into a hash tree, indexed by hash codes:
my %data = map {do magic here to parse the lines, and return a key-value pair} #data;
...;
$data{$hash} =~ s/\Q$hash\E/$hash ( $pwd )/; # instead of evil for-loop
In reality, you would store a reference to an array containing all lines that contain the hash code in the hash tree, so the previous lines would rather be
my %data;
for my $line (#data) {
my $key = parse_line($line);
push #$data{$key}, $line;
}
...;
s/\Q$hash\E/$hash ( $pwd )/ for #{$data{$hash}}; # is still faster!
On the other hand, a hash with 8E7 elems might not exactly perform well. The answer lies in benchmarking.
When parsing logs on my work i do this thing: split file for N parts (N=num_processors); align split points to \n. Start N threads to work each part. Works really fast but harddrive is bottleneck.

How do I best pass arguments to a Perl one-liner?

I have a file, someFile, like this:
$cat someFile
hdisk1 active
hdisk2 active
I use this shell script to check:
$cat a.sh
#!/usr/bin/ksh
for d in 1 2
do
grep -q "hdisk$d" someFile && echo "$d : ok"
done
I am trying to convert it to Perl:
$cat b.sh
#!/usr/bin/ksh
export d
for d in 1 2
do
cat someFile | perl -lane 'BEGIN{$d=$ENV{'d'};} print "$d: OK" if /hdisk$d\s+/'
done
I export the variable d in the shell script and get the value using %ENV in Perl. Is there a better way of passing this value to the Perl one-liner?
You can enable rudimentary command line argument with the "s" switch. A variable gets defined for each argument starting with a dash. The -- tells where your command line arguments start.
for d in 1 2 ; do
cat someFile | perl -slane ' print "$someParameter: OK" if /hdisk$someParameter\s+/' -- -someParameter=$d;
done
See: perlrun
Sometimes breaking the Perl enclosure is a good trick for these one-liners:
for d in 1 2 ; do cat kk2 | perl -lne ' print "'"${d}"': OK" if /hdisk'"${d}"'\s+/';done
Pass it on the command line, and it will be available in #ARGV:
for d in 1 2
do
perl -lne 'BEGIN {$d=shift} print "$d: OK" if /hdisk$d\s+/' $d someFile
done
Note that the shift operator in this context removes the first element of #ARGV, which is $d in this case.
Combining some of the earlier suggestions and adding my own sugar to it, I'd do it this way:
perl -se '/hdisk([$d])/ && print "$1: ok\n" for <>' -- -d='[value]' [file]
[value] can be a number (i.e. 2), a range (i.e. 2-4), a list of different numbers (i.e. 2|3|4) (or almost anything else, that's a valid pattern) or even a bash variable containing one of those, example:
d='2-3'
perl -se '/hdisk([$d])/ && print "$1: ok\n" for <>' -- -d=$d someFile
and [file] is your filename (that is, someFile).
If you are having trouble writing a one-liner, maybe it is a bit hard for one line (just my opinion). I would agree with #FM's suggestion and do the whole thing in Perl. Read the whole file in and then test it:
use strict;
local $/ = '' ; # Read in the whole file
my $file = <> ;
for my $d ( 1 .. 2 )
{
print "$d: OK\n" if $file =~ /hdisk$d\s+/
}
You could do it looping, but that would be longer. Of course it somewhat depends on the size of the file.
Note that all the Perl examples so far will print a message for each match - can you be sure there are no duplicates?
My solution is a little different. I came to your question with a Google search the title of your question, but I'm trying to execute something different. Here it is in case it helps someone:
FYI, I was using tcsh on Solaris.
I had the following one-liner:
perl -e 'use POSIX qw(strftime); print strftime("%Y-%m-%d", localtime(time()-3600*24*2));'
which outputs the value:
2013-05-06
I was trying to place this into a shell script so I could create a file with a date in the filename, of X numbers of days in the past. I tried:
set dateVariable=`perl -e 'use POSIX qw(strftime); print strftime("%Y-%m-%d", localtime(time()-3600*24*$numberOfDaysPrior));'`
But this didn't work due to variable substitution. I had to mess around with the quoting, to get it to interpret it properly. I tried enclosing the whole lot in double quotes, but this made the Perl command not syntactically correct, as it messed with the double quotes around date format. I finished up with:
set dateVariable=`perl -e "use POSIX qw(strftime); print strftime('%Y-%m-%d', localtime(time()-3600*24*$numberOfDaysPrior));"`
Which worked great for me, without having to resort to any fancy variable exporting.
I realise this doesn't exactly answer your specific question, but it answered the title and might help someone else!
That looks good, but I'd use:
for d in $(seq 1 2); do perl -nle 'print "hdisk$ENV{d} OK" if $_ =~ /hdisk$ENV{d}/' someFile; done
It's already written on the top in one long paragraph but I am also writing for lazy developers who don't read those lines.
Double quotes and single quote has big different meaning for the bash.
So please take care
Doesn't WORK perl '$VAR' $FILEPATH
WORKS perl "$VAR" $FILEPATH