Is there a way to filter stdout (or stderr) before being redirected to a file?
"redirecting to a pipe" is probably not the best way to put it but I'm looking for the easiest way to achieve something with that effect.
The usage scenario is the following. I'm using gawk --lint=invalid as a matter of principle to detect possible errors in my scripts, and I want to filter out the spurious warnings. Instead of redirecting errors to a file and grepping them out when examining the file, I would like the filtering to take place before writing to the file.
Example: this script prints every second line to stderr.
echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid 'BEGIN {b = 1;} // {if (b) print; else print > "/dev/stderr"; b = !b;}' 1>/dev/null 2>errors
cat errors | less
gawk: warning: regexp constant `//' looks like a C++ comment, but is not
b
d
gawk: (FILENAME=- FNR=4) warning: no explicit close of file `/dev/stderr' provided
But you can see the spurious gawk warnings (they are not of concern). They could be filtered out using, for example,
filter-gawk-output.sh
---------------------
grep -Ev 'looks like a|explicit close'
Is there an elegant way of doing that in-line when redirecting to errors file?
Right now when examining error files I always do
cat errors | ./filter-gawk-output.sh | less
What about:
gawk --lint=invalid 'whatever' INPUTFILE 2> GAWK_ERRORS.LOG
This way STDERR will be redirected to the error log.
I am not aware of gawk having a facility to change the output of warnings, so I think this is more a question about shell syntax.
Given
filter_warnings() { grep -v '^gawk:'; }
awkprog='BEGIN {b = 1;} // {if (b) print; else print > "/dev/stderr"; b = !b;}'
where filter_warnings filters out the gawk warnings, and assuming bash (version 4 or later) as your shell, we can send both stdout and stderr into a pipe using the |& syntax (shorthand for 2>&1 |):
echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid "$awkprog" |& filter_warnings
If you want the two outputs in separate files, you need to use parentheses:
(echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid "$awkprog" > output.1) |& filter_warnings > output.2
Here output.1 will contain what the gawk program wrote to stdout, and output.2 what it wrote to stderr, minus the filtered warnings.
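If stdout is not needed at all, as in the question, a plain pipeline with reordered redirections may be enough. The following is only a sketch: noisy is a hypothetical stand-in for the gawk command, and the scratch file created with mktemp stands in for the errors file.

```shell
# Hypothetical stand-in for the gawk command: real data and lint warnings both go to stderr.
noisy() {
  echo "gawk: warning: regexp constant looks like a C++ comment, but is not" >&2
  echo "b" >&2
  echo "d" >&2
  echo "gawk: warning: no explicit close of file provided" >&2
  echo "a"   # ordinary stdout
}

errors=$(mktemp)   # assumed scratch file for the filtered errors

# 2>&1 first points stderr at the pipe, then >/dev/null discards stdout only.
noisy 2>&1 >/dev/null | grep -Ev 'looks like a|explicit close' > "$errors"
```

The order of the two redirections matters: written the other way round (>/dev/null 2>&1), stderr would follow stdout into /dev/null and nothing would reach grep.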
$ perl -pe 1 foo && echo ok
Can't open foo: No such file or directory.
ok
I'd really like the perl script to fail when the file does not exist. What's the "proper" way to make -p or -n fail when the input file does not exist?
The -p switch is just a shortcut for wrapping your code (the argument following -e) in this loop:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
(-n is the same but without the continue block.)
The <> empty operator is equivalent to readline *ARGV, and that opens each argument in succession as a file to read from. There's no way to influence the error handling of that implicit open, but you can make the warning it emits fatal (note, this will also affect several warnings related to the -i switch):
perl -Mwarnings=FATAL,inplace -pe 1 foo && echo ok
Set a flag in the body of the loop, check the flag in the END block at the end of the oneliner.
perl -pe '$found = 1; ... ;END {die "No file found" unless $found}' -- file1 file2
Note that it only fails when no file was processed.
To report the problem when not all files have been found, you can use something like
perl -pe 'BEGIN{ $files = @ARGV} $found++ if eof; ... ;END {die "Some files not found" unless $files == $found}'
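A different approach, not taken from the answers above, is to check the files in the shell before perl ever runs. This is a minimal sketch; all_readable is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every argument is a readable file.
all_readable() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read: $f" >&2
      return 1
    fi
  done
  return 0
}

# Usage sketch: run the one-liner only when the check passes.
# all_readable foo && perl -pe 1 foo && echo ok
```

Note that this only tests the files at the moment of the check; a file removed between the check and the perl run would still slip through, which the FATAL warnings approach does not suffer from.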
The following code is a Perl script. It greps the lines containing 'Stage' from hostlog, then matches each line against a regex and increments the count by 1 on every match:
$command = 'grep \'Stage \' '. $hostlog;
@stage_info = qx($command);
foreach (@stage_info) {
if ( /Stage\s(\d+)\s(.*)/ ) {
$stage_number = $stage_number+1;
}
}
So how can I do this in a Linux shell? Based on my test, I cannot loop line by line, since the lines contain spaces.
That is a horrible piece of Perl code you've got there. Here's why:
1. It looks like you are not using use strict; use warnings;. That is a huge mistake: leaving them out does not prevent errors, it just hides them.
2. Using qx() to grep lines from a file is completely redundant, as this is what Perl itself does best. "Shelling out" to another process like that usually slows your program down.
3. Use some whitespace to make your code readable. This is hard to read, and looks more complicated than it is.
4. You capture strings by using parentheses in your regex, but you never use these strings.
5. Re: $stage_number=$stage_number+1, see point 3. Also, this can be written $stage_number++. Using the ++ operator will make your code clearer, will prevent the uninitialized warnings, and will save you some typing.
Here is what your code should look like:
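use strict;
use warnings;

# $hostlog is assumed to hold the log file path, as in the question.
my $stage_number = 0;
open my $fh, "<", $hostlog or die "Cannot open $hostlog for reading: $!";
while (<$fh>) {
    if (/Stage\s\d+/) {
        $stage_number++;
    }
}
close $fh;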
You're not doing anything with the internal captures, so why bother? You could do everything with a grep:
$ stage_number=$(grep -cE 'Stage[[:space:]][0-9]+[[:space:]]' "$hostlog")
This uses extended regular expressions (-E). Note that \s and \d are Perl-style escapes: GNU grep accepts \s as an extension but not \d (except with -P), so the POSIX classes [[:space:]] and [0-9] are the portable spelling. On Solaris, the stock grep and even the egrep command might not quite allow for this regular expression.
If there's something more you have to do, you've got to explain it in your question.
If I understand the issue correctly, you should be able to do this just fine in the shell:
while IFS= read -r; do
if printf '%s\n' "$REPLY" | grep -q 'Stage '; then
# Do what you need to do
:
fi
done < test.log
Note that if your grep command supports the -P option you may be able to use the Perl regular expression as-is for the second test.
This is almost it. Plain bash patterns have no expression for "one or more digits", so the test below only checks for a single digit after "Stage ".
#!/bin/bash
command=( grep 'Stage ' "$hostlog" )
while IFS= read -r line
do
[ "$line" != "${line/Stage [0-9]/}" ] && (( ++stage_number ))
done < <( "${command[@]}" )
On the other hand, taking into account what the Perl script achieves rather than the operations it performs, the whole thing could be rewritten as
(( stage_number += $(grep -c 'Stage [0-9][0-9]*[[:space:]]' "$hostlog") ))
or this
stage_number=$(grep -c 'Stage [0-9][0-9]*[[:space:]]' "$hostlog")
if, in the original Perl, stage_number is uninitialised or is initialised to 0. (\d is a Perl escape that plain grep does not understand, hence [0-9].)
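To compare the two styles discussed in this thread, here is a small sketch that counts the same lines once with grep -c and once with a line-by-line loop. The sample log contents and the variable names count_grep and count_loop are made up for illustration.

```shell
# Hypothetical sample log; in the question this would be "$hostlog".
hostlog=$(mktemp)
printf '%s\n' 'Stage 1 compile' 'no match here' 'Stage 12 link' 'Stage x bad' > "$hostlog"

# Count matching lines in one step.
count_grep=$(grep -c 'Stage [0-9][0-9]*[[:space:]]' "$hostlog")

# Same count line by line; IFS= and -r keep spaces and backslashes intact.
count_loop=0
while IFS= read -r line; do
  if printf '%s\n' "$line" | grep -q 'Stage [0-9][0-9]*[[:space:]]'; then
    count_loop=$((count_loop + 1))
  fi
done < "$hostlog"
```

Because the loop reads from a redirection rather than from a pipe, it runs in the current shell and count_loop keeps its value after done.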
I have the following code in a Perl .pl file. Do you think there's any issue with this code (I can't understand how it'll work as in the 2nd line there's a "|" character without a command following it)
while ( $temp ne "" ) {
open( PS, "ps -ef | grep deploy.sh | grep ssh | grep -v grep|" );
$temp = <PS>;
close(PS);
print "The Deploy scripts are still running. Now sleeping 20\n";
sleep 20;
}
That trailing | is Perl's way of saying that you want the output of that command to be made available to your program. There are several equivalent forms.
Take a look here: open - perldoc.perl.org. In particular, look at the line that says:
open(FOO, "cat -n '$file'|");
open(my $FOO, "foo");
opens the file for reading, while
open(my $FOO, "foo |");
tells Perl that foo is a command to run whose output is to be piped to file handle $FOO.
Since open(FOO, "foo |") just reads from FOO the output of the foo command, each line in the output of the foo command will become a line in the FOO file. The following will be identical to the shell command 'ps -ef':
open(PS, 'ps -ef |');
while (<PS>) { print $_ }
The command in the 2nd line of your sample is a shell pipeline that filters the process list down to only the running instances of 'deploy.sh'. If the output has at least one line, there are still instances running; that's why the script reads only the first line of output into the $temp variable.
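The same "is there at least one matching line" test can be sketched in the shell itself, where the exit status of grep -q plays the role of the non-empty $temp. still_running is a hypothetical helper name, and the 20-second sleep is taken from the question.

```shell
# Hypothetical helper: reads a process listing on stdin and succeeds
# if any line mentions both deploy.sh and ssh.
still_running() {
  grep 'deploy\.sh' | grep -q 'ssh'
}

# Usage sketch:
# while ps -ef | grep -v grep | still_running; do
#   echo "The Deploy scripts are still running. Now sleeping 20"
#   sleep 20
# done
```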
I am trying to use the tee command on Solaris to route the output of one command to two different streams, each of which comprises multiple statements. Here is a snippet of what I coded, but it does not work. This iteration throws errors about unexpected end of file. If I change the > to |, it throws the error "Syntax Error near unexpected token do".
todaydir=/some/path
baselen=${#todaydir}
grep sometext $todaydir/somefiles*
while read iline
tee
>(
# this is the first block
do ojob=${iline:$baselen+1:8}
echo 'some text here' $ojob
done > firstoutfile
)
>(
# this is the 2nd block
do ojob=${iline:$baselen+1:8}
echo 'ls -l '$todaydir'/'$ojob'*'
done > secondoutfile
)
Suggestions?
The "while" should begin (and end) inside each >( ... ) substitution, not outside. Thus, I believe what you want is:
todaydir=/some/path
baselen=${#todaydir}
grep sometext $todaydir/somefiles* | tee >(
# this is the first block
while read iline
do ojob=${iline:$baselen+1:8}
echo 'some text here' $ojob
done > firstoutfile
) >(
# this is the 2nd block
while read iline
do ojob=${iline:$baselen+1:8}
echo 'ls -l '$todaydir'/'$ojob'*'
done > secondoutfile
)
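If the two blocks only differ in what they print, an alternative that avoids tee and process substitution altogether is a single loop writing to two extra file descriptors. This is a sketch under assumptions: the printf input stands in for the grep output, the mktemp files stand in for firstoutfile and secondoutfile, and the prefix stripping is written with POSIX expansions instead of the bash substring syntax.

```shell
todaydir=/some/path
first=$(mktemp)    # stands in for firstoutfile
second=$(mktemp)   # stands in for secondoutfile

# Hypothetical grep output: "path/to/file:matching text".
printf '%s\n' "$todaydir/JOB00001.log:sometext" "$todaydir/JOB00002.log:sometext" |
while IFS= read -r iline; do
  rest=${iline#"$todaydir"/}       # drop the directory prefix
  ojob=$(printf '%.8s' "$rest")    # first 8 characters, like ${iline:$baselen+1:8}
  echo "some text here $ojob" >&3
  echo "ls -l $todaydir/$ojob*" >&4
done 3> "$first" 4> "$second"
```

Both files are complete as soon as the loop finishes, whereas the >( ... ) blocks run asynchronously and may still be writing when the next command starts.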
I don't think the tee command will do that. The tee command will write stdin to one or more files as well as spit it back out to stdout. Plus, I'm not sure the shell can fork off two sub-processes in the command pipeline like you are trying to do. You'd probably be better off using something like Perl to fork off a couple of sub-processes and write stdin to each.
As I understand (Perl is new to me) Perl can be used to script against a Unix command line. What I want to do is run (hardcoded) command line calls, and search the output of these calls for RegEx matches. Is there a way to do this simply in Perl? How?
EDIT: Sequence here is:
-Call another program.
-Run a regex against its output.
my $command = "ls -l /";
my @output = `$command`;
for (@output) {
print if /^d/;
}
The qx// quasi-quoting operator (for which backticks are a shortcut) is stolen from shell syntax: run the string as a command in a new shell, and return its output (as a string or a list, depending on context). See perlop for details.
You can also open a pipe:
open my $pipe, "$command |" or die "Cannot run $command: $!";
while (<$pipe>) {
# do stuff
}
close $pipe;
This allows you to (a) avoid gathering the entire command's output into memory at once, and (b) gives you finer control over running the command. For example, you can avoid having the command be parsed by the shell:
open my $pipe, '-|', @command, '< single argument not mangled by shell >';
See perlipc for more details on that.
You might be able to get away without Perl, as others have mentioned. However, if there is some Perl feature you need, such as extended regex features or additional text manipulation, you can pipe your output to perl and then do what you need. Perl's -e switch lets you specify the Perl program on the command line:
command | perl -ne 'print if /.../'
There are several other switches you can pass to perl to make it very powerful on the command line. These are documented in perlrun. Also check out some of the articles in Randal Schwartz's Unix Review column, especially his first article for them. You can also google for Perl one liners to find lots of examples.
Do you need Perl at all? How about
command -I use | grep "myregexp" && dosomething
right in the shell?
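As a concrete sketch of that shell-only route, the following reuses the ls -l / command and the /^d/ pattern from the Perl example above; the variable names dirs and matched are made up for illustration.

```shell
# Keep only the directory entries (lines starting with "d") of a long listing.
dirs=$(ls -l / | grep '^d')

# grep -q prints nothing; its exit status alone says whether anything matched.
if ls -l / | grep -q '^d'; then
  matched=yes
else
  matched=no
fi
```

Capturing into a variable suits small outputs; for large ones, piping straight into the next command avoids holding everything in memory.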
#!/usr/bin/perl
sub my_action() {
print "Implement some action here\n";
}
open PROG, "/path/to/your/command|" or die $!;
while (<PROG>) {
/your_regexp_here/ and my_action();
print $_;
}
close PROG;
This will scan the output from your command, match each line against the regexp and run some action on every match (for now, a placeholder message). Every line is also printed as it is read.
In Perl you can use backticks to execute commands on the shell. Here is a document on using backticks. I'm not sure about how to capture the output, but I'm sure there's more than a way to do it.
You can indeed use a one-liner in a case like this. I recently coded up one that I use, among other ways, to produce output which lists the directory structure present in a .zip archive (one dir entry per line). So using that output as an example of command output that we'd like to filter, we could put a pipe in and then use perl with the -n -e flags to filter the incoming data (and/or do other things with it):
[command_producing_text_output] | perl -MFile::Path -n -e ^
"BEGIN{@PTM=()} if (m{^perl/(bin|lib(?!/site))}) {chomp;push @PTM,$_}" ^
-e "END{@WDD=mkpath (\@PTM,1);" ^
-e "printf qq/Created %u dirs to reflect part of structure present in the .ZIP file\n/, scalar(@WDD);}"
The shell syntax used, including the quoting of the Perl code and the escaping of newlines, reflects CMD.exe usage in Windows NT-like consoles. If you need to, mentally replace "^" with "\" and " with ' in the appropriate places.
The one-liner above collects only the directory names that start with "perl/bin" or "perl/lib" (not followed by "/site"); it then creates those directories. You wind up with an (empty) tree that you can use for whatever evil purposes you desire.
The main point is to illustrate that there are flags available (-n, -p) that make perl loop over each input record (line), and that what you can do with them is unlimited in terms of complexity.