I have a tab delimited file(dummy) that looks like this :
a b
a b
a c
a c
a b
I am trying to write an awk command inside the perl script in which the file.txt is being made.
The awk command :
$n=system(" awk -F"\t" '{if($1=="a" && $2=="b") print $1,$2}' file.txt|wc -l ")
Error :
comparison operator :error in '==' , ',' between $1 and $2 in print }'
The awk script is running fine on command line but giving error while running inside the script.
I don't see any syntax error in the awk command.
Aside from the fact that what are you trying to achieve by executing awk from within perl (since it could be accomplished using the latter itself), you could use the q operator:
$cmd = q(awk -F"\t" '{if($1=="a" && $2=="b") print $1,$2}' file.txt | wc -l);
$n = system($cmd);
Note that using double-quotes would interpolate variables and you'd need to escape those.
You can get the number of a\tbs from Perl itself without calling an external command:
#!/usr/bin/perl
use warnings;
use strict;
open my $FH, '<', 'file.txt' or die $!;
my $n = 0;
"a\tb\n" eq $_ and $n++ while <$FH>;
print "$n\n";
Related
I am calling a perl program from a bash script. This Perl script works when I flag its arguments from the command line but when I call it from the bash script it gives an error.
Here is the relevant part of the bash script:
for ((i=1; i<=$z; i++))
do
if (($i%2 == 0)); then
~/Desktop/SNP_finder.pl `awk 'FNR == '$i <"${d}".fasta` `awk ...` "${d}".txt
else
...
and the Perl script:
local $, = "\t";
my #read = split('', $ARGV[0]);
my #reference = split('', $ARGV[1]);
my $filename = $ARGV[2];
...
open(my $fh, '>>', $filename) or die;
print $fh "Reference ", "SNP ", "Location\n";
...
I can run SNP_finder.pl fine from the terminal window but when I pipe it to this Bash script it gives me the error Use of unitialized value $filename and claims the value of $filename is ''.
The shell is performing word splitting. If your awk scripts return empty, the "${d}".txt parameter will end up as $ARGV[1] or $ARGV[0]. If your awk scripts return a string containing whitespace, it will end up further down...
To avoid this, you should quote them, like this:
"`awk ...`"
Even better would be:
"$(awk ...)"
In the below script. am not able to change the directory.i need the output like above 70% disk inside that directory which one is consuming more space.
#!/usr/bin/perl
use strict;
use warnings;
my $test=qx("df -h |awk \+\$5>=70 {print \$6} ");
chdir($test) or die "$!";
print $test;
system("du -sh * | grep 'G'");
No need to call awk in your case because Perl is quite good at splitting and printing certain lines itself. Your code has some issues:
The code qx("df -h |awk \+\$5>=70 {print \$6} ") tries to execute the string "df -h | awk ..." as a command which fails because there is no such command called "df -h | awk". When I run that code I get sh: 1: df -h |awk +>=70 {print } : not found. You can fix that by dropping the quotes " because qx() already is quoting. The variable $test is empty afterwards, so the chdir changes to your $HOME directory.
Then you'll see the next error: awk: line 1: syntax error at or near end of line, because it calls awk +\$5>=70 {print \$6}. Correct would be awk '+\$5>=70 {print \$6}', i.e. with ticks ' around the awk scriptlet.
As stated in a comment, df -h splits long lines into two lines. Example:
Filesystem 1K-blocks Used Available Use% Mounted on
/long/and/possibly/remote/file/system
10735331328 10597534720 137796608 99% /local/directory
Use df -hP to get guaranteed column order and one line output.
The last system call shows the directory usage (space) for all lines containing the letter G. I reckon that's not exactly what you want.
I suggest the following Perl script:
#!/usr/bin/env perl
use strict;
use warnings;
foreach my $line ( qx(df -hP) ) {
my ($fs, $size, $used, $avail, $use, $target) = split(/\s+/, $line);
next unless ($use =~ /^\d+\s*\%$/); # skip header line
# now $use is e.g. '90%' and we drop the '%' sign:
$use =~ s/\%$//;
if ($use > 70) {
print "almost full: $target; top 5 directories:\n";
# no need to chdir here. Simply use $target/* as search pattern,
# reverse-sort by "human readable" numbers, and show the top 5:
system("du -hs $target/* 2>/dev/null | sort -hr | head -5");
print "\n\n";
}
}
#!/usr/bin/perl
use strict;
use warnings;
my #bigd = map { my #f = split " "; $f[5] }
grep { my #f = split " "; $f[4] =~ /^(\d+)/ && $1 >= 70}
split "\n", `df -hP`;
print "big directories: $_\n" for #bigd;
for my $bigd (#bigd) {
chdir($bigd);
my #bigsubd = grep { my #f = split " "; $f[0] =~ /G/ }
split "\n", `du -sh *`;
print "big subdirectories in $bigd:\n";
print "$_\n" for #bigsubd;
}
I belive you wanted to do something like this.
I am writing a script for replacing 2 words from a text file. The script is
count=1
for f in *.pdf
do
filename="$(basename $f)"
filename="${filename%.*}"
filename="${filename//_/ }"
echo $filename
echo $f
perl -pe 's/intime_mean_pu.pdf/'$f'/' fig.tex > fig_$count.tex
perl -pi 's/TitleFrame/'$filename'/' fig_$count.tex
sed -i '/Pointer-rk/r fig_'$count'.tex' $1.tex
count=$((count+1))
done
But the replacing of words using the second perl command is giving error:
Can't open perl script "s/TitleFrame/Masses1/": No such file or directory
Please suggest what I am doing wrong.
You could change your script to something like this:
#!/bin/bash
for f in *.pdf; do
filename=$(basename "$f" .pdf)
filename=${filename//_/}
perl -spe 's/intime_mean_pu.pdf/$a/;
s/TitleFrame/$b/' < fig.tex -- -a="$f" -b="$filename" > "fig_$count.tex"
sed -i "/Pointer-rk/r fig_$count.tex" "$1.tex"
((++count))
done
As well as some other minor changes to your script, I have made use of the -s switch to Perl, which means that you can pass arguments to the one-liner. The bash variables have been double quoted to avoid problems with spaces in filenames, etc.
Alternatively, you could do the whole thing in Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use File::Basename;
my $file_arg = shift;
my $count = 1;
for my $f (glob "*.pdf") {
my $name = fileparse($f, qq(.pdf));
open my $in, "<", $file_arg;
open my $out, ">", 'tmp';
open my $fig, "<", 'fig.tex';
# copy up to match
while (<$in>) {
print $out $_;
last if /Pointer-rk/;
}
# insert contents of figure (with substitutions)
while (<$fig>) {
s/intime_mean_pu.pdf/$f/;
s/TitleFrame/$name/;
print $out $_;
}
# copy rest of file
print $out $_ while <$in>;
rename 'tmp', $file_arg;
++$count;
}
Use the script like perl script.pl "$1.tex".
You're missing the -e in the second perl call
How to add a blank line after every grep result?
For example, grep -o "xyz" may give something like -
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I want the output to be like this -
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I would like to do something like
grep "xyz" | perl (code to add a new line after every grep result)
This is the direct answer to your question:
grep 'xyz' | perl -pe 's/$/\n/'
But this is better:
perl -ne 'print "$_\n" if /xyz/'
EDIT
Ok, after your edit, you want (almost) this:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++'
If you don’t like the blank line at the beginning, make it:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++ && $. > 1'
NOTE: This won’t work right on filenames with colons in them. :)½
If you want to use perl, you could do something like
grep "xyz" | perl -p -e 's/(.*)/\1\n/g'
If you want to use sed (where I seem to have gotten better results), you could do something like
grep "xyz" | sed 's/.*/\0\n/g'
This prints a newline after every single line of grep output:
grep "xyz" | perl -pe 'print "\n"'
This prints a newline in between results from different files. (Answering the question as I read it.)
grep 'xyx' * | perl -pe '/(.*?):/; if ($f ne $1) {print "\n"; $f=$1}'
Use a state machine to determine when to print a blank line:
#!/usr/bin/env perl
use strict;
use warnings;
# state variable to determine when to print a blank line
my $prev_file = '';
# change DATA to the appropriate input file handle
while( my $line = <DATA> ){
# did the state change?
if( my ( $file ) = $line =~ m{ \A ([^:]*) \: .*? xyz }msx ){
# blank lines between states
print "\n" if $file ne $prev_file && length $prev_file;
# set the new state
$prev_file = $file;
}
# print every line
print $line;
}
__DATA__
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I'm looking for a simple/elegant way to grep a file such that every returned line must match every line of a pattern file.
With input file
acb
bc
ca
bac
And pattern file
a
b
c
The command should return
acb
bac
I tried to do this with grep -f but that returns if it matches a single pattern in the file (and not all). I also tried something with a recursive call to perl -ne (foreach line of the pattern file, call perl -ne on the search file and try to grep in place) but I couldn't get the syntax parser to accept a call to perl from perl, so not sure if that's possible.
I thought there's probably a more elegant way to do this, so I thought I'd check. Thanks!
===UPDATE===
Thanks for your answers so far, sorry if I wasn't clear but I was hoping for just a one-line result (creating a script for this seems too heavy, just wanted something quick). I've been thinking about it some more and I came up with this so far:
perl -n -e 'chomp($_); print " | grep $_ "' pattern | xargs echo "cat input"
which prints
cat input | grep a | grep b | grep c
This string is what I want to execute, I just need to somehow execute it now. I tried an additional pipe to eval
perl -n -e 'chomp($_); print " | grep $_ "' pattern | xargs echo "cat input" | eval
Though that gives the message:
xargs: echo: terminated by signal 13
I'm not sure what that means?
One way using perl:
Content of input:
acb
bc
ca
bac
Content of pattern:
a
b
c
Content of script.pl:
use warnings;
use strict;
## Check arguments.
die qq[Usage: perl $0 <input-file> <pattern-file>\n] unless #ARGV == 2;
## Open files.
open my $pattern_fh, qq[<], pop #ARGV or die qq[ERROR: Cannot open pattern file: $!\n];
open my $input_fh, qq[<], pop #ARGV or die qq[ERROR: Cannot open input file: $!\n];
## Variable to save the regular expression.
my $str;
## Read patterns to match, and create a regex, with each string in a positive
## look-ahead.
while ( <$pattern_fh> ) {
chomp;
$str .= qq[(?=.*$_)];
}
my $regex = qr/$str/;
## Read each line of data and test if the regex matches.
while ( <$input_fh> ) {
chomp;
printf qq[%s\n], $_ if m/$regex/o;
}
Run it like:
perl script.pl input pattern
With following output:
acb
bac
Using Perl, I suggest you read all the patterns into an array and compile them. Then you can read through your input file using grep to make sure all of the regexes match.
The code looks like this
use strict;
use warnings;
open my $ptn, '<', 'pattern.txt' or die $!;
my #patterns = map { chomp(my $re = $_); qr/$re/; } grep /\S/, <$ptn>;
open my $in, '<', 'input.txt' or die $!;
while (my $line = <$in>) {
print $line unless grep { $line !~ $_ } #patterns;
}
output
acb
bac
Another way is to read all the input lines and then start filtering by each pattern:
#!/usr/bin/perl
use strict;
use warnings;
open my $in, '<', 'input.txt' or die $!;
my #matches = <$in>;
close $in;
open my $ptn, '<', 'pattern.txt' or die $!;
for my $pattern (<$ptn>) {
chomp($pattern);
#matches = grep(/$pattern/, #matches);
}
close $ptn;
print #matches;
output
acb
bac
Not grep and not a one liner...
MFILE=file.txt
PFILE=patterns
i=0
while read line; do
let i++
pattern=$(head -$i $PFILE | tail -1)
if [[ $line =~ $pattern ]]; then
echo $line
fi
# (or use sed instead of bash regex:
# echo $line | sed -n "/$pattern/p"
done < $MFILE
A bash(Linux) based solution
#!/bin/sh
INPUTFILE=input.txt #Your input file
PATTERNFILE=patterns.txt # file with patterns
# replace new line with '|' using awk
PATTERN=`awk 'NR==1{x=$0;next}NF{x=x"|"$0}END{print x}' "$PATTERNFILE"`
PATTERNCOUNT=`wc -l <"$PATTERNFILE"`
# build regex of style :(a|b|c){3,}
PATTERN="($PATTERN){$PATTERNCOUNT,}"
egrep "${PATTERN}" "${INPUTFILE}"
Here's a grep-only solution:
#!/bin/sh
foo ()
{
FIRST=1
cat pattern.txt | while read line; do
if [ $FIRST -eq 1 ]; then
FIRST=0
echo -n "grep \"$line\""
else
echo -n "$STRING | grep \"$line\""
fi
done
}
STRING=`foo`
eval "cat input.txt | $STRING"