unix functions inside perl

I tried to use some Unix tools inside a Perl driver script because I know little about writing shell scripts. My goal is just to combine a few simple Unix commands so that I can process 100 directories with one Perl command.
I have more than 100 folders, each containing some number of files. For each folder I want to do the same thing: combine its files, sort the combined file, and use bedtools to merge overlapping regions (a common practice in bioinformatics).
Here is what I have:
#!/usr/bin/perl -w
use strict;
my $usage ="
This is a driver script to merge files in each folder into one combined file
";
die $usage unless @ARGV;
my ($in)=@ARGV;
open (IN,$in)|| die "cannot open $in";
my %hash;
my $final;
while(<IN>){
chomp;
my $tf = $_;
my @array =`ls $tf'/.'`;
my $tmp;
my $tmp2;
foreach my $i (@array){
$tmp = `cut -f 1-3 $tf'/'$i`;
$tmp2 = `cat $tmp`;
}
my $tmp3;
$tmp3=`sort -k1,1 -k2,2n $tmp2`;
$final = `bedtools merge -i $tmp3`;
}
print $final,"\n";
I know that this line isn't working at all..
$tmp2 = `cat $tmp`;
The issue is how to direct the output into another variable in perl and use that variable later on in another unix command...
Please let me know if you can point out where I can change to make it work. Greatly appreciated.

The output from backticks usually includes newlines, which usually have to be removed before using the output downstream. Add some chomp's to your code:
chomp( my @array =`ls $tf'/.'` );
my $tmp;
my $tmp2;
foreach my $i (@array){
chomp( $tmp = `cut -f 1-3 $tf'/'$i` );
chomp( $tmp2 = `cat $tmp` );
}
my $tmp3;
chomp( $tmp3=`sort -k1,1 -k2,2n $tmp2` );
$final = `bedtools merge -i $tmp3`;

Here is an example of using a Perl variable in a shell command:
#!/usr/bin/env perl
my $var = "/etc/passwd";
my $out = qx(file $var);
print "$out\n";
For the rest, it's very messy. You should take the time to learn Perl rather than mixing coreutils commands with Perl, since Perl itself is a better tool for the whole job.

OK. I gave up on Perl and decided to try a shell script instead. It worked!!
Thanks for the above answers though!
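for dir in */
do
    name=$(basename "$dir")
    cd "$dir" || continue
    # keep only the first three columns of every file
    for file in *
    do
        cut -f 1-3 "$file" > "$file.tmp"
    done
    # concatenate the trimmed files, then sort and merge
    cat ./*.tmp > "$name.tmp1"
    sort -k1,1 -k2,2n "$name.tmp1" > "$name.tmp2"
    bedtools merge -i "$name.tmp2" > "$name.combined"
    # go back up so the next relative directory name resolves
    cd ..
done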
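The trouble in the Perl attempt was that the output of cut is file content, not a file name, so it cannot be handed to cat or sort as an argument. A pipeline avoids the intermediate variables and temporary files altogether. The following is a minimal sketch of that idea, not the poster's script: `combine_dir` is a hypothetical helper name, the demo data is made up, and the bedtools step is left as a comment on the assumption that bedtools is installed and accepts `-i stdin`.

```shell
#!/bin/sh
# combine_dir DIR: print the first three tab-separated columns of every
# regular file in DIR, sorted by column 1, then numerically by column 2.
# (combine_dir is a hypothetical helper name, not from the original post.)
combine_dir() {
    for f in "$1"/*; do
        [ -f "$f" ] && cut -f 1-3 "$f"
    done | sort -k1,1 -k2,2n
}

# Demo on throwaway data so the sketch runs without bedtools.
demo=$(mktemp -d)
mkdir "$demo/sample"
printf 'chr2\t5\t9\textra\nchr1\t30\t40\textra\n' > "$demo/sample/a.bed"
printf 'chr1\t4\t8\textra\n' > "$demo/sample/b.bed"
combine_dir "$demo/sample"
rm -rf "$demo"

# Real use, assuming bedtools is installed and reads stdin via "-i stdin":
# for dir in */; do
#     combine_dir "$dir" | bedtools merge -i stdin > "${dir%/}.combined"
# done
```

Writing the result next to the folder rather than inside it means a second run does not pick up the previous output as input.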

Related

perl : making a script as efficient as a perl one-liner

I'm able to do this on the command line and it works :
~/Tools/perl/edif_extr_cell.pl design.edif nmos1p8v | perl -p -e 's/^/\n/ if /portImplementation|figure\s+device/;' | perl -n -000 -e 'print if /portImplementation/;'
(basically, extracting a section of the EDIF file).
Now, I want to make a utility of this. My script is below. Question: can this code be more efficient? I feel like it's very inelegant. I could pipe streams easily on the command line but, in a script, I feel lost.
#!/usr/bin/perl -w -p
BEGIN{ $file = '';}
s/^/\n/ if /portImplementation|figure\s+device/;
$file .= $_;
END{
$cmd = q{\rm -f /tmp/dump}.$$.'.txt';
system( $cmd );
open( OUT, ">/tmp/dump$$.txt");
print OUT $file;
close OUT;
$out = `perl -n -000 -e 'print if /portImplementation/;' /tmp/dump$$.txt`;
system( $cmd );
print $out;
}
If I understand correctly, you want to be able to do
~/Tools/perl/edif_extr_cell.pl design.edif nmos1p8v | myfilter
Ideally, you'd merge the two Perl scripts into one rather than having one script launch two instances of Perl, but this turns out to be rather hard because of the change to $/ (via -00) and because you insert newlines in the first filter.
The simplest answer:
#!/bin/sh
perl -pe's/^/\n/ if /portImplementation|figure\s+device/' |
perl -00ne'print if /portImplementation/'
It appears that you were trying to write the equivalent of that sh script in Perl. It would look like the following:
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open3 qw( open3 );
# open3 has issues with lexical file handles.
pipe(local *PIPE_READER, local *PIPE_WRITER)
or die($!);
my $pid1 = open3('<&STDIN', '>&PIPE_WRITER', '>&STDERR',
'perl', '-pes/^/\n/ if /portImplementation|figure\s+device/');
my $pid2 = open3('<&PIPE_READER', '>&STDOUT', '>&STDERR',
'perl', '-00neprint if /portImplementation/');
waitpid($pid1, 0);
waitpid($pid2, 0);
I'd normally recommend IPC::Run3 or IPC::Run for launching and interfacing with child processes, but low-level open3 does the trick nicely in this particular situation.
I downloaded a random EDIF file from GitHub; running the following script on it gives the same output as your code:
#! /usr/bin/perl
use warnings;
use strict;
my @buffer;
my $found;
my $prepend = q();
while (<>) {
    if (/portImplementation|figure\s+device/) {
        if ($found && @buffer) {
            print $prepend, @buffer;
            $prepend = "\n";
        }
        undef $found;
        @buffer = ();
    }
    $found ||= /portImplementation/;
    push @buffer, $_;
}
# Don't forget to output the last paragraph!
print $prepend, @buffer if $found && @buffer;

concatenate and print each array element with a suffix

I have some files in a directory like:
A.txt
B.txt
C.txt
I want to print them with new suffix:
#!/usr/bin/perl
my @dir_list_initial = `ls *.txt | sed 's/.txt//g' `;
my $suffix = ".dat";
for (my $i=0; $i<=$#dir_list_initial; $i++){
print $dir_list_initial[$i] . $suffix;
}
I expect to print
A.dat
B.dat
C.dat
but it prints
.datA
.datB
.datC
You might try,
chomp(@dir_list_initial);
and later
print $dir_list_initial[$i] . "$suffix\n";
as every element of the @dir_list_initial array has a newline at the end of the string.
Even better would be to skip the shell altogether and use only Perl:
my @dir_list_initial = map { s|\.txt$||; $_ } glob("*.txt");
You're doing half of your program by running ls. Don't shell out when you can use built-in Perl mechanisms to do the same job. glob can do everything you're doing with ls, and you don't have to depend on the ls command (what if this is Windows?).
Also, always use strict; and use warnings; in your program, it can save you a ton of grief by picking up common mistakes.
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);
for my $file ( glob "*.txt" ) {
chomp $file;
$file =~ s/\.txt$/.dat/;
say $file;
}
Note I'm using s/.../.../ to substitute one suffix with another. You need to learn regular expressions. Your s/.txt// is not correct. The . can stand for any character, and you don't specify that .txt has to be on the end of the string. If my file was called ftxtback.txt, you'd change the file name to .datback.txt which is not what you want.
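The same pitfall exists on the shell side of the original pipeline. As a small sketch (with `swap_suffix` as a hypothetical helper name and the file names taken from the example above), shell parameter expansion strips a literal suffix only at the end of the name, while the unanchored sed pattern with an unescaped dot does not:

```shell
#!/bin/sh
# swap_suffix NAME: replace a trailing ".txt" with ".dat".
# ${1%.txt} removes ".txt" only when it is the literal end of the name.
# (swap_suffix is a hypothetical helper name.)
swap_suffix() {
    printf '%s\n' "${1%.txt}.dat"
}

swap_suffix A.txt
swap_suffix ftxtback.txt

# For contrast, the unanchored pattern with an unescaped dot
# matches "ftxt" at the start of the name first:
printf '%s\n' ftxtback.txt | sed 's/.txt/.dat/'
```

The first two calls print A.dat and ftxtback.dat; the sed line prints .datback.txt, which is the wrong result described above.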
There's no need to call sed here; Perl can handle that well:
my @dir_list_initial = `ls *.txt`;
chomp @dir_list_initial;
my $suffix = ".dat";
foreach (@dir_list_initial){
s/\.txt$/$suffix/; # this alters @dir_list_initial
print "$_\n";
}
If you have Perl 5.14 or newer, you can also do this:
use 5.014;
my @dir_list_initial = `ls *.txt`;
chomp @dir_list_initial;
my $suffix = ".dat";
say s/\.txt$/$suffix/r foreach @dir_list_initial; # this doesn't alter the array.
and, as mpapec has already pointed out, it would be better not to involve the shell at all:
say s/\.txt$/.dat/r foreach <*.txt>;

Getting error while replacing word using perl

I am writing a script for replacing 2 words from a text file. The script is
count=1
for f in *.pdf
do
filename="$(basename $f)"
filename="${filename%.*}"
filename="${filename//_/ }"
echo $filename
echo $f
perl -pe 's/intime_mean_pu.pdf/'$f'/' fig.tex > fig_$count.tex
perl -pi 's/TitleFrame/'$filename'/' fig_$count.tex
sed -i '/Pointer-rk/r fig_'$count'.tex' $1.tex
count=$((count+1))
done
But the replacing of words using the second perl command is giving error:
Can't open perl script "s/TitleFrame/Masses1/": No such file or directory
Please suggest what I am doing wrong.
You could change your script to something like this:
#!/bin/bash
count=1
for f in *.pdf; do
filename=$(basename "$f" .pdf)
filename=${filename//_/ }
perl -spe 's/intime_mean_pu.pdf/$a/;
s/TitleFrame/$b/' < fig.tex -- -a="$f" -b="$filename" > "fig_$count.tex"
sed -i "/Pointer-rk/r fig_$count.tex" "$1.tex"
((++count))
done
As well as some other minor changes to your script, I have made use of the -s switch to Perl, which means that you can pass arguments to the one-liner. The bash variables have been double quoted to avoid problems with spaces in filenames, etc.
Alternatively, you could do the whole thing in Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use File::Basename;
my $file_arg = shift;
my $count = 1;
for my $f (glob "*.pdf") {
my $name = fileparse($f, qq(.pdf));
open my $in, "<", $file_arg;
open my $out, ">", 'tmp';
open my $fig, "<", 'fig.tex';
# copy up to match
while (<$in>) {
print $out $_;
last if /Pointer-rk/;
}
# insert contents of figure (with substitutions)
while (<$fig>) {
s/intime_mean_pu.pdf/$f/;
s/TitleFrame/$name/;
print $out $_;
}
# copy rest of file
print $out $_ while <$in>;
rename 'tmp', $file_arg;
++$count;
}
Use the script like perl script.pl "$1.tex".
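You're missing the -e in the second perl call, so Perl treats the substitution as the name of a script file. It should read:
perl -pi -e 's/TitleFrame/'"$filename"'/' fig_$count.tex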

Find file which content not match a string pattern in Perl

I'm writing code to find files that do not contain a string pattern. Given a list of files, I have to look into the content of each file and get the file name if the string pattern "clean" does not appear inside the file. Please help.
Here is the scenario:
I have a list of files, and each file has numerous lines. If a file is clean, it contains the word "clean". If a file is dirty, the word "clean" is absent, and there is no other clear indication that the file is dirty. So whenever "clean" is not detected in a file, I'll categorize it as a dirty file, and I would like to record its file name.
You can use a simple one-liner:
perl -0777 -nlwE 'say $ARGV if !/clean/i' *.txt
Slurping the file with -0777, making the regex check against the entire file. If the match is not found, we print the file name.
For perl versions lower than 5.10 that do not support -E you can substitute -E with -e and say $ARGV with print "$ARGV".
perl -0777 -nlwe 'print $ARGV if !/clean/i' *.txt
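If Perl is not a requirement, the same check can be sketched in plain shell with grep. This is an alternative to the one-liner, not a translation of it: `list_dirty` is a hypothetical helper name and the demo files are made up.

```shell
#!/bin/sh
# list_dirty FILE...: print each file in which grep finds no line
# matching "clean" (case-insensitive, like the /clean/i one-liner).
# (list_dirty is a hypothetical helper name.)
list_dirty() {
    for f in "$@"; do
        grep -qi clean "$f" || printf '%s\n' "$f"
    done
}

# Demo on throwaway files.
demo=$(mktemp -d)
printf 'scan ok\nstatus: clean\n' > "$demo/a.txt"
printf 'scan ok\nno verdict\n' > "$demo/b.txt"
( cd "$demo" && list_dirty *.txt )
rm -rf "$demo"
```

Note that a file grep cannot read is also listed, because grep exits with a non-zero status in that case too.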
If you need to generate the list within Perl, the File::Finder module will make life easy.
Untested, but should work:
use File::Finder;
my @wanted = File::Finder # finds all ..
->type( 'f' ) # .. files ..
->name( '*.txt' ) # .. ending in .txt ..
->in( '.' ) # .. in current dir ..
->not # .. that do not ..
->contains( qr/clean/ ); # .. contain "clean"
print $_, "\n" for @wanted;
Neat stuff!
EDIT:
Now that I have a clearer picture of the problem, I don't think any module is necessary here:
use strict;
use warnings;
my @files = glob '*.txt'; # Dirty & clean laundry
my @dirty;
foreach my $file ( @files ) { # For each file ...
    local $/ = undef; # Slurps the file in
    open my $fh, '<', $file or die $!;
    unless ( <$fh> =~ /clean/ ) { # if the file isn't clean ..
        push @dirty, $file; # .. it's dirty
    }
    close $fh;
}
print $_, "\n" for @dirty; # Dirty laundry list
Once you get the mechanics, this can be simplified a la grep, etc.
One way like this:
ls *.txt | grep -v "$(grep -l clean *.txt)"
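Another way, reading the file names from a list file:
#!/usr/bin/perl
use strict;
use warnings;
# replace <file_list_file> with the path of your list of file names
open(FILE, '<', '<file_list_file>') or die $!;
while(<FILE>)
{
    chomp; # drop the newline so the name can be opened
    my $flag=0;
    my $filename=$_;
    open(TMPFILE, '<', $filename) or die "$filename: $!";
    while(<TMPFILE>)
    {
        $flag=1 if(/<your_string>/);
    }
    close(TMPFILE);
    if(!$flag)
    {
        print "$filename\n";
    }
}
close(FILE);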

One-liner Perl command to rename files

So far this one-liner strips one line and renames the file, but I need help altering it so that it strips the line I am looking for (the one starting with Data for) and replaces the old .csv extension instead of appending to it (.csv.out). I am not sure whether this can be done with a one-liner.
Instead, it's adding the extension on: filename.csv.out
Example
test_20110824.csv.out
One-liner:
find -type f -name '*.csv' -exec perl -i.out -wlne '/^Data for/ or print' {} \;
I want to replace the extension:
test_20110824.out
perl -MFile::Copy -we 'for (glob "*.csv") { my ($name) = /^(.+)\.csv$/i; move($_, $name . ".out"); }'
To remove the header matching Data for:
perl -MFile::Copy -MTie::File -wE 'for (glob "*.csv") { tie my @file,
"Tie::File", $_ or die $!; shift @file if $file[0] =~ /^Data for/;
untie @file; my ($name) = /^(.*)\.csv$/i; move($_, $name . ".out"); }'
But then it's really not a one-liner anymore...
use strict;
use warnings;
use Tie::File;
use File::Copy;
use autodie;
for (@ARGV) {
tie my @file, "Tie::File", $_;
shift @file if $file[0] =~ /^Data for/;
untie @file;
my ($name) = /^(.*)\.csv$/i;
move($_, $name . ".out");
}
And use with:
$ script.pl *.csv
A simple Bash shell script will suffice
(shopt -s failglob; for i in *.csv.out; do echo mv "$i" "${i%csv.out}out"; done)
The shopt -s failglob is needed to ensure that if there are no matches the command will fail instead of trying to rename *.csv.out to *.out. The construct ${i%csv.out}out removes a trailing csv.out and replaces it with just out.
As I have coded it here, this will just echo the commands it would execute. When you're satisfied it does what you want, remove the word echo.
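The `${i%...}` construct can also cover the original request in one step, going from the .csv file straight to a .out file with the header line removed. The following is a sketch only: `csv_to_out` is a hypothetical helper name, the sample data is made up, and unlike the echo version above it really deletes the original file.

```shell
#!/bin/sh
# csv_to_out FILE.csv: write FILE.out without lines that start with
# "Data for", then delete the original .csv.
# (csv_to_out is a hypothetical helper name.)
csv_to_out() {
    out="${1%.csv}.out"          # strip trailing ".csv", append ".out"
    grep -v '^Data for' "$1" > "$out"
    status=$?
    # grep exits 1 when every line was filtered out; only 2+ is an error.
    [ "$status" -le 1 ] && rm -- "$1"
}

# Demo on a throwaway file.
demo=$(mktemp -d)
printf 'Data for 2011-08-24\n1,2,3\n4,5,6\n' > "$demo/test_20110824.csv"
csv_to_out "$demo/test_20110824.csv"
ls "$demo"
cat "$demo/test_20110824.out"
rm -rf "$demo"
```

To apply it to a whole tree, it could be called from a loop such as `for f in *.csv; do csv_to_out "$f"; done`, ideally after trying it on copies first.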