I have some files in a directory like:
A.txt
B.txt
C.txt
I want to print them with new suffix:
#!/usr/bin/perl
my @dir_list_initial = `ls *.txt | sed 's/.txt//g' `;
my $suffix = ".dat";
for (my $i=0; $i<=$#dir_list_initial; $i++){
print $dir_list_initial[$i] . $suffix;
}
I expect to print
A.dat
B.dat
C.dat
but it prints
.datA
.datB
.datC
You might try,
chomp(@dir_list_initial);
and later
print $dir_list_initial[$i] . "$suffix\n";
as every element of the @dir_list_initial array has a newline at the end of the string.
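Putting both suggestions together, a minimal corrected version of the original loop might look like this:
my @dir_list_initial = `ls *.txt | sed 's/.txt//g'`;
chomp(@dir_list_initial);           # strip the trailing newlines
my $suffix = ".dat";
for my $i (0 .. $#dir_list_initial) {
    print $dir_list_initial[$i] . "$suffix\n";
}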
Even better would be to skip the shell altogether and use only Perl:
my @dir_list_initial = map { s|\.txt||; $_ } glob("*.txt");
You're doing half of your program by running ls. Don't shell out when you can use built-in Perl mechanisms to do the same job. The glob can do everything you're doing with ls, and you don't have to depend upon the ls command (what if this is Windows?).
Also, always use strict; and use warnings; in your programs; they can save you a ton of grief by catching common mistakes.
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);
for my $file ( glob "*.txt" ) {
chomp $file;
$file =~ s/\.txt$/.dat/;
say $file;
}
Note that I'm using s/.../.../ to substitute one suffix with another. You need to learn regular expressions. Your s/.txt// is not correct. The . can stand for any character, and you don't specify that .txt has to be at the end of the string. If my file were called ftxtback.txt, you'd change the file name to .datback.txt, which is not what you want.
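For instance, a quick sketch showing the difference on that hypothetical file name:
my $file = "ftxtback.txt";
(my $wrong = $file) =~ s/.txt/.dat/;     # unanchored, the "." matches the "f"
(my $right = $file) =~ s/\.txt$/.dat/;   # escaped dot, anchored at the end
print "$wrong\n";   # .datback.txt
print "$right\n";   # ftxtback.dat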
There's no need to call sed here; Perl can handle that well:
my @dir_list_initial = `ls *.txt`;
chomp @dir_list_initial;
my $suffix = ".dat";
foreach (@dir_list_initial){
s/\.txt$/$suffix/; # this alters @dir_list_initial
print "$_\n";
}
If you have Perl 5.14 or newer, you can also do this:
use 5.014;
my @dir_list_initial = `ls *.txt`;
chomp @dir_list_initial;
my $suffix = ".dat";
say s/\.txt$/$suffix/r foreach @dir_list_initial; # this doesn't alter the array.
And, as mpapec has already pointed out, it would be better not to involve the shell at all:
say s/\.txt$/.dat/r foreach <*.txt>
I am working on a project where the GNU Makefile should automatically test my Perl program with different input files. I have this code, which reads only one file from the inputs directory, searches for stop words, and outputs an out.txt file with a frequency table of the non-stop words.
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw(getStopWords);
my %found;
my $src = '/programu-testavimas/1-dk/trunk/tests/inputs/test.txt';
my $des = '/programu-testavimas/1-dk/trunk/tests/outputs/out.txt';
open(SRC,'<',$src) or die $!;
open(DES,'>',$des) or die $!;
my $stopwords = getStopWords('en');
while( my $line = <SRC>){
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print DES $_, "\t\t", $found{$_}, $/ for sort keys %found;
close(SRC);
close(DES);
My goal is to test many files with separate case.sh scripts, where the input file is different in each case. This is one of the cases:
#!/bin/sh
perl /programu-testavimas/1-dk/trunk/scripts/test.pl /programu-testavimas/1-dk/trunk/tests/inputs/test.txt > /home/aleksandra/programų-testavimas/1-dk/trunk/tests/outputs/out.txt
Then my Makefile should test the program with a different input in each case. Right now I'm struggling with my Perl code, which reads only one specific input file; I need it to read different files from the inputs directory. How can I change the path so that each bash script case can use its own input file?
EDIT: I tried this with the glob function, but it outputs an empty file:
open(DES,'>',$des) or die $!;
my $stopwords = getStopWords('en');
for my $file ( glob $src ) {
open(SRC,'<',$file) or die "$! opening $file";
while( my $line = <SRC>){
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print DES $_, "\t\t", $found{$_}, $/ for sort keys %found;
close(SRC);
}
close(DES);
Correct me if I'm wrong, but it sounds like you have different shell scripts, each calling your Perl script with a different input and redirecting your Perl script's output to a new file.
You don't need to glob anything in your perl script. It already has all the information it needs: which file to read. Your shell script/Makefile is handling the rest.
So given the shell script
#!/bin/sh
perl /path/to/test.pl /path/to/input.txt > /path/to/output.txt
Then in your perl script, simply read from the file provided via the first positional parameter:
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw(getStopWords);
my %found;
my $stopwords = getStopWords('en');
while(my $line = <>) {
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print $_, "\t\t", $found{$_}, $/ for sort keys %found;
while(<>) will read from STDIN or from ARGV.
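For example, both of these invocations read the same data:
perl test.pl input.txt     # file name picked up via ARGV
perl test.pl < input.txt   # read from STDIN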
Your shell script could then call your perl script with different inputs and define outputs:
#!/bin/sh
for input in /path/to/*.txt; do
perl /path/to/test.pl "$input" > "$input.out"
done
I am writing a script to replace two words in a text file. The script is:
count=1
for f in *.pdf
do
filename="$(basename $f)"
filename="${filename%.*}"
filename="${filename//_/ }"
echo $filename
echo $f
perl -pe 's/intime_mean_pu.pdf/'$f'/' fig.tex > fig_$count.tex
perl -pi 's/TitleFrame/'$filename'/' fig_$count.tex
sed -i '/Pointer-rk/r fig_'$count'.tex' $1.tex
count=$((count+1))
done
But the second perl command, which should replace a word, gives this error:
Can't open perl script "s/TitleFrame/Masses1/": No such file or directory
Please suggest what I am doing wrong.
You could change your script to something like this:
#!/bin/bash
count=1
for f in *.pdf; do
filename=$(basename "$f" .pdf)
filename=${filename//_/ }
perl -spe 's/intime_mean_pu.pdf/$a/;
s/TitleFrame/$b/' < fig.tex -- -a="$f" -b="$filename" > "fig_$count.tex"
sed -i "/Pointer-rk/r fig_$count.tex" "$1.tex"
((++count))
done
As well as some other minor changes to your script, I have made use of the -s switch to Perl, which means that you can pass arguments to the one-liner. The bash variables have been double quoted to avoid problems with spaces in filenames, etc.
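If the -s switch is unfamiliar, here is a minimal standalone illustration of how it passes arguments into a one-liner:
$ perl -se 'print "$x\n";' -- -x="hello"
hello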
Alternatively, you could do the whole thing in Perl:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use File::Basename;
my $file_arg = shift;
my $count = 1;
for my $f (glob "*.pdf") {
my $name = fileparse($f, qq(.pdf));
open my $in, "<", $file_arg;
open my $out, ">", 'tmp';
open my $fig, "<", 'fig.tex';
# copy up to match
while (<$in>) {
print $out $_;
last if /Pointer-rk/;
}
# insert contents of figure (with substitutions)
while (<$fig>) {
s/intime_mean_pu.pdf/$f/;
s/TitleFrame/$name/;
print $out $_;
}
# copy rest of file
print $out $_ while <$in>;
rename 'tmp', $file_arg;
++$count;
}
Use the script like perl script.pl "$1.tex".
You're missing the -e in the second perl call.
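With it added, that line would read:
perl -pi -e 's/TitleFrame/'$filename'/' fig_$count.tex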
I tried to use some Unix tools inside a Perl driver script because I knew little about writing shell scripts. My purpose is just to combine a few simple Unix commands so that I can run the script on 100 directories in one perl command.
The task: I have more than 100 folders, and each folder contains some number of files. I want to do the same thing in each folder, which is to combine the files, sort the combined file, and use bedtools to merge overlapping regions (quite common practice in bioinformatics).
Here is what I have:
#!/usr/bin/perl -w
use strict;
my $usage ="
This is a driver script to merge files in each folder into one combined file
";
die $usage unless @ARGV;
my ($in)=@ARGV;
open (IN,$in)|| die "cannot open $in";
my %hash;
my $final;
while(<IN>){
chomp;
my $tf = $_;
my @array =`ls $tf'/.'`;
my $tmp;
my $tmp2;
foreach my $i (@array){
$tmp = `cut -f 1-3 $tf'/'$i`;
$tmp2 = `cat $tmp`;
}
my $tmp3;
$tmp3=`sort -k1,1 -k2,2n $tmp2`;
$final = `bedtools merge -i $tmp3`;
}
print $final,"\n";
I know that this line isn't working at all:
$tmp2 = `cat $tmp`;
The issue is how to direct the output into another variable in perl and use that variable later on in another unix command...
Please point out what I can change to make it work. Greatly appreciated.
The output from backticks usually includes newlines, which usually have to be removed before the output is used downstream. Add some chomps to your code:
chomp( my @array =`ls $tf'/.'` );
my $tmp;
my $tmp2;
foreach my $i (@array){
chomp( $tmp = `cut -f 1-3 $tf'/'$i` );
chomp( $tmp2 = `cat $tmp` );
}
my $tmp3;
chomp( $tmp3=`sort -k1,1 -k2,2n $tmp2` );
$final = `bedtools merge -i $tmp3`;
To use a Perl variable in a shell command, here is an example:
#!/usr/bin/env perl
my $var = "/etc/passwd";
my $out = qx(file $var);
print "$out\n";
For the rest, it's very messy. You should take the time to learn Perl rather than mixing coreutils commands with it; Perl itself is a better tool for the whole job.
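To illustrate, here is a rough pure-Perl sketch of the per-directory work (combine the files, keep the first three tab-separated columns, sort), shelling out only for bedtools; the temp file name is made up for the example:
#!/usr/bin/env perl
use strict;
use warnings;

my ($dir) = @ARGV or die "Usage: $0 <directory>\n";

# gather the first three columns of every file in the directory
my @rows;
for my $file (glob "$dir/*") {
    open my $fh, '<', $file or die "cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line;
        push @rows, [ @fields[0 .. 2] ];    # the equivalent of cut -f 1-3
    }
    close $fh;
}

# the equivalent of sort -k1,1 -k2,2n
@rows = sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] } @rows;

# bedtools wants a file, so write the combined, sorted rows out first
my $combined = "$dir/combined.tmp";         # hypothetical temp file name
open my $out, '>', $combined or die "cannot write $combined: $!";
print $out join("\t", @$_), "\n" for @rows;
close $out;

system("bedtools merge -i $combined") == 0 or die "bedtools merge failed";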
OK. I gave up on Perl and decided to try a shell script instead. It worked!
Thanks for the above answers though!
for dir in `ls -d */`
do
name=$(basename $dir /)
cd $dir
for file in `ls`
do
cut -f 1-3 $file > $file.tmp
done
for x in `ls *tmp`
do
cat $x >> $name.tmp1
done
sort -k1,1 -k2,2n $name.tmp1 > $name.tmp2
bedtools merge -i $name.tmp2 > $name.combined
done
I'm writing code to find the files that do not contain a string pattern. Given a list of files, I have to look into the content of each one, and I would like to get the file name if the string pattern "clean" does not appear inside the file. Please help.
Here is the scenario:
I have a list of files, and each file has numerous lines. If the file is clean, it will contain the word "clean". But if the file is dirty, the word "clean" does not exist, and there is no other clear indication that the file is dirty. So, as long as the word "clean" is not detected inside a file, I'll categorize it as a dirty file, and I would like to get its name.
You can use a simple one-liner:
perl -0777 -nlwE 'say $ARGV if !/clean/i' *.txt
Slurping the file with -0777 makes the regex check against the entire file. If the match is not found, we print the file name.
For Perl versions lower than 5.10 that do not support -E, you can substitute -E with -e, and say $ARGV with print "$ARGV".
perl -0777 -nlwe 'print "$ARGV\n" if !/clean/i' *.txt
If you need to generate the list within Perl, the File::Finder module will make life easy.
Untested, but should work:
use File::Finder;
my @wanted = File::Finder # finds all ..
->type( 'f' ) # .. files ..
->name( '*.txt' ) # .. ending in .txt ..
->in( '.' ) # .. in current dir ..
->not # .. that do not ..
->contains( qr/clean/ ); # .. contain "clean"
print $_, "\n" for @wanted;
Neat stuff!
EDIT:
Now that I have a clearer picture of the problem, I don't think any module is necessary here:
use strict;
use warnings;
my @files = glob '*.txt'; # Dirty & clean laundry
my @dirty;
foreach my $file ( @files ) { # For each file ...
local $/ = undef; # Slurps the file in
open my $fh, $file or die $!;
unless ( <$fh> =~ /clean/ ) { # if the file isn't clean ..
push @dirty, $file; # .. it's dirty
}
close $fh;
}
print $_, "\n" for @dirty; # Dirty laundry list
Once you get the mechanics, this can be simplified a la grep, etc.
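For example, one possible grep-style condensation (an untested sketch using the local @ARGV slurp idiom):
use strict;
use warnings;

# keep the files whose entire contents never mention "clean"
my @dirty = grep {
    local ( @ARGV, $/ ) = $_;   # read this one file in a single gulp
    <> !~ /clean/;
} glob '*.txt';

print "$_\n" for @dirty;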
One way like this:
ls *.txt | grep -v "$(grep -l clean *.txt)"
#!/usr/bin/perl
use strict;
use warnings;
open(FILE,"<file_list_file>");
while(<FILE>)
{
my $flag=0;
my $filename=$_;
open(TMPFILE,"$_");
while(<TMPFILE>)
{
$flag=1 if(/<your_string>/);
}
close(TMPFILE);
if(!$flag)
{
print $filename;
}
}
close(FILE);
So far this one-liner strips off one line and renames the file, but I need help altering it so that it strips the line I am looking for (Data for) and replaces the old file extension .csv instead of adding to it (.csv.out). I am not sure if this can be done with a one-liner.
Instead, it's adding onto the extension: filename.csv.out
Example
test_20110824.csv.out
One-liner:
find -type f -name '*.csv' -exec perl -i.out -wlne '/^Data for/ or print' {} \;
I want to replace the extension:
test_20110824.out
perl -MFile::Copy -we 'for (glob "*.csv") { my ($name) = /^(.+)\.csv$/i; move($_, $name . ".out"); }'
To remove the header matching Data for:
perl -MFile::Copy -MTie::File -wE 'for (glob "*.csv") { tie my @file,
"Tie::File", $_ or die $!; shift @file if $file[0] =~ /^Data for/;
untie @file; my ($name) = /^(.*)\.csv$/i; move($_, $name . ".out"); }'
But then it's really not a one-liner anymore...
use strict;
use warnings;
use Tie::File;
use File::Copy;
use autodie;
for (@ARGV) {
tie my @file, "Tie::File", $_;
shift @file if $file[0] =~ /^Data for/;
untie @file;
my ($name) = /^(.*)\.csv$/i;
move($_, $name . ".out");
}
And use with:
$ script.pl *.csv
A simple Bash shell script will suffice:
(shopt -s failglob; for i in *.csv.out; do echo mv $i ${i%csv.out}out; done)
The shopt -s failglob is needed to ensure that if there are no matches the command will fail instead of trying to rename *.csv.out to *.out. The construct ${i%csv.out}out removes a trailing csv.out and replaces it with just out.
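For example, the suffix expansion behaves like this:
$ i=test_20110824.csv.out
$ echo "${i%csv.out}out"
test_20110824.out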
As I have coded it here, this will just echo the commands it would execute. When you're satisfied it does what you want, remove the word echo.