Find a specific word in a file - perl

First, I want to search for a particular file in the directory and then in the file I need to search for a specific word. Here's what I have so far:
$show_bp = 'ShowBuildProcess';
$get_bs = 'GetBuildStatus';
opendir (DIR, $my_dir) or die $!;
while(my $file = readdir(DIR))
{
if($file=~/\.log/)
{
if($file=~/GetBuildStatus/)
{
Filenames will be like GetStatus.<number>.log, e.g. GetStatus.123456.log. I need to:
Find all .log files in the directory
Search for a file with filename starting with GetStatus
Pick the file with the lowest numeric part
Search for a particular word in that file

Here is a possible solution for you:
First we check the filename against the pattern and capture the number in $1, which holds the first captured group (the part in parentheses). If the file fits, we open it and read it line by line, looking for a match to your /YourSearchPattern/:
#!/usr/bin/perl
use warnings;
use strict;
my $mydir = './test/';
opendir (DIR, $mydir) or die $!;
while(my $file = readdir(DIR)){
if ($file =~ /^GetStatus\.(\d+)\.log$/){
if ($1 >= 123456 && $1 < 345678){
open(my $fh,'<', $mydir . $file) or die "Cannot open file $file: $!\n";
while (<$fh>){
if ($_ =~ /YourSearchPattern/){
print $_;
}
}
close($fh);
}
}
}
To find the file with the smallest sequence number in your directory, you can store the matching filenames in an array and then sort them by that number:
...
opendir (DIR, $mydir) or die $!;
my @files;
while(my $file = readdir(DIR)){
if ($file =~ /^GetStatus\.(\d+)\.log$/){
push @files, $file;
}
}
my @sortedfiles = sort { my ($anum,$bnum); $a =~ /^GetStatus\.(\d+)\.log$/; $anum = $1; $b =~ /^GetStatus\.(\d+)\.log$/; $bnum = $1; $anum <=> $bnum } @files;
print $sortedfiles[0] . " has the smallest sequence number!\n";
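The two snippets above can be combined into one script. This is only a minimal sketch, assuming the GetStatus.<number>.log naming from the question; the sub names lowest_status_log and matching_lines and the two command-line arguments are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the name of the GetStatus.<number>.log file with the lowest number
# in $dir, or undef if the directory contains no such file.
sub lowest_status_log {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my %number_of;
    while (defined(my $file = readdir($dh))) {
        $number_of{$file} = $1 if $file =~ /^GetStatus\.(\d+)\.log$/;
    }
    closedir($dh);
    # Numeric sort, so that 20 comes before 1000.
    my ($lowest) = sort { $number_of{$a} <=> $number_of{$b} } keys %number_of;
    return $lowest;
}

# Return every line of $path that contains the literal string $word.
sub matching_lines {
    my ($path, $word) = @_;
    open(my $fh, '<', $path) or die "Cannot open file $path: $!";
    my @hits = grep { /\Q$word\E/ } <$fh>;
    close($fh);
    return @hits;
}

# Usage: perl script.pl <directory> <word>
if (@ARGV == 2) {
    my ($dir, $word) = @ARGV;
    my $log = lowest_status_log($dir);
    die "No GetStatus.<number>.log file in $dir\n" unless defined $log;
    print "$log: $_" for matching_lines("$dir/$log", $word);
}
```

The number is captured once per file and kept in a hash, so the sort block does not have to re-run the regex for every comparison.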

Related

To parse multiple files in Perl

Please correct my code, I cannot seem to open my file to parse.
The error is this line open(my $fh, $file) or die "Cannot open file, $!";
Cannot open file, No such file or directory at ./sample.pl line 28.
use strict;
my $dir = $ARGV[0];
my $dp_dpd = $ENV{'DP_DPD'};
my $log_dir = $ENV{'DP_LOG'};
my $xmlFlag = 0;
my @fileList = "";
my @not_proc_dir = `find $dp_dpd -type d -name "NotProcessed"`;
#print "@not_proc_dir\n";
foreach my $dir (@not_proc_dir) {
chomp ($dir);
#print "$dir\n";
opendir (DIR, $dir) or die "Couldn't open directory, $!";
while ( my $file = readdir DIR) {
next if $file =~ /^\.\.?$/;
next if (-d $file);
# print "$file\n";
next if $file eq "." or $file eq "..";
if ($file =~ /.xml$/ig) {
$xmlFlag = 1;
print "$file\n";
open(my $fh, $file) or die "Cannot open file, $!";
@fileList = <$fh>;
close $file;
}
}
closedir DIR;
}
Quoting readdir's documentation:
If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, it would have been testing the wrong file.
Your open(my $fh, $file) should therefore be open my $fh, '<', "$dir/$file" (note how I also added '<' as well: you should always use 3-argument open).
Your next if (-d $file); is also wrong and should be next if -d "$dir/$file";
Some additional remarks on your code:
always add use warnings to your script (in addition to use strict, which you already have)
use lexical file/directory handle rather than global ones. That is, do opendir my $DH, $dir, rather than opendir DH, $dir.
properly indent your code (if ($file =~ /.xml$/ig) { is one level too deep; it makes it harder to read your code)
next if $file =~ /^\.\.?$/; and next if $file eq "." or $file eq ".."; are redundant (even though not technically equivalent); I'd suggest using only the latter.
the variable $dir defined in my $dir = $ARGV[0]; is never used.
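Applying these remarks, the directory loop could look roughly like the following. It is a sketch rather than a drop-in replacement: the sub name xml_files_in is invented here, and the directories are taken from the command line instead of the find call in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the full paths of the .xml files directly inside $dir.
sub xml_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Couldn't open directory $dir: $!";
    my @paths;
    while (defined(my $file = readdir($dh))) {
        my $path = "$dir/$file";    # readdir returns bare names, so prepend the directory
        next if -d $path;           # skips subdirectories, including "." and ".."
        push @paths, $path if $file =~ /\.xml$/i;
    }
    closedir($dh);
    my @sorted = sort @paths;
    return @sorted;
}

# Read every XML file found in the directories given on the command line.
for my $dir (@ARGV) {
    for my $path (xml_files_in($dir)) {
        open(my $fh, '<', $path) or die "Cannot open file $path: $!";
        my @file_list = <$fh>;
        close($fh);
        printf "%s: %d lines\n", $path, scalar @file_list;
    }
}
```

Because the sub returns full paths, the later open and any file tests all see the same, correct location.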

File editing in perl

Hi, I am trying to delete part of a file's content based on a regex match. Here is my code:
my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my @content = ();
open (INFILE, $file) || die "error2: $!";
while (<INFILE>)
{
chomp;
if ($_ =~ /\s*3374_Cioin/)
{#capture the query sequence
@content = $_;
print @content;
}
}
Sample data is:
===================================================================
Query= 3374_Cioin
(24,267 letters)
Database: /home/aprasanna/BLAST/DMel_renamedfile.fasta
14,047 sequences; 7,593,731 total letters
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 578_Antlo
(88 letters)
=========================================================
I wish to remove everything from Query= 3374_Cioin up to -3402., i.e. up to the next record separator. I am able to store the matched part in @content. However, I am not able to delete it from the original file. I want my original file to contain only Query= 578_Antlo!
I am very new to Perl.
The easiest way is to simply write all lines you do want into some other file.
I would suggest something like:
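my $file = "Cioin_PatchAnalysis.txt";
my $outfile = "Fixed_Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
open(my $in_fh, '<', $file) or die "Could not open file '$file': $!";
open(my $out_fh, '>', $outfile) or die "Could not open file '$outfile': $!";
while (my $record = <$in_fh>)
{
chomp $record; # strip the trailing 'Query=' separator
next if $record =~ /^\s*3374_Cioin/; # skip the unwanted record
# chomp removed the separator, so restore it for every record after the first
print $out_fh 'Query=' if $. > 1;
print $out_fh $record;
}
close($out_fh) or die "Could not close file '$outfile': $!";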
Then you can replace the original with the new file.
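The replacement step can be done from Perl itself with rename. A minimal sketch, assuming it is acceptable to create a temporary file next to the original; the sub name remove_records and the ".tmp" suffix are illustrative choices, not anything from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy $file to a temporary file, dropping every 'Query='-delimited record
# that matches $pattern, then rename the temporary file over the original.
sub remove_records {
    my ($file, $pattern) = @_;
    my $tmp = "$file.tmp";    # hypothetical temporary name
    local $/ = 'Query=';
    open(my $in_fh,  '<', $file) or die "Could not open '$file': $!";
    open(my $out_fh, '>', $tmp)  or die "Could not open '$tmp': $!";
    while (my $record = <$in_fh>) {
        chomp $record;                       # strip the trailing 'Query='
        next if $record =~ $pattern;         # drop unwanted records
        print $out_fh 'Query=' if $. > 1;    # restore the separator chomp removed
        print $out_fh $record;
    }
    close($in_fh);
    close($out_fh) or die "Could not close '$tmp': $!";
    rename($tmp, $file) or die "Could not replace '$file': $!";
    return;
}

# Usage: perl script.pl Cioin_PatchAnalysis.txt
remove_records($ARGV[0], qr/^\s*3374_Cioin/) if @ARGV;
```

Writing to a separate file first means the original stays intact if the script dies halfway through.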
Another option is to keep all the records that don't match the regex, then print them back into the original file:
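my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my @content;
open(my $in_fh, '<', $file) or die "Could not open file '$file': $!";
while (my $record = <$in_fh>)
{
chomp $record; # strip the trailing 'Query=' separator
next if $record =~ /^\s*3374_Cioin/; # skip the unwanted record
push @content, $record;
}
close($in_fh);
open(my $out_fh, '>', $file) or die "Could not open file '$file': $!";
# chomp removed the separators, so put them back between the kept records
print $out_fh join('Query=', @content);
close($out_fh) or die "Could not close file '$file': $!";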

perl + read multiple csv files + manipulate files + provide output_files

Apologies if this is a bit long-winded, but I would really appreciate an answer here as I am having difficulty getting this to work.
Building on this question here, I have this script that works on a CSV file (orig.csv) and produces the CSV file that I want (format.csv). What I want is to make this more generic so that it accepts any number of '.csv' files and produces an 'output_csv' for each input file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my @labels;
while (<$orig_fh>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
print $format_fh join(',', @{ $data{$label} }), "\n";
}
I was hoping to use this script from here but am having great difficulty putting the two together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my @files = readdir(DIR);
closedir DIR;
foreach my $file (@files) {
open( FH, "/data/tmp/$file" ) or die "$!";
my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
open(my $fh, '>', $outfile);
while (<FH>) {
my ($line) = $_;
chomp($line);
if ( $line =~ m/KAC 50/ ) {
print $fh $_;
}
}
close($fh);
}
The script reads all the files in the directory, finds the lines containing the string 'KAC 50', and writes those lines to an output_$file for that input file, so there will be one output_$file for every input file that is read.
Issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' entries in the directory and produces 'output_.' and 'output_..' files
- it will also do the same with this script file.
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
EDIT: I have tried combining the two versions but am getting errors. Advice greatly appreciated.
UserName@wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
combined code::
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print $format_file scalar <$orig_file>; # Copy header line
my %data;
my @labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
#print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', @{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
How do you plan on inputting the list of files to process and their preferred output destination? Maybe just pick a fixed directory in which you want to process all the CSV files, and prefix the results.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/cvs/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.
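The "symbol ref" error in the combined code comes from print $format_file ... and <$orig_file>: both variables hold path strings, not filehandles, so each file still has to be opened inside the loop. One way to put the two pieces together is sketched below, assuming the same column layout as the original script (label in column 1, value in column 12). The sub names format_csv and format_all_csv are invented for this example, and the directory is passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # open, opendir, close etc. die on failure

# Write to $format_file one row per label (column 1) of $orig_file: the row
# with the largest value in column 12, in order of first appearance.
sub format_csv {
    my ($orig_file, $format_file) = @_;
    open my $orig_fh,   '<', $orig_file;
    open my $format_fh, '>', $format_file;

    my $header = <$orig_fh>;    # copy header line, if there is one
    print {$format_fh} $header if defined $header;

    my (%data, @labels);
    while (<$orig_fh>) {
        chomp;
        my @fields = split /,/, $_, -1;
        my ($label, $max_val) = @fields[1, 12];
        if (exists $data{$label}) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    for my $label (@labels) {
        print {$format_fh} join(',', @{ $data{$label} }), "\n";
    }
    close $orig_fh;
    close $format_fh;
    return;
}

# Run format_csv on every .csv file in $source_dir, skipping files that
# already carry $output_prefix. Returns the names of the files processed.
sub format_all_csv {
    my ($source_dir, $output_prefix) = @_;
    opendir my $dh, $source_dir;
    my @files = grep { /\.csv$/ && !/^\Q$output_prefix\E/ } readdir $dh;
    closedir $dh;
    format_csv("$source_dir/$_", "$source_dir/$output_prefix$_") for @files;
    return @files;
}

# Usage: perl script.pl <directory>
format_all_csv($ARGV[0], 'format_') if @ARGV;
```

Passing . as the directory processes the current working directory, which covers the Cwd use case from the question without extra code. Since only names ending in .csv are selected, the '.' and '..' entries and the script file itself are skipped automatically.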

Perl - search and replace across multiple lines across multiple files in specified directory

At the moment this code replaces all occurrences of my matching string with my replacement string, but only for the file I specify on the command line. Is there a way to change this so that all .txt files, for example, in the same directory (the directory I specify) are processed without having to run this hundreds of times on individual files?
#!/usr/bin/perl
use warnings;
my $filename = $ARGV[0];
open(INFILE, "<", $filename) or die "Cannot open $ARGV[0]";
my(@fcont) = <INFILE>;
close INFILE;
open(FOUT,">$filename") || die("Cannot Open File");
foreach $line (@fcont) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close INFILE;
I have also tried this:
perl -p0007i -e 's/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/m' *.txt
But I have noticed that it only changes the first occurrence of the matched pattern and ignores all the rest in the file.
I have also tried this, but it doesn't work, in the sense that it just creates a blank file:
use v5.14;
use strict;
use warnings;
use DBI;
my $source_dir = "C:/Testing2";
# Store the handle in a variable.
opendir my $dirh, $source_dir or die "Unable to open directory: $!";
my @files = grep /\.txt$/i, readdir $dirh;
closedir $dirh;
# Stop script if there aren't any files in the list
die "No files found in $source_dir" unless @files;
foreach my $file (@files) {
say "Processing $source_dir/$file";
open my $in, '<', "$source_dir/$file" or die "Unable to open $source_dir/$file: $!\n";
open(FOUT,">$source_dir/$file") || die("Cannot Open File");
foreach my $line (@files) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close $in;
}
say "Status: Processing of complete";
Just wondering what am I missing from my code above? Thanks.
You could try the following:
opendir(DIR,"your_directory");
my @all_files = readdir(DIR);
closedir(DIR);
for (@all_files) .....
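To fill in the loop body, each file has to be read as a whole (the pattern spans a line break), changed with the /g flag, and only then reopened for writing. The one-liner in the question lacks /g, which is why it stopped after the first match, and the last script opens the file for writing (which truncates it) before anything has been read and then loops over the filenames rather than over the file's lines. A minimal sketch, with the sub names fix_line_breaks and fix_directory invented for this example and the directory taken from the command line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the substitution from the question to the whole contents of $path.
# Returns the number of replacements made; the file is rewritten only if
# at least one replacement happened.
sub fix_line_breaks {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Unable to open $path: $!";
    my $text = do { local $/; <$in> };    # slurp, so the pattern can span lines
    close($in);
    return 0 unless defined $text;        # empty file

    my $count = ($text =~ s/<br\/>\n([[:space:]]{4}[A-Z])/\n$1/g) || 0;
    if ($count) {
        open(my $out, '>', $path) or die "Unable to write $path: $!";
        print {$out} $text;
        close($out) or die "Unable to close $path: $!";
    }
    return $count;
}

# Process every .txt file directly inside $dir; returns the total number
# of replacements.
sub fix_directory {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Unable to open directory $dir: $!";
    my @files = grep { /\.txt$/i } readdir($dh);
    closedir($dh);
    my $total = 0;
    $total += fix_line_breaks("$dir/$_") for @files;
    return $total;
}

# Usage: perl script.pl <directory>
if (@ARGV) {
    printf "%d replacement(s) made in %s\n", fix_directory($ARGV[0]), $ARGV[0];
}
```

Reading and closing the input before opening it for output is the important ordering here.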

Perl Open: No such file or directory

I'm trying to read every text file in a directory into a variable then print the first 100 characters, including line breaks. However, Perl says that the files don't exist even though they really do exist.
use strict;
use warnings;
my $dir = "C:\\SomeFiles";
my #flist;
open(my $fh, "dir /a:-d /b $dir |") || die "$!";
while (<$fh>) {
if ($_ =~ /.*(.txt)$/i) {
push(@flist, $_);
}
}
foreach my $f (@flist) {
print "$dir\\$f";
my $txt = do {
local $/ = undef;
open(my $ff, "<", "$dir\\$f") || die "$!";
<$ff>;
};
print substr($txt, 0, 100);
}
When I run the script, the following is written to the console:
C:\SomeFiles\file1.txt
No such file or directory at script.pl line 19, <$fh> chunk 10.
It's looking at the right file and I'm certain that the file exists. When I try using this method to open a single file rather than getting each file via an array with foreach, it works just fine. Is there something obvious that I've overlooked here?
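Each name read from the dir pipe still ends with a newline, so open is looking for a file whose name includes that newline; calling chomp on each line before pushing it onto the list fixes that. A better solution is to use readdir() instead (or File::Find if you ever want to do it recursively):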
my $dir = "C:\\SomeFiles";
opendir(my $dh, $dir) || die "$!";
while (my $file = readdir($dh)) {
if ($file =~ /\.txt$/i) {
print $file . "\n";
my $txt = do {
local $/ = undef;
open(my $ff, "<", "$dir\\$file") || die "$!";
<$ff>;
};
print substr($txt, 0, 100) . "\n";
}
}
closedir($dh);