File editing in perl - perl

Hi I am trying to delete a content of file based on regex match. Here is the following code:
my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my #content = ();
open (INFILE, $file) || die "error2: $!";
while (<INFILE>)
{
chomp;
if ($_ =~ /\s*3374_Cioin/)
{#capture the query sequence
#content = $_;
print #content;
}
}
Sample data is:
===================================================================
Query= 3374_Cioin
(24,267 letters)
Database: /home/aprasanna/BLAST/DMel_renamedfile.fasta
14,047 sequences; 7,593,731 total letters
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 578_Antlo
(88 letters)
=========================================================
I wish to remove from Query =3374_Coin... till -3402. i.e till next record separator. I am able to store the matched part in #content. However, I am not able to delete it in the original file. I wish my original file only has Query= 578_Antlo!
I am very new to Perl.

The easiest way is to simply write all lines you do want into some other file.
I would suggest something like:
my $file = "Cioin_PatchAnalysis.txt";
my $outfile = "Fixed_Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my #content = ();
open (INFILE, $file) || die "error2: $!";
open(my $outfile, '>', $outfile) or die "Could not open file '$outfile' $!";
while (<INFILE>)
{
chomp;
if ($_ !~ /\s*3374_Cioin/)
{#capture the query sequence
#content = $_;
print $outfile #content;
}
}
Than you can replace the original with the new file.
Another option is to keep all the lines that doesn't match the regex, than print them back into the original file:
my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my #content = ();
open (INFILE, $file) || die "error2: $!";
while (<INFILE>)
{
chomp;
if ($_ !~ /\s*3374_Cioin/)
{#capture the query sequence
push #content, $_;
}
}
open(my $outfile, '>', $file) or die "Could not open file '$outfile' $!";
print $outfile #content;

Related

How delete extension in a perl script

A silly question, but it make me crazy!
I have a script to remove "fat" data in a bvh file. (Biovision mocap file..).
Works fine, but it create a double extension... (The name.bvh.bvh)
I just only need one extension (*.bvh, not the *.bvh.bvh!!!)
Here is the code:
#files = <*.bvh>;
foreach $file (#files) {
open (OLD, $file) || die "Couldn´t open $file: $!\n";
open (NEW, ">$file.bvh") || die "Couldn´t open $file.bvh: $!\n";
while (<OLD>) {
$line = $_;
if (/Normal/) { while (<OLD>) { last if /}/; } $line => ""; }
if (/normalIndex/) { while(<OLD>) { last if /[]]/; } $line = ""; }
$line =~ s/[-+]?[0-9]\.[0-9]+e[+-][0-9]+/0/g;
$line =~ s/([-+]?[0-9]+\.[0-9]{2})[0-9]+/$1/g;
$line =~ s/0\.00/0/g;
$line =~ s/[ ]+/ /g;
$line =~ s/[\t]+/ /g;
$line =~ s/^ //g;
print NEW $line;
}
close OLD;
unlink ($file);
close NEW;
}
Any help? Thanks!
#files = <*.bvh>;
This gives you a list of the files that have a .bvh extension. So it will end up containing something like:
('foo.bvh', 'bar.bvh', 'baz.bvh')
Then you walk the array with this code:
foreach $file (#files) {
...
}
Each time round this loop, $file will contain a value from your array. One the first iteration, for example, $file will contain foo.bvh.
Then you open input and output files:
open (OLD, $file) || die "Couldn´t open $file: $!\n";
open (NEW, ">$file.bvh") || die "Couldn´t open $file.bvh: $!\n";
As $file contains foo.bvh, your new file (which is created using the name "$file.bvh") will be called foo.bvh.bvh.
The naive fix would be to remove the .bvh from the open() statement:
# DON'T DO THIS
open (OLD, $file) || die "Couldn´t open $file: $!\n";
open (NEW, ">$file") || die "Couldn´t open $file.bvh: $!\n";
This won't work as your old and new files will now have the same name and when you open your new file for writing, it will truncate the file and remove all of your input data.
You'll need to rename a file at some point. And the easiest approach is to keep your existing filenames and once you have processed each file, rename it to the original name.
# And then at the end of your loop
# Note that as you're copying the new file over the old one,
# there's no need to delete the old one.
close OLD;
close NEW;
rename("$file.bvh", $file);
Just insert rename("$file.new", $file);
as shown in your code:
#files = <*.bvh>;
foreach $file (#files) {
open (OLD, $file) || die "Couldn´t open $file: $!\n";
open (NEW, ">$file.new") || die "Couldn´t open $file.new: $!\n";
while (<OLD>) {
$line = $_;
if (/Normal/) { while (<OLD>) { last if /}/; } $line => ""; }
if (/normalIndex/) { while(<OLD>) { last if /[]]/; } $line = ""; }
$line =~ s/[-+]?[0-9]\.[0-9]+e[+-][0-9]+/0/g;
$line =~ s/([-+]?[0-9]+\.[0-9]{2})[0-9]+/$1/g;
$line =~ s/0\.00/0/g;
$line =~ s/[ ]+/ /g;
$line =~ s/[\t]+/ /g;
$line =~ s/^ //g;
print NEW $line;
}
close OLD;
unlink ($file);
close NEW;
rename("$file.new", $file);
}
I am sure this is not the way however you can get the output.
Use this Module for renaming function
use File::Copy;
#files = qw(first.bhv second.bhv third.bhv);
foreach $file (#files) {
Just remove the extension on the original file
(my $rem_ext = $file)=~s/\.bhv$//;
rename the file without extension
rename($file, $rem_ext);
Just read the without extension file
open (OLD, $rem_ext) || die "Couldn´t open $rem_ext: $!\n";
open (NEW, ">$file") || die "Couldn´t open $file.bvh: $!\n";
while (<OLD>) {
...
}
close OLD;
Delete the files without extension which is created by ourself.
unlink($rem_ext);
close NEW;
}
It will works

Combining two csv files together in perl

Hi i'm very new to perl and i've got litle knowledge on it but i'm trying to create a script that conbines two .csv files into a new one
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;
my #rows;
{ # Read the CSV file
my $csv = Text::CSV_XS->new() or die "Cannot use Text::CSV_XS ($!)";
my $file = "file.csv";
open my $fh, '<', $file or die "Cannot open $file ($!)";
while (my $row = $csv->getline($fh)) {
push #rows, $row;
}
$csv->eof or $csv->error_diag();
close $fh or die "Failed to close $file ($!)";
}
{ # Gather the data
foreach my $row (#rows) {
foreach my $col (#{$row}) {
$col = uc($col);
}
print "\n";
}
}
# (over)Write the data
# Needs to be changed to ADD data
{
my $csv = Text::CSV_XS->new({ binary => 1, escape_char => undef })
or die "Cannot use Text::CSV ($!)";
my $file = "output.csv";
open my $fh, '>', $file or die "Cannot open $file ($!)";
$csv->eol("\n");
foreach my $row (#rows) {
$csv->print($fh, \#{$row}) or die "Failed to write $file ($!)";
}
close $fh or die "Failed to close $file ($!)";
}
this is my current code i do know this over write's the data insted of actually adding it to the new file but this is how far i managed to get with the limited time and knowledge i've got on perl
the csv format of both files are the same.
"Header1";"Header2";"Header3";"Header4";"Header5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
I believe the issue is here:
open my $fh, '>', $file
or die "Cannot open $file ($!)";
If I remember my Perl properly, the line should read:
open my $fh, '>>', $file
or die "Cannot open $file ($!)";
The >> should open the file handle $fh for append instead of for overwrite.
you could try something like this
opendir(hand,"DIRPATH");
#files = readdir(hand);
closedir(hand);
foreach(#files){
if(/\.csv$/i) { #if the filename has .csv at the end
push(#csvfiles,$_);
}
}
foreach(#csvfiles) {
$csvfile=$_;
open(hanr,"DIRPATH".$csvfile)or die"error $!\n"; #read handler
open(hanw , ">>DIRPATH"."outputfile.csv") or die"error $! \n"; #write handler for creating new sorted files
#lines=();
#lines=<hanr>;
foreach $line (#lines){
chomp $line;
$count++;
next unless $count; # skip header i.e the first line containing stock details
print hanw join $line,"\n";
}
$count= -1;
close(hanw);
close(hanr);
}`

Find a specific word in a file

First, I want to search for a particular file in the directory and then in the file I need to search for a specific word. Here's what I have so far:
$show_bp = 'ShowBuildProcess';
$get_bs = 'GetBuildStatus';
opendir (DIR, $my_dir) or die $!;
while(my $file = readdir(DIR))
{
if($file=~/\.log/)
{
if($file=~/GetBuildStatus/)
{
Filenames will be like GetStatus.<number>.log, e.g. GetStatus.123456.log. I need to:
Find all .log files in the directory
Search for a file with filename starting with GetStatus
Search for filename with the lower numeric part
Search for a particular word in that file
Here is a possible solution for you:
First we look at the file pattern and also extract $1,which is the first regex match )in the brackets). If the file fits we open it and look through it line by line and look for a match to your /YourSearchPattern/:
#!/usr/bin/perl
use warnings;
use strict;
my $mydir = './test/';
opendir (DIR, $mydir) or die $!;
while(my $file = readdir(DIR)){
if ($file =~ /^GetStatus\.(\d+)\.log$/){
if ($1 >= 123456 || $1 < 345678){
open(my $fh,'<', $mydir . $file) or die "Cannot open file $file: $!\n";
while (<$fh>){
if ($_ =~ /YourSearchPattern/){
print $_;
}
}
close($fh);
}
}
}
When you look for the smallest sequence number of the files from your dir you can simply store them in an array and then sort them after those numbers:
...
opendir (DIR, $mydir) or die $!;
my #files;
while(my $file = readdir(DIR)){
if ($file =~ /^GetStatus\.(\d+)\.log$/){
push #files $file;
}
}
my #sortedfiles = sort { my ($anum,$bnum); $a =~ /^GetStatus\.(\d+)\.log$/; $anum = $1; $b =~ /^GetStatus\.(\d+)\.log$/; $bnum = $1; $anum <=> $bnum } #files;
print $sortedfiles[0] . " has the smallest sequence number!\n";

copy text after a specific string from a file and append to another in perl

I want to extract the desired information from a file and append it into another. the first file consists of some lines as the header without a specific pattern and just ends with the "END OF HEADER" string. I wrote the following code for find the matching line for end of the header:
$find = "END OF HEADER";
open FILEHANDLE, $filename_path;
while (<FILEHANDLE>) {
my $line = $_;
if ($line =~ /$find/) {
#??? what shall I do here???
}
}
, but I don't know how can I get the rest of the file and append it to the other file.
Thank you for any help
I guess if the content of the file isn't enormous you can just load the whole file in a scalar and just split it with the "END OF HEADER" then print the output of the right side of the split in the new file (appending)
open READHANDLE, 'readfile.txt' or die $!;
my $content = do { local $/; <READHANDLE> };
close READHANDLE;
my (undef,$restcontent) = split(/END OF HEADER/,$content);
open WRITEHANDLE, '>>writefile.txt' or die $!;
print WRITEHANDLE $restcontent;
close WRITEHANDLE;
This code will take the filenames from the command line, print all files up to END OF HEADER from the first file, followed by all lines from the second file. Note that the output is sent to STDOUT so you will have to redirect the output, like this:
perl program.pl headfile.txt mainfile.txt > newfile.txt
Update Now modified to print all of the first file after the line END OF HEADER followed by all of the second file
use strict;
use warnings;
my ($header_file, $main_file) = #ARGV;
open my $fh, '<', $header_file or die $!;
my $print;
while (<$fh>) {
print if $print;
$print ||= /END OF HEADER/;
}
open $fh, '<', $main_file or die $!;
print while <$fh>;
use strict;
use warnings;
use File::Slurp;
my #lines = read_file('readfile.txt');
while ( my $line = shift #lines) {
next unless ($line =~ m/END OF HEADER/);
last;
}
append_file('writefile.txt', #lines);
I believe this will do what you need:
use strict;
use warnings;
my $find = 'END OF HEADER';
my $fileContents;
{
local $/;
open my $fh_read, '<', 'theFile.txt' or die $!;
$fileContents = <$fh_read>;
}
my ($restOfFile) = $fileContents =~ /$find(.+)/s;
open my $fh_write, '>>', 'theFileToAppend.txt' or die $!;
print $fh_write $restOfFile;
close $fh_write;
my $status = 0;
my $find = "END OF HEADER";
open my $fh_write, '>', $file_write
or die "Can't open file $file_write $!";
open my $fh_read, '<', $file_read
or die "Can't open file $file_read $!";
LINE:
while (my $line = <$fh_read>) {
if ($line =~ /$find/) {
$status = 1;
next LINE;
}
print $fh_write $line if $status;
}
close $fh_read;
close $fh_write;

merging two files using perl keeping the copy of original file in other file

I have to files like A.ini and B.ini ,I want to merge both the files in A.ini
examples of files:
A.ini::
a=123
b=xyx
c=434
B.ini contains:
a=abc
m=shank
n=paul
my output in files A.ini should be like
a=123abc
b=xyx
c=434
m=shank
n=paul
I want to this merging to be done in perl language and I want to keep the copy of old A.ini file at some other place to use old copy
A command line variant:
perl -lne '
($a, $b) = split /=/;
$v{$a} = $v{$a} ? $v{$a} . $b : $_;
END {
print $v{$_} for sort keys %v
}' A.ini B.ini >NEW.ini
How about:
#!/usr/bin/perl
use strict;
use warnings;
my %out;
my $file = 'path/to/A.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
$out{$key} = $val;
}
close $fh;
$file = 'path/to/B.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
if (exists $out{$key}) {
$out{$key} .= $val;
} else {
$out{$key} = $val;
}
}
close $fh;
$file = 'path/to/A.ini';
open my $fh, '>', $file or die "unable to open '$file' for writing: $!";
foreach(keys %out) {
print $fh $_,'=',$out{$_},"\n";
}
close $fh;
The two files to be merged can be read in a single pass and don't need to be treated as separate source files. That allows the use of <> to read all files passed as parameters on the command line.
Keeping a backup copy of A.ini is simply a matter of renaming it before writing the merged data to a new file of the same name.
This program appears to do what you need.
use strict;
use warnings;
my $file_a = $ARGV[0];
my (#keys, %values);
while (<>) {
if (/\A\s*(.+?)\s*=\s*(.+?)\s*\z/) {
push #keys, $1 unless exists $values{$1};
$values{$1} .= $2;
}
}
rename $file_a, "$file_a.bak" or die qq(Unable to rename "$file_a": $!);
open my $fh, '>', $file_a or die qq(Unable to open "$file_a" for output: $!);
printf $fh "%s=%s\n", $_, $values{$_} for #keys;
output (in A.ini)
a=123abc
b=xyx
c=434
m=shank
n=paul