perl increasing the counter number every time the script running - perl

I have a script to compare 2 files and print out the matching lines on the file. what I want to add a logic to help me to identify for how long these devices are matched. currently I have add the starting point 1 so I want to increase that number every time the script run and matched.
Example.
inputfile:-########################
retiredDevice.txt
Alpha
Beta
Gamma
Delta
prodDevice.txt
first
second
third
forth
Gamma
Delta
output file :-#######################
final_result.txt
1 Delta
1 Gamma
my objective is to add a counter stamp on each matching line to identify for how long "Delta" and "Gamma" matched. the script running every week. so every time the script running adding 1 so when I audit the 'finalResult.txt. the result should looks like
Delta 4
Gamma 3
the result indicate me Delta matched for last 4 weeks and Gamma for last 3 weeks.
#! /usr/local/bin/perl
my $ndays = 1;
my $f1 = "/opt/retiredDevice.txt ";
my $f2 = "prodDevice.txt";
my $outfile = "/opt/final_result.txt";
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){ $results{$line}=1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
$results{$line}++;
}
close(FILE2);
open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
my $x = $ndays;
$x++;
print OUTFILE "$x : ", $line if $results{$line} != 1;
}
close OUTFILE;
Thanks in advance for any help!

Based on your earlier question and comments, perhaps this might work.
use strict;
use warnings;
use autodie;
my $logfile = 'int.txt';
my $f1 = shift || "/opt/test.txt";
my $f2 = shift || "/opt/test1.txt";
my %results;
open my $file1, '<', $f1;
while (my $line = <$file1>) {
chomp $line;
$results{$line} = 1;
}
open my $file2, '<', $f2;
while (my $line = <$file2>) {
chomp $line;
$results{$line}++;
}
{ ############ added part
my %c;
for (keys %results) {
$c{$_} = $results{$_} if $results{$_} > 1;
}
%results = %c;
} ############ end added part
my (%log, $log);
if ( -e $logfile ) {
open $log, '<', $logfile;
while (<$log>) {
my ($num, $key) = split;
$log{$key} = $num;
}
}
open $log, '>', $logfile or die $!;
for my $key (keys %results) {
my $old = ( $log{$key} || 0 ); # keep old count, or 0 otherwise
my $new = ( $results{$key} ? 1 : 0 ); # 1 if it exists, 0 otherwise
print $log $old + $new, " $key\n";
}

Perform this computation in two steps.
Each time you run the comparison between retired and prod, produce an output file that you save with a unique file name, e.g. result-XXX where XXX denotes when you ran the comparison.
Then write a script which iterates over all of the result-XXX files and produces a summary.
I would name the files result-YYYY-MM-DD where YYYY-MM-DD is the date that the comparison was created. Then it will be relatively easy to iterate over a subset of the files (e.g. ones for a certain month).
Or store the data in a relational database.

Related

Perl - Compare two text files and then match only the difference found on the first file

I'm trying to make a script that would only print the difference in text found in the first file but not in the second file.
For example the first text file contains:
a
b
c
d
While the second file contains:
a
x
y
z
With the script that I'm trying, it prints the difference for both the files which is:
b
c
d
x
y
z
But the result I can't figure out to make is just:
b
c
d
Here is the code:
use strict;
use warnings;
my $f1 = 'C:\Strawberry\new.raw';
my $f2 = 'C:\Strawberry\orig.raw';
my $outfile = 'C:\Strawberry\mt_deleted.txt';
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){
$results{$line}=1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
$results{$line}++;
}
close(FILE2);
open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
print OUTFILE $line if $results{$line} == 1;
}
close OUTFILE;
You need to add chomp, and assign different value for keys of file2
use strict;
use warnings;
my $f1 = 'C:\Strawberry\new.raw';
my $f2 = 'C:\Strawberry\orig.raw';
my $outfile = 'C:\Strawberry\mt_deleted.txt';
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while ( my $line = <FILE1> ) {
chomp $line;
$results{$line} = 1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while ( my $line = <FILE2> ) {
chomp $line;
$results{$line} = 2;
}
close(FILE2);
open( OUTFILE, ">$outfile" ) or die "Cannot open $outfile for writing \n";
foreach my $line ( keys %results ) {
print OUTFILE "$line\n" if $results{$line} == 1;
}
close OUTFILE;
Let's start by counting the number of occurrences of each line in file 2.
my %counts;
while (<$fh2>) {
chomp;
++$counts{$_};
}
To print each line of file 1 not matched by a line in file 2, simply process file 1 line by line, decrementing the count, and printing the line if the count is negative.
while (<$fh1>) {
chomp;
say if --$counts{$_} < 0;
}
You said the files could have duplicate lines, but you didn't say how you wanted to handle them. The above handles duplicates as follows:
File 1:
a
a
a
b
c
File 2:
c
a
Output:
a
a
b
Let's start by forming a lookup table of what's in file 2.
my %seen;
while (<$fh2>) {
chomp;
++$seen{$_};
}
To print each line of file 1 not found in file 2, simply process file 1 line by line and printing the line if it's not in the lookup table.
while (<$fh1>) {
chomp;
say if !$seen{$_};
}
You said the files could have duplicate lines, but you didn't say how you wanted to handle them. The above handles duplicates as follows:
File 1:
a
a
a
b
c
File 2:
c
a
Output:
b

Perl simple filehandling of text

What this program is meant to do is that it reads a text file which looks like:
Item \t\t Price
apple \t\t 20
orange \t\t 50
lime \t\t 30
I'm using split function to split these 2 columns and then i should apply a -25% discount on all items and print it out to a new file. My code so far does what i want but the new text file has a '0' value under my last number in price column. I also get 2 errors if i run it with "use warnings" which are:
Use of uninitialized value $item in multiplication * ...
Use of uninitialized value $item[0] in concatenation (.) ...
I should also tell total number of items calculated but i get like 5 1's instead of 5. (11111 instead of 5)
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
print OUT $header;
while (<IN>) {
chomp;
my #items = split(/\t\t/);
foreach my $item ($items[1]){
my $discount = $item * (0.75);
print OUT "$items[0]\t\t$discount\n";
}
}
This is too complicated and not clear what are you doing in foreach loop and you are not skipping empty lines. Keep it simple:
use warnings;
use strict;
use v5.10;
<>; # skip header
while(my $line = <>)
{
chomp $line;
next unless ($line);
my ($title, $price ) = split /\s+/, $line;
if( $title && defined $price )
{
$price *= 0.75;
say "$title\t\t$price";
}
}
and run like
perl script.pl <input.txt >output.txt
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
my $item;
my $price;
print OUT $header;
while (<IN>) {
chomp;
($item, $price) = split(/\t\t/);
my $discount = $price*0.75;
print OUT "$item $discount\n";
}
This should help! :)
If the total item count isn't very important to you:
$ perl -wane '$F[1] *= 0.75 if $. > 1; print join("\t", #F), "\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
If you really need the total item count:
$ perl -we 'while (<>) { #F = split; if ($. > 1) { $F[1] *= 0.75; $i++ } print join("\t", #F), "\n"; } print "$i items\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
3 items
I'd use this approach
#!/usr/bin/perl
use strict;
use warnings;
my %items;
my $filename = 'shop.txt';
my $discount = 'discount.txt';
open my $in, '<', $filename or die "Failed to open file! : $!\n";
open my $out, ">", $discount or die "Can't open output file for writing: $!";
print $out "Item\t\tPrice\n";
my $cnt = 0;
while (my $line = <$in>) {
chomp $line;
if (my ($item,$price) = $line =~ /(\w.+)\s+([0-9.]+)/){
$price = $price * (0.75);
print $out "$item\t\t$price\n";
$items{$item} = $price;
$cnt++;
}
}
close($in);
close($out);
my $total = keys %items;
print "Total items - $total \n";
print "Total items - $cnt\n";
Using regex capture groups to capture the item and price (using \w.+ in case the item is 2 words like apple sauce), this will also prevent empty lines from printing to file.
I also hard coded the Item and Price header, probably a good idea if you are going to be using a consistent header.
Hope it helps
---Update ----
I added 2 examples of a total count in my script. The first one is using a hash and printing out the hash size, the second method is using a counter. The hash option is good except if your list has 2 items that are the same in which case the key of the hash will be overridden with the last item found which shares the same name. The counter is a simple solution.

Trying to write a specific result to a new outfile

I am extremely new to the Perl process. I am very much enjoying the learning curve and Perl but I am frustrated beyond belief and have spent many, many hours on one task achieving little to no results.
#!/usr/bin/perl
use strict;
print "Average value of retroviruses for the length of each genome and each of the genes:\n"; #create a title for the script
my $infile = "Lab1_table.txt"; # This is the file path.
open INFILE, $infile or die "Can't open $infile: $!"; # Provides an error message if the file can'tbe found.
# set my initial values.
my $tally = 0;
my #header = ();
my #averages = ();
# create my first loop to run through the file by line.
while (my $line = <INFILE>){
chomp $line;
print "$line\n";
# add one to the loop and essentially remove the header line of value.
# the first line is what was preventing me from caclulating averages as Perl can't calculate words.
my #row = split /\t/, $line; # split the file by tab characters.
$tally++; #adds one to the tally.
if ( $tally == 1 ) { #if the tally = 1 the row is determined as a the header.
#header = #row;
}
# if the tally is anything else besides 1 then it will read those rows.
else {
for( my $i = 1; $i < scalar #row; $i++ ) {
$averages[$i] += $row[$i];
}
foreach my $element (#row){
}
foreach my $i (0..4){
$averages[$i] = $averages[$i] + $row[1..4];
}
}
}
print "Average values of genome, gag, pol and env:\n";
for( my $i = 1; $i < scalar #averages; $i++ ) { # this line is used to determine the averages of the columns and print the values
print $averages[$i]/($tally-1), "\n";
}
SO, I got the results to come up with what I wanted (not in the exact format I wanted but as close as I can seem to get at the moment) and they do average the columns.
The issue now is writing to a an outfile. I am trying to get my table and results from the previous code to appear in my outfile. I get a good file name but no results.
foreach my $i (1){
my $outfile= "Average_values".".txt";
open OUTFILE, ">$outfile" or die "$outfile: $!";
print "Average values of genome, gag, pol and env:\n";
}
close OUTFILE;
close INFILE;
I feel like there is an easy way to do this and a hard way and I have taken the very hard way. Any help would be much appreciated.
You did not tell Perl where to print:
print OUTFILE "Average values of genome, gag, pol and env:\n";
BTW, together with use strict, also use warnings. And for working with files, use lexical filehandles and the three argument form of open:
open my $FH, '>', $filename or die $!;
print $FH 'Something';
close $FH or die $!;

how to count the number of specific characters through each line from file?

I'm trying to count the number of 'N's in a FASTA file which is:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
so in the end I want to get the count of number of 'N's and each header is a read so I want to make a histogram so I would at the end output something like this:
# of N's # of Reads
0 300
1 240
etc...
so there are 300 sequences or reads that have 0 number of 'N's
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my #array;
my $count = 0;
if (!defined ($output_file)) {
die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file$!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
chomp $line;
next if $line =~ /^>/;
$sequence = $line;
#array = split ('', $sequence);
foreach my $element (#array) {
if ($element eq 'N') {
$char_N_count++;
}
}
print "$char_N_count\n";
}
Try this. I changed a few things like using scalar file handles. There are many ways to do this in Perl, so some people will have other ideas. In this case I used an array which may have gaps in it - another option is to store results in a hash and key by the count.
Edit: Just realised I'm not using $output_file, because I have no idea what you want to do with it :) Just change the 'print' at the end to 'print $out_fh' if your intent is to write to it.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my #results = ();
while (my $line = <$in_fh>) {
next if $line =~ /^>/;
my $num_n = ($line =~ tr/N//);
$results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(#results) ; $i++) {
unless (defined($results[$i])) {
$results[$i] = 0;
# another option is to 'next' if you don't want to show the zero totals
}
print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;

How can I delete the last 10 lines of a file in perl

I am taking a total number of line as a user input and then I am deleting those numbers of l ine from the file.
I saw this learn.perl.org/faq/perlfaq5.html#How-do-I-count-the-number-of-lines-in-a-file- and then I tired the below simple logic.
Logic:
Get the Total number of lines
Subtracts it by the numbers entered by user
print the lines
Here is my code :
#!/usr/bin/perl -w
use strict;
open IN, "<", "Delete_line.txt"
or die " Can not open the file $!";
open OUT, ">", "Update_delete_line.txt"
or die "Can not write in the file $!";
my ($total_line, $line, $number, $printed_line);
print"Enter the number of line to be delete\n";
$number = <STDIN>;
while ($line = <IN>) {
$total_line = $.; # Total number of line in the file
}
$printed_line = $total_line - $number;
while ($line = <IN>) {
print OUT $line unless $.== $printed_line;
}
Well, neither i am getting any error in code nor any out put ? why I just don't know.
Can any one give me some suggestion.
A Perl solution that's efficient for large files requires the use of File::ReadBackwards
use File::ReadBackwards qw( );
my $num_lines = 10;
my $qfn = 'file.txt';
my $pos = do {
my $fh = File::ReadBackwards->new($qfn)
or die $!;
$fh->readline() for 1..$num_lines;
$fh->tell()
};
truncate($qfn, $pos)
or die $!;
This does not read the whole file twice (unlike the OP's method).
This does not read the whole file (unlike the Tie::File solutions).
This does not read the whole file into memory.
Yet another way is to use Tie::File
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my #lines, 'Tie::File', 'myfile' or die "$!\n";
$#lines -= 10;
untie #lines;
This has the advantage of not loading the file into memory while acting like it does.
Here a solution that passes through a stream and prints all but the last n lines where n is a command line argument:
#!/usr/bin/perl
my #cache;
my $n = shift #ARGV;
while(<>) {
push #cache, $_;
print shift #cache if #cache > $n;
}
or the one-liner version:
perl -ne'BEGIN{$n=shift#ARGV}push#c,$_;print shift#c if#c>$n' NUMBER
After finishing reading from IN, you have to reopen it or seek IN, 0, 0 to reset its position. You also have to set $. to zero again.
Also, the final condition should be changed to unless $. > $printed_line so you skip all the lines over the threshold.
The "more fun" answer: use Tie::File!
use strict;
use warnings;
use Tie::File;
tie my #file, 'Tie::File', 'filename' or die "$!";
$#file -= 10;
Just read the file in reverse and delete the first n lines: -
open my $filehandle, "<", "info.txt";
my #file = <$filehandle>;
splice(#file, -10);
print #file;
Note: This loads the entire file into memory.
You could just buffer the last 10 lines and then not print out the remaining 10.
use English qw<$INPLACE_EDIT>;
{ local #ARGV = $name_of_file_to_edit;
local $INPLACE_EDIT = '.bak';
my #buffer;
for ( 1..$num_lines_to_trim ) {
push #buffer, <>;
}
while ( <> ) {
print shift #buffer;
push #buffer, $_;
}
}
You could also do this with File::Slurp::edit_file_lines:
my #buffer;
my $limit_reached = 0;
edit_file_lines {
push #buffer, $_;
return ( $limit_reached ||= #buffer > $num_lines_to_trim ) ? shift #buffer
: ''
;
} $name_of_file;
my $num_lines = 10;
my $qfn = 'file.txt';
system('head', '-n', -$num_lines, '--', $qfn);
die "Error" if $?;
Easy with a C like for :
#!/usr/bin/perl -w
use strict;
open(my $in,"<","Delete_line.txt") or die "Can not open the file $!";
open(my $out,">","Update_delete_line.txt") or die"Can not write in the file $!";
print"Enter the number of lines to be delete\n";
my $number=<STDIN>;
my #file = <$in>;
for (my $i = 0; $i < $#file - $number + 1; $i++) {
print $out $file[$i];
}
close $in;
close $out;
#
# Reads a file trims the top and the bottom of by passed num of lines
# and return the string
# stolen from : http://stackoverflow.com/a/9330343/65706
# usage :
# my $StrCatFile = $objFileHandler->ReadFileReturnTrimmedStrAtTopBottom (
# $FileToCat , $NumOfRowsToRemoveAtTop , $NumOfRowsToRemoveAtBottom) ;
sub ReadFileReturnTrimmedStrAtTopBottom {
my $self = shift ;
my $file = shift ;
my $NumOfLinesToRemoveAtTop = shift ;
my $NumOfLinesToRemoveAtBottom = shift ;
my #cache ;
my $StrTmp = () ;
my $StrReturn = () ;
my $fh = () ;
open($fh, "<", "$file") or cluck ( "can't open file : $file for reading: $!" ) ;
my $counter = 0;
while (<$fh>) {
if ($. >= $NumOfLinesToRemoveAtTop + 1) {
$StrTmp .= $_ ;
}
}
close $fh;
my $sh = () ;
open( $sh, "<", \$StrTmp) or cluck( "can't open string : $StrTmp for reading: $!" ) ;
while(<$sh>) {
push ( #cache, $_ ) ;
$StrReturn .= shift #cache if #cache > $NumOfLinesToRemoveAtBottom;
}
close $sh ;
return $StrReturn ;
}
#eof ReadFileReturnTrimmedStrAtTopBottom
#