MD5 hexdigest verification with a string - perl

I am having a problem comparing the result from the file_md5_hex( $dir ) subroutine with a string read from a file.
When I print they are both the same, but when I compare them in a if I get all the time equal regardless of the value they have.
elsif ( -f $dir )
{
if($dir ne "$mydir.txt" && $dir ne "log.txt")
{
my $filename = "$mydir.txt";
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
print FILE "$dir -> ";
while (my $row = <$fh>)
{
chomp($row);
if($row eq $dir)
{
my $hash = <$fh>;
chomp($hash);
print FILE "$hash = ";
break;
}
}
close $fh;
my $md5 = file_md5_hex( $dir );
print FILE "$md5\n";
print FILE ref($md5);
print FILE ref($hash);
if( $md5 eq $hash )
{
print FILE "Hash ok!\n";
}
else
{
print FILERESULT "In $mydir file $dir is corrupted. Correct is $hash, calculated is $md5\n";
print FILE "Hash Nok!\n";
}
}
}
In the log file I see that the 2 values $md5 and $hash are the same or different (depending on the case I simulate) but when I verify the program sees them as always equal. I think there might be a problem with the types of data but I don't know how to fix it.

use strict to detect errors with variable names and scopes.
$hash is not defined on if( $md5 eq $hash ) because my $hash = <$fh>; is out of scope.
Declare my $hash before while (my $row = <$fh>) and set the value with $hash = <$fh>;
http://perldoc.perl.org/functions/my.html

Related

Data driven perl script

I want to list file n folder in directory. Here are the list of the file in this directory.
Output1.sv
Output2.sv
Folder1
Folder2
file_a
file_b
file_c.sv
But some of them, i don't want it to be listed. The list of not included file, I list in input.txt like below. Note:some of them is file and some of them is folder
NOT_INCLUDED=file_a
NOT_INCLUDED=file_b
NOT_INCLUDED=file_c.sv
Here is the code.
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = "INPUT.txt";
open ( OUTPUT, ">OUTPUT.txt" );
file_in_directory();
close OUTPUT;
sub file_in_directory {
my $path = "experiment/";
my #unsort_output;
my #not_included;
open ( INFILE, "<", $input_file);
while (<INFILE>){
if ( $_ =~ /NOT_INCLUDED/){
my #file = $_;
foreach my $file (#file) {
$file =~ s/NOT_INCLUDED=//;
push #not_included, $file;
}
}
}
close INFILE;
opendir ( DIR, $path ) || die "Error in opening dir $path\n";
while ( my $filelist = readdir (DIR) ) {
chomp $filelist;
next if ( $filelist =~ m/\.list$/ );
next if ( $filelist =~ m/\.swp$/ );
next if ( $filelist =~ s/\.//g);
foreach $_ (#not_included){
chomp $_;
my $not_included = "$_";
if ( $filelist eq $not_included ){
next;
}
push #unsort_output, $filelist;
}
closedir(DIR);
my #output = sort #unsort_output;
print OUTPUT #output;
}
The output that I want is to list all the file in that directory except the file list in input.txt 'NOT_INCLUDED'.
Output1.sv
Output2.sv
Folder1
Folder2
But the output that i get seem still included that unwanted file.
This part of the code makes no sense:
while ( my $filelist = readdir (DIR) ) {
...
foreach $_ (#not_included){
chomp $_;
my $not_included = "$_";
if ( $filelist eq $not_included ){
next;
} # (1)
push #unsort_output, $filelist; # (2)
}
This code contains three opening braces ({) but only two closing braces (}). If you try to run your code as-is, it fails with a syntax error.
The push line (marked (2)) is part of the foreach loop, but indented as if it were outside. Either it should be indented more (to line up with (1)), or you need to add a } before it. Neither alternative makes much sense:
If push is outside of the foreach loop, then the next statement (and the whole foreach loop) has no effect. It could just be deleted.
If push is inside the foreach loop, then every directory entry ($filelist) will be pushed multiple times, one for each line in #not_included (except for the names listed somewhere in #not_included; those will be pushed one time less).
There are several other problems. For example:
$filelist =~ s/\.//g removes all dots from the file name, transforming e.g. file_c.sv into file_csv. That means it will never match NOT_INCLUDED=file_c.sv in your input file.
Worse, the next if s/// part means the loop skips all files whose names contain dots, such as Output1.sv or Output2.sv.
Results are printed without separators, so you'll get something like
Folder1Folder1Folder1Folder2Folder2Folder2file_afile_afile_bfile_b in OUTPUT.txt.
Global variables are used for no reason, e.g. INFILE and DIR.
Here is how I would structure the code:
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = 'INPUT.txt';
my %is_blacklisted;
{
open my $fh, '<', $input_file or die "$0: $input_file: $!\n";
while (my $line = readline $fh) {
chomp $line;
if ($line =~ s!\ANOT_INCLUDED=!!) {
$is_blacklisted{$line} = 1;
}
}
}
my $path = 'experiment';
my #results;
{
opendir my $dh, $path or die "$0: $path: $!\n";
while (my $entry = readdir $dh) {
next
if $entry eq '.' || $entry eq '..'
|| $entry =~ /\.list\z/
|| $entry =~ /\.swp\z/
|| $is_blacklisted{$entry};
push #results, $entry;
}
}
#results = sort #results;
my $output_file = 'OUTPUT.txt';
{
open my $fh, '>', $output_file or die "$0: $output_file: $!\n";
for my $result (#results) {
print $fh "$result\n";
}
}
The contents of INPUT.txt (more specifically, the parts after NOT_INCLUDED=) are read into a hash (%is_blacklisted). This allows easy lookup of entries.
Then we process the directory entries. We skip over . and .. (I assume you don't want those) as well as all files ending with *.list or *.swp (that was in your original code). We also skip any file that is blacklisted, i.e. that was specified as excluded in INPUT.txt. The remaining entries are collected in #results.
We sort our results and write them to OUTPUT.txt, one entry per line.
Not deviating too much from your code, here is the solution. Please find the comments:
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = "INPUT.txt";
open ( OUTPUT, ">OUTPUT.txt" );
file_in_directory();
close OUTPUT;
sub file_in_directory {
my $path = "experiment/";
my #unsort_output;
my %not_included; # creating hash map insted of array for cleaner and faster implementaion.
open ( INFILE, "<", $input_file);
while (my $file = <INFILE>) {
if ($file =~ /NOT_INCLUDED/) {
$file =~ s/NOT_INCLUDED=//;
$not_included{$file}++; # create a quick hash map of (filename => 1, filename2 => 1)
}
}
close INFILE;
opendir ( DIR, $path ) || die "Error in opening dir $path\n";
while ( my $filelist = readdir (DIR) ) {
next if $filelist =~ /^\.\.?$/xms; # discard . and .. files
chomp $filelist;
next if ( $filelist =~ m/\.list$/ );
next if ( $filelist =~ m/\.swp$/ );
next if ( $filelist =~ s/\.//g);
if (defined $not_included{$filelist}) {
next;
}
else {
push #unsort_output, $filelist;
}
}
closedir(DIR); # earlier the closedir was inside of while loop. Which is wrong.
my #output = sort #unsort_output;
print OUTPUT join "\n", #output;
}

Compare two files and write matching data from first file using perl

First file
FirstName:LastName:Location:Country:ID
FirstName1:LastName1:Location1:Country1:ID1
FirstName2:LastName2:Location2:Country2:ID2
FirstName3:LastName3:Location3:Country3:ID3
FirstName4:LastName4:Location4:Country4:ID4
Second file
FirstName:LastName:Location:Country:Old_ID
FirstName2:LastName2:Location2:Country2:Old_ID2
FirstName4:LastName4:Location4:Country4:Old_ID4
Have to compare first and second file and print matching rows with data from first file which is have new ID's.
Below script fetches me Old_ID's from second file and not the new ones from first file
use warnings;
use strict;
my $details = 'file2.txt';
my $old_details = 'file1.txt';
my %names;
open my $data, '<', $details or die $!;
while (<$data>)
{
my ($name, #ids) = split;
push #{ $names{$_} }, $name for #ids;
}
open my $old_data, '<', $old_details or die $!;
while (<$old_data>)
{
chomp;
print #{ $names{$_} // [$_] }, "\n";
}
Output:
FirstName:LastName:Location:Country:Old_ID
FirstName2:LastName2:Location2:Country2:Old_ID2
FirstName4:LastName4:Location4:Country4:Old_ID4
Expected output:
FirstName:LastName:Location:Country:ID
FirstName2:LastName2:Location2:Country2:ID2
FirstName4:LastName4:Location4:Country4:ID4
Just try this way:
use strict; # Use strict Pragma
use warnings;
my ($file1, $filecnt1, $file2, $filecnt2) = ""; #Declaring variables
$file1 = "a1.txt"; $file2 = "b1.txt"; #Sample files
readFileinString($file1, \$filecnt1); # Reading first file
readFileinString($file2, \$filecnt2); # Reading second file
$filecnt2=~s/\:Old\_ID/\:ID/g; # Replacing that difference content
my #firstfle = split "\n", $filecnt1; # Move content to array variable to compare
my #secndfle = split "\n", $filecnt2;
my %firstfle = map { $_ => 1 } #firstfle; #Mapping the array into hash variable
my #scdcmp = grep { $firstfle{$_} } #secndfle;
print join "\n", #scdcmp;
#---------------> File reading
sub readFileinString
#--------------->
{
my $File = shift;
my $string = shift;
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}
#---------------> File Writing
sub writeFileinString
#--------------->
{
my $File = shift;
my $string = shift;
my #cDir = split(/\\/, $File);
my $tmp = "";
for(my $i = 0; $i < $#cDir; $i++)
{
$tmp = $tmp . "$cDir[$i]\\";
mkdir "$tmp";
}
if(-f $File){
unlink($File);
}
open(FILE, ">$File") or die "\n\nFailed File Open for Writing: [$File]\n\nReason: $!\n";
print FILE $$string;
close(FILE);
}

Nested if statements: Swapping headers and sequences in fasta files

I am opening a directory and processing each file. A sample file looks like this when opened:
>AAAAA
TTTTTTTTTTTAAAAATTTTTTTTTT
>BBBBB
TTTTTTTTTTTTTTTTTTBBBBBTTT
>CCCCC
TTTTTTTTTTTTTTTTCCCCCTTTTT
For the above sample file, I am trying to make them look like this:
>TAAAAAT
AAAAA
>TBBBBBT
BBBBB
>TCCCCCT
CCCCC
I need to find the "header" in next line sequence, take flanks on either side of the match, and then flip them. I want to print each file's worth of contents to another separate file.
Here is my code so far. It runs without errors, but doesn't generate any output. My guess is this is probably related to the nested if statements. I have never worked with those before.
#!/usr/bin/perl
use strict;
use warnings;
my ($directory) = #ARGV;
my $dir = "$directory";
my #ArrayofFiles = glob "$dir/*";
my $count = 0;
open(OUT, ">", "/path/to/output_$count.txt") or die $!;
foreach my $file(#ArrayofFiles){
open(my $fastas, $file) or die $!;
while (my $line = <$fastas>){
$count++;
if ($line =~ m/(^>)([a-z]{5})/i){
my $header = $2;
if ($line !~ /^>/){
my $sequence .= $line;
if ($sequence =~ m/(([a-z]{1})($header)([a-z]{1}))/i){
my $matchplusflanks = $1;
print OUT ">", $matchplusflanks, "\n", $header, "\n";
}
}
}
}
}
How can I fix this code? Thanks.
Try this
foreach my $file(#ArrayofFiles)
{
open my $fh," <", $file or die"error opening $!\n";
while(my $head=<$fh>)
{
chomp $head;
$head=~s/>//;
my $next_line = <$fh>;
my($extract) = $next_line =~m/(.$head.)/;
print ">$extract\n$head\n";
}
}
There are several mistakes in your code but the main problem is:
if ($line =~ m/(^>)([a-z]{5})/i) {
my $header = $2;
if ($line !~ /^>/) {
# here you write to the output file
Because the same line can't start and not start with > at the same time, your output files are never written. The second if statement always fails and its block is never executed.
open(OUT, ">", "/path/to/output_$count.txt") or die $!; and $count++ are misplaced. Since you want to produce an output file (with a new name) for each input file, you need to put them in the foreach block, not outside or in the while loop.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my ($dir) = #ARGV;
my #files = glob "$dir/*";
my $count;
my $format = ">%s\n%s\n";
foreach my $file (#files) {
open my $fhi, '<', $file
or die "Can't open file '$file': $!";
$count++;
my $output_path = "/path/to/output_$count.txt";
open my $fho, '>', $output_path
or die "Can't open file '$output_path': $!";
my ($header, $seq);
while(<$fhi>) {
chomp;
if (/^>([a-z]{5})/i) {
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
($header, $seq) = ($1, '');
} else { $seq .= $_; }
}
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
}
close $fhi;
close $fho;

Reading and comparing lines in Perl

I am having trouble with getting my perl script to work. The issue might be related to the reading of the Extract file line by line within the while loop, any help would be appreciated. There are two files
Bad file that contains a list of bad IDs (100s of IDs)
2
3
Extract that contains a delimited data with the ID in field 1 (millions of rows)
1|data|data|data
2|data|data|data
2|data|data|data
2|data|data|data
3|data|data|data
4|data|data|data
5|data|data|data
I am trying to remove all the rows from the large extract file where the IDs match. There can be multiple rows where the ID matches. The extract is sorted.
#use strict;
#use warnning;
$SourceFile = $ARGV[0];
$ToRemove = $ARGV[1];
$FieldNum = $ARGV[2];
$NewFile = $ARGV[3];
$LargeRecords = $ARGV[4];
open(INFILE, $SourceFile) or die "Can't open source file: $SourceFile \n";
open(REMOVE, $ToRemove) or die "Can't open toRemove file: $ToRemove \n";
open(OutGood, "> $NewFile") or die "Can't open good output file \n";
open(OutLarge, "> $LargeRecords") or die "Can't open Large Records output file \n";
#Read in the list of bad IDs into array
#array = <REMOVE>;
#Loop through each bad record
foreach (#array)
{
$badID = $_;
#read the extract line by line
while(<INFILE>)
{
#take the line and split it into
#fields = split /\|/, $_;
my $extractID = $fields[$FieldNum];
#print "Here's what we got: $badID and $extractID\n";
while($extractID == $badID)
{
#Write out bad large records
print OutLarge join '|', #fields;
#Get the next line in the extract file
#fields = split /\|/, <INFILE>;
my $extractID = $fields[$FieldNum];
$found = 1; #true
#print " We got a match!!";
#remove item after it has been found
my $input_remove = $badID;
#array = grep {!/$input_remove/} #array;
}
print OutGood join '|', #fields;
}
}
Try this:
$ perl -F'|' -nae 'BEGIN {while(<>){chomp; $bad{$_}++;last if eof;}} print unless $bad{$F[0]};' bad good
First, you are lucky: The number of bad IDs is small. That means, you can read the list of bad IDs once, stick them in a hash table without running into any difficulty with memory usage. Once you have them in a hash, you just read the big data file line by line, skipping output for bad IDs.
#!/usr/bin/env perl
use strict;
use warnings;
# hardwired for convenience
my $bad_id_file = 'bad.txt';
my $data_file = 'data.txt';
my $bad_ids = read_bad_ids($bad_id_file);
remove_data_with_bad_ids($data_file, $bad_ids);
sub remove_data_with_bad_ids {
my $file = shift;
my $bad = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
exists $bad->{ $id } or print $line;
}
}
close $in
or die "Cannot close '$file': $!";
return;
}
sub read_bad_ids {
my $file = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
my %bad;
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
$bad{ $id } = undef;
}
}
close $in
or die "Cannot close '$file': $!";
return \%bad;
}
sub extract_id {
my $string_ref = shift;
if (my ($id) = ($$string_ref =~ m{\A ([0-9]+) }x)) {
return $id;
}
return;
}
I'd use a hash as follows:
use warnings;
use strict;
my #bad = qw(2 3);
my %bad;
$bad{$_} = 1 foreach #bad;
my #file = qw (1|data|data|data 2|data|data|data 2|data|data|data 2|data|data|data 3|data|data|data 4|data|data|data 5|data|data|data);
my %hash;
foreach (#file){
my #split = split(/\|/);
$hash{$split[0]} = $_;
}
foreach (sort keys %hash){
print "$hash{$_}\n" unless exists $bad{$_};
}
Which gives:
   
1|data|data|data
4|data|data|data
5|data|data|data

Why I am not getting "success" with this program?

I have written the following program with the hope of getting success. But I could never get it.
my $fileName = 'myfile.txt';
print $fileName,"\n";
if (open MYFILE, "<", $fileName) {
my $Data;
{
local $/ = undef;
$Data = <MYFILE>;
}
my #values = split('\n', $Data);
chomp(#values);
if($values[2] eq '9999999999') {
print "Success"."\n";
}
}
The content of myfile.txt is
160002
something
9999999999
700021
Try splitting by \s*[\r\n]+
my $fileName = 'myfile.txt';
print $fileName,"\n";
if (open MYFILE, "<", $fileName) {
my $Data;
{
local $/ = undef;
$Data = <MYFILE>;
}
my #values = split(/\s*[\r\n]+/, $Data);
if($values[2] eq '9999999999') {
print "Success";
}
}
If myfile.txt contain carriage return (CR, \r), it will not work as expected.
Another possible cause is trailing spaces before linefeed (LF, \n).
You don't need to read an entire file into an array to check one line. Open the file, skip the lines you don't care about, then play with the line you do care about. When you've done what you need to do, stop reading the file. This way, only one line is ever in memory:
my $fileName = 'myfile.txt';
open MYFILE, "<", $fileName or die "$filename: $!";
while( <MYFILE> ) {
next if $. < 3; # $. is the line number
last if $. > 3;
chomp;
print "Success\n" if $_ eq '9999999999';
}
close MYFILE;
my $fileName = 'myfile.txt';
open MYFILE, "<", $fileName || die "$fileName: $!";
while( $rec = <MYFILE> ) {
for ($rec) { chomp; s/\r//; s/^\s+//; s/\s+$//; } #Remove line-feed and space characters
$cnt++;
if ( $rec =~ /^9+$/ ) { print "Success\n"; last; } #if record matches "9"s only
#print "Success" and live the loop
}
close MYFILE;
#Or you can write: if ($cnt==3 and $rec =~ /^9{10}$/) { print "Success\n"; last; }
#If record 3 matches ten "9"s print "Success" and live the loop.