I'm trying to build a primary key into a new file from an original File which has the following structure (tbl_20180615.txt):
573103150033,0664,54,MSS02VEN*',INT,zxzc,,,,,
573103150033,0665,54,MSS02VEN,INT,zxzc,,,,,
573103150080,0659,29,MSS05ARA',INT,zxzc,,,,,
573103150080,0660,29,MSS05ARA ,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
I launch my perl Verify.pl then I send the arguments, the first one is the number of columns to build the primary key in the new file, after I have to send the name of file (original file).
(Verify.pl)
#!/usr/bin/perl
use strict;
use warnings;
my $n1 = $ARGV[0];
my $name = $ARGV[1];
$n1 =~ s/"//g;
my $n2 = $n1 + 1;
my %seen;
my ( $file3 ) = qw(log.txt);
open my $fh3, '>', $file3 or die "Can't open $file3: $!";
print "Loading file ...\n";
open( my $file, "<", "$name" ) || die "Can't read file somefile.txt: $!";
while ( <$file> ) {
chomp;
my #rec = split( /,/, $_, $n2 ); #$n2 sirve para armar la primary key, hacer le split en los campos deseados
for ( my $i = 0; $i < $n1; $i++ ) {
print $fh3 "#rec[$i],";
}
print $fh3 "\n";
}
close( $file );
print "Done!\n";
#########to check duplicates
my ($file4) = qw(log.txt);
print "Checking duplicates records...\n\n";
open (my $file4, "<", "log.txt") || die "Can't read file log.txt: $!";
while ( <$file4> ) {
print if $seen{$_}++;
}
close($file4);
if I send the following instruction
perl Verify.pl 2 tbl_20180615.txt
this code build a new file called "log.txt" with the following structure, splitting the original file () into two columns given by the first argument:
(log.txt)
573103150033,0664,
573103150033,0665,
573103150080,0659,
573103150080,0660,
573103154377,1240,
573103154377,1240,
That works ok, but if I want to read the new file log.txt to check duplicates, it doesn't work, but If I comment the lines to generate the file log.txt (listed above) before the line in the code (###############to check duplicates################) launch the next part of the code it works ok, giving me two duplicates lines and looks like this:
(Result in command line)
573103154377,1240
573103154377,1240
How can I solve this issue?
I think this does what you're asking for. It builds a unique list of derived keys before printing any of them, using a hash to check whether a key has already been generated
Note that I have assigned values to #ARGV to emulate input values. You must remove that statement before running the program with input from the command line
#!/usr/bin/perl
use strict;
use warnings;
use autodie; # Handle bad IO statuses automatically
local #ARGV = qw/ 2 tbl_20180615.txt /; # For testing only
tr/"//d for #ARGV; # "
my ($key_fields, $input_file) = #ARGV;
my $output_file = 'log.txt';
my (#keys, %seen);
print "Loading input ... ";
open my $in_fh, '<', $input_file;
while ( <$in_fh> ) {
chomp;
my #rec = split /,/;
my $key = join ',', #rec[0..$key_fields-1];
push #keys, $key unless $seen{$key}++;
}
print "Done\n";
open my $out_fh, '>', $output_file;
print $out_fh "$_\n" for #keys;
close $out_fh;
output log.txt
573103150033,0664
573103150033,0665
573103150080,0659
573103150080,0660
573103154377,1240
Related
I am trying to send a variable that is defined in an if statement $abc to a new file. The code seems correct but, I know that it is not working because the file is not being created.
Data File Sample:
bos,control,x1,x2,29AUG2016,y1,y2,76.4
bos,control,x2,x3,30AUG2016,y2,y3,78.9
bos,control,x3,x4,01SEP2016,y3,y4,72.5
bos,control,x4,x5,02SEP2016,y4,y5,80.5
Perl Code:
#!/usr/bin/perl
use strict;
use warnings 'all';
use POSIX qw(strftime); #Pull in date
my $currdate = strftime( "%Y%m%d", localtime ); #Date in YYYYMMDD format
my $modded = strftime( "%d%b%Y", localtime ); #Date in DDMONYYYY format
my $newdate = uc $modded; #converts lowercase to uppercase
my $filename = '/home/.../.../text_file'; #Define full file path before opening
open(FILE, '<', $filename) or die "Uh, where's the file again?\n"; #Open file else give up and relay snarky error
while(<FILE>) #Open While Loop
{
chomp;
my #fields = split(',' , $_); #Identify columns
my $site = $fields[0];
my $var1 = $fields[1];
my $var2 = $fields[4];
my $var3 = $fields[7];
my $abc = print "$var1,$var2,$var3\n" if ($var1 =~ "control" && $var2 =~ "$newdate");
open my $abc, '>', '/home/.../.../newfile.txt';
close $abc;
}
close FILE;
In your code you have a few odd things that are likely mistakes.
my $abc = print "$var1,$var2,$var3\n" if ($var1 =~ "c01" && $var2 =~ "$newdate");
print will return success, which it does as 1. So you will print out the string to STDOUT, and then assign 1 to a new lexical variable $abc. $abc is now 1.
All of that only happens if that condition is met. Don't do conditional assignments. The behavior for this is undefined. So if the condition is false, your $abc might be undef. Or something else. Who knows?
open my $abc, '>', '/home/.../.../newfile.txt';
close $abc;
You are opening a new filehandle called $abc. The my will redeclare it. That's a warning that you would get if you had use warnings in your code. It also overwrites your old $abc with a new file handle object.
You don't write anything to the file
... are weird foldernames, but that's probably just obfuscation for your example
I think what you actually want to do is this:
use strict;
use warnings 'all';
# ...
open my $fh, '<', $filename or die $!;
while ( my $line = <$fh> ) {
chomp $line;
my #fields = split( ',', $line );
my $site = $fields[0];
my $var1 = $fields[1];
my $var2 = $fields[4];
my $var3 = $fields[7];
open my $fh_out, '>', '/home/.../.../newfile.txt';
print $fh_out "$var1,$var2,$var3\n" if ( $var1 =~ "c01" && $var2 =~ "$newdate" );
close $fh_out;
}
close $fh;
You don't need the $abc variable in between at all. You can just print to your new file handle $fh_out that's open for writing.
Note that you will overwrite the newfile.txt file every time you have a match in a line inside $filename.
Your current code:
Prints the string
Assigns the result of printing it to a variable
Immediately overwrites that variable with a file handle (assuming open succeeded)
Closes that file handle without using it
Your logic should look more like this:
if ( $var1 =~ "c01" && $var2 =~ "$newdate" ) {
my $abc = "$var1,$var2,$var3\n"
open (my $file, '>', '/home/.../.../newfile.txt') || die("Could not open file: " . $!);
print $file $abc;
close $file;
}
You have a number of problems with your code. In addition to what others have mentioned
You create a new output file every time you find a matching input line. That will leave the file containing only the last printed string
Your test checks whether the text in the second column contains c01, but all of the lines in your sample input have control in the second column, so nothing will be printed
I'm guessing that you want to test for string equality, in which case you need eq instead of =~ which does a regular expression pattern match
I think it should look something more like this
use strict;
use warnings 'all';
use POSIX 'strftime';
my $currdate = uc strftime '%d%b%Y', localtime;
my ($input, $output) = qw/ data.txt newfile.txt /;
open my $fh, '<', $input or die qq{Unable to open "$input" for input: $!};
open my $out_fh, '>', $output or die qq{Unable to open "$output" for output: $!};
while ( <$fh> ) {
chomp;
my #fields = split /,/;
my ($site, $var1, $var2, $var3) = #fields[0,1,4,7];
next unless $var1 eq 'c01' and $var2 eq $currdate;
print $out_fh "$var1,$var2,$var3\n";
}
close $out_fh or die $!;
I have written a Perl script to retrieve a hypothetical protein list from a FASTA file. I'm able to get only the header line with all hypothetical proteins, but I want to have all the sequences along with protein IDs.
The script is as follows.
#!/usr/bin/perl
use strict;
use warnings;
my $line;
open $fh, '<', '/home/Desktop/hypo_proteins/testprotein.fasta' or die "Cannot open file $fh, $!";
open OUT, ">output.txt";
while ( $line = <$fh> ) {
chomp $line;
if ( $line =~ /hypothetical protein/ ) {
print OUT "$line\n";
}
}
close $fh;
The output the I got from the above script is as follows
>gi|113461928|ref|YP_718205.1| hypothetical protein HS_1792 [Haemophilus somnus 129PT]
>gi|113460158|ref|YP_718214.1| hypothetical protein HS_0009 [Haemophilus somnus 129PT]
>gi|113460165|ref|YP_718221.1| hypothetical protein HS_0016 [Haemophilus somnus 129PT]
But I need the output as follows:
>gi|113461928|ref|YP_718205.1| hypothetical protein HS_1792 [Haemophilus somnus 129PT]
MFKSLIQFFKSKSNTSNIKKENAVQRQERQDIEGWITPYSGQELLNTELRQHHLGLLWQQVSMTREMFEH
LYQKPIERYAEMVQLLPASESHHHSHLGGMLDHGLEVISFAAKLRQNYVLPLNAAPEDQAKQKDAWTAAV
IYLALVHDIGKSIVDIEIQLQDGKRWLAWHGIPTLPYKFRYIKQRDYELHPVLGGFIANQLIAKETFDWL
ATYPEVFSALMYAMAGHYDKANVLAEIVQKADQNSVALALGGDITKLVQKPVISFAKQLILALRYLISQK
FKISSKGPGDGWLTEDGLWLMSKTTADQIRAYLMGQGISVPSDNRKLFDEMQAHRVIESTSEGNAIWYCQ
LSADAGWKPKDKFSLLRIKPEVIWDNIDDRPELFAGTICVVEKENEAEEKISNTVNEVQDTVPINKKENI
ELTSNLQEENTALQSLNPSQNPEVVVENCDNNSVDFLLNMFSDNNEQQVMNIPSADAEAGTTMILKSEPE
NLNTHIEVEANAIPKLPTNDDTHLKSEGQKFVDWLKDKLFKKQLTFNDRTAKVHIVNDCLFIVSPSSFEL
YLQEKGESYDEECINNLQYEFQALGLHRKRIIKNDTINFWRCKVIGPKKESFLVGYLVPNTRLFFGDKIL
INNRHLLLEE
This will do as you ask
#!/usr/bin/perl
use strict;
use warnings;
use constant INPUT => '/home/Desktop/hypo_proteins/testprotein.fasta';
use constant OUTPUT => 'output.txt';
open my $in_fh, '<', INPUT or die "Cannot open input file: $!";
open my $out_fh, '>', OUTPUT or die "Cannot open output file: $!";
select $out_fh;
my $print;
while ( <$in_fh> ) {
if ( /^>/ ) {
$print = /hypothetical protein/;
}
print if $print;
}
Regarding your (deleted) question about this solution, it uses the implicit variable $_ in several places. It is equivalent to this program
#!/usr/bin/perl
use strict;
use warnings;
use constant INPUT => '/home/Desktop/hypo_proteins/testprotein.fasta';
use constant OUTPUT => 'output.txt';
open my $in_fh, '<', INPUT or die "Cannot open input file: $!";
open my $out_fh, '>', OUTPUT or die "Cannot open output file: $!";
select $out_fh;
my $print;
while ( defined( $_ = <$in_fh>) ) {
if ( $_ =~ /^>/ ) {
$print = ( $_ =~ /hypothetical protein/ );
}
print $_ if $print;
}
So I hope you can see that $print = $_ =~ /hypothetical protein/ checks whether the current line (in $_) contains the string hypothetical protein and sets $print to a true value if so.
Because $print is defined outside the loop it keeps its value across iterations of the loop, and as you can see it is only changed on header lines, when the current line begins with >, and will stay true until the next header line, so that print if $print will output the header containing hypothetical protein and all following lines until the next header
I hope that helps
Apologies if this is a bit long winded, bu i really appreciate an answer here as i am having difficulty getting this to work.
Building on from this question here, i have this script that works on a csv file(orig.csv) and provides a csv file that i want(format.csv). What I want is to make this more generic and accept any number of '.csv' files and provide a 'output_csv' for each inputed file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my #labels;
while (<$orig_fh>) {
chomp;
my #fields = split /,/, $_, -1;
my ($label, $max_val) = #fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \#fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \#fields;
push #labels, $label;
}
}
for my $label (#labels) {
print $format_fh join(',', #{ $data{$label} }), "\n";
}
i was hoping to use this script from here but am having great difficulty putting the 2 together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my #files = readdir(DIR);
closedir DIR;
foreach my $file (#files) {
open( FH, "/data/tmp/$file" ) or die "$!";
my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
open(my $fh, '>', $outfile);
while (<FH>) {
my ($line) = $_;
chomp($line);
if ( $line =~ m/KAC 50/ ) {
print $fh $_;
}
}
close($fh);
}
the script reads all the files in the directory and finds the line with this string 'KAC 50' and then appends that line to an output_$file for that inputfile. so there will be 1 output_$file for every inputfile that is read
issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' files in the directory and produces a
'output_.' and 'output_..' file
- it will also do the same with this script file.
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
**EDIT::I have tried combining the versions but am getting errors.Advise greatly appreciated*
UserName#wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
combined code::
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print $format_file scalar <$orig_file>; # Copy header line
my %data;
my #labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my #fields = split /,/, $_, -1;
my ($label, $max_val) = #fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \#fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \#fields;
push #labels, $label;
}
}
for my $label (#labels) {
#print $format_fh join(',', #{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', #{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
How do you plan on inputting the list of files to process and their preferred output destination? Maybe just have a fixed directory that you want to process all the cvs files, and prefix the result.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/cvs/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
.... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.
I am trying to solve below issues.
I have 2 files. Address.txt and File.txt. I want to replace all A/B/C/D (File.txt) with corresponding string value (Read from Address.txt file) using perl script. It's not replacing in my output file. I am getting same content of File.txt.
I tried below codes.
Here is Address.txt file
A,APPLE
B,BAL
C,CAT
D,DOG
E,ELEPHANT
F,FROG
G,GOD
H,HORCE
Here is File.txt
A B C
X Y X
M N O
D E F
F G H
Here is my code :
use strict;
use warnings;
open (MYFILE, 'Address.txt');
foreach (<MYFILE>){
chomp;
my #data_new = split/,/sm;
open INPUTFILE, "<", $ARGV[0] or die $!;
open OUT, '>ariout.txt' or die $!;
my $src = $data_new[0];
my $des = $data_new[1];
while (<INPUTFILE>) {
# print "In while :$src \t$des\n";
$_ =~ s/$src/$des/g;
print OUT $_;
}
close INPUTFILE;
close OUT;
# /usr/bin/perl -p -i -e "s/A/APPLE/g" ARGV[0];
}
close (MYFILE);
If i Write $_ =~ s/A/Apple/g;
Then output file is fine and A is replacing with "Apple". But when dynamically coming it's not getting replaced.
Thanks in advance. I am new in perl scripting language . Correct me if I am wrong any where.
Update 1: I updated below code . It's working fine now. My questions Big O of this algo.
Code :
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "Address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "File1.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my #line = split;
for my $char ( #line ) {
( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
}
print $out_fh "\n";
}
Not entirely sure how you want your output formatted. Do you want to keep the rows and columns as is?
I took a similar approach as above but kept the formatting the same as in your 'file.txt' file:
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "file.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my #line = split;
for my $char ( #line ) {
( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
}
print $out_fh "\n";
}
That will give you the output:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
Here's another option that lets Perl handle the opening and closing of files:
use strict;
use warnings;
my $addresses_txt = pop;
my %hash = map { $1 => $2 if /(.+?),(.+)/ } <>;
push #ARGV, $addresses_txt;
while (<>) {
my #array;
push #array, $hash{$_} // $_ for split;
print "#array\n";
}
Usage: perl File.txt Addresses.txt [>outFile.txt]
The last, optional parameter directs output to a file.
Output on your dataset:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
The name of the addresses' file is implicitly popped off of #ARGV for use later. Then, a hash is built, using the key/value pairs in File.txt.
The addresses' file is read, splitting each line into its single elements, and the defined-or (//) operator is used to returned the defined hash item or the single element, which is then pushed onto #array. Finally, the array is interpolated in a print statement.
Hope this helps!
First, here is your existing program, rewritten slightly
open the address file
convert the address file to a hash so that the letters are the keys and the strings the values
open the other file
read in the single line in it
split the line into single letters
use the letters to lookup in the hash
use strict;
use warnings;
open(my $a,"Address.txt")||die $!;
my %address=map {split(/,/) } map {split(' ')} <$a>;
open(my $f,"File.txt")||die $!;
my $list=<$f>;
for my $letter (split(' ',$list)) {
print $address{$letter}."\n" if (exists $address{$letter});
}
to make another file with the substitutions in place alter the loop that processes $list
for my $letter (split(' ',$list)) {
if (exists $address{$letter}) {
push #output, $address{$letter};
}
else {
push #output, $letter;
}
}
open(my $o,">newFile.txt")||die $!;
print $o "#output";
Your problem is that in every iteration of your foreach loop you overwrite any changes made earlier to output file.
My solution:
use strict;
use warnings;
open my $replacements, 'Address.txt' or die $!;
my %r;
foreach (<$replacements>) {
chomp;
my ($k, $v) = split/,/sm;
$r{$k} = $v;
}
my $re = '(' . join('|', keys %r) . ')';
open my $input, "<", $ARGV[0] or die $!;
while (<$input>) {
s/$re/$r{$1}/g;
print;
}
#!/usr/bin/perl -w
# to replace multiple text strings in a file with text from another file
#select text from 1st file, replace in 2nd file
$file1 = 'Address.txt'; $file2 = 'File.txt';
# save the strings by which to replace
%replacement = ();
open IN,"$file1" or die "cant open $file1\n";
while(<IN>)
{chomp $_;
#a = split ',',$_;
$replacement{$a[0]} = $a[1];}
close IN;
open OUT,">replaced_file";
open REPL,"$file2" or die "cant open $file2\n";
while(<REPL>)
{chomp $_;
#a = split ' ',$_; #replaced_data = ();
# replace strings wherever possible
foreach $i(#a)
{if(exists $replacement{$i}) {push #replaced_data,$replacement{$i};}
else {push #replaced_data,$i;}
}
print OUT trim(join " ",#replaced_data),"\n";
}
close REPL; close OUT;
########################################
sub trim
{
my $str = $_[0];
$str=~s/^\s*(.*)/$1/;
$str=~s/\s*$//;
return $str;
}
I have 2 CSV files, file1.csv and file2.csv . I have to pick each row of column 3 in file1 and iterate through column 3 of file2 to find a match and if the match occurs then display the complete matched rows(from column 1,2 and 3)only from file2.csv in a third csv file.My code till now only fetches the column 3 from both the csv files. How can I match column 3 of both the files and display the matched rows ? Please help.
File1:
Comp_Name,Date,Files
Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile26;
Component1,2013/04/25,/Com/src2;
File2:
Comp_name,Date,Files
Component1,2013/04/07,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component2,2013/04/23,/Com/src/folder1/folder2/newfile.txt;
Component3,2013/04/27,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/25,/Com/src2;
Output format:
Comp_Name,Date,Files
Component1,2013/04/07,/Com/src/folder1/folder2/newfile.txt;
Component2,2013/04/23,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component3,2013/04/27,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/25,/Com/src2;
Code:
use strict;
use warnings;
my $file1 = "C:\\pick\\file1.csv";
my $file2 = "C:\\pick\\file2.csv";
my $file3 = "C:\\pick\\file3.csv";
my $type;
my $type1;
my #fields;
my #fields2;
open(my $fh, '<:encoding(UTF-8)', $file1) or die "Could not open file '$file1' $!"; #Throw error if file doesn't open
while (my $row = <$fh>) # reading each row till end of file
{
chomp $row;
#fields = split ",",$row;
$type = $fields[2];
print"\n$type";
}
open(my $fh2, '<:encoding(UTF-8)', $file2) or die "Could not open file '$file2' $!"; #Throw error if file doesn't open
while (my $row2 = <$fh2>) # reading each row till end of file
{
chomp $row2;
#fields2 = split ",",$row2;
$type1 = $fields2[2];
print"\n$type1";
foreach($type)
{
if ($type eq $type1)
{
print $row2;
}
}
}
This is not a matter to over complicate.. I would personally use a module Text::CSV_XS or as mentioned already Tie::Array::CSV to perform here.
If you're having trouble using a module, I suppose this would be an alternative. You can modify to your desired wants and needs, I used the data you supplied and got the results you want.
use strict;
use warnings;
open my $fh1, '<', 'file1.csv' or die "failed open: $!";
open my $fh2, '<', 'file2.csv' or die "failed open: $!";
open my $out, '>', 'file3.csv' or die "failed open: $!";
my %hash1 = map { $_ => 1 } <$fh1>;
my %hash2 = map { $_ => 1 } <$fh2>;
close $fh1;
close $fh2;
my #result =
map { join ',', $hash1{$_->[2]} ? () : $_->[0], $_->[1], $_->[2] }
sort { $a->[1] <=> $b->[1] || $a->[2] cmp $b->[2] || $a->[0] cmp $b->[0] }
map { s/\s*$//; [split /,/] } keys %hash2;
print $out "$_\n" for #result;
close $out;
__OUTPUT__
Comp_name,Date,Files
Component1,2013/04/07,/Com/src/folder1/folder2/newfile.txt;
Component2,2013/04/23,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component3,2013/04/27,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/25,/Com/src2;
This is a job for a hash ( my %file1)
so instead of continually opening the files you can read the contents into hashes
#fields = split ",",$row;
$type = $fields[2];
$hash1{$type} = $row;
I see you have duplicates too so the hash entry will be replaced upon duplication
so you can store an array of values in the hash
$hash1{$type} = [] unless $hash1{$type};
push #{$hash1{$type}}, $row;
Your next problem is how to traverse the arrays inside hashes
Here is an example using my Tie::Array::CSV module. It uses some clever Perl tricks to represent each CSV file as a Perl array of arrayrefs. I use it to make an index of the first file, then to loop over the second file and finally to output to the third.
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::Array::CSV;
tie my #file1, 'Tie::Array::CSV', 'file1' or die 'Cannot tie file1';
tie my #file2, 'Tie::Array::CSV', 'file2' or die 'Cannot tie file2';
tie my #output, 'Tie::Array::CSV', 'output' or die 'Cannot tie output';
# setup a match table from file2
my %match = map { ( $_->[-1] => 1 ) } #file1[1..$#file1];
#header
push #output, $file2[0];
# iterate over file2
for my $row ( #file2[1..$#file2] ) {
next unless $match{$row->[-1]}; # check for match
push #output, $row; # print to output if match
}
The output I get is different from yours, but I cannot figure out why your output does not include testfile25 and src2.