Match column in 2 CSV files and display output in third file - perl

I have two CSV files, file1.csv and file2.csv. For each row, I have to take the column 3 value from file1 and look for a matching column 3 value in file2; when a match occurs, the complete matched row (columns 1, 2 and 3) from file2.csv should be written to a third CSV file. My code so far only fetches column 3 from both CSV files. How can I match column 3 of the two files and write out the matched rows? Please help.
File1:
Comp_Name,Date,Files
Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile26;
Component1,2013/04/25,/Com/src2;
File2:
Comp_name,Date,Files
Component1,2013/04/07,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component2,2013/04/23,/Com/src/folder1/folder2/newfile.txt;
Component3,2013/04/27,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/25,/Com/src2;
Output format:
Comp_Name,Date,Files
Component1,2013/04/07,/Com/src/folder1/folder2/newfile.txt;
Component2,2013/04/23,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component3,2013/04/27,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/25,/Com/src2;
Code:
use strict;
use warnings;
my $file1 = "C:\\pick\\file1.csv";
my $file2 = "C:\\pick\\file2.csv";
my $file3 = "C:\\pick\\file3.csv";
my $type;
my $type1;
my @fields;
my @fields2;
open(my $fh, '<:encoding(UTF-8)', $file1) or die "Could not open file '$file1' $!"; #Throw error if file doesn't open
while (my $row = <$fh>) # reading each row till end of file
{
    chomp $row;
    @fields = split ",", $row;
    $type = $fields[2];
    print "\n$type";
}
open(my $fh2, '<:encoding(UTF-8)', $file2) or die "Could not open file '$file2' $!"; #Throw error if file doesn't open
while (my $row2 = <$fh2>) # reading each row till end of file
{
    chomp $row2;
    @fields2 = split ",", $row2;
    $type1 = $fields2[2];
    print "\n$type1";
    foreach ($type)
    {
        if ($type eq $type1)
        {
            print $row2;
        }
    }
}

This is not a matter to overcomplicate. I would personally use a module such as Text::CSV_XS or, as already mentioned, Tie::Array::CSV here (a Text::CSV_XS sketch is shown after the output below).
If you're having trouble using a module, I suppose this would be an alternative. You can modify it to your needs; I used the data you supplied and got the results you want.
use strict;
use warnings;
open my $fh1, '<', 'file1.csv' or die "failed open: $!";
open my $fh2, '<', 'file2.csv' or die "failed open: $!";
open my $out, '>', 'file3.csv' or die "failed open: $!";
my %hash1 = map { $_ => 1 } <$fh1>;
my %hash2 = map { $_ => 1 } <$fh2>;
close $fh1;
close $fh2;
my @result =
    map  { join ',', $hash1{$_->[2]} ? () : $_->[0], $_->[1], $_->[2] }
    sort { $a->[1] <=> $b->[1] || $a->[2] cmp $b->[2] || $a->[0] cmp $b->[0] }
    map  { s/\s*$//; [split /,/] } keys %hash2;
print $out "$_\n" for @result;
close $out;
__OUTPUT__
Comp_name,Date,Files
Component1,2013/04/07,/Com/src/folder1/folder2/newfile.txt;
Component2,2013/04/23,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component3,2013/04/27,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/25,/Com/src2;
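If you do want to go the module route, here is a rough sketch of the same matching done with Text::CSV_XS. This is only an illustration, not tested against your data: it assumes Text::CSV_XS is installed and uses the file names from the question.
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" })
    or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag;
open my $fh1, '<:encoding(UTF-8)', 'file1.csv' or die "failed open: $!";
open my $fh2, '<:encoding(UTF-8)', 'file2.csv' or die "failed open: $!";
open my $out, '>:encoding(UTF-8)', 'file3.csv' or die "failed open: $!";
# collect the column-3 values (Files) seen in file1
my %wanted;
while ( my $row = $csv->getline($fh1) ) {
    $wanted{ $row->[2] } = 1;
}
# copy every row of file2 whose column 3 appears in file1
# (the header row matches itself, so it is carried through to file3)
while ( my $row = $csv->getline($fh2) ) {
    $csv->print( $out, $row ) if $wanted{ $row->[2] };
}
close $_ for $fh1, $fh2, $out;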

This is a job for a hash (my %hash1). Instead of repeatedly opening the files, you can read their contents into hashes:
@fields = split ",",$row;
$type = $fields[2];
$hash1{$type} = $row;
I see you have duplicates too, so a plain hash entry would be overwritten each time a duplicate appears. Instead, you can store an array of values in the hash:
$hash1{$type} = [] unless $hash1{$type};
push @{$hash1{$type}}, $row;
Your next problem is how to traverse the arrays inside the hash; a sketch follows below.
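A minimal sketch of that traversal (assuming %hash1 was filled as above, with one arrayref of stored rows per column-3 value) might look like this:
# walk every column-3 value and print each stored row for it
for my $type ( sort keys %hash1 ) {
    for my $row ( @{ $hash1{$type} } ) {
        print "$row\n";
    }
}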

Here is an example using my Tie::Array::CSV module. It uses some clever Perl tricks to represent each CSV file as a Perl array of arrayrefs. I use it to make an index of the first file, then to loop over the second file and finally to output to the third.
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::Array::CSV;
tie my @file1, 'Tie::Array::CSV', 'file1' or die 'Cannot tie file1';
tie my @file2, 'Tie::Array::CSV', 'file2' or die 'Cannot tie file2';
tie my @output, 'Tie::Array::CSV', 'output' or die 'Cannot tie output';
# set up a match table from file1
my %match = map { ( $_->[-1] => 1 ) } @file1[1..$#file1];
# header
push @output, $file2[0];
# iterate over file2
for my $row ( @file2[1..$#file2] ) {
    next unless $match{$row->[-1]};    # check for match
    push @output, $row;                # write to output if match
}
The output I get is different from yours, but I cannot figure out why your output does not include testfile25 and src2.

Related

Writing a file and Reading it in Perl

I'm trying to build a primary key in a new file from an original file with the following structure (tbl_20180615.txt):
573103150033,0664,54,MSS02VEN*',INT,zxzc,,,,,
573103150033,0665,54,MSS02VEN,INT,zxzc,,,,,
573103150080,0659,29,MSS05ARA',INT,zxzc,,,,,
573103150080,0660,29,MSS05ARA ,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
I launch my Perl script Verify.pl and pass it the arguments: the first is the number of columns used to build the primary key in the new file, and the second is the name of the original file.
(Verify.pl)
#!/usr/bin/perl
use strict;
use warnings;
my $n1 = $ARGV[0];
my $name = $ARGV[1];
$n1 =~ s/"//g;
my $n2 = $n1 + 1;
my %seen;
my ( $file3 ) = qw(log.txt);
open my $fh3, '>', $file3 or die "Can't open $file3: $!";
print "Loading file ...\n";
open( my $file, "<", "$name" ) || die "Can't read file somefile.txt: $!";
while ( <$file> ) {
    chomp;
    my @rec = split( /,/, $_, $n2 ); # $n2 limits the split so only the fields needed for the primary key are separated
    for ( my $i = 0; $i < $n1; $i++ ) {
        print $fh3 "@rec[$i],";
    }
    print $fh3 "\n";
}
close( $file );
print "Done!\n";
######### to check duplicates
my ($file4) = qw(log.txt);
print "Checking duplicates records...\n\n";
open (my $file4, "<", "log.txt") || die "Can't read file log.txt: $!";
while ( <$file4> ) {
    print if $seen{$_}++;
}
close($file4);
If I run the following command:
perl Verify.pl 2 tbl_20180615.txt
the code builds a new file called "log.txt" with the following structure, keeping only the number of key columns given by the first argument:
(log.txt)
573103150033,0664,
573103150033,0665,
573103150080,0659,
573103150080,0660,
573103154377,1240,
573103154377,1240,
That works OK, but when I then try to read the new file log.txt to check for duplicates, it doesn't work. However, if I comment out the lines that generate log.txt (listed above, before the `######### to check duplicates` line) and run only the second part of the code, it works and reports the two duplicate lines:
(Result in command line)
573103154377,1240
573103154377,1240
How can I solve this issue?
I think this does what you're asking for. It builds a unique list of derived keys before printing any of them, using a hash to check whether a key has already been generated.
Note that I have assigned values to @ARGV to emulate input values. You must remove that statement before running the program with input from the command line.
#!/usr/bin/perl
use strict;
use warnings;
use autodie; # Handle bad IO statuses automatically
local @ARGV = qw/ 2 tbl_20180615.txt /; # For testing only
tr/"//d for @ARGV; # "
my ($key_fields, $input_file) = @ARGV;
my $output_file = 'log.txt';
my (@keys, %seen);
print "Loading input ... ";
open my $in_fh, '<', $input_file;
while ( <$in_fh> ) {
    chomp;
    my @rec = split /,/;
    my $key = join ',', @rec[0..$key_fields-1];
    push @keys, $key unless $seen{$key}++;
}
print "Done\n";
open my $out_fh, '>', $output_file;
print $out_fh "$_\n" for @keys;
close $out_fh;
output log.txt
573103150033,0664
573103150033,0665
573103150080,0659
573103150080,0660
573103154377,1240

Duplicate values in column

I have an original file with the following columns:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,C,Sell,0.25,2000
02-May-2018,JPM,Sell,0.25,3000
02-May-2018,WFC,Sell,0.25,5000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,GOOG,Sell,0.25,8000
02-May-2018,GOOG,Sell,0.25,9000
02-May-2018,C,Sell,0.25,2000
02-May-2018,AAPL,Sell,0.25,3000
I am trying to print the original line whenever the value in the second column appears more than 2 times. For example, if AAPL appears more than 2 times, the desired result should be:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
So far I have written the following, which prints the results multiple times, which is wrong. Can you please help me see what I am doing wrong?
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
%count = ();
@symbol = ();
while ($line = <FILE>)
{
    chomp $line;
    (@data) = split(/,/,$line);
    $count{$data[1]}++;
    @keys = sort {$count{$a} cmp $count{$b}} keys %count;
    for my $key (@keys)
    {
        if ( $count{$key} > 2 )
        {
            print "$line\n";
        }
    }
}
I'd do it something like this - store lines you've seen in a 'buffer' and print them out again if the condition is hit (before continuing to print as you go):
#!/usr/bin/env perl
use strict;
use warnings;
my %buffer;
my %count_of;
while ( my $line = <> ) {
    my ( $date, $ticker, @values ) = split /,/, $line;
    #increment the count
    $count_of{$ticker}++;
    if ( $count_of{$ticker} < 3 ) {
        #count limit not hit, so stash the current line in the buffer.
        $buffer{$ticker} .= $line;
        next;
    }
    #print the buffer if the count has been hit
    if ( $count_of{$ticker} == 3 ) {
        print $buffer{$ticker};
    }
    #only gets to here once the limit is hit, so just print normally.
    print $line;
}
With your input data, this outputs:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
Simple answer:
push @{ $lines{(split",")[1]} }, $_ while <>;
print @{ $lines{$_} } for grep @{ $lines{$_} } > 2, sort keys %lines;
perl program.pl inputfile > outputfile
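If the compressed two-line version is hard to follow, here is my expanded reading of it, with strict and warnings added; the logic should be equivalent:
use strict;
use warnings;
my %lines;    # ticker => arrayref of all lines seen for that ticker
while ( my $line = <> ) {
    my $ticker = ( split /,/, $line )[1];
    push @{ $lines{$ticker} }, $line;
}
# print the groups whose ticker occurred more than twice
for my $ticker ( sort keys %lines ) {
    print @{ $lines{$ticker} } if @{ $lines{$ticker} } > 2;
}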
You need to read the input file twice, because you don't know the final counts until you get to the end of the file
use strict;
use warnings 'all';
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
my %counts;
{
    open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
    while ( <$fh> ) {
        my @fields = split /,/;
        ++$counts{$fields[1]};
    }
}
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
open my $out_fh, '>', $TMPFILE1 or die "Could not open $TMPFILE1: $!";
while ( <$fh> ) {
    my @fields = split /,/;
    print $out_fh $_ if $counts{$fields[1]} > 2;
}
output
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
This should work:
use strict;
use warnings;
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
my %data;
while ( my $line = <FILE> ) {
    chomp $line;
    my @line = split /,/, $line;
    push(@{$data{$line[1]}}, $line);
}
foreach my $key (keys %data) {
    if(@{$data{$key}} > 2) {
        print "$_\n" foreach @{$data{$key}};
    }
}

Reading and comparing lines in Perl

I am having trouble getting my Perl script to work. The issue might be related to reading the extract file line by line within the while loop; any help would be appreciated. There are two files:
A "bad" file that contains a list of bad IDs (hundreds of IDs):
2
3
An extract file that contains delimited data with the ID in field 1 (millions of rows):
1|data|data|data
2|data|data|data
2|data|data|data
2|data|data|data
3|data|data|data
4|data|data|data
5|data|data|data
I am trying to remove all the rows from the large extract file where the IDs match. There can be multiple rows where the ID matches. The extract is sorted.
#use strict;
#use warnning;
$SourceFile = $ARGV[0];
$ToRemove = $ARGV[1];
$FieldNum = $ARGV[2];
$NewFile = $ARGV[3];
$LargeRecords = $ARGV[4];
open(INFILE, $SourceFile) or die "Can't open source file: $SourceFile \n";
open(REMOVE, $ToRemove) or die "Can't open toRemove file: $ToRemove \n";
open(OutGood, "> $NewFile") or die "Can't open good output file \n";
open(OutLarge, "> $LargeRecords") or die "Can't open Large Records output file \n";
#Read in the list of bad IDs into array
@array = <REMOVE>;
#Loop through each bad record
foreach (@array)
{
    $badID = $_;
    #read the extract line by line
    while(<INFILE>)
    {
        #take the line and split it into
        @fields = split /\|/, $_;
        my $extractID = $fields[$FieldNum];
        #print "Here's what we got: $badID and $extractID\n";
        while($extractID == $badID)
        {
            #Write out bad large records
            print OutLarge join '|', @fields;
            #Get the next line in the extract file
            @fields = split /\|/, <INFILE>;
            my $extractID = $fields[$FieldNum];
            $found = 1; #true
            #print " We got a match!!";
            #remove item after it has been found
            my $input_remove = $badID;
            @array = grep {!/$input_remove/} @array;
        }
        print OutGood join '|', @fields;
    }
}
Try this:
$ perl -F'|' -nae 'BEGIN {while(<>){chomp; $bad{$_}++;last if eof;}} print unless $bad{$F[0]};' bad good
First, you are lucky: the number of bad IDs is small. That means you can read the list of bad IDs once and stick them in a hash table without running into any difficulty with memory usage. Once you have them in a hash, you just read the big data file line by line, skipping output for bad IDs.
#!/usr/bin/env perl
use strict;
use warnings;
# hardwired for convenience
my $bad_id_file = 'bad.txt';
my $data_file = 'data.txt';
my $bad_ids = read_bad_ids($bad_id_file);
remove_data_with_bad_ids($data_file, $bad_ids);
sub remove_data_with_bad_ids {
    my $file = shift;
    my $bad = shift;
    open my $in, '<', $file
        or die "Cannot open '$file': $!";
    while (my $line = <$in>) {
        if (my ($id) = extract_id(\$line)) {
            exists $bad->{ $id } or print $line;
        }
    }
    close $in
        or die "Cannot close '$file': $!";
    return;
}
sub read_bad_ids {
    my $file = shift;
    open my $in, '<', $file
        or die "Cannot open '$file': $!";
    my %bad;
    while (my $line = <$in>) {
        if (my ($id) = extract_id(\$line)) {
            $bad{ $id } = undef;
        }
    }
    close $in
        or die "Cannot close '$file': $!";
    return \%bad;
}
sub extract_id {
    my $string_ref = shift;
    if (my ($id) = ($$string_ref =~ m{\A ([0-9]+) }x)) {
        return $id;
    }
    return;
}
I'd use a hash as follows:
use warnings;
use strict;
my @bad = qw(2 3);
my %bad;
$bad{$_} = 1 foreach @bad;
my @file = qw(1|data|data|data 2|data|data|data 2|data|data|data 2|data|data|data 3|data|data|data 4|data|data|data 5|data|data|data);
my %hash;
foreach (@file){
    my @split = split(/\|/);
    $hash{$split[0]} = $_;
}
foreach (sort keys %hash){
    print "$hash{$_}\n" unless exists $bad{$_};
}
Which gives:
1|data|data|data
4|data|data|data
5|data|data|data

How to replace string dynamically using perl script

I am trying to solve the following issue.
I have two files, Address.txt and File.txt. I want to replace every key letter in File.txt (A/B/C/D, and so on) with the corresponding string value read from Address.txt, using a Perl script. The replacement is not happening; my output file has the same content as File.txt.
I tried the code below.
Here is the Address.txt file:
A,APPLE
B,BAL
C,CAT
D,DOG
E,ELEPHANT
F,FROG
G,GOD
H,HORCE
Here is File.txt
A B C
X Y X
M N O
D E F
F G H
Here is my code :
use strict;
use warnings;
open (MYFILE, 'Address.txt');
foreach (<MYFILE>){
    chomp;
    my @data_new = split /,/sm;
    open INPUTFILE, "<", $ARGV[0] or die $!;
    open OUT, '>ariout.txt' or die $!;
    my $src = $data_new[0];
    my $des = $data_new[1];
    while (<INPUTFILE>) {
        # print "In while :$src \t$des\n";
        $_ =~ s/$src/$des/g;
        print OUT $_;
    }
    close INPUTFILE;
    close OUT;
    # /usr/bin/perl -p -i -e "s/A/APPLE/g" ARGV[0];
}
close (MYFILE);
close (MYFILE);
If I hard-code $_ =~ s/A/Apple/g; then the output file is fine and A is replaced with "Apple". But when the pattern comes in dynamically, nothing gets replaced.
Thanks in advance. I am new to Perl; correct me if I am wrong anywhere.
Update 1: I updated the code as below, and it works fine now. My question: what is the big-O complexity of this algorithm?
Code :
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "Address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "File1.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
    my @line = split;
    for my $char ( @line ) {
        ( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
    }
    print $out_fh "\n";
}
Not entirely sure how you want your output formatted. Do you want to keep the rows and columns as is?
I took a similar approach as above but kept the formatting the same as in your 'file.txt' file:
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "file.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
    my @line = split;
    for my $char ( @line ) {
        ( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
    }
    print $out_fh "\n";
}
That will give you the output:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
Here's another option that lets Perl handle the opening and closing of files:
use strict;
use warnings;
my $addresses_txt = pop;
my %hash = map { $1 => $2 if /(.+?),(.+)/ } <>;
push @ARGV, $addresses_txt;
while (<>) {
    my @array;
    push @array, $hash{$_} // $_ for split;
    print "@array\n";
}
Usage: perl File.txt Addresses.txt [>outFile.txt]
The last, optional parameter directs output to a file.
Output on your dataset:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
The name of the addresses' file is implicitly popped off of @ARGV for use later. Then a hash is built using the key/value pairs in File.txt.
The addresses' file is read, splitting each line into its single elements, and the defined-or (//) operator is used to return either the defined hash item or the element itself, which is then pushed onto @array. Finally, the array is interpolated in a print statement.
Hope this helps!
First, here is your existing program, rewritten slightly. It will:
open the address file
convert the address file to a hash so that the letters are the keys and the strings the values
open the other file
read in the single line in it
split the line into single letters
use the letters to look up values in the hash
use strict;
use warnings;
open(my $a,"Address.txt")||die $!;
my %address=map {split(/,/) } map {split(' ')} <$a>;
open(my $f,"File.txt")||die $!;
my $list=<$f>;
for my $letter (split(' ',$list)) {
    print $address{$letter}."\n" if (exists $address{$letter});
}
To make another file with the substitutions in place, alter the loop that processes $list:
my @output;
for my $letter (split(' ',$list)) {
    if (exists $address{$letter}) {
        push @output, $address{$letter};
    }
    else {
        push @output, $letter;
    }
}
open(my $o,">newFile.txt")||die $!;
print $o "@output";
Your problem is that in every iteration of your foreach loop you overwrite any changes made earlier to the output file.
My solution:
use strict;
use warnings;
open my $replacements, 'Address.txt' or die $!;
my %r;
foreach (<$replacements>) {
    chomp;
    my ($k, $v) = split /,/sm;
    $r{$k} = $v;
}
my $re = '(' . join('|', keys %r) . ')';
open my $input, "<", $ARGV[0] or die $!;
while (<$input>) {
    s/$re/$r{$1}/g;
    print;
}
#!/usr/bin/perl -w
# to replace multiple text strings in a file with text from another file
# select text from 1st file, replace in 2nd file
$file1 = 'Address.txt'; $file2 = 'File.txt';
# save the strings by which to replace
%replacement = ();
open IN,"$file1" or die "cant open $file1\n";
while(<IN>)
{
    chomp $_;
    @a = split ',',$_;
    $replacement{$a[0]} = $a[1];
}
close IN;
open OUT,">replaced_file";
open REPL,"$file2" or die "cant open $file2\n";
while(<REPL>)
{
    chomp $_;
    @a = split ' ',$_; @replaced_data = ();
    # replace strings wherever possible
    foreach $i (@a)
    {
        if(exists $replacement{$i}) {push @replaced_data,$replacement{$i};}
        else {push @replaced_data,$i;}
    }
    print OUT trim(join " ",@replaced_data),"\n";
}
close REPL; close OUT;
########################################
sub trim
{
    my $str = $_[0];
    $str=~s/^\s*(.*)/$1/;
    $str=~s/\s*$//;
    return $str;
}

Extracting specific multiple line of records that is pipe delimited in perl

I have a file that looks like
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
I want to separate the records by country. I have stored each line in the array variable @fields:
my @fields = split(/\|/, $_ );
making $fields[3] the basis for sorting. I want to separate the records into 2 output text files.
OUTPUT TEXT FILE 1:
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
OUTPUT TEXT FILE 2
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
Everything from JPN should go to output text file 1 and the non-JPN countries to output text file 2.
Here's the code I am trying to get working:
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my @fields;
my $tmp_var;
my $count;
my ($line, $i);
my $filename = 'data.txt';
open(my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open(OUTPUTA, ">", 'JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
open(OUTPUTB, ">", 'Non-JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
my $fh;
while (<$input_fh>) {
    chomp;
    my @fields = split /\|/;
    if ($fields[0] eq 'NAME') {
        for ($i=1; $i < @fields; $i++) {
            if ($fields[3] eq 'JPN') {
                $fh = $_;
                print OUTPUTA $fh;
            }
            else {
                $fh = $_;
                print OUTPUTB $fh;
            }
        }
    }
}
close(OUTPUTA);
close(OUTPUTB)
I still have no luck with it :(
Here is the way I think ikegami was saying, but I've never tried this before (although it gave the correct results).
#!/usr/bin/perl
use strict;
use warnings;
open my $jpn_fh, ">", 'o33.txt' or die $!;
open my $other_fh, ">", 'o44.txt' or die $!;
my $fh;
while (<DATA>) {
    if (/^NAME/) {
        if (/JPN$/) {
            $fh = $jpn_fh;
        }
        else {
            $fh = $other_fh;
        }
    }
    print $fh $_;
}
close $jpn_fh or die $!;
close $other_fh or die $!;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
You didn't say what you needed help with, so I'm assuming it's coming up with an algorithm. Here's a good one:
Open the file to read.
Open the file for the JPN entries.
Open the file for the non-JPN entries.
While not eof,
    Read a line.
    Parse the line.
    If it's the first line of a record,
        If the person's country is JPN,
            Set current file handle to the file handle for JPN entries.
        Else,
            Set current file handle to the file handle for non-JPN entries.
    Print the line to the current file handle.
use feature qw( say );    # enable 'say', used below
my $jpn_qfn = '...';
my $other_qfn = '...';
open(my $jpn_fh, '>', $jpn_qfn)
    or die("Can't create $jpn_qfn: $!\n");
open(my $other_fh, '>', $other_qfn)
    or die("Can't create $other_qfn: $!\n");
my $fh;
while (<>) {
    chomp;
    my @fields = split /\|/;
    if ($fields[0] eq 'NAME') {
        $fh = $fields[3] eq 'JPN' ? $jpn_fh : $other_fh;
    }
    say $fh $_;
}
#!/usr/bin/env perl
use 5.012;
use autodie;
use strict;
use warnings;
# store per-country output filehandles
my %output;
# since this is just an example, read from the __DATA__ section
while (my $line = <DATA>) {
    # split the fields
    my @cells = split /[|]/, $line;
    # if the first field is NAME, this is a new record
    if ($cells[0] eq 'NAME') {
        # get the country code, strip trailing whitespace
        (my $country = $cells[3]) =~ s/\s+\z//;
        # if we haven't created an output file for this
        # country yet, do so
        unless (defined $output{$country}) {
            open my $fh, '>', "$country.out";
            $output{$country} = $fh;
        }
        my $out = $output{$country};
        # output this line and the next two lines to the
        # country-specific output file
        print $out $line, scalar <DATA>, scalar <DATA>;
    }
}
close $_ for values %output;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
Thanks heaps for your help, everyone. I was able to solve this problem in Perl; many thanks.
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my @fields;
my $tmp_var;
my ($rec_type, $country);
my $filename = 'data.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'o33.txt' or die $!;
open my $OUTPUTB, ">", 'o44.txt' or die $!;
my $Combline;
while (<$input_fh>) {
    $_ = _trim($_);
    @fields = split (/\|/, $_);
    $rec_type = $fields[0];
    $country = $fields[3];
    if ($rec_type eq 'NAME') {
        if ($country eq 'JPN') {
            *Combline = $OUTPUTA;
        }
        else {
            *Combline = $OUTPUTB;
        }
    }
    print Combline;
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
    my $word = shift;
    if ( $word ) {
        $word =~ s/\s*\|/\|/g;    #remove trailing spaces
        $word =~ s/"//g;          #remove double quotes
    }
    return $word;
}