Script merging two files - Perl

I'm fairly inexperienced with coding, but I often use Perl to merge files and match IDs and information between two files. I have just tried matching two files using a program I have used many times before, but this time it's not working and I don't understand why.
Here is the code:
use strict;
use warnings;
use vars qw($damID $damF $damAHC $prog $hash1 %hash1 $info1 $ID $sire $dam $F $FB $AHC $FA $hash2 %hash2 $info2);
open (FILE1, "<damF.txt") || die "$!\n Couldn't open damF.txt\n";
my $N = 1;
while (<FILE1>){
    chomp (my $line=$_);
    next if 1..$N==$.;
    my ($damID, $damF, $damAHC, $prog) = split (/\t/, $line);
    if ($prog){
        $hash1 -> {$prog} -> {info1} = "$damID\t$damF\t$damAHC";
    }
    open (FILE2, "<whole pedigree_F.txt") || die "$!\n whole pedigree_F.txt \n";
    open (Output, ">Output.txt")||die "Can't Open Output file";
    while (<FILE2>){
        chomp (my $line=$_);
        next if 1..$N==$.;
        my ($ID, $sire, $dam, $F, $FB, $AHC, $FA) = split (/\t/, $line);
        if ($ID){
            $hash2 -> {$ID} -> {info2} = "$F\t$AHC";
        }
        if ($ID && ($hash1->{$prog})){
            $info1 = $hash1 -> {$prog} -> {info1};
            $info2 = $hash2 -> {$ID} -> {info2};
            print "$ID\t$info2\t$info1\n";
        }
    }
}
close(FILE1);
close FILE2;
close Output;
print "Done!\n";
and here are snippets showing the two input file formats:
File 1:
501093 0 0 3162
2958 0 0 3163
1895 0 0 3164
1382 0 0 3165
2869 0 0 3166
2361 0 0 3167
754 0 0 3168
3163 0 0 3169
File 2:
49327 20543 49325 0.077 0.4899 0.808 0.0484
49328 15247 49326 0.0755 0.5232 0.8972 0.0499
49329 27823 49327 0.0834 0.5138 0.8738 0.0541
I want to match the values from column 4 in file 1, with column 1 in file 2.
Then I also want to print the matching values from columns 2 and 3 in file 1 and columns 3 and 5 in file 2.
Also, it is probably worth mentioning that there are about 500,000 entries in each file.
This is the output I am getting:
11476 0.0362 0.3237 501093 0 0
11477 0.0673 0.4768 501093 0 0
11478 0.0443 0.2619 501093 0 0
Note that it isn’t looping through the first hash that I created.

Create two tables in SQLite. Load the TSVs into them. Do a SQL join. It will be simpler and faster.
Refer to this answer about how to load data into SQLite. In your case you want .mode tabs.
sqlite> create table file1 ( col1 int, col2 int, col3 int, col4 int );
sqlite> create table file2 ( col1 int, col2 int, col3 int, col4 numeric, col5 numeric, col6 numeric, col7 numeric );
sqlite> .mode tabs
sqlite> .import /path/to/file1 file1
sqlite> .import /path/to/file2 file2
There are any number of ways to improve those tables, but I don't know what your data is, so use better names in your own version. You'll also want to declare primary and foreign keys, as well as indexes, to speed things up.
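For example (a sketch; the right choices depend on what the columns mean), indexing the two join columns keeps the lookup fast even with 500,000 rows per file:
sqlite> create index file1_col4 on file1 (col4);
sqlite> create index file2_col1 on file2 (col1);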
Now you have your data in an easy-to-manipulate format using a well-known query language, not a bunch of custom code.
I want to match the values from column 4 in file 1, with column 1 in file 2.
Then I also want to print the matching values from columns 2 and 3 in file 1 and columns 3 and 5 in file 2.
You can do this with a SQL join between the two tables.
select file1.col2, file1.col3, file2.col3, file2.col5
from file1
join file2 on file1.col4 = file2.col1
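If you'd rather run that same join from Perl instead of the sqlite3 shell, a minimal DBI sketch could look like this (merge.db is a hypothetical database file holding the two tables created above):
#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# Connect to the hypothetical database built with the sqlite3 commands above.
my $dbh = DBI->connect("dbi:SQLite:dbname=merge.db", "", "", { RaiseError => 1 });

my $sth = $dbh->prepare(q{
    select file1.col2, file1.col3, file2.col3, file2.col5
    from file1
    join file2 on file1.col4 = file2.col1
});
$sth->execute;

# Print one tab-separated line per matched pair.
while ( my @row = $sth->fetchrow_array ) {
    print join("\t", @row), "\n";
}
$dbh->disconnect;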

Related

Perl ASCII variable to Decimal with "." after every letter

I'm making a Perl plugin for Nagios for the F5 load balancer. I have to convert the pool name to a decimal format that matches the OID for SNMP.
my ( $PoolName ) = $ARGV[1];
my ( $rootOIDPoolStatus ) = '1.3.6.1.4.1.3375.2.2.5.5.2.1.2';
For example, $PoolName is "/Common/Atlassian" and I
need to convert that to /.C.o.m.m.o.n./.A.t.l.a.s.s.i.a.n
and then to 47.67.111.109.109.111.110.47.65.116.108.97.115.115.105.97.110
Once that has been converted, it gets pulled into one variable:
my ( $PoolStatus ) = "$rootOIDPoolStatus.$OIDPoolName"
I have been reverse-engineering other people's Perl plugins for Nagios, and this is what someone else does, but I couldn't make it work no matter what combinations I tried. Their $name would be my $PoolName:
sub to_oid($) {
    my $oid;
    my ($name) = $_[0];
    return "" if ( ! $name );
    $oid = ( length $name ) . '.' . ( join '.', ( map { unpack 'C', $_ } ( split '', $name ) ) );
    return $oid;
}
Could someone help me to build or understand the Perl logic in order to convert $PoolName to the decimal format I need for the OID?
You seem to be using a string as an index to an SNMP table. The index of a table can be thought of as the row number or row id for that table. Often the index for a table is just a number starting from 1 and increasing with each row the table has. Such a number is encoded in the OID as is, i.e. if the table has 3 columns and two rows, they would have these OIDs:
$base.1 # table
$base.1.1 # table entry
$base.1.1.1.1 # col1, row1
$base.1.1.1.2 # col1, row2
$base.1.1.2.1 # col2, row1
$base.1.1.2.2 # col2, row2
$base.1.1.3.1 # col3, row1
$base.1.1.3.2 # col3, row2
            ^---index
Sometimes the index is an IP address, a combination of IP:port, or a combination of two IP addresses, especially for IP related tables. An IP address as index would look like this:
$base.1 # table
$base.1.1 # table entry
$base.1.1.1.127.0.0.1 # col1, row "127.0.0.1"
$base.1.1.1.0.0.0.0 # col1, row "0.0.0.0"
$base.1.1.2.127.0.0.1 # col2, row "127.0.0.1"
$base.1.1.2.0.0.0.0 # col2, row "0.0.0.0"
$base.1.1.3.127.0.0.1 # col3, row "127.0.0.1"
$base.1.1.3.0.0.0.0 # col3, row "0.0.0.0"
            ^^^^^^^^^---- index
As you can see, the length of the index varies depending on its datatype (there is a dedicated IPv4 datatype).
Sometimes the index is a string (as in your case). When a string is used, it must also be encoded somehow to make up a "row number" for the table. Strings used as indexes are encoded character-wise, preceded by their length, i.e.:
$base.1 # table
$base.1.1 # table entry
$base.1.1.1.2.65.66 # col1, row "AB"
$base.1.1.1.3.120.121.122 # col1, row "xyz"
$base.1.1.2.2.65.66 # col2, row "AB"
$base.1.1.2.3.120.121.122 # col2, row "xyz"
$base.1.1.3.2.65.66 # col3, row "AB"
$base.1.1.3.3.120.121.122 # col3, row "xyz"
            ^^^^^^^^^^^^^---- index
So "AB" becomes "2.65.66" because length('AB')==2 and ord('A')==65, ord('B')==66. Likewise "xyz" becomes "3.120.121.122".
Your function to_oid does exactly that, although I'd simplify it as follows:
#!/usr/bin/env perl
use strict;
use warnings;

sub to_oid {
    my $string = shift;
    return sprintf('%d.%s', length($string), join('.', unpack('C*', $string)));
}
my $rootOIDPoolStatus = '1.3.6.1.4.1.3375.2.2.5.5.2.1.2';
my $PoolName = '/Common/Atlassian';
my $poolname_oid = to_oid($PoolName);
my $complete_oid = "$rootOIDPoolStatus.$poolname_oid";
print $complete_oid, "\n";
Output:
1.3.6.1.4.1.3375.2.2.5.5.2.1.2.17.47.67.111.109.109.111.110.47.65.116.108.97.115.115.105.97.110
|<------- rootOID ----------->|<------------ poolname_oid ----...--->|
my $PoolStatus = join '.', $rootOIDPoolStatus, map ord, split //, $PoolName;
Not sure what the length() is for in your code; you don't show anything like that in your example.
my $PoolStatus = join('.', $rootOIDPoolStatus, unpack('C*', $PoolName));
or
my $PoolStatus = sprintf("%s.%vd", $rootOIDPoolStatus, $PoolName);
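As a quick sanity check (my own test snippet, not from the answers above), both expressions build the same OID; note that neither prepends the string length the way to_oid() does:
use strict;
use warnings;

my $rootOIDPoolStatus = '1.3.6.1.4.1.3375.2.2.5.5.2.1.2';
my $PoolName          = '/Common/Atlassian';

my $via_unpack  = join('.', $rootOIDPoolStatus, unpack('C*', $PoolName));
my $via_sprintf = sprintf("%s.%vd", $rootOIDPoolStatus, $PoolName);

print "$via_unpack\n";                                     # 1.3.6.1.4.1.3375...47.67.111...110
print $via_unpack eq $via_sprintf ? "same\n" : "differ\n"; # same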

Join multiple files into one using a key and rearrange the columns using Perl

What approach should I take if I am trying to read multiple large files and join them using a key? There is a possibility of one-to-many combinations, so reading one line at a time works for my simple scenario. Looking for some guidance. Thanks!
use strict;
use warnings;
open my $head, $ARGV[0] or die "Can't open $ARGV[0] for reading: $!";
open my $addr, $ARGV[1] or die "Can't open $ARGV[1] for reading: $!";
open my $phone, $ARGV[2] or die "Can't open $ARGV[2] for reading: $!";
#open my $final, $ARGV[3] or die "Can't open $ARGV[3] for reading: $!";
while( my $line1 = <$head> and my $line2 = <$addr> and my $line3 = <$phone> )
{
    # split files to fields
    my @headValues  = split('\|', $line1);
    my @addrValues  = split('\|', $line2);
    my @phoneValues = split('\|', $line3);
    # if the key matches, join them
    if( $headValues[0] == $addrValues[0] and $headValues[0] == $phoneValues[0] )
    {
        print "$headValues[0]|$headValues[1]|$headValues[2]|$addrValues[1]|$addrValues[2]|$phoneValues[1]";
    }
}
close $head;
I am not sure if it's exactly what you're looking for, but did you try the UNIX command join?
Consider these two files:
x.tsv
001 X1
002 X2
004 X4
y.tsv
002 Y2
003 Y3
004 Y4
the command join x.tsv y.tsv produces:
002 X2 Y2
004 X4 Y4
That is, it merges lines with the same ID and discards the others (to keep things simple).
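To extend this to your three pipe-delimited files, chain two joins and tell join about the | separator with -t; the file names here stand in for your three inputs:
join -t'|' head.txt addr.txt | join -t'|' - phone.txt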
If I were you, I would build an SQLite database from the three files; then it would be much easier to use SQL to retrieve the results.
I do not know how fast it is going to be, but I think it is much more robust than reading three files in parallel. SQLite can handle this amount of data.
http://perlmaven.com/simple-database-access-using-perl-dbi-and-sql
SQLite for large data sets?
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbfile = "sample.db";
my $dsn = "dbi:SQLite:dbname=$dbfile";
my $user = "";
my $password = "";
my $dbh = DBI->connect($dsn, $user, $password, {
    PrintError       => 1,
    RaiseError       => 1,
    FetchHashKeyName => 'NAME_lc',
    AutoCommit       => 0,
});
$dbh->do('PRAGMA synchronous = OFF');

my $sql = <<'END_SQL';
CREATE TABLE t1 (
    id INTEGER PRIMARY KEY,
    c1 VARCHAR(100),
    c2 VARCHAR(100),
    c3 VARCHAR(100),
    c4 VARCHAR(100)
)
END_SQL
$dbh->do($sql);

$sql = <<'END_SQL';
CREATE TABLE t2 (
    id INTEGER PRIMARY KEY,
    c1 VARCHAR(100),
    c2 VARCHAR(100),
    c3 VARCHAR(100),
    c4 VARCHAR(100)
)
END_SQL
$dbh->do($sql);

$sql = <<'END_SQL';
CREATE TABLE t3 (
    id INTEGER PRIMARY KEY,
    c1 VARCHAR(100),
    c2 VARCHAR(100),
    c3 VARCHAR(100),
    c4 VARCHAR(100)
)
END_SQL
$dbh->do($sql);

### populate data
open my $fh1, $ARGV[0] or die "Can't open $ARGV[0] for reading: $!";
while( my $line = <$fh1> ){
    my @cols = split('\|', $line);
    $dbh->do('INSERT INTO t1 (id, c1, c2, c3, c4) VALUES (?, ?, ?, ?, ?)',
             undef, @cols[0..4]);
}
close($fh1);
$dbh->commit();

open my $fh2, $ARGV[1] or die "Can't open $ARGV[1] for reading: $!";
while( my $line = <$fh2> ){
    my @cols = split('\|', $line);
    $dbh->do('INSERT INTO t2 (id, c1, c2, c3, c4) VALUES (?, ?, ?, ?, ?)',
             undef, @cols[0..4]);
}
close($fh2);
$dbh->commit();

open my $fh3, $ARGV[2] or die "Can't open $ARGV[2] for reading: $!";
while( my $line = <$fh3> ){
    my @cols = split('\|', $line);
    $dbh->do('INSERT INTO t3 (id, c1, c2, c3, c4) VALUES (?, ?, ?, ?, ?)',
             undef, @cols[0..4]);
}
close($fh3);
$dbh->commit();
### process data
$sql = 'SELECT t1.id, t1.c1, t1.c2, t2.c1, t2.c2, t3.c1
        FROM t1, t2, t3
        WHERE t1.id = t2.id AND t1.id = t3.id
        ORDER BY t1.id';
my $sth = $dbh->prepare($sql);
$sth->execute();
while (my @row = $sth->fetchrow_array) {
    print join("\t", @row) . "\n";
}
$dbh->disconnect;
#unlink($dbfile);
Trying to understand your files: you have one file of head values (whatever those are), one file filled with phone numbers, and one file filled with addresses. Is that correct? Each file can have multiple heads, addresses, or phone numbers, and the files somehow correspond to each other.
Could you give an example of the data in the files, and how they relate to each other? I'll update my answer as soon as I get a better understanding of what your data actually looks like.
Meanwhile, it's time to learn about references. References allow you to create more complex data structures. And, once you understand references, you can move on to object-oriented Perl, which will really allow you to tackle programming tasks that you didn't know were possible.
Perl references allow you to have hashes of hashes, arrays of arrays, arrays of hashes, or hashes of arrays, and of course those inner arrays or hashes can themselves contain arrays or hashes. Maybe an example will help.
Let's say you have a hash of people assigned by employee number. I'm assuming that your first file is employee_id|name, and the second file is address|city_state, and the third is home_phone|work_phone:
First, just read in the files into arrays:
use strict;
use warnings;
use autodie;
use feature qw(say);

open my $heading_fh, "<", $file1;
open my $address_fh, "<", $file2;
open my $phone_fh,   "<", $file3;

my @headings = <$heading_fh>;
chomp @headings;
close $heading_fh;

my @addresses = <$address_fh>;
chomp @addresses;
close $address_fh;

my @phones = <$phone_fh>;
chomp @phones;
close $phone_fh;
That'll make it easier to manipulate the various data streams. Now, we can go through each row:
my %employees;
for my $employee_number (0..$#headings) {
    my ( $employee_id, $employee_name ) = split /\s*\|\s*/, $headings[$employee_number];
    my ( $address, $city )              = split /\s*\|\s*/, $addresses[$employee_number];
    my ( $home_phone, $work_phone )     = split /\s*\|\s*/, $phones[$employee_number];
    $employees{$employee_id}->{NAME}    = $employee_name;
    $employees{$employee_id}->{ADDRESS} = $address;
    $employees{$employee_id}->{CITY}    = $city;
    $employees{$employee_id}->{WORK}    = $work_phone;
    $employees{$employee_id}->{HOME}    = $home_phone;
}
Now, you have a single hash called %employees that is keyed by $employee_id, and each entry in the hash is a reference to another hash. You have a hash of hashes.
The end result is a single data structure (your %employees) that is keyed by $employee_id, but each field is individually accessible. What is the name of employee number A103? It's $employees{A103}->{NAME}.
The code is far from complete. For example, you probably want to verify that your initial arrays are all the same size and die if they're not:
if ( ( not $#headings == $#phones ) or ( not $#headings == $#addresses ) ) {
    die qq(The files don't have the same number of entries);
}
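For illustration (a hypothetical usage snippet, not part of the original code), once %employees is built you can walk it like any other hash:
for my $id ( sort keys %employees ) {
    my $rec = $employees{$id};   # hash reference holding one employee's fields
    say "$id: $rec->{NAME}, $rec->{ADDRESS}, $rec->{CITY}, work $rec->{WORK}, home $rec->{HOME}";
}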
I hope the idea of using references and making use of more complex data structures makes things easier to handle. However, if you need more help, post an example of what your data looks like. Also explain what the various fields are and how they relate to each other.
Many postings on Stack Overflow look like this to me:
My data looks like this:
ajdjadd|oieuqweoqwe|qwoeqwe|(asdad|asdads)|adsadsnrrd|hqweqwe
And, I need to make it look like this:
##*()#&&###|##*##&)(*&!#!|####&(*&##

XML File Creation in Perl - Updated Requirements [duplicate]

This question already has answers here:
XML file creation in Perl
(2 answers)
Closed 8 years ago.
Sorry, I am posting this again, but a lot of the requirements have changed and I need advice.
My First input file is
Root1 TBLA KEY1 COLA A B
Root1 TBLA KEY1 COLB D E
Root1 TBLA KEY3 COLX M N
Root2 TBLB KEY4 COLX M N
Root2 TBLB KEY4 COLD A B
Root3 TBLC KEY5 COLD A B
My second input file is
Root1 TBLA KEY6
Root2 TBLB KEY7
Root3 TBLC KEY8
My third input file is
Root1 TBLA KEY9
Root1 TBLA KEY10
Root3 TBLC KEY11
Basically, the file representation is:
1) The first file represents the old and new values. The first column is the root table, the second is the actual table in which the diff exists, the third column is the key value, and the fourth and fifth are the old and new values.
2) The second file represents the primary keys which exist in db1 only and not in db2. The first column is the root table, the second is the actual table in which the key exists, and the third column is the key value.
3) The third file represents the primary keys which exist in db2 only and not in db1, with the same columns as the second file.
The output is to be created in XML format like this:
<Data>
  <Root1>
    <TBLA>
      <NEW1>
        <KEY>KEY6</KEY>
      </NEW1>
      <NEW2>
        <KEY>KEY9</KEY>
        <KEY>KEY10</KEY>
      </NEW2>
      <MODIFIED>
        <KEY name="KEY1">
          <COLA>
            <oldvalue>A</oldvalue>
            <newvalue>B</newvalue>
          </COLA>
          <COLB>
            <oldvalue>D</oldvalue>
            <newvalue>E</newvalue>
          </COLB>
        </KEY>
        <KEY name="KEY3">
          <COLX>
            <oldvalue>M</oldvalue>
            <newvalue>N</newvalue>
          </COLX>
        </KEY>
      </MODIFIED>
    </TBLA>
  </Root1>
</Data>
THIS IS NOT THE COMPLETE OUTPUT; ONLY PART OF IT IS DISPLAYED.
Can anyone suggest what would be the best way to do this? Should I convert these text files to a hash of hashes first and then try using pltoxml()? Does that make sense? Would XML::Simple or XML::Writer suffice?
This is the first time I am working with XML, and I am not sure which approach will lead most efficiently to a solution.
A small example with respect to my requirements would be appreciated.
*Input files will always be sorted on Root and then TBLNAME.
Output format
The output contains, for every root, every table in that root, and for every table the keys which exist in db1 only and then the keys which exist in db2 only; these go in the NEW1 and NEW2 sections respectively. The third section, MODIFIED, is built from the first input file and lists each key value and, for that key, which columns were modified (with their old and new values).
If I have to use XML::Simple, how do I create the hashref from these files which I can pass to XMLout? There is no key in any of these files.
This is simply a matter of using split to break the data into fields, storing it in a hash, and then transforming it with XML::Simple.
Note that I stick things into an array to enforce the order you intended.
All the data is read from the diamond operator (<>), with a line containing --- separating the three inputs; you shouldn't need me to show you IO code.
The @processors array simply holds the different processors used on the various files:
Code:
use 5.016;
use strict;
use warnings;
use XML::Simple qw(:strict);
my %roots;
my @processors = (
    sub {
        my ( $root, $table, $key, $col, $old, $new ) = split /\s+/;
        $roots{ $root }{ $table }[2]{MODIFIED}{ $col } = {
            oldvalue => $old,
            newvalue => $new,
        };
        return;
    },
    sub {
        my ( $root, $table, $key ) = split /\s+/;
        push @{ $roots{ $root }{ $table }[0]{NEW1}{KEY} }, $key;
    },
    sub {
        my ( $root, $table, $key ) = split /\s+/;
        push @{ $roots{ $root }{ $table }[1]{NEW2}{KEY} }, $key;
    },
);

my $processor = shift @processors;
while ( <> ) {
    chomp;
    if ( $_ eq '---' ) {
        $processor = shift @processors;
    }
    else {
        $processor->( $_ );
    }
}

my $xs  = XML::Simple->new( NoAttr => 1, RootName => 'Data' );
my $xml = $xs->XMLout( \%roots, KeyAttr => {} );
say $xml;
It produces:
<Data>
<Root1>
<TBLA>
<NEW1>
<KEY>KEY6</KEY>
</NEW1>
</TBLA>
<TBLA>
<NEW2>
<KEY>KEY9</KEY>
<KEY>KEY10</KEY>
</NEW2>
</TBLA>
<TBLA>
<MODIFIED>
<COLA>
<newvalue>B</newvalue>
<oldvalue>A</oldvalue>
</COLA>
<COLB>
<newvalue>E</newvalue>
<oldvalue>D</oldvalue>
</COLB>
<COLX>
<newvalue>N</newvalue>
<oldvalue>M</oldvalue>
</COLX>
</MODIFIED>
</TBLA>
</Root1>
<Root2>
<TBLB>
<NEW1>
<KEY>KEY7</KEY>
</NEW1>
</TBLB>
<TBLB></TBLB>
<TBLB>
<MODIFIED>
<COLD>
<newvalue>B</newvalue>
<oldvalue>A</oldvalue>
</COLD>
<COLX>
<newvalue>N</newvalue>
<oldvalue>M</oldvalue>
</COLX>
</MODIFIED>
</TBLB>
</Root2>
<Root3>
<TBLC>
<NEW1>
<KEY>KEY8</KEY>
</NEW1>
</TBLC>
<TBLC>
<NEW2>
<KEY>KEY11</KEY>
</NEW2>
</TBLC>
<TBLC>
<MODIFIED>
<COLD>
<newvalue>B</newvalue>
<oldvalue>A</oldvalue>
</COLD>
</MODIFIED>
</TBLC>
</Root3>
</Data>
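Since the script switches processors whenever it sees a --- line, you can feed it the three files as a single stream (a hypothetical invocation; the script name is made up):
{ cat file1.txt; echo ---; cat file2.txt; echo ---; cat file3.txt; } | perl diff2xml.pl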

Unix join on more than two files

I have three files, each with an ID and a value.
sdt5z@fir-s:~/test$ ls
a.txt b.txt c.txt
sdt5z@fir-s:~/test$ cat a.txt
id1 1
id2 2
id3 3
sdt5z@fir-s:~/test$ cat b.txt
id1 4
id2 5
id3 6
sdt5z@fir-s:~/test$ cat c.txt
id1 7
id2 8
id3 9
I want to create a file that looks like this...
id1 1 4 7
id2 2 5 8
id3 3 6 9
...preferably using a single command.
I'm aware of the join and paste commands. Paste will duplicate the id column each time:
sdt5z@fir-s:~/test$ paste a.txt b.txt c.txt
id1 1 id1 4 id1 7
id2 2 id2 5 id2 8
id3 3 id3 6 id3 9
Join works well, but for only two files at a time:
sdt5z@fir-s:~/test$ join a.txt b.txt
id1 1 4
id2 2 5
id3 3 6
sdt5z@fir-s:~/test$ join a.txt b.txt c.txt
join: extra operand `c.txt'
Try `join --help' for more information.
I'm also aware that paste can take STDIN as one of the arguments by using "-". E.g., I can replicate the join command using:
sdt5z@fir-s:~/test$ cut -f2 b.txt | paste a.txt -
id1 1 4
id2 2 5
id3 3 6
But I'm still not sure how to modify this to accommodate three files.
Since I'm doing this inside a Perl script, I know I could put joins inside a foreach loop: join file1 file2 > tmp1, join tmp1 file3 > tmp2, etc. But this gets messy, and I would like to do it with a one-liner.
join a.txt b.txt | join - c.txt
should be sufficient
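One caveat worth adding: join expects both inputs to be sorted on the join field. If your files aren't already sorted, sort them on the fly, e.g. with bash process substitution:
join <(sort a.txt) <(sort b.txt) | join - <(sort c.txt)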
Since you're doing it inside a Perl script, is there any specific reason you're NOT doing the work in Perl, as opposed to spawning a shell?
Something like this (NOT TESTED! caveat emptor):
use File::Slurp; # Slurp the files in if they aren't too big

my @files = qw(a.txt b.txt c.txt);
my %file_data = map { ( $_ => [ read_file($_) ] ) } @files;

my @id_orders;
my %data = ();
my $first_file = 1;
foreach my $file (@files) {
    foreach my $line (@{ $file_data{$file} }) {
        my ($id, $value) = split(/\s+/, $line);
        push @id_orders, $id if $first_file;
        $data{$id} ||= [];
        push @{ $data{$id} }, $value;
    }
    $first_file = 0;
}
foreach my $id (@id_orders) {
    print "$id " . join(" ", @{ $data{$id} }) . "\n";
}
perl -lanE'$h{$F[0]} .= " $F[1]"; END{say $_.$h{$_} foreach keys %h}' *.txt
This should work; I can't test it as I'm answering from my mobile. You could also sort the output by putting a sort between foreach and keys.
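Spelled out as a plain script (my expansion of the same logic, for readability):
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %h;
while (<>) {                       # every line of every file on the command line
    my ($id, $value) = split ' ';  # what -a autosplit does in the one-liner
    $h{$id} .= " $value";          # accumulate values per ID
}
say "$_$h{$_}" for sort keys %h;   # id1 1 4 7, id2 2 5 8, ...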
pr -m -t -s\ file1.txt file2.txt|gawk '{print $1"\t"$2"\t"$3"\t"$4}'> finalfile.txt
Here file1 and file2 each have 2 columns: $1 and $2 are the columns from file1, and $3 and $4 are the columns from file2.
You can print any column from each file this way, and it will take any number of files as input. If file1 has 5 columns, for example, then $6 will be the first column of file2.

How can I match records in two files using Perl?

I have two files, CUSTOMER_ACCOUNT_LOG.TXT and CUSTOMER_ID_LOG.TXT.
These are logs: one holds a timestamp and an account ID, and likewise the other holds a timestamp and a customer ID.
Simply put, I want to pick the AccountID and CustomerID of records with matching TIMESTAMPs.
For example, 123456793 is a TIMESTAMP; its matching records are ABC0103 and CUSTOMER_ID_0103.
I want to pick the details like this and write the matched records to another file.
CUSTOMER_ACCOUNT_LOG.TXT
TIMESTAMP| N1| N2 |ACCOUNT ID
-----------------------------------
123456789,111,1000,ABC0101
123456791,112,1001,ABC0102
123456793,113,1002,ABC0103
123456795,114,1003,ABC0104
123456797,115,1004,ABC0105
123456799,116,1005,ABC0106
123456801,117,1006,ABC0107
123456803,118,1007,ABC0108
123456805,119,1008,ABC0109
123456807,120,1009,ABC0110
123456809,121,1010,ABC0111
123456811,122,1011,ABC0112
123456813,123,1012,ABC0113
123456815,124,1013,ABC0114
123456817,125,1014,ABC0115
123456819,126,1015,ABC0116
123456821,127,1016,ABC0117
123456823,128,1017,ABC0118
123456825,129,1018,ABC0119
123456827,130,1019,ABC0120
123456829,131,1020,ABC0121
CUSTOMER_ID_LOG.TXT
TIMESTAMP| N1| N2 | CUSTOMER ID
-----------------------------------
123456789,111,1000,CUSTOMER_ID_0101
123456791,112,1001,CUSTOMER_ID_0102
123456793,113,1002,CUSTOMER_ID_0103
123456795,114,1003,CUSTOMER_ID_0104
123456797,115,1004,CUSTOMER_ID_0105
123456799,116,1005,CUSTOMER_ID_0106
123456801,117,1006,CUSTOMER_ID_0107
123456803,118,1007,CUSTOMER_ID_0108
123456805,119,1008,CUSTOMER_ID_0109
123456807,120,1009,CUSTOMER_ID_0110
123456809,121,1010,CUSTOMER_ID_0111
123456811,122,1011,CUSTOMER_ID_0112
123456813,123,1012,CUSTOMER_ID_0113
123456815,124,1013,CUSTOMER_ID_0114
123456817,125,1014,CUSTOMER_ID_0115
123456819,126,1015,CUSTOMER_ID_0116
123456821,127,1016,CUSTOMER_ID_0117
123456823,128,1017,CUSTOMER_ID_0118
123456825,129,1018,CUSTOMER_ID_0119
123456827,130,1019,CUSTOMER_ID_0120
123456829,131,1020,CUSTOMER_ID_0121
I am a PHP programmer, and new to Perl.
First I read the file and built an array, so now my array contains the timestamps and the rest of the required details. What should I do next? I guess the array keys should contain the account IDs and the values the timestamps, or vice versa (not sure), and the same for the other file. Finally I should compare the timestamps, and wherever the timestamps match, pick that timestamp's account ID and customer ID. I got as far as filling the arrays, but now I don't know how to proceed, because I would need a foreach to match the timestamps of both files. I am stuck here!
Here are the steps I'd take:
0) Some rudimentary Perl boilerplate (this is step 0 because you should always always always do it, and some people will add other stuff to this boilerplate, but this is the bare minimum):
use strict;
use warnings;
use 5.010;
1) Read the first file into a hash whose keys are the timestamps:
my %account;
open( my $fh1, '<', $file1 ) or die "$!";
while( my $line = <$fh1> ) {
    chomp $line;
    my @values = split ',', $line;
    $account{$values[0]} = $values[3];
}
close $fh1;
2) Read the second file, and each time you read a line, pull out the timestamp, then print out the timestamp, the account ID, and the customer ID to a new file.
open( my $out_fh, '>', $outfile ) or die "$!";
open( my $fh2, '<', $file2 ) or die "$!";
while( my $line = <$fh2> ) {
    chomp $line;
    my @values = split ',', $line;
    say $out_fh join ',', $values[0], $account{$values[0]}, $values[3];
}
close $out_fh;
close $fh2;
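One wrinkle (an assumption on my part): both logs begin with a header line and a dashed rule, which would land in the hash as junk entries. A guard at the top of each while loop skips anything that doesn't start with a digit:
next unless $line =~ /^\d/;   # skip the "TIMESTAMP| N1| ..." header and the dashes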
You don't want to read the whole file into an array because that's a waste of memory. Only store the information that you need, and take advantage of Perl's datatypes to help you store that information.