Load multiple CSV files into an Oracle table from Perl - perl

After some research I decided to put my question here for more expert answers. I couldn't find a scenario that exactly matches my problem, so here it goes...
I think it will take me a few days to get something working; right now I can't even see how to move forward.
DB: 11gR2
OS: Unix
I'm trying to load multiple CSV files into an Oracle table using a Perl script. The steps are:
List which CSV files I need to work on, since the directory where the CSV files live contains many other files.
Open each CSV file and insert its rows into the table.
If there is any error, roll back all inserts for that file and move on to the next file.
Record how many inserts were done for each file.
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use DBD::Oracle;

my $exitStatus = 0;
my $dow = `date +%a`; chomp $dow;
my $csvDow = `date -dd +%a`; chomp $csvDow;

# define log file
my $logFile = "log.dbinserts";
open my $logFh, ">>", $logFile or die "can't open $logFile: $!";

# define csv file directory
my $csvLogDir = "Home/log/$csvDow";

# csv files in an array, listing all possible matches
opendir(my $dh, $csvLogDir) || die "can't opendir $csvLogDir : $!";
my @csvFile = grep { /csv.*host1/ && -f "$csvLogDir/$_" } readdir($dh);
chomp @csvFile;
closedir $dh;

foreach my $i (@csvFile)
{
    print $logFh "CSV File: $i\n";
}

foreach my $file (@csvFile)
{
    chomp $file;
    print $logFh "Working under: $file\n";
    insertRecords("$csvLogDir/$file");
}
print $logFh "Exit status: $exitStatus\n";
close $logFh;

#----------------
sub insertRecords
{
    my $fileToInsert = shift;
    my $row;
    open my $fh, "<", $fileToInsert or die "$fileToInsert: $!";
    my $csv = Text::CSV->new ({
        binary    => 1,
        auto_diag => 1,
    });
    while ($row = $csv->getline($fh))
    {
        print "first column : $row->[0]\n";
    }
    close $fh;
}
========
CSV File
=========
date, host, first, number1, number2
20141215 13:05:08, S1, John, 100, 100.20
20141215 13:06:08, S2, Ray, 200, 200.50
...
...
...
=========
Table - tab1
=========
Sample_Date
Server
First
N1
N2

For the first step, it depends on which criteria you need to select your CSV files.
If it's based on the names of those CSV files, you can simply use opendir and get the list of files with readdir:
my $dirToScan = '/var/data/csv';
opendir(my $dh, $dirToScan ) || die "can't opendir $dirToScan : $!";
my @csvFiles = grep { /\.csv$/ && -f "$dirToScan/$_" } readdir($dh);
closedir $dh;
In this example you'll retrieve an array with all the files that end with .csv (within the designated directory).
After that you'll need to loop over that array with foreach.
You can find more examples and explanation here.
I don't know the structure of your CSV, but I would advise using a module like Text::CSV. It's a simple CSV parser that wraps Text::CSV_XS if it is installed on your system, and falls back to Text::CSV_PP otherwise (the XS version is faster than the PP one because it is written in C/XS rather than pure Perl).
This module allows you to transform a CSV row into an array like this:
use Text::CSV;
my $file = "listed.csv";
open my $fh, "<", $file or die "$file: $!";
my $csv = Text::CSV->new ({
binary => 1, # Allow special character. Always set this
auto_diag => 1, # Report irregularities immediately
});
while (my $row = $csv->getline ($fh)) {
print "first colum : $row->[0]\n";
}
close $fh;
from : perlmeme.org
You'll need to open() your file (within the foreach loop) and pass the handle to the Text::CSV parser (you can declare the parser outside of the loop).
That's the easiest case, where you know the column numbers of your CSV. If you need to use the column names instead, you'll need the getline_hr() function (see the CPAN documentation of Text::CSV).
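For reference, here is a minimal sketch of that column-name approach. The file name sample.csv is a placeholder, the host and first field names are taken from the sample CSV shown in the question, and allow_whitespace is set because that sample has spaces after the commas:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, allow_whitespace => 1 });

open my $fh, "<", "sample.csv" or die "sample.csv: $!";

# Take the first row of the file as the list of column names
$csv->column_names($csv->getline($fh));

# getline_hr() then returns each data row as a hash ref keyed by column name
while (my $row = $csv->getline_hr($fh)) {
    print "host: $row->{host}, first: $row->{first}\n";
}
close $fh;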
And once you have your values (you should be within the foreach loop over your file list, and inside the while loop that reads the rows of your CSV), you will need to insert this data into your database.
For this you'll need the DBD::Oracle module, which will allow you to connect to the database.
Like every DBI driver, you'll need to instantiate a connection, using this syntax:
use DBI;
my $dbh = DBI->connect("dbi:Oracle:$dbname", $user, $passwd);
And then in your loop (while you're reading your CSV rows) you should be able to do something like this:
$SQL = "INSERT INTO yourTable (foobar,baz) VALUES (?,?)";
$sth = $dbh->prepare($SQL);
$sth->execute($row->[0],$row->[1]);
Here you have three steps: you write the statement with the values replaced by '?' placeholders, then prepare it (you can also bind declared variables instead, if you have a lot of columns);
after the preparation, you execute the statement with the desired values (once again, you don't have to use anonymous values).
To catch a failing statement, you only have to set RaiseError to 1 when the connection is declared; that would look something like this:
my $dbh = DBI->connect("dbi:Oracle:$dbname", $user, $passwd,
{
PrintError => 1,
PrintWarn => 1,
RaiseError => 1
});
And then when running the statement (this uses try/catch as provided by, e.g., the Try::Tiny module):
use Try::Tiny;

try
{
$sth->execute($row->[0],$row->[1]);
}
catch
{
warn "INSERT error : $_";
$CSVhasFailures = 1;
};
You'll need to reset $CSVhasFailures to 0 before processing each CSV file.
After that, by testing the value of $CSVhasFailures at the end of the while loop, you can decide whether to execute a commit or a rollback, using the commit and rollback methods available through DBI/DBD::Oracle.
If you want to count the number of inserts, you'll just have to put a $counter++ after the $sth->execute statement; the sketch below puts these pieces together.
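A rough sketch of the whole per-file flow might look like the following. This is only an illustration, not tested code: the connection string, the log directory, the csv.*host1 file filter, and the tab1 column names are assumptions taken from the question, and the TO_DATE format is guessed from the sample timestamps.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Text::CSV;
use Try::Tiny;

# AutoCommit => 0 lets us commit or roll back one whole CSV file at a time
my $dbh = DBI->connect("dbi:Oracle:mydb", "user", "passwd",
    { AutoCommit => 0, RaiseError => 1, PrintError => 1 })
    or die "connect failed: $DBI::errstr";

# Prepare the INSERT once; only the bound values change from row to row
my $sth = $dbh->prepare(
    "INSERT INTO tab1 (sample_date, server, first, n1, n2)
     VALUES (TO_DATE(?, 'YYYYMMDD HH24:MI:SS'), ?, ?, ?, ?)");

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, allow_whitespace => 1 });

my $csvLogDir = "/some/log/dir";    # assumed location of the CSV files
opendir(my $dh, $csvLogDir) or die "can't opendir $csvLogDir: $!";
my @csvFiles = grep { /csv.*host1/ && -f "$csvLogDir/$_" } readdir($dh);
closedir $dh;

foreach my $file (@csvFiles) {
    open my $fh, "<", "$csvLogDir/$file" or die "$file: $!";
    $csv->getline($fh);             # skip the header row

    my $CSVhasFailures = 0;
    my $counter        = 0;

    while (my $row = $csv->getline($fh)) {
        try {
            $sth->execute(@{$row});
            $counter++;
        }
        catch {
            warn "INSERT error in $file: $_";
            $CSVhasFailures = 1;
        };
        last if $CSVhasFailures;    # no point reading further rows of this file
    }
    close $fh;

    if ($CSVhasFailures) {
        $dbh->rollback;             # drop every insert made for this file
        print "$file: rolled back\n";
    }
    else {
        $dbh->commit;               # keep them all
        print "$file: $counter rows inserted\n";
    }
}

$dbh->disconnect;
DBI's begin_work could be used instead of setting AutoCommit => 0 at connect time; either way, the important point is that commit or rollback happens once per file, not once per row.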
For more info on DBD::Oracle, I suggest reading its CPAN documentation page.
Last suggestion: proceed step by step. List your CSV files, read the rows of each CSV, read a column, print a set of columns, and then insert your data into a temporary table.

Related

Using Text::CSV to print a line to a CSV file based on variables' content

My script is fairly large but I'll simplify the code here.
Suppose that I create a CSV and I write the header like this:
my $csv = Text::CSV->new ({binary => 1}, eol => "\n");
open(my $out, ">", "$dir/out.csv") or die $!; #create
$out->print("CodeA,CodeB,Name,Count,Pos,Orientation\n"); #I write the header
Suppose that I've got some values stored in different variables and I want to write those variables as a line in the CSV.
I cannot figure out how, because in the Text::CSV documentation print is not clearly explained, there are no direct examples, and I don't know what an array ref is.
Here's a trivial example of using Text::CSV to write a CSV file. It generates a header line and a data line, and does so from fixed data.
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({binary => 1, eol => $/ })
or die "Failed to create a CSV handle: $!";
my $filename = "output.csv";
open my $fh, ">:encoding(utf8)", $filename or die "failed to create $filename: $!";
my(@heading) = ("CodeA", "CodeB", "Name", "Count", "Pos", "Orientation");
$csv->print($fh, \@heading); # Array ref!
my(@datarow) = ("A", "B", "Abelone", 3, "(6,9)", "NW");
$csv->print($fh, \@datarow); # Array ref!
close $fh or die "failed to close $filename: $!";
The row of data is collected in an array — I used @heading and @datarow. If I was outputting several rows, each row could be collected or created in @datarow and then output. The first argument to $csv->print should be the I/O handle — here, $fh, a file handle for the output file. The second should be an array ref. Using \@arrayname is one way of creating an array ref; the input routines for the Text::CSV module also create and return array refs.
Note the difference between the notation used here in the Text::CSV->new call and the notation used in your example. Also note that your $out->print("…"); call is using basic file I/O and has nothing to do with Text::CSV. Contrast it with $csv->print($fh, …).
The rest of the code is more or less boilerplate.
output.csv
CodeA,CodeB,Name,Count,Pos,Orientation
A,B,Abelone,3,"(6,9)",NW
Note that the value with an embedded comma was surrounded by quotes by the Text::CSV module. The other values did not need quotes so they did not get them. You can tweak the details of the CSV output format with the options to Text::CSV->new.
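For instance, a sketch with a few of those constructor options (these are standard Text::CSV options; the specific values shown are just examples, not a recommendation):
my $csv = Text::CSV->new({
    binary       => 1,      # allow embedded newlines and non-ASCII data
    eol          => "\n",   # appended to every line written by print()
    sep_char     => ';',    # use ';' instead of ',' as the separator
    always_quote => 1,      # quote every field, not just the ones that need it
});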
For the headers you can use
$status = $csv->print ($out,[qw(CodeA CodeB Name Count Pos Orientation)]);
and for a row of values use
$status = $csv->print ($out,[$valueA,$valueB,$valueName,$valueCount,$valuePos,$valueOrientation]);
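As a separate, self-contained illustration of the reading side of the module, the script below takes a CSV file name on the command line and sums the values in its third column: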
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
my $csv = Text::CSV->new ({
binary => 1,
auto_diag => 1,
sep_char => ',' # not really needed as this is the default
});
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
$sum += $fields->[2];
}
if (not $csv->eof) {
$csv->error_diag();
}
close $data;
print "$sum\n";

Need help using perl's IO::Handle::Sync on a 64-bit strawberry perl installation

I'm a .net developer with no perl experience. I have to set up a perl script as a scheduled task on my company's 64-bit Windows Server 2012. The script was written by another department in the company, and now my department has to take over handling it. Strawberry Perl (64-bit) 5.20.2.1-64bit is installed on the server. I've managed to figure out how to install perl, change the info in the program, etc, so that it all points to the new server.
When I try to run the script, I get this error when the program tries to read data from a csv and upsert it to a database: "IO::Handle::sync not implemented on this architecture".
IO::Handle is installed on the server, but I can't install IO::Handle::sync. I think it has something to do with the server being 64-bit?
I don't know enough about perl to feel comfortable changing the script to use a different module, and I don't have enough time to learn the language before this has to be set up. Is there anything I can do to make IO::Handle::sync work on this system? Can I install 32-bit Strawberry Perl on the 64-bit server? and if so, would that solve the problem?
Here's the function that I'm having problems with:
# Convert CSV data file to bulk insert format.
sub convert_data_file ($$$$$$$) {
my ($omniture_dbh, $omniture_mappings, $feed, $file, $basename, $table, $rsid) = @_;
# Open CSV data file.
print "Processing data file: $file";
open my $in, "<:encoding(utf8)", $file or die "$file: $!\n";
# Remember start time.
my $start = time;
# Discard byte-order mark if present.
seek $in, 0, 0 if sysread $in, $_, 1 and $_ ne "\x{FEFF}";
# Initialize CSV parser.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2, allow_loose_quotes => 1, allow_loose_escapes => 1, allow_whitespace => 1 });
# Read header row from CSV data file.
my $header = $csv->getline($in) or die "Header line missing!\n";
# CSV row.
my $row = {};
# Bind CSV data to hash elements by header field name.
$csv->bind_columns(\@{$row}{@{$header}});
# Map CSV fields to database columns.
my ($column_source, $hooks) = map_csv_fields $omniture_dbh, $omniture_mappings, $feed, $header, $row;
# Map database columns for bulk insert.
my $fields = map_database_columns $omniture_dbh, $table, $row, $basename, $rsid, $feed->{feed_version}, $column_source;
# Use ".data" extension for bulk insert data file.
my $bulk_insert_file = "$basename.data";
# Open bulk insert data file.
open my $out, ">:encoding(UCS-2LE)", $bulk_insert_file or die "$bulk_insert_file: $!\n";
# Write Byte-Order Mark (BOM).
print $out "\x{FEFF}";
# Record counter.
my $records = 0;
# Read data rows from CSV data file.
while ($csv->getline($in)) {
# Call hooks as necessary.
$_->() foreach @{$hooks};
# Create bulk insert data record from mapped data values.
$_ = join "|~|", map { ${$_}; } @{$fields};
# Unescape hex escapes.
s/\\x([A-Fa-f0-9]{2})/pack "C", hex $1/eg;
# Strip ASCII control codes (except newline/tab) and invalid UCS-2 characters.
{
no warnings;
tr/\n\t\x{0020}-\x{d7ff}\x{e000}-\x{ffff}//cd;
}
# Unescape backslashes, newlines and tabs.
s/\\(\\|\n|\t)/$1/g;
# Write data record and record terminator to bulk insert data file.
print $out "$_|~~|\n";
# Increment record counter.
$records++;
}
# Close CSV data file.
close $in or die "$file: $!";
# Flush output buffers.
$out->flush;
# Sync file to disk.
$out->sync;
# Close bulk insert data file.
close $out or die "$bulk_insert_file: $!";
# Print informational message.
printf " (%d records converted (%.2f seconds)", $records, time - $start;
# Return variables of interest.
return $column_source, $bulk_insert_file, $records;
}

Perl modifying CSV files

I have a small section of code I'm trying to modify.
What I'm trying to do is have the filename inputted into the third column. At the moment I almost have it working, but I'd like to remove the ".csv"s from the end of each entry in the column. I'd also like to give the column the heading "filename".
I hope the difference between "table1" and "table2" shown above summarises quite well the modification which I'm trying to make here.
The code I'm currently using to create "table1" is the following:
#!/usr/bin/perl
use warnings;
use strict;
open M,"<mapcodelist.txt" or die "mapcodelist.txt $!";
my %m;
while( <M> ){
my($k,$v)=split;
$v=~s/\./_/g;
$m{$k}=$v;
}
close M;
chdir "C:/Users/Stephen/Desktop/Database_Design/" or die $!;
@ARGV=<*.csv>;
$^I=".bak";
while( <> ){
chomp;
$\=/^mass/?",filename$/": ",$ARGV$/";
print;
}
for( <*.csv> ){
my $r;
($r=$_) =~ s/\w+_(\w+)(?=\.csv)/$1_$m{$1}/;
rename $_,$r or warn " rename $_,$r $!";
}
Any advice with this would be very much appreciated.
Thanks.
You can try the following Perl script:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;
my ($prev_lc);
open my $fh, '<', shift or die;
my $csv = Text::CSV_XS->new({ eol => "\n" }) or die;
while ( my $row = $csv->getline($fh) ) {
if ( $csv->record_number == 1 ) {
$prev_lc = $row->[$#$row];
$csv->print( \*STDOUT, [ @$row[0 .. $#$row - 1], 'Filename' ] );
next;
}
$prev_lc =~ s/\.csv$//;
$csv->print( \*STDOUT, [ @$row[0 .. $#$row - 1], $prev_lc ] );
## Previous last column.
$prev_lc = $row->[$#$row];
}
It uses an auxiliary variable to add the missing header and process each whole data line at the same time. I simply use a regular expression to remove the extension.
With the following dummy test data (infile), and assuming that the last line doesn't have a file name because of the header:
mass,intensity,20130730_p12_A2.csv
2349.345,56.23423,20130730_p12_A2.csv
744.2884,5.01
Run the script like:
perl script.pl infile
That yields:
mass,intensity,Filename
2349.345,56.23423,20130730_p12_A2
744.2884,20130730_p12_A2
Perhaps it's not perfect for particular data that you didn't show, and I didn't take into account all the code you posted where you handle many files. But you can see that it works the way you asked, and it's left as an exercise for you to adapt it to your needs, if necessary.

Write from database using Perl Text::CSV

I'm trying to use a database query and print the query output to CSV, but I can't get the output onto separate lines. How do I do that?
Here's the code:
use warnings;
use DBI;
use strict;
use Text::CSV;
#set up file
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
#set up query
my $dbh = DBI->connect("DBI:mysql:db", "name") or die ("Error: $DBI::errstr");
my $sql = qq(select * from one_table join two_table using (primary_key));
my $query = $dbh->prepare($sql);
$query->execute;
#loop through returned rows of query and save each row as an array
while ( my (@row) = $query->fetchrow_array ) {
#print each row to the csv file
$csv->print ($fh, [@row]);
# every line seems to be appended to same line in "new.csv"
# tried adding "\n" to no avail
}
close $fh or die "new.csv: $!";
This must be a common use case, but I couldn't find anything about issues with new lines.
I assume your problem is that all your CSV data ends up on the same line?
You should set the eol option in your CSV object:
my $csv = Text::CSV->new ( {
binary => 1, # should set binary attribute.
eol => $/, # end of line character
}) or die "Cannot use CSV: ".Text::CSV->error_diag ();
This character will be appended to the end of each line written by print. You might also consider not copying the values from your fetchrow call every iteration, since print takes an array ref. Using references will be more straightforward.
while (my $row = $query->fetchrow_arrayref) {
....
$csv->print($fh, $row);
}
First of all, you have a missing semicolon at the end of the line
my $sql = qq(select * from one_table join two_table using (primary_key))
By default, Text::CSV uses the current value of $\, the output record separator at end of line. And, again by default, this is set to undef, so you won't get any separator printed.
You can either set up your $csv object with
my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
or just print the newline explicitly, like this. Note that there's no need to fetch the row into an array and then copy it to an anonymous array to get this to work. fetchrow_arrayref will return an array reference that you can pass directly to print.
while (my $row = $query->fetchrow_arrayref) {
$csv->print($fh, $row);
print $fh "\n";
}
Try this SQL query instead:
select * from one_table join two_table using (primary_key)
INTO OUTFILE '/tmp/new.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'

Perl parsing the csv file

I am trying to read a .csv file for the first time. I have gone through the link below:
http://metacpan.org/pod/Text::CSV_XS#Reading-a-CSV-file-line-by-line:
I have a few doubts. You can tell me these are silly questions if you want, but I can't figure out how exactly Perl reads a CSV file :(
So, my doubts are:
First Question
What is the difference between reading the CSV file line by line and parsing the file?
I have a simple program where I am reading the CSV file line by line.
Below is my program:
#!/usr/bin/perl -w
use strict;
use Text::CSV;
use Data::Dumper;
my $csv=Text::CSV->new( );
my $my_file="test.csv";
open(my $fl,"<",$my_file) or die"can not open the file $!";
#print "$ref_list\n";
while(my $ref_list=$csv->getline($fl))
{
print "$ref_list->[0]\n";
}
Below is the data in csv file :
"Emp_id","Emp_name","Location","Company"
102713,"raj","Banglore","abc"
403891,"Rakesh","Pune","Infy"
530201,"Kiran","Hyd","TCS"
503110,"raj","Noida","HCL"
Second Question:
If I want to get a specific Emp_id along with its Location, how can I proceed?
Third Question:
If I want only the records for Emp_ids 102713, 530201, 503110 (i.e. name, location, company name), what should I do?
Thanks
A CSV file is a good representation of tabular data in a text format, but it is unsuitable as an in-memory representation. Because of that, we have to create an adequate representation. One such representation would be a hash:
my $hashref = {
Emp_Id => ...,
Emp_name => ...,
Location => ...,
Company => ...,
};
If the header row is in the array @header, we can create this hash with:
my @header = ...;
my @row = @{$csv->getline($fl)}; # turn the arrayref into an array
my $hashref = {};
for my $i (0..$#header) {
$hashref->{$header[$i]} = $row[$i];
}
# The $hashref now looks as described above
We can then create lookup hashes that use the id values as keys. So %lookup looks like this:
my %lookup = (
102713 => $hashref_to_first_line,
...,
);
We populate it by doing
$lookup{$row[0]} = $hashref;
after the above loop. We can then access a certain hashref with
my $a_certain_id_hashref = $lookup{102713};
or access certain elements directly with
my $a_certain_id_location = $lookup{102713}{Location};
If the key does not exist, these lookups should return undef.
If the CSV file is too big, this might cause perl to run out of memory. In that case, the hashes should be tied to files, but that is a different topic completely.
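To make the pieces above concrete, here is a rough, self-contained sketch that builds the lookup hash from the sample file. The file name test.csv and the field names come from the question; treat it as an illustration rather than a drop-in answer:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv  = Text::CSV->new({ binary => 1, auto_diag => 1 });
my $file = "test.csv";

open my $fl, "<", $file or die "can not open the file $!";

# First row is the header: Emp_id, Emp_name, Location, Company
my @header = @{ $csv->getline($fl) };

my %lookup;
while (my $ref_list = $csv->getline($fl)) {
    my @row = @{$ref_list};

    # Build one hashref per line, keyed by the header names
    my $hashref = {};
    for my $i (0 .. $#header) {
        $hashref->{ $header[$i] } = $row[$i];
    }

    # Index the line by its Emp_id (first column)
    $lookup{ $row[0] } = $hashref;
}
close $fl;

# Second question: a specific Emp_id along with its Location
print "102713 works in $lookup{102713}{Location}\n" if $lookup{102713};

# Third question: a chosen subset of ids
for my $id (102713, 530201, 503110) {
    next unless $lookup{$id};
    print join(",", @{ $lookup{$id} }{qw(Emp_name Location Company)}), "\n";
}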
Here's another option that addresses your second question and part of your third question:
use Modern::Perl;
use Text::CSV;
my @empID = qw/ 102713 530201 503110 /;
my $csv = Text::CSV->new( { binary => 1 } )
or die 'Cannot use CSV: ' . Text::CSV->error_diag();
my $my_file = "test.csv";
open my $fl, '<', $my_file or die "can not open the file $!";
while ( my $ref_list = $csv->getline($fl) ) {
if ( $ref_list->[0] ~~ @empID ) {
say "Emp_id: $ref_list->[0] is Location: $ref_list->[2]";
}
}
$csv->eof or $csv->error_diag();
close $fl;
Output:
Emp_id: 102713 is Location: Banglore
Emp_id: 530201 is Location: Hyd
Emp_id: 503110 is Location: Noida
The array @empID contains the ID(s) you're interested in. In the while loop, each Emp_id is checked using the smart match operator (Perl v5.10+) to see if it's in the list of IDs. If so, the Emp_id and its corresponding Location are printed.
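One caveat: smart match (~~) has been flagged as experimental since Perl 5.18 and warns on newer Perls, so a plain grep is a safer membership test. A minimal variation of the check inside the while loop above (same variables, same output):
# Same membership test without the ~~ operator
if ( grep { $_ eq $ref_list->[0] } @empID ) {
    say "Emp_id: $ref_list->[0] is Location: $ref_list->[2]";
}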