help using perl dbi and creating unique output files - perl

I am having a problem printing out the correct number of records for a given file. My test script simply makes a Perl DBI connection to a MySQL database and, given a list of tables, extracts one (1) record per table.
For every table I list, I also want to print that one record out to its own file. For example, if I have a list of 100 tables, I should expect 100 unique files with one record each.
So far, I am able to generate the 100 files, but each contains more than one record: there are up to 280 records in a file. Ironically, I am generating a unique key for each record, and the keys are unique.
If I print $data to a single file (outside the foreach loop), I get the expected results, but in one single file. So, one file with 100 records for example, but I want to create a separate file for each table.
I seem to have a problem opening up a file handle and writing the output correctly, or something else is wrong with my code.
Can someone show me how to set this up properly? Show me some best practices for achieving this?
Thank you.
Here is my test code:
# Get list of tables
my @tblist = qx(mysql -u foo-bar -ppassw0rd --database $dbsrc -h $node --port 3306 -ss -e "show tables");
#Create data output
my $data = '';
foreach my $tblist (@tblist)
{
chomp $tblist;
#Testing to create file
my $out_file = "/home/$node-$tblist.$dt.dat";
open (my $out_fh, '>', $out_file) or die "cannot create $out_file: $!";
my $dbh = DBI->connect("DBI:mysql:database=$dbsrc;host=$node;port=3306",'foo-bar','passw0rd');
my $sth = $dbh->prepare("SELECT UUID(), '$node', ab, cd, ef, gh, hi FROM $tblist limit 1");
$sth->execute();
while (my($id, $nd,$ab,$cd,$ef,$gh,$hi) = $sth->fetchrow_array() ) {
$data = $data. "__pk__^A$id^E1^A$nd^E2^A$ab^E3^A$cd^E4^A$ef^E5^A$gh^E6^A$hi^E7^D";
}
$sth->finish;
$dbh->disconnect;
#Testing to create file
print $out_fh $data;
close $out_fh or die "Failed to close file: $!";
};
#print $data; #If I uncomment this and output to a single file, I can see the correct number of records, but it's all in (1) file

You need to clear $data on each iteration of the outer ($tblist) loop.
In this line: $data = $data . "__pk__^A$id^E1^A$... you are appending the data from each new table on top of the old data, and the old data is preserved between tables because the $data variable is scoped OUTSIDE the outer loop and its value never gets reset inside it.
The simplest solution is to declare $data inside the outer ($tblist) loop:
foreach my $tblist (@tblist) {
my $data = '';
You could keep declaring it before the outer loop and simply assign it "" at the start of each iteration, but there is no point: there is usually no legitimate reason to know the value of $data after a loop like this finishes, so it does not need a scope larger than the loop block.
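For reference, here is a minimal, untested sketch of the loop with that one change applied. It assumes the variables set up earlier in the test script ($dbsrc, $node, $dt, @tblist) and the same column list; RaiseError is added only so DBI failures are not silent:

foreach my $tblist (@tblist) {
    chomp $tblist;
    my $data = '';    # reset for every table
    my $out_file = "/home/$node-$tblist.$dt.dat";
    open my $out_fh, '>', $out_file or die "cannot create $out_file: $!";

    my $dbh = DBI->connect("DBI:mysql:database=$dbsrc;host=$node;port=3306",
        'foo-bar', 'passw0rd', { RaiseError => 1 });
    my $sth = $dbh->prepare("SELECT UUID(), '$node', ab, cd, ef, gh, hi FROM $tblist limit 1");
    $sth->execute();
    while ( my ($id, $nd, $ab, $cd, $ef, $gh, $hi) = $sth->fetchrow_array() ) {
        $data .= "__pk__^A$id^E1^A$nd^E2^A$ab^E3^A$cd^E4^A$ef^E5^A$gh^E6^A$hi^E7^D";
    }
    $sth->finish;
    $dbh->disconnect;

    print $out_fh $data;    # exactly one table's record ends up in this file
    close $out_fh or die "Failed to close $out_file: $!";
}

Connecting once before the loop and reusing the same $dbh would also avoid reconnecting for every table, but it is not required for the fix.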

Related

How to extract unique fields from a CSV file using a Perl script

I have a CSV file with data that looks similar to this:
alpha,a,foo,bar
alpha,b,foo,bar
alpha,c,foo,bar
beta,d,foo,bar
beta,e,foo,bar
I'm able to use the following code to successfully create two new files using the data:
open (my $FH, '<', '/home/<username>/inputs.csv') || die "ERROR Cannot read file\n";
while (my $line = <$FH>) {
chomp $line;
my #fields = split "," , $line;
my $file = "ziggy.$fields[0]";
open (my $FH2, '>>', $file) || die "ERROR Cannot open file\n";
print $FH2 "$fields[1]\n";
print $FH2 "$fields[2]\n";
print $FH2 "$fields[3]\n\n";
close $FH2;
}
Basically, this code reads through the rows in the CSV file and creates content in files that are named based on the first field. So, the "ziggy.alpha" file has nine lines of content, while the "ziggy.beta" file has six lines of content. Note that I'm appending data to these files as the rows are being read via the "while" loop.
My challenge:
Following the data set example cited, I need to create a second pair of files that use the same "first field" naming convention (something like "zaggy.alpha" and "zaggy.beta"). The files will only be created once with static content written to them, and will not have additional data appended to them from the CSV file.
My question:
Is there a way to identify the unique values in the first field ("alpha" and "beta"), store them in a hash, then reference them in a "while" loop in order to create my second set of files while the inputs.csv file is open?
Thanks in advance for any insight that can be provided!
In Perl you can get a list of keys from a hash (associative array) like this:
my @keys = keys %hash;
So something like this will work:
my %unique_first_values;
Then, later in the loop:
$unique_first_values{$fields[0]} = 1;
You can then call 'keys' on the hash to get the unique values.
my @unique = keys %unique_first_values;
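Putting those fragments together against the loop from the question, one untested way to do it is to collect the keys while reading and create the second set of files afterwards (the zaggy names and their static content are placeholders):

my %unique_first_values;

open my $FH, '<', '/home/<username>/inputs.csv' or die "ERROR Cannot read file: $!";
while (my $line = <$FH>) {
    chomp $line;
    my @fields = split ',', $line;
    $unique_first_values{$fields[0]} = 1;   # remember each first-field value
    # ... existing prints to the ziggy.* file go here ...
}
close $FH;

for my $key (keys %unique_first_values) {   # one zaggy.* file per unique value
    open my $FH2, '>', "zaggy.$key" or die "ERROR Cannot open file: $!";
    print $FH2 "static content\n";
    close $FH2;
}

If the zaggy.* files really must be created while inputs.csv is still open, the seen-before check in the next answer does the same job inside the loop.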
In order to "create my second set of files while the inputs.csv file is open" you're going to want to know if you've seen a value before.
The conventional way to do this in Perl is to create a hash to store previously-seen values, and check-then-set in order to determine whether you've seen it, record that it has been seen, and go on.
if (exists($seen_before{$key})) {
# seen it
}
else {
# new key!
$seen_before{$key} = 1;
}
Given that you're going to be opening files and appending data, it might make sense to store a file handle in the hash instead of a 1. That way, your # new key! code could just be opening the file, and your # seen it code could be a default condition (fall-through) writing the fields out. Something like this:
unless (exists $file_handle{$key}) {
open $file_handle{$key}, '>>', ... or die ...;
}
# now we know it's in the hash, write the data:
print { $file_handle{$key} } ...;
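As a concrete, untested sketch of that seen-before idea applied to the question's loop (again treating the zaggy prefix and its static content as placeholders):

my %zaggy_created;   # first-field values for which zaggy.* has already been written

open my $FH, '<', '/home/<username>/inputs.csv' or die "ERROR Cannot read file: $!";
while (my $line = <$FH>) {
    chomp $line;
    my @fields = split ',', $line;

    # append to the ziggy.* file exactly as in the question
    open my $FH2, '>>', "ziggy.$fields[0]" or die "ERROR Cannot open file: $!";
    print $FH2 "$fields[1]\n$fields[2]\n$fields[3]\n\n";
    close $FH2;

    # the first time a value is seen, create its zaggy.* file with static content
    unless (exists $zaggy_created{$fields[0]}) {
        open my $FH3, '>', "zaggy.$fields[0]" or die "ERROR Cannot open file: $!";
        print $FH3 "static content\n";
        close $FH3;
        $zaggy_created{$fields[0]} = 1;
    }
}
close $FH;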

Dynamic Loop outputs the same on each iteration

I am attempting to write a script to automate some data collection. Initially the script runs a series of commands which are carried out by the system. The output of these commands is stored in two text files. Following data collection, I am attempting to implement a for loop so that a third output file is generated which lists, for each line, the value of interest from output 1 and from output 2, as well as the relative error. The following code completes the correct number of times, but returns the same values on all four lines. I suspect this has to do with the filehandle variable, but I am unsure how to solve the issue.
for($ln = 1; $ln<5;$ln++){
open($fh, '<',"theoretical.dat",<$ln>)
or die "Could not open file 'theoretical.dat' $!";
@line = split(' ',<$fh>);
$v = $line[3];
open($fh2, '<',"actual.dat",<$ln>)
or die "Could not open file 'actual.dat' $!";
@line = split(' ',<$fh2>);
$v0 = $line[3];
$e = abs(($v0-$v)/$v0);
$rms = $rms + $e^2;
my @result = ($v, $v0, $e);
print "@result \n";
}
The output file code has been omitted. It contains an if/else depending on whether the output should be written to results.dat or appended to it.
Note that the data in question is stored as 4 numbers per line, only the fourth of which I wish to access with this script. From the output generated it seems that $ln is changing accordingly after each iteration, but the line being read is not, despite the argument in the open call which is supposed to select line number $ln.
I have tried undefining $fh and $fh2 after each loop, but it still outputs the same values.
You can't specify the line number of a file on the open call. In fact reopening a file will cause it to be read again starting from the top.
Without seeing your data files I can't be sure, but I think you want something like this.
Note that you can use autodie instead of coding an explicit test for each open succeeding. You should also use strict and use warnings at the top of every Perl program, and declare all of your variables using my as close as possible to their first point of use. I have declared $rms outside the loop here so that it can accumulate an aggregate sum of squares instead of being destroyed and recreated each time around the loop.
use strict;
use warnings;
use autodie;
open my $theo_fh, '<', 'theoretical.dat';
open my $act_fh, '<', 'actual.dat';
my $rms = 0;
for my $ln (1 .. 4) {
my $v_theo = (split ' ', <$theo_fh>)[3];
my $v_act = (split ' ', <$act_fh>)[3];
my $e = abs(($v_act - $v_theo) / $v_act);
$rms += $e ** 2;
my @result = ($v_theo, $v_act, $e);
print "@result\n";
}
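If the accumulated sum of squares is meant to end up as an actual root-mean-square figure, it only needs dividing by the number of samples and a square root after the loop; assuming the four lines read above:

my $rms_error = sqrt($rms / 4);    # root mean square of the four relative errors
print "$rms_error\n";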

Perl Win32::OLE Generating Access Database

Description: I am reading from a list of flat files and generating and loading an Access database. Windows XP, Perl 5.8.8, and no access to additional modules outside the default installation.
Issue(s): Performance, performance, performance. It is taking ~20 minutes to load in all of the data. I am assuming that there might be a better way to load the data rather than AddNew & Update.
Logic: Without posting a lot of my transformations and additional logic, here is what I am attempting (a rough Perl sketch of these steps follows the list):
Open file x
read row 0 of file x
jet->execute a Create statement from string derived from step 2
read in rows 1 - n creating a tab-delimited string and store into an array
Open a recordset using select * from tablename
for each item in array
recordset->AddNew
split the item based on the tab
for each item in the split
rs->Fields->Item(pos)->{Value} = item_value
recordset->Update
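As a rough, untested sketch of the steps just listed, assuming ADO through Win32::OLE (the path, table name, column list and data rows below are placeholders for values the real script derives from the flat file), this is essentially the AddNew/Update pattern the answer comments on:

use strict;
use warnings;
use Win32::OLE;

my $mdb_path = 'C:\\data\\load.mdb';                  # placeholder database path
my $table    = 'tablename';                           # placeholder table name
my $create   = "CREATE TABLE $table (col1 TEXT, col2 TEXT)";  # built from row 0 in the real script
my @rows     = ("a\tb", "c\td");                      # tab-delimited data rows from the file

my $conn = Win32::OLE->new('ADODB.Connection')
    or die "Cannot create ADO connection: " . Win32::OLE->LastError();
$conn->Open("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=$mdb_path");

$conn->Execute($create);                              # step 3: create the table

my $rs = Win32::OLE->new('ADODB.Recordset');
$rs->Open("select * from $table", $conn, 1, 3);       # 1 = adOpenKeyset, 3 = adLockOptimistic

for my $row (@rows) {                                 # steps 6-12
    my @values = split /\t/, $row;
    $rs->AddNew;
    for my $pos (0 .. $#values) {
        $rs->Fields->Item($pos)->{Value} = $values[$pos];
    }
    $rs->Update;                                      # the per-row Update is the slow part
}
$rs->Close;
$conn->Close;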
One cause of slow loads is doing a commit on every update. Make sure that automatic commits are off and do one every 1000 rows or so; if it is not a gigantic load, don't commit at all until the end. Also, do not create indexes during the load; create them afterwards.
Also, I'm not sure that OLE is the best way to do this. I load Access db's all of the time using DBI and Win32::ODBC. Goes pretty fast.
Per request, here is a sample load program; it did about 100k records per minute on WinXP, Access 2003, ActiveState Perl 5.8.8.
use strict;
use warnings;
use Win32::ODBC;
$| = 1;
my $dsn = "LinkManagerTest";
my $db = Win32::ODBC->new($dsn)
or die "Connect to database $dsn failed: " . Win32::ODBC::Error();
my $rows_added = 0;
my $error_code;
while (<>) {
chomp;
print STDERR "." unless $. % 100;
print STDERR " $.\n" unless $. % 5000;
my ($source, $source_link, $url, $site_name) = split /\t/;
my $insert = qq{
insert into Links (
URL,
SiteName,
Source,
SourceLink
)
values (
'$url',
'$site_name',
'$source',
'$source_link'
)
};
$error_code = $db->Sql($insert);
if ($error_code) {
print "\nSQL update failed on line $. with error code $error_code\n";
print "SQL statement:\n$insert\n\n";
print "Error:\n" . $db->Error() . "\n\n";
}
else {
$rows_added++;
}
$db->Transact('SQL_COMMIT') unless $. % 1000;
}
$db->Transact('SQL_COMMIT');
$db->Close();
print "\n";
print "Lines Read: $.\n";
print "Rows Added: $rows_added\n";
exit 0;
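Since DBI is mentioned as an alternative, here is a hedged, untested sketch of the same loader using DBI with placeholders instead of interpolating the values into the SQL (it assumes DBD::ODBC is available and reuses the DSN and Links column list from the sample above; AutoCommit is turned off so the commits can be batched):

use strict;
use warnings;
use DBI;

my $dsn = "LinkManagerTest";
my $dbh = DBI->connect("dbi:ODBC:$dsn", '', '',
    { RaiseError => 1, AutoCommit => 0 })
    or die "Connect to database $dsn failed: $DBI::errstr";

my $sth = $dbh->prepare(
    'insert into Links (URL, SiteName, Source, SourceLink) values (?, ?, ?, ?)'
);

my $rows_added = 0;
while (<>) {
    chomp;
    my ($source, $source_link, $url, $site_name) = split /\t/;
    $sth->execute($url, $site_name, $source, $source_link);
    $rows_added++;
    $dbh->commit unless $. % 1000;    # commit in batches of 1000 rows
}
$dbh->commit;
$dbh->disconnect;

print "Lines read: $.\nRows added: $rows_added\n";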

Nested while loop which does not seem to keep variables appropriately

I'm an amateur Perl coder, and I'm having a lot of trouble figuring out what is causing this particular issue. It seems as though it's a variable issue.
sub patch_check {
my $pline;
my $sline;
while (<SYSTEMINFO>) {
chomp($_);
$sline = $_;
while (<PATCHLIST>) {
chomp($_);
$pline = $_;
print "sline $sline pline $pline underscoreline $_ "; #troubleshooting
print "$sline - $pline\n";
if ($pline =~ /($sline)/) {
#print " - match $pline -\n";
}
} #end while
}
}
There is more code, but I don't think it is relevant. When I print $sline in the first loop it works fine, but not in the second loop. I tried making the variables global, but that did not work either.
The point of the subroutine is that I want to open a file (patches) and see if its contents appear in (systeminfo). I also tried reading the files into arrays and doing foreach loops.
Does anyone have another solution?
It looks like your actual goal here is to find lines which are in both files, correct? The normal (and much more efficient! - it only requires you to read in each file once, rather than reading all of one file for each line in the other) way to do this in Perl would be to read the lines from one file into a hash, then use hash lookups on each line in the other file to check for matches.
Untested (but so simple it should work) code:
sub patch_check {
my %slines;
while (<SYSTEMINFO>) {
# Since we'll just be comparing one file's lines
# against the other file's lines, there's no real
# reason to chomp() them
$slines{$_}++;
}
# %slines now has all lines from SYSTEMINFO as its
# keys and the values are the number of times the
# line appears, in case that's interesting to you
while (<PATCHLIST>) {
print "match: $_" if exists $slines{$_};
}
}
Incidentally, if you're reading your data from SYSTEMINFO and PATCHLIST, then you're doing it the old-fashioned way. When you get a chance, read up on lexical filehandles and the three-argument form of open if you're not already familiar with them.
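For reference, a minimal example of that style, with hypothetical filenames standing in for wherever SYSTEMINFO and PATCHLIST are actually opened:

open my $systeminfo_fh, '<', 'systeminfo.txt' or die "Cannot open systeminfo.txt: $!";
open my $patchlist_fh,  '<', 'patchlist.txt'  or die "Cannot open patchlist.txt: $!";

while (my $line = <$systeminfo_fh>) {
    chomp $line;
    # ... work with $line here ...
}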
Your code is not entering the PATCHLIST while loop the 2nd time through the SYSTEMINFO while loop because you already read all the contents of PATCHLIST the first time through. You'd have to re-open the PATCHLIST filehandle to accomplish what you're trying to do.
That's a pretty inefficient way to see if the lines of one file match the lines of another file. Take a look at grep with the -f flag for another way.
grep -f PATCHFILE SYSTEMINFO
What I like to do in such cases is read one file and create hash keys from the values you are looking for, then read the second file and check whether each of its lines already exists as a key. This way you have to read each file only once.
Here is example code, untested:
sub patch_check {
my %patches = ();
open(my $PatchList, '<', "patch.txt") or die $!;
open(my $SystemInfo, '<', "SystemInfo.txt") or die $!;
while ( my $PatchRow = <$PatchList> ) {
$patches{$PatchRow} = 0;
}
while ( my $SystemRow = <$SystemInfo> ) {
if ( exists $patches{$SystemRow} ) {
#The Patch is in System Info
#Do whatever you want
}
}
}
You cannot read one file inside the read loop of another: the inner filehandle is exhausted after the first pass. Slurp one file into an array first, then loop over that array with a foreach inside the read loop over the other file (see the sketch below).
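A minimal sketch of that structure, keeping the question's global filehandles and matching logic (\Q...\E is added so that regex metacharacters in the system line do not change the match):

chomp(my @patchlist = <PATCHLIST>);      # slurp the whole patch list once
while (my $sline = <SYSTEMINFO>) {       # outer read loop
    chomp $sline;
    foreach my $pline (@patchlist) {     # inner loop over the slurped lines
        print " - match $pline -\n" if $pline =~ /\Q$sline\E/;
    }
}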

Using perl to parse a file and insert specific values into a database

Disclaimer: I'm a newbie at scripting in Perl; this is partially a learning exercise (but still a project for work). Also, I have a much stronger grasp of shell scripting, so my examples will likely be formatted in that mindset (but I would like to create them in Perl). Sorry in advance for my verbosity; I want to make sure I am at least marginally clear in getting my point across.
I have a text file (a reference guide) that is a Word document converted to text and then swapped from Windows to UNIX format in Notepad++. The file is uniform in that each section of the file has the same fields/formatting/tables.
What I plan to do, in a basic way, is grab each section, keyed by unique batch job names, and place all of the values into a database (or maybe just an Excel file) so the fields for each job can be searched/edited much more easily than in the Word file, and possibly create a web interface later on.
So what I want to do is grab each section by doing something like:
sed -n '/job_name_1_regex/,/job_name_2_regex/p' file.txt --how would this be formatted within a perl script?
(grab the section in total, then break it down further from there)
To read the file in the script I have open FORMAT_FILE, 'test_format.txt'; and then use foreach $line (<FORMAT_FILE>) to parse the file line by line. --is there a better way?
My next problem is that I converted from a Word doc with tables, which look like this:
Table Heading 1 Table Heading 2
Heading 1/Value 1 Heading 2/Value 1
Heading 1/Value 2 Heading 2/Value 2
but in the text file it looks like this:
Table Heading 1
Table Heading 2Heading 1/Value 1Heading 1/Value 2Heading 2/Value 1Heading 2/Value 2
So I want to have "Heading 1" and "Heading 2" as column names and then put the respective values under them. I just am not sure how to get the values in relation to the heading from the text file. The values of Heading 1 will always start at the line number of Heading 1 plus 2 (Heading 1, Heading 2, values for Heading 1). I know this can be done in awk/sed pretty easily; I'm just not sure how to address it inside a perl script.
---EDIT---
For this I was thinking of doing an array something like:
my @heading1 = ($value1, $value2, etc.)
my @heading2 = ($value1, $value2, etc.)
I just need to be able to associate the correct values and headings together, so that @heading1 gets its values from the lines after Heading 2 (where the values start).
Like saying (in shell):
x=$(grep -n "Heading 1" file.txt | cut -d":" -f1) #gets the line that "Heading 1" is on in the file
(( x = x+2 )) #adds 2 to the line (where the values will start)
#print values from file.txt from the line where they start to the
#last one (I'll figure that out at some point before this)
sed -n "$x,$last_line_of_values p" file.txt
This is super-hacked together for the moment, to try to elaborate what I want to do...let me know if it clears it up a little...
---/EDIT---
After I have all the right values and such, linking it up to a database may be an issue as well; I haven't started looking at the way Perl interacts with DBs yet.
Sorry if this is a bit scatterbrained...it's still not fully formed in my head.
http://perlmeme.org/tutorials/connect_to_db.html
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $driver = "mysql"; # Database driver type
my $database = "test"; # Database name
my $user = ""; # Database user name
my $password = ""; # Database user password
my $dbh = DBI->connect(
"DBI:$driver:$database",
$user, $password,
{
RaiseError => 1,
PrintError => 1,
}
) or die $DBI::errstr;
my $sth = $dbh->prepare("
INSERT INTO test
(col1, col2)
VALUES (?, ?)
") or die $dbh->errstr;
my $intable = 0;
open my $file, "file.txt" or die "can't open file $!";
while (<$file>) {
if (/job_name_1_regex/../job_name_2_regex/) { # job 1 section
$intable = 1 if /Table Heading 1/; # table start
if ($intable) {
my $next_line = <$file>; # heading 2 line
chomp; chomp $next_line;
$sth->execute($_, $next_line) or die $dbh->errstr;
}
}
}
close $file or die "can't close file $!";
$dbh->disconnect;
Several things in this post... First, the basic "best practices":
Use modern Perl. Start your scripts with:
use strict; use warnings;
Don't use global filehandles; use lexical filehandles (declare them in a variable).
Always check "open" for a return value.
open my $file, '<', '/some/file' or die "can't open file: $!";
Then, about pattern matching: I don't understand your example at all, but I suppose you want something like:
foreach my $line ( <$file> ) {
if ( $line =~ /regexp1/) {
# do something...
}
}
Edit: about the table, I suppose the best thing is to build two arrays, one for each column.
If I understand correctly, when reading the file you need to split the line and put one part in the @col1 array and the second part in the @col2 array. The clear and easy way is to use two temporary variables:
my ( $val1, $val2 ) = split /\s+/, $line;
push @col1, $val1;
push @col2, $val2;
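If the layout really is as described in the question (Heading 1 on one line, Heading 2 on the next, and Heading 1's values starting two lines after Heading 1), a Perl equivalent of the grep -n / sed approach from the edit might look like this untested sketch; the blank-line stopping condition is a placeholder, since the question leaves the end of the value block open:

open my $fh, '<', 'file.txt' or die "can't open file: $!";
chomp(my @lines = <$fh>);
close $fh;

my @heading1_values;
for my $i (0 .. $#lines) {
    if ($lines[$i] =~ /Heading 1/) {
        my $j = $i + 2;                                # values begin two lines after the Heading 1 line
        while ($j <= $#lines && $lines[$j] ne '') {    # placeholder end condition
            push @heading1_values, $lines[$j];
            $j++;
        }
        last;
    }
}
print "$_\n" for @heading1_values;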