CSV import to MySQL - perl

Hi, I keep getting an error when trying to run the following Perl script to import a CSV file into an existing MySQL database table. Every time I run it I get the message "Died at /home/perl/dep_import_2.pl line 10."
Any help would be appreciated
Thanks
#!/usr/bin/perl
use DBI;
use DBD::mysql;
use warnings "all";
if ($#ARGV != 0) {
print "Usage: dep_import_2.pl filename\n";
die;
}
$filename = $ARGV[0];
# MySQL CONFIG VARIABLES
$host = "localhost";
$user = "standard";
$pw = "standard";
$database = "data_track";
$dsn = "DBI:mysql:database=" . $database . ";host=" . $host;
$dbh = DBI->connect($dsn, $user, $pw)
or die "Can't connect to the DB: $DBI::errstr\n";
print "Connected to DB!\n";
open FILE, "/home/dep/new_study_forms_2.csv", $filename or die $!;
$_ = <FILE>;
$_ = <FILE>;
while (<FILE>) {
@f = split(/,/, $_);
$sql = "INSERT INTO dep (date, subject, weight, size, time, hi_pre, hi_post, hi_afternoon, hi_test, actical_on, actical_off, saggital_1, saggital_2, crown_heel1, crown_heel2, crown_rump1, crown_rump2, scan, record_number, tap, sample, dye, left_chip, right_chip) VALUES('$f[0]', '$f[1]', '$f[2]', '$f[3]' '$f[4]', '$f[5]', '$f[6]', '$f[7]', '$f[8]', '$f[9]', '$f[10]', '$f[11]', '$f[12]', '$f[13]', '$f[14]', '$f[15]', '$f[16]', '$f[17]', '$f[18]', '$f[19]', '$f[20]', '$f[21]', '$f[22]', '$f[23]')";
print "$sql\n";
my $query = $dbh->do($sql);
}

There are a few issues with your code. First, and most importantly, you are not using
use strict;
use warnings;
This is bad because you will not get information about errors in your code without them.
As others have pointed out, the reason the script dies is because $#ARGV is not zero. Meaning that you have either passed too few or too many arguments to the script. The arguments to the script must be exactly one, like the usage statement says.
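As a side note, an arguably clearer way to write that check (just a sketch; the comment shows how the script would then be invoked):
# A clearer argument check: die with the usage message unless exactly one
# argument was passed, e.g. perl dep_import_2.pl /home/dep/new_study_forms_2.csv
use strict;
use warnings;
die "Usage: dep_import_2.pl filename\n" unless @ARGV == 1;
my ($filename) = @ARGV;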
However, that would not solve your problem, because your open statement below is screwed up. My guess is that you tried to add your file name directly. This line:
open FILE, "/home/dep/new_study_forms_2.csv", $filename or die $!;
It will probably give you the error unknown open() mode .... It should probably be
open FILE, "<", $filename or die $!;
And then you pass /home/dep/new_study_forms_2.csv to the script on the command line, assuming that is the correct file to use.
Also, in your query string you should not interpolate variables; use placeholders instead, as described in the documentation for DBI. The placeholders will take care of the quoting for you and avoid any data corruption. To make your query line a bit simpler, you can do something like:
my $sth = $dbh->prepare(
"INSERT INTO dep (date, subject, weight, size, time, hi_pre, hi_post,
hi_afternoon, hi_test, actical_on, actical_off, saggital_1, saggital_2,
crown_heel1, crown_heel2, crown_rump1, crown_rump2, scan, record_number,
tap, sample, dye, left_chip, right_chip)
VALUES(" . join(",", ("?") x #f) . ")");
$sth->execute(#f);

Here's a script which uses Text::CSV_XS to properly parse the CSV. It assumes that the first row contains column names, and then loads the CSV in batches, committing after every 100 inserts. Every parameter (user, password, database) is configurable via command-line options. Usage is an in-line POD document.
#!/usr/bin/env perl
use strict;
use warnings qw(all);
use DBI;
use Getopt::Long;
use Pod::Usage;
use Text::CSV_XS;

=pod

=head1 SYNOPSIS

dep_import_2.pl --filename=file.csv --host=localhost --user=standard --pw=standard --database=data_track

=head1 DESCRIPTION

Loads a CSV file into the specified MySQL database.

=cut
my $host = 'localhost';
my $user = 'standard';
my $pw = 'standard';
my $database = 'data_track';
my $commit = 100;
GetOptions(
'help' => \my $help,
'filename=s' => \my $filename,
'host=s' => \$host,
'user=s' => \$user,
'pw=s' => \$pw,
'database=s' => \$database,
'commit=i' => \$commit,
) or pod2usage(q(-verbose) => 1);
pod2usage(q(-verbose) => 2) if $help;
my $dbh = DBI->connect("DBI:mysql:database=$database;host=$host", $user => $pw)
or die "Can't connect to the DB: $DBI::errstr";
my $csv = Text::CSV_XS->new
or die "Text::CSV error: " . Text::CSV->error_diag;
open(my $fh, '<:utf8', $filename)
or die "Can't open $filename: $!";
my @cols = @{$csv->getline($fh)};
$csv->column_names(\@cols);
my $query = "INSERT INTO dep (@{[ join ',', @cols ]}) VALUES (@{[ join ',', ('?') x (scalar @cols) ]})";
my $sth = $dbh->prepare($query);
my $i = 0;
while (my $row = $csv->getline_hr($fh)) {
$sth->execute(@{$row}{@cols});
$dbh->commit if ((++$i % $commit) == 0);
}
$dbh->commit;
$dbh->disconnect;
$csv->eof or $csv->error_diag;
close $fh;
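One caveat, which is my assumption about the intended behaviour rather than something stated above: for the batched commits to actually control transactions, AutoCommit has to be switched off when connecting; otherwise every insert is committed immediately and the explicit commit calls only produce warnings. A possible tweak:
# Sketch: disable AutoCommit so the explicit $dbh->commit calls define the batches
my $dbh = DBI->connect(
    "DBI:mysql:database=$database;host=$host",
    $user => $pw,
    { RaiseError => 1, AutoCommit => 0 },
) or die "Can't connect to the DB: $DBI::errstr";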


How to insert MySQL data into a Perl graph (Edited)

So my task is to create a graph of the data inside a MySQL table (around 41 data points in 7 rows). I have only done a basic graph before... so is it even possible for me to create a graph from the data inside a MySQL table using a Perl script?
Sorry for the lack of code, though; I don't even know how to create a Perl graph from MySQL data.
Edit
I tried drawing the graph, but it seems the data isn't showing up as intended. It shows only an empty graph, and the values start with a negative for some reason... Is there something I did wrong?
My SQL table:
create table Top_1 (
CPU_User float, CPU_System float, CPU_Waiting float);
My script:
#!/usr/bin/perl
use DBI;
use warnings;
use strict;
use autodie;
use Data::Dumper;
use GD::Graph::bars;
use GD::Graph::Data;
my $username = "root";
my $password = "";
my $db = "Top_Data_1";
my $host = "127.0.0.1";
my $port = "3306";
my $dsn = "DBI:mysql:database=$db;host=$host;port=$port";
my %attr = (PrintError=>0,RaiseError=>1 );
my $dbh = DBI->connect($dsn,$username,$password,\%attr) or die $DBI::errstr;
my $sth = $dbh->prepare('CPU_User, CPU_System, CPU_Waiting from Top_1');
$sth->execute();
my @row;
while ( @row = $sth->fetchrow_array) {
print "this is CPU_User\n";
print "$row[0]\n";
print "this is CPU_System \n";
print "$row[1]\n";
print "this is CPU_Waiting \n";
print "$row[2]\n";
}
my $data = GD::Graph::Data->new([
["8 am","10 pm","12 pm"],
['$row[0]'],
['$row[1]'],
['$row[2]'],
]) or die GD::Graph::Data->error;
my $graph = GD::Graph::bars->new;
$graph->set(
x_label => 'File_Name',
y_label => 'Value',
title => 'TOP CPU DISPLAY',
x_labels_vertical => 1,
bar_spacing => 10,
bar_width => 3,
long_ticks => 1,
) or die $graph->error;
$graph->set_legend_font(GD::gdMediumBoldFont);
$graph->set_legend('CPU USER','CPU_System','CPU_Waiting');
$graph->plot($data) or die $graph->error;
my $file = 'bars.png';
print "Your Picture Has Been Added To Your Directory\n";
open(my $out, '>', $file) or die "Cannot open '$file' for write: $!";
binmode $out;
print $out $graph->gd->png;
close $out;
$sth->finish();
$dbh->disconnect();
This is possible to do. One can use DBI to get the data from MySQL. Then you can arrange the data into a format suitable for graphing with a graphing library. I have used GD::Graph and it is easy to use.
Links:
MySQL: https://metacpan.org/pod/DBD::mysql
Graph: https://metacpan.org/pod/GD::Graph
The data structure is not correct.
I am not sure of your database table, but if you have a time field in it then something like the code below should work.
my $data = GD::Graph::Data->new();
my $sth = $dbh_mysql->prepare('SELECT hour, CPU_USER, CPU_System, CPU_Waiting FROM top1');
$sth->execute();
while (my @row = $sth->fetchrow_array)
{
$data->add_point(@row);
}
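From there you can plot the collected $data and write the image out, reusing the same calls that are already in your script; a minimal sketch (the size, axis labels and legend text are just placeholders):
# Sketch: plot the $data built in the loop above and save it as a PNG
# (assumes GD::Graph::bars is loaded, as in your original script)
my $graph = GD::Graph::bars->new(800, 600);
$graph->set(
    x_label           => 'Hour',
    y_label           => 'Value',
    title             => 'TOP CPU DISPLAY',
    x_labels_vertical => 1,
) or die $graph->error;
$graph->set_legend('CPU_User', 'CPU_System', 'CPU_Waiting');
$graph->plot($data) or die $graph->error;
open my $out, '>', 'bars.png' or die "Cannot open 'bars.png' for write: $!";
binmode $out;
print $out $graph->gd->png;
close $out;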

Display full taxon path from NCBI GI number

I prepared the following script that takes a GI ID number from NCBI that I prepared in my tsv file and prints the scientific name associated with the ID:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;
my ($filename) = @ARGV;
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
while(<>) {
my ($taxonid, $counts) = (split /\t/);
for my $each($taxonid) {
print "$each\n";
my $db = Bio::DB::Taxonomy->new(-source => 'entrez');
my $taxon = $db->get_taxon(-taxonid => $taxonid);
print "Taxon ID is $taxon->id, \n";
print "Scientific name is ", $taxon->scientific_name, "\n";
}
}
With this script, I receive the following:
1760
Taxon ID is Bio::Taxon=HASH(0x33a91f8)->id,
Scientific name is Actinobacteria
What I want to do
Now the next step is for me to list the full taxon path of the bacteria in question. So for the above example, I want to see k__Bacteria; p__ Actinobacteria; c__ Actinobacteria as output. Furthermore, I want the GI IDs in my table to be replaced with this full taxon path.
In which direction should I go?
First, I notice you open $filename, which is your first command-line argument, but you don't use the filehandle $fh you created.
So these two lines are not needed in your case, because you already do the trick with <>:
my ($filename) = @ARGV;
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
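Alternatively, if you prefer to keep the explicit open, read from the handle you created instead of the magic <>; a small sketch:
# Read from the explicitly opened filehandle rather than from <>
while (my $line = <$fh>) {
    chomp $line;
    my ($taxonid, $counts) = split /\t/, $line;
    print "$taxonid\n";
}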
Next: I don't know what is inside your file and your database, so I cannot help you more. Can you provide an example of what is inside your database and your file?
One more thing, what I can see here is that you may not need to create your $db instance inside the loop.
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;
my $db = Bio::DB::Taxonomy->new(-source => 'entrez');
while(<>) {
my ($taxonid, $counts) = (split /\t/);
for my $each($taxonid) {
print "$each\n";
my $taxon = $db->get_taxon(-taxonid => $taxonid);
print "Taxon ID is $taxon->id, \n";
print "Scientific name is ", $taxon->scientific_name, "\n";
}
}
Edit
From your comment it is hard to help you. When you write
my $taxon = $db->get_taxon(-taxonid => $taxonid);
You receive a Bio::Taxon node; the documentation can be found here.
I don't know what k__Bacteria; p__ Actinobacteria; c__ Actinobacteria represents for you. Is it information offered by a Bio::Taxon node?
Anyway, you can still explore $taxon with this:
#!/usr/bin/env perl
# Author: Yves Chevallier
# Date:
use strict;
use warnings;
use Data::Dumper;
use Bio::DB::Taxonomy;
my $db = Bio::DB::Taxonomy->new(-source => 'entrez');
while(<DATA>) {
my ($taxonid, $counts) = (split /\t/);
for my $each($taxonid) {
print "$each\n";
my $taxon = $db->get_taxon(-taxonid => $taxonid);
print Dumper $taxon;
print "Taxon ID is $taxon->id, \n";
print "Scientific name is ", $taxon->scientific_name, "\n";
}
}
__DATA__
12 1760
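Regarding the full taxon path itself: I am guessing the k__/p__/c__ prefixes stand for rank initials (kingdom, phylum, class, and so on). If that guess is right, one way to build such a path is to walk up the lineage with Bio::Taxon's ancestor, rank and scientific_name methods; a rough, untested sketch (to go right after the get_taxon call):
# Rough sketch: climb from the taxon towards the root, collecting "<rank initial>__<name>"
# The prefix letters below are my assumption about what k__/p__/c__ mean.
my %prefix = (
    superkingdom => 'k', kingdom => 'k', phylum => 'p', class   => 'c',
    order        => 'o', family  => 'f', genus  => 'g', species => 's',
);
my @path;
for (my $node = $taxon; defined $node; $node = $node->ancestor) {
    my $rank = $node->rank || '';
    next unless $prefix{$rank};
    unshift @path, $prefix{$rank} . '__' . $node->scientific_name;
}
print join('; ', @path), "\n";   # e.g. k__Bacteria; p__Actinobacteria; c__Actinobacteria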

How can I redirect the output of a SQL query to a file?

I need a little help in redirecting the output of a SQL query to a file. My code looks like this:
my $sth = $dbh->prepare(
"select count(parameter2),
parameter2 as file_type
from KCRT_TABLE_ENTRIES where request_id = $mycrnum
group by parameter2"
) or die "Can't prepare SQL statement: ", $dbh->errstr(), "\n";
$sth->execute > $mydir\\file_detail.txt
or die "Can't execute SQL statement: ", $sth->errstr(), "\n";
I've had to invent a lot of code as you don't show much of your program, but the program below gives you the rough idea.
Once you've called execute you have to call one of the fetch methods to retrieve the data in whatever form is most useful to you. Here I've just asked for a reference to an array containing each row's data.
Then it's simply a matter of opening the required file for output and printing the rows of data to it.
I've removed the status checks on each DBI call and replaced them with the RaiseError flag, which does the same thing automatically. I've also replaced the parameter $mycrnum in the SQL statement with a placeholder and passed its value to execute. That way DBI looks after any necessary quoting etc.
use strict;
use warnings;
use DBI;
my ($dsn, $user, $pass);
my ($mycrnum, $mydir);
my $dbh = DBI->connect($dsn, $user, $pass);
@{$dbh}{qw/ PrintError RaiseError /} = (0, 1);
my $sth = $dbh->prepare(
"SELECT COUNT(parameter2),
parameter2 AS file_type
FROM kcrt_table_entries
WHERE request_id = ?
GROUP BY parameter2"
);
$sth->execute($mycrnum);
open my $fh, '>', "$mydir/file_detail.txt" or die $!;
select $fh;
while ( my $row = $sth->fetchrow_arrayref ) {
printf "%5d %s\n", #$row;
}
After the execute, open the output file:
open my $of, ">", "$mydir\\file_detail.txt";
Then read each line (or row) in the results:
while ( @row = $sth->fetchrow_array ) {
Printing the output to the opened file handle:
print $of "@row\n"; # NO COMMA AFTER $of!
Close the while() loop:
}
Finally, close your opened file handle:
close $of;
Now you're done.
Something like this, perhaps?
my $sth = $dbh->prepare(q{
select count(parameter2),
parameter2 as file_type
from KCRT_TABLE_ENTRIES where request_id = ?
group by parameter2
}) or die "Can't prepare SQL statement: ", $dbh->errstr(), "\n";
$sth->execute($mycrnum);
open my $OUT, '>', "$mydir/file_detail.txt" or die;
while (my @row = $sth->fetchrow_array) {
print $OUT @row, "\n"; # or whatever...
}
close $OUT;
$sth->finish;
This is a little bit of overkill, since you are only reading a single value, but it at least demonstrates a boilerplate for getting it done for future queries.
If you ever have a guaranteed single row, you can do something like this:
my ($val1, $val2) = $dbh->selectrow_array(q{
select foo, bar
});
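If the query needs a bind value, the same convenience method accepts bind values after an (optional) attribute slot, so the request ID from the original code can still go in as a placeholder; for example:
# selectrow_array($sql, \%attr, @bind_values) - note it returns only the FIRST row
my ($count, $file_type) = $dbh->selectrow_array(q{
    select count(parameter2), parameter2 as file_type
    from KCRT_TABLE_ENTRIES
    where request_id = ?
    group by parameter2
}, undef, $mycrnum);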

Using Text::CSV to find text with round brackets - not working

I have a CSV file that I am searching for lines that contain a certain model. The program works perfectly when searching for the '2GM' model but NOT for '2GM(F)'.
This is the program:
#!/usr/bin/perl
# Searches modeltest.txt for all instances of model
# Writes a file called <your model>.txt with all lines
# in modeltest.txt where the model is found
# Edit $model for different uses
use strict;
use warnings;
use Text::CSV;
my $input_file = 'modeltest.txt';
my @lines = ();
# my $model = '2GM'; # Search for 2GM - WORKS PERFECTLY
my $model = '2GM(F)'; # Search for 2GM(F) - DOES NOT WORK!
# my $model = '2GM\(F\)'; # Does not work either!
print "Search pattern is $model\n";
my $output_file = $model . '.txt';
my $csv = Text::CSV->new({binary => 1, auto_diag => 1, eol=> "\012"})
or die "Cannot use CSV: ".Text::CSV->error_diag ();
print "Searching modeltest.txt for $model....\n";
open my $infh, '<', $input_file or die "Can't open '$input_file':$!" ;
open my $outfh, '>', $output_file or die "Can't open '$output_file':$!" ;
while (my $row = $csv->getline($infh))
{
my @fields = $csv->fields();
if (/^($model)$/ ~~ @fields) # search for pattern
{
$csv->print ($outfh, ["Y $fields[1]",$model]) or $csv->error_diag;
}
}
close $infh;
close $outfh;
$csv->eof or die "Processing of '$input_file' terminated prematurely\n";
print "All Done see output files...\n";
Here is the modeltest.txt file:
3,721575-42702,121575-42000,"PUMP ASSY, WATER",,26,COOLING SEA WATER PUMP,-,2GM(F),3GM(F),-,3HM,3HMF,,
1,721575-42702,121575-42000,"PUMP ASSY, WATER",,73,COOLING SEA WATER PUMP,-,2GM,3GM,-,3HM,-,,
45,103854-59191,,"BOLT ASSY, JOINT M12",W,38,FUEL PIPE,1GM,2GM(F),3GM(F),3GMD,3HM,3HMF,,
21,104200-11180,,"RETAINER, SPRING",,11,CYLINDER HEAD,1GM,2GM(F),3GM(F),3GMD,-,-,,
24,23414-080000,,"GASKET, 8X1.0",,77,FUEL PIPE,-,2GM,3GM,-,3HM,-,,
3,124223-42092,124223-42091,IMPELLER,,73,COOLING SEA WATER PUMP,-,2GM,3GM,-,3HM,-,,
Here is the output for 2GM.txt
"Y 721575-42702",2GM
"Y 23414-080000",2GM
"Y 124223-42092",2GM
There is no output for 2GM(F); the program does not work, and I have no idea why.
Can anyone throw some light onto my problem?
YES, this worked, thank you again!!
Happy not to be using smartmatch...
I did the following:
Changed the search expression to
my $model = "2GM\(F\)";
Used the following code
while (my $row = $csv->getline($infh))
{
my @fields = $csv->fields();
foreach my $field (@fields)
{
if ($model eq $field) # search for pattern match in any field
{
$csv->print ($outfh, ["Y $fields[1]",$model]) or $csv->error_diag;
}
}
}
Parentheses have a special meaning in regular expressions, they create capture groups.
If you want to match literal parentheses (or any other special character) in a regular expression you need to escape them with backslashes, so your search pattern needs to be 2GM\(F\).
You can also use \Q and \E to disable special characters in your pattern match and leave your search pattern the same:
if (/^(\Q$model\E)$/ ~~ @fields) # search for pattern
...
The smartmatch operator ~~ is deprecated, I believe; it would be more straightforward to loop over @fields:
foreach my $field ( $csv->fields() ) {
if ($field =~ /^($model)$/) # search for pattern
...
}
And really there is no reason to pattern match when you can compare directly:
foreach my $field ( $csv->fields() ) {
if ($model eq $field) # search for pattern
...
}
It is best to use \Q in the regex so that you don't have to mess with escaping characters when you define $model.
The data is already in the array referred to by $row - there is no need to call fields to fetch it again.
It is much clearer, and may be slightly faster, to use any from List::Util
It's tidier to use autodie if all you want to do is die on an IO error
Setting auto_diag to a value greater than one will cause it to die in the case of any errors instead of just warning
This is a version of your own program with these issues altered
use strict;
use warnings;
use autodie;
use Text::CSV;
use List::Util 'any';
my $input_file = 'modeltest.txt';
my $model = '2GM(F)';
my $output_file = "$model.txt";
my $csv = Text::CSV->new({ binary => 1, eol => $/, auto_diag => 2 })
or die "Cannot use CSV: " . Text::CSV->error_diag;
open my $infh, '<', $input_file;
open my $outfh, '>', $output_file;
print qq{Searching "$input_file" for "$model"\n};
while (my $row = $csv->getline($infh)) {
if (any { /\Q$model/ } @$row) {
$csv->print($outfh, ["Y $row->[1]",$model]);
}
}
close $outfh;

Unable to read the count of some words from file of size ~2GB with Perl

I have written a Perl program which will match certain words in a log file and store the results in a database. The problem is this program works fine with a small file but doesn't work with a file of size ~2GB. Is it the file size, or does the program need to be changed?
use POSIX qw(strftime);
# load module
use DBI;
open( FILE, "/root/temp.log" ) or die "Unable to open logfile:$!\n";
$count_start = 0;
$count_interim = 0;
$count_stop = 0;
while (<FILE>) {
@test = <FILE>;
foreach $line (@test) {
if ( $line =~ m/server start/ ) {
#print "yes\n";
$count_start++;
}
elsif ( $line =~ m/server interim-update/ ) {
$count_stop++;
}
elsif ( $line =~ m/server stop/ ) {
$count_interim++;
}
}
print "$count_start\n";
print "$count_stop\n";
print "$count_interim\n";
$now_string = strftime "%b %e %H:%M:%S", localtime;
print $now_string;
# connect
my $dbh = DBI->connect( "DBI:Pg:dbname=postgres;host=localhost",
"postgres", "postgres", { 'RaiseError' => 1 } );
# execute INSERT query
my $rows = $dbh->do(
"insert into radcount (acc,bcc,dcc) Values ('$count_start','$count_stop','$count_interim')"
);
print "$rows row(s) affected\n";
# clean up
$dbh->disconnect();
}
close(LOG);
There are a few things here - first off, I'd recommend changing to the three-arg open for your filehandle - reasoning here
open( my $fileHandle, '<', '/root/temp.log' ) or die "blah" ;
Secondly, you're reading the whole file into an array - with a large file this will eat a lot of RAM. Instead, read it line by line and process it:
while(<$fileHandle>){
#contents of your foreach loop
}
I have a few comments about your program.
Always use strict and use warnings at the start of your program, and declare variables using my at their point of first use
Always use lexical filehandles and the three-parameter form of open, and always check the status of an open call
You are opening the file using filehandle FILE, but closing LOG
Your while statement reads the first line of the file and throws it away
@test = <FILE> attempts to read all of the rest of the file into the array. This is what is causing your problem
You should connect to the database once and use the same database handle for the rest of the code
You should prepare your statement with placeholders and pass the actual values with execute
You are incrementing $count_stop for an interim-update record and $count_interim for a stop record
The core module Time::Piece provides a strftime method without the bloat of POSIX
Here is a modification of your program to show these ideas. I have not set up a log file and database to test it but it looks fine to me and does compile.
use strict;
use warnings;
use Time::Piece;
use DBI;
open my $log, '<', '/root/temp.log' or die "Unable to open log file: $!";
my ($count_start, $count_interim, $count_stop) = (0, 0, 0);
while (<$log>) {
if ( /server start/ ) {
$count_start++;
}
elsif ( /server interim-update/ ) {
$count_interim++;
}
elsif ( /server stop/ ) {
$count_stop++;
}
}
print <<END;
Start: $count_start
Interim: $count_interim
Stop: $count_stop
END
print localtime->strftime("%b %e %H:%M:%S"), "\n";
my $dbh = DBI->connect(
"DBI:Pg:dbname=postgres;host=localhost", "postgres", "postgres",
{ 'RaiseError' => 1 } );
my $insert = $dbh->prepare('INSERT INTO radcount (acc, bcc, dcc) VALUES (?, ?, ?)');
my $rows = $insert->execute($count_start, $count_stop, $count_interim);
printf "%d %s affected\n", $rows, $rows == 1 ? 'row' : 'rows';