Perl Array Dereference Problem with DBI::fetchall_arrayref - perl

I'm a Perl newbie and am having issues with dereferencing an array that is a result of fetchall_arrayref in the DBI module:
my $sql = "SELECT DISTINCT home_room FROM $classlist";
my $sth = $dbh->prepare($sql);
$sth->execute;
my $teachers = $sth->fetchall_arrayref;
foreach my $teacher (@{$teachers}) {
print $teacher;
}
Running this will print the reference instead of the values in the array.
However, when I run:
my $arrref = [1,2,4,5];
foreach (@{$arrref}) {
print "$_\n";
}
I get the values of the array.
What am I doing wrong? Thank you for your help!
Jeff

From the doc
The fetchall_arrayref method can be
used to fetch all the data to be
returned from a prepared and executed
statement handle. It returns a
reference to an array that contains
one reference per row.
So in your example, $teacher is an ARRAY ref.
So you will need to loop through this array ref
foreach my $teacher (@{$teachers}) {
foreach my $titem (@$teacher) {
print $titem;
}
}

If you want to extract only the teacher column, you can use:
my @teachers = @{$dbh->selectcol_arrayref($sql)};

fetchall_arrayref fetches all the results of the query, so what you're actually getting back is a reference to an array of arrays. Each row returned will be an arrayref of the columns. Since your query has only one column, you can say:
my $teachers = $sth->fetchall_arrayref;
foreach my $teacher (@{$teachers}) {
print $teacher->[0];
}
to get what you want.
See more:
Arrays of arrays in Perl.

You have a reference to an array of rows. Each row is a reference to an array of fields.
foreach my $teacher_row (@$teachers) {
my ($home_room) = @$teacher_row;
print $home_room;
}
You would have seen the difference with Data::Dumper.
use Data::Dumper;
print(Dumper($teachers));
print(Dumper($arrref));
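If no database is at hand, the same shape can be reproduced by hand. This is only a minimal sketch, assuming three single-column rows; the room names and the `@home_rooms` variable are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for what fetchall_arrayref returns:
# one array reference per row, one element per selected column.
my $teachers = [ ['Room 101'], ['Room 102'], ['Room 203'] ];

# Take the first (and only) column of each row to get a flat list.
my @home_rooms = map { $_->[0] } @$teachers;

print "$_\n" for @home_rooms;
```

Printing `$teachers->[0]` instead would show something like ARRAY(0x...), which is the symptom described in the question.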

$sth->fetchall_arrayref returns a reference to an array that contains one reference per row!
Take a look at DBI docs here.

Per the documentation of DBI's fetchall_arrayref():
The fetchall_arrayref method can be
used to fetch all the data to be
returned from a prepared and executed
statement handle. It returns a
reference to an array that contains
one reference per row.
You're one level of indirection away:
my $sql = "SELECT DISTINCT home_room FROM $classlist";
my $sth = $dbh->prepare($sql);
$sth->execute;
my $teachers = $sth->fetchall_arrayref;
foreach my $teacher (@{$teachers}) {
local $" = ', ';
print "@{$teacher}\n";
}
The data structure might be a little hard to visualize sometimes. When that happens I resort to Data::Dumper so that I can insert lines like this:
print Dumper $teacher;
I've found that sometimes by dumping the data structure I get an instant map to use as a reference point when creating code to manipulate the structure. I recently worked through a real nightmare of a structure just by using Dumper once in a while to straighten my head out.

You can use map to dereference the returned structure:
my @teachers = map { $_->[0] } @$teachers;
Now you have a simple array of teachers.

Related

What is the meaning of these empty array assignments in perl?

foreach my $tp (@tpList)
{
print "inside function 14";
my $result1_fail = "";
$_=$tp;
next if(/^$/);
print "TP : $tp\n";
$result.="<h3>$tp</h3><BR>\n";
$result1_fail.="<h3>$tp</h3><BR>\n";
#------------------------------#
print "inside function 15";
my @emptytables=();
my @tables=();
@tables= getAllTables4TP($tp);
Please explain the meaning of my @emptytables=();
and also of my @tables=();
Is this used for defining an empty array?
If it is, then what is its use?
These initialize the arrays as empty, and if the next thing is an array assignment, the initialization is basically useless. I would write
my @tables = getAllTables4TP($tp);
I can't say anything about @emptytables because I don't see code using it.
my @tables; creates an empty array.
my @tables = (); creates an empty array, then replaces its contents with nothing (empties it).
my @tables = (); @tables = getAllTables4TP($tp); creates an empty array, then replaces its contents with nothing, then replaces its contents with something else.
I would use just the following:
my @tables = getAllTables4TP($tp);
Yes you are correct, it's defining an empty array.
Check this part
my @tables=();
@tables= getAllTables4TP($tp);
Here the second line fills the @tables array with the function's return values. You can write the two lines above as one line:
my @tables= getAllTables4TP($tp);
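To check the claim that the explicit `= ()` adds nothing, here is a small sketch. `get_all_tables` is a hypothetical stand-in for getAllTables4TP, whose real body is not shown in the question:

```perl
use strict;
use warnings;

# Hypothetical stand-in for getAllTables4TP(); the real one is not shown.
sub get_all_tables {
    my ($tp) = @_;
    return ("${tp}_orders", "${tp}_customers");
}

my @declared;                          # a freshly declared array is already empty
my @assigned = ();                     # also empty; the "= ()" changes nothing
my @tables   = get_all_tables('tp1');  # declare and fill in one step

print scalar(@declared), ' ', scalar(@assigned), ' ', scalar(@tables), "\n";
```

Both `@declared` and `@assigned` report a size of 0, so the one-line form loses nothing.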

Perl - Data comparison taking huge time

open(INFILE1,"INPUT.txt");
my $modfile = 'Data.txt';
open MODIFIED,'>',$modfile or die "Could not open $modfile : $!";
for (;;) {
my $line1 = <INFILE1>;
last if not defined $line1;
my $line2 = <INFILE1>;
last if not defined $line2;
my ($tablename1, $colname1,$sql1) = split(/\t/, $line1);
my ($tablename2, $colname2,$sql2) = split(/\t/, $line2);
if ($tablename1 eq $tablename2)
{
my $sth1 = $dbh->prepare($sql1);
$sth1->execute;
my $hash_ref1 = $sth1->fetchall_hashref('KEY');
my $sth2 = $dbh->prepare($sql2);
$sth2->execute;
my $hash_ref2 = $sth2->fetchall_hashref('KEY');
my @fieldname = split(/,/, $colname1);
my $colcnt=0;
my $rowcnt=0;
foreach $key1 ( keys(%{$hash_ref1}) )
{
foreach (@fieldname)
{
$colname =$_;
my $strvalue1='';
@val1 = $hash_ref1->{$key1}->{$colname};
if (defined @val1)
{
my @filtered = grep /@val1/, @metadata;
my $strvalue1 = substr(@filtered[0],index(@filtered[0],'||') + 2);
}
my $strvalue2='';
@val2 = $hash_ref2->{$key1}->{$colname};
if (defined @val2)
{
my @filtered = grep /@val2/, @metadata2;
my $strvalue2 = substr(@filtered[0],index(@filtered[0],'||') + 2);
}
if ($strvalue1 ne $strvalue2 )
{
$colcnt = $colcnt + 1;
print MODIFIED "$tablename1\t$colname\t$strvalue1\t$strvalue2\n";
}
}
}
if ($colcnt>0)
{
print "modified count is $colcnt\n";
}
%$hash_ref1 = ();
%$hash_ref2 = ();
}
The program reads an input file in which every line contains three strings separated by tabs: the first is the table name, the second is all the column names with commas in between, and the third is the SQL to run. Since this utility compares data, there are two rows for every table name, one for each DB. The data needs to be picked from each respective DB and then compared column by column.
The SQL returns an ID in the result set, and if a value comes back from the DB it needs to be translated to a string by looking it up in an array (that array contains 100K records, with key and value separated by ||).
I ran this for one set of tables which contain 18K records in each DB. Each SQL picks 8 columns from the DB. So the lookup runs for every one of the 18K records, and then for each of the 8 fields in that record, and the script takes a lot of time.
My question is whether someone can take a look and see if it can be improved so that it takes less time.
File contents sample
INPUT.TXT
TABLENAME COL1,COL2 select COL1,COL2 from TABLENAME where ......
TABLENAMEB COL1,COL2 select COL1,COL2 from TABLENAMEB where ......
The metadata array contains something like this (there are two of them, one for each DB):
111||Code 1
222||Code 2
Please suggest
Your code does look a bit unusual, and could gain clarity from using subroutines vs. just using loops and conditionals. Here are a few other suggestions.
The excerpt
for (;;) {
my $line1 = <INFILE1>;
last if not defined $line1;
my $line2 = <INFILE1>;
last if not defined $line2;
...;
}
is overly complicated: Not everyone knows the C-ish for(;;) idiom. You have lots of code duplication. And aren't you actually saying loop while I can read two lines?
while (defined(my $line1 = <INFILE1>) and defined(my $line2 = <INFILE1>)) {
...;
}
Yes, that line is longer, but I think it's a bit more self-documenting.
Instead of doing
if ($tablename1 eq $tablename2) { the rest of the loop }
you could say
next unless $tablename1 eq $tablename2;
the rest of the loop;
and save a level of indentation. Better indentation means better readability, which makes it easier to write good code. And better code might perform better.
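Put together, the two suggestions so far (reading lines in pairs, skipping mismatched pairs early) might look like the sketch below. It reads from an in-memory string instead of INPUT.txt so that it runs on its own; the sample rows and the `@matched` array are made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample input: two rows for table A, one unpaired row for table B.
my $input = "A\tc1,c2\tselect 1\nA\tc1,c2\tselect 2\nB\tc1\tselect 3\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";

my @matched;
# Loop while two lines can be read; stops when either read hits end of file.
while (defined(my $line1 = <$in>) and defined(my $line2 = <$in>)) {
    chomp($line1, $line2);
    my ($table1) = split /\t/, $line1;
    my ($table2) = split /\t/, $line2;
    next unless $table1 eq $table2;   # skip pairs that do not match
    push @matched, $table1;           # the real comparison work would go here
}
close $in;

print "matched tables: @matched\n";
```

The unpaired trailing row for table B ends the loop, because the second read returns undef.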
What are you doing at foreach $key1 (keys ...) — something tells me you didn't use strict! (Just a hint: lexical variables with my can perform slightly better than global variables)
Also, doing $colname = $_ inside a for-loop is a dumb thing, for the same reason.
for my $key1 (keys ...) {
...;
for my $colname (@fieldname) { ... }
}
my $strvalue1='';
@val1 = $hash_ref1->{$key1}->{$colname};
if (defined @val1)
{
my @filtered = grep /@val1/, @metadata;
my $strvalue1 = substr(@filtered[0],index(@filtered[0],'||') + 2);
}
I don't think this does what you think it does.
From the $hash_ref1 you retrieve a single element, then assign that element to an array (a collection of multiple values).
Then you called defined on this array. An array cannot be undefined, and what you are doing is quite deprecated. Calling defined function on a collection returns info about the memory management, but does not indicate ① whether the array is empty or ② whether the first element in that array is defined.
Interpolating an array into a regex isn't likely to be useful: The elements of the array are joined with the value of $", usually a whitespace, and the resulting string treated as a regex. This will wreak havoc if there are metacharacters present.
When you only need the first value of a list, you can force list context, but assign to a single scalar like
my ($filtered) = produce_a_list;
This frees you from weird subscripts you don't need and that only slow you down.
Then you assign to a $strvalue1 variable you just declared. This shadows the outer $strvalue1. They are not the same variable. So after the if branch, you still have the empty string in $strvalue1.
I would write this code like
my $val1 = $hash_ref1->{$key1}{$colname};
my $strvalue1 = defined $val1
? do {
my ($filtered) = grep /\Q$val1/, @metadata;
substr $filtered, 2 + index $filtered, '||'
} : '';
But this would be even cheaper if you pre-split @metadata into pairs and test for equality with the correct field. This would remove some of the bugs that are still lurking in that code.
$x = $x + 1 is commonly written $x++.
Emptying the hashrefs at the end of the iteration is unnecessary: The hashrefs are assigned a new value at the next iteration of the loop. Also, it is unnecessary to assist Perl's garbage collection for such simple tasks.
About the metadata: 100K records is a lot, so either put it in a database itself, or at the very least a hash. Especially for so many records, using a hash is a lot faster than looping through all entries and using slow regexes … aargh!
Create the hash from the file, once at the beginning of the program
my %metadata;
while (<METADATA>) {
chomp;
my ($key, $value) = split /\|\|/;
$metadata{$key} = $value; # assumes each key only has one value
}
Simply look up the key inside the loop
my $strvalue1 = defined $val1 ? $metadata{$val1} // '' : '';
That should be so much faster.
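For a version that runs on its own, here is a sketch of the same hash lookup using an in-memory list in place of the METADATA filehandle. The two records are the samples from the question; `translate` is a hypothetical helper name, not something from the original code:

```perl
use strict;
use warnings;

# Sample records in the "key||value" format shown in the question.
my @records = ('111||Code 1', '222||Code 2');

# Build the lookup table once, before any database rows are processed.
my %metadata;
for my $record (@records) {
    my ($key, $value) = split /\|\|/, $record, 2;
    $metadata{$key} = $value;   # assumes each key has only one value
}

# Hypothetical helper: translate an ID to its string, or '' if unknown.
sub translate {
    my ($id) = @_;
    return defined $id ? $metadata{$id} // '' : '';
}

print translate('111'), "\n";
```

Each lookup is a single hash access, instead of a regex scan over 100K records for every field of every row.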
(Oh, and please consider using better names for variables. $strvalue1 doesn't tell me anything, except that it is a stringy value (d'oh). $val1 is even worse.)
This is not really an answer but it won't really fit well in a comment either so, until you provide some more information, here are some observations.
Inside your inner for loop, there is:
@val1 = $hash_ref1->{$key1}->{$colname};
Did you mean @val1 = @{ $hash_ref1->{$key1}->{$colname} };?
Later, you check if (defined @val1). What did you really want to check? As perldoc -f defined points out:
Use of "defined" on aggregates (hashes and arrays) is
deprecated. It used to report whether memory for that aggregate
had ever been allocated. This behavior may disappear in future
versions of Perl. You should instead use a simple test for size:
In your case, if (defined @val1) will always be true.
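A minimal sketch of the size test that perldoc recommends, next to a definedness test on the scalar itself. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $value;             # undefined, like a missing hash entry
my @values = $value;   # assigning a scalar gives a one-element array
my @empty;

# The size test: an array in scalar context yields its element count.
print 'values has ', scalar(@values), " element(s)\n";
print 'empty has ',  scalar(@empty),  " element(s)\n";

# To ask whether a value came back at all, test the scalar.
print "the value is undefined\n" unless defined $value;
```

Note that `@values` counts as non-empty even though its only element is undef, so for a single value from a hash the test belongs on the scalar.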
Then, you have my @filtered = grep /@val1/, @metadata; Where did @metadata come from? What did you actually intend to check?
Then you have my $strvalue1 = substr(@filtered[0],index(@filtered[0],'||') + 2);
There is some interesting stuff going on in there.
You will need to verbalize what you are actually trying to do.
I strongly suspect there is a single SQL query you can run that will give you what you want but we first need to know what you want.

XML parsing using perl

I tried to research a simple question I have but couldn't find an answer. I am trying to get data from the web in XML and parse it using Perl. I know how to loop over repeating elements. But I am stuck when an element is not repeating (I know this might be silly). If the elements repeat, I put them in an array and get the data. But when there is only a single element, it throws an error saying 'Not an array reference'. I want my code to handle both cases (single and multiple elements). The code I am using is as follows:
use LWP::Simple;
use XML::Simple;
use Data::Dumper;
open (FH, ">:utf8","xmlparsed1.txt");
my $db1 = "pubmed";
my $query = "13054692";
my $q = 16354118; #for multiple MeSH terms
my $xml = new XML::Simple;
$urlxml = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=$db1&id=$query&retmode=xml&rettype=abstract";
$dataxml = get($urlxml);
$data = $xml->XMLin("$dataxml");
#print FH Dumper($data);
foreach $e(@{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}})
{
print FH $e->{DescriptorName}{content}, ' $$ ';
}
Also, can I do something such that the separator $$ will not get printed after the last element?
I also tried the following code:
$mesh = $data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading};
while (my ($key, $value) = each(%$mesh)){
print FH "$value";
}
But, this prints all the childnodes and I just want the content node.
Perl's XML::Simple will take a single item and return it as a scalar, and if the value repeats it sends it back as an array reference. So, to make your code work, you just have to force MeshHeading to always return an array reference:
$data = $xml->XMLin("$dataxml", ForceArray => [qw( MeshHeading )]);
I think you missed the part of "perldoc XML::Simple" that talks about the ForceArray option:
check out ForceArray because you'll almost certainly want to turn it on
Then you will always get an array, even if the array contains only one element.
As others have pointed out, the ForceArray option will solve this particular problem. However you'll undoubtedly strike another problem soon after due to XML::Simple's assumptions not matching yours. As the author of XML::Simple, I strongly recommend you read Stepping up from XML::Simple to XML::LibXML - if nothing else it will teach you more about XML::Simple.
Since $data->{PubmedArticle}-> ... ->{MeshHeading} can be either a string or an array reference depending on how many <MeshHeading> tags are present in the document, you need to examine the value's type with ref and conditionally dereference it. Since I am unaware of any terse Perl idioms for doing this, your best bet is to write a function:
sub toArray {
my $meshes = shift;
if (!defined $meshes) { return () }
elsif (ref $meshes eq 'ARRAY') { return @$meshes }
else { return ($meshes) }
}
and then use it like so:
foreach my $e (toArray($data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading})) { ... }
To prevent ' $$ ' from being printed after the last element, instead of looping over the list, concatenate all the elements together with join:
print FH join ' $$ ', map { $_->{DescriptorName}{content} }
toArray($data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading});
This is a place where XML::Simple is being...simple. It deduces whether there's an array or not by whether something occurs more than once. Read the doc and look for the ForceArray option to address this.
To only include the ' $$ ' between elements, replace your loop with
print FH join ' $$ ', map $_->{DescriptorName}{content}, @{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}};
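The single-versus-repeated behaviour and the join idiom can be tried without fetching anything. The sketch below uses hand-built structures that only mimic what XML::Simple might return for one and for two MeshHeading elements; XML::Simple itself is not loaded, and `to_list`, `mesh_line`, and the descriptor names are made up for illustration:

```perl
use strict;
use warnings;

# Normalise a value that may be undef, a single item, or an array ref.
sub to_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Hand-built stand-ins for the parsed MeshHeading value.
my $single = { DescriptorName => { content => 'Humans' } };
my $many   = [
    { DescriptorName => { content => 'Humans' } },
    { DescriptorName => { content => 'Animals' } },
];

# join puts the separator only between elements, never after the last.
sub mesh_line {
    my ($mesh) = @_;
    return join ' $$ ', map { $_->{DescriptorName}{content} } to_list($mesh);
}

print mesh_line($single), "\n";
print mesh_line($many), "\n";
```

With ForceArray set for MeshHeading, the `to_list` step becomes unnecessary, since the value is then always an array reference.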

How do I handle a varying number of items from a database query?

Effectively a duplicate of: How can I display data in table with Perl
The accepted answer there applies here. So do some of the alternatives.
I am trying to run raw database queries from Perl program and display results to the user. Something like select * from table. I want to display the information in a HTML table. The columns in the HTML table correspond with the returned columns.
I am having some issues. I can run describe table query to return the number of columns there are in the table. However, how will I store the information from the returned results into arrays?
So if I am storing results like this:
while (($f1, $t2, $n3, $k4, $d5, $e6) = $sth1->fetchrow_array)
In this case I only know that there are, say four columns (which I got from describe table). But this number four is dynamic and can change depending on the table name. I can not declare my variables based on this number. Any suggestions?
Try:
print "<table>\n";
# display HTML header
@cols = @{$sth->{NAME_uc}};
print "<tr>".join("", map { "<th>${_}</th>" } @cols)."</tr>\n";
# display one HTML table row for each DB row
while (my @row = $sth->fetchrow_array) {
print "<tr>".join("", map { "<td>${_}</td>" } @row)."</tr>\n";
}
print "</table>\n";
while (my @row = $sth->fetchrow_array)
{
print "<tr>".join("", map{ "<td>${_}</td>" } @row)."</tr>" ;
}
Use the technique suggested in the answer(s) to the other question - use fetchrow_array to fetch into an array:
while (my @row = $sth->fetchrow_array())
{
...process array...
}
Or use an alternative to fetchrow_array(), such as fetchrow_hashref().
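The table-building part can be separated from the database so that it works for any number of columns. A sketch, assuming the column names and rows have already been fetched; `html_table` and the sample data are hypothetical, and the cell values are not HTML-escaped here:

```perl
use strict;
use warnings;

# Build an HTML table from an array ref of column names and an
# array ref of rows (each row itself an array ref of values).
sub html_table {
    my ($cols, $rows) = @_;
    my $html = "<table>\n";
    $html .= '<tr>' . join('', map { "<th>$_</th>" } @$cols) . "</tr>\n";
    for my $row (@$rows) {
        my @cells = map { $_ // '' } @$row;   # show NULL (undef) as empty
        $html .= '<tr>' . join('', map { "<td>$_</td>" } @cells) . "</tr>\n";
    }
    return $html . "</table>\n";
}

# Made-up data standing in for a statement handle's names and rows.
print html_table([ 'ID', 'NAME' ], [ [ 1, 'Ann' ], [ 2, undef ] ]);
```

With DBI, the two arguments could come from the statement handle's NAME_uc attribute and from fetchall_arrayref.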

How can I create multidimensional arrays in Perl?

I am a bit new to Perl, but here is what I want to do:
my @array2d;
while(<FILE>){
push(@array2d[$i], $_);
}
It doesn't compile since @array2d[$i] is not an array but a scalar value.
How should I declare @array2d as an array of arrays?
Of course, I have no idea of how many rows I have.
To make an array of arrays, or more accurately an array of arrayrefs, try something like this:
my @array = ();
foreach my $i ( 0 .. 10 ) {
foreach my $j ( 0 .. 10 ) {
push @{ $array[$i] }, $j;
}
}
It pushes the value onto a dereferenced arrayref for you. You should be able to access an entry like this:
print $array[3][2];
Change your "push" line to this:
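push(@{$array2d[$i]}, $_);
You are basically treating $array2d[$i] as an array reference and dereferencing it by surrounding it with @{}. Perl creates the inner array for you the first time it is used, so you can then push elements onto each row of this array of array references.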
Have a look at perlref and perldsc to see how to make nested data structures, like arrays of arrays and hashes of hashes. Very useful stuff when you're doing Perl.
There's really no difference between what you wrote and this:
@{$array2d[$i]} = <FILE>;
I can only assume you're iterating through files.
To avoid keeping track of a counter, you could do this:
...
push @array2d, [ <FILE> ];
...
That says: 1) create a reference to a new anonymous array, 2) store all lines of FILE in it, 3) push it onto @array2d.
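If the goal is one row per line and one column per field, a common pattern is to push an anonymous array of the split fields for each line. A sketch, reading from an in-memory string so it runs on its own; the whitespace-separated sample data is an assumption, since the question does not show the file format:

```perl
use strict;
use warnings;

# Made-up sample data: two lines of whitespace-separated fields.
my $text = "1 2 3\n4 5 6\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @array2d;
while (my $line = <$fh>) {
    chomp $line;
    # One anonymous array per line; no row counter is needed.
    push @array2d, [ split ' ', $line ];
}
close $fh;

print scalar(@array2d), " rows\n";
print "row 1, column 2: $array2d[1][2]\n";
```

The number of rows never has to be known in advance, since push grows the outer array as lines arrive.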
Another simple way is to use a hash table and use the two array indices to make a hash key:
$two_dimensional_array{"$i $j"} = $val;
If you're just trying to store a file in an array you can also do this:
open(FILE, '<', 'somefile.txt') or die "Cannot open somefile.txt: $!";
@array = <FILE>;
close (FILE);