XML::Smart Parser in Perl - perl

I'm continuing to work out of an outdated bioinformatics book and I'm attempting to use the XML::Smart Module.
I suspect the module's methods have changed over the course of 6 years and I'm inexperienced with perl to troubleshoot from cpan source. The commented out code proves the ncbi.gov query functions, I'm having trouble with the 'new' method - it's not parsing the XML. What am I doing wrong? Thanks!
Update Specifically I'm running into trouble with parsing and displaying the Id array: my #Id = $results->{eSearchResult}{IdList}{Id}{'#'}; I'm running this on OSX terminal and I don't see any Ids when I run this script. I am seeing the proper Count. Thanks!
#!/usr/local/bin/perl
# use lib "/Users/fogonthedowns/myperllib";
# use LWP::Simple;
use XML::Smart;
use strict;
#Set base URL for all eutils
my $utils = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils";
my $db = "Pubmed";
my $query ="Cancer+Prostate";
my $retmax = 10;
my $esearch = "$utils/esearch.fcgi?" .
"db=$db&retmax=$retmax&term=";
# my $esearch_result = get($esearch.$query);
# print "ESEARCH RESULT: $esearch_result\n";
# print "Using Query: \n$esearch$query\n";
# print "hello world\n";
my $results = XML::Smart->new($esearch.$query,"XML::Parser");
my $count = $results->{eSearchResult}{Count};
my #Id = $results->{eSearchResult}{IdList}{Id}{'#'};
my $all_Id = join("\n", #Id);
print "Count = $count\n";
print "$all_Id\n";

The first thing you have done wrong is to comment out use strict, the second is to use -w instead of use warnings.
With strict turned on, perl will report:
Bareword "XML::Parser" not allowed while "strict subs" in use at tmp:test.pl line 19.
This lets us trace where the problem is occurring.
The examples in the documentation say that the second argument (the parser to use) should be quoted, and you haven't quoted it.
So we change to:
my $results = XML::Smart->new($esearch.$query,"XML::Parser");
… and it runs.
(Incidentally, the language is called "Perl", not "perl" or "PERL")

change:
my #Id = $results->{eSearchResult}{IdList}{Id}{'#'};
to:
my #Id = $results->{eSearchResult}{IdList}{Id}('#');

Related

Reading from two files (one raw, one XMP) with ExifTool

I am new to PERL and even newer to ExifTool—and am therefore likely missing something quite basic.
The goal is to read XMP fields from a photo file. Looking at the exiftool documentation on both the ExifTool site and CPAN, I was able to read tagged jpeg and the XMP sidecar files, both without issues.
The problem is when I read from a raw file—which obviously doesn't have custom fields—I would get an error with an uninitialized value. That is to be expected.
So, I want to have code that says "if you read a field/tag from the raw file and it isn't there, look at the associated XMP file, and if that fails, return a blank string."
I therefore tried to open a second instance of ExifTool, such as:
my $exifInfo = ImageInfo($filePath);
goes to
my $exifInfoXMP = ImageInfo($filePathXMP);
But that keeps failing. If I read the XMP directly from the get-go, it works just fine, so I am getting the impression that I cannot read two ExifTool structures at the same time (which can't be right; I have to be the error here). The code below works, but I cannot "interleave" the conditionals on the two files. I have to process the raw first, then run a second pass with a new handler for the XMP. Knowing how efficient PERL is, my approach cannot possibly be a good one (even though it does the job).
In particular, there is one line that puzzles me. If I remove it, nothing works. (it should be well marked).
$filePath =~ s/$photoExtensions$/.XMP/i;
That line essential does the same as reading the XMP from the get-go (not my ideal solution).
Anyone have an idea as to where I am messing up?
Thanks,
Paul
header [EDITED TO SHOW ALL OPTIONS; HAD SHOWN ALL USED IN QUESTION]
#!/usr/bin/perl
# load standard packages
use strict;
use warnings;
use Data::Dumper;
use File::Find;
no warnings 'File::Find';
use Image::ExifTool ':Public';
# define proxy for ExifTool
my $exifTool = new Image::ExifTool;
my $exifToolXMP = new Image::ExifTool;
# turn on immediate updates
$|=1;
# common extensions that I want to recognize
my $photoExtensions = "\.(jpg|crw|cr2|cr3|rw2|orf|raw|nef|arw|dng)";
my $imageExtensions = "\.(tiff|tif|psd|png|eps|hdr|exr|svg|gif|afphoto|pdf)";
my $videoExtensions = "\.(flv|vob|ogv|avi|mts|m2ts|mov|qt|wmv|mp4|m4p|m4v|svi|3gp|3g2)";
my $audioExtensions = "\.(aiff|aac|wav|mp3|m4a|m4p|ogg|wma)";
my $appFileExtensions = "\.(on1|cos|cof)";
my $GPSFileExtensions = "\.(gpx|kml|kmz|log)";
# start main program
main();
routine in question
sub listKeywords {
print "Reads and displays file information from certain tags (typically set in Photomechanic):\n";
print "\t1. Subject\n";
print "\t2. Hierarchical Subject\n";
print "\t3. Supplemental Categories\n";
print "\t4. Label Name 1\n";
print "\t5. Label Name 2\n";
print "\t6. Label Name 3\n";
print "\t7. Label Name 4\n\n";
print "List Keywords ---\n\tEnter file name (with path) --> ";
my $filePath = <STDIN>;
chomp $filePath;
$filePath =~ s/\\//g;
$filePath =~ s/\s+$//;
########################################################
# COMMENT OUT THE FOLLOWING LINE AND NOTHING WORKS;
# $filePathXMP should be defined anyway, which suggests to
# me that the second invocation of ImageInfo doesn't actually occur.
# But I don't understand why.
$filePath =~ s/$photoExtensions$/.XMP/i;
print "\n\n";
my $filePathXMP = $filePath;
$filePathXMP =~ s/$photoExtensions$/.XMP/i; # TO FIX: filename may not have uppercase extension
# Get Exif information from image file
my $exifInfo = $exifTool->ImageInfo($filePath);
# my $exifInfoXMP = $exifToolXMP->ImageInfo($filePath =~ s/$photoExtensions$/.XMP/gi);
print "XMP Sidecar: \[$filePathXMP\]\n\n";
########################################################
# Get Specific Tag Value
my $hierarchicalSubject = $exifTool->GetValue('HierarchicalSubject');
my $subject = $exifTool->GetValue('Subject');
my $supplementalCategories = $exifTool->GetValue('SupplementalCategories');
my $labelName1 = $exifTool->GetValue('LabelName1');
my $labelName2 = $exifTool->GetValue('LabelName2');
my $labelName3 = $exifTool->GetValue('LabelName3');
my $labelName4 = $exifTool->GetValue('LabelName4');
my $exifInfo = ImageInfo($filePathXMP);
if (not defined $hierarchicalSubject) {$hierarchicalSubject = $exifTool->GetValue('HierarchicalSubject');}
if (not defined $hierarchicalSubject) {$hierarchicalSubject = "";}
if (not defined $subject) {$subject = $exifTool->GetValue('Subject');}
if (not defined $subject) {$subject = "";}
if (not defined $supplementalCategories) {$supplementalCategories = $exifTool->GetValue('SupplementalCategories');}
if (not defined $supplementalCategories) {$supplementalCategories = "";}
if (not defined $labelName1) {$labelName1 = $exifTool->GetValue('LabelName1');}
if (not defined $labelName1) {$labelName1 = "";}
if (not defined $labelName2) {$labelName2 = $exifTool->GetValue('LabelName2');}
if (not defined $labelName2) {$labelName2 = "";}
if (not defined $labelName3) {$labelName3 = $exifTool->GetValue('LabelName3');}
if (not defined $labelName3) {$labelName3 = "";}
if (not defined $labelName4) {$labelName4 = $exifTool->GetValue('LabelName4');}
if (not defined $labelName4) {$labelName4 = "";}
print "Subject:\n------------------------------\n$subject\n\n";
print "Hierarchical Subject:\n------------------------------\n$hierarchicalSubject\n\n";
print "Supplemental Categories:\n------------------------------\n$supplementalCategories\n\n";
print "Label Name 1:\n------------------------------\n$labelName1\n\n";
print "Label Name 2:\n------------------------------\n$labelName2\n\n";
print "Label Name 3:\n------------------------------\n$labelName3\n\n";
print "Label Name 4:\n------------------------------\n$labelName4\n\n";
}
As your code is incomplete, I have to ask: did you make sure to start your script with the following lines?
use strict;
use warnings;
Those two lines are not there to annoy you, they will protect you from simple mistakes you might have made in your code.
IMHO the real problem with your sub listKeywords() is the following line:
my $exifInfo = ImageInfo($filePathXMP);
There are two problems here:
you redefine the variable $exifInfo from a few lines before.
you are not using the OO approach for the 2nd image info.
I think what you intended to write was the following line:
my $exifInfoXMP = $exifToolXMP->ImageInfo($filePathXMP);

How are these quoted strings replaced with the values in perl .pm file?

Below is the Perl code in .pm file which is supposed to replace the specified strings (that are in "quotes") with some values. But its not happening. Can anyone explain what is happening in this code?
package SomePackage;
require Exporter;
#ISA = qw(Exporter);
#EXPORT = qw(send_request, create_mmd_and_transfer, update_mmd_file);
sub send_request {
my ( $service, $action, $torole ) = #_;
my ( $seller_request_mmd );
my $replace_contents = ();
$replace_contents{"REPLACE_Service"} = $service;
$replace_contents{"REPLACE_RequestAction"} = $action;
$replace_contents{"REPLACE_TradingPartner"} = $torole;
$replace_contents{"REPLACE_Requestxml"} = "Request.xml";
create_mmd_and_transfer( \%replace_contents, $seller_request_mmd, "/MMD.xml" );
}
sub create_mmd_and_transfer {
my $local_replace_contents = shift;
my $input_mmd = shift;
my $local_output_mmd = shift;
my $output_mmd = shift;
update_mmd_file( "$input_mmd", "temp_mmd_file.xml", $local_replace_contents );
}
sub update_mmd_file {
my $input_file = shift;
my $output_file = shift;
my $contents = shift;
open( MMD_FILE, "<$input_file" )
or main::error_exit(" Cannot open MMD file template $input_file \n $input_file not found int the Templates folder \n Please place the same and then run the script ");
open( TEMP_MMD_FILE, ">$output_file" );
while ( <MMD_FILE> ) {
s/^M//g; # Getrid of the ^Ms
foreach my $content ( keys( %$contents ) ) {
my $exact_value = ${%$contents}{$content};
if ( $main::test_scenario =~ /^Invalid Request Action \a\n\d Service/
and ( $content =~ /REPLACE_Service|REPLACE_RequestAction/i ) ) {
}
else {
if ( $exact_value ne "" ) {
s/$content/$exact_value/g;
}
}
}
print TEMP_MMD_FILE;
}
close MMD_FILE;
close TEMP_MMD_FILE;
}
The following will not make your script work, just create the better base for some future questions.
Before you even thinking about posting a perl question here:
1.) add to the top of your script:
use strict;
use warnings;
Posting a code here without these two lines, nobody will bother even trying to read the code.
2.) use perl -c SomePackage.pm for the check. If it will tell you: SomePackage.pm syntax OK - you can start thinking about posting a question here. ;)
Some basic problems with your script:
package SomePackage;
use strict; # see the above
use warnings;
require Exporter;
# these variables are defined outside of this package, so, tell perl this fact. use the `our`
our #ISA = qw(Exporter);
#the use warnings will warn you about the following line
# #EXPORT = qw(send_request, create_mmd_and_transfer, update_mmd_file);
#the correct one is without commas
our #EXPORT = qw(send_request create_mmd_and_transfer update_mmd_file); #not saying anything about the #EXPORT rudeness. :)
#my $replace_contents = ();
#the $replace_contents is a scalar. Bellow you using a hash. So,
my %replace_contents;
#or use the scalar but the lines bellow should use the hashref notation, e.g.
# $replace_contents->{"REPLACE_Service"} = $service;
# you decide. :)
# the seller_request_mmd contains undef here.
create_mmd_and_transfer( \%replace_contents, $seller_request_mmd, "/MMD.xml");
# also bellow, in the subroutine definition it wants 4 arguments.
# indicates a problem...
# using 2-arg open is not the best practice.
# Also, you should to use lexical filehandles
# open (MMD_FILE, "<$input_file")
# better
open (my $mmd_file, '<', $input_file)
# of course, you need change every MMD_FILE to $mmd_file
# check the result of the open and die if not successful
# or you can use the
use autodie;
# instead of $exact_value = ${%$contents}{$content};
# you probably want
my $exact_value = $contents->{$content};
Indent your code!
All the above are just about the syntactic problems and not solving anything about the "logic" of your code.
Ps: And me is still an beginner, so, others sure will find much more problems with the above code.
Ok. Here's what I've done to test this.
Firstly, you didn't give us an input file or the code that you use to call the module. So I invented them. I made the simplest possible input file:
REPLACE_Service
REPLACE_RequestAction
REPLACE_TradingPartner
REPLACE_Requestxml
And this driver program:
#!/usr/bin/perl
use strict;
use warnings;
use SomePackage;
send_request('foo', 'bar', 'baz');
sub error_exit {
die #_;
}
The first time, I ran it, I got this error:
Undefined subroutine &main::send_request called at test line 8.
That was because your #EXPORT line was wrong. You had:
#EXPORT = qw(send_request, create_mmd_and_transfer, update_mmd_file);
But the point of qw(...) is that you don't need the commas. So I corrected it to:
#EXPORT = qw(send_request create_mmd_and_transfer update_mmd_file);
Then I re-ran the program and got this error:
Cannot open MMD file template
not found int the Templates folder
Please place the same and then run the script at test line 11.
That looked like there was something missing. I changed the error message, adding indicators of where the variable interpolation was supposed to happen:
open( MMD_FILE, "<$input_file" )
or main::error_exit(" Cannot open MMD file template <$input_file> \n <$input_file> not found int the Templates folder \n Please place the same and then run the script ");
Then the error message looked like this:
Cannot open MMD file template <>
<> not found int the Templates folder
Please place the same and then run the script at test line 11.
So it seems clear that the $input_file variable isn't set in the update_mmd_file() subroutine. Tracing that variable back, we see that this value is originally the $seller_request_mmd variable in send_request(). But in send_request() you declare $seller_request_mmd but you never give it a value. So let's do that:
my ( $seller_request_mmd ) = 'test_input.txt';
Now, when I run your program, it runs to completion without any errors. And I find a new temp_mmd_file.xml is generated. But it is exactly the same as the input file. So more investigation is needed.
Digging into the update_mmd_file() subroutine, we find this interesting line:
my $exact_value = ${%$contents}{$content};
I think you're trying to extract a value from $contents, which is a hash reference. But your syntax is wrong. You were probably aiming at:
my $exact_value = ${$contents}{$content};
But most Perl programmers prefer the arrow notation for working with reference look-ups.
my $exact_value = $contents->{$content};
Making that change and re-running the program, I get an output file that contains:
foo
bar
baz
Request.xml
Which is exactly what I expected. So the program now works.
But there is still a lot of work to do. As you have been told repeatedly, you should always add:
use strict;
use warnings;
to your code. That will find a lot of potential problems in your code - which you should fix.
To be honest, this feels to me like you were trying to run before you could walk. I'd recommend spending some time to work through a good Perl introductory book before taking on my more Perl work.
And there was a lot of useful information missing from your question. It wouldn't have taken as long to get to the solution if you had shown us your driver program and your input data.

Is there a way to check, if an argument is passed in single quotes?

Is there a (best) way to check, if $uri was passed in single quotes?
#!/usr/local/bin/perl
use warnings;
use 5.012;
my $uri = shift;
# uri_check
# ...
Added this example, to make my problem more clear.
#!/usr/local/bin/perl
use warnings;
use 5.012;
use URI;
use URI::Escape;
use WWW::YouTube::Info::Simple;
use Term::Clui;
my $uri = shift;
# uri check here
$uri = URI->new( $uri );
my %params = $uri->query_form;
die "Malformed URL or missing parameter" if $params{v} eq '';
my $video_id = uri_escape( $params{v} );
my $yt = WWW::YouTube::Info::Simple->new( $video_id );
my $info = $yt->get_info();
my $res = $yt->get_resolution();
my #resolution;
for my $fmt ( sort { $a <=> $b } keys %$res ) {
push #resolution, sprintf "%d : %s", $fmt, $res->{$fmt};
}
# with an uri-argument which is not passed in single quotes
# the script doesn't get this far
my $fmt = choose( 'Resolution', #resolution );
$fmt = ( split /\s:\s/, $fmt )[0];
say $fmt;
You can't; bash parses the quotes before the string is passed to the Perl interpreter.
To expand on Blagovest's answer...
perl program http://example.com/foo?bar=23&thing=42 is interpreted by the shell as:
Execute perl and pass it the arguments program and http://example.com/foo?bar=23
Make it run in the background (that's what & means)
Interpret thing=42 as setting the environment variable thing to be 42
You should have seen an error like -bash: thing: command not found but in this case bash interpreted thing=42 as a valid instruction.
The shell handles the quoting and Perl has no knowledge of that. Perl can't issue an error message, it just sees arguments after shell processing. It never even sees the &. This is just one of those Unix things you'll have to learn to live with. The shell is a complete programming environment, for better or worse.
There are other shells which dumb things down quite a bit so you can avoid this issue, but really you're better off learning the quirks and powers of a real shell.

Perl-SQLite3: Basic Question

I admit it's been a while since I've used Perl, but this has me stumped.
Here's the issue:
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
use DBI;
print "Content-type: text/html\n\n";
print "<html><head><title></title></head></body>";
my $login = "admin#xxxxx.com";
my $dbfile = "/var/www/shopsite-data/shopsite_db";
my $sql = qq`SELECT ss_OrderID FROM ss_ORDER WHERE ss_Email=?`;
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", "", "") || die "Cannot connect: $DBI::errstr";
my $sth = $dbh->prepare($sql);
$sth->execute($login) or die $sth->errstr();
while (my #result = $sth->fetchrow_array()) {
print "OrderID: $result[0]<br />";
}
$sth->finish;
print "</body>";
print "</html>";
$dbh->disconnect;
returns nothing, but I get a resultset when logged in with sqlite3 using the same query. I also get a resultset when I change the query from
my $sql = qq`SELECT ss_OrderID FROM ss_ORDER WHERE ss_Email=?`;
to
my $sql = qq`SELECT ss_OrderID FROM ss_ORDER`;
The obvious problem is the # inside the double quotes:
my $login = "admin#xxxxx.com";
is probably coming out as
$login = "admin.com"
and, if you had warnings switched on, a warning would be printed to the log file, because Perl sees #xxxx as an array and tries to interpolate it, then warns because it is empty. That is, assuming you don't have an array called #xxxx. If you do, then you would get all the values of it in the string.
Where you have the email address, use single quotes around it to prevent #xxxx being interpolated as an array:
my $login = 'admin#xxxxx.com';
Or you could use
my $login = "admin\#xxxxx.com";
to prevent the # starting an interpolation.
There may be other problems with your script but this is the most obvious one.
Strangely enough I was just reading about drawbacks of interpolation in Perl.
One more thing ...
One more thing: you already have fatalsToBrowser, but
use warnings;
use strict;
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
would probably have given you a warning on your browser about uninitialized values, so it might be worth turning warningsToBrowser on until your script seems to be working (or if it stops working again) (documentation here), and the other two on always.
I believe Kinopiko already pinpointed the problem.
I will add that, if you are going to use CGI.pm, you should not generate headers by hand. Instead, use CGI::header.
Also:
print "<html><head><title></title></head></body>";
Note the closing tag for body when you meant to use an opening tag.
Last, but definitely not least, you should
use strict;
use warnings;
in your scripts.

What's an easy way to print a multi-line string without variable substitution in Perl?

I have a Perl program that reads in a bunch of data, munges it, and then outputs several different file formats. I'd like to make Perl be one of those formats (in the form of a .pm package) and allow people to use the munged data within their own Perl scripts.
Printing out the data is easy using Data::Dump::pp.
I'd also like to print some helper functions to the resulting package.
What's an easy way to print a multi-line string without variable substitution?
I'd like to be able to do:
print <<EOL;
sub xyz {
my $var = shift;
}
EOL
But then I'd have to escape all of the $'s.
Is there a simple way to do this? Perhaps I can create an actual sub and have some magic pretty-printer print the contents? The printed code doesn't have to match the input or even be legible.
Enclose the name of the delimiter in single quotes and interpolation will not occur.
print <<'EOL';
sub xyz {
my $var = shift;
}
EOL
You could use a templating package like Template::Toolkit or Text::Template.
Or, you could roll your own primitive templating system that looks something like this:
my %vars = qw( foo 1 bar 2 );
Write_Code(\$vars);
sub Write_Code {
my $vars = shift;
my $code = <<'END';
sub baz {
my $foo = <%foo%>;
my $bar = <%bar%>;
return $foo + $bar;
}
END
while ( my ($key, $value) = each %$vars ) {
$code =~ s/<%$key%>/$value/g;
}
return $code;
}
This looks nice and simple, but there are various traps and tricks waiting for you if you DIY. Did you notice that I failed to use quotemeta on my key names in the substituion?
I recommend that you use a time-tested templating library, like the ones I mentioned above.
You can actually continue a string literal on the next line, like this:
my $mail = "Hello!
Blah blah.";
Personally, I find that more readable than heredocs (the <<<EOL thing mentioned elsewhere).
Double quote " interpolates variables, but you can use '. Note you'll need to escape any ' in your string for this to work.
Perl is actually quite rich in convenient things to make things more readable, e.g. other quote-operations. qq and q correspond to " and ' and you can use whatever delimiter makes sense:
my $greeting = qq/Hello there $name!
Nice to meet you/; # Interpolation
my $url = q|http://perlmonks.org/|; # No need to escape /
(note how the syntax coloring here didn't quite keep up)
Read perldoc perlop (find in page: "Quote and Quote-like Operators") for more information.
Use a data section to store the Perl code:
#!/usr/bin/perl
use strict;
use warnings;
print <DATA>;
#print munged data
__DATA__
package MungedData;
use strict;
use warnings;
sub foo {
print "foo\n";
}
Try writing your code as an actual perl subroutine, then using B::Deparse to get the source code at runtime.