Perl Plucene Index Search - perl

Fooling around more with the Perl Plucene module and, having created my index, I am now trying to search it and return results.
My code to create the index is here...chances are you can skip this and read on:
#usr/bin/perl
use Plucene::Document;
use Plucene::Document::Field;
use Plucene::Index::Writer;
use Plucene::Analysis::SimpleAnalyzer;
use Plucene::Search::HitCollector;
use Plucene::Search::IndexSearcher;
use Plucene::QueryParser;
use Try::Tiny;
my $content = $ARGV[0];
my $doc = Plucene::Document->new;
my $i=0;
$doc->add(Plucene::Document::Field->Text(content => $content));
my $analyzer = Plucene::Analysis::SimpleAnalyzer->new();
if (!(-d "solutions" )) {
$i = 1;
}
if ($i)
{
my $writer = Plucene::Index::Writer->new("solutions", $analyzer, 1); #Third param is 1 if creating new index, 0 if adding to existing
$writer->add_document($doc);
my $doc_count = $writer->doc_count;
undef $writer; # close
}
else
{
my $writer = Plucene::Index::Writer->new("solutions", $analyzer, 0);
$writer->add_document($doc);
my $doc_count = $writer->doc_count;
undef $writer; # close
}
It creates a folder called "solutions" and various files to it...I'm assuming indexed files for the doc I created. Now I'd like to search my index...but I'm not coming up with anything. Here is my attempt, guided by the Plucene::Simple examples of CPAN. This is after I ran the above with the param "lol" from the command line.
#usr/bin/perl
use Plucene::Simple;
my $plucy = Plucene::Simple->open("solutions");
my #ids = $plucy->search("content : lol");
foreach(#ids)
{
print $_;
}
Nothing is printed, sadly )-=. I feel like querying the index should be simple, but perhaps my own stupidity is limiting my ability to do this.

Three things I discovered in time:
Plucene is a grossly inefficient proof-of-concept and the Java implementation of Lucene is BY FAR the way to go if you are going to use this tool. Here is some proof: http://www.kinosearch.com/kinosearch/benchmarks.html
Lucy is a superior choice that does the same thing and has more documentation and community (as per the comment on the question).
How to do what I asked in this problem.
I will share two scripts - one to import a file into a new Plucene index and one to search through that index and retrieve it. A truly working example of Plucene...can't really find it easily on the Internet. Also, I had tremendous trouble CPAN-ing these modules...so I ended up going to the CPAN site (just Google), getting the tar's and putting them in my Perl lib (I'm on Strawberry Perl, Windows 7) myself, however haphazard. Then I would try to run them and CPAN all the dependencies that it cried for. This is a sloppy way to do things...but it's how I did them and now it works.
#usr/bin/perl
use strict;
use warnings;
use Plucene::Simple;
my $content_1 = $ARGV[0];
my $content_2 = $ARGV[1];
my %documents;
%documents = (
"".$content_2 => {
content => $content_1
}
);
print $content_1;
my $index = Plucene::Simple->open( "solutions" );
for my $id (keys %documents)
{
$index->add($id => $documents{$id});
}
$index->optimize;
So what does this do...you call the script with two command line arguments of your choosing - it creates a key-value pair of the form "second argument" => "first argument". Think of this like the XMLs in the tutorial at the apache site (http://lucene.apache.org/solr/api/doc-files/tutorial.html). The second argument is the field name.
Anywho, this will make a folder in the directory the script was run in - in that folder will be files made by lucene - THIS IS YOUR INDEX!! All we need to do now is search that index using the power of Lucene, something made easy by Plucene. The script is the following:
#usr/bin/perl
use strict;
use warnings;
use Plucene::Simple;
my $content_1 = $ARGV[0];
my $index = Plucene::Simple->open( "solutions" );
my (#ids, $error);
my $query = $content_1;
#ids = $index->search($query);
foreach(#ids)
{
print $_."---seperator---";
}
You run this script by calling it from the command line with ONE argument - for example's sake let it be the same first argument as you called the previous script. If you do that you will see that it prints your second argument from the example before! So you have retrieved that value! And given that you have other key-value pairs with the same value, this will print those too! With "---seperator---" between them!

Related

Manipulating a Word doc with PERL and Win32::OLE works on command line, but not from a BuildForge Step

I have functioning PERL scripts which are used to update Word documents.
The PERL scripts are coded to work with MS Office 2003 or 2007.
The machine I'm trying to do the updates via BF on has Office 2003 on, the appropriate template is installed, the Macro Security settings have been updated.
When I run the exact command I want BF to use on the command line, it works as expected.
When I run it via a BF step I get "*** Unable to open doc at \servername\projectname\bin\updateVer.pl line 94" (the line number is the croak in the Perl script).
The script looks like this up to the croak:
# enable Sanity checking and make the variable names meaningful
use strict;
use warnings;
use English;
use Win32::OLE;
# Gain access to MS Word 'wd' constants
use Win32::OLE::Const ('Microsoft Word');
use FindBin qw($RealDir);
use lib ($RealDir, "$RealDir/..", "$RealDir/../lib");
# include the common and log utilities
use SCCM::Common;
use SCCM::Logs;
# use command line inputs
use Getopt::Long qw(:config auto_abbrev permute ignore_case pass_through);
# set up logs and process logfile options
logOptions(qw(-log now));
my $bookmark_update_result = "";
my $update_ref_result = "";
# Get input from user
my $path;
my $bookmarkName;
my $bookmarkValue;
my $Word;
my $newWord = 0;
GetOptions("path=s" => \$path,
"bookmarkName=s" => \$bookmarkName,
"bookmarkValue=s" => \$bookmarkValue);
unless ( defined($path) )
{ croakf "%[Fail] Path and filename of SVD are required\n"; }
unless ( defined($bookmarkName) && defined($bookmarkValue) )
{ croakf "%[Fail] bookmarkName and bookmarkValue parameters are both required.\n"; }
# Start Word in a safer way, checking to see if user has it open first.
eval
{
$Word = Win32::OLE->GetActiveObject('Word.Application');
if (! $Word)
{
$newWord = 1;
$Word = Win32::OLE->new('Word.Application', 'Quit');
}
};
croakf "%[Fail] -- unable to start Word Engine: $#\n", Win32::OLE->LastError() if ($# || ! $Word);
my $dispAlerts = $Word->{'DisplayAlerts'};
$Word->{'DisplayAlerts'} = wdAlertsNone;
if ($newWord)
{
$Word->{'Visible'} = 0;
}
my $doc = $Word->Documents->Open($path) or
croakf ("%[Fail] Unable to open doc ", Win32::OLE->LastError() );
The script is being called like this:
ccperl \servername\projectname\bin\updateVer.pl -path "C:\BuildForgeBuilds\BFProjectName\BFProjectName_0177\MyDocument.doc" -bookmarkName REV_Baseline -bookmarkValue My_Baseline_10.20.30
Can I get some direction to convince BF that it's OK to open my work doc?
Thank you!
Turns out after I removed the 'Archive' bit from the template on the machine the job was running on, BF was able to successfully run its job.
Funny I could run it directly on the server with the 'Archive' bit set on the template. But hey, I'm no longer stuck on this one.
Thanks to anybody who read my question and even considered what might have been the problem.

Perl HTML::TableExtract- can't find headers

I'm having a little trouble getting the HTML:TableExtract module in perl working. The problem (I think), is that the table headers contain html code to produce subscripts and special symbols, so I'm not sure how this should be searched for using the headers method. I've tried using the full headers (with tags), and also just the text, neither of which work. I'm trying to extract the tables from the following page (and similar ones for other isotopes):
http://www.nndc.bnl.gov/nudat2/getdataset.jsp?nucleus=208PB&unc=nds
Since I've had no luck with the headers method, I've also tried just specifying the depth and count in the object constructor (presumably both = 0 since there is only one top level table on the page), but it still doesn't find anything. Any assistance would be greatly appreciated!
Here is my attempt using the headers method:
#!/usr/bin/perl -w
use strict;
use warnings;
use HTML::TableExtract;
my $numArgs = $#ARGV + 1;
if ($numArgs != 1) {
print "Usage: perl convertlevels.pl <HTML levels file>\n";
exit;
}
my $htmlfile = $ARGV[0];
open(INFILE,$htmlfile) or die();
my $OutFileName;
if($htmlfile =~ /getdataset.jsp\?nucleus\=(\d+\w+)/){
$htmlfile =~ /getdataset.jsp\?nucleus\=(\d+\w+)/;
$OutFileName = "/home/dominic/run19062013/src/levels/".$1.".lev";
}
my $htmllines = <INFILE>;
open(OUTFILE,">",$OutFileName) or die();
my $te = new HTML::TableExtract->new(headers => ['E<sub>level</sub> <br> (keV)','XREF','Jπ','T<sub>1/2</sub>'] );
$te->parse_file($htmllines);
if ($te->tables)
{
print "I found a table!";
}else{
print "No tables found :'(";
}
close INFILE;
close OUTFILE;
Please ignore for now what is going on with the OUTFILE- the intention is to reformat the table contents and print into a separate file that can be easily read by another application. The trouble I am having is that the table extract method cannot find any tables, so when I test to see if anything found, the result is always false! I've also tried some of the other options in the constructor of the table extract object, but same story for every attempt! First time user so please excuse my n00bishness.
Thanks!

perl - system command arguments give error

Tried hard to find a solution for this. But I probably need some help. I am trying to pass a bunch of arguments in system command in perl. But I get an irrelevant error. I have my variables correctly declared with the right scope and still get this error below. Here is my code.
#!/usr/bin/perl
use warnings;
use strict;
my $mi = 0;
my $mj = 0;
my #regbyte;
my #databyte;
my $filename;
my #args;
#regbyte = ("00","01","02","03","04","05","06","07","08","09","0A","0B","0C","0D","0E","0F","10","11","12");
#databyte = ("00","01","02","03","04","05","06","07","08","09", "0A", "0B");
for($mi=0; $mi<13; $mi++)
{
for($mj=0; $mj<256; $mj++)
{
$filename = "write_" . $regbyte[$mi] . "_" . $databyte[$mj] . ".atp";
system("perl perl_2_ver2.5.pl", $filename, $regbyte[$mi], $databyte[$mj], "n");
}
}
This is the error message I get.
Global symbol "$databyte" requires explicit package name at perl_2_ver2.8.pl line 20.
Execution of perl_2_ver2.8.pl aborted due to compilation errors.
I'm puzzled about a few things, in particular the trailing "n" you have in your system call. Is that supposed to be "\n"? Because it's unnecessary and wrong in that context.
The main problem is that you have
for ( $mj = 0; $mj < 256; $mj++ ) { .. }
and then access $databyte[$mj] when #databyte has only twelve elements. It's hard to know what you might mean.
Here's how I would write something that works, but may not be your intention.
use strict;
use warnings 'FATAL';
for my $regbyte (0 .. 0x12) {
for my $databyte (0 .. 0x0B) {
my $filename = sprintf "write_%02X_%02X.atp", $regbyte, $databyte;
system("perl perl_2_ver2.5.pl $filename $regbyte $databyte");
}
}
It looks like you want to run your script perl_2_ver2.5.pl with input consisting of all files that look like write_*_*.atp. Is that right?
Unless the directory contains atp files that you don't want to process, you are probably better off using just
while (my $filename = glob 'write*.atp') {
next unless /\Awrite_(\p{hex}{2})_(\p{hex}{2}).atp\z/;
system("perl perl_2_ver2.5.pl $filename $1 $2");
}
which just processes all the files that do exist and match the pattern.
I copy/pasted your code and only replaced the program parameter for the system call and I do not get the error you are reporting. However there are many array elements accessed, that don't exist.
You can limit your loops using the arraysizes like this:
for($mi=0; $mi<$#regbyte; $mi++)
And I believe you have two alternatives for your system call, either perl_2_ver2.5.pl is executable, then you can say (supposed, same directory):
system("./perl_2_ver2.5.pl", $filename, $regbyte[$mi], $databyte[$mj], "n");
Or you have to call:
system("perl" , "./perl_2_ver2.5.pl", $filename, $regbyte[$mi], $databyte[$mj], "n");

Separating configuration data and script logic in Perl scripts

I find the following anti-pattern repeated in my Perl scripts: the script contains some machine/setup specific settings which I store in-line as constants in the script whereas the rest of the script is general in nature:
#!/usr/bin/perl
use strict;
use warnings;
# machine specific settings at the start of the script.
my $SETTING_1 = "foo";
my #SETTING_2 = ("123", "456");
my $SETTING_3 = "something";
# general part of script follows.
...
This pattern is somewhat okay when running on one machine, but as soon as I want to distribute the script to multiple machines the trouble starts since I must keep track so that I do not overwrite the settings part with new updates in the general part.
The correct solution is obviously to have one general script file and have it read a configuration file which is specific to the environment that the script runs in.
My question is: What CPAN module would you recommend for solving this problem? Why?
For configuration files, I like to use YAML. Simple, cross-platform, human-readable, and no danger of your configuration accidentally morphing into an actual program.
My favorite is Config::Std. I like the way it handles multi-line and multi-part configuration values.
You have to be careful when a variable is potentially multi-valued: If a single value exists in the configuration file, it will store the value in a scalar; if multiple values exist, you will get an array reference.
I find it convenient to have two configuration files: One for values that describe the operating environment (where to find libraries etc) and another for user-modifiable behavior.
I also like to write a wrapper around it. For example (updated to include autogenerated read-only accessors):
#!/usr/bin/perl
package My::Config;
use strict; use warnings;
use Config::Std;
use FindBin qw($Bin);
use File::Spec::Functions qw( catfile );
sub new {
my $class = shift;
my ($config_file) = #_;
$config_file = catfile($Bin, 'config.ini');
read_config $config_file => my %config;
my $object = bless \%config => $class;
$object->gen_accessors(
single => {
install => [ qw( root ) ],
},
multi => {
template => [ qw( dir ) ],
},
);
return $object;
}
sub gen_accessors {
my $config = shift;
my %args = #_;
my $class = ref $config;
{
no strict 'refs';
for my $section ( keys %{ $args{single} } ) {
my #vars = #{ $args{single}->{$section} };
for my $var ( #vars ) {
*{ "${class}::${section}_${var}" } = sub {
$config->{$section}{$var};
};
}
}
for my $section ( keys %{ $args{multi} } ) {
my #vars = #{ $args{multi}->{$section} };
for my $var ( #vars ) {
*{ "${class}::${section}_${var}" } = sub {
my $val = $config->{$section}{$var};
return [ $val ] unless 'ARRAY' eq ref $val;
return $val;
}
}
}
}
return;
}
package main;
use strict; use warnings;
my $config = My::Config->new;
use Data::Dumper;
print Dumper($config->install_root, $config->template_dir);
C:\Temp> cat config.ini
[install]
root = c:\opt
[template]
dir = C:\opt\app\tmpl
dir = C:\opt\common\tmpl
Output:
C:\Temp> g.pl
$VAR1 = 'c:\\opt';
$VAR2 = [
'C:\\opt\\app\\tmpl',
'C:\\opt\\common\\tmpl'
];
The Config:Properties library is good for reading and writing key/value pair property files.
I prefer YAML and YAML::XS for configuration data. It's simple, readable, and has bindings for almost any programming language. Another popular choice is Config::General.
The usual low-tech method is to simply do EXPR a configuration file. Have you looked into this?
At the risk of being laughed out of class, one solution is to store the config in XML (or for more adventurous, JSON). Human-consumable, interoperable outside of Perl, doesn't have to live on local PC (both XML and JSON can be requested off of a "config URL") and a bunch of standard modules (XML::Simple is usually good enough for config XML files) exist on CPAN.
For simple configuration like this, especially for trivial things where I don't expect this data to change in the real world, I often simply use YAML. The simplicity cannot be beat:
First, write your Perl data structure containing your configuration.
use YAML;
my $SETTINGS = {
'1' => "foo",
'2' => ["123", "456"],
'3' => "something",
};
Then, pass it to YAML::DumpFile();
YAML::DumpFile("~/.$appname.yaml", $SETTINGS);
Delete the data structure and replace it with
my $SETTINGS = YAML::LoadFile("~/.$appname.yaml");
And then forget about it. Even if you don't know or want to learn YAML syntax, small changes to the config can be made by hand and more major ones can be done in Perl and then re-dumped to YAML.
Don't tie yourself to a format -- use Config::Any, or for a little more whizbang DWIM factor, Config::JFDI (which itself wraps Config::Any). With them you buy yourself the ability to support INI, YAML, XML, Apache-style config, and more.
Config::JFDI builds on this by trying to capture some of the magic of Catalyst's config loader: merging of instance-local config with app-wide config, environment variable support, and a limited macro facility (__path_to(foo/bar)__ comes in handy surprisingly often.)

Why does my Perl script complain about 'Global symbol "$connection" requires explicit package name'?

my $now = &GetDate;
my $myHeader = &ReadMessage;
my $mySoftwareVersions = param('mySoftwareVersions');
my $q = new CGI;print $q->header();
use CGI::Carp(fatalsToBrowser);
getAllSoftwareVersions();
sub getAllSoftwareVersions
{
my $user = "zxxx";
my $passwd = "xxxx";
# my $tableName = "config_table";
# my $connection = DBI->connect("DBI:mysql:MESCI:hamysql02.stl.mo.boeing.com:18080", $user, $passwd, { AutoCommit => 0, RaiseError => 1}) or die "Couldn't connect to Database: " . DBI->errstr;
print "Must be connected\n\n";
print "\n\n";
# Error here.
my #Rows = &getConfigTableRows($connection, $config_table, $mySoftwareVersions );
my $total = #Rows;
print "total is ";
print $total;
The Above code dies with:
Global symbol "$connection" requires explicit package name
Edit
This question is related to :
https://stackoverflow.com/questions/682695/how-do-i-resolve
https://stackoverflow.com/questions/681557/xml-cannot-be-displayed-error-from-perl-cgi-script-using-mysql
Hopefully the original poster will be able to clean this up so it makes more sense, but here's what we've got so far so we can attempt to help.
Although Kent's answer is thinking ahead, the error is simply telling you that you did not declare $connection as a lexical ( "my" ) variable. So therefore, perl interprets it as that you must be referring to a package global.
Raw Perl does not complain when you use an undefined variable, it considers it a package global. You appear to have strict on somewhere (a standard and recommended practice), which wants you to declare your variables before using them. If you didn't declare the variable in the current package (or "namespace"), it assumes you're referring to a variable declared in another package, so it asks you to append the package name, just to keep everything clear and aboveboard.
Perl uses my to declare scoped variables, and our to declare package globals.
my $connection = "Rainbow";
OR
our $connection = 'French';
Just in case you got the wrong idea, the error message would go away if you turned strict off, your problem wouldn't. And they might go underground.
{ no strict;
my #rows = getConfigTableRows( $nothing, #other_stuff );
}
Perl just won't complain that $nothing is nothing. And this easy-to-fix error could cause subtler errors in other places. Plus think if you had assigned $connection successfully, only to type:
{ no strict;
my #rows = getConfigTableRows( $connecion, $config_table, $mySoftwareVersions );
}
Perl gives you a message about '$connecion' and hopefully, you can tell that it was a typo, and forgo at least 30 minutes not seeing it and wondering if your query is wrong, or whatever.
Well, if this is the related content for this other question,
The reason this is erroring is because you have commented out the line that creates the connection variable.
How are you going to query the database for a table row when you don't have the database connection defined?
Re:
my $now = &GetDate;
my $myHeader = &ReadMessage;
If you are going to use & on your function calls, make sure you use parentheses too:
my $now = &GetDate();
my $myHeader = &ReadMessage();
Otherwise, the currently executing sub's parameters are made available to (and alterable by) the subroutine you are calling.
This means that if you use your cgi script under mod_perl, suddenly you are in effect doing
my $now = &GetDate( Apache2::RequestUtil->request );
which is likely to be very wrong if GetDate takes an optional argument.