Perl create byte array and file stream - perl

I need to be able to send a file stream and a byte array as a response to an HTTP POST for the testing of a product. I am using CGI perl for the back end, but I am not very familiar with Perl yet, and I am not a developer, I am a Linux Admin. Sending a string based on query strings was very easy, but I am stuck on these two requirements. Below is the script that will return a page with Correct or Incorrect depending on the query string. How can I add logic to return a filestream and byte array as well?
#!/usr/bin/perl
use CGI ':standard';
print header();
print start_html();
my $query = new CGI;
my $value = $ENV{'QUERY_STRING'};
my $number = '12345';
if ( $value == $number ) {
print "<h1>Correct Value</h1>\n";
} else {
print "<h1>Incorrect value, you said: $value</h1>\n";
}
print end_html();

Glad to see new people dabbling in Perl from the sysadmin field. This is precisely how I started.
First off, if you're going to use the CGI.pm module I would suggest you use it to your advantage throughout the script. Where you've previously inputted <h1> you can use your CGI object to do this for you. In the end, you'll end up with much cleaner, more manageable code:
#!/usr/bin/perl
use CGI ':standard';
print header();
print start_html();
my $value = $ENV{'QUERY_STRING'};
my $number = '12345';
if ( $value == $number ) {
    print h1("Correct Value");
} else {
    print h1("Incorrect value, you said: $value");
}
print end_html();
Note that your comparison operator (==) compares numerically, so it only works reliably if the value is a number. To compare strings, use the eq operator instead.
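To make the difference between the two operators concrete, here is a small sketch. The helper names same_string and same_number are made up for illustration, and the numeric warning is switched off only so the demonstration runs quietly.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when the two values are equal as strings.
sub same_string {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : 0;
}

# Hypothetical helper: true (1) when the two values are equal as numbers.
sub same_number {
    my ($x, $y) = @_;
    no warnings 'numeric';    # non-numeric strings are treated as 0 here
    return $x == $y ? 1 : 0;
}

print same_number('abc', 'xyz'), "\n";        # prints 1: both strings are 0 numerically
print same_string('abc', 'xyz'), "\n";        # prints 0
print same_number('12345', '12345.0'), "\n";  # prints 1
print same_string('12345', '12345.0'), "\n";  # prints 0
```

For a query string, which arrives as text, eq is usually the safer choice.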
A little clarification regarding what you mean regarding filestreams and byte arrays ... by file stream, do you mean that you want to print out a file to the client? If so, this would be as easy as:
open(my $fh, '<', "/location/of/file") or die "Cannot open file: $!";
while (my $line = <$fh>) {
    print $line;
}
close($fh);
This opens a file handle linked to the specified file, read-only, prints the content line by line, then closes it. Keep in mind that this will print out the file as-is, and will not look pretty in an HTML page. If you change the Content-type header to "text/plain", the result will probably be more in line with what you're looking for. To do this, modify the call which prints the HTTP headers to:
print header(-type => 'text/plain');
If you go this route, you'll want to remove your start_html() and end_html() calls as well.
As for the byte array, I guess I'll need a little bit more information about what is being printed, and how you want it formatted.
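In the meantime, here is a minimal sketch of one possible reading of "byte array": a sequence of raw byte values sent back with an application/octet-stream content type. That interpretation is an assumption, and the helper names bytes_to_string and binary_response are made up for illustration. The headers are written by hand here so the sketch needs no modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of byte values (0-255) into a raw byte string.
sub bytes_to_string {
    my @bytes = @_;
    return pack('C*', @bytes);
}

# Hypothetical helper: build a complete CGI response carrying binary data.
sub binary_response {
    my ($payload) = @_;
    return "Content-Type: application/octet-stream\r\n"
         . "Content-Length: " . length($payload) . "\r\n\r\n"
         . $payload;
}

binmode STDOUT;    # stop Perl from translating line endings in the binary data
print binary_response(bytes_to_string(0xDE, 0xAD, 0xBE, 0xEF));
```

A file could be sent the same way by reading it through a file handle with binmode set and passing its contents as the payload.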

Related

Perl HTML::TableExtract- can't find headers

I'm having a little trouble getting the HTML::TableExtract module in Perl working. The problem (I think) is that the table headers contain HTML code to produce subscripts and special symbols, so I'm not sure how this should be searched for using the headers method. I've tried using the full headers (with tags), and also just the text, neither of which works. I'm trying to extract the tables from the following page (and similar ones for other isotopes):
http://www.nndc.bnl.gov/nudat2/getdataset.jsp?nucleus=208PB&unc=nds
Since I've had no luck with the headers method, I've also tried just specifying the depth and count in the object constructor (presumably both = 0 since there is only one top level table on the page), but it still doesn't find anything. Any assistance would be greatly appreciated!
Here is my attempt using the headers method:
#!/usr/bin/perl -w
use strict;
use warnings;
use HTML::TableExtract;
my $numArgs = $#ARGV + 1;
if ($numArgs != 1) {
print "Usage: perl convertlevels.pl <HTML levels file>\n";
exit;
}
my $htmlfile = $ARGV[0];
open(INFILE,$htmlfile) or die();
my $OutFileName;
if($htmlfile =~ /getdataset.jsp\?nucleus\=(\d+\w+)/){
$htmlfile =~ /getdataset.jsp\?nucleus\=(\d+\w+)/;
$OutFileName = "/home/dominic/run19062013/src/levels/".$1.".lev";
}
my $htmllines = <INFILE>;
open(OUTFILE,">",$OutFileName) or die();
my $te = new HTML::TableExtract->new(headers => ['E<sub>level</sub> <br> (keV)','XREF','Jπ','T<sub>1/2</sub>'] );
$te->parse_file($htmllines);
if ($te->tables)
{
print "I found a table!";
}else{
print "No tables found :'(";
}
close INFILE;
close OUTFILE;
Please ignore for now what is going on with the OUTFILE- the intention is to reformat the table contents and print into a separate file that can be easily read by another application. The trouble I am having is that the table extract method cannot find any tables, so when I test to see if anything found, the result is always false! I've also tried some of the other options in the constructor of the table extract object, but same story for every attempt! First time user so please excuse my n00bishness.
Thanks!

Perl decoding, for looping, & downloading

I wrote this script but I'm not sure if it is correct.
What I want to do is process a JSON file by reading its content, decoding it, and looping through each item as $item. The contents from a certain URL with the ID defined as $items[$i]['paper_item_id'] are saved with that ID into the defined destination.
But the code doesn't seem to function. I'm not sure on where I went wrong but any help or tips to improve the code and make it work would be good.
I'm not asking you to do the job, just need help seeing on where I went wrong and correct it for me.
The script should basically decode the JSON and then download the swf files from a certain directory URL to a directory on my PC using the IDs.
This is the code
use LWP::Simple;
$items = 'paper_items.json';
my $s = $items or die;
$dcode = decode_json($items);
for ($i = 0 ; $i < $count ($items) ; $i++) {
use File::Copy;
$destination = "paper/";
copy(
"http://media1.clubpenguin.com/play/v2/content/global/clothing/paper/"
. $items[$i]['paper_item_id'] . ".swf",
$destination . $items[$i]['paper_item_id'] . ".swf"
);
The program can be broken down into three steps:
Fetch the JSON source.
Parse the JSON.
Iterate over decoded data structure. We expect an array of hashes. Mirror files denoted by the paper_item_id to the working directory.
We will use LWP::Simple functions here.
Our script has the following header:
#!/usr/bin/perl
use strict; # disallow bad constructs
use warnings; # warn about possible bugs
use LWP::Simple;
use JSON;
Fetching the JSON
my $json_source = get "http://media1.clubpenguin.com/play/en/web_service/game_configs/paper_items.json";
die "Can't access the JSON source" unless defined $json_source;
That was easy: we dispatch a get request on that URL. If the output is undefined, we throw a fatal exception.
Parsing the JSON
my $json = decode_json $json_source;
That was easy; we expect the $json_source to be a UTF-8 encoded binary string.
If we want to inspect what is inside that data structure, we can print it out like
use Data::Dumper; print Dumper $json;
or
use Data::Dump; dd $json;
If everything works as expected, this should give a screenful of output showing an array of hashes.
Iterating
The $json is an array reference, so we'll loop over all items:
my $local_path = "paper";
my $server_path = "http://media1.clubpenguin.com/play/v2/content/global/clothing/paper";
for my $item (@$json) {
my $filename = "$item->{paper_item_id}.swf";
my $response = mirror "$server_path/$filename" => "$local_path/$filename";
warn "mirror failed for $filename with $response" unless $response == 200;
}
Perl has a concept of references, which is similar to pointers. Because data structures like hashes or arrays can only contain scalars, other arrays or hashes are only referenced. Given an array reference, we can access the array like @$reference or @{ $reference }.
To access an entry, the subscript operator [...] for arrays or {...} for hashes is separated from the reference by the dereference operator ->.
Thus, given %hash and $hashref to the same hash,
my %hash = (key => "a", otherkey => "b");
my $hashref = \%hash;
then $hashref->{key} eq $hash{key} holds.
Therefore, we loop over the items in @$json. All of these items are hash references, so we use the $item->{key} syntax, not $hash{key}.
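The dereferencing rules above can be tried without touching the network. This sketch swaps in the core JSON::PP module and a made-up two-item JSON string of the same shape (an array of hashes); the filenames helper is hypothetical and only builds the names that would be mirrored.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Made-up sample data in the assumed shape of the real feed: an array of hashes.
my $json_source = '[{"paper_item_id":1},{"paper_item_id":2}]';
my $json = decode_json($json_source);    # $json is an array reference

# Hypothetical helper: build the list of file names without downloading anything.
sub filenames {
    my ($items) = @_;
    return map { "$_->{paper_item_id}.swf" } @$items;
}

print "$_\n" for filenames($json);    # prints 1.swf then 2.swf
```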
What you are trying to do is to download the Shockwave Flash resources from Disney's Club Penguin game site.
I cannot imagine Disney would be too happy about this, and the site's terms of use say this under "Use of Content" ("DIMG" is Disney Interactive Media Group)
Except as we specifically agree in writing, no Content from any DIMG Site may be used, reproduced, transmitted, distributed or otherwise exploited in any way other than as part of the DIMG Site ...
Code is untested.
use File::Slurp qw(read_file);
use JSON qw(decode_json);
use LWP::Simple qw(mirror);
for my $item (@{ decode_json read_file 'paper_items.json' }) {
my $id = $item->{paper_item_id};
mirror "http://media1.clubpenguin.com/play/v2/content/global/clothing/paper/$id.swf", "paper/$id.swf";
}

How to deal with calling data that is parsed into a hash in perl

So I parsed the following XML code using Perl and I'm trying to access the spectrum results, but I'm having difficulty since it is a hash. I keep getting the error message "Reference found where even-sized list expected".
<message>
<cmd id="result_data">
<result-file-header>
<path>String</path>
<duration>Float</duration>
<spectra-count>Integer</spectra-count>
</result-file-header>
<scan-results count="Integer">
<scan-result>
<spectrum-index>Integer</spectrum-index>
<time-stamp>Integer</time-stamp>
<tic>Float</tic>
<start-mass>Float</start-mass>
<stop-mass>Float</stop-mass>
<spectrum count="Integer">mass,abundance;mass1,abundance1;
mass2,abundance2</spectrum>
</scan-result>
<scan-result>
<spectrum-index>Integer</spectrum-index>
<time-stamp>Integer</time-stamp>
<tic>Float</tic>
<start-mass>Float</start-mass>
<stop-mass>Float</stop-mass>
<spectrum count="Integer">mass3,abundance3;mass4,abundance4;
mass5,abundance5</spectrum>
</scan-result>
</scan-results>
</cmd>
</message>
Here is the Perl code I'm using:
my $file = "gapiparseddataexample1.txt";
unless(open FILE, '>'.$file) {
die "\nUnable to create $file\n";
}
use warnings;
use XML::Simple;
use Data::Dumper;
my $values= XMLin('samplegapi.xml', ForceArray => [ 'scan-result' ,'result-file-header']);
print Dumper($values);
my $results = $values->{'cmd'}->{'scan-results'}->{'scan-result'};
my $results1=$values->{'cmd'}->{'result-file-header'};
for my $data (@$results) {
print FILE "Spectrum Index",":",$data->{"spectrum-index"},"\n";
print FILE "Total Ion Count",":",$data->{tic},"\n";
%spectrum=$data->{spectrum};
print FILE "Spectrum",":",%spectrum, "\n";
for my $data1 (@$results1) {
print FILE "Duration",":",$data1->{duration},"\n";
}
}
I want to be able to print out the spectrum value pairs.
This:
$spectrum=$data->{spectrum};
print FILE "Spectrum",":", $spectrum->{'content'}, "\n";
for my $data1 (@$results1) {
print FILE "Duration",":",$data1->{duration},"\n";
}
Should give you this (which I assume is what you want):
Spectrum:mass,abundance;mass1,abundance1;
mass2,abundance2
You'll want to remove the newline value from 'content' I imagine (so it doesn't split over two lines).
Explanation for anyone that's curious
The element contents have been shoved into "->{content}" because the element also has an attribute. In this case, one called "count":
<spectrum count="Integer">mass3,abundance3;mass4,abundance4;
mass5,abundance5</spectrum>
This sort of behaviour is common in other languages and other XML parsing libraries too (e.g. sometimes they shove it into an element with the key 0). Sometimes it happens even when elements don't have regular attributes but are of specific types.
If you were to dump $data->{spectrum} (for example with Data::Dumper) you'd see the structure (again, that usually applies in other languages and with other XML parsing libraries too).
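Once the content string is in hand, the value pairs themselves can be split out. This is a minimal sketch, assuming the "mass,abundance;mass,abundance" layout shown in the sample XML; parse_spectrum is a made-up helper name and the numbers are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: split "mass,abundance;mass,abundance;..." into
# a list of [mass, abundance] array references.
sub parse_spectrum {
    my ($content) = @_;
    $content =~ s/\s+//g;    # drop the embedded newline and any spaces
    return map { [ split /,/, $_, 2 ] } grep { length } split /;/, $content;
}

my $content = "100.1,2000;101.2,1500;\n102.3,900";    # made-up values
for my $pair (parse_spectrum($content)) {
    print "mass=$pair->[0] abundance=$pair->[1]\n";
}
```

In the loop from the question, the argument would be $data->{spectrum}{content} rather than a literal string.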

How to scrape multiple log files for an exception

I'm writing a node.js function to ssh to a remote machine, and attempt to scrape logs for exceptions from a variety of different log files. The important bit of the log file will look something like this:
.... gunk ....
2013-01-29 04:06:39,133 com.blahblah.BaseServlet processRequest Thread-1629 Site-102 Cons-0 Url-http://theurlthat.com/caused/the/problem App-26 yada yada yada
java.lang.NullPointerException
at com.blahblah.MyClass.hi(MyClass.java:173)
at com.blahblah.StandardStackTrace.main(StandardStackTrace.java:125)
at com.blahblah.SoOnAndSo.forth(SoOnAndSo.java:109)
at java.lang.Thread.run(Thread.java:595)
2013-01-29 04:06:39,133 com.blahblah.BaseServlet defaultThrowableHandler Thread-1629 Site-102 Cons-0 Url-http://theurlthat.com/caused/the/problem App-26 yad yada yada
TechnicalDifficultiesException: TD page delivered by handleThrowable
http://theurlthat.com/caused/the/problem
....more gunk....
I need to find the exception and corresponding date in the log file that meets the following three requirements:
The exception must be the first that precedes this static text:
TechnicalDifficultiesException: TD page delivered by handleThrowable
The Exception must be directly between two lines that have "BaseServlet.*Site-102"
The exception must be the most recent (last) in the log files that meets the above conditions. The log is rolled over periodically, so it needs to be the last in Log, or if that doesn't exist the last in Log.001, or if that doesn't exist the last in Log.002, etc.
Since this program has to ssh into one of many potential servers, it's better to only have to maintain the logic in the node.js program and not on the machines with the logs. Thus, a one-liner in perl/sed/awk/grep/etc. would be ideal.
So your question looks like this, if I understand correctly:
The log file has a number of sections separated by double newlines.
Each is headed by a line with a date, etc.
We are only interested in the sections whose headers match /BaseServlet.*?Site-102/.
If the body of a section matches /^TechnicalDifficultiesException: TD page delivered by handleThrowable/, we want to select the body of the previously matched section, which we should maybe validate to look like a java exception.
We process the whole log file, and return the last exception found this way.
Fair enough.
#!/usr/bin/perl
use strict; use warnings;
local $/ = ""; # paragraph mode
my ($prev_sec, $prev_err);
SECTION:
while (my $head = <>) {
my $body = <>;
defined $body or die "Can't read from empty filehandle.";
next SECTION unless $head =~ /BaseServlet.*?Site-102/;
if ($body =~ /^TechnicalDifficultiesException: TD page delivered by handleThrowable/) {
$prev_err = $prev_sec;
}
$prev_sec = $body;
}
die "No error found" unless defined $prev_err;
print $prev_err;
(not really tested that much, but prints out the error from your snippet)
The code is a bit too long for a one-liner. You could always pipe the source into the perl interpreter, if you wanted.
perl -ne'BEGIN{$/=""}END{print$prev_err}$b=<>;defined$b or die"empty FH";/BaseServlet.*?Site-102/ or next;$prev_err=$prev_sec if $b=~/^TechnicalDifficultiesException: TD page delivered by handleThrowable/;$prev_sec=$b'
Specify the log file as a command line argument, or pipe the file contents directly into that program. Finding the correct log file isn't hard. In a snippet of Perl:
my $log_dir = ...;
my ($log) = sort glob "$log_dir/LOG*";
die "no log in $log_dir" unless defined $log;
Update
If the date should be captured as well, the code would change to
#!/usr/bin/perl
use strict; use warnings;
local $/ = ""; # paragraph mode
my (@prev, @prev_err);
SECTION:
while (my $head = <>) {
    my $body = <>;
    defined $body or die "Can't read from empty filehandle.";
    next SECTION unless $head =~ /BaseServlet.*?Site-102/;
    if ($body =~ /^TechnicalDifficultiesException: TD page delivered by handleThrowable/) {
        @prev_err = @prev;
    }
    @prev = ($head, $body);
}
die "No error found" unless @prev_err;
my ($date) = $prev_err[0] =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),/;
print "$date\n\n$prev_err[1]";
And as the one-liner:
perl -ne'BEGIN{$/=""}END{@perr||die"No error found";($date)=$perr[0]=~/^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),/;print"$date\n\n$perr[1]"}$b=<>;defined$b or die"empty FH";/BaseServlet.*?Site-102/ or next;@perr=@p if $b=~/^TechnicalDifficultiesException: TD page delivered by handleThrowable/;@p=($_,$b)'
I don't understand how it could only return the first match; this code should process the whole file. If you could provide a more complete testcase, I could verify that this code works as required.
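The scripts above depend on paragraph mode, so it may help to see that behaviour in isolation. This is a small sketch that reads from an in-memory string instead of a log file; the paragraphs helper and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return all blank-line-separated records in a string.
sub paragraphs {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @records = <$fh>;
    close($fh);
    chomp @records;   # in paragraph mode, chomp strips all trailing newlines
    return @records;
}

my @records = paragraphs("header one\n\nbody one\nmore body\n\n\nheader two\n");
print scalar(@records), " records\n";    # prints "3 records"
```

If the real log does not put a blank line between a header and its body, the records will not line up as head/body pairs, which would be worth checking against a fuller sample.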

Perl Plucene Index Search

Fooling around more with the Perl Plucene module and, having created my index, I am now trying to search it and return results.
My code to create the index is here...chances are you can skip this and read on:
#!/usr/bin/perl
use Plucene::Document;
use Plucene::Document::Field;
use Plucene::Index::Writer;
use Plucene::Analysis::SimpleAnalyzer;
use Plucene::Search::HitCollector;
use Plucene::Search::IndexSearcher;
use Plucene::QueryParser;
use Try::Tiny;
my $content = $ARGV[0];
my $doc = Plucene::Document->new;
my $i=0;
$doc->add(Plucene::Document::Field->Text(content => $content));
my $analyzer = Plucene::Analysis::SimpleAnalyzer->new();
if (!(-d "solutions" )) {
$i = 1;
}
if ($i)
{
my $writer = Plucene::Index::Writer->new("solutions", $analyzer, 1); #Third param is 1 if creating new index, 0 if adding to existing
$writer->add_document($doc);
my $doc_count = $writer->doc_count;
undef $writer; # close
}
else
{
my $writer = Plucene::Index::Writer->new("solutions", $analyzer, 0);
$writer->add_document($doc);
my $doc_count = $writer->doc_count;
undef $writer; # close
}
It creates a folder called "solutions" and various files to it...I'm assuming indexed files for the doc I created. Now I'd like to search my index...but I'm not coming up with anything. Here is my attempt, guided by the Plucene::Simple examples of CPAN. This is after I ran the above with the param "lol" from the command line.
#!/usr/bin/perl
use Plucene::Simple;
my $plucy = Plucene::Simple->open("solutions");
my @ids = $plucy->search("content : lol");
foreach(@ids)
{
print $_;
}
Nothing is printed, sadly )-=. I feel like querying the index should be simple, but perhaps my own stupidity is limiting my ability to do this.
Three things I discovered in time:
Plucene is a grossly inefficient proof-of-concept and the Java implementation of Lucene is BY FAR the way to go if you are going to use this tool. Here is some proof: http://www.kinosearch.com/kinosearch/benchmarks.html
Lucy is a superior choice that does the same thing and has more documentation and community (as per the comment on the question).
How to do what I asked in this problem.
I will share two scripts - one to import a file into a new Plucene index and one to search through that index and retrieve it. A truly working example of Plucene...can't really find it easily on the Internet. Also, I had tremendous trouble CPAN-ing these modules...so I ended up going to the CPAN site (just Google), getting the tar's and putting them in my Perl lib (I'm on Strawberry Perl, Windows 7) myself, however haphazard. Then I would try to run them and CPAN all the dependencies that it cried for. This is a sloppy way to do things...but it's how I did them and now it works.
#!/usr/bin/perl
use strict;
use warnings;
use Plucene::Simple;
my $content_1 = $ARGV[0];
my $content_2 = $ARGV[1];
my %documents;
%documents = (
"".$content_2 => {
content => $content_1
}
);
print $content_1;
my $index = Plucene::Simple->open( "solutions" );
for my $id (keys %documents)
{
$index->add($id => $documents{$id});
}
$index->optimize;
So what does this do? You call the script with two command-line arguments of your choosing, and it creates a key-value pair of the form "second argument" => { content => "first argument" }. Think of this like the XMLs in the tutorial at the apache site (http://lucene.apache.org/solr/api/doc-files/tutorial.html). The second argument acts as the document's ID, and the first argument is the text stored in its content field.
Anywho, this will make a folder in the directory the script was run in - in that folder will be files made by lucene - THIS IS YOUR INDEX!! All we need to do now is search that index using the power of Lucene, something made easy by Plucene. The script is the following:
#!/usr/bin/perl
use strict;
use warnings;
use Plucene::Simple;
my $content_1 = $ARGV[0];
my $index = Plucene::Simple->open( "solutions" );
my (@ids, $error);
my $query = $content_1;
@ids = $index->search($query);
foreach(@ids)
{
print $_."---seperator---";
}
You run this script by calling it from the command line with ONE argument - for example's sake let it be the same first argument as you called the previous script. If you do that you will see that it prints your second argument from the example before! So you have retrieved that value! And given that you have other key-value pairs with the same value, this will print those too! With "---seperator---" between them!