Perl Parsing Log/Storing Results/Reading Results - perl

A while back I created a log parser. The logs range from several thousand to millions of lines. I store the parsed entries in an array of hash refs.
I am looking for suggestions on how to store my output, so that I can quickly read it back in if the script is run again (this prevents the need to re-parse the log).
The end goal is to have a web interface that will allow users to create queries (basically treating the parsed output like it existed within a database).
I have already considered writing the output of Data::Dumper to a file.
Here is an example array entry printed with Data::Dumper:
$VAR =
{
'weekday' => 'Sun',
'index' => 26417,
'timestamp' => '1316326961',
'text' => 'sys1 NSP
Test.cpp 1000
This is a example error message.
',
'errname' => 'EM_TEST',
'time' => {
'array' => [
2011,
9,
18,
'06',
22,
41
],
'stamp' => '20110918062241',
'whole' => '06:22:41',
'hour' => '06',
'sec' => 41,
'min' => 22
},
'month' => 'Sep',
'errno' => '2261703',
'dayofmonth' => 18,
'unknown2' => '1',
'unknown3' => '1',
'year' => 2011,
'unknown1' => '0',
'line' => 219154
},
Is there a more efficient way of accomplishing my goal?

If your output is an object (or if you want to make it into an object), then you can use KiokuDB (along with a database back end of your choice). If not, then you can use Storable. Of course, if your data structure essentially mimics a CSV file, then you can just write the output to file. Or you can output the data into a JSON object that you can store in a file. Or you can forgo the middleman and simply use a database.
You mentioned that your data structure is an "array of hashes" (presumably you mean an array of hash references). If the keys of each hash reference are the same, then you can store this in CSV.
You're unlikely to get a specific answer without being more specific about your data.
Edit: Now that you've posted some sample data, you can simply write this to a CSV file or a database with the values for index,timestamp,text,errname,errno,unknown1,unknown2,unknown3, and line.

use Storable;
# fill my hash
store \%hash, 'file';
%hash = ();
%hash = %{retrieve('file')};
# print my hash
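The same idea extends to the array of hash refs from the question. The following is a minimal sketch, assuming a hypothetical parse_log() that stands in for the real parser and a cache file name chosen only for illustration; it parses once, stores the result with Storable, and reads it back on the next call.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the real log parser: returns an array ref of hash refs.
sub parse_log {
    return [
        { index => 26417, errname => 'EM_TEST',  errno => '2261703', line => 219154 },
        { index => 26418, errname => 'EM_OTHER', errno => '2261704', line => 219160 },
    ];
}

# Return the cached entries if the cache file exists, otherwise parse and cache.
sub load_entries {
    my ($cache_file) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $entries = parse_log();
    nstore($entries, $cache_file);    # nstore writes in a portable byte order
    return $entries;
}

my $dir   = tempdir(CLEANUP => 1);    # temporary directory, only for this sketch
my $cache = "$dir/parsed.sto";

my $first  = load_entries($cache);    # parses and writes the cache
my $second = load_entries($cache);    # reads the cache back
print scalar(@$second), " entries, first errname: $second->[0]{errname}\n";
```

A real script would probably also compare the modification time of the log against the cache file and re-parse when the log is newer.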

You can always use KiokuDB, Storable or what have you, but if you are planning to do aggregation, using a relational database (or some data store that supports queries) may be the best solution in the longer run. A lightweight data store with an SQL engine like SQLite that doesn't require running a database server could be a good starting point.
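As a rough sketch of the SQLite route, assuming the DBI and DBD::SQLite modules are installed: the table name, the column names, and the second sample row are made up for illustration, and the column is called idx because INDEX is an SQL keyword.

```perl
use strict;
use warnings;
use DBI;

# In-memory database for the sketch; a real script would use a file name instead.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do(q{
    CREATE TABLE entries (
        idx       INTEGER PRIMARY KEY,
        timestamp INTEGER,
        errname   TEXT,
        errno     TEXT,
        line      INTEGER,
        text      TEXT
    )
});

# Sample rows shaped like the parsed entries in the question.
my @entries = (
    { index => 26417, timestamp => 1316326961, errname => 'EM_TEST',
      errno => '2261703', line => 219154, text => 'This is an example error message.' },
    { index => 26418, timestamp => 1316327000, errname => 'EM_OTHER',
      errno => '2261704', line => 219160, text => 'Another message.' },
);

my $insert = $dbh->prepare(
    'INSERT INTO entries (idx, timestamp, errname, errno, line, text) VALUES (?, ?, ?, ?, ?, ?)'
);
$dbh->begin_work;    # one transaction keeps bulk inserts fast
$insert->execute(@{$_}{qw(index timestamp errname errno line text)}) for @entries;
$dbh->commit;

# A query of the kind a web front end might issue.
my $rows = $dbh->selectall_arrayref(
    'SELECT idx, line FROM entries WHERE errname = ? ORDER BY idx',
    { Slice => {} }, 'EM_TEST'
);
print "index $_->{idx} at line $_->{line}\n" for @$rows;
```

Using placeholders (?) rather than interpolating user input into the SQL matters once the queries come from a web interface.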

Related

Perl Concatenating 2 hashes based on a certain key

Hi I have two hashes %Asset & %Activity
%Asset
Name,Computer Name
David,X
Clark,Y
Sam,Z
%Activity
Name,Activity
David,A
Clark,B
Sam,C
David,D
Clark,E
Sam,F
The second hash has the name repeated multiple times (can be more than 2). I want to get a hash with the concise information, something like:
Name,Computer Name,Activity
David,X,A&D
Clark,Y,B&E
Sam,Z,C&F
My idea, in pseudocode, is:
foreach (@Activity{qw[Name]}) {
push @Asset{qw[Name Activity]}, $Activity['Activity']
}
What you want is a hash of hashes. Conceptually, you'd combine all information about an asset into a single hash.
my %dave = (
name => "Dave",
computer_name => "X",
activity => "A"
);
Then this goes into a larger hash of all assets keyed by their name.
$Assets{$dave{name}} = \%dave;
If you want to find Dave's activity...
print $Assets{Dave}{activity};
You can pull out all information about Dave and pass it around as a hash reference.
my $dave = $Assets{Dave};
print $dave->{activity};
This sort of structure inevitably leads to modelling your assets as objects.
You can learn more about hashes of hashes in the Perl Data Structures Cookbook.
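Applied to the sample data in the question, a hash of hashes could be built roughly like this. It is only a sketch: the input is assumed to be available as lists of [name, value] pairs, and each asset's activities are kept in an array ref so that repeated names accumulate.

```perl
use strict;
use warnings;

# Sample data from the question, as lists of [name, value] pairs.
my @asset    = ( [ David => 'X' ], [ Clark => 'Y' ], [ Sam => 'Z' ] );
my @activity = ( [ David => 'A' ], [ Clark => 'B' ], [ Sam => 'C' ],
                 [ David => 'D' ], [ Clark => 'E' ], [ Sam => 'F' ] );

# One inner hash per asset, keyed by name.
my %assets;
for my $row (@asset) {
    my ($name, $computer) = @$row;
    $assets{$name} = { name => $name, computer_name => $computer, activity => [] };
}

# Append every activity to the matching asset.
for my $row (@activity) {
    my ($name, $act) = @$row;
    next unless exists $assets{$name};    # skip activity for unknown names
    push @{ $assets{$name}{activity} }, $act;
}

print "Name,Computer Name,Activity\n";
for my $name (qw(David Clark Sam)) {
    my $rec = $assets{$name};
    print join(',', $name, $rec->{computer_name}, join('&', @{ $rec->{activity} })), "\n";
}
```

Keeping the activities as a list and joining them with & only at output time makes it easy to add, count, or filter them later.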

XML::Simple removes root element

Hi, I have XML data that I get from an array of hashes, and when I run Dumper on it the output is as follows:
$var1=
'<Data>
<Data1>ABC</Data1>
<Data2>ABCD</Data2>
</Data>';
I have this in a variable called $var1. Now I am using XML::Simple on it, and the result is something like: {Data1=>'ABC',Data2=>'ABCD'};
The first tag, Data, is gone. What is wrong?
Seems to be well-documented:
KeepRoot => 1:
In its attempt to return a data structure free of superfluous detail
and unnecessary levels of indirection, XMLin() normally discards the
root element name. Setting the KeepRoot option to 1 will cause the
root element name to be retained. So after executing this code:
$config = XMLin('<config tempdir="/tmp" />', KeepRoot => 1)
You'll be able to reference the tempdir as
"$config->{config}->{tempdir}" instead of the default
"$config->{tempdir}".

MongoDB Distinct Values

I'm using this code to find specific text in the database, which I will then load into a page with Mojolicious. Is this a good method, and how fast is it?
use MongoDB;
use Data::Dump q(dump);
my $connection = MongoDB::Connection->new(host => 'localhost', port => 27017);
my $database = $connection->test;
my $col = $database->user;
my $r3 = $database->run_command([
"distinct" => "person",
"key" => "text",
"query" =>""
]);
for my $d ( @{ $r3->{values} } ) {
if ($d=~ /value/){
print "D: $d\n";
}
}
The distinct command can certainly work (and it seems that it does), so it's good. It is also probably the fastest way to do this (the implementation just opens the appropriate index, reads from it and populates a hash table, IIRC).
Note, however, that it will fail with an error if the total size of the distinct values is greater than the BSON size limit (16MB currently).
If you ever run into this, you'll have to resort to slower alternatives. MapReduce, for example.

How do you compare hashes for equality that contain different key formats (some strings, some symbols) in Ruby?

I'm using ruby 1.9.3 and I need to compare two hashes that have different key formats. For example, I want the equality of the following two hashes to be true:
hash_1 = {:date => 2011-11-01, :value => 12}
hash_2 = {"date" => 2011-11-01, "value" => 12}
Any ideas on how these two hashes can be compared in one line of code?
Stringify the keys on the hash that has symbols (stringify_keys and symbolize_keys come from ActiveSupport, not core Ruby):
> hash_1.stringify_keys
=> {"date"=>"2011-11-01", "value"=>12}
Then compare. So, your answer, in one line, is:
> hash_1.stringify_keys == hash_2
=> true
You could also do it the other way around, symbolizing the string keys in hash_2 instead of stringifying them in hash_1:
> hash_1 == hash_2.symbolize_keys
=> true
If you want the stringification/symbolization to be a permanent change, use the version with the bang !: stringify_keys! or symbolize_keys! respectively
> hash_1.stringify_keys! # <- Permanently changes the keys in hash_1 into Strings
=> {"date"=>"2011-11-01", "value"=>12} # as opposed to temporarily changing them for comparison
Ref: http://as.rubyonrails.org/classes/HashWithIndifferentAccess.html
Also, I'm guessing you meant to put quotes around the dates...
:date => "2011-11-01"
...or, explicitly instantiate them as Date objects?
:date => Date.parse("2011-11-01")
The way you have the date written now sets :date to 2011-11-01 without quotes, which Ruby interprets as a series of integers with subtraction between them.
That is:
> date = 2011-11-01
=> 1999 # <- integer value of 2011, minus 11, minus 1

Why can't I add a chart to my Excel spreadsheet with Perl's Spreadsheet::WriteExcel

When creating a chart in a spreadsheet using Spreadsheet::WriteExcel, the file it creates keeps coming up with an error reading
Excel found unreadable content in "Report.xls"
and asks me if I want to recover it. I have worked it out that the problem line in the code is where I actually insert the chart, with
$chartworksheet->insert_chart(0, 0, $linegraph, 10, 10);
If I comment out this one line, the data is fine (but of course, there's no chart). The rest of the relevant code is as follows (any variables not defined here are defined earlier in the code, like $lastrow).
printf("Creating\n");
my $chartworksheet = $workbook->add_worksheet('Graph');
my $linegraph = $workbook->add_chart(type => 'line', embedded => 1);
$linegraph->add_series(values => '=Data!$D$2:$D$lastrow', name => 'Column1');
$linegraph->add_series(values => '=Data!$E$2:$E$lastrow', name => 'Column2');
$linegraph->add_series(values => '=Data!$G$2:$G$lastrow', name => 'Column3');
$linegraph->add_series(values => '=Data!$H$2:$H$lastrow', name => 'Column4');
$linegraph->set_x_axis(name => 'x-axis');
$linegraph->set_y_axis(name => 'y-axis');
$linegraph->set_title(name => 'title');
$linegraph->set_legend(position => 'bottom');
$chartworksheet->activate();
$chartworksheet->insert_chart(0, 0, $linegraph, 10, 10);
printf("Finished\n");
I am at a total loss here, and I can't find any answers. Help please!
Looking at the expression:
'=Data!$D$2:$D$lastrow'
Is $lastrow some convention in Spreadsheet::WriteExcel, or is it a variable from your script to be interpolated into the string expression? If it's your variable, then this code won't do what you want inside single quotes, because single-quoted strings are not interpolated, and you may want to use one of the following instead:
'=Data!$D$2:$D$' . $lastrow
"=Data!\$D\$2:\$D\$$lastrow"
sprintf('=Data!$D$2:$D$%d', $lastrow)
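The difference is easy to see by printing the strings. This is a small sketch with a made-up value for $lastrow; it uses no Spreadsheet::WriteExcel calls, only plain Perl string handling.

```perl
use strict;
use warnings;

my $lastrow = 50;    # hypothetical value; the question computes it earlier

my $single    = '=Data!$D$2:$D$lastrow';                    # single quotes: no interpolation
my $concat    = '=Data!$D$2:$D$' . $lastrow;                # literal prefix plus the variable
my $formatted = sprintf('=Data!$D$2:$D$%d', $lastrow);      # same result via sprintf

print "$single\n";       # prints =Data!$D$2:$D$lastrow
print "$concat\n";       # prints =Data!$D$2:$D$50
print "$formatted\n";    # prints =Data!$D$2:$D$50
```

The first string reaches Excel with the literal text "lastrow" in the range, which is not a valid cell reference.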
The problem, as mobrule correctly points out, is that you are using single quotes on the series string and $lastrow doesn't get interpolated.
You can avoid these types of issues entirely when programmatically generating chart series strings by using the xl_range_formula() utility function from Spreadsheet::WriteExcel::Utility.
$chart->add_series(
categories => xl_range_formula( 'Sheet1', 1, 9, 0, 0 ),
values => xl_range_formula( 'Sheet1', 1, 9, 1, 1 ),
);
# Which is the same as:
$chart->add_series(
categories => '=Sheet1!$A$2:$A$10',
values => '=Sheet1!$B$2:$B$10',
);
See the following section of the WriteExcel docs for more details: Working with Cell Ranges.