Ordered hash of hashes - setting and accessing key/value pairs - perl

I want to implement an ordered hash where the value of each key value pair will be a another nested hash map. I am unable to do so. I am not getting any errors but nothing is being printed.
use Hash::Ordered;
use constant { lead_ID => 44671 , lag_ID => 11536 , start_time => time };
my $dict_lead=Hash::Ordered->new;
my $dict_lag=Hash::Ordered->new;
open(my $f1,"<","tcs_07may_nse_fo") or die "cant open input file";
open(my $f2,">","bid_ask_".&lead_ID) or die "cant open output file";
open(my $f3,">","ema_data/bid_ask_".&lag_ID) or die "cant open output file";
while(my $line =<$f1>){
my #data=split(/,/,$line);
chomp(#data);
my ($tstamp, $instr) = (int($data[0]), $data[1]);
if($instr==&lead_ID){
$dict_lead->set($tstamp=>{"bid"=>$data[5],"ask"=>$data[6]});
}
if($instr==&lag_ID){
$dict_lag->set($tstamp=>{"bid"=>$data[5],"ask"=>$data[6]});
}
}
close $f1;
foreach my $key ($dict_lead->keys){
my $spread=$dict_lead{$key}{"ask"}-$dict_lead{$key}{"bid"};
%hash=$dict_lead->get($key);
print $key.",".$hash{"ask"}."\n";
print $f2 $key.",".$dict_lead{$key}{"bid"}.","
.$dict_lead{$key}{"ask"}.",".$spread."\n";
}
foreach my $key ($dict_lag->keys){
my $spread=$dict_lag{$key}{"ask"}-$dict_lag{$key}{"bid"};
print $f3 $key.",".$dict_lag{$key}{"bid"}.","
.$dict_lag{$key}{"ask"}.",".$spread."\n";
}
close $f2;
close $f3;
print "Ring destroyed in " , time() - &start_time , " seconds\n";
The output printed on my terminal is :
1430992791,
1430992792,
1430992793,
1430992794,
1430992795,
1430992796,
1430992797,
1430992798,
1430992799,
Ring destroyed in 24 seconds
I realize from the first column of output that I am able to insert the key in ordered hash. But I don't understand how to insert another hash as value for those keys. Also how would I access those values while iterating through the keys of the hash?
The output in the file corresponding to file handle $f2 is:
1430970394,,,0
1430970395,,,0
1430970396,,,0
1430970397,,,0
1430970398,,,0
1430970399,,,0
1430970400,,,0

First of all, I don't see why you want to use a module that keeps your hash in order. I presume you want your output ordered by the timestamp fields, and the data that you are reading from the input file is already ordered like that, but it would be simple to sort the keys of an ordinary hash and print the contents in order without relying on the incoming data being presorted
You have read an explanation of why your code isn't behaving as it should. This is how I would write a solution that hopefully behaves properly (although I haven't been able to test it beyond checking that it compiles)
Instead of a hash, I have chosen to use a two-element array to contain the ask and bid prices for each timestamp. That should make the code run fractionally faster as well as making it simpler and easier to read
It's also noteworthy that I have added use autodie, which makes perl check the status of IO operations such as open and chdir automatically and removes the clutter caused by coding those checks manually. I have also defined a constant for the path to the root directory of the files, and used chdir to set the working directory there. That removes the need to repeat that part of the path and reduces the length of the remaining file path strings
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use autodie;
use Hash::Ordered;
use constant DIR => '../tcs_nse_fo_merged';
use constant LEAD_ID => 44671;
use constant LAG_ID => 11536;
chdir DIR;
my $dict_lead = Hash::Ordered->new;
my $dict_lag = Hash::Ordered->new;
{
open my $fh, '<', 'tcs_07may_nse_fo';
while ( <$fh> ) {
chomp;
my #data = split /,/;
my $tstamp = int $data[0];
my $instr = $data[1];
if ( $instr == LEAD_ID ) {
$dict_lead->set( $tstamp => [ #data[5,6] ] );
}
elsif ( $instr == LAG_ID ) {
$dict_lag->set( $tstamp => [ #data[5,6] ] );
}
}
}
{
my $file = 'ema_data/bid_ask_' . LEAD_ID;
open my $out_fh, '>', $file;
for my $key ( $dict_lead->keys ) {
my $val = $dict_lead->get($key);
my ($ask, $bid) = #$val;
my $spread = $ask - $bid;
print join(',', $key, $ask), "\n";
print $out_fh join(',', $key, $bid, $ask, $spread), "\n";
}
}
{
my $file = 'ema_data/bid_ask_' . LAG_ID;
open my $out_fh, '>', $file;
for my $key ( $dict_lag->keys ) {
my $val = $dict_lead->get($key);
my ($ask, $bid) = #$val;
my $spread = $ask - $bid;
print $out_fh join(',', $key, $bid, $ask, $spread), "\n";
}
}
printf "Ring destroyed in %d seconds\n", time - $^T;

With ordered hashes constructed using Hash::Ordered, the hash is an object. Those objects have properties (e.g. an index; if you examine a Hash::Ordered object it will have more than just hash elements inside of it) and they provide methods for you manipulate and access their data. So you need to use the supplied methods - like set to access the hash such as you do in this line:
$dict_lead->set($tstamp=>{"bid"=>$data[5],"ask"=>$data[6]});
where you create a key using the the scalar $tstamp and then associate it with an anonymous hash as it value.
But while you are using Hash::Ordered objects, your script also makes use of a plain data-structure (%hash) that you populate using $dict_lead->get($key) in your first foreach loop. All the normal techniques, idioms and rules for adding keys to a hash still apply in this case. You don't want to repeatedly copy the nested hash out of $dict_lead Hash::Ordered object into %hash here, you want to add the nested hash to %hash and associate it with a unique key.
Without sample data to test or a description of the expected output to compare against it is difficult to know for sure, but you probably just need to change:
%hash=$dict_lead->get($key);
to something like:
$hash{$key} = $dict_lead->get($key);
to populate your temporary %hash correctly. Or, since each key's value is an anonymous hash that is nested, you might instead want to try changing print $key.",".$hash{"ask"}."\n"; to:
print $key.",".$hash{$key}{"ask"}."\n"
There are other ways to "deeply" copy part of one nested data structure to another (see the Stackoverflow reference below) and you maybe be able to avoid using the temporary variable all together, but these small changes might be all that is necessary in your case.
In general, in order to "insert another hash as a value for ... keys" you need to use a reference or an anonymous hash constructor ({ k => "v" , ... }). So e.g. to add one key:
my %sample_hash ;
$sample_hash{"key_0"} = { bid => "1000000" , timestamp => 1435242285 };
dd %sample_hash ;
Output:
("key_0", { bid => 1000000, timestamp => 1435242285 })
To add multiple keys from one hash to another:
my %to_add = ( key_1 => { bid => "1500000" , timestamp => 1435242395 },
key_2 => { bid => "2000000" , timestamp => 1435244898 } );
for my $key ( keys %to_add ) {
$sample_hash{$key} = $to_add{$key}
}
dd %sample_hash ;
Output:
(
"key_1",
{ bid => 1000000, timestamp => 1435242285 },
"key_0",
{ bid => 1400000, timestamp => 1435242395 },
"key_2",
{ bid => 2000000, timestamp => 1435244898 },
)
References
How can I combine hashes in Perl? ++
perldoc perlfaq4
perldoc perldsc

Related

Parse report in blocks to CSV

I have lots of data dumps in a pretty huge amount of data structured as follow
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Which I would like to transform to something like:
Key1,Key2,Key3,Key5
Value,Other value,Maybe another value yet,
Different value,,Invaluable,Has no value at all
I mean:
Generate a collection of all the keys
Generate a header line with all the Keys
Map all the values to their correct "columns" (notice that in this example I have no "Key4", and Key3/Key5 interchanged)
Possibly in Perl, since it would be easier to use in various environments.
But I am not sure if this format is unusual, or if there is a tool that already does this.
This is fairly easy using hashes and the Text::CSV_XS module:
use strict;
use warnings;
use Text::CSV_XS;
my #rows;
my %headers;
{
local $/ = "";
while (<DATA>) {
chomp;
my %record;
for my $line (split(/\n/)) {
next unless $line =~ /^([^:]+):\.+\s(.+)/;
$record{$1} = $2;
$headers{$1} = $1;
}
push(#rows, \%record);
}
}
unshift(#rows, \%headers);
my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => $/});
$csv->column_names(sort(keys(%headers)));
for my $row_ref (#rows) {
$csv->print_hr(*STDOUT, $row_ref);
}
__DATA__
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Output:
Key1,Key2,Key3,Key5
Value,"Other value","Maybe another value yet",
"Different value",,Invaluable,"Has no value at all"
If your CSV format is 'complicated' - e.g. it contains commas, etc. - then use one of the Text::CSV modules. But if it isn't - and this is often the case - I tend to just work with split and join.
What's useful in your scenario, is that you can map key-values within a record quite easily using a regex. Then use a hash slice to output:
#!/usr/bin/env perl
use strict;
use warnings;
#set paragraph mode - records are blank line separated.
local $/ = "";
my #rows;
my %seen_header;
#read STDIN or files on command line, just like sed/grep
while ( <> ) {
#multi - line pattern, that matches all the key-value pairs,
#and then inserts them into a hash.
my %this_row = m/^(\w+):\.+ (.*)$/gm;
push ( #rows, \%this_row );
#add the keys we've seen to a hash, so we 'know' what we've seen.
$seen_header{$_}++ for keys %this_row;
}
#extract the keys, make them unique and ordered.
#could set this by hand if you prefer.
my #header = sort keys %seen_header;
#print the header row
print join ",", #header, "\n";
#iterate the rows
foreach my $row ( #rows ) {
#use a hash slice to select the values matching #header.
#the map is so any undefined values (missing keys) don't report errors, they
#just return blank fields.
print join ",", map { $_ // '' } #{$row}{#header},"\n";
}
This for you sample input, produces:
Key1,Key2,Key3,Key5,
Value,Other value,Maybe another value yet,,
Different value,,Invaluable,Has no value at all,
If you want to be really clever, then most of that initial building of the loop can be done with:
my #rows = map { { m/^(\w+):\.+ (.*)$/gm } } <>;
The problem then is - you would need to build up the 'headers' array still, and that means a bit more complicated:
$seen_header{$_}++ for map { keys %$_ } #rows;
It works, but I don't think it's as clear about what's happening.
However the core of your problem may be the file size - that's where you have a bit of a problem, because you need to read the file twice - first time to figure out which headings exist throughout the file, and then second time to iterate and print:
#!/usr/bin/env perl
use strict;
use warnings;
open ( my $input, '<', 'your_file.txt') or die $!;
local $/ = "";
my %seen_header;
while ( <$input> ) {
$seen_header{$_}++ for m/^(\w+):/gm;
}
my #header = sort keys %seen_header;
#return to the start of file:
seek ( $input, 0, 0 );
while ( <$input> ) {
my %this_row = m/^(\w+):\.+ (.*)$/gm;
print join ",", map { $_ // '' } #{$this_row}{#header},"\n";
}
This will be slightly slower, as it'll have to read the file twice. But it won't use nearly as much memory footprint, because it isn't holding the whole file in memory.
Unless you know all your keys in advance, and you can just define them, you'll have to read the file twice.
This seems to work with the data you've given
use strict;
use warnings 'all';
my %data;
while ( <> ) {
next unless /^(\w+):\W*(.*\S)/;
push #{ $data{$1} }, $2;
}
use Data::Dump;
dd \%data;
output
{
Key1 => ["Value", "Different value"],
Key2 => ["Other value"],
Key3 => ["Maybe another value yet", "Invaluable"],
Key5 => ["Has no value at all"],
}

Hash to CSV/TSV conversion in Perl

I'm trying to convert a simple hash to CSV/TSV in Perl. Now, the tricky part is that I'm unable to use Text::CSV::Slurp due to some funny reason, and I'm left with using Text::CSV_XS and Text::CVS.
Problem Description:
I am able to create a CSV file from the hash that I have, but display of the values isn't how I would desire them to be.
Example:
This is how my hash looks like:
`$VAR1 = {
'2015-12-09 10:49:00' => '750 mW',
'2015-12-09 10:49:02' => '751 mW'
};`
I would want keys to be under one tab and values to be under another tab. Instead, I get a CVS which has everything in a comma-separated state.
Desired Output:
key1 value1
key2 value2
Actual Output:
key1 value1 key2 value2
This is how my code looks like as of now:
open(DATA, "+>file.csv") || die "Couldn't open file file.csv, $!";
my $csv = Text::CSV_XS->new();
if ($input == 19){
my $status = $csv->print (\*DATA, \#{[%hash1]});
}
elsif ($input == 11){
my $status = $csv->print (\*DATA, \#{[%hash2]});
}
close(DATA) || die "Couldn't close file properly";
I have went through numerous questions in Stack Overflow and Perl Monks, but I somehow haven't been able to figure out the solution to this without using Text::CSV::Slurp.
Please help.
P.S: %hash1 and %hash2 are simple hashes which have basic key-value pairing, and are not hash of hashes as of now. However, as the code develops, I may have to implement the logic on HoH as well.
If I'm reading you right, something like this is what you're after:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $VAR1 = {
'2015-12-09 10:49:00' => '750 mW',
'2015-12-09 10:49:02' => '751 mW'
};
my $csv = Text::CSV -> new ( { sep_char => "\t", eol => "\n", binary => 1 } );
foreach my $key ( sort keys %{$VAR1} ) {
$csv -> print ( \*STDOUT, [ $key, $VAR1 -> {$key} ] );
}
(Or if you're doing it with a hash, not hash ref):
foreach my $key ( sort keys %hash ) {
$csv -> print ( \*STDOUT, [ $key, $hash{$key} ] );
}
Note - this is explicitly sorting, because hashes are unordered. You look to be using an sortable date format, so this should be ok, but you may need to parse a data to an epoch and parse based on that.
Output
"2015-12-09 10:49:00" "750 mW"
"2015-12-09 10:49:02" "751 mW"
Note - TSV embeds quotes because the fields contain spaces. You can remove those by:
my $csv = Text::CSV -> new ( { sep_char => "\t",
eol => "\n",
binary => 1,
quote_char => undef } );
I would strongly suggest not using DATA as your output filehandle, as it's used already in perl. In fact, I would suggest using lexical filehandles with 3 arg open:
open ( my $output, '>', 'file.csv' ) or die $!;
# ...
$csv -> print ( $output, ### everything else

Perl : Trying to deference an array after sorting it

I am currently writing a perl script where I have a reference to an array (students) of references. After adding the hash references to the array. Now I add the references to the array of students and then ask the user how to sort them. This is where it gets confusing. I do not know how to deference the sorted array. Using dumper I can get the sorted array but in a unorganized output. How can I deference the array of hash references after sorting?
#!bin/usr/perl
use strict;
use warnings;
use Data::Dumper;
use 5.010;
#reference to a var $r = \$var; Deferencing $$r
#reference to an array $r = \#var ; Deferencing #$r
#referenc to a hash $r = \%var ; deferencing %$r
my $filename = $ARGV[0];
my $students = [];
open ( INPUT_FILE , '<', "$filename" ) or die "Could not open to read \n ";
sub readLines{
while(my $currentLine = <INPUT_FILE>){
chomp($currentLine);
my #myLine = split(/\s+/,$currentLine);
my %temphash = (
name => "$myLine[0]",
age => "$myLine[1]",
GPA => "$myLine[2]",
MA => "$myLine[3]"
);
pushToStudents(\%temphash);
}
}
sub pushToStudents{
my $data = shift;
push $students ,$data;
}
sub printData{
my $COMMAND = shift;
if($COMMAND eq "sort up"){
my #sortup = sort{ $a->{name} cmp $b->{name} } #$students;
print Dumper #sortup;
}elsif($COMMAND eq "sort down"){
my #sortdown = sort{ $b->{name} cmp $a->{name} } #$students;
print Dumper #sortdown;
//find a way to deference so to make a more organize user friendly read.
}else{
print "\n quit";
}
}
readLines();
#Output in random, the ordering of each users data is random
printf"please choose display order : ";
my $response = <STDIN>;
chomp $response;
printData($response);
The problem here is that you're expected Dumper to provide an organised output. It doesn't. It dumps a data structure to make debugging easier. The key problem will be that hashes are explicitly unordered data structures - they're key-value mappings, they don't produce any output order.
With reference to perldata:
Note that just because a hash is initialized in that order doesn't mean that it comes out in that order.
And specifically the keys function:
Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash.
There is a whole section in perlsec which explains this in more detail, but suffice to say - hashes are random order, which means whilst you're sorting your students by name, the key value pairs for each student isn't sorted.
I would suggest instead of:
my #sortdown = sort{ $b->{name} cmp $a->{name} } #$students;
print Dumper #sortdown;
You'd be better off with using a slice:
my #field_order = qw ( name age GPA MA );
foreach my $student ( sort { $b -> {name} cmp $a -> {name} } #$students ) {
print #{$student}{#field_order}, "\n";
}
Arrays (#field_order) are explicitly ordered, so you will always print your student fields in the same sequence. (Haven't fully tested for your example I'm afraid, because I don't have your source data, but this approach works with a sample data snippet).
If you do need to print the keys as well, then you may need a foreach loop instead:
foreach my $field ( #field_order ) {
print "$field => ", $student->{$field},"\n";
}
Or perhaps the more terse:
print "$_ => ", $student -> {$_},"\n" for #field_order;
I'm not sure I like that as much though, but that's perhaps a matter of taste.
The essence of your mistake is to assume that hashes will have a specific ordering. As #Sobrique explains, that assumption is wrong.
I assume you are trying to learn Perl, and therefore, some guidance on the basics will be useful:
#!bin/usr/perl
Your shebang line is wrong: On Windows, or if you run your script with perl script.pl, it will not matter, but you want to make sure the interpreter that is specified in that line uses an absolute path.
Also, you may not always want to use the perl interpreter that came with the system, in which case #!/usr/bin/env perl maybe helpful for one-off scripts.
use strict;
use warnings;
use Data::Dumper;
use 5.010;
I tend to prefer version constraints before pragmata (except in the case of utf8). Data::Dumper is a debugging aid, not something you use for human readable reports.
my $filename = $ARGV[0];
You should check if you were indeed given an argument on the command line as in:
#ARGV or die "Need filename\n";
my $filename = $ARGV[0];
open ( INPUT_FILE , '<', "$filename" ) or die "Could not open to read \n ";
File handles such as INPUT_FILE are called bareword filehandles. These have package scope. Instead, use lexical filehandles whose scope you can restrict to the smallest appropriate block.
There is no need to interpolate $filename in the third argument to open.
Always include the name of the file and the error message when dying from an error in open. Surrounding the filename with ' ' helps you identify any otherwise hard to detect characters that might be causing the problem (e.g. a newline or a space).
open my $input_fh, '<', $filename
or die "Could not open '$filename' for reading: $!";
sub readLines{
This is reading into an array you defined in global scope. What if you want to use the same subroutine to read records from two different files into two separate arrays? readLines should receive a filename as an argument, and return an arrayref as its output (see below).
while(my $currentLine = <INPUT_FILE>){
chomp($currentLine);
In most cases, you want all trailing whitespace removed, not just the line terminator.
my #myLine = split(/\s+/,$currentLine);
split on /\s+/ is different than split ' '. In most cases, the latter is infinitely more useful. Read about the differences in perldoc -f split.
my %temphash = (
name => "$myLine[0]",
age => "$myLine[1]",
GPA => "$myLine[2]",
MA => "$myLine[3]"
);
Again with the useless interpolation. There is no need to interpolate those values into fresh strings (except maybe in the case where they might be objects which overloaded the stringification, but, in this case, you know they are just plain strings.
pushToStudents(\%temphash);
No need for the extra pushToStudents subroutine in this case, unless this is a stub for a method that will later be able to load the data to a database or something. Even in that case, it be better to provide a callback to the function.
sub pushToStudents{
my $data = shift;
push $students ,$data;
}
You are pushing data to a global variable. A program where there can only ever be a single array of student records is not useful.
sub printData{
my $COMMAND = shift;
if($COMMAND eq "sort up"){
Don't do this. Every subroutine should have one clear purpose.
Here is a revised version of your program.
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use Carp qw( croak );
run(\#ARGV);
sub run {
my $argv = $_[0];
#$argv
or die "Need name of student records file\n";
open my $input_fh, '<', $argv->[0]
or croak "Cannot open '$argv->[0]' for reading: $!";
print_records(
read_student_records($input_fh),
prompt_sort_order(),
);
return;
}
sub read_student_records {
my $fh = shift;
my #records;
while (my $line = <$fh>) {
last unless $line =~ /\S/;
my #fields = split ' ', $line;
push #records, {
name => $fields[0],
age => $fields[1],
gpa => $fields[2],
ma => $fields[3],
};
}
return \#records;
}
sub print_records {
my $records = shift;
my $sorter = shift;
if ($sorter) {
$records = [ sort $sorter #$records ];
}
say "#{ $_ }{ qw( age name gpa ma )}" for #$records;
return;
}
sub prompt_sort_order {
my #sorters = (
[ "Input order", undef ],
[ "by name in ascending order", sub { $a->{name} cmp $b->{name} } ],
[ "by name in descending order", sub { $b->{name} cmp $a->{name} } ],
[ "by GPA in ascending order", sub { $a->{gpa} <=> $b->{gpa} } ],
[ "by GPA in descending order", sub { $b->{gpa} <=> $a->{gpa} } ],
);
while (1) {
print "Please choose the order in which you want to print the records\n";
print "[ $_ ] $sorters[$_ - 1][0]\n" for 1 .. #sorters;
printf "\n\t(%s)\n", join('/', 1 .. #sorters);
my ($response) = (<STDIN> =~ /\A \s*? ([1-9][0-9]*?) \s+ \z/x);
if (
$response and
($response >= 1) and
($response <= #sorters)
) {
return $sorters[ $response - 1][1];
}
}
# should not be reached
return;
}

How to use Lingua::EN::Ngram for multiple files

I am implementing a naive Bayesian classification algorithm. In my training set I have a number of abstracts in separate files. I want to use N-gram in order to get the term frequency weight, but the code is not taking multiple files.
I edited my code, and now the error I am getting is
cant call method tscore on an undefined value. To check this, I printed #ngrams and it is showing me junk values like hash0*29G45 or something like that.
#!c:\perl\bin\perl.exe -w
use warnings;
use Algorithm::NaiveBayes;
use Lingua::EN::Splitter qw(words);
use Lingua::StopWords qw(getStopWords);
use Lingua::Stem;
use Algorithm::NaiveBayes;
use Lingua::EN::Ngram;
use Data::Dumper;
use Text::Ngram;
use PPI::Tokenizer;
use Text::English;
use Text::TFIDF;
use File::Slurp;
my $pos_file = 'D:\aminoacids';
my $neg_file = 'D:\others';
my $test_file = 'D:\testfiles';
my #vectors = ();
my $categorizer = Algorithm::NaiveBayes->new;
my #files = <$pos_file/*>;
my #ngrams;
for my $filename (#files) {
open(FH, $filename);
my $ngram = Lingua::EN::Ngram->new($filename);
my $tscore = $ngram->tscore;
foreach (sort { $$tscore{$b} <=> $$tscore{$a} } keys %$tscore) {
print "$$tscore{ $_ }\t" . "$_\n";
}
my $trigrams = $ngram->ngram(2);
foreach my $trigram (sort { $$trigrams{$b} <=> $$trigrams{$a} } keys %$trigrams) {
print $$trigrams{$trigram}, "\t$trigram\n";
}
my %positive;
$positive{$_}++ for #files;
$categorizer->add_instance(
attributes => \%positive,
label => 'positive'
);
}
close FH;
Your code <$pos_file/*> should work fine ( thanks #borodir ), still, here is an alternative so as to not mess up the history.
Try
opendir (DIR, $directory) or die $!;
and then
while (my $filename = readdir(DIR)) {
open ( my $fh, $filename );
# work with filehandle
close $fh;
}
closedir DIR;
If called in list context, readdir should give you a list of files:
my #filenames = readdir(DIR);
# you could call that function you wanted to call with this list, file would need to be
# opened still, though
Another point:
If you want to pass a reference to an array, do it like so:
function( list => \#stems );
# thus, your ngram line should probably rather be
my $ngram = Lingua::EN::Ngram->new (file => \#stems );
However, the docs for Lingua::EN::Ngram only talk about scalar for file and so on, it does not seem to expect an array for input. ( Exception being the 'intersection' method )
So you would have to put it in a loop and cycle through, or use map
my #ngrams = map{ Lingua::EN::Ngram->new( file => $_ ) }#filenames
Seems unnecessary to open in filehandle first, Ngram does that by itself.
If you prefer a loop:
my #ngrams;
for my $filename ( #filenames ){
push #ngrams, Lingua::EN::Ngram->new( file => $filename );
}
I think now I got what you actually want to do.
get the tscore: you wrote $tscore = $ngram->tscore, but $ngram is not defined anymore.
Not sure how to get the tscore for a single word. ( "significance of word in text" ) kind of indicates a text.
Thus: make an ngram not for each word, but either for each sentence or each file.
Then you can determine the t-score of that word in that sentence or file ( text ).
for my $filename ( #files ){
my $ngram = Lingua::EN::Ngram->new( file => $filename );
my $tscore = $ngram->tscore();
# tscore returns a hash reference. Keys are bigrams, values are tscores
# now you can do with the tscore what you like. Note that for arbitrary length,
# tscore will not work. This you would have to do yourself.

sort 2 key hash by value

i keep learning hashes and various things u can do with them.
taday i have this question. how do i sort a hash by value, when i have 2 keys in it? and how do i print it out?
i have a csv file. im trying to store values in the hash, sort it by value. this way I'll be able to print the biggest and the smallest value, i also need the date this value was there.
so far i can print the hash, but i cant sort it.
#!/usr/bin/perl
#find openMin and openMax.
use warnings;
use strict;
my %pick;
my $key1;
my $key2;
my $value;
my $file= 'msft2.csv';
my $lines = 0;
my $date;
my $mm;
my $mOld = "";
my $open;
my $openMin;
my $openMax;
open (my $fh,'<', $file) or die "Couldnt open the $file:$!\n";
while (my $line=<$fh>)
{
my #columns = split(',',$line);
$date = $columns[0];
$open = $columns[1];
$mm = substr ($date,5,2);
if ($lines>=1) { #first line of file are names of columns wich i
$key1 = $date; #dont need. data itself begins with second line
$key2 = "open";
$value = $open;
$pick{$key1}{"open"}=$value;
}
$lines++;
}
foreach $key1 (sort keys %pick) {
foreach $key2 (keys %{$pick{$key1}}) {
$value = $pick{$key1}{$key2};
print "$key1 $key2 $value \n";
}
}
exit;
1. Use a real CSV parser
Parsing a CSV with split /,/ works fine...unless one of your fields contains a comma. If you are absolutely, positively, 100% sure that your code will never, ever have to parse a CSV with a comma in one of the fields, feel free to ignore this. If not, I'd recommend using Text::CSV. Example usage:
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1 } )
or die "Cannot use CSV: " . Text::CSV->error_diag ();
open my $fh, "<", $file or die "Failed to open $file: $!";
while (my $line = $csv->getline($fh)) {
print #$line, "\n";
}
$csv->eof or $csv->error_diag();
close $fh;
2. Sorting
I only see one secondary key in your hash: open. If you're trying to sort based on the value of open, do something like this:
my %hash = (
foo => { open => "date1" },
bar => { open => "date2" },
);
foreach my $key ( sort { $hash{$a}{open} cmp $hash{$b}{open} } keys %hash ) {
print "$key $hash{$key}{open}\n";
}
(this assumes that the values you're sorting are not numeric. If the values are numeric (e.g. 3, -17.57) use the spaceship operator <=> instead of the string comparison operator cmp. See perldoc -f sort for details and examples.)
EDIT: You haven't explained what format your dates are in. If they are in YYYY-MM-DD format, sorting as above will work, but if they're in MM-DD-YYYY format, for example, 01-01-2014 would come before 12-01-2013. The easiest way to take care of this is to reorder the components of your date from most to least significant (i.e. year followed by month followed by day). You can do this using Time::Piece like this:
use Time::Piece;
my $date = "09-26-2013";
my $t = Time::Piece->strptime($date, "%m-%d-%Y");
print $t->strftime("%Y-%m-%d");
Another tidbit: in general you should only declare variables right before you use them. You gain nothing by declaring everything at the top of your program except decreased readability.
You could concatenate key1 and key2 into a single key as:
$key = "$key1 key2";
$pick{$key} = $value;