I am currently developing a piece of monitoring software that takes an input file of server names and ip addresses and creates a rudimentary database of information. I want to default some values as it processes the config file and it works fine for the first time round the loop but any subsequent entries get created with weird (well weird to me was the best way to describe it as it is probably correct and the code is wrong, as in the code is doing exactly what i have asked it to do but not necessarily what i want it to do).
the output from the code below looks like:
$VAR1 = [
{
'IPAddress' => '196.8.150.163',
'Boxname' => 'MPLRDFDSOAK1',
'CurrentStatusInfo' => {
'LineHandlersRunning' => [
{
'NumberOfGaps' => 0,
'LineHandlerName' => 'DEFAULT',
'NumberOfCommLinkDowns' => 0,
'LineHandlerUpTime' => 0,
'MemoryUsage' => 0
}
]
},
'PreviousStatusInfo' => {
'LineHandlersRunning' =>
$VAR1->[0]{'CurrentStatusInfo'}{'LineHandlersRunning'}[0]
]
}
},
{
'IPAddress' => '196.8.150.164',
'Boxname' => 'MPLRDFDSOAK2',
'CurrentStatusInfo' => {
'LineHandlersRunning' =>
$VAR1->[0]{'CurrentStatusInfo'}{'LineHandlersRunning'}
},
'PreviousStatusInfo' => {
'LineHandlersRunning' =>
$VAR1->[0]{'PreviousStatusInfo'}{'LineHandlersRunning'}
}
}
];
The following is the code:
#######################################################################################
# Version History #
#######################################################################################
# example of the ini file
#box=>MPLRDFDSOAK1;ip=>196.8.150.163
#box=>MPLRDFDSOAK2;ip=>196.8.150.164
use strict;
use warnings;
# include the library to allow easy access to command line arguments
use Getopt::Long;
# include the data dumper utility
use Data::Dumper;
my $usageInstructions = "Some instructions\n";
my $showMeTheInstructions = "";
my $iniFileToReadIn = "";
my #boxes;
# read in the command line arguments
GetOptions( "ini=s" => \$iniFileToReadIn,
"H|h|?!" => \$showMeTheInstructions);
if ($showMeTheInstructions)
{
print $usageInstructions;
exit 0;
}
readInINIFileIn($iniFileToReadIn, \#boxes) if ($iniFileToReadIn ne "");
print Dumper(\#boxes);
print "\n\#\n\# END OF DATA DUMP\n\#\n\n";
exit 0;
#######################################################################################
# subroutine to read in the ini file and create the empty records for the boxes
# specified
sub readInINIFileIn
{
my ($iniFile, $pointerToBoxes) = #_;
my $noCRLFOnString = "";
# open the file
open (ConfigFile, "<$iniFile") || die $!;
# read in all the lines into an array
my #configurationItems = <ConfigFile>;
# close the file
close (ConfigFile);
# temporary record storage
my %tempRecord;
# create the defaults for all boxes
my #LineHandlersRunning;
my %tmpLineHandlerRunning = ( LineHandlerName => "DEFAULT",
LineHandlerUpTime => 0,
NumberOfCommLinkDowns => 0,
NumberOfGaps => 0,
MemoryUsage => 0 );
push (#LineHandlersRunning, {%tmpLineHandlerRunning});
my %CurrentStatusInfo;
my %PreviousStatusInfo;
push #{ $CurrentStatusInfo{'LineHandlersRunning'} }, #LineHandlersRunning;
push #{ $PreviousStatusInfo{'LineHandlersRunning'} }, #LineHandlersRunning;
# loop through the config file and create the defaults for the database of boxes
foreach my $configLine (#configurationItems)
{
my #TokenisedLineFromFileItems = ();
my #TokenisedLineFromFileNameValuePairs = ();
# store parameters
# each line will be ; separated then => separated, as in each one will have a number of items separated by ;'s and
# each item will be be a name & value pair separated by =>'s
#TokenisedLineFromFileItems = split(/;/,$configLine);
# remove quote marks around the outside of each element of the newly created array
s/^"|"$//g foreach #TokenisedLineFromFileItems;
# create information in database record to add to boxes
foreach my $NameValuePair (#TokenisedLineFromFileItems)
{
#TokenisedLineFromFileNameValuePairs = split(/=>/,$NameValuePair);
$noCRLFOnString = $TokenisedLineFromFileNameValuePairs[1];
$noCRLFOnString =~ s/(\n|\r)//g;
$tempRecord{'Boxname'} = $noCRLFOnString if ($TokenisedLineFromFileNameValuePairs[0] eq "box");
$tempRecord{'IPAddress'} = $noCRLFOnString if ($TokenisedLineFromFileNameValuePairs[0] eq "ip");
}
# add all other defaults as blank
$tempRecord{'CurrentStatusInfo'} = {%CurrentStatusInfo};
$tempRecord{'PreviousStatusInfo'} = {%PreviousStatusInfo};
push(#$pointerToBoxes, {%tempRecord});
}
}
I don't have the patience to wade through all of your code, but I'll bet your problem is related to this aspect of the Data::Dumper output:
$VAR1->[0]{'CurrentStatusInfo'}{'LineHandlersRunning'}[0]
In other words, your data structure contains a reference to other parts of the structure.
Perhaps you think you are making a copy of part of the data structure, but instead you are getting a shallow copy rather than a deep copy? For example, I'm suspicious of this code:
$tempRecord{'CurrentStatusInfo'} = {%CurrentStatusInfo};
$tempRecord{'PreviousStatusInfo'} = {%PreviousStatusInfo};
If indeed the problem is related to shallow copying, the Clone module might help.
Use lexical filehandles, declare variables in the smallest possible scope. I do not know what your problem is, but it is most likely caused by some variable persisting longer than you think it does.
I'm guessing it's because these two lines end up pushing the same hash reference into two locations - so if you alter the hashref contents in one location, the other will change as well which is probably not what you want for default values.
As FM pointed out, this is why you have the circular reference in your Dumper output.
If someone I'm waiting to get off the phone takes long enough i'll refactor your code for you.
Update: ok, so without knowing the full scenario it's hard to say if this is a sensible approach. certainly you should look at the various INI parsing modules in CPAN, but here is a very quick tweak of your code, leaving your existing logic structure in place:
use strict;
use warnings;
use Getopt::Long;
use Data::Dumper;
my $cmd_help = "Some instructions\n";
my $show_help = "";
my $ini_file_path = "";
# read in the command line arguments
GetOptions( "ini=s" => \$ini_file_path,
"H|h|?!" => \$show_help );
if ($show_help) {
print $cmd_help;
exit 0;
}
if (! -f $ini_file_path) {
die "File '$ini_file_path' doesn't seem to exist.";
}
my $boxes = read_ini_file($ini_file_path);
print Dumper($boxes);
exit 0;
=head2 read_ini_file
read in the ini file and create the empty records for the boxes
=cut
sub read_ini_file {
my ($ini_file) = #_;
my #boxes;
my #config_lines;
{
# consider using File::Slurp
open (my $ini_fh, '<', $ini_file_path) || die $!;
#config_lines = <$ini_fh>;
chomp #config_lines; # remove \r\n
# file handle will close when $ini_fh goes out of scope
}
# create the defaults for all boxes
my %line_handlers_running_defaults = ( LineHandlerName => "DEFAULT",
LineHandlerUpTime => 0,
NumberOfCommLinkDowns => 0,
NumberOfGaps => 0,
MemoryUsage => 0 );
# loop through the config file and create the defaults for the database of boxes
foreach my $line (#config_lines) {
my %record;
my #token_pairs = map { s/^"//; s/^$//; $_ } split(/;/,$line);
# create information in database record to add to boxes
foreach my $pair (#token_pairs) {
my ($key, $val) = split(/=>/,$pair);
$record{Boxname} = $val if $key eq "box";
$record{IPAddress} = $val if $key eq "ip";
}
# add all other defaults as blank
$record{CurrentStatusInfo} = { LineHandlersRunning => [{%line_handlers_running_defaults}] };
$record{PreviousStatusInfo} = { LineHandlersRunning => [{%line_handlers_running_defaults}] };
push #boxes, \%record;
}
return \#boxes;
}
gives this output:
$VAR1 = [
{
'IPAddress' => '196.8.150.163',
'CurrentStatusInfo' => {
'LineHandlersRunning' => [
{
'NumberOfGaps' => 0,
'LineHandlerName' => 'DEFAULT',
'NumberOfCommLinkDowns' => 0,
'LineHandlerUpTime' => 0,
'MemoryUsage' => 0
}
]
},
'Boxname' => 'MPLRDFDSOAK1',
'PreviousStatusInfo' => {
'LineHandlersRunning' => [
{
'NumberOfGaps' => 0,
'LineHandlerName' => 'DEFAULT',
'NumberOfCommLinkDowns' => 0,
'LineHandlerUpTime' => 0,
'MemoryUsage' => 0
}
]
}
},
{
'IPAddress' => '196.8.150.164',
'CurrentStatusInfo' => {
'LineHandlersRunning' => [
{
'NumberOfGaps' => 0,
'LineHandlerName' => 'DEFAULT',
'NumberOfCommLinkDowns' => 0,
'LineHandlerUpTime' => 0,
'MemoryUsage' => 0
}
]
},
'Boxname' => 'MPLRDFDSOAK2',
'PreviousStatusInfo' => {
'LineHandlersRunning' => [
{
'NumberOfGaps' => 0,
'LineHandlerName' => 'DEFAULT',
'NumberOfCommLinkDowns' => 0,
'LineHandlerUpTime' => 0,
'MemoryUsage' => 0
}
]
}
}
];
Related
I have an array of hashrefs built from a database using fethrow_hashref(). The data structure is built like so:
while (my $ref = $sth->fetchrow_hashref()) {
push #lines, $ref;
}
I sort the data in the query by program name ascending, so all of the references in the array are still in alphabetical order. Then, I go through each hash and find the value that is numerically equal to a '1'. I then take the caolumn name, and store it to compare to the rest of the hashrefs with that program name to ensure they all have a '1' in the same column.
my $pgm = "";
my $met_lvl = "";
my #devs = ();
my %errors = ();
my $error = "";
foreach my $line_ref (#lines) {
if ($pgm ne $line_ref->{"PROGRAM"}) {
if (#devs && $error) {
# print " Different number metal layers for $pgm: #devs \n";
$error = "";
}
#devs = ();
$pgm = $line_ref->{"PROGRAM"};
($met_lvl) = grep { $line_ref->{$_} == 1 } keys(%$line_ref);
push #devs, $line_ref->{"DEVICE"};
} elsif ($pgm eq $line_ref->{"PROGRAM"}) {
push #devs, $line_ref->{"DEVICE"};
my ($met_chk ) = grep { $line_ref->{$_} == 1 } keys(%$line_ref);
if ($met_chk ne $met_lvl) {
$errors{$line_ref->{"PROGRAM"}} = $line_ref->{"PROGRAM"};
$error = "YUP";
}
}
}
I'd like to be able to access the hashrefs individually, based on matching column names from the database. How can I access the hashrefs with "TEST" values for "PROGRAM" keys? I used Data::Dumper to provide an example of a few of the hashrefs I'd like to access based on "PROGRAM" value:
'PLM' => undef,
'SLM' => undef,
'QLM' => undef,
'DEVICE' => 'DEV1',
'TLM' => '1',
'DLM' => undef,
'ROUTING' => 'NORMAL',
'PROGRAM' => 'TEST'
};
$VAR455 = {
'PLM' => undef,
'SLM' => undef,
'QLM' => undef,
'DEVICE' => 'DEV2',
'TLM' => '1',
'DLM' => undef,
'ROUTING' => 'NORMAL',
'PROGRAM' => 'TEST'
};
$VAR456 = {
'PLM' => undef,
'SLM' => undef,
'QLM' => undef,
'DEVICE' => 'DEV3',
'TLM' => '1',
'DLM' => undef,
'ROUTING' => 'NON_STANDARD',
'PROGRAM' => 'EXP'
};
$VAR457 = {
'PLM' => undef,
'SLM' => undef,
'QLM' => undef,
'DEVICE' => 'DEV4',
'TLM' => '1',
'DLM' => undef,
'ROUTING' => 'NORMAL',
'PROGRAM' => 'FINAL'
};
I'd like to be able to access key values for the hashrefs which contain the same program name. I cannot even begin to figure out what type of operation to use for this. I assume map is the correct way to do it, but dereferencing the "PROGAM" value for each element (hashref) in the array is beyond the scope of my understanding. I hope I was able to define the problem well enough for you guys to be able to help.
Edit: The impetus for wanting to access hashrefs with the same 'PROGRAM" value is to be able to provide an output of selected values to print to a logfile. So, after I compare and find differences between those hashrefs with the same "PROGRAM" value, I want to access them all again, and print out the desired column values to the lofgile.
Looks like you need to exrtact subsets of your data (hashrefs) with the same PROGRAM name.
Can preprocess your data to build a hash with those names as keys, and arrayrefs (with suitable hashrefs) as values. Then process those groups one at a time.
use warnings;
use strict;
use feature 'say';
use Data::Dumper; # to print complex data below
... populate #lines with hashrefs as in the question or copy-paste a sample
# Build hash: ( TEST => [ hashrefs w/ TEST ], EXP => [ hashrefs w/ EXP ], ... )
my %prog_subset;
for my $hr (#lines) {
push #{ $prog_subset{$hr->{PROGRAM}} }, $hr;
# Or, using "postfix dereferencing" (stable from v5.24)
# push $prog_subset{$hr->{PROGRAM}}->#*, $hr;
}
foreach my $prog (keys %prog_subset) {
say "\nProcess hashrefs with PROGRAM being $prog";
foreach my $hr (#{ $prog_subset{$prog} }) {
say Dumper $hr;
}
}
(See postfix dereference)
Now %prog_subset contains keys TEST, EXP, FINAL (and whatever other PROGRAM names are in data), each having for value an arrayref of all hashrefs which have that PROGRAM name.
There are other ways, and there are libraries that can be leveraged, but this should do it.
OK! I found an example of this with the google machine. I replaced #lines = (); with $lines = [];. This allowed me to change the grep statement to (#found) = grep { $pgm eq $_->{PROGRAM} } #$lines;. Now the returned array is a list of the hashrefs that share the program name I'm looking for. Thanks for the help #zdim!
I have written the following code in Perl. The code is reading a pdb file and getting some values. Ignore the top part of the code,where everything is working perfect.
Problem is in the sub-routine part, where I try to store arrays in the hash3 with model as key another key position
the array values can be accessed inside the if condition using this :
$hash3{$model}{$coordinates}[1].
but when I go out of all foreach loop and try to access the elements I only get one value.
Please look at the end foreach loop and tell me is it the wrong way to access the hash values.
The pdb file I am using can be downloaded from this link http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1NZS
#!/usr/bin/perl
open(IN,$ARGV[0]);
my #phosphosites;
my $model=1;
my %hash3;
while(<IN>)
{
#findmod(#line);
#finddist;
#findfreq;
if((/^MODRES/) && (/PHOSPHO/))
{
#line=split;
push(#phosphosites, $line[2]);
#print "$line[4]";
}
foreach $elements (#phosphosites){
if(/^HETATM\s+\d+\s+CA\s+$i/)
{
#line1=split;
#print "$line1[5]";
#print "$line1[6] $line1[7] $line1[8]\n";
push(#phosphositesnum, $line1[5]);
}
}
$pos=$line1[5];
#findspatial(\#line,\#line1);
}
my #ori_data=removeDuplicates(#phosphositesnum);
sub removeDuplicates {
my %seen = ();
my #vals = ();
foreach my $i (#_) {
unless ($seen{$i}) {
push #vals, $i;
$seen{$i} = 1;
}
}
return #vals;
}
$a=(#phosphosites);
print "$a\n";
print "#phosphosites\n";
print "#ori_data\n";
close(IN);
open(IN1,$ARGV[0]);
my (#data)=<IN1>;
spatial(\#ori_data);
sub spatial {
my #spatial_array1=#{$_[0]};
foreach $coordinates(#spatial_array1)
{
$model=1;
{foreach $data1(#data){
if($data1=~ m/^HETATM\s+\d+\s+CA\s+[A-Z]*\s+[A-Z]*\s+$coordinates/)
{
#cordivals=split(/\s+/,$data1);
push #{ $sphash{$model} },[$cordivals[6], $cordivals[7], $cordivals[8]];
$hash3{$model}{$coordinates}= \#cordivals;
#print "$model $coordinates $hash3{$model}{$coordinates}[6] $hash3{$model}{$coordinates}[7] $hash3{$model}{$coordinates}[8]\n";
#print "$model $sphash{$model}[$i][0] $sphash{$model}[$i][1] $sphash{$model}[$i][2]\n";
}
elsif($data1=~ m/^ENDMDL/)
{
$model++;
}
#print "$model $coordinates $hash3{$model}{$coordinates}[6] $hash3{$model}{$coordinates}[7] $hash3{$model}{$coordinates}[8]\n";
}
}
}
#foreach $z1 (sort keys %hash3)
# {
# foreach $z2(#spatial_array1){
# print "$z1 $z2";
# print "$hash3{$z1}{$z2}[6]\n";
# print "$z2\n";
# }
# }
}
After using the Data::Dumper option it is giving me this kind of output
$VAR1 = {
'11' => {
'334' => [
'HETATM',
'115',
'CA',
'SEP',
'A',
'343',
'-0.201',
'-2.884',
'1.022',
'1.00',
'99.99',
'C'
],
'342' => $VAR1->{'11'}{'334'},
'338' => $VAR1->{'11'}{'334'},
'335' => $VAR1->{'11'}{'334'},
'340' => $VAR1->{'11'}{'334'},
'343' => $VAR1->{'11'}{'334'},
'336' => $VAR1->{'11'}{'334'}
},
'7' => {
'334' => $VAR1->{'11'}{'334'},
'342' => $VAR1->{'11'}{'334'},
'338' => $VAR1->{'11'}{'334'},
'335' => $VAR1->{'11'}{'334'},
'340' => $VAR1->{'11'}{'334'},
'343' => $VAR1->{'11'}{'334'},
'336' => $VAR1->{'11'}{'334'}
},
'2' => {
'334' => $VAR1->{'11'}{'334'},
'342' => $VAR1->{'11'}{'334'},
...
Change:
#cordivals=split(/\s+/,$data1);
to:
my #cordivals=split(/\s+/,$data1);
What seems to be happening is that all the hash elements contain references to the same array variable, because you're not making the variable local to that iteration.
In general, you should use my with all variables.
I am just learning how to use perl hashes and ran into this message in perl. I am using XML::Simple to parse xml output and using exists to check on the hash keys.
Message:
Pseudo-hashes are deprecated at ./h2.pl line 53.
Argument "\x{2f}\x{70}..." isn't numeric in exists at ./h2.pl line 53.
Bad index while coercing array into hash at ./h2.pl line 53.
I had the script working earlier with one test directory and then executed the script on another directory for testing when I got this message. How do I resolve/workaround this?
Code that the error references:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
#my $data = XMLin($xml);
my $data = XMLin($xml, ForceArray => [qw (file) ]);
my $size=0;
if (exists $data->{class}
and $data->{class}=~ /FileNotFound/) {
print "The directory: $Path does not exist\n";
exit;
} elsif (exists $data->{file}->{path}
and $data->{file}->{path} =~/test-out-00/) {
$size=$data->{file}->{size};
if ($size < 1024000) {
print "FILE SIZE:$size BYTES\n";
exit;
}
} else {
exit;
}
print Dumper( $data );
Working test case, data structure looks like this:
$VAR1 = {
'recursive' => 'no',
'version' => '0.20.202.1.1101050227',
'time' => '2011-09-30T02:49:39+0000',
'filter' => '.*',
'file' => {
'owner' => 'test_act',
'replication' => '3',
'blocksize' => '134217728',
'permission' => '-rw-------',
'path' => '/source/feeds/customer/test/test-out-00',
'modified' => '2011-09-30T02:48:41+0000',
'size' => '135860644',
'group' => '',
'accesstime' => '2011-09-30T02:48:41+0000'
'modified' => '2011-09-30T02:48:41+0000'
},
'exclude' => ''
};
recursive:no
version:0.20.202.1.1101050227
time:2011-10-01T07:06:16+0000
filter:.*
file:HASH(0x84c83ec)
path:/source/feeds/customer/test
directory:HASH(0x84c75d8)
exclude:
Data structure with seeing error:
$VAR1 = {
'recursive' => 'no',
'version' => '0.20.202.1.1101050227',
'time' => '2011-10-03T04:49:36+0000',
'filter' => '.*',
'file' => [
{
'owner' => 'test_act',
'replication' => '3',
'blocksize' => '134217728',
'permission' => '-rw-------',
'path' => '/source/feeds/customer/test/20110531/test-out-00',
'modified' => '2011-10-03T04:47:46+0000',
'size' => '121406618',
'group' => 'feeds',
'accesstime' => '2011-10-03T04:47:46+0000'
},
Test xml file:
<?xml version="1.0" encoding="UTF-8"?><listing time="2011-10-03T04:49:36+0000" recursive="no" path="/source/feeds/customer/test/20110531" exclude="" filter=".*" version="0.20.202.1.1101050227"><directory path="/source/feeds/customer/test/20110531" modified="2011-10-03T04:48:19+0000" accesstime="1970-01-01T00:00:00+0000" permission="drwx------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-00" modified="2011-10-03T04:47:46+0000" accesstime="2011-10-03T04:47:46+0000" size="121406618" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-01" modified="2011-10-03T04:48:04+0000" accesstime="2011-10-03T04:48:04+0000" size="127528522" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-02" modified="2011-10-03T04:48:19+0000" accesstime="2011-10-03T04:48:19+0000" size="125452919" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/></listing>
The "Pseudo-hashes are deprecated" error means you're trying to access an array as a hash, which means that either $data->{file} or $data->{file}{path} is an arrayref.
You can check the data type by using print ref $data->{file}. The Data::Dumper module may also help you to see what is in your data structure (perhaps while setting $Data::Dumper::Maxdepth = N to limit the dump to N number of levels if the structure is big).
UPDATE
Now that you are using ForceArray, $data->{file} should always point to an arrayref, which may possibly have multiple references to path. Here is a modified segment of your code to handle that. But note that the logic of the if-then-exit conditions may have to change.
if (defined $data->{class} and $data->{class}=~ /FileNotFound/) {
print "The directory: $Path does not exist\n";
exit;
}
exit if ! defined $data->{file};
# filter the list for the first file entry named test-out-00
my ( $file ) = grep {
defined $_->{path} && $_->{path} =~ /test-out-00/
} #{ $data->{file} };
exit if ! defined $file;
$size = $file->{size};
if ($size < 1024000) {
print "FILE SIZE:$size BYTES\n";
exit;
}
When using XML::Simple, the ForceArray option is one of the most important to understand, especially in cases when your input data has nested elements that can occur 1 or more times. For example:
use XML::Simple;
use Data::Dumper;
my #xml_snippets = (
'<opt> <name x="3" y="4">B</name> <name x="5" y="6">C</name> </opt>',
'<opt> <name x="1" y="2">A</name> </opt>',
);
for my $xs (#xml_snippets){
my $data = XMLin($xs, ForceArray => 0);
print Dumper($data);
}
Output:
$VAR1 = {
'name' => [ # Array ref because there are 2 <name> elements.
{
'y' => '4',
'content' => 'B',
'x' => '3'
},
{
'y' => '6',
'content' => 'C',
'x' => '5'
}
]
};
$VAR1 = {
'name' => { # No intermediate array ref.
'y' => '2',
'content' => 'A',
'x' => '1'
}
};
By activating the ForceArray option, you can direct XML::Simple to produce consistent data structures that always use the intermediate array reference, even when there is only 1 of a particular nested element. You can activate the option globally or for specific tags, as illustrated here:
my $data = XMLin($xs, ForceArray => 1 ); # Globally.
my $data = XMLin($xs, ForceArray => [qw(name foo bar)]);
First, I recommend that you use ForceArray => [qw( file )] as previously discussed. That will cause an array to be returned for file, whether there's one or more file element. This is easier to handle than having two possible formats.
As I previously indicated, the problem is that you made no provision for looping over multiple file elements. You said you wanted to exit if the file doesn't exist, so that means you want
my $found;
for my $file (#{ $data->{file} }) {
if ($file->{path} =~ m{/test-out-00\z}) {
$found = $file;
last;
}
}
die("Test file not found\n") if !$found;
... do something with file data in $found ...
I'm writing a script that parses the "pure-ftpwho -s" command to get a list of the current transfers. But when a user disconnects from the FTP and reconnects back and resumes a transfer, the file shows up twice. I want to remove the ghosted one with Perl. After parsing, here is what the arrayref looks like (dumped with Data::Dumper)
$VAR1 = [
{
'status' => 'DL',
'percent' => '20',
'speed' => '10',
'file' => 'somefile.txt',
'user' => 'user1',
'size' => '14648'
},
{
'status' => 'DL',
'percent' => '63',
'speed' => '11',
'file' => 'somefile.txt',
'user' => 'user1',
'size' => '14648'
},
{
'status' => 'DL',
'percent' => '16',
'speed' => '60',
'file' => 'somefile.txt',
'user' => 'user2',
'size' => '14648'
}
];
Here user1 and user2 are downloading the same file, but user1 appears twice because the first one is a "ghost". What's the best way to check and remove elements that I don't need (in this case the first element of the arrayref). The condition to check is that - If the "file" key and "user" key is the same, then delete the hashref that contains the smaller value of "percent" key (if they're the same then delete all except one).
If order in the original arrayref doesn't matter, this should work:
my %users;
my #result;
for my $data (#$arrayref) {
push #{ $users{$data->{user}.$data->{file}} }, $data;
}
for my $value (values %users) {
my #data = sort { $a->{percent} <=> $b->{percent} } #$value;
push #result, $data[-1];
}
This can definitely be improved for efficiency.
The correct solution in this case would have been to use a hash when parsing the log file. Put all information into a hash, say %log, keyed by user and file:
$log{$user}->{$file} = {
'status' => 'DL',
'percent' => '20',
'speed' => '10',
'size' => '14648'
};
etc. Latter entries in the log file would overwrite earlier ones. Alternatively, you can choose to overwrite entries with lower percent completed with ones that have higher completion rates.
Using a hash would get rid of a lot of completely superfluous code working around the choice of the wrong data structure.
For what it's worth, here's my (slightly) alternative approach. Again, it doesn't preserve the original order:
my %most_progress;
for my $data ( sort { $b->{percent} <=> $a->{percent} } #$data ) {
next if exists $most_progress{$data->{user}.$data->{file}};
$most_progress{$data->{user}.$data->{file}} = $data;
}
my #clean_data = values %most_progress;
This will preserve order:
use strict;
use warnings;
my $data = [ ... ]; # As posted.
my %pct;
for my $i ( 0 .. $#{$data} ){
my $r = $data->[$i];
my $k = join '|', $r->{file}, $r->{user};
next if exists $pct{$k} and $pct{$k}[1] >= $r->{percent};
$pct{$k} = [$i, $r->{percent}];
}
#$data = #$data[sort map $_->[0], values %pct];
my %check;
for (my $i = 0; $i <= $#{$arrayref}; $i++) {
my $transfer = $arrayref->[$i];
# check the transfer for user and file
my $key = $transfer->{user} . $transfer->{file};
$check{$key} = { } if ( !exists $check{$key} );
if ( $transfer->{percent} <= $check{$key}->{percent} ) {
# undefine this less advanced transfer
$arrayref->[$i] = undef;
} else {
# remove the other transfer
$arrayref->[$check{$key}->{index}] = undef if exists $check{$key}->{index};
# set the new standard
$check{$key} = { index => $i, percent => $transfer->{percent} }
}
}
# remove all undefined transfers
$arrayref = [ grep { defined $_ } #$arrayref ];
Variation on the theme with Perl6::Gather
use Perl6::Gather;
my #cleaned = gather {
my %seen;
for (sort { $b->{percent} <=> $a->{percent} } #$data) {
take unless $seen{ $_->{user} . $_->{file} }++;
}
};
A rather odd experience. Using the latest PDK (v7.3) from ActiveState, I used perlctrl to build a COM DLL. Perlctrl ran without a hitch. OLEView read the typelib okay. RegSvr32 registered it okay. However ... there's no sign of it in registry, and anything that tries to use it fails. I hunted for the various UIDs using RegEdit and they're just not there.
The code is below. It's a wrapping of Lingua::ZH::WordSegmenter, but with the encoding changed to utf8 rather than gbk.
It's probably something obvious ...
package ZHWordSeg;
use strict;
use warnings;
use utf8;
use ws;
use Encode;
use constant STX => chr( 2 ); #[
use constant ETX => chr( 3 ); #]
use constant FS => chr( 28 ); #^
use constant RS => chr( 30 ); #~
use constant TAB_SEPARATOR => 0;
use constant CARET_SEPARATOR => 1;
use constant FS_SEPARATOR => 2;
use constant SPACE_SEPARATOR => 3;
use constant AS_ARRAY => 4;
use feature 'switch';
our $segmenter;
sub ZHWordSeg_Setup {
my $dic = shift;
my $dic_encoding = shift;
my $separator = shift;
my $verbose = shift;
$dic_encoding = 'utf8' unless defined( $dic_encoding );
$separator = " " unless defined( $separator );
$verbose = 0 unless defined( $verbose );
if ( defined( $dic ) ) {
$segmenter = ws->new( dic => $dic, dic_encoding => $dic_encoding, seperator => $separator, verbose => $verbose );
} else {
$segmenter = ws->new( dic_encoding => $dic_encoding, seperator => $separator, verbose => $verbose );
}
}
sub ZHWordSeg {
my $source = shift;
print STDERR $source;
my $sepcode = shift;
$source = encode("utf8",$source);
my $stringres = $segmenter->seg($source);
my #arrayres;
given ($sepcode) {
when (TAB_SEPARATOR) {
$stringres =~ tr/ /\t/;
return $stringres;
}
when (CARET_SEPARATOR) {
$stringres =~ tr/ /^/;
$stringres .= "^";
return $stringres;
}
when (FS_SEPARATOR) {
$stringres =~ s/ /FS/eg;
$stringres .= FS;
return $stringres;
}
when (SPACE_SEPARATOR) {
return $stringres;
}
default {
#arrayres = split( / /, $stringres );
return \#arrayres;
}
}
}
sub SetDictionary {
my ($source) = shift;
my $res = set_dic($source);
return $res;
}
1;
=pod
=begin PerlCtrl
%TypeLib = (
PackageName => 'ZHWordSeg',
DocString => 'Chinese word segmentation',
HelpContext => 1,
TypeLibGUID => '{F6C9BD66-7CA1-4610-B77F-E219A7122C18}', # do NOT edit this line
ControlGUID => '{45D47C6A-2B9A-4D62-9CFD-F18C95DC00C5}', # do NOT edit this line either
DispInterfaceIID=> '{007E4E7A-3B75-4DC3-864C-7746860941B3}', # or this one
ControlName => 'BOCWS',
ControlVer => 2, # increment if new object with same ProgID
# create new GUIDs as well
ProgID => 'ZHWordSeg.BOCWS',
LCID => 0,
DefaultMethod => 'ChineseWordSegmenter',
Methods => {
'ChineseWordSegmenter' => {
RetType => VT_VARIANT,
TotalParams => 2,
NumOptionalParams => 1,
ParamList =>
[ 'source' => VT_BSTR,
'sepcode' => VT_I4
]
},
'ChineseWordSegmenter_Setup' => {
RetType => VT_VARIANT,
TotalParams => 4,
NumOptionalParams => 4,
ParamList =>
[ 'dic' => VT_BSTR,
'dic_encoding' => VT_BSTR,
'separator' => VT_BSTR,
'verbose' => VT_BSTR
]
}
}, # end of 'Methods'
Properties => {
TAB_SEPARATOR => {
DocString => "Separate items with TAB (0x0)",
Type => VT_I4,
DispID => 3,
ReadOnly => 1,
},
CARET_SEPARATOR => {
DocString => "Separate items with ^ (0x1)",
Type => VT_I4,
DispID => 4,
ReadOnly => 1,
},
FS_SEPARATOR => {
DocString => "Separate items with ascii 28 (0x2)",
Type => VT_I4,
DispID => 5,
ReadOnly => 1,
},
SPACE_SEPARATOR => {
DocString => "Separate items with space (0x3)",
Type => VT_I4,
DispID => 6,
ReadOnly => 1,
},
AS_ARRAY => {
DocString => "Separate items as array (0x4)",
Type => VT_I4,
DispID => 7,
ReadOnly => 1,
}
}, # end of 'Properties'
); # end of %TypeLib
=end PerlCtrl
=cut
This is the .perlctrl file, in case it matters:
#!C:\Program Files\ActiveState Perl Dev Kit 7.3 Deployment\bin\lib\pai.exe
PAP-Version: 1.0
Packer: C:\Program Files\ActiveState Perl Dev Kit 7.3 Deployment\bin\perlctrl.exe
Script: ZHWordSeg.ctrl
Cwd: P:\BOCWS
Byref: 0
Clean: 0
Date: 2008-10-24 18:05:42
Debug: 127.0.0.1:2000
Dependent: 0
Dyndll: 0
Exe: BOCWS.dll
Force: 1
Gui: 0
Hostname: xi
No-Compress: 0
No-Gui: 0
No-Logo: 0
Runlib:
Shared: none
Singleton: 0
Tmpdir:
Verbose: 0
Version-Comments:
Version-CompanyName:
Version-FileDescription: Wrapper of Lingua::ZH::WordSegmenter.pm
Version-FileVersion: 1.0
Version-InternalName: ZHWordSeg
Version-LegalCopyright:
Version-LegalTrademarks:
Version-OriginalFilename: ZHWordSeg.ctrl
Version-ProductName: BOChineseWordSegmenter
Version-ProductVersion: 1.0
Warnings: 0
Xclude: 1
The only "solution" that I've found was suggested over on news:comp.os.ms-windows.programmer.win32
i am not a PDK user but from experience i can tell you, that you should check the DllRegister exports code and what it internally does, since this is what the regsvr32 calls and this code is creating the registry keys for your com server/proxy, etc, ...
You can track the registry for changes with sysinternal tools like regmon or procmon, just to make sure!
Ultimately, I gave up and went back to a previous version that worked and tweaked it.