Dealing with multiple capture groups in multiple records - perl

Data Format:
attribname: data
Data Example:
cheese: good
pizza: good
bagel: good
fire: bad
Code:
my $subFilter='(.+?): (.+)';
my #attrib = ($dataSet=~/$subFilter/g);
for (#attrib)
{
print "$_\n";
}
The code spits out:
cheese
good
pizza
good
[etc...]
I was wondering what an easy Perly way to do this is? I am parsing the data from a log the data above is trash for simplicity. I am newer to Perl, I suspect I could do this via fanangling indexes, but I was wondering if there is a short method of implementing this? Is there any way to have the capture groups put into two different variables instead of serially appended to the list along with all matches?
Edit: I want the attribute and it's associated value together so I can the do what I need to to them. For example if within my for loop I could access both the attribute name and attribute value.
Edit:
I tried
my %attribs;
while (my $line = <$data>)
{
my ($attrib, $value) = ($line=~m/$subFilter/);
print $attribs{$attrib}," : ", $value,"\n";
}
and no luck :( I don't get any output with this. My data is in a variable not a file, because it parsed out of a set of parent data which is in a file. It would be convenient if the my variable worked so that my (#attrib, #value) = ($line=~/$subFilter/g); filled the lists appropriately with the multiple matches.
Solution:
my #line = ($7 =~/(.+?)\n/g);
for (#line)
{
my ($attrib, $value) = ($_=~m/$subFilter/);
if ($attrib ne "")
{
print $attrib," : ", $value,"\n";
}
}

I'm not really clear on what you actually want to store, but here's how you could store the data in a hash table, with '1' indicating good and '0' indicating 'bad':
use strict;
use warnings;
use Data::Dumper;
my %foods;
while (my $line = <DATA>)
{
chomp $line;
my ($food, $good) = ($line =~ m/^(.+?): (.+)$/);
$foods{$food} = ($good eq 'good' ? 1 : 0);
}
print Dumper(\%foods);
__DATA__
cheese: good
pizza: good
bagel: good
fire: bad
This prints:
$VAR1 = {
'bagel' => 1,
'cheese' => 1,
'fire' => 0,
'pizza' => 1
};

A sensible approach would be to make use of the split function:
my %attrib;
open my $data, '<', 'fileName' or die "Unable to open file: $!";
while ( my $line = <$data> ) {
my ( $attrib, $value ) = split /:\s*/, $line, 2;
$attrib{$attrib} = $value;
}
close $data;
foreach my $attrib ( keys %attrib ) {
print "$attrib: $attrib{$attrib}\n";
}
If you're into one-liners, the following would achieve the same:
$ perl -F/:\s*/ -ane '$attrib{$F[0]} = $F[1]; } END { print $_,"\t",$attrib{$_},"\n" foreach keys %attrib;" fileName

Related

How to distinguish between "0" and NULL in perl?

Here we are looking for the string "reftext" in the given file. The line next to this contains a string with 3 integers. So we are extracting them in #all_num. We are printing the value of #all_num[2] only if is not NULL. But the logic used here doesn't print #all_num[2] even if it has 0.
#!/usr/bin/perl
open( READFILE, "<myfile.txt" );
#list = <READFILE>;
$total_lines = scalar #list;
for ( $count = 0; $count < $total_lines; $count++ ) {
if (#list[ $count =~ /reftext/ )
{
#all_num = #list[ $count + 1 ] =~ /(\d+)/g;
if ( #all_num[2] != NULL ) {
print "#all_num[2]\n";
}
}
}
Hope this helps,
use strict;
use warnings;
my #fvals = (
[ i => undef ],
[ j => 0 ],
[ k => "" ],
);
for my $r (#fvals) {
my ($k, $v) = #$r;
if (!defined($v)) { print "$k is undef\n"; }
elsif (!length($v)) { print "$k is empty string\n"; }
# elsif (!$v) { print "$k is zero\n"; }
# recognizes zero value in "0.0" or "0E0" notation
elsif ($v == 0) { print "$k is zero\n"; }
}
output
i is undef
j is zero
k is empty string
Perl does not include a NULL, so the line
if(#all_num[2]!= NULL)
is nonsensical in Perl. (More accurately, it attempts to locate a sub named NULL and run it to get the value to compare against #all_num[2], but fails to do so because you (presumably) haven't defined such a sub.) Note that, if you had enabled use strict, this would cause a fatal error instead of pretending to work. This is one of the many reasons to always use strict.
Side note: When you pull a value out of an array, it's only a single value, so you should say $all_num[2] rather than #all_num[2] when referring to the third element of the array #all_num. (Yes, this is a little confusing to get used to. I hear that it's been changed in Perl 6, but I'm assuming you're using Perl 5 here.) Note that, if you had enabled use warnings, it would have told you that "Scalar value #all_num[2] better written as $all_num[2]". This is one of the many reasons to always use warnings.
If you want to test whether $all_num[2] contains a value, the proper way to express that in Perl is
if (defined $all_num[2])
This is how your program would look using best practices
You should
Always use strict and use warnings, and declare all your variables with my
Use the three-parameter form of open
Check that open calls succeeded, and include $! in the die string if not
Use a while loop to process a file one line at a time, in preference to reading the entire file into memory
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'myfile.txt' or die $!;
while ( <$fh> ) {
next unless /reftext/;
my $next_line = <$fh>;
my #all_num = $next_line =~ /\d+/g;
print "$all_num[2]\n" if defined $all_num[2];
}
Try this:
#!/usr/bin/perl
use warnings;
use strict;
open(READFILE, "<", "myfile.txt") or die $!;
my #list = <READFILE>;
my $total_lines = scalar #list;
close (READFILE);
for(my $count=0; $count<$total_lines; $count++)
{
if($list[$count] =~ /reftext/)
{
my #all_num = $list[$count+1] =~ /(\d+)/g;
if($all_num[2] ne '')
{
print "$all_num[2]\n";
}
}
}
To check a variable is null or not:
if ($str ne '')
{
print $str;
}
or better:
my ($str);
$str = "";
if (defined($str))
{
print "defined";
}
else
{
print "not defined";
}
If the other answers do not work, try treating the variable as a string:
if ( $all_num[2] == 'null' && length($all_num[2]) == 4 ){
# null
} else {
# not null
}
As with any code you write, be sure to test your code.

How to show matching and Miss matching records of two text files in command prompt using Perl?

I'm using two text files sampleA.txt and sampleB.txt. I have two fields in each file and I need to compare first record(first row) of sampleA.txt with the first row of sampleB.txt and I want to show matching records as well as miss matching records in command prompt.I need to do that in Perl.
Using the below script I'm getting one output but it is wrong. I need to populate both matching as well as mismatching. How to do that?
sampleA.txt:
1|X
2|A
4|Z
5|A
sampleB.txt:
2|A
2|X
3|B
4|C
Output I'm getting:
2|A
2|X
4|C
Outputs I want:
Matching-Output:
2|A
Miss-matching-Output:
1|X
4|Z
5|A
3|B
4|C
Perl Script:
#!/usr/bin/perl
use strict;
use warnings;
open(FILE1,'C:\Users\sathiya.kumar\Desktop\sampleA.txt') || die $!;
open(FILE2,'C:\Users\sathiya.kumar\Desktop\sampleB.txt') || die $!;
my $interline;
while (my $line= <FILE1>) {
my #fields = split('\|',$line);
parser($fields[0]);
}
sub parser {
my $mergeid = shift;
while (defined $interline || ($interline= <FILE2>)) {
my #fields = split('\|',$interline);
my $key = $fields[0];
if ($key lt $mergeid) {
# Skip non-matching records
$interline = undef;
next;
} elsif ($key gt $mergeid) {
# wait for next key
last;
} else {
print $interline;
$interline = undef;
}
}
}
close(FILE1);
close(FILE2);
Let me know if you need more information.
You left out 2|X:
use strict;
use warnings;
use 5.016;
use Data::Dumper;
#Create a set from the entries in sampleA.txt:
my $fname = 'sampleA.txt';
open my $A_INFILE, '<', $fname
or die "Couldn't open $fname: $!";
my %a;
while (my $line = <$A_INFILE>) {
chomp $line;
$a{$line} = undef;
}
close $A_INFILE;
say Dumper(\%a);
#Create a set from the entries in sampleB.txt:
$fname = 'sampleB.txt';
open my $B_INFILE, '<', $fname
or die "Couldn't open $fname: $!";
my %b;
while (my $line = <$B_INFILE>) {
chomp $line;
$b{$line} = undef;
}
close $B_INFILE;
say Dumper(\%b);
#Divide the entries in both files into matches and mismatches:
my (#matches, #mismatches);
for my $a_val (keys %a) {
if (exists $b{$a_val}) {
push #matches, $a_val;
}
else {
push #mismatches, $a_val;
}
}
for my $b_val (keys %b) {
if (not exists $a{$b_val}) {
push #mismatches, $b_val;
}
}
say Dumper(\#matches);
say Dumper(\#mismatches);
--output:--
$VAR1 = {
'5|A' => undef,
'4|Z' => undef,
'1|X' => undef,
'2|A' => undef
};
$VAR1 = {
'2|X' => undef,
'3|B' => undef,
'4|C' => undef,
'2|A' => undef
};
$VAR1 = [
'2|A'
];
$VAR1 = [
'5|A',
'4|Z',
'1|X',
'2|X',
'3|B',
'4|C'
];
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16" , which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. If a tied hash is evaluated in scalar context, the SCALAR method is called (with a fallback to FIRSTKEY ).
http://perldoc.perl.org/perldata.html

functional Perl: Filter, Iterator

I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
I want a function that I pass an iterator over the lines of the files and that returns a map from keys to list of values. But to be functional and structured I'd like to:
use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
The mentioned filter(s) should be defined outside of the function for reusability by other functions.
use another function that is given the line and returns a tuple of key and values string
use another function that breaks the comma separated values into a list of values.
What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.
For reference, here is (a quick hack) how I might do it in Python:
re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: not ("" == line or re_is_comment_line.match(line))
and re_key_values.match(line)
def parse(lines):
interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
return dict(splitted_values)
A direct translation of your Python would be
my $re_is_comment_line = qr/^\s*#/;
my $re_key_values = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter = qr/\s*,\s*/;
my $is_interesting_line= sub {
my $_ = shift;
length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};
sub parse {
my #lines = #_;
my #interesting_lines = grep $is_interesting_line->($_), #lines;
my #key_values = map [/$re_key_values/], #interesting_lines;
my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } #key_values;
return %splitted_values;
}
Differences are:
ifilter is called grep, and can take an expression instead of a block as first argument. These are roughly equivalent to a lambda. The current item is given in the $_ variable. The same applies to map.
Perl doesn't emphazise laziness, and seldomly uses iterators. There are instances where this is required, but usually the whole list is evaluated at once.
In the next example, the following will be added:
Regexes don't have to be precompiled, Perl is very good with regex optimizations.
Instead of extracting key/values with regexes, we use split. It takes an optional third argument that limits the number of resulting fragments.
The whole map/filter stuff can be written in one expression. This doesn't make it more efficient, but emphazises the flow of data. Read the map-map-grep from bottom upwards (actually right to left, think of APL).
.
sub parse {
my %splitted_values =
map { $_->[0], [split /\s*,\s*/, $_->[1]] }
map {[split /\s*=\s*/, $_, 2]}
grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
#_;
return \%splitted_values; # returning a reference improves efficiency
}
But I think a more elegant solution here is to use a traditional loop:
sub parse {
my %splitted_values;
LINE: for (#_) {
next LINE if !length or /^\s*#/;
s/\A\s*|\s*\z//g; # Trimming the string—omitted in previous examples
my ($key, $vals) = split /\s*=\s*/, $_, 2;
defined $vals or next LINE; # check if $vals was assigned
#{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # Automatically create array in $splitted_values{$key}
}
return \%splitted_values
}
If we decide to pass a filehandle instead, the loop would be replaced with
my $fh = shift;
LOOP: while (<$fh>) {
chomp;
...;
}
which would use an actual iterator.
You could now go and add function parameters, but do this only iff you are optimizing for flexibility and nothing else. I already used a code reference in the first example. You can invoke them with the $code->(#args) syntax.
use Carp; # Error handling for writing APIs
sub parse {
my $args = shift;
my $interesting = $args->{interesting} or croak qq("interesting" callback required);
my $kv_splitter = $args->{kv_splitter} or croak qq("kv_splitter" callback required);
my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default
my %splitted_values;
LINE: for (#_) {
next LINE unless $interesting->($_);
s/\A\s*|\s*\z//g;
my ($key, $vals) = $kv_splitter->($_);
defined $vals or next LINE;
$splitted_values{$key} = $val_transform->($vals);
}
return \%splitted_values;
}
This could then be called like
my $data = parse {
interesting => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
kv_splitter => sub { split /\s*=\s*/, $_[0], 2 },
val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, #lines;
I think the most modern approach consists in taking advantage of the CPAN modules. In your example, Config::Properties may helps:
use strict;
use warnings;
use Config::Properties;
my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
As indicated in the posts linked to by #collapsar, Higher-Order Perl is a great read for exploring functional techniques in Perl.
Here is an example that hits your bullet points:
use strict;
use warnings;
use Data::Dumper;
my #filt_rx = ( qr{^\s*\#},
qr{^[\r\n]+$} );
my $kv_rx = qr{^\s*(\w+)\s*=\s*([^\r\n]*)};
my $spl_rx = qr{\s*,\s*};
my $iterator = sub {
my ($fh) = #_;
return sub {
my $line = readline($fh);
return $line;
};
};
my $filter = sub {
my ($it,#r) = #_;
return sub {
my $line;
do {
$line = $it->();
} while ( defined $line
&& grep { $line =~ m/$_/} #r );
return $line;
};
};
my $kv = sub {
my ($line,$rx) = #_;
return ($line =~ m/$rx/);
};
my $spl = sub {
my ($values,$rx) = #_;
return split $rx, $values;
};
my $it = $iterator->( \*DATA );
my $f = $filter->($it,#filt_rx);
my %map;
while ( my $line = $f->() ) {
my ($k,$v) = $kv->($line,$kv_rx);
$map{$k} = [ $spl->($v,$spl_rx) ];
}
print Dumper \%map;
__DATA__
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
It produces the following hash on the provided input:
$VAR1 = {
'key2' => [
'value1',
'value2',
'value3'
],
'key1' => [
'value'
]
};
you might be interested in this SO question as well as this one.
the following code is a self-contained perl script destined to give you an idea of how to implement in perl (only partially in a functional style; in case you don't revulse seeing the particular coding style and/or language construct, i can refine the solution somewhat).
Miguel Prz is right that in most cases you'd search CPAN for solutions to match your requirements.
my (
$is_interesting_line
, $re_is_comment_line
, $re_key_values
, $re_splitter
);
$re_is_comment_line = qr(^\s*#);
$re_key_values = qr(^\s*(\w+)\s*=\s*(.*)$);
$re_splitter = qr(\s*,\s*);
$is_interesting_line = sub {
my $line = shift;
return (
(!(
!defined($line)
|| ($line eq '')
))
&& ($line =~ /$re_key_values/)
);
};
sub strip {
my $line = shift;
# your implementation goes here
return $line;
}
sub parse {
my #lines = #_;
#
my (
$dict
, $interesting_lines
, $k
, $v
);
#
#$interesting_lines =
grep {
&{$is_interesting_line} ( $_ );
} ( map { strip($_); } #lines )
;
$dict = {};
map {
if ($_ =~ /$re_key_values/) {
($k, $v) = ($1, [split(/$re_splitter/, $2)]);
$$dict{$k} = $v;
}
} #$interesting_lines;
return $dict;
} # parse
#
# sample execution goes here
#
my $parse =<<EOL;
# comment
what = is, this, you, wonder
it = is, perl
EOL
parse ( split (/[\r\n]+/, $parse) );

Perl Creating hash reference and looping through one element from each branch at a time

As a beginner I have what I think is a rather complicated problem I am hoping someone could help with.
I have the following text file (tab delminated)...
FILE1.txt
Dog Big
Dog Medium
Dog Small
Rabbit Huge
Rabbit Tiny
Rabbit Middle
Donkey Massive
Donkey Little
Donkey Gigantic
I need to read FILE1.txt into a hash reference to get something like the following... (using Data::Dumper)
$VAR1 = {
'Dog' => {
'Big',
'Medium',
'Small'
},
'Rabbit => {
'Huge',
'Tiny',
'Middle'
},
'Donkey => {
'Massive',
'Little',
'Gigantic'
},
};
The problem I am having:
I then need to loop through each branch of the hash reference one at a time, I will use the value from the hash reference to check if this matches my keyword, if so it will then return it's corresponding key.... for example...
What I need it to do:
my $keyword == "Little";
Dog->Big
if 'Big' matches my keyword then return $found = Dog
else go to the next branch
Rabbit->Huge
if 'Huge' matches my keyword then return $found = Rabbit
else go to the next branch
Donkey->Massive
if 'Massive' matches my keyword then return $found = Donkey
else go to the next branch (which is Dog again, but the second element this time)
Dog->Medium
if 'Medium' matches my keyword then return $found = Dog
else go to the next branch
Rabbit->Tiny
if 'Tiny' matches my keyword then return $found = Rabbit
else go the the next branch
Donkey->Little
if 'Little' matches my keyword then return $found = Donkey
..... and so on until the keyword is found or we reach the end of the hash reference
This is the kind of thing I am trying to achieve but don't know how to go about doing this, or whether a hash reference is the best way to do this, or if it can even be done with a hash/hash reference?
your help with this is much appreciated, thanks
Choosing proper data structure is often key step to the solution, but first of all you should define what you are trying achieve. What is overall goal? For example I have this data file and in mine application/program I need frequently ask for this information. It is crucial to ask proper question because for example if you don't need ask frequently for keyword it doesn't make sense creating hash at all.
perl -anE'say $F[0] if $F[1] eq "Little"' FILE1.txt
Yes it is that simple. Look in perlrun manpage for switches and what they mean and how to do same thing in bigger application.
If you need frequently ask for this question you should arrange your data in way which helps you and not in way you have to battle with.
use strict;
use warnings;
use feature qw(say);
use autodie;
open my $f, '<', 'FILE1.txt';
my %h;
while(<$f>) {
chomp;
my ($animal, $keyword) = split' ';
$h{$keyword} = $animal unless exists $h{$keyword};
}
close $f;
for my $keyword (qw(Little Awkward Small Tiny)) {
say $h{$keyword} ? "$keyword $h{$keyword}" : "keyword $keyword not found";
}
But if you still insist you want to traverse hash you can do it but you has been warned.
open my $f, '<', 'FILE1.txt';
my %h;
while (<$f>) {
chomp;
my ( $animal, $keyword ) = split ' ';
push #{ $h{$animal} }, $keyword;
}
close $f;
KEYWORD:
for my $keyword (qw(Little Awkward Small Tiny)) {
for my $animal (keys %h) {
for my $k (#{$h{$animal}}) {
if($k eq $keyword) {
say "$keyword $animal";
next KEYWORD;
}
}
}
say "keyword $keyword not found";
}
To critique my own answer: the structuring of the part that does the search could be better. And maybe it is pointless even using an ordered hash as the search is through a linear list. Maybe it should be an array of arrays
use strict;
use warnings;
use Tie::IxHash;
#open file
open(my $fh,"ani.txt") ||die $!;
#make an ordered hash
tie my %sizes, 'Tie::IxHash';
#read file into hash of arrays
while(<$fh>) {
(my $animal,my $size)=split(/\s+/);
if (!exists($sizes{$animal})) {
$sizes{$animal} = [$size];
} else {
push #{$sizes{$animal}},$size;
}
}
my $keyword="Little";
my $running=1;
my $depth=0;
while( $running ) {
$running = 0;
for my $search (keys %sizes) {
next if ($depth > #{$sizes{$search}});
$running = 1;
if ($keyword eq $sizes{$search}[$depth]) {
print "FOUND!!!!!! $search $depth";
exit(0);
}
}
$depth++;
}
Here is another version of a solution to the stated problem. To solve the actual problem given there is no need to store anything except the first "size" key for each animal in a hash
This hash can then be trivally used to look up the animal
use strict;
use warnings;
open(my $fh,"ani.txt") ||die $!;
my %animals;
#read file into hash
while(<$fh>) {
(my $animal,my $size)=split(/\s+/);
#only add the animal the first time the size is found
if (!exists($animals{$size})) {
$animals{$size} = $animal;
}
}
my $keyword="Little";
print "animal is ", $animals{$keyword};

Push into end of hash in Perl

So what I am trying to do with the following code is push a string, let's say "this string" onto the end of each key in a hash. I'm completely stumped on how to do this. Here's my code:
use warnings;
use strict;
use File::Find;
my #name;
my $filename;
my $line;
my #severity = ();
my #files;
my #info = ();
my $key;
my %hoa;
my $xmlfile;
my $comment;
my #comments;
open( OUTPUT, "> $ARGV[0]" );
my $dir = 'c:/programs/TEST/Test';
while ( defined( $input = glob( $dir . "\\*.txt" ) ) ) {
open( INPUT, "< $input" );
while (<INPUT>) {
chomp;
if (/File/) {
my #line = split /:/;
$key = $line[1];
push #{ $hoa{$key} }, "Filename\n";
}
if ( /XML/ ... /File/ ) {
$xmlfile = $1;
push #{ $hoa{$key} }, "XML file is $xmlfile\n";
}
if (/Important/) {
push #{ $hoa{$key} }, "Severity is $_\n";
}
if (/^\D/) {
next if /Important/;
push #{ $hoa{$key} }, "Given comment is $_\n";
}
push #{ $hoa{$key} }, "this string\n";
}
}
foreach my $k ( keys %hoa ) {
my #list = #{ $hoa{$k} };
foreach my $l (#list) {
print OUTPUT $l, "\n";
}
}
}
close INPUT;
close OUTPUT;
Where I have "this string" is where I was trying to push that string onto the end of the array. However, what ended up happening was that it ended up printing "this string" three times, and not at the end of every key like I wanted. When I tried to put it outside the while() loop, it said that the value of $key was not initialized. So please, any help? And if you need any clarification on what I'm asking, just let me know. Thank you!
No offence, but there are so many issues in this code I don't even know where to start...
First, the 'initialization block' (all these my $something; my #somethings lines at the beginning of this script) is not required in Perl. In fact, it's not just 'redundant' - it's actually confusing: I had to move my focus back and forth every time I encountered a new variable just to check its type. Besides, even with all this $input var is still not declared as local; it's either missing in comments, or the code given has omissions.
Second, why do you declare your intention to use File::Find (good) - but then do not use it at all? It could greatly simplify all this while(glob) { while(<FH>) { ... } } routine.
Third, I'm not sure why you assign something to $key only when the line read is matched by /File/ - but then use its value as a key in all the other cases. Is this an attempt to read the file organized in sections? Then it can be done a bit more simple, either by slurp/splitting or localizing $/ variable...
Anyway, the point is that if the first line of the file scanned is not matched by /File/, the previous (i.e., from the previous file!) value is used - and I'm not quite sure that it's intended. And if the very first line of the first file is not /File/-matched, then an empty string is used as a key - again, it smells like a bug...
Could you please describe your task in more details? Give some test input/output results, perhaps... It'd be great to proceed in short tasks, organizing your code in process.
Your program is ill-conceived and breaks a lot of good practice rules. Rather than enumerate them all, here is an equivalent program with a better structure.
I wonder if you are aware that all of the if statements will be tested and possibly executed? Perhaps you need to make use of elsif?
Aside from the possibility that $key is undefined when it is used, you are also setting $xmlfile to $1 which will never be defined as there are no captures in any of your regular expressions.
It is impossible to tell from your code what you are trying to do, so we can help you only if you show us your output, input and say how to derive one from the other.
use strict;
use warnings;
use File::Find;
my ($outfile) = #ARGV;
my $dir = 'c:/programs/TEST/Test';
my %hoa;
my $key;
while (my $input = glob "$dir/*.txt") {
open my $in, '<', $input or die $!;
while (<$in>) {
chomp;
if (/File/) {
my $key = (split /:/)[1];
push #{ $hoa{$key} }, "Filename\n";
}
if (/XML/ ... /File/) {
my $xmlfile = $1;
push #{ $hoa{$key} }, "XML file is $xmlfile\n";
}
if (/Important/) {
push #{ $hoa{$key} }, "Severity is $_\n";
}
if (/^\D/) {
next if /Important/;
push #{ $hoa{$key} }, "Given comment is $_\n";
}
push #{ $hoa{$key} }, "this string\n";
}
close $in;
}
open my $out, '>', $outfile or die $!;
foreach my $k (keys %hoa) {
foreach my $l (#{ $hoa{$k} }) {
print $out $l, "\n";
}
}
close $out;
I suspect based on your code, that the line where $key is set is not called each time through the loop, and that you do not trigger any of the other if statements.
This would append "this string" to the end of the array. Based on that you are getting 3 of the "this strings" at the end of the array, I would suspect that two lines do not go through the if (/FILE/) or any of the other if statements. This would leave the $key value the same and at the end, you would append "this string" to the array, using whatever the last value of $key was when it was set.
This will append the string "this string" to every element of the hash %hoa, which elements are array refs:
for (values(%hoa)) { push #{$_}, "this string"; }
Put that outside your while loop, and you'll print "this string" at the end of each element of %hoa.
It will autovivify array refs where it finds undefined elements. It will also choke if it cannot dereference an element as an array, and will manipulate arrays by symbolic reference if it finds a simple scalar and is not running under strict:
my %autoviv = ( a => ['foo'], b => undef );
push #$_, "PUSH" for values %autoviv; # ( a => ['foo', 'PUSH'], b => ['PUSH'] )
my %fatal = ( a => {} );
push #$_, "PUSH" for values %fatal; # FATAL: "Not an ARRAY reference at..."
my %dangerous = (a => "foo");
push #$_, "PUSH" for values %dangerous; # Yikes! #foo is now ("PUSH")
use strict;
my %kablam = (a => "foo");
push #$_, "PUSH" for values %kablam; # "Can't use string ("foo") as an ARRAY ref ..."
As I understand it, traverse the hash with a map command to modify its keys. An example:
EDIT: I've edited because I realised that the map command can be assigned to the same hash. No need to create a new one.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = qw|
key1 value1
key2 value2
key3 value3
|;
my %hash = map { $_ . "this string" => $hash{ $_ } } keys %hash;
print Dump \%hash;
Run it like:
perl script.pl
With following output:
$VAR1 = {
'key3this string' => 'value3',
'key2this string' => 'value2',
'key1this string' => 'value1'
};