Unable to understand the behaviour of Perl hash ordering - perl

I am a beginner in Perl and I am trying to run this example from "Beginning Perl" by Curtis Poe:
#!/perl/bin/perl
use strict;
use warnings;
use diagnostics;
my $hero = 'Ovid';
my $fool = $hero;
print "$hero isn't that much of a hero. $fool is a fool.\n";
$hero = 'anybody else';
print "$hero is probably more of a hero than $fool.\n";
my %snacks = (
    stinky => 'limburger',
    yummy => 'brie',
    surprise => 'soap slices',
);
my @cheese_tray = values %snacks;
print "Our cheese tray will have: ";
for my $cheese (@cheese_tray) {
    print "'$cheese' ";
}
print "\n";
Output of the above code when I ran it on my Windows 7 system with ActivePerl and on codepad.org:
Ovid isn't that much of a hero. Ovid is a fool.
anybody else is probably more of a hero than Ovid.
Our cheese tray will have: 'limburger''soap slices''brie'
I don't understand the third line: it prints 'limburger', 'soap slices', 'brie', but the hash was defined in the order 'limburger', 'brie', 'soap slices'.
Please help me understand.

Hashes are not ordered. If you want a specific order, you need to use an array.
For example:
my @desc = qw(stinky yummy surprise);
my @type = ("limburger", "brie", "soap slices");
my %snacks;
@snacks{@desc} = @type;
Now you have the types, in order, in @type.
You can of course also use sort to get the keys in a predictable order:
my @sorted_desc = sort keys %snacks;
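A minimal sketch of the idea above, assuming the snack names from the question. The array @order is a name chosen here for illustration: it records the intended order, and the hash is used only for lookup.

```perl
use strict;
use warnings;

# Keys in the order we want to see them (@order is an illustrative name).
my @order = qw(stinky yummy surprise);

my %snacks = (
    stinky   => 'limburger',
    yummy    => 'brie',
    surprise => 'soap slices',
);

# A hash slice returns the values in the order of the keys we pass in,
# not in the hash's internal order.
my @cheese_tray = @snacks{@order};

print "Our cheese tray will have: ", join(', ', map { "'$_'" } @cheese_tray), "\n";
```

Because the order lives in the array, adding or removing hash entries elsewhere does not change the order of the tray.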

perldoc perldata:
Hashes are unordered collections of scalar values indexed by their
associated string key.
You can sort keys or values as needed.

I think the key is:
my @cheese_tray = values %snacks
From perldoc -f values (http://perldoc.perl.org/functions/values.html):
"Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash."

Related

Perl searching for string contained in array

I have an array with the following values:
push @fruitArray, "apple|0";
push @fruitArray, "apple|1";
push @fruitArray, "pear|0";
push @fruitArray, "pear|0";
I want to find out if the string "apple" exists in this array (ignoring the "|0" "|1").
I am using:
$fruit = 'apple';
if( $fruit ~~ @fruitArray ){ print "I found apple"; }
Which isn't working.
Don't use smart matching. It never worked properly for a number of reasons and it is now marked as experimental.
In this case you can use grep instead, together with an appropriate regex pattern.
This program tests every element of @fruitArray to see if it starts with the letters in $fruit followed by a pipe character |. In scalar (here, boolean) context grep returns the number of elements that matched the pattern, which is a true value if at least one matched.
my @fruitArray = qw/ apple|0 apple|1 pear|0 pear|0 /;
my $fruit = 'apple';
print "I found $fruit\n" if grep /^$fruit\|/, @fruitArray;
output
I found apple
I - like @Borodin suggests, too - would simply use grep():
$fruit = 'apple';
if (grep(/^\Q$fruit\E\|/, @fruitArray)) { print "I found apple"; }
which outputs:
I found apple
\Q...\E escapes any regex metacharacters in your string, so it is matched literally.
Looking for the | prevents finding a fruit whose name starts with the name of the fruit for which you are looking.
Simple and effective... :-)
Update: to remove elements from the array:
$fruit = 'apple';
@fruitsArrayWithoutApples = grep { !/^\Q$fruit\E\|/ } @fruitArray;
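As a hedged illustration of why the anchor, \Q...\E and the escaped pipe all matter, here is a small sketch. The extra entries 'pineapple|0' and 'apple pie|2' are made up for the example and are not from the question.

```perl
use strict;
use warnings;

my @fruitArray = ('apple|0', 'apple|1', 'pear|0', 'pineapple|0', 'apple pie|2');
my $fruit = 'apple';

# \Q...\E escapes any regex metacharacters in $fruit; \| is a literal pipe.
my $pattern = qr/^\Q$fruit\E\|/;

# Scalar context: grep returns the number of matching elements.
my $count = grep { $_ =~ $pattern } @fruitArray;

# List context: grep returns the elements for which the block is true.
my @without_apples = grep { $_ !~ $pattern } @fruitArray;

print "Found $count entries for $fruit\n";
print "Remaining: @without_apples\n";
```

Only the two entries that are exactly "apple" followed by a pipe are counted. If the pipe were left unescaped, the pattern would contain an empty alternative and match every element.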
If your Perl is not ancient, you can use the first subroutine from the List::Util module (which became a core module at Perl 5.8) to do the check efficiently:
use List::Util qw{ first };
my $first_fruit = first { /\Q$fruit\E/ } @fruitArray;
if ( defined $first_fruit ) { print "I found $fruit\n"; }
Don't use grep: it loops over the entire array even if it finds what you are looking for at the first index, so it is inefficient.
This will return true as soon as it finds the substring 'apple', without iterating through the rest of the array:
# takes a reference to the array as the first parameter
sub find_apple {
    my @array_input = @{ $_[0] };
    foreach my $fruit (@array_input) {
        if (index($fruit, 'apple') != -1) {
            return 1;
        }
    }
    return 0; # no element contained 'apple'
}
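The same early-exit behaviour can be sketched with any from List::Util, assuming List::Util 1.33 or later (bundled with Perl 5.20 and newer). The helper name has_fruit is hypothetical and not from any of the answers above.

```perl
use strict;
use warnings;
use List::Util qw(any);

# has_fruit is a hypothetical helper: it takes the fruit name and an array reference.
sub has_fruit {
    my ($fruit, $items) = @_;
    # any() stops at the first element for which the block is true.
    return any { /^\Q$fruit\E\|/ } @$items;
}

my @fruitArray = qw( apple|0 apple|1 pear|0 pear|0 );
print "I found apple\n" if has_fruit('apple', \@fruitArray);
print "No plum here\n"  unless has_fruit('plum', \@fruitArray);
```

Unlike the index() version, this matches only a whole name followed by a pipe, so 'app' is not reported as found.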
You can get close to the smartmatch sun without melting your wings by using match::simple:
use match::simple;
my @fruits = qw/apple|0 apple|1 pear|0 pear|0/;
$fruit = qr/apple/ ;
say "found $fruit" if $fruit |M| \@fruits ;
There's also a match() function if the infix |M| doesn't read well.
I like the way match::simple does almost everything I expected from ~~ without any surprising complexity. If you're fluent in perl it probably isn't something you'd see as necessary, but - especially with match() - code can be made pleasantly readable ... at the cost of imposing the use of references, etc.

Perl: simple foreach on hash hands mixed results? [duplicate]

I am using ActivePerl 5.8.
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
# declare a new hash
my %some_hash;
%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
"wilma", 1.72e30, "betty", "bye\n");
my @any_array;
@any_array = %some_hash;
print %some_hash;
print "\n";
print @any_array;
print "\n";
print $any_array[0];
print "\n";
print $any_array[1];
print "\n";
print $any_array[2];
print "\n";
print $any_array[3];
print "\n";
print $any_array[4];
print "\n";
print $any_array[5];
print "\n";
print $any_array[6];
print "\n";
print $any_array[7];
print "\n";
print $any_array[8];
print "\n";
print $any_array[9];
The output is:
D:\learning\perl>test.pl
bettybye
bar12.4wilma1.72e+030foo352.5hello
bettybye
bar12.4wilma1.72e+030foo352.5hello
betty
bye
bar
12.4
wilma
1.72e+030
foo
35
2.5
hello
D:\learning\perl>
What decided the print order of the elements in my sample code?
Is there any rule to follow when printing a hash with mixed (string and number) keys and values in Perl? Thank you.
bar12.4wilma1.72e+030foo352.5hello
[Updated]
With your help, I updated the code as below.
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
# declare a new hash
my %some_hash;
%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
"wilma", 1.72e30, "betty", "bye");
my @any_array;
@any_array = %some_hash;
print %some_hash;
print "\n";
print "\n";
print @any_array;
print "\n";
print "\n";
my @keys;
@keys = keys %some_hash;
for my $k (sort @keys)
{
print $k, $some_hash{$k};
}
output
D:\learning\perl>test.pl
bettybyebar12.4wilma1.72e+030foo352.5hello
bettybyebar12.4wilma1.72e+030foo352.5hello
2.5hellobar12.4bettybyefoo35wilma1.72e+030
D:\learning\perl>
Finally, after calling the keys and sort functions, the hash keys are printed in sorted order, as below:
2.5hellobar12.4bettybyefoo35wilma1.72e+030
Elements of a hash are printed out in their internal order, which cannot be relied upon and will change as elements are added and removed. If you need all of the elements of a hash in some sort of order, sort the keys, and use that list to index the hash.
If you are looking for a structure that holds its elements in order, either use an array, or use one of the ordered hash modules on CPAN.
The only ordering you can rely upon from a list-context hash expansion is that key => value pairs will be together.
From perldoc -f keys:
The keys of a hash are returned in an apparently random order. The actual random order is subject to change in future versions of Perl, but it is guaranteed to be the same order as either the values or each function produces (given that the hash has not been modified). Since Perl 5.8.1 the ordering is different even between different runs of Perl for security reasons (see Algorithmic Complexity Attacks in perlsec).
...
Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order.
Also note that while the order of the hash elements might be randomised, this "pseudoordering" should not be used for applications like shuffling a list randomly (use List::Util::shuffle() for that, see List::Util, a standard core module since Perl 5.8.0; or the CPAN module Algorithm::Numerical::Shuffle), or for generating permutations (use e.g. the CPAN modules Algorithm::Permute or Algorithm::FastPermute), or for any cryptographic applications.
Note: since you are evaluating a hash in list context, you are at least guaranteed that each key is followed by its corresponding value; e.g. you will never see an output of a 4 b 3 c 2 d 1.
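The guarantee quoted above (keys and values come back in the same order as long as the hash is not modified in between) can be checked with a small sketch. The hash contents are borrowed from the question; the variable names @k, @v and $aligned are illustrative.

```perl
use strict;
use warnings;

my %some_hash = (foo => 35, bar => 12.4, wilma => 1.72e30, betty => 'bye');

# keys and values walk the hash in the same internal order
# as long as the hash is not modified in between.
my @k = keys %some_hash;
my @v = values %some_hash;

my $aligned = 1;
for my $i (0 .. $#k) {
    $aligned = 0 unless $some_hash{ $k[$i] } eq $v[$i];
}
print $aligned ? "keys and values line up\n" : "order mismatch\n";

# The only order you control is one you impose yourself, e.g. by sorting.
my @sorted = sort keys %some_hash;
print "@sorted\n";
```

The order of @k itself may differ from run to run; only the pairing with @v and the explicitly sorted list are stable.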
I went over your code and made some notes that I think you will find helpful.
use strict;
use warnings;
# declare a new hash and initialize it at the same time
my %some_hash = (
foo => 35, # use the fat-comma or '=>' operator, it quotes the left side
bar => 12.4,
2.5 => "hello",
wilma => 1.72e30,
betty => "bye", # perl ignores trailing commas,
# the final comma makes adding items to the end of the list less bug prone.
);
my @any_array = %some_hash; # Hash is expanded into a list of key/value pairs.
print "$_ => $some_hash{$_}\n"
for keys %some_hash;
print "\n\n", # You can print multiple newlines in one string.
"@any_array\n\n"; # print takes a list of things to print.
# In print @foo; @foo is expanded into a list of items to print.
# There is no separator between the members of @foo in the output.
# However print "@foo"; interpolates @foo into a string.
# It inserts spaces between the members of the arrays.
# This is the block form of 'for'
for my $k (sort keys %some_hash)
{
# Interpolating the variables into a string makes it easier to read the output.
print "$k => $some_hash{$k}\n";
}
Hashes provide unordered access to data by a string key.
Arrays provide access to ordered data. Random access is available by using a numerical index.
If you need to preserve the order of a group of values, use an array. If you need to look up members of the group by an associated name, use a hash.
If you need to do both, you can use both structures together:
# Keep an array of sorted hash keys.
my @sorted_items = qw( first second third fourth );
# Store the actual data in the hash.
my %item;
@item{ @sorted_items } = 1..4; # This is called a hash slice.
# It allows you to access a list of hash elements.
# This can be a very powerful way to work with hashes.
# random access
print "third => $item{third}\n";
# When you need to access the data in order, iterate over
# the array of sorted hash keys. Use the keys to access the
# data in the hash.
# ordered access
for my $name ( @sorted_items ) {
print "$name => $item{$name}\n";
}
Looking at your code samples, I see a couple of things you might want to work on:
- how looping structures like for and while can be used to reduce repeated code
- how to use variable interpolation
BTW, I am glad to see you working on basics and improving your code quality. This investment of time will pay off. Keep up the good work.
The elements are (almost certainly) printed out in the order they appear (internally) in the hash table itself -- i.e. based on the hash values of their keys.
The general rule to follow is to use something other than a hash table if you care much about the order.
Hashes are not (necessarily) retrieved in a sorted manner. If you want them sorted, you have to do it yourself:
use strict;
use warnings;
my %hash = ("a" => 1, "b" => 2, "c" => 3, "d" => 4);
for my $i (sort keys %hash) {
print "$i -> $hash{$i}\n";
}
You retrieve all the keys from a hash by using keys and you then sort them using sort. Yeah, I know, that crazy Larry Wall guy, who would've ever thought of calling them that? :-)
This outputs:
a -> 1
b -> 2
c -> 3
d -> 4
For most practical purposes, the order in which a hash table (not just Perl hash variables, but hash tables in general) returns its elements can be considered random.
In reality, depending on the hashing implementation, the order may actually be deterministic. (i.e., If you run the program multiple times putting the same items into the hash table in the same order each time, they'll be stored in the same order each time.) I know that Perl hashes used to have this characteristic, but I'm not sure about current versions. In any case, hash key order is not a reliable source of randomness to use in cases where randomness is desirable.
Short version, then:
Don't use a hash if you care about the order (or lack of order). If you want a fixed order, it will be effectively random and if you want a random order, it will be effectively fixed.
A hash defines no ordering properties. The order in which things come out will be unpredictable.
And if you are crazy and have no duplicate values in your hash, and you need the values sorted, you can call reverse on it.
my %hash = ("a" => 1, "b" => 2, "c" => 3, "d" => 4);
my %reverse_hash = reverse %hash;
print $_ for sort keys %reverse_hash;
The caveat is the unique-values part: if values are duplicated, later pairs overwrite earlier ones and only one key per value survives.
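If the values are not unique, one alternative to reversing the hash is to sort the keys by their values. This is a sketch with made-up data (note the duplicate value 1); the tie-break on key name is a choice made here so that the result is deterministic.

```perl
use strict;
use warnings;

my %hash = (a => 3, b => 1, c => 2, d => 1);

# Sort the keys by their numeric value; ties are broken by key name
# so the result is fully deterministic.
my @by_value = sort { $hash{$a} <=> $hash{$b} or $a cmp $b } keys %hash;

print "$_ -> $hash{$_}\n" for @by_value;
```

No entries are lost this way, because the hash itself is never inverted.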

How do I create a hash of hashes in Perl?

Based on my current understanding of hashes in Perl, I would expect this code to print "hello world." It instead prints nothing.
%a=();
%b=();
$b{str} = "hello";
$a{1}=%b;
$b=();
$b{str} = "world";
$a{2}=%b;
print "$a{1}{str} $a{2}{str}";
I assume that a hash is just like an array, so why can't I make a hash contain another?
You should always use "use strict;" in your program.
Use references and anonymous hashes.
use strict;
use warnings;
my %a;
my %b;
$b{str} = "hello";
$a{1}={%b};
%b=();
$b{str} = "world";
$a{2}={%b};
print "$a{1}{str} $a{2}{str}";
{%b} creates a reference to a copy of the hash %b. You need a copy here because you empty %b later.
Hashes of hashes are tricky to get right the first time. In this case
$a{1} = { %b };
...
$a{2} = { %b };
will get you where you want to go.
See perldoc perllol and perldoc perldsc for the gory details about nested data structures in Perl.
Short answer: hash keys can only be associated with a scalar, not a hash. To do what you want, you need to use references.
Rather than re-hash (heh) how to create multi-level data structures, I suggest you read perlreftut. perlref is more complete, but it's a bit overwhelming at first.
Mike, Alexandr's is the right answer.
Also a tip: if you are just learning hashes, Perl has a module called Data::Dumper that can pretty-print your data structures for you, which is really handy when you'd like to check what values your data structures have.
use Data::Dumper;
print Dumper(\%a);
when you print this it shows
$VAR1 = {
'1' => {
'str' => 'hello'
},
'2' => {
'str' => 'world'
}
};
Perl likes to flatten your data structures. That's often a good thing...for example, (@options, "another option", "yet another") ends up as one list.
If you really mean to have one structure inside another, the inner structure needs to be a reference. Like so:
$a{1} = { %b };
The braces denote a hash, which you're filling with values from %b, and getting back as a reference rather than a straight hash.
You could also say
$a{1} = \%b;
but that makes changes to %b change $a{1} as well.
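The difference between the two forms can be seen in a short sketch. The names %inner and %outer are used here instead of %a and %b purely for illustration (and to stay clear of sort's special $a and $b variables).

```perl
use strict;
use warnings;

my %inner = (str => 'hello');
my %outer;

$outer{copy}  = { %inner };   # anonymous hash: a shallow copy of %inner
$outer{alias} = \%inner;      # reference to %inner itself

$inner{str} = 'world';        # change the original afterwards

print "copy:  $outer{copy}{str}\n";    # still the value at copy time
print "alias: $outer{alias}{str}\n";   # follows the change to %inner
```

The copy is shallow: if %inner itself held references, both forms would still share those nested structures.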
I needed to create 1000 employee records for testing a T&A system. The employee records were stored in a hash where the key was the employee's identity number, and the value was a hash of their name, date of birth, date of hire, etc. Here's how...
# declare an empty hash
%employees = ();
# add each employee to the hash
$employees{$identity} = {gender=>$gender, forename=>$forename, surname=>$surname, dob=>$dob, doh=>$doh};
# dump the hash as CSV
foreach $identity ( keys %employees ){
print "$identity,$employees{$identity}{forename},$employees{$identity}{surname}\n";
}

Perl Array Question

Never done much programming -- been charged at work with manipulating the data from comment cards. Using perl so far I've got the database to correctly put its daily comments into an array. Comments are each one LINE of text within the database, so I just split the array on the line-break.
my @comments = split("\n", $c_data);
And yes, this being my first time programming, that took me wayyy too long to figure out.
At this point I now need to organize these array elements (is that what I should call them?) into their own separate scalars based on capitalized words (this is a behavior of the database, which was at one point corrupt).
Example of what two elements of the array look like:
print "$comments[0]\n";
This dining experience was GOOD blah blah blah.
or
print "$comments[1]\n";
Overall this was a BAD time and me and my blah blah.
These "good" or "bad" or "best" are already capitalized by the database the data came from.
What's the easiest way in Perl to get these lines into scalars from an array based on these capitalized words?
If I understand you correctly, you want to merge array elements that match a certain word. You can do it like this:
my @bad_comments = grep { /\bBAD\b/ } @comments;
my @good_comments = grep { /\bGOOD\b/ } @comments;
That way the 'good' and 'bad' comments each go to their own array.
Now if you need to merge them into a scalar you'd want to join them (opposite of split):
my $bad_comments = join "\n", grep { /\bBAD\b/ } @comments;
my $good_comments = join "\n", grep { /\bGOOD\b/ } @comments;
Think hash table when you want to group data by arbitrary string keys. In this case, you have an array of GOOD comments and an array of BAD comments. What if you had an array of SO-SO comments? A strategy based on having array variables @good, @bad, @soso breaks down fast.
You have some ways to go before you can fully understand the code below:
#!/usr/bin/perl
use strict; use warnings;
use Regex::PreSuf;
my %comments;
my @types = qw( GOOD BAD ); # DRY
my $types_re = presuf @types;
while ( my $comment = <DATA> ) {
    chomp $comment;
    last unless $comment =~ /\S/;
    # capturing match in list context returns captured strings
    my ($type) = ( $comment =~ /($types_re)/ );
    next unless defined $type; # skip comments with no known type
    push @{ $comments{$type} }, $comment;
}
for my $type ( @types ) {
    print "$type comments:\n";
    for my $comment ( @{ $comments{$type} } ) {
        print $comment, "\n";
    }
}
__DATA__
This dining experience was GOOD blah blah blah.
Overall this was a BAD time and me and my blah blah.
You could use regular expressions, e.g.:
if ($comments[$i] =~ /GOOD/) {
# good comment
}
or more generally
if ($comments[$i] =~ /\b([A-Z]{2,})\b/) {
print "Comment: $1\n";
}
Here, \b means word boundary, () captures the matched text (available as $1), [A-Z] is a character class matching any capital letter, and {2,} means that there have to be 2 or more characters from the preceding class.
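A small sketch of that pattern in use, run against the sample comments from the question plus one made-up line without a capitalized word. The helper name first_caps_word is hypothetical.

```perl
use strict;
use warnings;

# first_caps_word is a hypothetical helper: it returns the first run of two
# or more capital letters that stands alone as a word, or undef if none.
sub first_caps_word {
    my ($comment) = @_;
    my ($word) = $comment =~ /\b([A-Z]{2,})\b/;
    return $word;
}

for my $comment (
    'This dining experience was GOOD blah blah blah.',
    'Overall this was a BAD time and me and my blah blah.',
    'Nothing here.',
) {
    my $word = first_caps_word($comment);
    print defined $word ? "$word: $comment\n" : "(none): $comment\n";
}
```

Requiring at least two capitals keeps sentence-initial words such as "This" and single letters such as "I" from being picked up.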
I would store all your comments into a hash-of-arrays data structure, with the key being your capitalized word.
Here is a general solution to grab any capitalized word (assuming only one per comment), not just GOOD and BAD:
use strict;
use warnings;
my @comments = <DATA>;
chomp @comments;
my %data;
for (@comments) {
    my $cap;
    for (split) {
        $cap = $_ if /^[A-Z]+$/;
    }
    if ($cap) { push @{ $data{$cap} }, $_ }
}
use Data::Dumper; print Dumper(\%data);
__DATA__
This is GOOD stuff
Here's some BAD stuff.
More of the GOOD junk.
Nothing here.
Here is the output:
$VAR1 = {
'BAD' => [
'Here\'s some BAD stuff.'
],
'GOOD' => [
'This is GOOD stuff',
'More of the GOOD junk.'
]
};
In my opinion, your best bet would be to create a disk-based database of some sort (SQLite?) that stores the comments and type as separate data.
Then use one of the other posted solutions to import your existing data into it.
The only problem here is that you need to learn Perl's DBI layer and a bit of SQL to use SQLite with Perl.
Not sure what you mean by "organize" and "based on".
If you mean produce a list of any capitalized words, each with a list of the lines containing that word (similar to toolic's solution), you could do this:
my %CAPS = ();
map {
    my ($word) = /(\b[A-Z]+\b)/;
    push( @{ $CAPS{$word} }, $_)
} @comments;
This will build a mapping of WORDS to things, and the things in this case are going to be lists of lines.
And you can refer to these lists as $CAPS{'GOOD'} or $CAPS{'BAD'}, or $CAPS{whatever}; each one is an array reference, so dereference it with @{ $CAPS{'GOOD'} } to get the lines.
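To show how the resulting hash of arrays might be read back, here is a sketch under a few assumptions: it uses a for loop rather than map, requires at least two capitals, skips lines with no capitalized word, and borrows sample lines from an earlier answer.

```perl
use strict;
use warnings;

my @comments = (
    'This is GOOD stuff',
    "Here's some BAD stuff.",
    'More of the GOOD junk.',
);

my %CAPS;
for my $line (@comments) {
    my ($word) = $line =~ /\b([A-Z]{2,})\b/;
    next unless defined $word;            # skip lines with no capitalized word
    push @{ $CAPS{$word} }, $line;        # autovivifies the array reference
}

# $CAPS{GOOD} is an array reference; @{ ... } dereferences it.
my $good_count = @{ $CAPS{GOOD} };
print "GOOD ($good_count):\n";
print "  $_\n" for @{ $CAPS{GOOD} };
print "BAD: $CAPS{BAD}[0]\n";
```

A single element can be reached directly with $CAPS{BAD}[0], while the whole list needs the @{ ... } form.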