Counting through a hash - Perl

I have a db of places people have ordered items from. I parsed the list to get the city and state so it prints like this - city, state (New York, NY) etc....
I use the variables $city and $state but I want to count how many times each city and state occur so it looks like this - city, state, count (Seattle, WA 8)
I have all of it working except the count. I am using a hash, but I can't figure out what is wrong with it:
if ($varc==3) {
    $line =~ /(?:\>)(\w+.*)(?:\<)/;
    $city = $1;
}
if ($vars==5) {
    $line =~ /(?:\>)((\w+.*))(?:\<)/;
    $state = $1;
    # foreach $count (keys %counts){
    #     $counts = {$city, $state} {$count}++;
    #     print $counts;
    # }
    print "$city, $state\n";
}
foreach $count (keys %counts){
    $counts = {$city, $state} {$count}++;
    print $counts;
}

Instead of printing city and state you can build a "location" string with both items and use the following counting code:
# Declare this variable before starting to parse the locations.
my %counts = ();
# Inside of the loop that parses the city and state, let's assume
# that you've got $city and $state already...
my $location = "$city, $state";
$counts{$location} += 1;
} # end of the loop that parses each location

# When you've processed all locations then the counts will be correct.
foreach my $location (keys %counts) {
    print "OK: $location => $counts{$location}\n";
}
# OK: New York, NY => 5
# OK: Albuquerque, NM => 1
# OK: Los Angeles, CA => 2

This is going to be a mix of an answer and a code review. I will start with a warning though.
You are trying to parse what looks like XML with Regular Expressions. While this can be done, it should probably not be done. Use an existing parser instead.
How do I know? The values are wrapped in angle brackets, which looks like XML, unless you have a very weird CSV file.
#             V            V
$line =~ /(?:\>)(\w+.*)(?:\<)/;
Also note that you don't need to escape < and >, they have no special meaning in regex.
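For illustration, here is a minimal sketch of what using a real parser could look like. It assumes a hypothetical layout in which each <order> element contains <city> and <state> children, and uses XML::LibXML; the file name and element names are made up, so adjust them to your actual structure:
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: <orders><order><city>Seattle</city><state>WA</state></order>...</orders>
my $doc = XML::LibXML->load_xml( location => 'orders.xml' );

my %counts;
for my $order ( $doc->findnodes('//order') ) {
    my $city  = $order->findvalue('./city');
    my $state = $order->findvalue('./state');
    $counts{"$city, $state"}++;
}

print "$_ $counts{$_}\n" for sort keys %counts;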
Now to your code.
First, make sure you always use strict and use warnings, so you are aware of stuff that goes wrong. I can tell you're not using them, because the $count in your loop has no my.
What are $vars (with an s) and $varc (with a c)? I am guessing they have to do with the state and the city. Are they column numbers? In an XML file? Huh.
$line =~ /(?:\>)((\w+.*))(?:\<)/;
Why are there two capture groups, both capturing the same thing?
Anyway, you want to count how often each combination of state and city occurs.
foreach $count (keys %counts){
    $counts = {$city, $state} {$count}++;
    print $counts;
}
Have you run this code? Even without strict, it gives a syntax error. I'm not even sure what it's supposed to do, so I can't tell you how to fix it.
To implement counting, you need a hash. You got that part right. But you need to declare that hash variable outside of your file reading loop. Then you need to create a key for your city and state combination in the hash, and increment it every time that combination is seen.
my %counts;    # declare outside the loop

while ( my $line = <$fh> ) {
    chomp $line;

    if ( $varc == 3 ) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ( $vars == 5 ) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;
        print "$city, $state\n";
        $counts{"$city, $state"}++;    # increment when seen
    }
}
You have to parse the whole file before you can know how often each combination occurs in it. So if you want to print those counts, you will have to move the printing outside of the loop that reads the file, and iterate over the %counts hash by keys at a later point.
my %counts;    # declare outside the loop

while ( my $line = <$fh> ) {
    chomp $line;

    if ( $varc == 3 ) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ( $vars == 5 ) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;
        $counts{"$city, $state"}++;    # increment when seen
    }
}

# iterate again to print final counts
foreach my $item ( sort keys %counts ) {
    print "$item $counts{$item}\n";
}
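With the question's example data, this final loop then prints each location in the desired "city, state count" form, e.g. Seattle, WA 8.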

Related

How to count the number of occurrences in hash values

I have the below FIX file and I want to find out how many orders are sent at the same time. I am using tag 52 as the sending time.
Below is the file:
8=FIX.4.2|9=115|35=A|52=20080624-12:43:38.021|10=186|
8=FIX.4.2|52=20080624-12:43:38.066|10=111|
8=FIX.4.2|9=105|35=1|22=BOO|52=20080624-12:43:39.066|10=028|
How can I count how many times the same tag 52 value was sent?
So far, I have written the below code, but it is not giving me the frequency.
#!/usr/bin/perl
$f = '2.txt';
open (F,"<$f") or die "Can not open\n";
while (<F>)
{
    chomp $_;
    @data = split (/\|/,$_);
    foreach $data (@data)
    {
        if ( $data == 52){
            @data1 = split ( /=/,$data);
            for my $j (@data1)
            {
                $hash{$j}++;
            } for my $j (keys %hash)
            {
                print "$j: ", $hash{j}, "\n";
            }
        }
    }
}
Here is your code corrected:
#!/usr/bin/perl
$f = '2.txt';
open (F,"<$f") or die "Can not open\n";
my %hash;
while (<F>) {
    chomp $_;
    @data = split (/\|/,$_);
    foreach $data (@data) {
        if ($data =~ /^52=(.*)/) {
            $hash{$1}++;
        }
    }
}
for my $j (keys %hash) {
    print "$j: ", $hash{$j}, "\n";
}
Explanation:
if ( $data == 52) compares the whole field against the value 52, not against a substring of the field. Of course, you do not have such fields, so the test always fails. I replaced it with a regexp comparison.
The same regexp gives an opportunity to capture the timestamp immediately, without needing to split the field again. This is done by (.*) in the regexp and $1 in the following increment.
It hardly makes sense to output the hash for every line of input data (your code prints it inside the foreach loop), so I moved it down. But maybe outputting the current hash for every line is what you wanted; I do not know.
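As a quick sanity check, the regexp can be exercised on one of the sample lines from the question:
use strict;
use warnings;

# One of the sample lines from the question
my $line = '8=FIX.4.2|9=115|35=A|52=20080624-12:43:38.021|10=186|';

for my $field ( split /\|/, $line ) {
    if ( $field =~ /^52=(.*)/ ) {
        print "sending time: $1\n";    # prints: sending time: 20080624-12:43:38.021
    }
}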

Amend Perl script so that words are matched on a word-for-word basis

I have been using this perl script (thanks to Jeff Schaller) to match 3 or more words in the title fields of two separate csv files.
Original question here:
https://unix.stackexchange.com/questions/283942/matching-3-or-more-words-from-fields-in-separate-csv-files?noredirect=1#comment494461_283942
I have also added some exception functionality following advice from meuh:
#!/bin/perl
my @csv2 = ();
open CSV2, "<csv2" or die;
@csv2 = <CSV2>;
close CSV2;
my %csv2hash = ();
for (@csv2) {
    chomp;
    my ($title) = $_ =~ /^.+?,\s*([^,]+?),/;    # match the title
    $csv2hash{$_} = $title;
}
open CSV1, "<csv1" or die;
while (<CSV1>) {
    chomp;
    my ($title) = $_ =~ /^.+?,\s*([^,]+?),/;    # match the title
    my @titlewords = split /\s+/, $title;       # get words
    my @new;    # add exception words which shouldn't be matched
    foreach my $t (@titlewords) {
        push(@new, $t) if $t !~ /^(and|if|where)$/i;
    }
    @titlewords = @new;
    my $desired = 3;
    my $matched = 0;
    foreach my $csv2 (keys %csv2hash) {
        my $count = 0;
        my $value = $csv2hash{$csv2};
        foreach my $word (@titlewords) {
            ++$count if $value =~ /\b$word\b/i;
            last if $count >= $desired;
        }
        if ($count >= $desired) {
            print "$csv2\n";
            ++$matched;
        }
    }
    print "$_\n" if $matched;
}
close CSV1;
During my testing, one issue I've found that I would like to tweak is that if csv2 contains a single common word such as "the", and that word is repeated in csv1 three or more times, then three positive matches are found. To clarify:
If csv1 contains:
1216454,the important people feel the same way as the others, 15445454, 45445645
^ i.e. there are three instances of "the" in the above line
If csv2 contains:
14564564,the tallest man on earth,546456,47878787
^ i.e. there is one instance of "the" in this line
Then I would like only one word to be classed as matching, and there to be no output (based on my desired number of matching words, 3), because there is only one instance of the matching word in one of the files.
However if:
csv1 contained:
1216454,the important people feel the same way as the others,15445454, 45445645
and csv2 contained:
15456456,the only way the man can sing the blues,444545,454545
Then, as there are three matching words in each (i.e. 3 instances of the word "the" in each title), I would like this to be classed as a matching title based on my desired number of matching words being 3 or more, thus generating the output:
1216454,the important people feel the same way as the others,15445454, 45445645
15456456,the only way the man can sing the blues,444545,454545
I would like to amend the script so that if there is one instance of a word in one csv, and multiple instances of the same word in the other csv, then that is classed as only one match. However, if there were, say, 3 instances of the word "the" in both files, then it should still be classed as three matches. Basically I would like matches to be on a word-for-word basis.
Everything about the script other than this is perfect so I would rather not go back to the drawing board completely as I am happy with everything other than this.
I hope I've explained it OK; if anyone needs any clarification, let me know.
If you just want to count unique matches, you can use a hash instead of a list to collect the words from csv1, just like you do for csv2, and then also count the occurrences of each word separately:
#!/usr/bin/env perl
my @csv2 = ();
open CSV2, "<csv2" or die;
@csv2 = <CSV2>;
close CSV2;
my %csv2hash = ();
for (@csv2) {
    chomp;
    my ($title) = $_ =~ /^.+?,\s*([^,]+?),/;    # match the title
    $csv2hash{$_} = $title;
}
open CSV1, "<csv1" or die;
while (<CSV1>) {
    chomp;
    my ($title) = $_ =~ /^.+?,\s*([^,]+?),/;    # match the title
    my %words;
    $words{$_}++ for split /\s+/, $title;       # get words
    ## Collect unique words
    my @titlewords = keys(%words);
    my @new;    # add exception words which shouldn't be matched
    foreach my $t (@titlewords) {
        push(@new, $t) if $t !~ /^(and|if|where)$/i;
    }
    @titlewords = @new;
    my $desired = 3;
    my $matched = 0;
    foreach my $csv2 (keys %csv2hash) {
        my $count = 0;
        my $value = $csv2hash{$csv2};
        foreach my $word (@titlewords) {
            my @matches = ( $value =~ /\b$word\b/ig );
            my $numIncsv2 = scalar(@matches);
            @matches = ( $title =~ /\b$word\b/ig );
            my $numIncsv1 = scalar(@matches);
            ++$count if $value =~ /\b$word\b/i;
            if ($count >= $desired || ($numIncsv1 >= $desired && $numIncsv2 >= $desired)) {
                $count = $desired + 1;
                last;
            }
        }
        if ($count >= $desired) {
            print "$csv2\n";
            ++$matched;
        }
    }
    print "$_\n" if $matched;
}
close CSV1;

Read a file into two hashes in order to retain the order

I am trying to read a file with user information categorized under a location. I want to fill in some of the fields using user input and output the file while keeping the fields under each location intact. For example, the file:
[California]
$;FIrst_Name =
$;Last_Name=
$;Age =
[NewYork]
$;FIrst_Name =
$;Last_Name=
$;Age =
[Washington]
$;FIrst_Name =
$;Last_Name=
$;Age =
Once the user provides input from the command line, it should look like this:
[California]
$;FIrst_Name = Jack
$;Last_Name= Daner
$;Age = 27
[NewYork]
$;FIrst_Name = Jill
$;Last_Name= XYZ
$;Age = 30
[Washington]
$;FIrst_Name = Kim
$;Last_Name= ABC
$;Age = 25
The order of First_Name, Last_Name and Age within each location can change, and even the order of the locations can change, but each location section should remain separate and intact. I wrote the following code so far, and some of it works for reading the whole file into one hash, but I am not able to preserve each location section within it! I tried using two hashes - can someone please help me, as it is getting really complex for me! Thanks a lot. (I had another issue with a similar file as well, but unfortunately could not resolve it either.)
EDITED code
Open the file
use strict;
use warnings;
use Getopt::Long;

sub read_config {
    my $phCmdLineOption = shift;
    my $phConfig = shift;
    my $sInputfile = shift;
    open($input.file, "<$InputFile") or die "Error! Cannot open $InputFile for reading: $!";
    while (<$input.file>) {
        $_ =~ s/\s+$//;
        next if ($_ =~ /^#/);
        next if ($_ =~ /^$/);
        if ($_ =~ m/^\[(\S+)\]$/) {
            $sComponent = $1;
            next;
        }
        elsif ($_ =~ m/^;;\s*(.*)/) {
            $sDesc .= "$1.";
            next;
        }
        elsif ($_ =~ m/\$;(\S+)\$;\s*=\s*(.*)/) {
            $sParam = $1;
            $sValue = $2;
            if ((defined $sValue) && ($sValue !~ m/^\s*$/)) {
                $phfield->{$sCategory}{$sParam} = ["$sValue", "$sDesc"];
            }
            else {
                $field->{$sCategory}{$sParam} = [undef, "$sDesc"];
            }
        }
        $sParam = $sValue = $sDesc = "";
        next;
    }
}
Write the new file -
sub write_config {
    my $phCmdLineOption = shift;
    my $phConfig = shift;
    my $sOut = shift;
    open(outfile, ">$sOut") or die " $!";
    foreach $sCategory (sort {$a cmp $b} keys %{$fields}) {
        print $outfile "[$sCategory]\n";
        foreach $sParam (sort {$a cmp $b} keys %{$fields-{$sCategory}}) {
            $sDesc = ((defined $phConfig->{$sCategory}{$sParam}[1]) $fields->{$sCategory}{$sParam}[1] : "");
            print $outfile ";;$sDesc\n" if ((defined $sDesc) && ($sDesc !~ m/^$/));
            $sValue = ((defined $fields->{$sCategory}{$sParam}[0]) ? $fields->{$sCategory}{$sParam}[0] : undef);
            print $outfile "$sValue" if (defined $sValue);
            print $outfile "\n";
        }
        print $outfile "\n";
    }
    close($outfile);
    return;
Note - I have posted this question on PerlMonks forum as well. Thanks a lot!
I think you're getting lost in the detail and skipping over some basics, which is unnecessarily complicating the problem. Those basics are:
Indent your code properly (it's amazing the difference this makes)
Always use the /x modifier on regexes, plus lots of whitespace, to increase readability
When using lots of regexes, use the quote-regex operator, qr//, to separate regex definition from regex use, as in the short illustration below
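For example, here is a short illustration (mine, not part of the original answer) of defining a regex once with qr// and /x and then using it by name, based on the question's $;Field = value lines:
use strict;
use warnings;

# Define the regex once, with /x and whitespace for readability...
my $attr_re = qr/ ^ \$ ; (\w+) \s* = \s* (.*) $ /x;

# ...and use it by name wherever it is needed.
if ( '$;Age = 27' =~ $attr_re ) {
    print "field=$1 value=$2\n";    # prints: field=Age value=27
}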
Apart from that, you were headed in the right direction but there are a couple of insights on the algorithm you were missing which further increased the complexity.
Firstly, for small-time parsing of data, look out for the possibility that matching one type of line immediately disqualifies matching of other types of line. All the elsif's aren't necessary since a line that matches a category is never going to match a LastName or Age and vice versa.
Secondly, when you get a match, see if you can do what's needed immediately rather than storing the result of the match for processing later. In this case, instead of saving a "component" or "category" in a variable, put it immediately into the hash you're building.
Thirdly, if you're updating text files that are not huge, consider working on a new version of the file and then, at the end of the program, declare the current version old and the new version current. This reduces the chances of unintentionally modifying something in place and allows comparison of the update with the original after execution - if necessary, "rollback" of the change is trivially easy, which one of your users may be very grateful for one day.
Fourthly and most of all, you've only got a couple of attributes or components to worry about, so deal with them in the concrete rather than the abstract. You can see below that I've looped over qw( First_Name Last_Name Age) rather than all keys of the hash. Now obviously, if you have to deal with open-ended or unknown attributes you can't do it this way but in this case, AFAICT, your fields are fixed.
Here's a version that basically works given the above mentioned constraints.
#!/usr/bin/env perl
use v5.12 ;
use Getopt::Long ;

my %db ;                    # DB hash
my $dbf = "data.txt" ;      # DB file name
my $dbu = "data.new" ;      # updated DB file name
my $dbo = "data.old" ;      # Old DB file name
my ($cat, $first, $last, $age) ;    # Default is undef

GetOptions( 'cat=s'   => \$cat ,
            'first=s' => \$first ,
            'last=s'  => \$last ,
            'age=i'   => \$age
          );
die "Category option (--cat=...) is compulsory\n" unless $cat ;

open my $dbh, '<', $dbf or die "$dbf: $!\n";    # DB Handle
open my $uph, '>', $dbu or die "$dbu: $!\n";    # UPdate Handle

# REs for blank line, category header and attribute specification
my $blank_re = qr/ ^ \s* $ /x ;
my $cat_re   = qr/ ^ \[ (\w+) \] \s* $ /x ;
my $attr_re  = qr/ ^ \$ ; (?<key>\w+) \s* = \s* (?<val>\N*) $ /x ;

while ( <$dbh> ) {
    next unless /$cat_re/ ;
    my %obj = ( cat => $1 ) ;
    while ( <$dbh> ) {
        $obj{ $+{key} } = $+{val} if /$attr_re/ ;
        last if /$blank_re/
    }
    $db{ $obj{cat} } = \%obj
}

# Grab existing obj, otherwise presume we're adding a new one
my $obref = $db{ $cat } // { cat => $cat } ;

$obref->{ First_Name } = $first if defined $first ;
$obref->{ Last_Name }  = $last  if defined $last ;
$obref->{ Age }        = $age   if defined $age ;

# Update the DB with the modified/new one
$db{ $obref->{cat} } = $obref ;

for (sort keys %db) {
    my $obref = $db{ $_ } ;
    printf $uph "[%s]\n", $obref->{ cat } ;
    for (qw( First_Name Last_Name Age )) {
        printf $uph '$;' . "%s = %s\n", $_, $obref->{ $_ }
    }
    print $uph "\n"
}

close $dbh ;
close $uph ;
rename $dbf , $dbo ;
rename $dbu , $dbf ;
exit 0
User input here needs to be organized, and for this we can use named options for each field, plus one for the state. The Getopt::Long option for reading into a hash is useful here. We also need to associate the names of these options with the field names. With that in hand, it is simple to process the file, since we have a ready mechanism for identifying the lines of interest.
By putting lines in an array reference we can keep their order as well; that arrayref is the value for the section key in the hash. The hash is not necessary, but it adds flexibility for future development. While we are at it, we can also keep the order of the sections by using a simple auxiliary array.
use warnings;
use strict;
use Getopt::Long;
use feature qw(say);

# Translate between user input and field name ($;) in file
my ($o1, $o2, $o3) = qw(first last age);
my @tags = ('FIrst_Name', 'Last_Name', 'Age');
my %desc = ($tags[0] => $o1, $tags[1] => $o2, $tags[2] => $o3);

my (%input, $state);
GetOptions(\%input, "$o1=s", "$o2=s", "$o3=i", 'state=s' => \$state);

my $locinfo = 'loc_info.txt';
open my $in_fh, '<', $locinfo;

my (%conf, @sec_order, $section, $field);

while (my $line = <$in_fh>)
{
    chomp($line);
    next if $line =~ m/^\s*$/;
    # New section ([]), for hash and order-array
    if ($line =~ m/^\s*\[(.*)\]/) {
        push @sec_order, $section = $1;
        next;
    }
    # If we are in a wrong state just copy the line
    if ($section ne $state) {
        push @{$conf{$section}}, $line . "\n";
        next;
    }
    if (($field) = $line =~ m/^\$;\s*(.*?)\s*=/ ) {
        if (exists $input{$desc{$field}}) {
            # Overwrite what is there or append
            $line =~ s|^\s*(.*?=\s*)(.*)|$1 $input{$desc{$field}}|;
        }
    }
    else { warn "Unexpected line: |$line| --" }
    push @{$conf{$section}}, $line . "\n";
}
close $in_fh;

for (@sec_order) { say "[$_]"; say @{$conf{$_}}; }
Invocation
script.pl -state STATE -first FIRST_NAME -last LAST_NAME -age INT
Any option may be left out, in which case that field is not touched. A field supplied on the command line will overwrite whatever is already there. (This can be changed easily.) This works for a single-state entry as it stands, but it is simple to modify if needed.
This is a basic solution. The next thing would be to read the field names from the file itself, instead of having them hard-coded. (This would avoid the need to spot the typo FIrst and the inconsistent spacing before =, for one thing.) But the more refinements are added, the more one is getting into template development. At some point soon it will be a good idea to use a module.
Note: The regex delimiter in the substitution above is different from the ones used elsewhere (|) to avoid the editor coloring everything red.

Using a regular expression with nested for loops, using Perl

I have two arrays:
@file_list holds a list of files in a directory, and
@name_list holds some names.
For example, these arrays could contain
@file_list = ('Bob_car', 'Bob_house', 'Bob_work', 'Fred_car', 'Fred_house', 'Fred_work', ...);
@name_list = ('Bob', 'Fred', ...);
(the real data is not that simple).
My goal is to compare each file with every name and see if they match. They match if the file string starts with the name.
I could then use these matches to sort the files into new directories, based on their corresponding name.
Here is my code:
for ( my $i = 0; $i < scalar @file_list ; $i++ )
{
    for ( my $j = 0; $j < @name_list ; $j++ )
    {
        if ( $file_list[ $i ] =~ m/^$name_list[ $j ]/ )
        {
            print "$file_list[ $i ] goes with $name_list[ $j ]\n";
        }
        else
        {
            print "no match\n";
        }
    }
}
However, I don't get any matches. I've tested the individual loops and they are working. Otherwise, is there something off about the regex?
About how the arrays were made:
For @name_list, the file containing the names is organized in a seemingly random way, just because of how it was used for something else. The names in that file are on several different lines, with lots of blank lines in between and lots of blank entries within lines. Names can appear more than once.
I used the following code to make #name_list:
while (my $line = <$OriginalFILE>)
{
    chomp $line;
    my @current_line = split( "\t", $line );
    for ( my $i = 0; $i < scalar @current_line ; $i ++ )
    {
        if ( $current_line[ $i ] =~ m/^\s*$/ )
        {
            # print "$current_line[$i] is blank\n";
        }
        else
        {
            push( @raw_name_list, $current_line[ $i ] );
        }
    } # end of for
} # while

# collect list without repeat instances of the same name
my %unique = ();
foreach my $name (@raw_name_list)
{
    $unique{$name} ++;
}
my @name_list = keys %unique;
foreach my $name ( @name_list )
{
    # print "$name\n";
    chomp $name;
    unless(mkdir $name, 0700)
    {
        die "Unable to create directory called $name\n";
    }
}
The array @file_list was made using:
opendir(DIR, $ARGV[1]);
my #file_list = grep ! /^\./, readdir DIR;
closedir(DIR);
# print @file_list;
@amon, here is what I did to test the loops and regex:
FILE: for my $file (@transposed_files) {
    print "$file\n";
    for my $name (@transposedunique) {
        print "i see this $name\n";
        if ($file =~ /^\Q$name\E/) {
            print "$file goes with $name\n";
            next FILE;
        }
    }
    # print "no match for $file\n";
}
Oh, and I transposed the arrays so that they would print to an outfile in separate rows.
Short version: You are building your name array wrong. Look at this line:
$unique{name} ++;
You are just incrementing the name entry of the hash. You probably wanted the $name variable.
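For illustration, compare the two spellings:
use strict;
use warnings;

my %unique;
my $name = 'Bob';

$unique{name}++;     # increments the literal key 'name'
$unique{$name}++;    # increments the key 'Bob'

print join(', ', sort keys %unique), "\n";    # prints: Bob, name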
The Longer Version
On English, and Foreach-Loops
Your code is a bit unperlish and looks more like C than like Perl. Perl is much closer to English than you might think. From the original wording of your question:
take the first element from #file_list and then to compare that to each element in #name_list
You wrote this as
for (my $i = 0; $i < @file_list; $i++) {
    for (my $j = 0; $j < @name_list; $j++) {
        ...; # compare $file_list[$i] with $name_list[$j]
    }
}
I'd rather do
for my $file (@file_list) {
    for my $name (@name_list) {
        ...; # compare $file with $name
    }
}
and save myself from the hassle of array subscripting.
Building Correct Regexes
Your code contains the following test:
$file_list[ $i ] =~ m/^$name_list[ $j ]/
This will not do what you think if $name_list[$j] contains special characters like (, ., +. You can match the literal contents of a variable by enclosing it in \Q ... \E. This would make the code
$file =~ /^\Q$name\E/
(if used with my variant of the loop).
You could also go the nifty route and compare the leading substring directly:
$name eq substr $file, 0, length($name)
This expresses the same condition.
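As a quick check (my example, using the data from the question), both forms agree:
use strict;
use warnings;

my ($file, $name) = ('Bob_car', 'Bob');

print "regex match\n"  if $file =~ /^\Q$name\E/;
print "substr match\n" if $name eq substr $file, 0, length($name);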
On Loop Control
I will make two assumptions:
You are only interested in the first matching name for any file
You only want to print the no match message if no name was found
Perl allows us to break out of arbitrary loops, or restart the current iteration, or go directly to the next iteration, without using flags, as you would do in other languages. All we have to do is to label our loops like LABEL: for (...).
So once we have a match, we can start our search for the next file. Also, we only want to print no match if we left the inner loop without going to the next file. This code does it:
FILE: for my $file (@file_list) {
    for my $name (@name_list) {
        if ($file =~ /^\Q$name\E/) {
            print "$file goes with $name\n";
            next FILE;
        }
    }
    print "no match for $file\n";
}
The Zen of Negation
In your file parsing code, you express a condition
if ($field =~ /^\s*$/) {
} else {
# do this stuff only if the field does not consist only of
# zero or more whitespace characters
}
That description is far too complex. How about
if ($field =~ /\S/) {
# do this stuff only if the field contains a non-whitespace character.
}
The same condition, but simpler, and more efficient.
Simplify your Parse
In short, your file parsing code can be condensed to
my %uniq;
while (<$OriginalFILE>) {
    chomp;
    $uniq{$_} = undef for grep /\S/, split /\t/;
}
my @name_list = sort { length($b) <=> length($a) } keys %uniq;
The split function takes a regex as first argument, and will split on $_ if no other string is specified. It returns a list of fields.
The grep function takes a condition and a list, and will return all elements of a list that match the condition. The current element is in $_, which regexes match by default. For explanation of the regex, see above.
Note: This still allows the fields to contain whitespace, even in leading position. To split on all whitespace, you can give split the special argument of a string containing a single space: split ' '. This would make the grep unnecessary.
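A small illustration (mine, not the original answer's) of that special case:
use strict;
use warnings;

# split ' ' splits on runs of whitespace and ignores leading whitespace,
# so no empty or whitespace-only fields are produced.
my @fields = split ' ', "  Bob \t Fred  ";
print scalar(@fields), "\n";        # prints: 2

# split /\s+/ on the same string would produce a leading empty field.
my @with_regex = split /\s+/, "  Bob \t Fred  ";
print scalar(@with_regex), "\n";    # prints: 3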
The for loop can also be used as a statement modifier, i.e. like EXPR for LIST. The current element is in $_. We assign something to the $_ entry in our %uniq hash (which is already initialized to the empty hash). This could be a number, but undef works as well.
The keys are returned in a seemingly random order. But as multiple names could match a file and we only want to select one match, we have to match the most specific name first. Therefore, I sort the names by their length in descending order.
Your code seems to work for me. All I did was construct two arrays like this:
my @file_list = qw/Bob_car Bob_house Bob_work Fred_car Fred_house Fred_work/;
my @name_list = qw/Fred Bob Mary/;
Then running your code produces output like this:
no match
Bob_car goes with Bob
no match
no match
Bob_house goes with Bob
no match
no match
Bob_work goes with Bob
no match
Fred_car goes with Fred
no match
no match
Fred_house goes with Fred
no match
no match
Fred_work goes with Fred
no match
no match
So it looks like it's working.
A common problem with reading input from files or from a user is forgetting to strip the newline character from the end of the input. This could be your problem. If so, have a read about perldoc -f chomp, and just chomp each value as you add it to your array.
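For example, a minimal sketch of chomping each name as it is read (the file name names.txt is made up for this illustration):
use strict;
use warnings;

# 'names.txt' is a placeholder name for the file of names
open my $name_fh, '<', 'names.txt' or die "names.txt: $!";

my @name_list;
while ( my $line = <$name_fh> ) {
    chomp $line;                  # strip the trailing newline before storing
    push @name_list, $line;
}
close $name_fh;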
I'm always interested in doing things in an efficient way, so every time I see an O(N^2) algorithm it rings alarm bells for me. Why should this be O(N*M) and not O(N+M)?
my $re = join('|', map quotemeta, @name_list);
$re = qr/$re/;
for my $file (@file_list) {
    if ($file =~ /^($re)/) {
        my $name = $1;
        ...;    # do what you need
    }
}
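For instance, here is a self-contained run of this idea on the example data from the question (my demonstration, not part of the original answer):
use strict;
use warnings;

my @name_list = ('Bob', 'Fred');
my @file_list = ('Bob_car', 'Fred_house', 'Mary_work');

my $re = join '|', map quotemeta, @name_list;
$re = qr/$re/;

for my $file (@file_list) {
    if ( $file =~ /^($re)/ ) {
        print "$file goes with $1\n";    # Bob_car goes with Bob, Fred_house goes with Fred
    }
}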
It looks like something is wrong in the loop.
Follow the comments in the code:
for ( my $i = 0; $i < scalar @file_list ; $i++ )
{
    # use a string variable, initialized to ""
    my $found = "";
    for ( my $j = 0; $j < @name_list ; $j++ )
    {
        if ( $file_list[ $i ] =~ m/^$name_list[ $j ]/ )
        {
            # remember the matching name and leave the inner loop
            $found = $name_list[ $j ];
            last;
        }
    }
    # if $found is not "" a match was found, so print your requirement with it;
    # otherwise no match was found
    if ( $found ne "" ) {
        print "$file_list[ $i ] goes with $found\n";
    }
    else {
        print "no match for $file_list[ $i ]\n";
    }
}

Extract several rows with HTML::TableExtract

I have made a script which extracts all the row data from HTML <TR> tags. I have 30 HTML <TR> tags on my HTML page. Based on a count, my code will fetch a particular row's data. Let's say I need the data present in the 5th <tr>...</tr>; then my condition is if (count == 5) { (go inside and get that data) }
But my problem here is I need the selected rows' data one at a time. Let's say I need data for rows 5, 6, and 14.
Could you please help me sort it out?
$te = new HTML::TableExtract(count => 0 );
$te->parse($content);
# Examine all matching tables
foreach $ts ($te->table_states) {
    # print "Table (", join(',', $ts->coords), "):\n";
    $cnt = 1;
    foreach $row ($ts->rows) {
        # print " ---- Printing Row $cnt ----\n";
        $PrintLine = join("\t", @$row);
        @RowData = split(/\t/, $PrintLine);
        $PrintLine =~ s/\r//ig;
        $PrintLine =~ s/\t//ig;
        $cnt = $cnt + 1;
        # if ($PrintLine =~ /Site ID/ig || $PrintLine =~ /Site name/ig){print " Intrest $PrintLine $cnt =====================\n"};
        if ( $cnt == 14) {
            $arraycnt = 1;
            my $SiteID = "";
            my $SiteName = "";
            foreach (@RowData) {
                # print " Array element $arraycnt\n";
                chomp;
                $_ =~ s/\r//ig;
                $_ =~ s/[\xC3\xA1\xC3\xA0\xC3\xA2\xC3\xA3]//ig;
                if ($arraycnt == 17 ) { $SiteID = $_; }
                if ($arraycnt == 39 ) { $SiteName = $_; }
                $arraycnt = $arraycnt + 1;
            }
            # $PrintLineFinal = $BridgeCase."\t".$PrintLine;
            $PrintLineFinal = $BridgeCase."\t".$SiteID."\t".$SiteName;
            # print "$PrintLineFinal\n";
            print MYFILE2 "$PrintLineFinal\n";
            last;
        }
    }
}
A few suggestions:
Always:
use strict;
use warnings;
This will force you to declare your variables with my. e.g.
foreach my $ts ($te->table_states) {
my $cnt = 1;
(warnings will let you know about most silly mistakes. strict prevents mistakes by requiring you to use better practices in certain cases).
In several places, you are using your own counter variables as you go through the array. You don't need to do this. Instead, just get the array element you want directly, e.g. $array[3] to get the fourth element.
Perl also allows array slices to get just the elements you want. @array[4,5,13] gets the fifth, sixth, and fourteenth elements of the array. You can use this to process only the rows you want, instead of looping through all of them:
my @rows = $ts->rows;
foreach my $row (@rows[4,5,13])    # process only the 5th, 6th, and 14th rows.
{
    ...
}
Here is a shortcut version of the same thing, using an anonymous array:
foreach my $row (@{[$ts->rows]}[4,5,13])
Also, perhaps you want to define the rows you want elsewhere in your code:
my @wanted_rows = (4,5,13);
...
foreach my $row (@{[$ts->rows]}[@wanted_rows])
This code is quite confused:
$PrintLine = join("\t", @$row);
@RowData = split(/\t/, $PrintLine);
$PrintLine =~ s/\r//ig;
$PrintLine =~ s/\t//ig;
First you are joining an array with tab characters, then you are splitting the array you just joined to get the array back again. Then you remove all tab characters from the line anyway.
I suggest you get rid of all that code. Just use @$row whenever you need the array, instead of making a copy of it. If you need to print the array for debugging (which is all you seem to be doing with $PrintLine), you can print an array directly:
print @$row;      # print an array, nothing between each element.
print "@$row";    # print an array with spaces between each element.
With all of these changes, your code would be something like this:
use strict;
use warnings;

my @wanted_rows = (4,5,13);

my $te = new HTML::TableExtract(count => 0);
$te->parse($content);

# Examine all matching tables
foreach my $ts ($te->table_states) {
    foreach my $row (@{[$ts->rows]}[@wanted_rows]) {
        s/[\xC3\xA1\xC3\xA0\xC3\xA2\xC3\xA3\r\n]//ig for (@$row);
        my $SiteID   = $$row[16] // '';    # set to empty strings if not defined.
        my $SiteName = $$row[38] // '';
        print MYFILE2 $BridgeCase."\t".$SiteID."\t".$SiteName;
    }
}
You could access the results like this:
foreach $ts ($te->table_states) {
    # you need the 14th row
    # my $row14 = $ts->rows->[13];    # starting with zero!
    # 17th column from the 14th row
    # my $col17_from_row14 = $ts->rows->[13]->[16];
    my $SiteName = $ts->rows->[13]->[38];
    my $SiteID   = $ts->rows->[13]->[16];
    my $PrintLineFinal = $BridgeCase."\t".$SiteID."\t".$SiteName;
    print MYFILE2 "$PrintLineFinal\n";
}