Trouble recording unique regex output to array in perl - perl

The goal of the following code sample is to read the contents of $target and assign all unique regex search results to an array.
I have confirmed my regex statement works so I am simplifying that so as not to focus on it.
When I execute the script I get a list of all the regex results, however, the results are not unique which leads me to believe that my manipulation of the array or my if (grep{$_ eq $1} #array) { check is causing a problem(s).
#!/usr/bin/env perl
$target = "string to search";
$inc = 0;
$once = 1;
while ($target =~ m/(regex)/g) { #While a regex result is returned
if ($once) { #If $once is not equal to zero
#array[$inc] = $1; #Set the first regex result equal to #array[0]
$once = 0; #Set $once equal to zero so this is not executed more than once
} else {
if (grep{$_ eq $1 } #array ) { #From the second regex result, check to see if the result is already in the array
#If so, do nothing
} else {
#array[$inc] = $1; #If it is not, then assign the regex search result to the next unused position in the array in any position.
$inc++; #Increment to next unused array position.
}
}
}
print #array;
exit 0;

how about this:
while ($target =~ m/(regex)/g) {
$hash{$1}++;
}
print keys %hash;
Update:
# if the order matters
while ($target =~ m/(a.)/g) {
$hash{$1} = ++$i unless $hash{$1};
}
#array = sort {$hash{$a} <=> $hash{$b}} keys %hash;

Related

How can I test if a value is repeated in array?

I'm doing a subroutine that takes a list of numbers as an argument. What I would like to do is check if there is a repeated value in that list. In case there are repeated numbers, print a message and stop the program. In case there are no repeated numbers, continue with the execution.
For example:
if (there_is_number_repeated){
print "There is a number repeated";}
else{
run this code...}
I was trying to do this creating a hash with the values of that list, and then check if there are values > 1.
use strict;
use warnings;
sub name_subroutine{
my (#numbers)=#_;
my $n=scalar(#numbers);
my %table=();
foreach my $i(#numbers){
if (exists $tabla{$i}){
$tabla{$i}+=1;}
else{
$tabla{$i} = 1;
}
my #values = values %tabla;
}
}
It's here where I do not know to continue. Is there any way to do this in an amateurish way? I'm newbie in Perl.
Thanks!
I would just do:
my %uniq;
if ( grep ++$uniq{$_} > 1, #numbers ) {
# some numbers are repeated
}
In your existing code (with a couple corrections):
my %table=();
foreach my $i(#numbers){
if (exists $table{$i}){
$table{$i}+=1;}
else{
$table{$i} = 1;
}
}
my #values = values %table;
you don't need to check for exists; doing += 1 or ++ will set it to 1 if it didn't exist. And you don't want the values (those are just the counts of how many times each array value was found), you want the keys, specifically those for which the value is > 1:
my #repeated = grep $table{$_} > 1, keys %table;
my #arr = #_;
my $count = #arr;
for(my $i=0;$i<$count;$i++)
{
my $num = $arr[$i];
for(my $j=0; $j<$count,$j!=$i; $j++)
{
if($num == $arr[$j])
{
print "\n$num is repeated";
last;
}
}
}
Tried and tested. Cheers.

Perl Error: Argument isn't numeric in array or hash lookup

I was writing a simple program to match words to a regex pattern. But I keep receiving the error above. This is my code:
my #words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $index (#words) {
if ($words[$index] =~ m/ord/) {
print "match\n";
} else {print "no match\n";}
}
Error I received:
Argument "Ordinary" isn't numeric in array or hash lookup at test.pl line 6.
Argument "order" isn't numeric in array or hash lookup at test.pl line 6.
Argument "afford" isn't numeric in array or hash lookup at test.pl line 6.
Argument "cordford" isn't numeric in array or hash lookup at test.pl line 6.
Argument "'ORD airport'" isn't numeric in array or hash lookup at test.pl line 6.
no matchno matchno matchno matchno match
Can anyone explain to me what's causing the error and why?
This is the code that you show (improved a little)
my #words = ( 'Ordinary', 'order', 'afford', 'cordford', q{'ORD airport'} );
for my $index ( #words ) {
if ( $words[$index] =~ /ord/ ) {
print "match\n";
}
else {
print "no match\n";
}
}
}
This for loop will set $index to each value in the #words array. So, for instance, the first time the loop is executed $index will be set to Ordinary; the second time it will be set to order etc.
Naming it $index shows clearly that you expected it to contain all the indices for #words. You can do that, like this
for my $index ( 0 .. $#words ) { ... }
and your program will work fine if you make just that change. The output is
no match
match
match
match
no match
But you had the right idea from the start. Most often an array is just a list of values and the indices have no relevance. That applies to your case, and you can write
for my $word ( #words ) {
if ( $word =~ m/ord/ ) {
print "match\n";
}
else {
print "no match\n";
}
}
Or using Perl's default variable $_ it can be written
for ( #words ) {
if ( m/ord/ ) {
print "match\n";
}
else {
print "no match\n";
}
}
or even just
print /ord/ ? "match\n" : "no match\n" for #words;
Every example above is exactly equivalent and so produces identical output
The reason is your $index will produce the elements of an array not the index values.
It should be foreach my $index (0..$#words) now $index will produce the index of an array in every iteration.
use strict;
use warnings;
my #words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $index (0..$#words) {
if ($words[$index] =~ m/ord/) {
print "match\n";
}
else {print "no match\n";}
}
Or else. simply check the condition with $index.
use strict;
use warnings;
my #words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $index (#words) {
if ($index =~ m/ord/) {
print "match\n";
}
else {print "no match\n";}
}
This is array lookup
$words[$index];
If it was a hash it would be
$words{$index};
Arrays expect integer indexes but you're using strings that look nothing like integers.
If you are iterating over arrays in Perl you don't need the index..
#!/usr/bin/perl
use strict;
use warnings;
my #words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $word (#words) {
if($word =~ m/ord/) {
print "$word match\n";
} else {
print "$word no match\n";
}
}
Note. I've used foreach because you see it in more language you could also use for
You can also try something a little bit alternative, note this won't end but it's worth studying ie
#!/usr/bin/perl
use strict;
use warnings;
my #words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
my $iterator = sub {
my $item = shift(#words);
push(#words, $item);
return $item;
};
while(my $item = $iterator->()) {
print("$item\n");
}
I do love Perl.

Force auto-increment to treat its argument as string

I need Perl's auto-increment magic for strings, but some strings (such as those composed entirely of digits) are interpreted as numbers and a normal increment is performed instead. How would I force Perl to treat a value passed to ++ as a string?
Here's the related question about how auto incrementing works: Autoincrementing letters in Perl
Like the docs explained, basically you need the variable to
match the regex /^[a-zA-Z]*[0-9]*\z/ and
only be used in string contexts.
Because you have variables that don't match the regex, those ones will be treated as numbers. You can write your own increment function to get your desired functionality. Here's an idea I had about how it could work to get you started.
#!/usr/bin/perl
use strict;
use warnings;
my $test = "1000";
for (0..100) {
$test = increment($test);
}
print $test . "\n";
$test = "M2V3";
for (0..100) {
$test = increment($test);
}
print $test . "\n";
sub increment {
my ($str) = #_;
my #letters = reverse split //, $str;
my $add = "";
my $increment = 1;
my $result = "";
for my $let (#letters) {
if ( $increment == 1 ) {
++$let;
}
if ( $let =~ /(.)(.)/ ) {
$add = $2;
$increment = 1;
} else {
$add = $let;
$increment = 0;
}
$result = $add . $result;
}
return $result;
}
This outputs:
1101
M3F4
I didn't calculate to confirm that M3F4 is the correct result but it seems close.

Using a regular expression with nested for loops, using Perl

I have two arrays:
#file_list holds a list of files in a directory, and
#name_list holds some names.
For example, these arrays could contain
#file_list = ('Bob_car', 'Bob_house', 'Bob_work', 'Fred_car', 'Fred_house', 'Fred_work', ...);
#name_list = ('Bob', 'Fred', ...);
(the real data is not that simple).
My goal is to compare each file with every name and see if they match. They match if the file string starts with the name.
I could then use these matches to sort the files into new directories, based on their corresponding name.
Here is my code:
for ( my $i = 0; $i < scalar #file_list ; $i++ )
{
for ( my $j = 0; $j < #name_list ; $j++ )
{
if ( $file_list[ $i ] =~ m/^$name_list[ $j ]/ )
{
print "$file_list[ $i ] goes with $name_list[ $j ]\n";
}
else
{
print "no match\n";
}
}
}
However, I don't get any matches. I've tested the individual loops and they are working. Else, is there something off about the regex?
About how the arrays were made:
For #name_list, the file containing the names is organized in a seemingly random way, just because of how it was used for something else. The names in that file are on several different lines, with lots of blank lines in between and lots of blank entries within lines. Names can appear more than once.
I used the following code to make #name_list:
while (my $line = <$OriginalFILE>)
{
chomp $line;
my #current_line = split( "\t", $line );
for ( my $i = 0; $i < scalar #current_line ; $i ++ )
{
if ( $current_line[ $i ] =~ m/^\s*$/ )
{
# print "$current_line[$i] is blank\n";
}
else
{
push( #raw_name_list, $current_line[ $i ] );
}
} # end of for
} # while
# collect list without repeat instances of the same name
my %unique = ();
foreach my $name (#raw_name_list)
{
$unique{$name} ++;
}
my #name_list = keys %unique;
foreach my $name ( #name_list )
{
# print "$name\n";
chomp $name;
unless(mkdir $name, 0700)
{
die "Unable to create directory called $name\n";
}
}
The array #file_list was made using:
opendir(DIR, $ARGV[1]);
my #file_list = grep ! /^\./, readdir DIR;
closedir(DIR);
# print #file_list;
#amon, here is what i did to test the loops and regex:
FILE: for my $file (#transposed_files) {
print "$file\n";
for my $name (#transposedunique) {
print "i see this $name\n";
if ($file =~ /^\Q$name\E/) {
print "$file goes with $name\n";
next FILE;
}
}
#print "no match for $file\n";
}
oh, and I transposed the arrays, so that they would print to an outfile into separate rows.
Short version: You are building your name array wrong. Look at this line:
$unique{name} ++;
You are just incrementing the name entry of the hash. You probably wanted the $name variable.
The Longer Version
On English, and Foreach-Loops
Your code is a bit unperlish and looks more like C than like Perl. Perl is much closer to English than you might think. From the original wording of your question:
take the first element from #file_list and then to compare that to each element in #name_list
You wrote this as
for (my $i = 0; $i < #file_list; $i++) {
for (my $j = 0; $j < #name_list; $j++) {
...; # compare $file_list[$i] with $name_list[$j]
}
}
I'd rather do
for my $file (#file_list) {
for my $name (#name_list) {
...; # compare $file with $name
}
}
and save myself from the hassle of array subscripting.
Building Correct Regexes
Your code contains the following test:
$file_list[ $i ] =~ m/^$name_list[ $j ]/
This will not do what you think if $name_list[$j] contains special characters like (, ., +. You can match the literal contents of a variable by enclosing it in \Q ... \E. This would make the code
$file =~ /^\Q$name\E/
(if used with my variant of the loop).
You could also go the nifty route and compare the leading substring directly:
$name eq substr $file, 0, length($name)
This expresses the same condition.
On Loop Control
I will make two assumptions:
You are only interested in the first matching name for any file
You only want to print the no match message if no name was found
Perl allows us to break out of arbitrary loops, or restart the current iteration, or go directly to the next iteration, without using flags, as you would do in other languages. All we have to do is to label our loops like LABEL: for (...).
So once we have a match, we can start our search for the next file. Also, we only want to print no match if we left the inner loop without going to the next file. This code does it:
FILE: for my $file (#file_list) {
for my $name (#name_list) {
if ($file =~ /^\Q$name\E/) {
print "$file goes with $name\n";
next FILE;
}
}
print "no match for $file\n";
}
The Zen of Negation
In your file parsing code, you express a condition
if ($field =~ /^\s*$/) {
} else {
# do this stuff only if the field does not consist only of
# zero or more whitespace characters
}
That description is far to complex. How about
if ($field =~ /\S/) {
# do this stuff only if the field contains a non-whitespace character.
}
The same condition, but simpler, and more efficient.
Simplify your Parse
In short, your file parsing code can be condensed to
my %uniq;
while (<$OriginalFILE>) {
chomp;
$uniq{$_} = undef for grep /\S/, split /\t/;
}
my #name_list = sort { length($b) <=> length($a) } keys %uniq;
The split function takes a regex as first argument, and will split on $_ if no other string is specified. It returns a list of fields.
The grep function takes a condition and a list, and will return all elements of a list that match the condition. The current element is in $_, which regexes match by default. For explanation of the regex, see above.
Note: This still allows for the fields to contain whitespace, even in leading position. To split on all whitespace, you can give split the special argument of a string containing a single space: split ' '. This would make the grep unneccessary.
The for loop can also be used as a statement modifier, i.e. like EXPR for LIST. The current element is in $_. We assign something to the $_ entry in our %uniq hash (which is already initialized to the empty hash). This could be a number, but undef works as well.
The keys are returned in a seemingly random order. But as multiple names could match a file, but we only want to select one match, we will have to match the most specific name first. Therefore, I sort the names after their length in descending order.
Your code seems to work for me. All I did was construct two arrays like this:
my #file_list = qw/Bob_car Bob_house Bob_work Fred_car Fred_house Fred_work/;
my #name_list = qw/Fred Bob Mary/;
Then running your code produces output like this:
no match
Bob_car goes with Bob
no match
no match
Bob_house goes with Bob
no match
no match
Bob_work goes with Bob
no match
Fred_car goes with Fred
no match
no match
Fred_house goes with Fred
no match
no match
Fred_work goes with Fred
no match
no match
So it looks like it's working.
A common problem with reading input from files or from a user is forgetting to strip the newline character from the end of the input. This could be your problem. If so, have a read about perldoc -f chomp, and just chomp each value as you add it to your array.
I'm always interested in doing things in efficient way so every time I see O(N^2) algorithm rings bells for me. Why it should be O(N*M) and not O(N+M)?
my $re = join('|',map quotemeta, #name_list);
$re = qr/$re/;
for my $file (#file_list) {
if($file =~ /^($re)/) {
my $name = $1;
... do what you need
}
}
its look something wrong in loop.
follow comments in code
for ( my $i = 0; $i < scalar #file_list ; $i++ )
{
#use some string variable assign it ""
for ( my $j = 0; $j < #name_list ; $j++ )
{
if ( $file_list[ $i ] =~ m/^$name_list[ $j ]/ )
{
# assign string variable to founded name_list[$j]
break loop
}
}
# check condition if string not equal to "" match found print your requirement with string value else match not found
}

Perl Hash of Hash Output

I'm reading a file. I want a hash that gives me the first number of a line as a key to a hash of all the numbers of the rest of the line to 1.
I believe I'm adding the hash correctly, because Dumper prints correctly.
However, print "$first $secondID\n" is not giving me any output.
while (<FILE>) {
chomp $_;
if (/(\d+)\t(.+)/) {
$firstNum = $1;
#seconds = split(/\,/,$2);
foreach $following (#seconds) {
$Pairs->{$firstNum}{$following} = 1;
}
foreach $first (sort {$a <=> $b} keys %Pairs) {
print "$first\n";
%second = {$Pairs{$first}};
foreach $secondID (sort {$a <=> $b} keys %second) {
print "$first $secondID\n";
}
}
print Dumper($Pairs);
}
else {
print "ERROR\n";
}
}
Later on, given a pair of numbers I would like to look up to see whether $Pairs{$num1}{$num2} is defined. would I write
if(defined $Pairs{$num1}{$num2})
Or should I check the first key first. Then check the second key
if (defined $Pairs{$num1}) {
$temp = $Pairs{$num1};
if (defined $temp{$num2}) {
print "true\n;
}
}
You have a couple of errors. Firstly you seem to be unsure whether you are using %Pairs or $Pairs to store your hash, and secondly you have %second = {$Pairs{$first}}, which tries to assign a hash reference to the hash %second. Presumably you want my %second = %{ $Pairs{$first} }.
You should always use strict and use warnings at the start of all your Perl programs, and declare all variables at the point of first use using my. This will alert you to simple mistakes you could otherwise easily overlook, and would have shown up your use of both %Pairs and $Pairs in this program, as well as your attempt to assign a single value (a hash reference) to a hash.
Rather than copying the entire hash, you should save a reference to it in $seconds. Then you can dereference it in the following for loop.
Experienced Perl programmers would also thank you for using lower-case plus underscore for local (my) variables, and reserving capitals for package and class names.
This program works as you intended, and expects the file name as a command-line parameter:
use strict;
use warnings;
my %pairs;
while (<>) {
unless ( /(\d+)\s+(.+)/ ) {
print "ERROR\n";
next;
}
my $first_num = $1;
my #seconds = split /,/, $2;
foreach my $following (#seconds) {
$pairs{$first_num}{$following} = 1;
}
foreach my $first (sort { $a <=> $b } keys %pairs) {
print "$first\n";
my $second = $pairs{$first};
foreach my $second_id (sort { $a <=> $b } keys %$second) {
print "$first $second_id\n";
}
}
}
my %hash;
while ( <> ) {
my #numbers = split /\D+/;
my $key = shift #numbers;
#{$hash{$key}}{ #numbers } = ( 1 ) x #numbers;
}
# test it this way...
if ( $hash{ $num1 }{ $num2 } ) {
}
Use:
%second = %{$Pairs->{$first}};