Using white space in key value for hashing in perl - perl

Can we safely use hash tables whose keys contain white space? For example:
my $key1="Dave 2314";
my $key2="John 3212";
$newhash{$key1}= 35;
$newhash{$key2}= 46;
I used a similar piece of code in one of my programs. The hashing itself seems to work, but the exists function does not behave as expected:
print "Found\n" if (exists $newhash{$searchKey})
This gives inconsistent results. Sometimes it returns the correct response when the key is present, and sometimes it doesn't for the very same input. Is the white space in the keys the reason for this behavior?

What absurd results do you get? The hash doesn't care what you have in the keys. Are you sure that you have the right thing in $searchKey? If you are taking that from user input, is there an extra newline on the end?
This works as it should:
my %newhash;
my $key1="Dave 2314";
my $key2="John 3212";
$newhash{$key1} = 35;
$newhash{$key2} = 46;
print "Found\n" if exists $newhash{$key1};
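To illustrate the trailing-newline possibility mentioned above, here is a minimal sketch. The value of $searchKey is made up to simulate a line read from STDIN or a file; it is not taken from the asker's program.

```perl
use strict;
use warnings;

my %newhash = ( 'Dave 2314' => 35, 'John 3212' => 46 );

# Simulated input line (an assumption): it still carries its newline.
my $searchKey = "Dave 2314\n";
my $found_raw = exists $newhash{$searchKey} ? 1 : 0;      # "Dave 2314\n" is a different key

chomp $searchKey;                                         # strip the trailing newline
my $found_chomped = exists $newhash{$searchKey} ? 1 : 0;  # now the key matches

print "before chomp: $found_raw, after chomp: $found_chomped\n";
```

The space inside the key plays no part here; only the invisible newline makes the two strings differ.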
But there's another issue. You can have code inside the braces of a single hash element access. When you have just a scalar variable it works. This is a syntax error because there's a bareword Dave, a space, and a literal number 2314:
print "Found\n" if exists $newhash{Dave 2314};
This is not a syntax error though, because there's a function named Dave (that just happens to return a key that exists). I'm confident this isn't your problem:
sub Dave { 'John 3212' }
print "Found\n" if exists $newhash{Dave 2314};
Written another way:
sub Dave { 'John 3212' }
print "Found\n" if exists $newhash{ Dave(2314) };
And yet another way:
print "Found\n" if exists $newhash{ join ' ', qw(John 3212 ) };
You should have quoted that key if it was literal:
print "Found\n" if exists $newhash{'Dave 2314'};
You can have unquoted strings if they don't look like code. This looks like 'Dave':
print "Found\n" if exists $newhash{Dave};
But what about this? That dot is actually the string concatenation operator, and Perl treats Dave as a bareword. If you haven't defined a subroutine, this is a syntax error:
print "Found\n" if exists $newhash{Dave.John};
This works though. The thing before the dot is a subroutine call but the thing after is a string:
sub Dave { 'John 3212' }
print "Found\n" if exists $newhash{Dave.John};
So there are some weird edge cases. But I typically don't have this problem because I always quote literal keys.

Thanks to all for investing your time.
The issue was in my own code. The entire logic was based on a flag variable which I didn't reset properly when required.
So to answer my own question, whitespaces in between the key string should not be a problem.

Related

Getting Error of Modification of a read-only value attempted

I am trying to select the below value from database:
Reporting that one of #its many problems had been the recent# extended
sales slump in women's apparel, the seven-store retailer said it would
start a three-month liquidation sale in all of its stores.~(A) its
many problems had been the recent~(B) its many problems has been the
recently~(C) its many problems is the recently~(D) their many problems
is the recent~(E) their many problems had been the recent~
I am selecting this value into the variable $ques and then extracting part of the text as below:
$ques=~s/^(.*?)\#(.*?)\#(.*?)$/$2/;
Now, while replacing the ~ character in the string by
$3=~s/~/\n/g; ---->line 171
and running the script, I am getting one error as:
Modification of a read-only value attempted at main.pl line 171
I want to replace all the ~ characters with '\n' and print the final value. Please suggest how to do it.
I have researched this on the net, but I am confused about how to handle these read-only variables.
You've already got a good explanation of the problem from José Castro. But there's another solution if you're using a recent-ish version of Perl (Update: having checked more carefully, I find that means 5.14+). The /r argument to the substitution operator will copy your string, make the substitution on the copy and then return that altered value.
So you could write:
my $new_value = $3 =~ s/~/\n/rg;
It sounds like what you really want in this case is split rather than regular expression capture groups:
my @parts = split(/#/, $ques);
$parts[2] =~ s/~/\n/g;
It makes the intent of your code clearer since you are, in fact, splitting on # symbols.
Just like you say, the special variables $1, $2, etc., are read-only, and that means that you can't perform that substitution on them.
Performing the substitution on $ques will do what you need:
$ques =~ s/~/\n/g;
print $ques;
Do note that in the earlier substitution that you're performing on $ques you're getting rid of all the ~ characters.
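Another way around the read-only capture variables, as a rough sketch: copy the captures into ordinary variables and modify the copies. The $ques value below is a shortened stand-in for the database text, and the names $before, $underlined and $after are made up for illustration.

```perl
use strict;
use warnings;

# Shortened stand-in for the database value (an assumption for illustration).
my $ques = 'Reporting that one of #its many problems had been the recent# extended sales slump.~(A) first~(B) second~';

# A match in list context returns its captures; assigning them to ordinary
# variables gives writable copies. /s lets . match newlines in longer text.
my ($before, $underlined, $after) = $ques =~ /^(.*?)#(.*?)#(.*)$/s;

$after =~ s/~/\n/g;    # fine: $after is a normal variable, not $3
print $after;
```

Unlike the s///r form, this also works on Perl versions older than 5.14.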

Perl $1 giving uninitialized value error

I am trying to extract a part of a string and put it into a new variable. The string I am looking at is:
maker-scaffold_26653|ref0016423-snap-gene-0.1
(inside a $gene_name variable)
and the thing I want to match is:
scaffold_26653|ref0016423
I'm using the following piece of code:
my $gene_name;
my $scaffold_name;
if ($gene_name =~ m/scaffold_[0-9]+\|ref[0-9]+/) {
$scaffold_name = $1;
print "$scaffold_name\n";
}
I'm getting the following error when trying to execute:
Use of uninitialized value $scaffold_name in concatenation (.) or string
I know that the pattern is right, because if I use $' instead of $1 I get
-snap-gene-0.1
I'm at a bit of a loss: why will $1 not work here?
If you want to use a value from the match, you have to put () around that part of the regex.
To expand on Jens' answer, () in a regex signifies an anonymous capture group. The content matched by each capture group is stored in $1, $2, $3 and so on, numbered from left to right, so for example,
/(..):(..):(..)/
on an HH:MM:SS time string will store hours, minutes, and seconds in $1, $2, $3 respectively. Naturally this begins to become unwieldy and is not self-documenting, so you can assign the results to a list instead:
my ($hours, $mins, $secs) = $time =~ m/(..):(..):(..)/;
So your example could bypass the numbered capture variables by doing direct assignment:
my ($scaffold_name) = $gene_name =~ m/(scaffold_[0-9]+[|]ref[0-9]+)/;
# $scaffold_name now contains 'scaffold_26653|ref0016423'
You can even get rid of the ugly =~ binding by using for as a topicalizer:
my $scaffold_name;
for ($gene_name) {
($scaffold_name) = m/(scaffold_\d+[|]ref\d+)/;
print $scaffold_name;
}
If things start to get more complex, I prefer to use named capture groups (introduced in Perl v5.10.0):
$gene_name =~ m{
(?<scaffold_name> # ?<name> creates a named capture group
scaffold_\d+? # 'scaffold' and its trailing digits
[|] # Literal pipe symbol
ref\d+ # 'ref' and its trailing digits
)
}xms; # The x flag lets us write more readable regexes
print $+{scaffold_name}, "\n";
The results of named capture groups are stored in the magic hash %+. Access is done just like any other hash lookup, with the capture group names as the keys. %+ is locally scoped in the same way the numbered capture variables are, so it can be used as a drop-in replacement for them in most situations.
It's overkill for this particular example, but as regexes get larger and more complicated, this saves you the trouble of either scrolling all the way back up and counting anonymous capture groups from left to right to find which numbered variable holds the capture you wanted, or scanning across a long list assignment to find where to add a new variable for a capture that got inserted in the middle.
My personal rule of thumb is to assign the results of anonymous captures to descriptively named, lexically scoped variables for three or fewer captures, then switch to named captures, comments, and indentation in regexes when more are necessary.
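As a small illustration of named captures on the earlier HH:MM:SS example, here is a sketch. The sample $time value and the %t hash are made up for this example; it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $time = '12:34:56';   # sample input, made up for this sketch

my %t;
if ( $time =~ m{ (?<hours>\d\d) : (?<mins>\d\d) : (?<secs>\d\d) }x ) {
    %t = %+;    # copy the named captures before another match replaces them
}
print "$t{hours}h $t{mins}m $t{secs}s\n";
```

Copying %+ into an ordinary hash right after the match keeps the values around even if a later regex runs in the same scope.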

To find the matched and unmatched values in perl

I am a newbie to programming and I hope someone can explain this to me:
So I have two text files i.e. Scan1.txt and Scan2.txt that are stored in my computer. Scan1.txt contains:
Tom
white
black
mark
john
ben
Scan2.txt contains:
bob
ben
white
gary
tom
black
patrick
I have to extract the matched values of these two files and the unmatched values and print them separately. I somehow found the solution for this which works fine. But can someone please explain how exactly the match happens here. Looks like somehow just this line:
$hash{$matchline}++ in the code does the matching and increments the value of hash when the match is found. I understand the logic but I do not understand how this match actually happens. Can someone help me understand this?
Thank you in advance!
Here is the code:
open (F1, "Scan1.txt");
open (F2, "Scan2.txt");
%hash=();
while ($matchline= <F1> ){
$hash{$matchline}=1;
}
close F1;
while( $matchline= <F2> ){
$hash{$matchline}++;
}
close F2;
foreach $matchline (keys %hash){
if ($hash{$matchline} == 1){
chomp($matchline);
push(@unmatched, $matchline);
}
else{
chomp($matchline);
push (@matched, $matchline);
}
}
print "Matched Entries are >>\n";
print "```````````````````````\n";
print join ("\n", @matched) . "\n";
print "```````````````````````\n";
print "Unmatched Entries are >>\n";
print "```````````````````````\n";
print join ("\n", @unmatched) . "\n";
print "```````````````````````\n";
The code above will give a false result if a given word appears more than once in the second file and does not appear in the first.
This line:
$hash{$matchline}++
increments a separate counter for each distinct word.
The first loop sets the counter to 1 for every word in the first file,
so if a word exists in both files its counter will be at least 2.
%hash itself is a set of counters.
A more generalized version of your problem is that of computing the set union or intersection between two sets. This link gives a very good treatment of the problem in general.
In your case, each set is just the list of values from one file. The logic is: if a certain value is present in both files then $hash{$matchline} == 2, because the first while loop sets it to 1 and the second increments it. However, if the line is present in only one of the files, $hash{$matchline} == 1, since only one of the two while loops touches it.
Also, Lajos Veres raises a very important point: if a certain word, say "Tom" is present twice in the same file, then the algorithm will fail. It is a subtle detail, which can be resolved in many ways- removing duplicates beforehand, using two hashes, etc.
Hope this helps.
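The "two hashes" idea mentioned above could look roughly like this. It is only a sketch: the arrays stand in for the chomped lines of Scan1.txt and Scan2.txt, "gary" is repeated on purpose, and the names %in1, %in2, @only1 and @only2 are made up.

```perl
use strict;
use warnings;

# Stand-ins for the file contents (note the repeated "gary").
my @scan1 = qw(Tom white black mark john ben);
my @scan2 = qw(bob ben white gary gary tom black patrick);

# One "seen" hash per file, so repeats inside a file cannot inflate a counter.
my %in1 = map { $_ => 1 } @scan1;
my %in2 = map { $_ => 1 } @scan2;

my @matched = sort grep { $in2{$_} } keys %in1;    # present in both files
my @only1   = grep { !$in2{$_} } keys %in1;        # only in the first file
my @only2   = grep { !$in1{$_} } keys %in2;        # only in the second file
my @unmatched = sort @only1, @only2;

print "Matched: @matched\n";
print "Unmatched: @unmatched\n";
```

Hash keys are case-sensitive, so "Tom" and "tom" count as different entries here; applying lc to each line before storing it would fold them together if that is what you want.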

Perl "else" statement not executing

I use an ActivePerl script to take in CSV files and create XML files that I load into a database. These are userid database entries, name, address, etc. We've always used the home phone number field to generate an initial password (which we encourage the users to change immediately!). The proliferation of cellphones means I have a bunch of people with no home phone, so I want to use the cell phone field when the home phone field is empty.
My input fields look like this:
# 0 Firstname
# 1 Lastname
# 2 VP (voicepart)
# 3 Address
# 4 City
# 5 State
# 6 Zip
# 7 Phone
# 8 Mobile
# 9 Email
Here's the Perl code I've worked up to create the password - the create_password subroutine is working when there's a value in field 7:
my $pass_word = '';
my $pass_word = create_password($fields[7]);
if (my $pass_word = '') {
print "Use the cell phone number \n";
my $pass_word = create_password($fields[8]);
}
The "print" statement is to tell me what it thinks it's doing.
This looks to me like it should work, but the "if" statement never fires. The print statement doesn't print, and nobody with a value only in field 8 ever gets a password generated. There must be something wrong with the way I'm testing the value of $pass_word but I can't see it. Should I be testing the values of $fields[7] and $fields[8] instead of the variable value? How DO you test a Perl variable for null value if this doesn't work?
You have several problems in your code.
First of all, once you have declared a variable with my, you should not put my in front of it again when you use it;
Secondly, for this line:
if (my $pass_word = '')
I think you meant
if ($pass_word == '')
(my is removed, as discussed in the first point)
= means assignment; it returns the value assigned to $pass_word, which is '' here, and that is why this condition is always false.
But == is still not correct here. In Perl, eq compares two strings; == compares numbers.
So, remove all the my except the first one, and use eq to compare your strings.
You've got two major problems in here.
First one is your string equality test. In Perl, strings are compared for equality using operator eq (as in $string eq 'something'). = is the assignment operator.
Second one is your (ab)use of my. Each my declares a new variable that “hides” the previous one, so in effect you can never reuse the earlier value; you get a fresh variable every time.
Replace = with eq in your if clause; remove all but the first uses of my, and you should be set!
my declares a new variable which hides the variable with the same name in the surrounding scope. Remove the excessive use of my.
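Putting the two fixes together, the corrected logic might look like the sketch below. The create_password body is only a hypothetical stand-in (the real routine was not posted), and the sample record is invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the asker's create_password routine:
# returns '' for an empty phone field, otherwise just the digits.
sub create_password {
    my ($phone) = @_;
    return '' unless defined $phone && length $phone;
    ( my $digits = $phone ) =~ s/\D//g;
    return $digits;
}

# Invented sample record: field 7 (Phone) is empty, field 8 (Mobile) is set.
my @fields = ( 'Ann', 'Lee', 'Alto', '1 Main St', 'Springfield', 'IL', '62701',
               '', '555-0100', 'ann@example.com' );

my $pass_word = create_password( $fields[7] );    # declared once with my
if ( $pass_word eq '' ) {                         # eq compares strings
    print "Use the cell phone number\n";
    $pass_word = create_password( $fields[8] );   # no my: reuse the same variable
}
print "$pass_word\n";
```

If the real create_password returns undef rather than '' for an empty field, the test would need to be something like !defined $pass_word || $pass_word eq '' instead.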

(3 lines) from bash to perl?

I have these three lines in bash that work really nicely. I want to add them to an existing Perl script, but I have never used Perl before.
Could somebody rewrite them for me? I tried to use them as they are and it didn't work.
Note that $SSH_CLIENT is an environment variable that shows up if you type set in bash (Linux).
users[210]=radek #where 210 is the last octet from my mac's IP
octet=($SSH_CLIENT) # split the value on spaces
somevariable=$users[${octet[0]##*.}] # extract the last octet from the ip address
These might work for you. I noted my assumptions with each line.
my %users = ( 210 => 'radek' );
I assume that you wanted a sparse array. Hashes are the standard implementation of sparse arrays in Perl.
my @octet = split ' ', $ENV{SSH_CLIENT}; # split the value on spaces
I assume that you still wanted to use the environment variable SSH_CLIENT
my ( $some_var ) = $octet[0] =~ /\.(\d+)$/;
You want the last set of digits from the '.' to the end.
The parens around the variable put the assignment into list context.
In list context, a match creates a list of all the "captured" sequences.
Assigning a list to a parenthesized list of scalars fills only as many scalars as appear on the left; any extra captures are discarded.
As for your question in the comments, you can get the variable out of the hash, by:
$db = $users{ $some_var };
# OR--this one's kind of clunky...
$db = $users{ [ $octet[0] =~ /\.(\d+)$/ ]->[0] };
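For reference, the pieces above can be combined into one small sketch. The user_for subroutine name and the fallback SSH_CLIENT value are made up so that the example also runs outside an SSH session; the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my %users = ( 210 => 'radek' );

# Hypothetical helper: map an SSH_CLIENT string to a user name.
sub user_for {
    my ($ssh_client) = @_;
    my ($ip) = split ' ', $ssh_client;     # first field is the client IP
    return unless defined $ip;
    my ($octet) = $ip =~ /\.(\d+)$/;       # digits after the last dot
    return unless defined $octet;
    return $users{$octet};                 # undef if the octet is not listed
}

# SSH_CLIENT normally looks like "client_ip client_port server_port";
# the fallback string here is an invented example value.
my $client = $ENV{SSH_CLIENT} // '192.168.1.210 50000 22';
my $name   = user_for($client) // 'unknown';
print "$name\n";
```

Wrapping the lookup in a subroutine keeps the environment access in one place, which makes the parsing easy to try with different input strings.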
Say you have already gotten your IP in a string,
$macip = "10.10.10.123";
@s = split /\./ , $macip;
print $s[-1]; #get last octet
If you don't know Perl and you are required to use it for work, you will have to learn it. Surely you are not going to come to SO and ask every time you need it in Perl right?