To find the matched and unmatched values in Perl

I am a newbie to programming and I hope someone can explain this to me:
So I have two text files, Scan1.txt and Scan2.txt, stored on my computer. Scan1.txt contains:
Tom
white
black
mark
john
ben
Scan2.txt contains:
bob
ben
white
gary
tom
black
patrick
I have to extract the matched values of these two files and the unmatched values and print them separately. I found a solution that works fine, but can someone please explain how exactly the match happens here? It looks like just this line:
$hash{$matchline}++ in the code does the matching and increments the hash value when a match is found. I understand the logic, but I do not understand how the match actually happens. Can someone help me understand this?
Thank you in advance!
Here is the code:
open (F1, "Scan1.txt");
open (F2, "Scan2.txt");
%hash=();
while ($matchline= <F1> ){
$hash{$matchline}=1;
}
close F1;
while( $matchline= <F2> ){
$hash{$matchline}++;
}
close F2;
foreach $matchline (keys %hash){
if ($hash{$matchline} == 1){
chomp($matchline);
push(@unmatched, $matchline);
}
else{
chomp($matchline);
push (@matched, $matchline);
}
}
print "Matched Entries are >>\n";
print "```````````````````````\n";
print join ("\n", @matched) . "\n";
print "```````````````````````\n";
print "Unmatched Entries are >>\n";
print "```````````````````````\n";
print join ("\n", @unmatched) . "\n";
print "```````````````````````\n";

The code above will give a false result if a word appears more than once in the second file and does not appear in the first.
This line:
$hash{$matchline}++
increments a separate counter for each distinct line.
The first loop sets the counter to 1 for every line in the first file,
so if a line exists in both files its counter will be at least 2.
%hash itself is just a set of counters, keyed by the line text.
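To make the "set of counters" idea concrete, here is a minimal sketch. The keys are sample lines from the question (with their trailing newline, since the original code does not chomp before storing); nothing here is taken from real files.

```perl
use strict;
use warnings;

my %hash;
$hash{"white\n"} = 1;    # first loop: line seen in Scan1.txt
$hash{"white\n"}++;      # second loop: same string, so same slot, now 2
$hash{"gary\n"}++;       # second loop: new key, starts undefined, becomes 1

for my $key (sort keys %hash) {
    (my $word = $key) =~ s/\n//;    # strip the newline for display only
    print "$word => $hash{$key}\n";
}
```

There is no search step anywhere: two lines "match" simply because identical strings address the same hash entry.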

A more generalized version of your problem is that of computing the set union or intersection between two sets. This link gives a very good treatment of the problem in general.
In your case, each set is simply the list of values from one file. The logic is: if a value is present in both files, then $hash{$matchline} is at least 2, because the first while loop sets it to 1 and the second increments it. If the line is present in only one of the files, $hash{$matchline} == 1, since only one of the loops touches it.
Also, Lajos Veres raises a very important point: if a word, say "Tom", appears twice in the second file and not at all in the first, its counter also reaches 2 and the algorithm wrongly reports it as matched. It is a subtle detail, which can be resolved in many ways: removing duplicates beforehand, using two hashes, etc.
Hope this helps.
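As a rough sketch of the "two hashes" option, assuming the lines have already been read into arrays: compare_lists is a hypothetical helper name, and the sample data is made up from the question's word lists with a deliberate duplicate.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two array refs of lines and returns
# (\@matched, \@unmatched). Not part of the original code.
sub compare_lists {
    my ($first, $second) = @_;
    my (%in_first, %in_second);

    # Record presence only, so repeated lines cannot inflate a count.
    for my $line (@$first)  { chomp(my $key = $line); $in_first{$key}  = 1; }
    for my $line (@$second) { chomp(my $key = $line); $in_second{$key} = 1; }

    my %seen = (%in_first, %in_second);    # union of both key sets
    my (@matched, @unmatched);
    for my $key (sort keys %seen) {
        if ($in_first{$key} && $in_second{$key}) { push @matched, $key; }
        else                                     { push @unmatched, $key; }
    }
    return (\@matched, \@unmatched);
}

my ($matched, $unmatched) = compare_lists(
    [ "white\n", "black\n", "ben\n" ],
    [ "ben\n", "gary\n", "gary\n", "white\n" ],    # "gary" repeated on purpose
);
print "Matched: @$matched\n";
print "Unmatched: @$unmatched\n";
```

Note that hash keys are compared exactly, so "Tom" and "tom" from the question's files would still count as different entries unless you normalise case first (for example with lc).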

Related

Using white space in key value for hashing in perl

Can we safely use hash keys that contain white space? For example:
my $key1="Dave 2314";
my $key2="John 3212";
$newhash{$key1}= 35;
$newhash{$key2}= 46;
I used similar code in one of my programs. The hashing seems to work, but the exists function does not behave well:
print "Found\n" if (exists $newhash{$searchKey})
This gives absurd results. Sometimes it returns the correct response when the key is present, and sometimes it doesn't for the very same input. Is the white space in the keys the reason for this behaviour?
What absurd results do you get? The hash doesn't care what you have in the keys. Are you sure that you have the right thing in $searchKey? If you are taking that from user input, is there an extra newline on the end?
This works as it should:
my %newhash;
my $key1="Dave 2314";
my $key2="John 3212";
$newhash{$key1} = 35;
$newhash{$key2} = 46;
print "Found\n" if exists $newhash{$key1};
But there's another issue. You can put code inside the braces of a single hash element access. With just a scalar variable it works. This is a syntax error because there's a bare word Dave, a space, and a literal number 2314:
print "Found\n" if exists $newhash{Dave 2314};
This is not a syntax error though, because there's a function named Dave (that just happens to return a key that exists). I'm confident this isn't your problem:
sub Dave { 'John 3212' }
print "Found\n" if exists $newhash{Dave 2314};
Written another way:
sub Dave { 'John 3212' }
print "Found\n" if exists $newhash{ Dave(2314) };
And yet another way:
print "Found\n" if exists $newhash{ join ' ', qw(John 3212 ) };
You should have quoted that key if it was literal:
print "Found\n" if exists $newhash{'Dave 2314'};
You can have unquoted strings if they don't look like code. This looks like 'Dave':
print "Found\n" if exists $newhash{Dave};
But what about this? That dot is actually the string concatenation operator and it thinks Dave is a bare word. If you haven't defined a subroutine, this fails to compile under use strict:
print "Found\n" if exists $newhash{Dave.John};
This works though. The thing before the dot is a subroutine call but the thing after is a string:
sub Dave { 'John 3212' }
print "Found\n" if exists $newhash{Dave.John};
So there are some weird edge cases. But I typically don't have this problem because I always quote literal keys.
Thanks to all for investing your time.
The issue was in my code itself. The entire logic was based on a flag variable which I didn't reset properly when required.
So, to answer my own question: white space inside the key string is not a problem.
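For anyone who lands here with the other cause suggested above (an extra newline on the search key), a small sketch of what that looks like. The $searchKey value is simulated here; in a real program it would typically come from a line of user input.

```perl
use strict;
use warnings;

my %newhash = ('Dave 2314' => 35, 'John 3212' => 46);

# Simulated user input: a line read from STDIN would end in "\n" like this.
my $searchKey = "Dave 2314\n";

print "Before chomp: ", (exists $newhash{$searchKey} ? "Found" : "Not found"), "\n";

chomp $searchKey;    # strip the trailing newline
print "After chomp: ", (exists $newhash{$searchKey} ? "Found" : "Not found"), "\n";
```

The space in the middle of the key makes no difference either way; only the invisible newline at the end changes which key is looked up.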

Problems with append using a for loop involving lists in Python 2

I'm a beginner with Python 2 and I've got two problems with the append method in a for loop.
I'm trying to run this piece of code, but it doesn't work properly:
def pole():
fish_pole = []
fish_elements = ['stick', 'liana', 'worm', 'bended needle']
pick_choice = raw_input()
if pick_choice == "pick up":
print "Good boy. You start picking up your 'tools'"
for element in fish_elements:
fish_pole.append(element)
fish_elements.remove(element)
print "You've found a %s and then" % element
if not element in fish_elements:
print "Ok you have the tools you need."
print "Now you can go to the river."
river()
else:
print "Come on. The sun is dying."
pole()
Ok, my problems are these:
I can't figure out why it prints the "You've found a %s and then" string only for the element 'stick' and the element 'worm';
if I write simply "if not fish_elements:" I have the first problem I've just mentioned above plus this one: as soon as the script finishes printing the two strings for 'stick' and 'worm', it goes straight for the 'else' option and it restarts the entire pole() definition from the beginning.
Please help me. Thanks guys!
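Since you're already iterating through all the elements of fish_elements, consider removing fish_elements.remove(element), as it is most likely the main problem. Removing an item from a list while looping over it shifts the remaining items left, so the loop skips every other element; that is why only 'stick' and 'worm' are printed.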
for element in fish_elements:
    fish_pole.append(element)
    print "You've found a %s and then" % element

Perl get array count so can start foreach loop at a certain array element

I have a file that I am reading in. I'm using Perl to reformat the data. It is a comma-separated file. In one of the files, I know that element.0 is a zipcode and element.1 is a counter. Each row can have 1-n cities. I need to know the number of elements from element.3 to the end of the line so that I can reformat them properly. I want to use a foreach loop starting at element.3 to format the other elements into a single string.
Any help would be appreciated. Basically I am trying to read in a csv file and create a cpp file that can then be compiled on another platform as a plug-in for that platform.
Best Regards
Michael Gould
You can do something like this to get the fields from a line:
my @fields = split /,/, $line;
To access all elements from 3 to the end, do this:
foreach my $city (@fields[3..$#fields])
{
#do stuff
}
(Note, based on your question I assume you are using zero-based indexing. Thus "element 3" is the 4th element).
Alternatively, consider Text::CSV to read your CSV file, especially if you have things like escaped delimiters.
Well if your line is being read into an array, you can get the number of elements in the array by evaluating it in scalar context, for example
my $elems = @line;
or to be really sure
my $elems = scalar(@line);
Although in that case the scalar is redundant, it's handy for forcing scalar context where it would otherwise be list context. You can also find the index of the last element of the array with $#line.
After that, if you want to get everything from element 3 onwards you can use an array slice:
my @threeonwards = @line[3 .. $#line];
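Putting the pieces from both answers together, a minimal sketch might look like this. The sample line is invented: the question only says that element 0 is a zipcode, element 1 is a counter, and the cities start at element 3, so the third field and the separator used in the join are assumptions.

```perl
use strict;
use warnings;

# Hypothetical input line: zipcode, counter, one more field, then 1..n cities.
my $line = "30301,2,GA,Atlanta,Decatur";

my @fields = split /,/, $line;

my $count  = @fields;                   # array in scalar context: number of fields
my @cities = @fields[3 .. $#fields];    # slice from index 3 to the last index
my $joined = join ' | ', @cities;       # single string built from the cities

print "Fields: $count, cities: ", scalar(@cities), "\n";
print "$joined\n";
```

If a row has fewer than four fields, the range 3 .. $#fields is empty, so @cities is simply an empty list rather than an error.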

Need to add two values in perl

Inside an if condition, I take a value from a log file after matching a particular pattern. The pattern matches twice in the log file: one match has the value 0 and the other has 48, in either order. I need a single combined value, so I planned to add the two. But when I print the total inside the if condition, I still get the two values printed separately instead of a single value.
Please suggest how to solve this.
Thanks in advance.
Do you mean something like this:
my $entry = "First is 10, second is 48";
if (my ($first, $second) = $entry =~ /(\d+)/g) {
print $first + $second, "\n"; # 58
}
But without actual code it is hard to see what your problem really is.
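If the two values sit on separate lines of the log, one guess at what is going wrong is that the print happens inside the if, once per match. A sketch of accumulating first and printing once afterwards follows; the log lines and the "Value:" label are invented for illustration, since the real pattern was not shown.

```perl
use strict;
use warnings;

# Hypothetical log lines; the "Value:" label is an assumption.
my @log = (
    "INFO start",
    "Value: 0",
    "INFO working",
    "Value: 48",
);

my $total = 0;
for my $line (@log) {
    if ($line =~ /Value:\s*(\d+)/) {
        $total += $1;    # accumulate instead of printing each match
    }
}
print "Total: $total\n";    # printed once, after the loop
```

Because addition does not care about order, this gives the same total whether the 0 or the 48 comes first.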

Perl: pattern match a string and then print next line/lines

I am using Net::Whois::Raw to query a list of domains from a text file and then parse through this to output relevant information for each domain.
It was all going well until I hit Nominet results as the information I require is never on the same line as that which I am pattern matching.
For instance:
Name servers:
ns.mistral.co.uk 195.184.229.229
So what I need to do is pattern match for "Name servers:" and then display the next line or lines but I just can't manage it.
I have read through all of the answers on here but they either don't seem to work in my case or confuse me even further as I am a simple bear.
The code I am using is as follows:
while ($record = <DOMAINS>) {
$domaininfo = whois($record);
if ($domaininfo=~ m/Name servers:(.*?)\n/){
print "Nameserver: $1\n";
}
}
I have tried an example of Stackoverflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
EDIT: Forgot to say thanks!
how rude.
So, the $domaininfo string contains your domain?
What you probably need is to match the newline explicitly, so that the pattern carries on into the next line of the string. \n matches a newline in any regular expression; the m modifier at the end only changes where ^ and $ match, so it is harmless here but not required. This works for me:
my $domaininfo =<<DATA;
Name servers:
ns.mistral.co.uk 195.184.229.229
DATA
$domaininfo =~ m/Name servers:\n(\S+)\s+(\S+)/m;
print "Server name = $1\n";
print "IP Address = $2\n";
Now, I can match the \n at the end of the Name servers: line and capture the name and IP address which is on the next line.
This might have to be munged a bit to get it to work in your situation.
This is half a question and perhaps half an answer (the question's in here as I am not yet allowed to write comments...). Okay, here we go:
Name servers:
ns.mistral.co.uk 195.184.229.229
Is this what an entry in the file you're parsing looks like? What will follow immediately afterwards - more domain names and IP addresses? And will there be blank lines in between?
Anyway, I think your problem may (in part?) be related to your reading the file line by line. Once you get to the IP address line, the info about 'Name servers:' having been present will be gone. Multiline matching will not help if you're looking at your file line by line. Thus I'd recommend switching to paragraph mode:
{
local $/ = ''; # one paragraph instead of one line constitutes a record
while ($record = <DOMAINS>) {
# $record will now contain all consecutive lines that were NOT separated
# by blank lines; once there are >= 1 blank lines $record will have a
# new value
# do stuff, e.g. pattern matching
}
}
But then you said
I have tried an example of Stackoverflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
so maybe you've already tried what I have just suggested? An alternative would be to just add another variable ($indicator or whatever) which you'll set to 1 once 'Name servers:' has been read, and as long as it's equal to 1 all following lines will be treated as containing the data you need. Whether this is feasible, however, depends on you always knowing what else your data file contains.
I hope something in here has been helpful to you. If there are any questions, please ask :)
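For what it's worth, the indicator-variable idea above could look roughly like this when applied to the text returned by whois() rather than to the file handle. name_servers is a hypothetical helper, the second server and the trailing lines in the sample are made up, and the sketch assumes the name server block ends at the first blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the non-blank lines that follow
# "Name servers:" up to the next blank line.
sub name_servers {
    my ($text) = @_;
    my @servers;
    my $in_block = 0;    # the indicator: 1 while inside the block
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*Name servers:/) {
            $in_block = 1;
            next;
        }
        next unless $in_block;
        last if $line =~ /^\s*$/;    # blank line ends the block
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;
        push @servers, $trimmed;
    }
    return @servers;
}

# Sample text; only the first server line comes from the question.
my $sample = join "\n",
    "Name servers:",
    "    ns.mistral.co.uk 195.184.229.229",
    "    ns2.example.org 192.0.2.53",
    "",
    "Some other section:",
    "";

print "Nameserver: $_\n" for name_servers($sample);
```

In the original loop you would call it as name_servers($domaininfo) after the whois() lookup.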