i have a very simple perl question regarding pattern matching problem.
I am reading file with a list of names (fileA).
I would like to check if any of these names exist in another file (fileB).
if ($name -e $fileB){
do something
}else{
do something else
}
it is in a way to check if a pattern exists in a file.
I have tried
open(IN, $controls) or die "Can't open the control file\n";
while(my $line = <IN>){
if ($name =~ $line ){
print "$name\tfound\n";
}else{
print "$name\tnotFound\n";
}
}
This is repeating itself as it checks and prints every entry rather than checking whether the name exists or not.
When you are doing compare one list to another, you're interested in hashes. A hash is an array that is keyed and the list itself has no order. A hash can only have a single instance of a particular key (but different keys can have the same data).
What you can do is go through the first file, and create a hash keyed by that line. Then, you go through the second folder and check to see if any of those lines match any keys in your hash:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE;
# Get each line as a hash key
my %line_hash;
while ( my $line = <$first_fh> ) {
chomp $line;
$line_hash{$line} = 1;
}
close $first_fh;
Now each line is a key in your hash %line_hash. The data really doesn't matter. The important part is the value of the key itself.
Now that I have my hash of the lines in the first file, I can read in the second file and see if that line exists in my hash:
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
There's a map function too that can be used:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE
chomp ( my #lines = <$first_fh> );
# Get each line as a hash key
my %line_hash = map { $_ => 1 } #lines;
close $first_fh;
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
I am not a great fan of map because I don't find it that much more efficient and it is harder to understand what is going on.
To check whether a pattern exists in a file, you have to open the file and read its content. The fastest way how to search for inclusion of two lists is to store the content in a hash:
#!/usr/bin/perl
use strict;
use warnings;
open my $LST, '<', 'fileA' or die "fileA: $!\n";
open my $FB, '<', 'fileB' or die "fileB: $!\n";
my %hash;
while (<$FB>) {
chomp;
undef $hash{$_};
}
while (<$LST>) {
chomp;
if (exists $hash{$_}) {
print "$_ exists in fileB.\n";
}
}
I have just given an algorithm kind of code which is not tested.
But i feel this does the job for you.
my #a;
my $matched
my $line;
open(A,"fileA");
open(A,"fileB");
while(<A>)
{
chomp;
push #a,$_;
}
while(<B>)
{
chomp;
$line=$_;
$matched=0;
for(#a){if($line=~/$_/){last;$matched=1}}
if($matched)
{
do something
}
else
{
do something else
}
}
Related
I am a beginner with Perl and I want to merge the content of two text files.
I have read some similar questions and answers on this forum, but I still cannot resolve my issues
The first file has the original ID and the recoded ID of each individual (in the first and fourth columns)
The second file has the recoded ID and some information on some of the individuals (in the first and second columns).
I want to create an output file with the original, recoded and information of these individuals.
This is the perl script I have created so far, which is not working.
If anyone could help it would be very much appreciated.
use warnings;
use strict;
use diagnostics;
use vars qw( #fields1 $recoded $original $IDF #fields2);
my %columns1;
open (FILE1, "<file1.txt") || die "$!\n Couldn't open file1.txt\n";
while ($_ = <FILE1>)
{
chomp;
#fields1=split /\s+/, $_;
my $recoded = $fields1[0];
my $original = $fields1[3];
my %columns1 = (
$recoded => $original
);
};
open (FILE2, "<file2.txt") || die "$!\n Couldnt open file2.txt \n";
for ($_ = <FILE2>)
{
chomp;
#fields2=split /\s+/, $_;
my $IDF= $fields2[0];
my $F=$fields2[1];
my %columns2 = (
$F => $IDF
);
};
close FILE1;
close FILE2;
open (FILE3, ">output.txt") ||die "output problem\n";
for (keys %columns1) {
if (exists ($columns2{$_}){
print FILE3 "$_ $columns1{$_}\n"
};
}
close FILE3;
One problem is with scoping. In your first loop, you have a my in front of $column1 which makes it local to the loop and will not be in scope when you next the loop. So the %columns1 (which is outside of the loop) does not have any values set (which is what I suspect you want to set). For the assignment, it would seem to be easier to have $columns1{$recorded} = $original; which assigns the value to the key for the hash.
In the second loop you need to declare %columns2 outside of the loop and possibly use the above assignment.
For the third loop, in the print you just need add $columns2{$_} in front part of the string to be printed to get the original ID to be printed before the recorded ID.
Scope:
The problem is with scope of the hash variables you have defined. The scope of the variable is limited to the loop inside which the variable has been defined.
In your code, since %columns1 and %columns2 are used outside the while loops. Hence, they should be defined outside the loops.
Compilation error : braces not closed properly
Also, in the "if exists" part, the open-and-closed braces symmetry is affected.
Here is your code with the required corrections made:
use warnings;
use strict;
use diagnostics;
use vars qw( #fields1 $recoded $original $IDF #fields2);
my (%columns1, %columns2);
open (FILE1, "<file1.txt") || die "$!\n Couldn't open CFC_recoded.txt\n";
while ($_ = <FILE1>)
{
chomp;
#fields1=split /\s+/, $_;
my $recoded = $fields1[0];
my $original = $fields1[3];
%columns1 = (
$recoded => $original
);
}
open (FILE2, "<file2.txt") || die "$!\n Couldnt open CFC_F.xlsx \n";
for ($_ = <FILE2>)
{
chomp;
#fields2=split /\s+/, $_;
my $IDF= $fields2[0];
my $F=$fields2[1];
%columns2 = (
$F => $IDF
);
}
close FILE1;
close FILE2;
open (FILE3, ">output.txt") ||die "output problem\n";
for (keys %columns1) {
print FILE3 "$_ $columns1{$_} \n" if exists $columns2{$_};
}
close FILE3;
I am fairly new to Perl so hopefully this has a quick solution.
I have been trying to combine two files based on a key. The problem is there are multiple values instead of the one it is returning. Is there a way to loop through the hash to get the 1-10 more values it could be getting?
Example:
File Input 1:
12345|AA|BB|CC
23456|DD|EE|FF
File Input2:
12345|A|B|C
12345|D|E|F
12345|G|H|I
23456|J|K|L
23456|M|N|O
32342|P|Q|R
The reason I put those last one in is because the second file has a lot of values I don’t want but file 1 I want all values. The result I want is something like this:
WANTED OUTPUT:
12345|AA|BB|CC|A|B|C
12345|AA|BB|CC|D|E|F
12345|AA|BB|CC|G|H|I
23456|DD|EE|FF|J|K|L
23456|DD|EE|FF|M|N|O
Attached is the code I am currently using. It gives an output like so:
OUTPUT I AM GETTING:
12345|AA|BB|CC|A|B|C
23456|DD|EE|FF|J|K|L
My code so far:
#use strict;
#use warnings;
open file1, "<FILE1.txt";
open file2, "<FILE2.txt";
while(<file2>){
my($line) = $_;
chomp $line;
my($key, $value1, $value2, $value3) = $line =~ /(.+)\|(.+)\|(.+)\|(.+)/;
$value4 = "$value1|$value2|$value3";
$file2Hash{$key} = $value4;
}
while(<file1>){
my ($line) = $_;
chomp $line;
my($key, $value1, $value2, $value3) = $line =~/(.+)\|(.+)\|(.+)\|(.+)/;
if (exists $file2Hash{$key}) {
print $line."|".$file2Hash{$key}."\n";
}
else {
print $line."\n";
}
}
Thank you for any help you may provide,
Your overall idea is sound. However in file2, if you encounter a key you have already defined, you overwrite it with a new value. To work around that, we store an array(-ref) inside our hash.
So in your first loop, we do:
push #{$file2Hash{$key}}, $value4;
The #{...} is just array dereferencing syntax.
In your second loop, we do:
if (exists $file2Hash{$key}){
foreach my $second_value (#{$file2Hash{$key}}) {
print "$line|$second_value\n";
}
} else {
print $line."\n";
}
Beyond that, you might want to declare %file2Hash with my so you can reactivate strict.
Keys in a hash must be unique. If keys in file1 are unique, use file1 to create the hash. If keys are not unique in either file, you have to use a more complicated data structure: hash of arrays, i.e. store several values at each unique key.
I assume that each key in FILE1.txt is unique and that each unique key has at least one corresponding line in FILE2.txt.
Your approach is then quite close to what you need, you should just use FILE1.txt to create the hash from (as already mentioned here).
The following should work:
#!/usr/bin/perl
use strict;
use warnings;
my %file1hash;
open file1, "<", "FILE1.txt" or die "$!\n";
while (<file1>) {
my ($key, $rest) = split /\|/, $_, 2;
chomp $rest;
$file1hash{$key} = $rest;
}
close file1;
open file2, "<", "FILE2.txt" or die "$!\n";
while (<file2>) {
my ($key, $rest) = split /\|/, $_, 2;
if (exists $file1hash{$key}) {
chomp $rest;
printf "%s|%s|%s\n", $key, $file1hash{$key}, $rest;
}
}
close file2;
exit 0;
I have two files, one with text and another with key / hash values. I want to replace occurrences of the key with the hash values. The following code does this, what I want to know is if there is a better way than the foreach loop I am using.
Thanks all
Edit: I know it is a bit strange using
s/\n//;
s/\r//;
instead of chomp, but this works on files with mixed end of line characters (edited both on windows and linux) and chomp (I think) does not.
File with key / hash values (hash.tsv):
strict $tr|ct
warnings w#rn|ng5
here h3r3
File with text (doc.txt):
Do you like use warnings and strict?
I do not like use warnings and strict.
Do you like them here or there?
I do not like them here or there?
I do not like them anywhere.
I do not like use warnings and strict.
I will not obey your good coding practice edict.
The perl script:
#!/usr/bin/perl
use strict;
use warnings;
open (fh_hash, "<", "hash.tsv") or die "could not open file $!";
my %hash =();
while (<fh_hash>)
{
s/\n//;
s/\r//;
my #tmp_hash = split(/\t/);
$hash{ #tmp_hash[0] } = #tmp_hash[1];
}
close (fh_hash);
open (fh_in, "<", "doc.txt") or die "could not open file $!";
open (fh_out, ">", "doc.out") or die "could not open file $!";
while (<fh_in>)
{
foreach my $key ( keys %hash )
{
s/$key/$hash{$key}/g;
}
print fh_out;
}
close (fh_in);
close (fh_out);
One problem with
for my $key (keys %hash) {
s/$key/$hash{$key}/g;
}
is it doesn't correctly handle
foo => bar
bar => foo
Instead of swapping, you end up with all "foo" or all "bar", and you can't even control which.
# Do once, not once per line
my $pat = join '|', map quotemeta, keys %hash;
s/($pat)/$hash{$1}/g;
You might also want to handle
foo => bar
food => baz
by taking the longest rather than possibly ending with "bard".
# Do once, not once per line
my $pat =
join '|',
map quotemeta,
sort { length($b) <=> length($a) }
keys %hash;
s/($pat)/$hash{$1}/g;
You can read a whole file into a variable a replace all occurrences at once for each key-val.
Something like:
use strict;
use warnings;
use YAML;
use File::Slurp;
my $href = YAML::LoadFile("hash.yaml");
my $text = read_file("text.txt");
foreach (keys %$href) {
$text =~ s/$_/$href->{$_}/g;
}
open (my $fh_out, ">", "doc.out") or die "could not open file $!";
print $fh_out $text;
close $fh_out;
produces:
Do you like use w#rn|ng5 and $tr|ct?
I do not like use w#rn|ng5 and $tr|ct.
Do you like them h3r3 or th3r3?
I do not like them h3r3 or th3r3?
I do not like them anywh3r3.
I do not like use w#rn|ng5 and $tr|ct.
I will not obey your good coding practice edict.
For shorting a code i used YAML and replaced your input file with:
strict: $tr|ct
warnings: w#rn|ng5
here: h3r3
and used File::Slurp for reading a whole file into a variable. Of course, you can "slurp" the file without File::Slurp, for example with:
my $text;
{
local($/); #or undef $/;
open(my $fh, "<", $file ) or die "problem $!\n";
$text = <$fh>;
close $fh;
}
I've a code as below to parse a text file. Display all words after "Enter:" keyword on all lines of the text file. I'm getting displayed all words after "Enter:" keyword, but i wan't duplicated should not be repeated but its repeating. Please guide me as to wht is wrong in my code.
#! /usr/bin/perl
use strict;
use warnings;
$infile "xyz.txt";
open (FILE, $infile) or die ("can't open file:$!");
if(FILE =~ /ENTER/ ){
#functions = substr($infile, index($infile, 'Enter:'));
#functions =~/#functions//;
%seen=();
#unique = grep { ! $seen{$_} ++ } #array;
while (#unique != ''){
print '#unique\n';
}
}
close (FILE);
Here is a way to do the job, it prints unique words found on each line that begins with the keyword Enter:
#!/usr/bin/perl
use strict;
use warnings;
my $infile = "xyz.txt";
# use 3 arg open with lexical file handler
open my $fh, '<', $infile or die "unable to open '$infile' for reading: $!";
# loop thru all lines
while(my $line = <$fh) {
# remove linefeed;
chomp($line);
# if the line begins with "Enter:"
# remove the keyword "Enter:"
if ($line =~ s/^Enter:\s+//) {
# split the line on whitespaces
# and populate the array with all words found
my #words = split(/\s+/, $line);
# create a hash where the keys are the words found
my %seen = map { $_ => 1 }#words;
# display unique words
print "$_\t" for(keys %seen);
print "\n";
}
}
If I understand you correctly, one problem is that your 'grep' only counts the occurrences of each word. I think you want to use 'map' so that '#unique' only contains the unique words from '#array'. Something like this:
#unique = map {
if (exists($seen{$_})) {
();
} else {
$seen{$_}++; $_;
}
} #array;
I have a file in which every line is an integer which represents an id. What I want to do is just check whether some specific ids are in this list.
But the code didn't work. It never tells me it exists even if 123 is a line in that file. I don't know why? Help appreciated.
open (FILE, "list.txt") or die ("unable to open !");
my #data=<FILE>;
my %lookup =map {chop($_) => undef} #data;
my $element= '123';
if (exists $lookup{$element})
{
print "Exists";
}
Thanks in advance.
You want to ensure you make your hash correctly. The very outdated chop isn't what you want to use. Use chomp instead, and use it on the entire array at once and before you create the hash:
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
chomp( my #data = <$fh> );
my $hash = map { $_, 1 } #data;
With Perl 5.10 and up, you can also use the smart match operator:
my $id = get_id_to_check_for();
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
chomp( my #data = <$fh> );
print "Id found!" if $id ~~ #data;
perldoc -q contain
chop returns the character it chopped, not what was left behind. You perhaps want something like this:
my %lookup = map { substr($_,0,-1) => undef } #data;
However, generally, you should consider using chomp instead of chop to do a more intelligent CRLF removal, so you'd end up with a line like this:
my %lookup =map {chomp; $_ => undef } #data;
Your problem is that chop returns the character chopped, not the resulting string, so you're creating a hash with a single entry for newline. This would be obvious in debugging if you used Data::Dumper to output the resulting hash.
Try this instead:
my #data=<FILE>;
chomp #data;
my %lookup = map {$_ => undef} #data;
This should work... it uses first in List::Util to do the searching, and eliminates the initial map (this is assuming you don't need to store the values for something else immediately after). The chomp is done while searching for the value; see perldoc -f chomp.
use List::Util 'first';
open (my $fh, 'list.txt') or die 'unable to open list.txt!';
my #elements = <$fh>;
my $element = '123';
if (first { chomp; $_ eq $element } #elements)
{
print "Exists";
}
This one may not exactly match your specific problem,
but if your integer numbers need to be
counted, you might even use the good
old "canonical" perl approach:
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
my %lookup;
while( <$fh> ) { chomp; $lookup{$_}++ } # this will count occurences if ints
my $element = '123';
if( exists $lookup{$element} ) {
print "$element $lookup{$element} times there\n"
}
This might even be in some circumstances faster than
solutions with intermediate array.
Regards
rbo