Search an element in a hash table - perl

I have created a hash table from a text file like this:
use strict;
use warnings;
my %h;
open my $fh, '<', 'tst' or die "failed open 'tst' $!";
while ( <$fh> ) {
push @{$h{keys}}, (split /\t/)[0];
}
close $fh;
use Data::Dumper;
print Dumper \%h;
Now I want to look up a field from another text file in the hash table.
If it exists, the current line should be written to a result file:
use strict;
use warnings;
my %h;
open my $fh, '<', 'tst' or die "failed open 'tst' $!";
while ( <$fh> ) {
push @{$h{keys}}, (split /\t/)[0];
}
close $fh;
use Data::Dumper;
print Dumper \%h;
open (my $fh1,"<", "exp") or die "Can't open the file: ";
while (my $line =<$fh1>){
chomp ($line);
my ($var)=split(">", $line);
if exists $h{$var};
print ($line);
}
I got these errors:
syntax error at codeperl.pl line 26, near "if exists"
Global symbol "$line" requires explicit package name at codeperl.pl line 27.
syntax error at codeperl.pl line 29, near "}"
Execution of codeperl.pl aborted due to compilation errors.
Any idea please?

What is there to say? The statement if exists $h{$var}; is a syntax error: the postfix form of if needs a statement in front of it. You may want:
print $line, "\n" if exists $h{$var};
or
if (exists $h{$var}) {
print $line, "\n";
}
The other errors will go away once you fix that. If you get multiple errors, always look at the first one (the one with the lowest line number). Later errors are often a consequence of an earlier one; in this case, the syntax error threw off the parsing of everything after it.
Edit
Your main problem isn't the syntax error; it is how you populate your hash. The line
push @{$h{keys}}, (split /\t/)[0];
pushes the first field of each line onto the array reference stored in $h{keys} (the key is the literal string keys). It seems to me that you actually want to use this field as the key:
my ($key) = split /\t/;
$h{$key} = undef; # any value will do.
After that, your Dumper \%h will produce something like
$VAR1 = {
'# ries bibliothèques électroniques à travers' => undef,
'a a pour les ressortissants des' => undef,
'a a priori aucune hiérarchie des' => undef,
};
and your lookup via exists should work.
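Putting the two pieces together, a minimal sketch of the whole lookup might look like this. It assumes, as in the question, that the key is the first tab-separated field of the first file and that the field to look up is everything before the first > in the second file. The subroutine names load_keys and matching_lines and the sample data are made up for illustration, and in-memory filehandles stand in for the files tst and exp.

```perl
use strict;
use warnings;

# Build a lookup hash whose keys are the first tab-separated field of each line.
sub load_keys {
    my ($fh) = @_;
    my %h;
    while (my $line = <$fh>) {
        chomp $line;                          # keep the newline out of the key
        my ($key) = split /\t/, $line;
        $h{$key} = undef if defined $key;     # only the key matters
    }
    return \%h;
}

# Return the lines whose text before the first '>' is a key in the hash.
sub matching_lines {
    my ($fh, $h) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        my ($var) = split />/, $line;
        push @found, $line if defined $var && exists $h->{$var};
    }
    return @found;
}

# Hypothetical sample data; in-memory filehandles stand in for 'tst' and 'exp'.
my $tst = "alpha\tx\nbeta\ty\n";
my $exp = "alpha>first\ngamma>second\nbeta>third\n";
open my $fh,  '<', \$tst or die "Cannot open tst: $!";
open my $fh1, '<', \$exp or die "Cannot open exp: $!";

my $h     = load_keys($fh);
my @lines = matching_lines($fh1, $h);
print "$_\n" for @lines;
```

With real files you would open 'tst' and 'exp' by name instead and print to a result filehandle rather than STDOUT.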

Just try your code like this.
First, build your hash:
my %hash;
while (<$file1>) {
# get your key from the current line
my $key = (split)[0];
# store the key in the hash
$hash{$key} = 1;
}
Second, check each line of the other file:
while (<$file2>) {
# get the field you want to check
my $value = (split)[0];
# see if $value exists
if (exists $hash{$value}) {
print "got $value\n";
}
}

Related

Parsing data from delimited blocks

I have a log file containing many blocks delimited by /begin CHECK ... /end CHECK, like below:
/begin CHECK
Var_AAA
"Description AAA"
DATATYPE UBYTE
Max_Value 255.
ADDRESS 0xFF0011
/end CHECK
/begin CHECK
Var_BBB
"Description BBB"
DATATYPE UBYTE
Max_Value 255.
ADDRESS 0xFF0022
/end CHECK
...
I want to extract the variable name and its address, then write to a new file like this
Name Address
Var_AAA => 0xFF0011
Var_BBB => 0xFF0022
My idea is to use ($start, $keyword, $end) to detect each block and extract only the data after the keyword:
#!/usr/bin/perl
use strict;
use warnings;
my $input = 'input.log';
my $output = 'output.out';
my ( $start, $keyword, $end ) = ( '^\/begin CHECK\n\n', 'ADDRESS ', '\/end CHECK' );
my @block;
# open input file for reading
open( my $in, '<', $input ) or die "Cannot open file '$input' for reading: $!";
# open destination file for writing
open( my $out, '>', $output ) or die "Cannot open file '$output' for writing: $!";
print( "copying variable name and it's address from $input to $output \n" );
while ( $in ) { #For each line of input
if ( /$start/i .. /$end/i ) { #Block matching
push @block, $_;
}
if ( /$end/i ) {
for ( @block ) {
if ( /\s+ $keyword/ ) {
print $out join( '', @block );
last;
}
}
@block = ();
}
close $in or die "Cannot close file '$input': $!";
}
close $out or die "Cannot close file '$output': $!";
But I get nothing when I run it. Can anyone suggest an approach, with a sample?
Most of it looks good, but your start regex is causing the first problem:
'^\/begin CHECK\n\n'
You are reading lines from the file but then looking for two newlines in a row. That's never going to match, because a line ends with exactly one newline (unless you change $/, but that's a different topic). If you want to match the end of a line, you can use the $ anchor (or \z, once you have chomped the line):
'^\/begin CHECK$'
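A quick check of what each anchor does with one line as it comes from the file, newline included (a small sketch; the sample string just mirrors one line of the log):

```perl
use strict;
use warnings;

my $line = "/begin CHECK\n";   # one line as read from the file, newline included

my $two_newlines = ($line =~ /^\/begin CHECK\n\n/) ? 1 : 0;  # needs two newlines: no match
my $dollar       = ($line =~ /^\/begin CHECK$/)     ? 1 : 0;  # $ may match before a final newline: match
my $end_z        = ($line =~ /^\/begin CHECK\z/)    ? 1 : 0;  # \z is the very end of the string: no match

chomp(my $chomped = $line);
my $end_z_chomped = ($chomped =~ /^\/begin CHECK\z/) ? 1 : 0; # after chomp: match

print "\\n\\n=$two_newlines \$=$dollar \\z=$end_z \\z-after-chomp=$end_z_chomped\n";
```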
Here's the program I pared down. You can adjust it to do all the rest of the stuff that you need to do:
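use v5.10;
use strict;
use warnings;
use Data::Dumper;
my ($start, $keyword, $end) = (qr{^/begin CHECK$}, qr(^ADDRESS ), qr(^/end CHECK));
while (<DATA>) #For each line of input
{
state @block;
chomp;
if (/$start/i .. /$end/i) #Block matching
{
push @block, $_ unless /^\s*$/;
}
if( /$end/i )
{
print Dumper( \@block );
@block = ();
}
}
__DATA__
/begin CHECK
Var_AAA
"Description AAA"
DATATYPE UBYTE
Max_Value 255.
ADDRESS 0xFF0011
/end CHECK
/begin CHECK
Var_BBB
"Description BBB"
DATATYPE UBYTE
Max_Value 255.
ADDRESS 0xFF0022
/end CHECK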
After that, you're not reading the data. You need to put the filehandle inside <> (the line input operator):
while ( <$in> )
The file handles will close themselves at the end of the program automatically. If you want to close them yourself that's fine but don't do that until you are done. Don't close $in until the while is finished.
Using the command prompt on Windows, you can do the following (on macOS or Unix the logic is the same, but wrap the code in single quotes instead, with double quotes around /end CHECK, so that the shell does not expand $1 and $2):
perl -wpe "BEGIN{$/='/end CHECK'} s/^.*?(Var_\S+).*?ADDRESS\s+(\S+).*$/$1 => $2\n/s" "your_file.txt" > "new.txt"
First we set the input record separator $/ to "/end CHECK", so each record read is one whole block. Doing this in a BEGIN block makes it take effect before the first record is read.
We then keep only the first Var_ name and the first ADDRESS value and delete everything else, using single-line mode (/s), in which the dot also matches the line break \n: s/^.*?(Var_\S+).*?ADDRESS\s+(\S+).*$/$1 => $2\n/s.
We then write the results into a new file with the redirection > "new.txt".
The switches are -w for warnings, -p to print each record after the code has run, and -e to supply the code itself.
If you leave out the > "new.txt" part, the results are printed to the console so that you can check them first; with it, open new.txt and everything will be there.
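The same record-separator idea also works inside a script. A minimal sketch, assuming the block layout shown in the question; the sample text is copied from it and read through an in-memory filehandle instead of your_file.txt:

```perl
use strict;
use warnings;

# Sample input standing in for your_file.txt (layout taken from the question).
my $log = <<'END';
/begin CHECK
Var_AAA
"Description AAA"
DATATYPE UBYTE
Max_Value 255.
ADDRESS 0xFF0011
/end CHECK
/begin CHECK
Var_BBB
"Description BBB"
DATATYPE UBYTE
Max_Value 255.
ADDRESS 0xFF0022
/end CHECK
END

open my $in, '<', \$log or die "Cannot open input: $!";

my @pairs;
{
    # Each read now returns everything up to and including '/end CHECK'.
    local $/ = '/end CHECK';
    while (my $record = <$in>) {
        # Same idea as the one-liner, written as a match; /s lets .*? cross newlines.
        if ($record =~ /(Var_\S+).*?ADDRESS\s+(\S+)/s) {
            push @pairs, "$1 => $2";
        }
    }
}
print "$_\n" for @pairs;
```

The trailing newline after the last /end CHECK comes back as a final record of its own; it simply does not match and is skipped.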
Here are some of the issues with your code:
You have while ($in) instead of while ( <$in> ), so your program never reads from the input file
You close your input file handle inside the while read loop, so you can only ever read one record
Your $start regex pattern is '^\/begin CHECK\n\n'. The single quotes keep a backslash and an n in the string, but the regex engine still reads \n as a newline, so the pattern requires two newlines in a row, which a single line read from the file never contains
Your test if (/\s+ $keyword/) looks for one or more whitespace characters, followed by a space, followed by ADDRESS (the contents of $keyword). There are no occurrences of ADDRESS preceded by whitespace anywhere in your data
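A small sketch of what that single-quoted pattern actually does once it is interpolated into a regex; the test strings are made up to mirror one line of the log:

```perl
use strict;
use warnings;

# The start pattern from the question, exactly as written (single quotes).
my $start = '^\/begin CHECK\n\n';

# The string really contains a backslash followed by "n" ...
my $has_backslash_n = index($start, '\n') >= 0 ? 1 : 0;

# ... but the regex engine reads \n as "newline", so the pattern does match
# text that contains two newlines in a row.
my $matches_two = ("/begin CHECK\n\n" =~ /$start/) ? 1 : 0;

# A single line read from a file ends with one newline, so it never matches.
my $matches_line = ("/begin CHECK\n" =~ /$start/) ? 1 : 0;

print "backslash-n in string: $has_backslash_n\n";
print "two newlines: $matches_two, one line: $matches_line\n";
```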
You have also written far too much without testing anything. You should start by writing your read loop on its own and making sure that the data is coming in correctly, and then add two or three lines of code at a time between tests. Writing 90% of the functionality before testing is a very bad approach.
In future, to help you address problems like this, I would point you to the excellent resources linked on the Stack Overflow Perl tag information page
The only slightly obscure thing here is that the range operator /$start/i .. /$end/i returns a useful value; I have copied it into $status. Outside the range it returns an empty string. On the first line of a range the result is 1, on the second line 2, and so on. The last line is different: the number has the string E0 appended, as in 9E0, so it still evaluates to the correct count, but you can detect the last line by checking for /E/. I've used == 1 and /E/ to avoid pushing the begin and end lines onto @block.
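To see those values directly, here is a small sketch that prints what the range operator returns for each line; the input lines are made up around a single block:

```perl
use strict;
use warnings;

# A few made-up lines around one block.
my @lines = ('junk', '/begin CHECK', 'Var_AAA', 'ADDRESS 0xFF0011', '/end CHECK', 'junk');

my @statuses;
for (@lines) {
    # Scalar-context .. : '' outside the range, then 1, 2, ... inside it,
    # and the last number of the range carries an 'E0' suffix.
    my $status = m{^/begin CHECK}i .. m{^/end CHECK}i;
    push @statuses, $status;
    printf "%-18s %s\n", $_, $status eq '' ? '(false)' : $status;
}
```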
I don't think there's anything else overly complex here that you can't find described in the Perl language reference
use strict;
use warnings;
use autodie; # Handle bad IO status automatically
use List::Util 'max';
my ($input, $output) = qw/ input.log output.txt /;
open my $in_fh, '<', $input;
my ( @block, @vars );
while ( <$in_fh> ) {
my $status = m{^/begin CHECK}i .. m{^/end CHECK}i;
if ( $status =~ /E/ ) { # End line
@block = grep /\S/, @block;
chomp @block;
my $var = $block[0];
my $addr;
for ( @block ) {
if ( /^ADDRESS\s+(0x\w+)/ ) {
$addr = $1;
last;
}
}
push @vars, [ $var, $addr ];
@block = ();
}
elsif ( $status ) {
push @block, $_ unless $status == 1;
}
}
# Format and generate the output
open my $out_fh, '>', $output;
my $w = max map { length $_->[0] } @vars;
printf $out_fh "%-*s => %s\n", $w, @$_ for [qw/ Name Address / ], @vars;
close $out_fh;
output
Name    => Address
Var_AAA => 0xFF0011
Var_BBB => 0xFF0022
Update
For what it's worth, I would have written something like this. It produces the same output as above
use strict;
use warnings;
use autodie; # Handle bad IO status automatically
use List::Util 'max';
my ($input, $output) = qw/ input.log output.txt /;
my $data = do {
open my $in_fh, '<', $input;
local $/;
<$in_fh>;
};
my @vars;
while ( $data =~ m{^/begin CHECK$(.+?)^/end CHECK$}gms ) {
my $block = $1;
next unless $block =~ m{(\w+).+?ADDRESS\s+(0x\w+)}ms;
push @vars, [ $1, $2 ];
}
open my $out_fh, '>', $output;
my $w = max map { length $_->[0] } @vars;
printf $out_fh "%-*s => %s\n", $w, @$_ for [qw/ Name Address / ], @vars;
close $out_fh;

print hashes with values from different files

I want to create an output file that has values from file 1 and file 2.
The line from file 1:
chr1 Cufflinks exon 708356 708487 1000 - .
gene_id "CUFF.3"; transcript_id "CUFF.3.1"; exon_number "5"; FPKM
"3.1300591420"; frac "1.000000"; conf_lo "2.502470"; conf_hi
"3.757648"; cov "7.589085"; chr1Cufflinks exon 708356
708487 . - . gene_id "XLOC_001284"; transcript_id
"TCONS_00007667"; exon_number "7"; gene_name "LOC100288069"; oId
"CUFF.15.2"; nearest_ref "NR_033908"; class_code "j"; tss_id
"TSS2981";
The line from file 2:
CUFF.48557
chr4:160253850-160259462:160259621-160260265:160260507-160262715
The second column from this file is unique id (uniq_id).
I want to get output file in the following format:
transcript_id(CUFF_id) uniq_id gene_id(XLOC_ID) FPKM
My script takes the XLOC_ID and FPKM values from the first file and prints them together with the two columns from the second file.
#!/usr/bin/perl -w
use strict;
my $v_merge_gtf = shift @ARGV or die $!;
my $unique_gtf = shift @ARGV or die $!;
my %fpkm_hash;
my %xloc_hash;
open (FILE, "$v_merge_gtf") or die $!;
while (<FILE>) {
my $line = $_;
chomp $line;
if ($line =~ /[a-z]/) {
my @array = split("\t", $line);
if ($array[2] eq 'exon') {
my $id = $array[8];
if ($id =~ /transcript_id \"(CUFF\S+)/) {
$id = $1;
$id =~ s/\"//g;
$id =~ s/;//;
}
my $fpkm = $array[8];
if ($fpkm =~ /FPKM \"(\S+)/) {
$fpkm = $1;
$fpkm =~ s/\"//g;
$fpkm =~ s/;//;
}
my $xloc = $array[17];
if ($xloc =~ /gene_id \"(XLOC\S+)/) {
$xloc = $1;
$xloc =~ s/\"//g;
$xloc =~ s/;//;
}
$fpkm_hash{$id} = $fpkm;
$xloc_hash{$id} = $xloc;
}
}
}
close FILE;
open (FILE, "$unique_gtf") or die $!;
while (<FILE>) {
my $line = $_;
chomp $line;
if ($line =~ /[a-z]/) {
my @array = split("\t", $line);
my $id = $array[0];
my $uniq = $array[1];
print $id . "\t" . $uniq . "\t" . $xloc_hash{$id} . "\t" . $fpkm_hash{$id} . "\n";
}
}
close FILE;
I declared the hashes outside the file-reading loops, but I get the following warning for each of these CUFF values:
CUFF.24093
chr17:3533641-3539345:3527526-3533498:3526786-3527341:3524707-3526632
Use of uninitialized value in concatenation (.) or string at ex_1.pl
line 55, line 9343.
Use of uninitialized value in concatenation (.) or string at ex_1.pl
line 55, line 9343.
How can I fix this issue?
Thank you!
I think the warning appears because the $id key (CUFF.24093) that you read on line 9343 of the second file isn't in the hashes you built from the first file.
Is it possible that an ID in the second file isn't contained in the first file? That seems to be the case here.
If so, and you just want to skip over this unknown ID, you could add a line to your program like:
my $id = $array[0];
my $uniq = $array[1];
next unless exists $fpkm_hash{$id}; # add this line
print $id . "\t" . $uniq . "\t" . $xloc_hash{$id} . "\t" . $fpkm_hash{$id} . "\n";
This will bypass the following print statement and go back to the top of the while loop and read in the next line and continue processing.
It depends on what action you want to take if you encounter an unknown ID.
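If, instead of skipping unknown IDs, you would rather keep them in the output with a placeholder, a sketch using the defined-or operator // (Perl 5.10 and later) could look like this. The table contents and the uniq_id strings are hypothetical stand-ins for what your two files produce, and 'NA' is just an assumed placeholder:

```perl
use strict;
use warnings;

# Hypothetical lookup tables standing in for %xloc_hash and %fpkm_hash
# built from the first file (values taken from the sample line in the question).
my %xloc_hash = ('CUFF.3' => 'XLOC_001284');
my %fpkm_hash = ('CUFF.3' => '3.1300591420');

# Hypothetical rows from the second file: [ID, uniq_id]; CUFF.24093 is not in the tables.
my @rows = (
    [ 'CUFF.3',     'chr1:708356-708487' ],
    [ 'CUFF.24093', 'chr17:3533641-3539345' ],
);

my @out;
for my $row (@rows) {
    my ($id, $uniq) = @$row;
    # // falls back to 'NA' when the value is undefined (ID missing),
    # so nothing undefined reaches join and no warning is printed.
    my $xloc = $xloc_hash{$id} // 'NA';
    my $fpkm = $fpkm_hash{$id} // 'NA';
    push @out, join("\t", $id, $uniq, $xloc, $fpkm);
}
print "$_\n" for @out;
```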
Update: I thought I might make some observations/improvements to your code.
my $v_merge_gtf = shift @ARGV or die $!;
my $unique_gtf = shift @ARGV or die $!;
The error variable $! serves no purpose here (a fact I only recently discovered, even after 14 years of using Perl). $! is only set by system calls, where you are involving the operating system. The most common are open and close for files, and opendir and closedir for directories. If an error occurs when opening or closing a file or a directory, $! will contain the error message. (See in my included code how I handled this: I created a message, $usage, to be printed if the shift didn't succeed.)
Instead of using 2 hashes to store the information, I used 1 hash, %data. The idea is that each ID is stored only once instead of twice and its values stay together (although each inner hash has some overhead of its own). You could still use the 2 hashes if you like.
I used the recommended 3-argument (filehandle, mode, filename) form for opening the files. The 2-argument form you used is outdated and less safe (for reasons I won't go into here). Also, the lexical filehandles I used, my $mrg and my $unique, are the newer way to create filehandles (instead of using FILE for both of your opens).
You can assign directly to $line in your while loop, as in while (my $line = <FILE>), instead of the way you did it. In my sample program I didn't assign to $line but relied on the default variable $_ instead. (It simplifies the 2 following statements, next unless /\S/; and my @array = split /\t/;.) I didn't chomp in the first loop because you're only parsing inside the string and aren't using anything from its end. chomp is necessary in the second while loop because the second variable, $uniq, would have a newline at its end if chomp didn't remove it.
I didn't know what you meant by the statement if ($line =~ /[a-z]/). I am assuming you wanted to skip empty lines and only process lines with non-space data. That's why I wrote next unless /\S/; instead (it says to skip the following statements, go to the top of the while loop and read the next record).
Your first while loop worked because there were no errors in your input file. If there had been errors, the way you wrote the code could have been a problem.
The statement my $id = $array[8]; gives $id a value that would have been used wrongly if the following if condition had been false. (The same applies to the 2 other variables you capture, $fpkm and $xloc.) You can see in my code example how I handled this.
In my code, I die if a match doesn't succeed. You might not want to die, but instead use next to skip to the next line of data. It depends on how you want to handle a failed match.
And in the line $array[8] =~ /gene_id "(CUFF\S+)";/, note that I put the "; after the captured data, so there is no need to remove those characters from the captured value (as you did with your substitutions).
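A small sketch of that difference, using the attribute text from the sample line in the question:

```perl
use strict;
use warnings;

# Attribute text copied from the sample line in the question.
my $attr = 'gene_id "CUFF.3"; transcript_id "CUFF.3.1"; exon_number "5"; FPKM "3.1300591420";';

# Requiring '";' right after the capture keeps the quote and semicolon out of $1.
my ($gene_id) = $attr =~ /gene_id "(CUFF\S+)";/;
my ($fpkm)    = $attr =~ /FPKM "(\S+)";/;

# Without it, \S+ runs on through the closing quote and semicolon,
# which is why the original code needed extra s/// clean-up steps.
my ($loose) = $attr =~ /transcript_id "(CUFF\S+)/;

print "$gene_id\n$fpkm\n$loose\n";
```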
Well, I know this is a long comment on your code, but I hope you get some good ideas about why I recommended the changes given.
The die messages in my code use $., for example:
or die "Could not find ID in $v_merge_gtf (line# $.)";
$. holds the current line number of the file being read.
#!/usr/bin/perl
use warnings;
use strict;
my $usage = "USAGE: perl $0 merge_gtf_file unique_gtf_file\n";
my $v_merge_gtf = shift @ARGV or die $usage;
my $unique_gtf = shift @ARGV or die $usage;
my %data;
open my $mrg, '<', $v_merge_gtf or die $!;
while (<$mrg>) {
next unless /\S/;
my @array = split /\t/;
if ($array[2] eq 'exon') {
$array[8] =~ /gene_id "(CUFF\S+)";/
or die "Could not find ID in $v_merge_gtf (line# $.)";
my $id = $1;
$array[8] =~ /FPKM "(\S+)";/
or die "Could not find FPKM in $v_merge_gtf (line# $.)";
my $fpkm = $1;
$array[17] =~ /gene_id "(XLOC\S+)";/
or die "Could not find XLOC in $v_merge_gtf (line# $.)";
my $xloc = $1;
$data{$id}{fpkm} = $fpkm;
$data{$id}{xloc} = $xloc;
}
}
close $mrg or die $!;
open my $unique, '<', $unique_gtf or die $!;
while (<$unique>) {
next unless /\S/;
chomp;
my ($id, $uniq) = split /\t/;
next unless exists $data{$id}; # skip IDs that are not in the first file
print join("\t", $id, $uniq, $data{$id}{xloc}, $data{$id}{fpkm}), "\n";
}
close $unique or die $!;

Program in Perl that reads from file, finds a line containing specific character and prints them.

I have been learning Perl for a few days and I am completely new.
The code is supposed to read from a big file and, if a line contains "warning", store it, print it on a new line and also count the number of appearances of each type of warning. There are different types of warnings in the file, e.g. "warning GR145" or "warning GT10".
So I want to print something like
Warning GR145 14 warnings
Warning GT10 12 warnings
and so on
The problem is that when I run it, it doesn't print the whole list of warnings.
I will appreciate your help. Here is the code:
use strict;
use warnings;
my @warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if($line =~ /warning ([a-zA-Z0-9]*):/) {
push (@warnings, $line);
print $1 ,"\n";
}
}
close $file;
You are using case sensitive matching in your if statement. Try adding a /i:
if($line =~ /warning ([a-z0-9]*):/i)
EDIT: I misread the actual question, so this answer could be ignored...
You need a hash that maps each warning string to its occurrence count.
use strict;
use warnings;
my %warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if ($line =~ /warning ([a-zA-Z0-9]*)\:.*/) {
++$warnings{$1};
}
}
close $file;
foreach my $w (keys %warnings) {
print $w, ": ", $warnings{$w}, "\n";
}
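To get the report in the format the question asks for ("Warning GR145 14 warnings"), the counts can be sorted and formatted once the loop is done. A minimal sketch, assuming lines of the form "warning CODE: message" as the question's regex implies; the sample lines are made up, and the sort order (most frequent first, then by name) is just one reasonable choice:

```perl
use strict;
use warnings;

# Made-up sample lines standing in for Warnings.txt.
my @input = (
    "warning GR145: first\n",
    "Warning GT10: second\n",
    "warning GR145: third\n",
    "no problem here\n",
);

my %count;
for my $line (@input) {
    # /i accepts "warning" and "Warning"; \w+ captures the code before the colon.
    $count{$1}++ if $line =~ /warning (\w+):/i;
}

# Most frequent first, ties broken alphabetically.
my @report = map { "Warning $_ $count{$_} warnings" }
             sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
print "$_\n" for @report;
```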

Alternative to foreach loop with hashes in perl

I have two files, one with text and another with key / hash values. I want to replace occurrences of each key with its hash value. The following code does this; what I want to know is whether there is a better way than the foreach loop I am using.
Thanks all
Edit: I know it is a bit strange using
s/\n//;
s/\r//;
instead of chomp, but this works on files with mixed end of line characters (edited both on windows and linux) and chomp (I think) does not.
File with key / hash values (hash.tsv):
strict $tr|ct
warnings w#rn|ng5
here h3r3
File with text (doc.txt):
Do you like use warnings and strict?
I do not like use warnings and strict.
Do you like them here or there?
I do not like them here or there?
I do not like them anywhere.
I do not like use warnings and strict.
I will not obey your good coding practice edict.
The perl script:
#!/usr/bin/perl
use strict;
use warnings;
open (fh_hash, "<", "hash.tsv") or die "could not open file $!";
my %hash =();
while (<fh_hash>)
{
s/\n//;
s/\r//;
my @tmp_hash = split(/\t/);
$hash{ @tmp_hash[0] } = @tmp_hash[1];
}
close (fh_hash);
open (fh_in, "<", "doc.txt") or die "could not open file $!";
open (fh_out, ">", "doc.out") or die "could not open file $!";
while (<fh_in>)
{
foreach my $key ( keys %hash )
{
s/$key/$hash{$key}/g;
}
print fh_out;
}
close (fh_in);
close (fh_out);
One problem with
for my $key (keys %hash) {
s/$key/$hash{$key}/g;
}
is it doesn't correctly handle
foo => bar
bar => foo
Instead of swapping, you end up with all "foo" or all "bar", and you can't even control which.
# Do once, not once per line
my $pat = join '|', map quotemeta, keys %hash;
s/($pat)/$hash{$1}/g;
You might also want to handle
foo => bar
food => baz
by taking the longest rather than possibly ending with "bard".
# Do once, not once per line
my $pat =
join '|',
map quotemeta,
sort { length($b) <=> length($a) }
keys %hash;
s/($pat)/$hash{$1}/g;
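Put together, the single-pattern approach might look like this sketch. The replacement table is hard-coded here as a stand-in for what the question loads from hash.tsv, and the input lines are taken from doc.txt:

```perl
use strict;
use warnings;

# Stand-in for the table read from hash.tsv.
my %hash = (strict => '$tr|ct', warnings => 'w#rn|ng5', here => 'h3r3');

# Build the pattern once: escape metacharacters, try longer keys first.
my $pat = join '|', map quotemeta, sort { length($b) <=> length($a) } keys %hash;
my $re  = qr/($pat)/;

my @out;
for my $line ('Do you like use warnings and strict?', 'Do you like them here or there?') {
    (my $copy = $line) =~ s/$re/$hash{$1}/g;   # one pass over the line
    push @out, $copy;
}
print "$_\n" for @out;

# One pass also means a swap table really swaps.
my %swap    = (foo => 'bar', bar => 'foo');
my $swap_re = join '|', map quotemeta, keys %swap;
(my $swapped = 'foo bar') =~ s/($swap_re)/$swap{$1}/g;
print "$swapped\n";
```

Note that "there" still becomes "th3r3"; add \b around the group if only whole words should be replaced.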
You can read a whole file into a variable and replace all occurrences at once for each key-value pair.
Something like:
use strict;
use warnings;
use YAML;
use File::Slurp;
my $href = YAML::LoadFile("hash.yaml");
my $text = read_file("text.txt");
foreach (keys %$href) {
$text =~ s/$_/$href->{$_}/g;
}
open (my $fh_out, ">", "doc.out") or die "could not open file $!";
print $fh_out $text;
close $fh_out;
produces:
Do you like use w#rn|ng5 and $tr|ct?
I do not like use w#rn|ng5 and $tr|ct.
Do you like them h3r3 or th3r3?
I do not like them h3r3 or th3r3?
I do not like them anywh3r3.
I do not like use w#rn|ng5 and $tr|ct.
I will not obey your good coding practice edict.
To shorten the code, I used YAML and replaced your input file with:
strict: $tr|ct
warnings: w#rn|ng5
here: h3r3
and used File::Slurp for reading a whole file into a variable. Of course, you can "slurp" the file without File::Slurp, for example with:
my $text;
{
local($/); #or undef $/;
open(my $fh, "<", $file ) or die "problem $!\n";
$text = <$fh>;
close $fh;
}

How can I check if a value is in a list in Perl?

I have a file in which every line is an integer which represents an id. What I want to do is just check whether some specific ids are in this list.
But the code doesn't work. It never tells me the id exists, even if 123 is a line in that file. I don't know why. Help appreciated.
open (FILE, "list.txt") or die ("unable to open !");
my @data=<FILE>;
my %lookup =map {chop($_) => undef} @data;
my $element= '123';
if (exists $lookup{$element})
{
print "Exists";
}
Thanks in advance.
You want to ensure you make your hash correctly. The very outdated chop isn't what you want to use. Use chomp instead, and use it on the entire array at once and before you create the hash:
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
chomp( my @data = <$fh> );
my %hash = map { $_, 1 } @data;
With Perl 5.10 and up, you can also use the smart match operator (it has been marked experimental since Perl 5.18, so expect warnings):
my $id = get_id_to_check_for();
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
chomp( my @data = <$fh> );
print "Id found!" if $id ~~ @data;
perldoc -q contain
chop returns the character it chopped, not what was left behind. You perhaps want something like this:
my %lookup = map { substr($_,0,-1) => undef } @data;
However, generally, you should consider using chomp instead of chop to do a more intelligent CRLF removal, so you'd end up with a line like this:
my %lookup =map {chomp; $_ => undef } @data;
Your problem is that chop returns the character chopped, not the resulting string, so you're creating a hash with a single entry for newline. This would be obvious in debugging if you used Data::Dumper to output the resulting hash.
Try this instead:
my @data=<FILE>;
chomp @data;
my %lookup = map {$_ => undef} @data;
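A short sketch that makes the chop behaviour visible; the two sample lines are made up to look like lines read from list.txt:

```perl
use strict;
use warnings;

my @data = ("123\n", "456\n");   # lines as read from list.txt

# chop returns the character it removed, not the shortened string.
my $line    = "123\n";
my $removed = chop $line;         # $removed is "\n", $line is now "123"

# Using chop's return value as the key collapses everything into one key, "\n".
my %wrong;
for my $item (@data) {
    my $copy = $item;
    my $key  = chop $copy;
    $wrong{$key} = undef;
}

# chomp first, then build the lookup from the cleaned values.
chomp(my @clean = @data);
my %lookup = map { $_ => undef } @clean;

print "Exists\n" if exists $lookup{'123'};
```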
This should work... it uses first in List::Util to do the searching, and eliminates the initial map (this is assuming you don't need to store the values for something else immediately after). The chomp is done while searching for the value; see perldoc -f chomp.
use List::Util 'first';
open (my $fh, 'list.txt') or die 'unable to open list.txt!';
my @elements = <$fh>;
my $element = '123';
if (first { chomp; $_ eq $element } @elements)
{
print "Exists";
}
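One caveat with first: it returns the matching element itself, so an id of 0 would be found but still test false. List::Util's any (available from List::Util 1.33) returns a plain true or false instead. A small sketch with made-up ids:

```perl
use strict;
use warnings;
use List::Util qw(first any);

chomp(my @elements = ("0\n", "123\n"));   # sample ids, one of them 0

my $element = '0';

# first returns the matching element itself; "0" is false in Perl.
my $via_first = (first { $_ eq $element } @elements) ? 1 : 0;

# any returns true/false, so an id of 0 is still reported as found.
my $via_any = (any { $_ eq $element } @elements) ? 1 : 0;

print "first: $via_first, any: $via_any\n";
```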
This one may not exactly match your specific problem, but if your integers need to be counted, you might even use the good old "canonical" Perl approach:
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
my %lookup;
while( <$fh> ) { chomp; $lookup{$_}++ } # this will count occurrences of ints
my $element = '123';
if( exists $lookup{$element} ) {
print "$element $lookup{$element} times there\n"
}
In some circumstances this might even be faster than solutions with an intermediate array.
Regards
rbo