I have two files, one with text and another with key / hash values. I want to replace occurrences of the key with the hash values. The following code does this, what I want to know is if there is a better way than the foreach loop I am using.
Thanks all
Edit: I know it is a bit strange using
s/\n//;
s/\r//;
instead of chomp, but this works on files with mixed end of line characters (edited both on windows and linux) and chomp (I think) does not.
File with key / hash values (hash.tsv):
strict $tr|ct
warnings w#rn|ng5
here h3r3
File with text (doc.txt):
Do you like use warnings and strict?
I do not like use warnings and strict.
Do you like them here or there?
I do not like them here or there?
I do not like them anywhere.
I do not like use warnings and strict.
I will not obey your good coding practice edict.
The perl script:
#!/usr/bin/perl
use strict;
use warnings;
open (fh_hash, "<", "hash.tsv") or die "could not open file $!";
my %hash =();
while (<fh_hash>)
{
s/\n//;
s/\r//;
my #tmp_hash = split(/\t/);
$hash{ #tmp_hash[0] } = #tmp_hash[1];
}
close (fh_hash);
open (fh_in, "<", "doc.txt") or die "could not open file $!";
open (fh_out, ">", "doc.out") or die "could not open file $!";
while (<fh_in>)
{
foreach my $key ( keys %hash )
{
s/$key/$hash{$key}/g;
}
print fh_out;
}
close (fh_in);
close (fh_out);
One problem with
for my $key (keys %hash) {
s/$key/$hash{$key}/g;
}
is it doesn't correctly handle
foo => bar
bar => foo
Instead of swapping, you end up with all "foo" or all "bar", and you can't even control which.
# Do once, not once per line
my $pat = join '|', map quotemeta, keys %hash;
s/($pat)/$hash{$1}/g;
You might also want to handle
foo => bar
food => baz
by taking the longest rather than possibly ending with "bard".
# Do once, not once per line
my $pat =
join '|',
map quotemeta,
sort { length($b) <=> length($a) }
keys %hash;
s/($pat)/$hash{$1}/g;
You can read a whole file into a variable a replace all occurrences at once for each key-val.
Something like:
use strict;
use warnings;
use YAML;
use File::Slurp;
my $href = YAML::LoadFile("hash.yaml");
my $text = read_file("text.txt");
foreach (keys %$href) {
$text =~ s/$_/$href->{$_}/g;
}
open (my $fh_out, ">", "doc.out") or die "could not open file $!";
print $fh_out $text;
close $fh_out;
produces:
Do you like use w#rn|ng5 and $tr|ct?
I do not like use w#rn|ng5 and $tr|ct.
Do you like them h3r3 or th3r3?
I do not like them h3r3 or th3r3?
I do not like them anywh3r3.
I do not like use w#rn|ng5 and $tr|ct.
I will not obey your good coding practice edict.
For shorting a code i used YAML and replaced your input file with:
strict: $tr|ct
warnings: w#rn|ng5
here: h3r3
and used File::Slurp for reading a whole file into a variable. Of course, you can "slurp" the file without File::Slurp, for example with:
my $text;
{
local($/); #or undef $/;
open(my $fh, "<", $file ) or die "problem $!\n";
$text = <$fh>;
close $fh;
}
Related
Can't call method print on an undefined value in line 40 line 2.
Here is the code. I use FileHandle to settle files:
#!/usr/bin/perl
use strict;
use warnings;
use FileHandle;
die unless (#ARGV ==4|| #ARGV ==5);
my #input =();
$input[0]=$ARGV[3];
$input[1]=$ARGV[4] if ($#ARGV==4);
chomp #input;
$input[0] =~ /([^\/]+)$/;
my $out = "$1.insert";
my $lane= "$1";
my %fh=();
open (Info,">$ARGV[1]") || die "$!";
open (AA,"<$ARGV[0]") || die "$!";
while(<AA>){
chomp;
my #inf=split;
my $iden=$inf[0];
my $outputfile="$ARGV[2]/$iden";
$fh{$iden}=FileHandle->new(">$outputfile");
}
close AA;
foreach my $input (#input) {
open (IN, "<$input" ) or die "$!" ;
my #path=split (/\//,$input);
print Info "#$path[-1]\n";
while (<IN>) {
my $line1 = $_;
my ($id1,$iden1) = (split "\t", $line1)[6,7];
my $line2 = <IN> ;
my ($id2,$iden2) = (split "\t", $line2)[6,7];
if ($id1 eq '+' && $id2 eq '-') {
my #inf=split(/\t/,$line1);
$fh{$iden1}->print($line1);
$fh{$iden2}->print($line2);
}
}
close IN;
}
I’ve tried multiple variations of this, but none of them seem to work. Any ideas?
Please remember that the primary worth of a Stack Overflow post is not to fix your particular problem, but to help the thousands of others who may be stuck in the same way. With that in mind, "I fixed it, thanks, bye" is more than a little selfish
As I said in my comment, using open directly on a hash element is much preferable to involving FileHandle. Perl will autovivify the hash element and create a file handle for you, and most people at all familiar with Perl will thank you for not making them read up again on the FileHandle documentation
I rewrote your code like this, which is much more Perlish and relies less on "magic numbers" to access #ARGV. You should really assign #ARGV to a list of named scalars, or - better still - use Getopt::Long so that they are named anyway
You should open your file handles as late as possible, and close the output handles early. This is effected most easily by using lexical file handles and limiting their scope to a block. Perl will implicitly close lexical handles for you when they go out of scope
There is no need to chomp the contents of #ARGVunless you could be be called under strange and errant circumstances, in which case you need to do a hell of a lot more to verify the input
You never use the result of $input[0] =~ /([^\/]+)$/ or the variables $out and $lane, so I removed them
#!/usr/bin/perl
use strict;
use warnings 'all';
# $ARGV[0] -- input file
# $ARGV[1] -- output log file
# $ARGV[2] -- directory for outputs per ident
# $ARGV[3] -- 1, $input[0]
# $ARGV[4] -- 2, $input[1] or undef
die "Fix the parameters" unless #ARGV == 4 or #ARGV == 5;
my #input = #ARGV[3,4];
my %fh;
{
open my $fh, '<', $ARGV[0] or die $!;
while ( <$fh> ) {
my $id = ( split )[0];
my $outputfile = "$ARGV[2]/$id";
open $fh{$id}, '>', $outputfile or die qq{Unable to open "$outputfile" for output: $!};
}
}
open my $log_fh, '>', $ARGV[1] or die qq{Unable to open "$ARGV[1]" for output: $!};
for my $input ( #input ) {
next unless $input; # skip unspecified parameters
my #path = split qr|/|, $input; # Really should be done by File::Spec
print $log_fh "#$path[-1]\n"; # Or File::Basename
open my $fh, '<', $input or die qq{Unable to open "$input" for input: $!};
while ( my $line0 = <$fh> ) {
chomp $line0;
my $line1 = <$fh>;
chomp $line1;
my ($id0, $iden0) = (split /\t/, $line0)[6,7];
my ($id1, $iden1) = (split /\t/, $line1)[6,7];
if ( $id0 eq '+' and $id1 eq '-' ) {
$fh{$_} or die qq{No output file for "$_"} for $iden0, $iden1;
print { $fh{$iden0} } $line0;
print { $fh{$iden1} } $line1;
}
}
}
while ( my ($iden, $fh) = each %fh ) {
close $fh or die qq{Unable to close file handle for "$iden": $!};
}
You don't have any error handling on this line:
$fh{$iden}=FileHandle->new(">$outputfile");
It's possible that opening a filehandle is silently failing, and only producing an error when you try to print to it. For example, if you have specified an invalid filename.
Also, you never check if $iden1 and $iden2 are names of open filehandles that actually exist. It's possible one of them does not exist.
In particular, you aren't removing a newline from $line1, so if $iden1 and $iden2 happen to be the last values on the line, this will be included in the name you are trying to use, and it will fail.
In your first while loop, you set up a hash of filehandles that you will write to later. The keys in this hash are the "iden" strings from the first file passed to the program.
Later, you parse another file and use the "iden" values in that file to choose which filehandle to write data to. But one (or more) of the "iden" values in the second file is missing from the first file. So that filehandle can't be found in the %fh hash. Because you don't check for that, you get `undef back from the hash and you can't print to an undefined filehandle.
To fix it, put a check before trying to use one of the filehandles from the %fh hash.
die "Unknown fh identifier '$iden1'" unless exists $fh{$iden1};
die "Unknown fh identifier '$iden2'" unless exists $fh{$iden2};
$fh{$iden1}->print($line1);
$fh{$iden2}->print($line2);
i have a very simple perl question regarding pattern matching problem.
I am reading file with a list of names (fileA).
I would like to check if any of these names exist in another file (fileB).
if ($name -e $fileB){
do something
}else{
do something else
}
it is in a way to check if a pattern exists in a file.
I have tried
open(IN, $controls) or die "Can't open the control file\n";
while(my $line = <IN>){
if ($name =~ $line ){
print "$name\tfound\n";
}else{
print "$name\tnotFound\n";
}
}
This is repeating itself as it checks and prints every entry rather than checking whether the name exists or not.
When you are doing compare one list to another, you're interested in hashes. A hash is an array that is keyed and the list itself has no order. A hash can only have a single instance of a particular key (but different keys can have the same data).
What you can do is go through the first file, and create a hash keyed by that line. Then, you go through the second folder and check to see if any of those lines match any keys in your hash:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE;
# Get each line as a hash key
my %line_hash;
while ( my $line = <$first_fh> ) {
chomp $line;
$line_hash{$line} = 1;
}
close $first_fh;
Now each line is a key in your hash %line_hash. The data really doesn't matter. The important part is the value of the key itself.
Now that I have my hash of the lines in the first file, I can read in the second file and see if that line exists in my hash:
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
There's a map function too that can be used:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE
chomp ( my #lines = <$first_fh> );
# Get each line as a hash key
my %line_hash = map { $_ => 1 } #lines;
close $first_fh;
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
I am not a great fan of map because I don't find it that much more efficient and it is harder to understand what is going on.
To check whether a pattern exists in a file, you have to open the file and read its content. The fastest way how to search for inclusion of two lists is to store the content in a hash:
#!/usr/bin/perl
use strict;
use warnings;
open my $LST, '<', 'fileA' or die "fileA: $!\n";
open my $FB, '<', 'fileB' or die "fileB: $!\n";
my %hash;
while (<$FB>) {
chomp;
undef $hash{$_};
}
while (<$LST>) {
chomp;
if (exists $hash{$_}) {
print "$_ exists in fileB.\n";
}
}
I have just given an algorithm kind of code which is not tested.
But i feel this does the job for you.
my #a;
my $matched
my $line;
open(A,"fileA");
open(A,"fileB");
while(<A>)
{
chomp;
push #a,$_;
}
while(<B>)
{
chomp;
$line=$_;
$matched=0;
for(#a){if($line=~/$_/){last;$matched=1}}
if($matched)
{
do something
}
else
{
do something else
}
}
I have been learning Perl for a few days and I am completely new.
The code is supposed to read from a big file and if a line contains "warning" it should store it and print it on a new line and also count the number of appearances of each type of warning. There are different types of warnings in the file e.g "warning GR145" or "warning GT10" etc.
So I want to print something like
Warning GR145 14 warnings
Warning GT10 12 warnings
and so on
The problem is that when I run it, it doesnt print the whole list of warnings.
I will appreciate your help. Here is the code:
use strict;
use warnings;
my #warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if($line =~ /warning ([a-zA-Z0-9]*):/) {
push (#warnings, $line);
print $1 ,"\n";
}
}
close $file;
You are using case sensitive matching in your if statement. Try adding a /i:
if($line =~ /warning ([a-z0-9]*):/i)
EDIT: I misread the actual question, so this answer could be ignored...
You need to use a hash array, a mapping from warning string to occurrence count.
use strict;
use warnings;
my %warnings = {};
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if ($line =~ /warning ([a-zA-Z0-9]*)\:.*/) {
++$warnings{$1};
}
}
close $file;
foreach $w (keys %warnings) {
print $w, ": ", $warnings{$w}, "\n";
}
I want to write the key and value pair that i have populated in the hash.I am using
open(OUTFILE,">>output_file.txt");
{
foreach my $name(keys %HoH) {
my $values = $HoH{$name};
print "$name: $values\n";
}
}
close(OUTFILE);
Somehow it creates the output_file.txt but it does not write the data to it.What could be the reason?
Use:
print OUTFILE "$name: $values\n";
Without specifying the filehandle in the print statement, you are printing to STDOUT, which is by default the console.
open my $outfile, '>>', "output_file.txt";
print $outfile map { "$_: $HOH{$_}\n" } keys %HoH;
close($outfile);
I cleaned up for code, using the map function here would be more concise. Also I used my variables for the file handles, always good practice. There are still more ways to do this, you should check out Perl Cook book, here
When you open OUTFILE you have a couple of choices for how to write to it. One, you can specify the filehandle in your print statements, or two, you can select the filehandle and then print normally (without specifying a filehandle). You're doing neither. I'll demonstrate:
use strict;
use warnings;
use autodie;
my $filename = 'somefile.txt';
open my( $filehandle ), '>>', $filename;
foreach my $name ( keys %HoH ) {
print $filehandle "$name: $HoH{$name}\n";
}
close $filehandle;
If you were to use select, you could do it this way:
use strict;
use warnings;
use autodie;
my $filename = 'somefile.txt';
open my( $filehandle ), '>>', $filename;
my $oldout = select $filehandle;
foreach my $name( keys %HoH ) {
print "$name: $HoH{$name}\n";
}
close $filehandle;
select $oldout;
Each method has its uses, but more often than not, in the interest of writing clear and easy to read/maintain code, you use the first approach unless you have a real good reason.
Just remember, whenever you're printing to a file, specify the filehandle in your print statement.
sergio's answer of specifying the filehandle is the best one.
Nonetheless there is another way: use select to change the default output filehandle. And in another alternate way to do things, using while ( each ) rather than foreach ( keys ) can be better in some cases (particularly, when the hash is tied to a file somehow and it would take a lot of memory to get all the keys at once).
open(OUTFILE,">>output_file.txt");
select OUTFILE;
while (my ($name, $value) = each %HoH) {
print "$name: $value\n";
}
close(OUTFILE);
I have a file in which every line is an integer which represents an id. What I want to do is just check whether some specific ids are in this list.
But the code didn't work. It never tells me it exists even if 123 is a line in that file. I don't know why? Help appreciated.
open (FILE, "list.txt") or die ("unable to open !");
my #data=<FILE>;
my %lookup =map {chop($_) => undef} #data;
my $element= '123';
if (exists $lookup{$element})
{
print "Exists";
}
Thanks in advance.
You want to ensure you make your hash correctly. The very outdated chop isn't what you want to use. Use chomp instead, and use it on the entire array at once and before you create the hash:
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
chomp( my #data = <$fh> );
my $hash = map { $_, 1 } #data;
With Perl 5.10 and up, you can also use the smart match operator:
my $id = get_id_to_check_for();
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
chomp( my #data = <$fh> );
print "Id found!" if $id ~~ #data;
perldoc -q contain
chop returns the character it chopped, not what was left behind. You perhaps want something like this:
my %lookup = map { substr($_,0,-1) => undef } #data;
However, generally, you should consider using chomp instead of chop to do a more intelligent CRLF removal, so you'd end up with a line like this:
my %lookup =map {chomp; $_ => undef } #data;
Your problem is that chop returns the character chopped, not the resulting string, so you're creating a hash with a single entry for newline. This would be obvious in debugging if you used Data::Dumper to output the resulting hash.
Try this instead:
my #data=<FILE>;
chomp #data;
my %lookup = map {$_ => undef} #data;
This should work... it uses first in List::Util to do the searching, and eliminates the initial map (this is assuming you don't need to store the values for something else immediately after). The chomp is done while searching for the value; see perldoc -f chomp.
use List::Util 'first';
open (my $fh, 'list.txt') or die 'unable to open list.txt!';
my #elements = <$fh>;
my $element = '123';
if (first { chomp; $_ eq $element } #elements)
{
print "Exists";
}
This one may not exactly match your specific problem,
but if your integer numbers need to be
counted, you might even use the good
old "canonical" perl approach:
open my $fh, '<', 'list.txt' or die "unable to open list.txt: $!";
my %lookup;
while( <$fh> ) { chomp; $lookup{$_}++ } # this will count occurences if ints
my $element = '123';
if( exists $lookup{$element} ) {
print "$element $lookup{$element} times there\n"
}
This might even be in some circumstances faster than
solutions with intermediate array.
Regards
rbo