I am new to Perl scripting and have to maintain someone else's script. There is a subroutine that parses the content of a config file (a standard format with three columns).
What is the line $info{$name}{$b}{address}=$address used for?
Is it a hash?
How do I access the parsed content in the main code?
For example, for each name, get the son's name and address.
my $msg="";
my @names;
my %info=parseCfg($file);

foreach $name (@names) {
    $msg="-I-: Working on $name\n";
    $a=$info{}{}{};

sub parseCfg {
    my $file=$_[0];
    if (-e $file) {
        open (F,"<$file") or die "Fail to open $file\n";
        $msg="-I-: Reading from config file: $file\n";
        print $msg; print LOG $msg;
        my %seen;
        while (<F>) {
            my ($name,$b,$address)=@fields;
            push (@names,$name);
            $info{$name}{$b}{address}=$address;
        }
        close F;
    } else {
        die "-E-: Missing config file $file\n";
    }
    return %info;
}
Example of config file:
Format: Name son's_name address
Adam aaa xxx
Billy bbb yyy
Cindy ccc sss
You're recommended to add use strict; use warnings; at the top, so that most syntax and compilation errors are caught early and the code stays clean.
I just ran your code, and it still has compilation errors. Please paste code that compiles and runs when posting on SO; it helps the community answer your question faster.
I have re-written your code and it gives the result you mentioned, the son's name and address. This only works if the Names in your input file are unique. If two people have the same name with different sons and addresses, the logic needs to be altered (see the sketch after the result below).
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $file = "/path/to/file/file.txt";
my %info = parseCfg($file);

foreach my $name (keys %info) {
    print "-I-: Working on $name\n";
    print "SON: $info{$name}{'SON'}\n";
    print "ADDRESS: $info{$name}{'ADDRESS'}\n";
}

sub parseCfg {
    my $file = shift;
    my %data;
    return if !(-e $file);
    open(my $fh, "<", $file) or die "Can't open < $file: $!";
    my $msg = "-I-: Reading from config file: $file\n";
    print $msg; #print LOG $msg;
    my %seen;
    while (<$fh>) {
        my @fields = split(" ", $_);
        my ($name, $b, $address) = @fields;
        $data{$name}{'SON'} = $b;
        $data{$name}{'ADDRESS'} = $address;
    }
    close $fh;
    return %data;
}
Result:
-I-: Reading from config file: /path/to/file/file.txt
-I-: Working on Adam
SON: aaa
ADDRESS: xxx
-I-: Working on Billy
SON: bbb
ADDRESS: yyy
-I-: Working on Cindy
SON: ccc
ADDRESS: sss
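If the input can contain the same Name more than once (the caveat mentioned above), one possible variation is to keep a list of records per name instead of a single SON/ADDRESS pair. This is only a sketch along the same lines as the code above, not tested against your real data:

use strict;
use warnings;

my $file = "/path/to/file/file.txt";
my %info = parseCfg($file);

foreach my $name (sort keys %info) {
    print "-I-: Working on $name\n";
    foreach my $rec (@{ $info{$name} }) {
        print "SON: $rec->{SON}\n";
        print "ADDRESS: $rec->{ADDRESS}\n";
    }
}

sub parseCfg {
    my $file = shift;
    my %data;
    return if !(-e $file);
    open(my $fh, "<", $file) or die "Can't open < $file: $!";
    while (<$fh>) {
        my ($name, $son, $address) = split " ", $_;
        # keep every record, so duplicate names are not lost
        push @{ $data{$name} }, { SON => $son, ADDRESS => $address };
    }
    close $fh;
    return %data;
}

Each name then maps to an array of hash references, so repeated names keep all of their entries.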
Hope it helps you.
Related
I'm incredibly new to Perl, and never have been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded, or multi-faceted. Anyway, my question today is about a boggle I cannot get over when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of text in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have trouble making things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";

while (<FILE>) {
    print "$_";
    print "$searchString\n";
    if (($_ =~ /$searchString/) and ($count == "0")) {
        ++ $count;
        print FILE2 $_;
    } else {
        print "This isn't working\n";
    }
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles), we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";

open my $in_fh, '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";

my %seen;
while (<$in_fh>) {
    print $out_fh $_ unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
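For a quick ad-hoc run, the same filter also fits in a one-liner, equivalent to the script above:
$ perl -ne 'print unless $seen{$_}++' /tmp/cutdown.txt > /tmp/output.txt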
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
If your file is not very large, you can store each line read from the input file as a key in a hash variable, and then print the hash keys (ordered). Something like this:
my %lines = ();
my $order = 1;

open my $fhi, "<", $file2 or die "Cannot open file: $!";
while ( my $line = <$fhi> ) {
    $lines{$line} = $order++;
}
close $fhi;

open my $fho, ">", $file3 or die "Cannot open file: $!";

# Sort the keys, only if needed
my @ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);

for my $key ( @ordered_lines ) {
    print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;

open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '>', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";

my %seen;
while (my $line = <$fh_in>) {
    # check if we have already seen this line
    if (not $seen{$line}) {
        print $fh_out $line;
    }
    # remember this line
    $seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;

my %seen;
while (my $line = <DATA>) {
    # check if we have already seen this line
    if (not $seen{$line}) {
        print $line;
    }
    # remember this line
    $seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows faster than linearly, but your data structure is very flat.
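If the file really is bigger than that, one way to shrink the hash, a sketch that assumes exact duplicate lines and accepts the astronomically small chance of a digest collision, is to store a fixed-size digest of each line instead of the line itself, using the core Digest::MD5 module:

use strict;
use warnings;
use Digest::MD5 qw(md5);

my %seen;
while (<>) {
    # remember a 16-byte digest per unique line instead of the full line
    print unless $seen{ md5($_) }++;
}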
I prepared the following script that takes a GI ID number from NCBI, stored in my TSV file, and prints the scientific name associated with the ID:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

my ($filename) = @ARGV;
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};

while (<>) {
    my ($taxonid, $counts) = (split /\t/);
    for my $each ($taxonid) {
        print "$each\n";
        my $db = Bio::DB::Taxonomy->new(-source => 'entrez');
        my $taxon = $db->get_taxon(-taxonid => $taxonid);
        print "Taxon ID is $taxon->id, \n";
        print "Scientific name is ", $taxon->scientific_name, "\n";
    }
}
With this script, I receive the following:
1760
Taxon ID is Bio::Taxon=HASH(0x33a91f8)->id,
Scientific name is Actinobacteria
What I want to do
Now the next step is for me to list the full taxon path of the bacteria in question. So for the above example, I want to see k__Bacteria; p__Actinobacteria; c__Actinobacteria as output. Furthermore, I want the GI IDs in my table to be replaced with this full taxon path.
In which direction should I go?
First, I notice you open $filename, which is your first command line argument, but you never use the filehandle $fh you created.
So these two lines are not needed in your case, because you already read the input with <>:
my ($filename) = @ARGV;
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
Next: I don't know what is inside your file and your database, so I cannot help you more. Can you provide an example of what is inside your database and your file?
One more thing: you do not need to create your $db instance inside the loop.
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

while (<>) {
    my ($taxonid, $counts) = (split /\t/);
    for my $each ($taxonid) {
        print "$each\n";
        my $taxon = $db->get_taxon(-taxonid => $taxonid);
        print "Taxon ID is ", $taxon->id, "\n";
        print "Scientific name is ", $taxon->scientific_name, "\n";
    }
}
Edit
From your comment it is hard to help you. When you write
my $taxon = $db->get_taxon(-taxonid => $taxonid);
You receive a Bio::Taxon node; the documentation can be found here.
I don't know what k__Bacteria; p__Actinobacteria; c__Actinobacteria represents for you. Is it information offered by a Bio::Taxon node?
Anyway, you can still explore $taxon with this:
#!/usr/bin/env perl
# Author: Yves Chevallier
# Date:
use strict;
use warnings;
use Data::Dumper;
use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

while (<DATA>) {
    my ($taxonid, $counts) = (split /\t/);
    for my $each ($taxonid) {
        print "$each\n";
        my $taxon = $db->get_taxon(-taxonid => $taxonid);
        print Dumper $taxon;
        print "Taxon ID is ", $taxon->id, "\n";
        print "Scientific name is ", $taxon->scientific_name, "\n";
    }
}
__DATA__
12 1760
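As a possible direction for the full taxon path, here is a sketch that assumes the documented ancestor() lookup of Bio::DB::Taxonomy and the rank()/scientific_name() accessors of Bio::Taxon: walk from the node up to the root, collect each rank and name, and then map the ranks you care about onto your k__/p__/c__ prefixes yourself.

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

my $taxon = $db->get_taxon(-taxonid => 1760);
my @lineage;
while (defined $taxon) {
    my $rank = $taxon->rank || 'no rank';
    # collect "rank__name" pairs from the node up to the root
    push @lineage, $rank . '__' . $taxon->scientific_name;
    $taxon = $db->ancestor($taxon);   # undef once we pass the root
}
print join('; ', reverse @lineage), "\n";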
I have been learning Perl for a few days and I am completely new.
The code is supposed to read from a big file and, if a line contains "warning", store it, print it on a new line, and also count the number of appearances of each type of warning. There are different types of warnings in the file, e.g. "warning GR145" or "warning GT10".
So I want to print something like
Warning GR145 14 warnings
Warning GT10 12 warnings
and so on
The problem is that when I run it, it doesn't print the whole list of warnings.
I would appreciate your help. Here is the code:
use strict;
use warnings;

my @warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
    if ($line =~ /warning ([a-zA-Z0-9]*):/) {
        push (@warnings, $line);
        print $1, "\n";
    }
}
close $file;
You are using case-sensitive matching in your if statement. Try adding a /i:
if($line =~ /warning ([a-z0-9]*):/i)
EDIT: I misread the actual question, so this answer could be ignored...
You need to use a hash, mapping each warning string to its occurrence count.
use strict;
use warnings;

my %warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
    if ($line =~ /warning ([a-zA-Z0-9]*)\:.*/) {
        ++$warnings{$1};
    }
}
close $file;

foreach my $w (keys %warnings) {
    print $w, ": ", $warnings{$w}, "\n";
}
I have two files. One file contains only keys and the other has both keys and values. How can I compare a key from one file with a value from the other?
example of file1
steve
robert
sandy
alex
example of file2
age25, steve
age29, alex
age30, mindy
age50, rokuna
age25, steve
example of output
age25, steve
age29, alex
Here is what I have so far:
my $age_name="file1.txt";
my $name="file2.txt";

open my $MYFILE, "<", $name or die "could not open $name \n";
open my $MYFILE2, "<", $age_name or die "could not open $age_name \n";

while (<$MYFILE>) {
    my ($key, $value) = split(",");
    my $secondfile = <$MYFILE2>;
    if ( defined $secondfile ) {
        my ($key2, $value2) = split(",");
        if ($value2 =~ m/$key/) {
            print "$key2 - $value2 \n";
        }
    }
}
close $MYFILE;
close $MYFILE2;
You are reading one line from the first file and one line from the second file. The problem is that those lines do not have to be related. The classical solution is to read one file into a hash and then use the hash for lookup while reading the second one:
#!/usr/bin/perl
use strict;
use warnings;

my %age_of;

open my $AGE, '<', 'file2.txt' or die $!;
while (<$AGE>) {
    chomp;
    my ($age, $name) = split /, /;
    $age_of{$name} = $age;
}

open my $NAME, '<', 'file1.txt' or die $!;
while (<$NAME>) {
    chomp;
    print "$age_of{$_}, $_\n" if exists $age_of{$_};
}
I have a simple log file which is very messy and I need it to be neat. The file contains log headers, but they are all jumbled up together, so I need to sort the log file according to the log headers. There is no fixed number of lines under each header. I am using Perl's grep to sort out the headers.
The Log files goes something like this:
Car LogFile Header
<text>
<text>
<text>
Car LogFile Header
<text>
Car LogFile Header
<and so forth>
I have put together a simple attempt, but it does not seem to be working. Can someone please guide me? Thanks!
#!/usr/bin/perl
#use 5.010; # must be present to import the new 5.10 functions, notice
#that it is 5.010 not 5.10

my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";

open(FH, $srce);
my @buf = <FH>;
close(FH);

my @lines = grep (/$string1/, @buffer);
After executing the code, there is no result shown at the terminal. Any ideas?
I think you want something like:
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";

open my $fh, '<', $srce or die "Could not open $srce: $!";

my @lines = sort grep /\Q$string1/, <$fh>;

print @lines;
Make sure you have the right file path and that the file has lines that match your test pattern.
It seems like you are missing a lot of very basic concepts and maybe cutting and pasting code you see elsewhere. If you're just starting out, pick up a Perl tutorial such as Learning Perl. There are other books and references listed in perlfaq2.
Always use:
use strict;
use warnings;
This would have told you that @buffer is not defined.
#!/usr/bin/perl
use strict;
use warnings;

my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";

open(my $FH, $srce) or die "Failed to open file $srce ($!)";
my @buf = <$FH>;
close($FH);

my @lines = grep (/$string1/, @buf);
print @lines;
Perl is tricky for experts, so experts use the warnings it provides to protect them from making mistakes. Beginners need to use the warnings so they don't make mistakes they don't even know they can make.
(Because you didn't get a chance to chomp the input lines, you still have newlines at the end so the print prints the headings one per line.)
I don't think grep is what you want really.
As you pointed out in brian's answer, the grep will only give you the headers and not the subsequent lines.
I think you need an array where each element is the header and the subsequent lines up to the next header.
Something like this:
#!/usr/bin/perl
use strict;
use warnings;

my $srce = "./default.log";
my $string1 = "Car LogFile Header";

my @logs;
my $log_entry;

open(my $FH, $srce) or die "Failed to open file $srce ($!)";

my $found = 0;
while (my $buf = <$FH>)
{
    if ($buf =~ /$string1/)
    {
        if ($found)
        {
            push @logs, $log_entry;
        }
        $found = 1;
        $log_entry = $buf;
    }
    else
    {
        $log_entry = $log_entry . $buf;
    }
}
if ($found)
{
    push @logs, $log_entry;
}
close($FH);

print sort @logs;
I think it's what is being asked for.
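Another way to get the same grouping, sketched here under the assumption that the header text always starts a record, is to slurp the file and split on a lookahead for the header, so each element already contains the header plus its following lines:

use strict;
use warnings;

my $srce = "./default.log";
my $string1 = "Car LogFile Header";

open my $fh, '<', $srce or die "Could not open $srce: $!";
my $content = do { local $/; <$fh> };   # slurp the whole log
close $fh;

# split just before every header; each piece is one complete log entry
my @logs = grep { /\S/ } split /(?=\Q$string1\E)/, $content;
print sort @logs;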
Perl's grep is not the same as the Unix grep command, in that it does not print anything on the screen.
The general syntax is: grep Expr, LIST
It evaluates Expr for each element of LIST and returns a list consisting of those elements for which the expression evaluated to true.
In your case, all the @buffer elements that contain the value of $string1 will be returned.
You can then print the resulting @lines array to actually see them.
You just stored everything in an array instead of printing it out. It's also not necessary to keep the whole file in memory. You can read and print the match results line by line, like this:
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";

open(FH, $srce);
while (my $line = <FH>) {
    if ($line =~ m/$string1/) {
        print $line;
    }
}
close FH;
Hello, I found a way to extract links from an HTML file.
#!/usr/bin/perl -w

# Links graber 1.0
# Author : peacengell
# 28.02.13

####

my $file_links = "links.txt";
my @line;
my $line;

open( FILE, $file_links ) or die "Can't find File";

while (<FILE>) {
    chomp;
    $line = $_;

    @word = split (/\s+/, $line);
    @word = grep(/href/, @word);
    foreach $x (@word) {
        if ( $x =~ m /ul.to/ ) {
            $x =~ s/href="//g;
            $x =~ s/"//g;
            print "$x \n";
        }
    }
}
You can use it and modify it; please let me know if you modify it.
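One possible modification, sketched under the assumption that the links always sit inside double-quoted href attributes, is to capture the URL directly instead of splitting on whitespace and stripping the quotes afterwards:

use strict;
use warnings;

my $file_links = "links.txt";
open my $fh, '<', $file_links or die "Can't open $file_links: $!";
while (my $line = <$fh>) {
    # grab every double-quoted href value on the line that mentions ul.to
    while ( $line =~ /href="([^"]*ul\.to[^"]*)"/g ) {
        print "$1\n";
    }
}
close $fh;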