I'm trying to write a Perl script to parse a directory full of emails and names and extract an email address and corresponding name.
At the moment I search each file for the text "Email Address :" and extract that line, but this is where I am stuck.
The data is in the following format:
Name :John van
Email Address :john@abc.com
I need to get this data into two variables like $name and $email.
Is there a better way to parse the files to get the email address and name? How do I deal with the strings and rearrange them?
Can anyone help please?
data: (the \n is only implicit for understanding)
Name :John van\n
\n
Email Address :john@abc.com\n
\n
regex based:
use Data::Dumper;
# $_ holds the whole file contents; each match adds a name and an email to @data
my @data = m/Name\s*:([A-Za-z\s]*)\n\nEmail Address\s*:([A-Za-z\s]*\@[A-Za-z\s]*\.[A-Za-z]*)\n/g;
print Dumper \@data;
will give
$VAR1 = [
'John van',
'john@abc.com'
];
if you want to do it line based, my approach would be: (not tested - sharpshoot) :)
my @data = (
'Name :john van',
'',
'Email Address :john@abc.com',
''
);
my (@persons, $name, $email);
my $gotName = 0;
my $gotEmail = 0;
for (@data) { # @data stands in for the lines read from your filehandle
if (/^Name/) {
$name = $_;
$name =~ s/.*://;
chomp($name);
$gotName++;
}
if (/^Email/) {
$email = $_;
$email =~ s/.*://;
chomp($email);
$gotEmail++;
}
if ($gotName == 1 and $gotEmail == 1) {
push(@persons, [$name, $email]); # keep each name/email pair together
$gotName = 0;
$gotEmail = 0;
}
}
Is there a better way to parse the files to get the email address and name?
Better than which one?
How do I deal with the strings and rearrange them?
What exactly is the question here?
There's definitely an easier way of doing this but try:
Given this input:
Name: John Van
Email Address: john@abc.com
Name: John Doe
Email Address: johnD@123.com
#!/usr/bin/perl
use warnings;
use strict;
my $emails = 'email.txt';
open my $input, '<', $emails or die "Can't open $emails: $!";
my (%data, @name, @email);
while(<$input>){
push @name, $1 if /Name:\s+(.*)/;
push @email, $1 if /Email Address:\s+(.*)/;
}
close $input;
# Pair each name with the email at the same position, once everything is read
$data{$name[$_]} = $email[$_] for 0 .. $#name;
for my $name (sort keys %data){
my $email = $data{$name};
print "$name\t$email\n";
}
Outputs:
John Doe johnD@123.com
John Van john@abc.com
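A variation on the approach above, offered as a minimal sketch rather than a definitive solution: keep each name together with its email as the lines are read, so the pairs stay in file order and two people with the same name do not overwrite each other. The sub name parse_contacts and the sample text are made up for illustration, and the patterns assume the labels look like "Name :" and "Email Address :" with optional spaces around the colon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of { name => ..., email => ... }
# hash references, in the order the pairs appear in the text.
sub parse_contacts {
    my ($text) = @_;
    my (@contacts, $name);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Name\s*:\s*(.+?)\s*$/) {
            $name = $1;                      # remember the most recent name
        }
        elsif (defined $name and $line =~ /^Email Address\s*:\s*(\S+)/) {
            push @contacts, { name => $name, email => $1 };
            undef $name;                     # wait for the next Name line
        }
    }
    return @contacts;
}

# Sample data (an assumption), mixing both spacings seen in this thread
my $sample = join "\n",
    'Name :John van',
    '',
    'Email Address :john@abc.com',
    'Name: John Doe',
    'Email Address: johnD@123.com',
    '';

my @contacts = parse_contacts($sample);
print "$_->{name}\t$_->{email}\n" for @contacts;
```

For real files, the text passed to parse_contacts would come from reading each file in the directory; an email line that is not preceded by a name line is simply skipped in this sketch.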
I am new to Perl scripting and have to maintain someone else's script. There is a subroutine that parses the content of a config file (a standard format with three columns).
What is the line $info{$name}{$b}{address}=$address used for?
Is it a hash?
How do I access the parsed content in the main code?
For example, for each name, get the son's name and address.
my $msg="";
my @names;
my %info=parseCfg($file);
foreach $name (@names) {
$msg="-I-: Working on $name\n";
$a=$info{}{}{};
sub parseCfg {
my $file=$_[0];
if (-e $file) {
open (F,"<$file") or die "Fail to open $file\n";
$msg="-I-: Reading from config file: $file\n";
print $msg; print LOG $msg;
my %seen;
while (<F>) {
my ($name,$b,$address)=@fields;
push (@names,$name);
$info{$name}{$b}{address}=$address;
}
close F;
} else {
die "-E-: Missing config file $file\n";
}
return %info;
}
Example of config file:
Format: Name son's_name address
Adam aaa xxx
Billy bbb yyy
Cindy ccc sss
I recommend adding use strict; and use warnings; to your script, so that most errors (syntax and compilation) are caught early and the code stays clean.
I ran your code and it still has compilation errors. Please post code that compiles and runs when asking on SO; it helps the community answer your question faster.
I have rewritten your code, and it gives the result you asked for: the son's name and address. This works only if the names in your input file are unique. If two people share a name but have different sons and addresses, the logic needs to be changed.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $file = "/path/to/file/file.txt";
my %info = parseCfg($file);
foreach my $name (sort keys %info){ # sort so the output order is predictable
print "-I-: Working on $name\n";
print "SON: $info{$name}{'SON'}\n";
print "ADDRESS: $info{$name}{'ADDRESS'}\n";
}
sub parseCfg {
my $file = shift;
my %data;
return if !(-e $file);
open(my $fh, "<", $file) or die "Can't open < $file: $!";
my $msg = "-I-: Reading from config file: $file\n";
print $msg; #print LOG $msg;
my %seen;
while (<$fh>) {
my @fields = split(" ", $_);
my ($name, $b, $address) = @fields;
$data{$name}{'SON'} = $b;
$data{$name}{'ADDRESS'} = $address;
}
close $fh;
return %data;
}
Result:
-I-: Reading from config file: /path/to/file/file.txt
-I-: Working on Adam
SON: aaa
ADDRESS: xxx
-I-: Working on Billy
SON: bbb
ADDRESS: yyy
-I-: Working on Cindy
SON: ccc
ADDRESS: sss
Hope it helps you.
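To come back to the original question: yes, $info{$name}{$b}{address} = $address stores into a hash, nested three levels deep (the name, then the son's name, then the literal key address). A minimal sketch of filling and reading such a structure follows; the rows are copied from the sample config above, and the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %info;

# Rows taken from the sample config: name, son's name, address
my @rows = (
    [ 'Adam',  'aaa', 'xxx' ],
    [ 'Billy', 'bbb', 'yyy' ],
    [ 'Cindy', 'ccc', 'sss' ],
);

for my $row (@rows) {
    my ($name, $son, $address) = @$row;
    # Perl creates the intermediate hashes on demand
    $info{$name}{$son}{address} = $address;
}

# Walk the structure: outer keys are names, inner keys are sons' names
for my $name (sort keys %info) {
    for my $son (sort keys %{ $info{$name} }) {
        print "$name: son=$son address=$info{$name}{$son}{address}\n";
    }
}
```

Keying the second level on the son's name lets one person have several sons, each with its own address, which the flattened SON/ADDRESS layout above cannot represent.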
I'm trying to parse a string that uses '#' as a delimiter.
Each record in the string spans 3 lines:
101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#
102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#
103#Battle Instructor#SG_FEEL#QUE_NOIMAGE#
You have spoken to the Battle Instructor#
#
How can I get each value before the '#' delimiter so that I can produce a new format that looks like this?
[101] = {
Title = "Introduction to the Professor",
Description = {
"head up to the Great Hall and speak to the professor to check in for class."
},
Summary = ""
},
[102] = {
Title = "Looking for Instructors",
Description = {
"Look for the Battle Instructor."
},
Summary = "Talk to Battle Instructor"
},
[103] = {
Title = "Battle Instructor",
Description = {
"You have spoken to the Battle Instructor"
},
Summary = ""
},
There will also be multiple records, numbered from 101 to n.
I'm trying to use split with the code below:
#!/usr/bin/perl
use strict;
use warnings;
my $data = '101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#';
my @values = split('#', $data);
foreach my $val (@values) {
print "$val\n";
}
exit 0;
and the output:
101
Introduction to the Professor
SG_FEEL
QUE_NOIMAGE
How do I read multi-line data? And how do I exclude some of the fields? For example, to match the new format, I don't need the SG_FEEL and QUE_NOIMAGE values.
The Perl special variable $/ sets the "input record separator"—the string that Perl uses to decide where a line ends. You can set that to something else.
use v5.26;
use utf8;
use strict;
use warnings;
$/ = "\n\n"; # set the input record separator
while( <DATA> ) {
chomp;
say "$. ------\n", $_;
}
__END__
101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#
102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#
103#Battle Instructor#SG_FEEL#QUE_NOIMAGE#
You have spoken to the Battle Instructor#
#
The output shows that you read whole records with each call to <DATA>:
1 ------
101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#
2 ------
102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#
3 ------
103#Battle Instructor#SG_FEEL#QUE_NOIMAGE#
You have spoken to the Battle Instructor#
#
From there you can parse that record however you need.
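One possible way to parse such a record, as a sketch only: remove the line breaks, split on '#', and pick the fields by position. This assumes every record has the layout id#title#unused#unused#description#summary#, which matches the three samples in the question; the helper names parse_record and format_record are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one raw record into its fields.
# Assumed layout: id#title#unused#unused#description#summary#
sub parse_record {
    my ($record) = @_;
    $record =~ s/[\r\n]+//g;                    # join the record's lines
    my @fields = split /#/, $record, -1;        # -1 keeps empty fields
    my ($id, $title, undef, undef, $description, $summary) = @fields;
    return {
        id          => $id,
        title       => $title,
        description => defined $description ? $description : '',
        summary     => defined $summary     ? $summary     : '',
    };
}

# Hypothetical helper: render a parsed record in the requested layout.
sub format_record {
    my ($r) = @_;
    return join "\n",
        "[$r->{id}] = {",
        qq(Title = "$r->{title}",),
        'Description = {',
        qq("$r->{description}"),
        '},',
        qq(Summary = "$r->{summary}"),
        '},',
        '';
}

my $record = "102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#\n"
           . "Look for the Battle Instructor.#\n"
           . "Talk to Battle Instructor#";
my $parsed = parse_record($record);
print format_record($parsed);
```

The two unused positions are where SG_FEEL and QUE_NOIMAGE land, so they are dropped without any extra filtering.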
Reading multiple lines is easy, see readline:
open my $fh, '<', $filename
or die "Couldn't read '$filename': $!";
my @input = <$fh>;
Now you want to go through all lines and look at what to do with them:
my $linenumber = 0;
my %info; # We want to collect information
while ($linenumber < $#input) {
Each line that starts with nnn# starts a new item:
if( $input[ $linenumber ] =~ /^(\d+)#/ ) {
my @data = split /#/, $input[ $linenumber ];
$info{ number } = $data[0];
$info{ Title } = $data[1];
$info{ Description } = ''; # start each item with an empty description
$linenumber++;
};
Now, read stuff into the description until we encounter an empty line:
while ($input[$linenumber] !~ /^#$/) {
$info{ Description } .= $input[$linenumber];
$linenumber++;
};
$linenumber++; # skip the last "#" line
Now, output the stuff in %info, formatting left as an exercise. I've used qq{} for demonstration purposes. You will want to change that to qq():
print qq{Number: "$info{ number }"\n};
print qq{Title: "$info{ Title }"\n};
print qq(Description: {"$info{ Description }"}\n);
};
Friends, I need help. The following is my input text file:
Andrew UK
Cindy China
Rupa India
Gordon Australia
Peter New Zealand
I want to convert the above into a hash and write the records back to a file when a matching directory exists. I have tried the following (it does not work).
#!/usr/perl/5.14.1/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash = ();
my $file = ".../input_and_output.txt";
my $people;
my $country;
open (my $fh, "<", $file) or die "Can't open the file $file: ";
my $line;
while (my $line =<$fh>) {
my ($people) = split("", $line);
$hash{$people} = 1;
}
foreach my $people (sort keys %hash) {
my @country = $people;
foreach my $c (@country) {
my $c_folder = `country/test1_testdata/17.26.6/$c/`;
if (-d $cad_root){
print "Exit\n";
} else {
print "NA\n";
}
}
This is the primary problem:
my ($people) = split("", $line);
You are splitting on an empty string, and you are assigning the return value to a single variable (which will just end up with the first character of each line).
Instead, you should split on ' ' (a single space character which is a special pattern):
As another special case, ... when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20" , but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
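To make the quoted rule concrete, here is a small sketch comparing the special ' ' pattern with a literal / / pattern, and showing the effect of a field limit. The sample line, with its leading and doubled spaces, is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "  Peter New  Zealand";   # leading and doubled spaces on purpose

my @special = split ' ', $line;      # leading whitespace dropped, runs collapsed
my @literal = split / /, $line;      # every single space is a separator
my @limited = split ' ', $line, 2;   # at most two fields: name, then the rest

print scalar(@special), " fields: ", join('|', @special), "\n";
print scalar(@literal), " fields: ", join('|', @literal), "\n";
print scalar(@limited), " fields: ", join('|', @limited), "\n";
```

The literal pattern yields empty fields for the leading and doubled spaces, while the limited split keeps the multi-word country in one piece.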
Limit the number of fields returned to ensure the integrity of country names with spaces:
#!/usr/bin/env perl
use strict;
use warnings;
my @people;
while (my $line = <DATA>) {
$line =~ /\S/ or next;
$line =~ s/\s+\z//;
push @people, [ split ' ', $line, 2 ];
}
use YAML::XS;
print Dump \@people;
__DATA__
Andrew UK
Cindy China
Rupa India
Gordon Australia
Peter New Zealand
The entries are added to an array so 1) The input order is preserved; and 2) Two people with the same name but from different countries do not result in one entry being lost.
If the order is not important, you could just use a hash keyed on country names with people's names in an array reference for each entry. For now, I am going to assume order matters (it would help us help you if you put more effort into formulating a clear question).
One option is to now go through the list of person-country pairs, and print all those pairs for which the directory country/test1_testdata/17.26.6/$c/ exists (incidentally, in your code you have
my $c_folder = `country/test1_testdata/17.26.6/$c/`;
That will try to execute a program called country/test1_testdata/17.26.6/$c/ and save its output in $c_folder if it produces any. The moral of the story: In programming, precision matters. Just because ` looks like ', that doesn't mean you can use one to mean the other.)
Given that your question is focused on hashes, I use an array of references to anonymous hashes to store the list of people-country pairs in the code below. I cache the result of the lookup to reduce the number of times you need to hit the disk.
#!/usr/bin/env perl
use strict;
use warnings;
@ARGV == 2 ? run( @ARGV )
: die_usage()
;
sub run {
my $people_data_file = shift;
my $country_files_location = shift;
open my $in, '<', $people_data_file
or die "Failed to open '$people_data_file': $!";
my @people;
my %countries;
while (my $line = <$in>) {
next unless $line =~ /\S/; # ignore lines consisting of blanks
$line =~ s/\s+\z//;# remove all trailing whitespace
my ($name, $country) = split ' ', $line, 2;
push @people, { name => $name, country => $country };
$countries{ $country } = undef;
}
# At this point, @people has a list of person-country pairs
# We are going to use %countries to reduce the number of
# times we need to check the existence of a given directory,
# assuming that the directory tree is stable while this program
# is running.
PEOPLE:
for my $person ( @people ) {
my $country = $person->{country};
if ($countries{ $country }) {
print join("\t", $person->{name}, $country), "\n";
}
elsif (-d "$country_files_location/$country/") {
$countries{ $country } = 1;
redo PEOPLE;
}
}
}
sub die_usage {
die "Need data file name and country files location\n";
}
Now, there are a bazillion variations on this, which is why it is important for you to formulate a clear and concise question so people trying to help you can answer your specific questions, instead of each coming up with his/her own solution to the problem as they see it. For example, one could also do this:
#!/usr/bin/env perl
use strict;
use warnings;
@ARGV == 2 ? run( @ARGV )
: die_usage()
;
sub run {
my $people_data_file = shift;
my $country_files_location = shift;
open my $in, '<', $people_data_file
or die "Failed to open '$people_data_file': $!";
my %countries;
while (my $line = <$in>) {
next unless $line =~ /\S/; # ignore lines consisting of blanks
$line =~ s/\s+\z//;# remove all trailing whitespace
my ($name, $country) = split ' ', $line, 2;
push @{ $countries{$country} }, $name;
}
for my $country (keys %countries) {
-d "$country_files_location/$country"
or delete $countries{ $country };
}
# At this point, %countries maps each country for which
# we have a data file to a list of people. We can then
# print those quite simply so long as we don't care about
# replicating the original order of lines from the original
# data file. People's names will still be sorted in order
# of appearance in the original data file for each country.
while (my ($country, $people) = each %countries) {
for my $person ( @$people) {
print join("\t", $person, $country), "\n";
}
}
}
sub die_usage {
die "Need data file name and country files location\n";
}
If what you want is a counter of names in a hash, then I got you, buddy!
I won't attempt the rest of the code because you are checking a folder of records
that I don't have access to so I can't trouble shoot anything more than this.
I see one of your problems. Look at this:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say'; # Really like using say instead of print because no need for newline.
my $file = 'input_file.txt';
my $fh; # A filehandle.
my %hash;
my $people;
my $country;
my $line;
unless(open($fh, '<', $file)){die "Could not open file $file because $!"}
while($line = <$fh>)
{
($people, $country) = split(/\s{2,}/, $line); # splitting on at least two spaces
say "$people \t $country"; # Just printing out the columns in the file or people and Country.
$hash{$people}++; # Just counting all the people in the hash.
# Seeing how many unique names there are, like is there more than one Cindy, etc ...?
}
say "\nNow I'm just sorting the hash of people by names.";
foreach(sort{$a cmp $b} keys %hash)
{
say "$_ => $hash{$_}"; # Based on your file. The counter is at 1 because nobody has the same names.
}
Here is the output. As you can see I fixed the problem by splitting on at least two white-spaces so the country names don't get cut out.
Andrew UK
Cindy China
Rupa India
Gordon Australia
Peter New Zealand
Andrew United States
Now I'm just sorting the hash of people by names.
Andrew => 2
Cindy => 1
Gordon => 1
Peter => 1
Rupa => 1
I added another Andrew to the file. This Andrew is from the United States
as you can see. I see one of your problems. Look at this:
my ($people) = split("", $line);
You are splitting on characters as there is no space between those quotes.
If you look at this change now, you are splitting on at least one space.
my ($people) = split(" ", $line);
I'm new to Perl, so please bear with me. What I'm trying to do is read a file (already using the File::Slurp module) and create variables from the data in the file. Currently I have this setup:
use File::Slurp;
my @targets = read_file("targetfile.txt");
print @targets;
Within that target file, I have the following bits of data:
id: 123456789
name: anytownusa
1.2.3.4/32
5.6.7.8/32
The first line is an ID, the second line is a name, and all successive lines will be IP addresses (maximum length of a few hundred).
So my goal is to read that file and create variables that look something like this:
$var1="123456789";
$var2="anytownusa";
$var3="1.2.3.4/32,5.6.7.8/32,etc,etc,etc,etc,etc";
Note that all the IP addresses end up grouped together in a single variable, separated by commas.
File::Slurp will read the complete file data in one go. This might cause an issue if the file size is very big. Let me show you a simple approach to this problem.
Read file line by line using while loop
Check line number using $. and assign line data to respective variable
Store ips in an array and at the end print them using join
Note: If you have to alter the line data then use search and replace in the respective conditional block before assigning the line data to the variable.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my ($id, $name, @ips);
while(<DATA>){
chomp;
if ($. == 1){
$id = $_;
}
elsif ($. == 2){
$name = $_;
}
else{
push @ips, $_;
}
}
print "$id\n";
print "$name\n";
print join ",", @ips;
__DATA__
id: 123456789
name: anytownusa
1.2.3.4/32
5.6.7.8/32
As it has been noted, there is no reason to "slurp" the whole file into a variable. If nothing else, it only makes the processing harder.
Also, why not store named labels in a hash, in this example
my %identity = (id => 123456789, name => 'anytownusa');
The code below picks up the key names from the file, they aren't hard-coded.
Then
use warnings;
use strict;
use feature 'say';
my (@ips, %identity);
my $file = 'targetfile.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
next if not /\S/;
chomp;
my ($m1, $m2) = split /:/; #/(stop bad syntax highlight)
if ($m1 and $m2) { $identity{$m1} = $m2; }
else { push @ips, $m1; }
}
say "$_: $identity{$_}" for keys %identity;
say join ',', @ips;
If the line doesn't have : the split will return it whole, which will be the ip and which is stored in an array for processing later. Otherwise it returns the named pair, for 'id' and 'name'.
We first skipped blank lines with next if not /\S/;, so the line must have some non-space elements and else suffices, as there is always something in $m1. We also need to remove the newline, with chomp.
Read the file into variables directly:
use Modern::Perl;
my ($id, $name, @ips) = (<DATA>,<DATA>,<DATA>);
chomp ($id, $name, @ips);
say $id;
say $name;
$" = ',';
say "@ips";
__DATA__
id: 123456789
name: anytownusa
1.2.3.4/32
5.6.7.8/32
Output:
id: 123456789
name: anytownusa
1.2.3.4/32,5.6.7.8/32
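The answers above keep the "id: " and "name: " labels in the values. If the goal is the bare values from the question, one possible sketch is below. The helper name parse_target is made up for illustration, and the code assumes that label lines look like "word: value" and that every other non-blank line is an IPv4 address.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of the target file and returns
# the id, the name, and the addresses joined into one comma-separated string.
sub parse_target {
    my @lines = @_;
    my (%field, @ips);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;             # trim whitespace and the newline
        next if $line eq '';
        if ($line =~ /^(\w+):\s*(.*)$/) {
            $field{$1} = $2;                 # "id: 123" becomes id => 123
        }
        else {
            push @ips, $line;                # anything else is treated as an address
        }
    }
    return ($field{id}, $field{name}, join(',', @ips));
}

my ($id, $name, $ips) = parse_target(
    "id: 123456789\n", "name: anytownusa\n", "1.2.3.4/32\n", "5.6.7.8/32\n",
);
print "$id\n$name\n$ips\n";
```

The same sub should also accept the list of lines that read_file returns in list context, as in the question's setup.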
I have a log file like below
ID: COM-1234
Program: Swimming
Name: John Doe
Description: Joined on July 1st
------------------------------ID: COM-2345
Program: Swimming
Name: Brock sen
Description: Joined on July 1st
------------------------------ID: COM-9876
Program: Swimming
Name: johny boy
Description: Joined on July 1st
------------------------------ID: COM-9090
Program: Running
Name: justin kim
Description: Good Record
------------------------------
and I want to group it based on the Program (Swimming , Running etc) and want a display like,
PROGRAM: Swimming
==>ID
COM-1234
COM-2345
COM-9876
PROGRAM: Running
==>ID
COM-9090
I'm very new to Perl and I wrote the below piece (incomplete).
#!/usr/bin/perl
use Data::Dumper;
$/ = "%%%%";
open( AFILE, "D:\\mine\\out.txt");
while (<AFILE>)
{
@temp = split(/-{20,}/, $_);
}
close (AFILE);
my %hash = @new;
print Dumper(\%hash);
I have read from perl tutorials that hash key-value pairs will take unique keys with multiple values but not sure how to make use of it.
I'm able to read a file and store it in a hash, but I am unsure how to process it into the aforementioned format. Any help is really appreciated. Thanks.
I always prefer to write programs like this so they read from STDIN as that makes them more flexible.
I'd do it like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
# Read one "chunk" of data at a time
local $/ = '------------------------------';
# A hash to store our results.
my %program;
# While there's data on STDIN...
while (<>) {
# Remove the end of record marker
chomp;
# Skip any empty records
# (i.e. ones with no printable characters)
next unless /\S/;
# Extract the information that we want with a regex
my ($id, $program) = /ID: (.+?)\n.*Program: (.+?)\n/s;
# Build a hash of arrays containing our data
push @{$program{$program}}, $id;
}
# Now we have all the data we need, so let's display it.
# Keys in %program are the program names
foreach my $p (keys %program) {
say "PROGRAM: $p\n==>ID";
# $program{$p} is a reference to an array of IDs
say "\t$_" for @{$program{$p}};
say '';
}
Assuming this is in a program called programs.pl and the input data is in programs.txt, then you'd run it like this:
C:/> programs.pl < programs.txt
Always put use warnings; and use strict; at the top of the program, and always use the three-argument form of open:
open my $fh, "<", "D:\\mine\\out.txt" or die "Can't open file: $!";
my %hash;
while (<$fh>){
if(/ID/)
{
my $nxt = <$fh>;
s/.*?ID: //g;
$hash{"$nxt==>ID \n"}.=" $_";
}
}
print %hash;
Output
Program: Running
==>ID
COM-9090
Program: Swimming
==>ID
COM-1234
COM-2345
COM-9876
In your input file, the program is on the line after the ID, so I read it with
my $nxt = <$fh>; which stores the program line in the $nxt variable.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash = ();
open my $IN, "<", "your file name here" or die "Error: $!\n";
while (<$IN>) {
if ($_ =~ m/^\s*-*ID:\s*COM/) {
(my $id) = ($_ =~ m/\s*ID:\s*(.*)/);
my $prog_name = <$IN>;
chomp $prog_name;
$prog_name =~ s/Program/PROGRAM/;
$hash{$prog_name} = [] unless $hash{$prog_name};
push @{$hash{$prog_name}}, $id;
}
}
close $IN;
print Dumper(\%hash);
Output will be:
$VAR1 = {
'PROGRAM: Running' => [
'COM-9090'
],
'PROGRAM: Swimming' => [
'COM-1234',
'COM-2345',
'COM-9876'
]
};
Let's look at these two lines:
$hash{$prog_name} = [] unless $hash{$prog_name};
push @{$hash{$prog_name}}, $id;
The first line sets the value to an empty array reference if that key has no value yet. The second line pushes the ID onto the end of that array (regardless of the first line).
Moreover, the first line is not mandatory. Perl knows what you mean if you just write push @{$hash{$prog_name}}, $id; it goes to the value for this key and creates an array reference there if one was not already present (this is called autovivification), then pushes $id onto that array.
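A minimal sketch of that behaviour, with no explicit initialization of the hash values. The records are taken from the sample log, and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %ids_by_program;

# Program/ID pairs from the sample log
my @records = (
    [ 'Swimming', 'COM-1234' ],
    [ 'Swimming', 'COM-2345' ],
    [ 'Swimming', 'COM-9876' ],
    [ 'Running',  'COM-9090' ],
);

for my $rec (@records) {
    my ($program, $id) = @$rec;
    # No "= [] unless ..." needed: the array reference is created on first use
    push @{ $ids_by_program{$program} }, $id;
}

for my $program (sort keys %ids_by_program) {
    print "PROGRAM: $program\n==>ID\n";
    print "\t$_\n" for @{ $ids_by_program{$program} };
}
```

Sorting the keys is a choice made here to get a repeatable order; a plain keys call returns the programs in no particular order.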