Reading File and Inserting into Variables using Perl

I'm new to Perl, so please bear with me. What I'm trying to do is read a file (I'm already using the File::Slurp module) and create variables from the data in the file. Currently I have this setup:
use File::Slurp;
my @targets = read_file("targetfile.txt");
print @targets;
Within that target file, I have the following bits of data:
id: 123456789
name: anytownusa
1.2.3.4/32
5.6.7.8/32
The first line is an ID, the second line is a name, and all successive lines will be IP addresses (maximum length of a few hundred).
So my goal is to read that file and create variables that look something like this:
$var1="123456789";
$var2="anytownusa";
$var3="1.2.3.4/32,5.6.7.8/32,etc,etc,etc,etc,etc";
** Note that all the IP addresses end up grouped together in a single variable, separated by commas.

File::Slurp will read the complete file data in one go. This might cause an issue if the file size is very big. Let me show you a simple approach to this problem.
Read file line by line using while loop
Check line number using $. and assign line data to respective variable
Store ips in an array and at the end print them using join
Note: If you have to alter the line data then use search and replace in the respective conditional block before assigning the line data to the variable.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my ($id, $name, @ips);
while (<DATA>) {
    chomp;
    if ($. == 1) {
        $id = $_;
    }
    elsif ($. == 2) {
        $name = $_;
    }
    else {
        push @ips, $_;
    }
}
print "$id\n";
print "$name\n";
print join ",", @ips;
__DATA__
id: 123456789
name: anytownusa
1.2.3.4/32
5.6.7.8/32

As it has been noted, there is no reason to "slurp" the whole file into a variable. If nothing else, it only makes the processing harder.
Also, why not store the named labels in a hash? In this example that would be
my %identity = (id => 123456789, name => 'anytownusa');
The code below picks up the key names from the file; they aren't hard-coded.
use warnings;
use strict;
use feature 'say';
my (@ips, %identity);
my $file = 'targetfile.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
    next if not /\S/;
    chomp;
    my ($m1, $m2) = split /:\s*/; #/(stop bad syntax highlight)
    if ($m1 and $m2) { $identity{$m1} = $m2; }
    else { push @ips, $m1; }
}
say "$_: $identity{$_}" for keys %identity;
say join ',', @ips;
If the line doesn't contain a colon, split returns it whole; that is an IP address, which is stored in an array for later processing. Otherwise split returns the label and its value, for 'id' and 'name'.
We first skip blank lines with next if not /\S/;, so the line must contain some non-space characters and a plain else suffices, as there is always something in $m1. We also need to remove the newline, with chomp.

Read the file into variables directly:
use Modern::Perl;
my ($id, $name, @ips) = <DATA>;   # list context: one read returns every line
chomp ($id, $name, @ips);
say $id;
say $name;
$" = ',';
say "@ips";
__DATA__
id: 123456789
name: anytownusa
1.2.3.4/32
5.6.7.8/32
Output:
id: 123456789
name: anytownusa
1.2.3.4/32,5.6.7.8/32
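None of the answers above removes the "id: " and "name: " labels, although the question asks for bare values in the variables. A minimal sketch of that extra step follows; the helper name parse_targets is hypothetical, and it assumes the file always has the id on the first line and the name on the second.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answers above): takes the file's lines
# and returns (id, name, comma-joined IP list).
sub parse_targets {
    my @lines = grep { /\S/ } @_;      # ignore blank lines
    chomp @lines;
    my ($id, $name, @ips) = @lines;
    s/^\w+:\s*// for $id, $name;       # drop the "id: " and "name: " labels
    return ($id, $name, join ',', @ips);
}

my ($var1, $var2, $var3) = parse_targets(
    "id: 123456789\n", "name: anytownusa\n", "1.2.3.4/32\n", "5.6.7.8/32\n",
);
print "$var1\n$var2\n$var3\n";
```

With a real file the lines could come from read_file("targetfile.txt") or from a filehandle read in list context.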

Related

Creating multiple hashes from multiple files in one go

I want to perform a vlookup-like process across multiple files, where the contents of the first column from all files (sorted and uniq-ed) serve as the reference values. I would like to store the key-value pairs from each file in a separate hash and then print them together. Something like this:
file1: while(){$hash1{$key}=$val}...file2: while(){$hash2{$key}=$val}...file3: while(){$hash3{$key}=$val}...so on
Then print it: print "$ref_val $hash1{$ref_val} $hash2{$ref_val} $hash3{$ref_val}..."
$i=1;
@FILES = @ARGV;
foreach $file(@FILES)
{
    open($fh,$file);
    $hname="hash".$i; ##trying to create unique hash by attaching a running number to hash name
    while(<$fh>){@d=split("\t");$hname{$d[0]}=$d[7];}$i++;
}
$set=$i-1; ##store this number for recreating the hash names during printing
open(FH,"ref_list.txt");
while(<FH>)
{
    chomp();print "$_\t";
    ## here i run the loop recreating the hash names and printing its corresponding value
    for($i=1;$i<=$set;$i++){$hname="hash".$i; print "$hname{$_}\t";}
    print "\n";
}
Now this is where I am stuck: Perl takes $hname itself as the hash name instead of hash1, hash2, and so on.
Thanks in advance for the help and opinions.
The shown code attempts to use symbolic references to construct variable names at runtime. Those things can raise a lot of trouble and should not be used, except very occasionally in very specialized code.
Here is a way to read multiple files, each into a hash, and store them for later processing.
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
my @files = @ARGV;
my @data;
for my $file (@files) {
    open my $fh, '<', $file or do {
        warn "Skip $file, can't open it: $!";
        next;
    };
    push @data, { map { (split /\t/, $_)[0,6] } <$fh> };
}
dd \@data;
Each hash associates the first column with the seventh (index 6), as clarified, for each line. A reference to such a hash for each file, formed by { }, is added to the array.
Note that when you add a key-value pair to a hash which already has that key the new overwrites the old. So if a string repeats in the first column in a file, the hash for that file will end up with the value (column 7) for the last one. The OP doesn't discuss possible duplicates of this kind in data files (only for the reference file), please clarify if needed.
The Data::Dump is used only to print; if you don't wish to install it use core Data::Dumper.
I am not sure that I get the use of that "reference file", but you can now go through the array of hash references for each file and fetch values as needed. Perhaps like
open my $fh_ref, '<', $ref_file or die "Can't open $ref_file: $!";
while (my $line = <$fh_ref>) {
    my $key = ... # retrieve the key from $line
    print "$key: ";
    foreach my $hr (@data) {
        print "$hr->{$key} ";
    }
    say '';
}
This will print key: followed by values for that string, one from each file.
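A key from the reference list may be missing from some of the files, in which case $hr->{$key} is undefined and printing it triggers a warning. The following is only a sketch of one way to handle that: the helper lookup_row, the 'NA' placeholder, and the sample data are all made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for @data: one hash reference per input file (made-up values).
my @data = (
    { apple => 'a1', pear => 'p1' },
    { apple => 'a2' },
);

# Hypothetical helper: build one tab-separated output row for a key,
# substituting 'NA' where a file has no entry for that key.
sub lookup_row {
    my ($key, @hashes) = @_;
    return join "\t", $key, map { $_->{$key} // 'NA' } @hashes;
}

print lookup_row($_, @data), "\n" for qw(apple pear);
```

The defined-or operator (//) needs Perl 5.10 or later.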

how to remove last single line available in file using perl

How can I remove the last line of a file using Perl?
My data looks like this:
"A",1,-2,-1,-4,
"B",3,-5,-2.-5,
How can I remove the last line? I am summing all the numbers but get a null value at the end.
I tried using chomp, but it did not work.
Here is the code currently being used:
while (<data>) {
    chomp(my @row = split ',', $_, -1);
    say sum @row[1 .. $#row];
}
Try this (shell one-liner):
perl -lne '!eof() and print' file
or as part of a script:
while (defined($_ = readline ARGV)) {
    print $_ unless eof();
}
You should be using Text::CSV or Text::CSV_XS for handling comma separated value files. Those modules are available on CPAN. That type of solution would look like this:
use Text::CSV;
use List::Util qw(sum);
my $csv = Text::CSV->new({binary => 1})
    or die "Cannot use CSV: " . Text::CSV->error_diag;
# $fh is assumed to be a filehandle already opened on the input file.
while (my $row = $csv->getline($fh)) {
    next unless ($row->[0] || '') =~ m/\w/; # Reject rows that don't start with an identifier.
    my $sum = sum(@$row[1..$#$row]);
    print "$sum\n";
}
If you are stuck with a solution that doesn't use a proper CSV parser, then at least you'll need to add this to your existing while loop, immediately after your chomp:
next unless scalar(@row) && length $row[0]; # Skip empty rows.
The point of this line is to detect when a row is empty -- it has no elements, or its first element is empty after the chomp.
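To see what that guard reacts to, here is a small sketch that splits a data line and an empty line the same way the question does. The helper name fields_or_skip is hypothetical, and the sample line is copied from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line as the question does, then apply the guard.
sub fields_or_skip {
    my ($line) = @_;
    chomp $line;
    my @row = split ',', $line, -1;                # -1 keeps trailing empty fields
    return unless scalar(@row) && length $row[0];  # empty row: nothing to sum
    return @row;
}

my @data_row  = fields_or_skip(qq{"A",1,-2,-1,-4,\n});
my @empty_row = fields_or_skip("\n");

printf "data line: %d fields, empty line: %d fields\n",
    scalar @data_row, scalar @empty_row;
```

Because of the trailing comma and the -1 limit, the data line also ends with an empty field, which is worth filtering out before summing.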
I suspect this is an X/Y question. You think you want to avoid processing the final (empty?) line in your input when actually you should be ensuring that all of your input data is in the format you expect.
There are a number of things you can do to check the validity of your data.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use List::Util 'sum';
use Scalar::Util 'looks_like_number';
while (<DATA>) {
    # Chomp the input before splitting it.
    chomp;
    # Remove the -1 from your call to split().
    # This automatically removes any empty trailing fields.
    my @row = split /,/;
    # Skip lines that are empty.
    # 1/ Ensure there is data in @row.
    # 2/ Ensure at least one element in @row contains
    #    non-whitespace data.
    next unless @row and grep { /\S/ } @row;
    # Ensure that all of the data you pass to sum()
    # looks like numbers.
    say sum grep { looks_like_number $_ } @row[1 .. $#row];
}
__DATA__
"A",1.2,-1.5,4.2,1.4,
"B",2.6,-.50,-1.6,0.3,-1.3,

Perl hash formatting

I have a log file like below
ID: COM-1234
Program: Swimming
Name: John Doe
Description: Joined on July 1st
------------------------------ID: COM-2345
Program: Swimming
Name: Brock sen
Description: Joined on July 1st
------------------------------ID: COM-9876
Program: Swimming
Name: johny boy
Description: Joined on July 1st
------------------------------ID: COM-9090
Program: Running
Name: justin kim
Description: Good Record
------------------------------
and I want to group it based on the Program (Swimming , Running etc) and want a display like,
PROGRAM: Swimming
==>ID
COM-1234
COM-2345
COM-9876
PROGRAM: Running
==>ID
COM-9090
I'm very new to Perl and I wrote the below piece (incomplete).
#!/usr/bin/perl
use Data::Dumper;
$/ = "%%%%";
open( AFILE, "D:\\mine\\out.txt");
while (<AFILE>)
{
    @temp = split(/-{20,}/, $_);
}
close (AFILE);
my %hash = @new;
print Dumper(\%hash);
I have read in Perl tutorials that a hash takes unique keys that can hold multiple values, but I'm not sure how to make use of that.
I'm able to read the file and store it in a hash, but I'm unsure how to get from there to the format above. Any help is really appreciated. Thanks.
I always prefer to write programs like this so they read from STDIN as that makes them more flexible.
I'd do it like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
# Read one "chunk" of data at a time
local $/ = '------------------------------';
# A hash to store our results.
my %program;
# While there's data on STDIN...
while (<>) {
    # Remove the end of record marker
    chomp;
    # Skip any empty records
    # (i.e. ones with no printable characters)
    next unless /\S/;
    # Extract the information that we want with a regex
    my ($id, $program) = /ID: (.+?)\n.*Program: (.+?)\n/s;
    # Build a hash of arrays containing our data
    push @{$program{$program}}, $id;
}
# Now we have all the data we need, so let's display it.
# Keys in %program are the program names
foreach my $p (keys %program) {
    say "PROGRAM: $p\n==>ID";
    # $program{$p} is a reference to an array of IDs
    say "\t$_" for @{$program{$p}};
    say '';
}
Assuming this is in a program called programs.pl and the input data is in programs.txt, then you'd run it like this:
C:/> programs.pl < programs.txt
Always put use warnings; and use strict; at the top of the program, and always use the three-argument form of open.
open my $fh, "<", "D:\\mine\\out.txt";
my %hash;
while (<$fh>){
    if(/ID/)
    {
        my $nxt = <$fh>;
        s/.*?ID: //g;
        $hash{"$nxt==>ID \n"}.=" $_";
    }
}
print %hash;
Output
Program: Running
==>ID
COM-9090
Program: Swimming
==>ID
COM-1234
COM-2345
COM-9876
In your input file the program is found on the line after the ID, so I used
my $nxt = <$fh>; to read it. The program line is now stored in the $nxt variable.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash = ();
open my $IN, "<", "your file name here" or die "Error: $!\n";
while (<$IN>) {
    if ($_ =~ m/^\s*-*ID:\s*COM/) {
        (my $id) = ($_ =~ m/\s*ID:\s*(.*)/);
        my $prog_name = <$IN>;
        chomp $prog_name;
        $prog_name =~ s/Program/PROGRAM/;
        $hash{$prog_name} = [] unless $hash{$prog_name};
        push @{$hash{$prog_name}}, $id;
    }
}
close $IN;
print Dumper(\%hash);
Output will be:
$VAR1 = {
'PROGRAM: Running' => [
'COM-9090'
],
'PROGRAM: Swimming' => [
'COM-1234',
'COM-2345',
'COM-9876'
]
};
Let's look at these two lines:
$hash{$prog_name} = [] unless $hash{$prog_name};
push @{$hash{$prog_name}}, $id;
The first line stores an empty array reference as the value if the hash has no value yet for that key. The second line pushes the ID onto the end of that array (regardless of the first line).
Moreover, the first line is not mandatory. If you just write push @{$hash{$prog_name}}, $id; Perl treats the value of that key as an array reference and creates it if it wasn't already there (this is known as autovivification), and then pushes $id onto that array.
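The behaviour described in the last paragraph can be checked with a short sketch. The keys and IDs are taken from the sample log; everything else is illustrative.

```perl
use strict;
use warnings;

my %hash;
# No "$hash{...} = []" beforehand: push creates the array reference on first use.
push @{ $hash{'PROGRAM: Swimming'} }, 'COM-1234';
push @{ $hash{'PROGRAM: Swimming'} }, 'COM-2345';
push @{ $hash{'PROGRAM: Running'} },  'COM-9090';

for my $program (sort keys %hash) {
    print "$program => @{ $hash{$program} }\n";
}
```

This works under use strict because the hash element is used as an array reference in a context that modifies it.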

Using Perl to parse a CSV file from a particular row to the end of the file

I am very new to Perl and need your help.
I have a CSV file xyz.csv with the contents below.
(Here the level1 and er values are string names, not numbers.)
level1,er
level2,er2
level3,er3
level4,er4
I parse this CSV file using the script below and pass the fields to an array in the first run
open(my $d, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$d>) {
    chomp $line;
    my @data = split "," , $line;
    @XYX = ( [ "$data[0]", "$data[1]" ], );
}
For the second run I take input from the command prompt and store it in the variable $val. My program should parse the CSV file starting from the line that matches the value stored in $val and continue until it reaches the end of the file.
For example:
I input level2, so I need a script to parse from the second line to the end of the CSV file, ignoring the values before level2 in the file, and pass these values (level2 to level4) to @XYX = (["$data[1]","$data[1]"],);
level2,er2
level3,er3
level4,er4
I input level3, so I need a script to parse from the third line to the end of the CSV file, ignoring the values before level3 in the file, and pass these values (level3 and level4) to @XYX = (["$data[0]","$data[1]"],);
level3,er3
level4,er4
How do I achieve that? Please do give your valuable suggestions. I appreciate your help
As long as you are certain that there are never any commas in the data you should be OK using split. But even so it would be wise to limit the split to two fields, so that you get everything up to the first comma and everything after it
There are a few issues with your code. First of all I hope you are putting use strict and use warnings at the top of all your Perl programs. That simple measure will catch many trivial problems that you could otherwise overlook, and so it is especially important before you ask for help with your code
It isn't commonly known, but putting a newline "\n" at the end of your die string prevents Perl from reporting the file name and line number where the error occurred. While this may be what you want, it is usually more helpful to be given the extra information
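A quick way to see the difference is to trap both forms with eval and look at the message. This is only a sketch; the message text is made up.

```perl
use strict;
use warnings;

# eval traps each die so that both messages can be inspected.
eval { die "Could not open file\n" };
my $with_newline = $@;        # exactly the string passed to die

eval { die "Could not open file" };
my $without_newline = $@;     # Perl appends " at <script> line <N>.\n"

print $with_newline, $without_newline;
```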
Your variable names are very unhelpful, and by convention Perl variables consist of lower-case alphanumerics and underscores. Names like @XYX and $W don't help me understand your code at all!
Rather than splitting to an array, it looks like you would be better off putting the two fields into two scalar variables to avoid all that indexing. And I am not sure what you intend by @XYX = (["$data[1]","$data[1]"],). First of all do you really mean to use $data[1] twice? Secondly, you should never put scalar variables inside double quotes, as it does something very specific, and unless you know what that is you should avoid it. Finally, did you mean to push an anonymous array onto @XYX each time around the loop? Otherwise the contents of the array will be overwritten each time a line is read from the file, and the earlier data will be lost
This program uses a regular expression to extract $level_num from the first field. All it does is find the first sequence of digits in the string, which can then be compared to the minimum required level $min_level to decide whether a line from the log is relevant
use strict;
use warnings;
my $file = 'xyz.csv';
my $min_level = 3;
my #list;
open my $fh, '<', $file or die "Could not open '$file' $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($level, $error) = split ',', $line, 2;
    my ($level_num) = $level =~ /(\d+)/;
    next unless $level_num >= $min_level;
    push @list, [ $level, $error ];
}
For deciding which records to process you can use the "flip-flop" operator (..) along these lines.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $level = shift || 'level1';
while (<DATA>) {
    if (/^\Q$level,/ .. 0) {
        print;
    }
}
__DATA__
level1,er
level2,er2
level3,er3
level4,er4
The flip-flop operator returns false until its first operand is true. From that point it returns true until its second operand is true, after which it returns false again.
I'm assuming that your file is ordered so that once you start to process it, you never want to stop. That means that the first operand to the flip-flop can be /^\Q$level,/ (match the string $level at the start of the line) and the second operand can just be zero (as we never want it to stop processing).
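The same idea can be tried without a data file by reading from an in-memory filehandle and collecting the matching lines. This is a sketch under the same ordering assumption; the variable names and the hard-coded level are placeholders.

```perl
use strict;
use warnings;

my $csv   = "level1,er\nlevel2,er2\nlevel3,er3\nlevel4,er4\n";
my $level = 'level3';   # stand-in for the value typed by the user

# Read from an in-memory filehandle so that $. counts lines as it would for a file.
open my $fh, '<', \$csv or die "Can't open in-memory file: $!";

my @kept;
while (my $line = <$fh>) {
    chomp $line;
    # False until a line starts with "$level,", then true for every later line,
    # because the constant 0 is compared with the line number $. and never equals it.
    push @kept, $line if $line =~ /^\Q$level\E,/ .. 0;
}
close $fh;

print "$_\n" for @kept;
```

Note that a constant right-hand operand is compared against $., so this form relies on the lines coming from a filehandle.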
I'd also strongly recommend not parsing CSV records using split /,/. That may work on your current data but, in general, the fields in a CSV file are allowed to contain embedded commas which will break this approach. Instead, have a look at Text::CSV or Text::ParseWords (which is included with the standard Perl distribution).
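To illustrate the embedded-comma problem, the sketch below compares split with parse_line from Text::ParseWords on a made-up record; the record itself is not from the question's file.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# A made-up record whose second field contains a comma inside quotes.
my $line = 'level2,"disk error, retrying",er2';

my @naive  = split /,/, $line;            # breaks the quoted field in two
my @parsed = parse_line(',', 0, $line);   # 0 = strip the quotes from the fields

print scalar(@naive), " fields with split, ", scalar(@parsed), " with parse_line\n";
print "$_\n" for @parsed;
```

Text::ParseWords also gives backslashes and single quotes a special meaning, so for CSV with unusual quoting Text::CSV is probably the safer choice.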
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my @XYZ;
my $file = 'xyz.csv';
open my $fh, '<', $file or die "$file: $!\n";
my $level = shift; # get level from commandline
my $getall = not defined $level; # true if level not given on commandline
my $parser = Text::CSV->new({ binary => 1 }); # object for parsing lines of CSV
while (my $row = $parser->getline($fh)) # $row is an array reference containing cells from a line of CSV
{
    if ($getall # if level was not given on commandline, then put all rows into @XYZ
        or # if level *was* given on commandline, then...
        $row->[0] eq $level .. 0 # ...wait until the first cell in a row equals $level, then put that row and all subsequent rows into @XYZ
    )
    {
        push @XYZ, $row;
    }
}
close $fh;
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'xyz.csv';
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
my $level = shift || "level1";
while (my $line = <$data>) {
    chomp $line;
    my @fields = split "," , $line;
    if ($fields[0] eq $level .. 0) {
        print "\n$fields[0]\n";
        print "$fields[1]\n";
    }
}
This worked....thanks ALL for your help...

How to parse multiple line, fixed-width file in perl?

I have a file that I need to parse in the following format. (All delimiters are spaces):
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value.
I am familiar with how to parse a single line fixed-width file, but am stumped with how to handle multiple lines.
#!/usr/bin/env perl
use strict; use warnings;
my (%fields, $current_field);
while (my $line = <DATA>) {
    next unless $line =~ /\S/;
    if ($line =~ /^ \s+ ( \S .+ )/x) {
        if (defined $current_field) {
            $fields{ $current_field} .= $1;
        }
    }
    elsif ($line =~ /^(.+?) : \s+ (.+) \s+/x ) {
        $current_field = $1;
        $fields{ $current_field } = $2;
    }
}
use Data::Dumper;
print Dumper \%fields;
__DATA__
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value.
Fixed-width says unpack to me. It is possible to parse with regexes and split, but unpack should be a safer choice, as it is the Right Tool for fixed width data.
I put the width of the first field to 12 and the empty space between to 13, which works for this data. You may need to change that. The template "A12A13A*" means "find 12 then 13 ascii characters, followed by any length of ascii characters". unpack will return a list of these matches. Also, unpack will use $_ if a string is not supplied, which is what we do here.
Note that if the first field is not fixed width up to the colon, as it appears to be in your sample data, you'll need to merge the fields in the template, e.g. "A25A*", and then strip the colon.
I chose array as the storage device, as I do not know if your field names are unique. A hash would overwrite fields with the same name. Another benefit of an array is that it preserves the order of the data as it appears in the file. If these things are irrelevant and quick lookup is more of a priority, use a hash instead.
Code:
use strict;
use warnings;
use Data::Dumper;
my $last_text;
my @array;
while (<DATA>) {
    # unpack the fields and strip spaces
    my ($field, undef, $text) = unpack "A12A13A*";
    if ($field) {   # If $field is empty, that means we have a multi-line value
        $field =~ s/:$//;               # strip the colon
        $last_text = [ $field, $text ]; # store data in anonymous array
        push @array, $last_text;        # and store that array in @array
    } else {        # multi-line values get added to the previous lines data
        $last_text->[1] .= " $text";
    }
}
print Dumper \@array;
__DATA__
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value
                         with a third line
Output:
$VAR1 = [
[
'field name 1:',
'Multiple word value.'
],
[
'field name 2:',
'Multiple word value along with multiple lines.'
],
[
'field name 3:',
'Another multiple word and multiple line value with a third line'
]
];
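The merged-template variant mentioned above ("A25A*", then strip the colon) might look like the sketch below. The 25-column layout is an assumption carried over from the 12 + 13 widths in the answer, and the sample lines are built with sprintf so the widths are explicit.

```perl
use strict;
use warnings;

# Made-up lines: a 25-column label area followed by the value.
my @lines = (
    sprintf('%-25s%s', 'field name 2:', 'Multiple word value along'),
    sprintf('%-25s%s', '',              'with multiple lines.'),
);

my @records;
for my $line (@lines) {
    my ($field, $text) = unpack 'A25 A*', $line;   # "A" strips trailing spaces
    if (length $field) {
        $field =~ s/:$//;                  # drop the colon from the label
        push @records, [ $field, $text ];
    }
    elsif (@records) {
        $records[-1][1] .= " $text";       # continuation of the previous value
    }
}

print "$_->[0] => $_->[1]\n" for @records;
```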
You could do this:
#!/usr/bin/perl
use strict;
use warnings;
my @fields;
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";
for (<$fh>) {
    if (/^\s/) {
        $fields[$#fields] .= $_;
    } else {
        push @fields, $_;
    }
}
close $fh;
If the line starts with white space, append it to the last element in @fields, otherwise push it onto the end of the array.
Alternatively, slurp the entire file and split with look-around:
#!/usr/bin/perl
use strict;
use warnings;
$/=undef;
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";
my @fields = split /(?<=\n)(?!\s)/, <$fh>;
close $fh;
It's not a recommended approach though.
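You can change the input record separator:
$/ = "\nfield name";
while (my $line = <FILE>) {
    chomp $line;    # remove the trailing record separator
    if ($line =~ /(\d+):\s+(.+)/s) {    # /s lets the value span several lines
        print "Record $1 is $2\n";
    }
}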