Reading Data from a file in Perl

I have a file abc.txt that has data of the form
sHost = "Arun";
sUid ="Abc";
I want to get Arun for sHost and so forth using Perl. My code:
my $filename = "abc.txt";
use strict;
use warnings;
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>)
{
chomp $row;
if ($row=~m/sHost/)
{
print $row;
}
}
The output I am getting is sHost = Arun;
but I want only 'Arun'. What logic should I apply here? I am very new to Perl and Linux.

After the chomp, alter the test to this, and the variable $host will contain the value:
if ($row =~ m/sHost = "(.*)"/) {
    my $host = $1;
}
In simple terms, whatever the ( ) section matches is captured into $1 when the pattern matches. See man perlre for the details.
To generalise this to read any key and any value do something like this
my %value;
while (my $row = <$fh>) {
    if ($row =~ /^(\w+) = "([^"]+)"/) {
        $value{$1} = $2;
    }
}
Then $value{'sHost'} will be "Arun" etc
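Putting the pieces together, a complete runnable sketch (assuming the same abc.txt layout as in the question; the \s* around the = is a small tweak so it also matches lines like sUid ="Abc";) might look like:
use strict;
use warnings;
my $filename = "abc.txt";
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";
my %value;
while (my $row = <$fh>) {
    chomp $row;
    # capture the key and the quoted value from lines like: sHost = "Arun";
    if ($row =~ /^(\w+)\s*=\s*"([^"]+)"/) {
        $value{$1} = $2;
    }
}
close $fh;
print "$value{sHost}\n";    # should print: Arun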

For more general config-file parsing you can use the following piece of code:
my %config;
if ($row =~ m/^\s*(["'`])?(\S+)\1?\s*=\s*(["'`])?(\S+?)\3?;?$/) {
my $key = $2;
my $value = $4;
$config{$key} = $value;
}
This regexp lets you process key-value lines where the value is either plain or surrounded by different quote types (" ' `; you can add your own symbols if you like), with leading whitespace allowed, and the trailing semicolon is not obligatory. You can also change (\S+) according to your requirements for the possible key/value characters (\S means anything except whitespace).
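For example, here is a small self-contained test of that pattern against a few made-up sample lines:
use strict;
use warnings;
my %config;
my @rows = (
    'sHost = "Arun";',    # double-quoted value, trailing semicolon
    "sUid ='Abc';",       # single-quoted value, no space after =
    '   port = 8080',     # plain value, leading whitespace, no semicolon
);
for my $row (@rows) {
    if ($row =~ m/^\s*(["'`])?(\S+)\1?\s*=\s*(["'`])?(\S+?)\3?;?$/) {
        $config{$2} = $4;
    }
}
print "$_ => $config{$_}\n" for sort keys %config;
# prints: port => 8080, sHost => Arun, sUid => Abc (one per line)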

Use m/.*=\s*([^\s]*)/g instead of m/sHost/,
and print $1 instead of print $row.

Replace
if ($row=~m/sHost/)
with
if ($row=~s/sHost//)


How to check whether one file's value contains in another text file? (perl script)

I would like to check whether one file's values are contained in another file. If a value is present, it should show that there is an existing bin for that specific value; if not, it should show that there is no existing bin limit. The problem is that I am not sure how to check all the values at once.
The first text file (DID1) contains:
L84A:D:O:M:
L84C:B:E:D:
The second text file (DID) contains:
L84A:B:E:Q:X:F:i:M:Y:
L84C:B:E:Q:X:F:i:M:Y:
L83A:B:E:Q:X:F:i:M:Y:
If the first four characters match, I need to check all the values for that line.
For example, L84A in the first text file and in the second text file both have M, so it should print out that there is an existing M bin.
Below is my code:
use strict;
use warnings;
my $filename = 'DID.txt';
my $filename1 = 'DID1.txt';
my $count = 0;
open( FILE2, "<$filename1" )
or die("Could not open log file. $!\n");
while (<FILE2>) {
my ($number) = $_;
chomp($number);
my @values1 = split( ':', $number );
open( FILE, "<$filename" )
or die("Could not open log file. $!\n");
while (<FILE>) {
my ($line) = $_;
chomp($line);
my @values = split( ':', $line );
foreach my $val (@values) {
if ( $val =~ /$values1[0]/ ) {
$count++;
if ( $values[$count] =~ /$values1[$count]/ ) {
print
"Yes ,There is an existing bin & DID\n #values1\n";
}
else {
print "No, There is an existing bin & DID\n";
}
}
}
}
}
I cannot check all the values. Please give me any advice on it, since this is my first time learning the Perl language. Thanks a lot :)
Based on my understanding, I wrote this code:
use strict;
use warnings;
#use ReadWrite;
use Array::Utils qw(:all);
use vars qw($my1file $myfile1cnt $my2file $myfile2cnt @output);
$my1file = "did1.txt"; $my2file = "did2.txt";
We are going to read both first and second files (DID1 and DID2).
readFileinString($my1file, \$myfile1cnt); readFileinString($my2file, \$myfile2cnt);
In the first file, as per the OP's request, the first four characters should be matched against the second file; if they match, we then need to check the rest of the characters in the first file against the second one.
while($myfile1cnt=~m/^((\w){4})\:([^\n]+)$/mig)
{
print "<LineStart>";
my $lineChk = $1; my $full_Line = $3; #print ": $full_Line\n";
my @First_values = split /\:/, $full_Line; #print join "\n", @First_values;
If the first four characters match, then:
if($myfile2cnt=~m/^$lineChk\:([^\n]+)$/m)
{
Store the rest of the line, split it on colons, and collect the characters to be matched against the first file's contents.
my $FullLine = $1; my @second_values = split /:/, $FullLine;
Then search for each letter of the first file's line in the matching line of the second file...
foreach my $sngletter (@First_values)
{
If a letter is present in both the first and the second file, it is printed.
if( grep { $_ eq "$sngletter" } @second_values )
{
print "Matched: $sngletter\t";
}
}
}
else { print "Not Matched..."; }
This just indicates that the line has ended.
print "<LineEnd>\n"
}
#------------------>Reading a file
sub readFileinString
#------------------>
{
my $File = shift;
my $string = shift;
use File::Basename;
my $filenames = basename($File);
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}
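As an aside, a more common idiom for slurping a whole file into a string is to localise $/; a minimal sketch of that alternative (not part of the answer above) could be:
sub slurp_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "Failed reading file [$file]: $!";
    local $/;               # undefine the input record separator, so <$fh> reads everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}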
Read the search pattern and the data into hashes (the first field is the key), then go through the data and select only the fields included in the pattern for that key.
use strict;
use warnings;
use feature 'say';
my $input1 = 'DID1.txt'; # look for key,pattern(array)
my $input2 = 'DID.txt'; # data - key,elements(array)
my $pattern;
my $data;
my %result;
$pattern = file2hash($input1); # read pattern into hash
$data = file2hash($input2); # read data into hash
while( my($k,$v) = each %{$data} ) { # walk through data
next unless defined $pattern->{$k}; # skip those which is not in pattern hash
my $find = join '|', @{ $pattern->{$k} }; # form search pattern for grep
my @found = grep {/$find/} @{ $v }; # extract only those of interest
$result{$k} = \@found; # store in result hash
}
while( my($k,$v) = each %result ) { # walk through result hash
say "$k has " . join ':', #{ $v }; # output final result
}
sub file2hash {
my $filename = shift;
my %hash;
my $fh;
open $fh, '<', $filename
or die "Couldn't open $filename";
while(<$fh>) {
chomp;
next if /^\s*$/; # skip empty lines
my ($key, @data) = split ':';
$hash{$key} = \@data;
}
close $fh;
return \%hash;
}
Output
L84C has B:E
L84A has M

Parsing string in multiline data with positive lookbehind

I am trying to parse data like:
header1
-------
var1 0
var2 5
var3 9
var6 1
header2
-------
var1 -3
var3 5
var5 0
Now I want to get e.g. var3 for header2. What's the best way to do this?
So far I was parsing my files line-by-line via
open(FILE,"< $file");
while (my $line = <FILE>){
# do stuff
}
but I guess it's not possible to handle multiline parsing properly.
Now I am thinking to parse the file at once but wasn't successful so far...
my @Input;
open(FILE,"< $file");
while (<FILE>){ @Input = <FILE>; }
if (@Input =~ /header2/){
#...
}
The easier way to handle this is "paragraph mode".
local $/ = "";
while (<>) {
    my ($header, $body) = /^([^\n]*)\n-+\n(.*)/s
        or die("Bad data");
    my @data = map [ split ], split /\n/, $body;
    # ... Do something with $header and @data ...
}
The same can be achieved without messing with $/ as follows:
my @buf;
while (1) {
    my $line = <>;
    $line =~ s/\s+\z// if defined($line);
    if (!defined($line) || !length($line)) {
        if (@buf) {
            my $header = shift(@buf);
            shift(@buf);    # discard the "-------" separator line
            my @data = map [ split ], splice(@buf);
            # ... Do something with $header and @data ...
        }
        last if !defined($line);
        next;
    }
    push @buf, $line;
}
(In fact, the second snippet includes a couple of small improvements over the first.)
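To tie this back to the original question (looking up var3 under header2), a minimal sketch along the same lines could be the following; it assumes the blocks in the real file are separated by blank lines, as paragraph mode requires, and that each data line is a "name value" pair as in the sample:
local $/ = "";                         # paragraph mode
my %sections;
while (<>) {
    my ($header, $body) = /^([^\n]*)\n-+\n(.*)/s
        or die("Bad data");
    # turn "var1 -3\nvar3 5\n..." into a name => value hash
    my %vars = map { (split)[0, 1] } split /\n/, $body;
    $sections{$header} = \%vars;
}
print $sections{header2}{var3}, "\n";  # prints 5 for the sample data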
Quick comments on your attempt:
The while loop is useless because @Input = <FILE> places the remaining lines of the file in @Input.
@Input =~ /header2/ matches header2 against the stringification of the array, which is the stringification of the number of elements in @Input. If you want to check whether an element of @Input contains header2, you will need to loop over the elements of @Input and check them individually.
while (<FILE>){ @Input = <FILE>; }
This doesn't make much sense. "While you can read a record from FILE, read all of the data on FILE into #Input". I think what you actually want is just:
my @Input = <FILE>;
if (@Input =~ /header2/){
This is quite strange too. The binding operator (=~) expects scalar operands, so it evaluates both operands in scalar context. That means @Input will be evaluated as the number of elements in @Input. That's an integer and will never match "header2".
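For instance, checking the elements individually could look like this (a sketch):
my @Input = <FILE>;
# grep applies the match to each element, so this is true if any line mentions header2
if ( grep { /header2/ } @Input ) {
    # at least one line contains "header2"
}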
A couple of approaches. Firstly a regex approach.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $data = join '', <$fh>;
if ($data =~ /header2.+var3 (.+?)\n/s) {
say $1;
} else {
say 'Not found';
}
The key to this is the /s on the m// operator. Without it, the two dots in the regex won't match newlines.
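A quick way to see the difference, using a hypothetical two-line string:
my $text = "header2\nvar3 5\n";
print "without /s: ", ($text =~ /header2.var3/  ? "match" : "no match"), "\n";  # no match: '.' will not cross the newline
print "with /s:    ", ($text =~ /header2.var3/s ? "match" : "no match"), "\n";  # match: '.' also matches "\n"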
The other approach is more of a line by line parser.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $section = '';
while (<$fh>) {
chomp;
# if the line is all word characters,
# then we've got a section header.
if ($_ !~ /\W/) {
$section = $_;
next;
}
my ($key, $val) = split;
if ($section eq 'header2' and $key eq 'var3') {
say $val;
last;
}
}
We read the file a line at a time and make a note of the section headers. For data lines, we split on whitespace and check to see if we're in the right section and have the right key.
In both cases, I've switched to using a more standard approach (lexical filehandles, 3-arg open(), or die $!) for opening the file.

Parsing a CSV file and Hashing

I am trying to parse a CSV file to read in all the zip codes. I am trying to create a hash where each key is a zip code and the value is the number of times it appears in the file. Then I want to print out the contents as Zip Code - Number. Here is the Perl script I have so far.
use strict;
use warnings;
my %hash = qw (
zipcode count
);
my $file = $ARGV[0] or die "Need CSV file on command line \n";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
chomp $line;
my #fields = split "," , $line;
if (exists($hash{$fields[2]})) {
$hash{$fields[1]}++;
}else {
$hash{$fields[1]} = 1;
}
}
my $key;
my $value;
while (($key, $value) = each(%hash)) {
print "$key - $value\n";
}
exit;
You don't say which column your zip code is in, but you are using the third field to check for an existing hash element, and then the second field to increment it.
There is no need to check whether a hash element already exists: Perl will happily create a non-existent hash element and increment it to 1 the first time you access it.
There is also no need to explicitly open any files passed as command line parameters: Perl will open them and read them if you use the <> operator without a file handle.
This reworking of your own program may work. It assumes the zip code is in the second column of the CSV. If it is anywhere else, just change ++$counts{$fields[1]} appropriately.
use strict;
use warnings;
@ARGV or die "Need CSV file on command line\n";
my %counts;
while (my $line = <>) {
chomp $line;
my @fields = split /,/, $line;
++$counts{$fields[1]};
}
while (my ($key, $value) = each %counts) {
print "$key - $value\n";
}
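If you want the output in a predictable order (each returns the pairs in an effectively random order), one possible variation is to sort the zip codes by their counts:
for my $zip ( sort { $counts{$b} <=> $counts{$a} } keys %counts ) {
    print "$zip - $counts{$zip}\n";    # highest count first
}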
Sorry if this is off-topic, but if you're on a system with the standard Unix text processing tools, you could use this command to count the number of occurrences of each value in field #2, and not need to write any code.
cut -d, -f2 filename.csv | sort | uniq -c
which will generate something like this output, where the count is listed first, and the zipcode second:
12 12345
2 56789
34 78912
1 90210

How do I modify the second column of a CSV file based on the first column?

I'm new to Perl and I have a CSV file that contains e-mails and names, like this:
john@domain1.com;John
Paul@domain2.com;
Richard@domain3.com;Richard
Rob@domain4.com;
Andrew@domain5.com;Andrew
However, as you can see, a few entries/lines have the e-mail address and the ; field separator but lack the name. I need to read line by line, and if the name field is missing, I want to print in its place the beginning of the e-mail address, up to @domainX.com. Output example:
john@domain1.com;John
Paul@domain2.com;Paul
Richard@domain3.com;Richard
Rob@domain4.com;Rob
Andrew@domain5.com;Andrew
I'm new with Perl, I did the iteration of read line by line, such this:
#!/usr/bin/perl
use warnings;
use strict;
open (MYFILE, 'test.txt');
while (<MYFILE>) {
chomp;
}
But I'm failing to parse the entries using ; as the separator, to check whether the name field is missing, and then to print the beginning of the e-mail without the domain.
Can someone please give me a example based on my code?
First, if the file may contain real CSV (or, in your case, semicolon-separated) data (e.g. quoted fields), I'd strongly recommend using a standard Perl module such as Text::CSV to parse it.
Otherwise, a quick-and-dirty example can be:
#!/usr/bin/perl
use warnings;
use strict;
# In modern Perl, please always use the 3-arg form of open and lexical filehandles.
# More robust.
open my $fh, "<", 'test.txt' or die "Cannot open: $!\n";
while (<$fh>) {
chomp;
my ($email, $name) = split(/;/, $_);
if (!$name) {
my ($userid, $domain) = split(/\@/, $email);
$name = $userid;
}
print "$space_prefix$email;$name\n"; # Print to STDOUT for simplicity of example
}
close($fh);
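For completeness, here is a sketch of the module-based route mentioned above, assuming Text::CSV (or Text::CSV_XS) is installed and ';' is the separator:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1, sep_char => ';' })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();
open my $fh, '<', 'test.txt' or die "Cannot open: $!\n";
while (my $row = $csv->getline($fh)) {
    my ($email, $name) = @$row;
    if (!defined $name || !length $name) {
        ($name) = split /\@/, $email;   # fall back to the part before the '@'
    }
    print "$email;$name\n";
}
close $fh;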
Try:
#!/usr/bin/env perl
use strict;
use warnings;
for my $file ( @ARGV ){
open my $in_fh, '<', $file or die "could not open $file: $!\n";
while( my $line = <$in_fh> ){
chomp( $line );
my ( $email, $name ) = split m{ \; }msx, $line;
if( ! ( defined $name && length( $name ) > 0 ) ){
( $name ) = split m{ \@ }msx, $email;
$name = ucfirst( lc( $name ));
}
print "$email;$name\n";
}
}
I am not a Perl programmer, but I would split first on the space character, and then you could iterate through the results and split by the semicolon. Then you can check the second member of the semicolon-split array and, if it is empty, replace it with the beginning of the first member of the semicolon-split array. Then just reverse the process, first joining by semicolons and then by spaces; a sketch of this idea follows.
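Translated into Perl, that description could look roughly like the sketch below; it assumes the whole data set arrives as a single space-separated string (the hypothetical $input), which is how this answer frames it:
my $input = 'john@domain1.com;John Paul@domain2.com; Rob@domain4.com;';  # hypothetical one-line input
my @records = split / /, $input;               # first split on the space character
for my $rec (@records) {
    my ($email, $name) = split /;/, $rec;      # then split each record on the semicolon
    if (!defined $name || !length $name) {     # second member empty?
        ($name) = split /\@/, $email;          # use the beginning of the first member
    }
    $rec = "$email;$name";                     # rebuild the record
}
print join(' ', @records), "\n";               # then join everything back with spaces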

help merging perl code routines together for file processing

I need some Perl help in putting these two processes/pieces of code to work together. I was able to get them working individually for testing, but I need help bringing them together, especially with the loop constructs. I'm not sure if I should go with foreach; anyway, the code is below.
Also, any best practices would be great too, as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if I read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(#values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend that you use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
    my @attrs = extract_file_info($ifile);
    process_file($ifile, @attrs);
}
Hope it helps
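To illustrate, get_input_files could be fleshed out roughly like this (a sketch reusing the directory and filename pattern from Part 1; here it takes the target directory as an argument):
sub get_input_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    # keep only the files matching the naming pattern from Part 1
    my @files = grep { /^foo\d{3}\.name\.\w{3}-foo_p\d{1,4}\.\d+\.csv$/ } readdir $dh;
    closedir $dh;
    return @files;
}
my @ifiles = get_input_files("/backups/test/");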
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. scalar(@values));
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when they go out of
#   scope at the end of the block/sub, so there's no real point to
#   explicitly closing them unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.