Ignore empty lines in a CSV file using Perl

I have to read a CSV file, abc.csv, select a few fields from them and form a new CSV file, def.csv.
Below is my code. I am trying to ignore empty lines from abc.csv.
use strict;
use warnings;
use Text::CSV;
use FileHandle;

genNewCsv();

sub genNewCsv {
    my $csv  = Text::CSV->new();
    my $aCsv = "abc.csv";
    my $mCsv = "def.csv";

    my $fh = FileHandle->new( $aCsv, "r" );
    my $nf = FileHandle->new( $mCsv, "w" );

    $csv->print( $nf, [ "id", "flops", "ub" ] );

    while ( my $row = $csv->getline($fh) ) {
        my $id    = $row->[0];
        my $flops = $row->[2];
        next if ( $id =~ /^\s+$/ );    # Ignore empty lines
        my $ub = "TRUE";
        $csv->print( $nf, [ $id, $flops, $ub ] );
    }
    $nf->close();
    $fh->close();
}
But I get the following error:
Use of uninitialized value $flops in pattern match (m//)
How do I ignore the empty lines in the CSV file?
I have used Stack Overflow question Remove empty lines and space with Perl, but it didn't help.

You can skip the entire row if any fields are empty:
unless ( defined($id) && defined($flops) && defined($ub) ) {
    next;
}
You tested if $id was empty, but not $flops, which is why you got the error.
You should also be able to do
unless ( $id && $flops && $ub ) {
    next;
}
Here, an empty string would also evaluate to false.
Edit: Now that I think about it, what do you mean by ignore lines that aren't there?
You can also do this:
my $id = $row->[0] || 'No Val';    # where 'No Val' is anything you want to signify the field was empty
This will show a default value for the variable if the first value evaluated to false.
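If a legitimate false value such as 0 or an empty string should not be replaced, the defined-or operator // (available since Perl 5.10) is a safer default. A small sketch; 'No Val' is just a placeholder:
# Falls back to the default only when the field is undef,
# so values like 0 or "" are kept.
my $id = $row->[0] // 'No Val';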

Run this on your file first to get non-empty lines
sub removeblanks {
    my @lines = ();
    for (@_) {
        push @lines, $_ unless /^\s*$/;    # add any other conditions here
    }
    return \@lines;
}

You can do the following to skip empty lines:
while ( my $row = $csv->getline($fh) ) {
    next unless grep $_, @$row;
...

You could use List::MoreUtils to check if any of the fields are defined:
use List::MoreUtils qw(any);

while ( my $row = ... ) {
    next unless any { defined } @{$row};
    ...
}
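Putting the pieces together, a corrected version of the original sub might look like this. This is just a sketch: it keeps the original file names and assumes the id is in column 0 and flops in column 2.
use strict;
use warnings;
use Text::CSV;

sub genNewCsv {
    my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );

    open my $fh, '<', 'abc.csv' or die "abc.csv: $!";
    open my $nf, '>', 'def.csv' or die "def.csv: $!";

    $csv->print( $nf, [ 'id', 'flops', 'ub' ] );

    while ( my $row = $csv->getline($fh) ) {
        # Skip rows where every field is undefined or blank
        next unless grep { defined($_) && /\S/ } @$row;
        my ( $id, $flops ) = @{$row}[ 0, 2 ];
        $csv->print( $nf, [ $id, $flops, 'TRUE' ] );
    }

    close $nf;
    close $fh;
}

genNewCsv();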

Related

compare the value of same items with one exception case

A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
QC4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
QA4,train,75
F,movie,
It should compare the values of the same type (wherever the 2nd column matches) and print the ones that differ. Now I want the output to look like:
A,food,75 is not matching with B,food,75 C,food,85
A,car,69 is not matching with C,car,136 B,Car,136
A,house,179 is not matching with B,house,150
QC4,train,95 is not matching with QA4,train,75
F,movie missing value
The code I've written is below, but it's not printing the format the way I want.
while (<FILE>) {
    my $line  = $_;
    my @lines = split /,/, $line;
    $data{ $lines[1] }{ $lines[0] } = $lines[2];
}
foreach my $item ( keys %val ) {
    foreach my $letter1 ( keys %{ $val{$item} } ) {
        foreach my $letter2 ( keys %{ $val{$item} } ) {
            if (   ( $val{$item}{$letter1} != $val{$item}{$letter2} )
                && ( $letter1 ne $letter2 )
                && (   ( !defined $done{$item}{$letter1}{$letter2} )
                    || ( !defined $done{$item}{$letter2}{$letter1} ) ) )
            {
                print "$item : $letter1, $val{$item}{$letter1}, $letter2, $val{$item}{$letter2}\n";
            }
        }
    }
}
Really hard to follow the logic of your code.
But I seem to get the desired result with this:
[Edit] The code was edited as per the comments
use strict;
use warnings;
my %hash;

while ( my $line = <DATA> ) {
    chomp $line;
    my ( $letter, $object, $number ) = split /,/, $line;

    ### here we are dealing with missing values
    if ( !defined $number or $number =~ m{^\s*$} ) {
        print $line, "missing value\n";
        next;
    }

    ### here we dissever exceptional items apart from the others
    if ( $letter =~ m{^E[AC]X\d$} ) {
        $object = "exceptional_$object";
    }

    $number += 0;    # in case there is whitespace at the end
    push @{ $hash{$object}{$number} }, [ $letter, $number, $line ];
}

for my $object ( sort keys %hash ) {
    my $oref = $hash{$object};
    if ( 1 == keys %$oref ) {
        next;
    }
    my $str;
    for my $item ( values %$oref ) {
        $str .= $str ? " $item->[0][2]" : "$item->[0][2] is not matching with";
    }
    print $str, "\n";
}
__DATA__
A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
ECX4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
EAX4,train,75
F,movie,
Output:
F,movie,missing value
A,car,69 is not matching with B,car,136
EAX4,train,75 is not matching with ECX4,train,95
C,food,85 is not matching with A,food,75
A,house,179 is not matching with B,house,150
What the algorithm does:
Looping through the input, we remember all the lines for each unique pair of object and number.
After going through the input loop we do the following:
For each object we skip it if it has no different numbers:
if ( 1 == keys %$oref ) {
    next;
}
If it has, we build an output string from a list of the first remembered lines for that object and number (that is, we omit the duplicates for the object and number);
the first item in the list is amended with "is not matching with".
Also, I am reading from the special filehandle DATA, which accesses data embedded in the script. This is just for convenience when demoing the code.
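To run this against a real file instead of the embedded DATA section, only the read loop changes. A sketch, where data.csv is a placeholder name:
open my $fh, '<', 'data.csv' or die "Cannot open data.csv: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    # ... same per-line processing as in the while loop above ...
}
close $fh;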

Text file to hash table

I want to create a hash table for the data in my file.
The file contains a bunch of commands that are written as
===|showcommand|
Every time I see this delimiter, I want to create a hash key and store the data below it as an array in the value until I see the next delimiter.
The next delimiter will do the same thing: create a hash key with the delimiter name and store the data on the following lines into an array as its value.
my %commands;
my $name;
my $body;

while (<>) {
    if ( my ($new_name) = /===\|([^|]*)\|/ ) {
        $commands{$name} = $body if defined($name);
        $name = $new_name;
        $body = '';
    }
    else {
        $body .= $_;
    }
}
$commands{$name} = $body if defined($name);
This assumes the body of a command starts on the line after the header and stops on the line before the next header.
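If you want each command's body as an array of lines rather than one string, here is a small follow-up sketch using Data::Dumper:
use Data::Dumper;

# Split each stored body into its individual lines.
my %as_lists = map { $_ => [ split /\n/, $commands{$_} ] } keys %commands;
print Dumper( \%as_lists );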
You probably have it working already, but I'll add a small comment regarding the question of how to return the hash from a function.
Here's an example:
Input file (I used the following, which I think has a similar structure to your input file):
===|showcommand|
cmd1
cmd2
cmd3
cmd4
===|testcommand|
command1
command2
command3
===|anothercommand|
another1
another2
another3
another4
Perl script:
use strict;
use warnings;

# Calling ReadCommandFile to build the hash.
my %commands = ReadCommandFile("./commands.txt");

# ReadCommandFile - reads commands.txt and builds a hash.
sub ReadCommandFile {
    my $file = shift;

    my %hash;
    my $name;

    open( FILE, "<", $file ) or die "Cannot open $file: $!";
    while (<FILE>) {
        if ( $_ =~ /===\|(.*)\|/ ) {
            $name = $1;
            $hash{$name} = [];
        }
        else {
            my $line = $_;
            $line =~ s/\n$//;
            push( @{ $hash{$name} }, $line );
        }
    }
    close(FILE);

    return %hash;
}
As a result, you should get the following hash (output from Data::Dumper):
$VAR1 = 'anothercommand';
$VAR2 = [
'another1',
'another2',
'another3',
'another4'
];
$VAR3 = 'showcommand';
$VAR4 = [
'cmd1',
'cmd2',
'cmd3',
'cmd4'
];
$VAR5 = 'testcommand';
$VAR6 = [
'command1',
'command2',
'command3'
];
You can then access individual elements like this:
Get the third command from "showcommand":
print "\nCommand #3: " . $commands{'showcommand'}[2];
Output: cmd3
The data from the file is copied to a hash and the commands are added as an array under the respective keywords.
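If you then want to walk every command and its stored lines, a loop like this would work (just a sketch over the %commands hash built above):
for my $cmd ( sort keys %commands ) {
    print "$cmd:\n";
    print "  $_\n" for @{ $commands{$cmd} };
}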
Thanks!

mapping grep result to csv file

I'm trying to write the grep results to a CSV file, but it is showing the following error:
"Use of uninitialized value in concatenation (.) or string at"
code:
sub gen_csv {
    my $db_ptr        = shift @_;
    my $cvs_file_name = shift @_;

    open( FILE, ">$cvs_file_name" ) or die("Unable to open CSV FILE $cvs_file_name\n");
    print FILE "Channel no, Page no, \n";

    foreach my $s ( @{$db_ptr} ) {
        my $tmp = "$s->{'ch_no'},";
        $tmp .= "$s->{'pg_no'},";
        print FILE $tmp;
    }
    close(FILE);
}

sub parse_test_logs {
    my $chnl;
    my $page;
    my $log = "sample.log";

    open my $log_fh, "<", $log;
    while ( my $line = <$log_fh> ) {
        if ( $line =~ /(.*):.*solo_(.*): queueing.*/ ) {
            my $chnl = $1;
            my $page = $2;
        }
        my %test_details = (
            'ch_no' => $chnl,
            'pg_no' => $page,
        );
        push( @{$dba_ptr}, \%test_details );
    }
    close $log_fh;
}
Any suggestions on what I'm missing?
(I'm getting the above error pointing to my $tmp = "$s->{'ch_no'},"; in the gen_csv sub.)
Most likely this is due to NULL values in your DB records or the keys you are using are wrong. Either way, the warning is because the ch_no value does not exist.
If you don't care about NULL values, and you are fine with some of the values being missing, then you can suppress warnings for uninitialized values.
no warnings 'uninitialized';
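To avoid hiding other warnings, you could limit the suppression to the loop that builds the row (a small sketch):
foreach my $s ( @{$db_ptr} ) {
    no warnings 'uninitialized';    # silenced only inside this block
    print FILE "$s->{'ch_no'},$s->{'pg_no'},";
}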
Your problem involves this block:
if ( $line =~ /(.*):.*solo_(.*): queueing.*/ ) {
    my $chnl = $1;
    my $page = $2;
}
my %test_details = (
    'ch_no' => $chnl,
    'pg_no' => $page,
);
You're capturing your variables, but you have them declared with my within the if block. Those lexicals then go out of scope and are undef when used to initialize the hash.
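The minimal fix is to assign to the variables declared at the top of the sub instead of redeclaring them inside the if block; a sketch:
if ( $line =~ /(.*):.*solo_(.*): queueing.*/ ) {
    $chnl = $1;    # no 'my' here, so the outer variables keep the values
    $page = $2;
}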
I recommend simplifying your parsing function to the following:
sub parse_test_logs {
    my $log = "sample.log";
    open my $log_fh, "<", $log;

    while (<$log_fh>) {
        if ( my ( $chnl, $page ) = /(.*):.*solo_(.*): queueing.*/ ) {
            push @{$dba_ptr}, { 'ch_no' => $chnl, 'pg_no' => $page };
        }
        else {
            warn "regex did not match for line $.: $_";
        }
    }
    close $log_fh;
}
Finally, it's possible that you already are, but I just want to pass on the ever necessary advice to always include use strict; and use warnings; at the top of EVERY Perl script.

functional Perl: Filter, Iterator

I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
I want a function to which I pass an iterator over the lines of the file and which returns a map from keys to lists of values. But to be functional and structured I'd like to:
use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
The mentioned filter(s) should be defined outside of the function for reusability by other functions.
use another function that is given the line and returns a tuple of key and values string
use another function that breaks the comma separated values into a list of values.
What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.
For reference, here is (a quick hack) how I might do it in Python:
re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: (not ("" == line or re_is_comment_line.match(line))
                                    and re_key_values.match(line))

def parse(lines):
    interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
    key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
    splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
    return dict(splitted_values)
A direct translation of your Python would be
my $re_is_comment_line = qr/^\s*#/;
my $re_key_values      = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter        = qr/\s*,\s*/;

my $is_interesting_line = sub {
    local $_ = shift;    # 'my $_' is no longer allowed on modern Perls
    length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};

sub parse {
    my @lines             = @_;
    my @interesting_lines = grep $is_interesting_line->($_), @lines;
    my @key_values        = map [/$re_key_values/], @interesting_lines;
    my %splitted_values   = map { $_->[0], [ split $re_splitter, $_->[1] ] } @key_values;
    return %splitted_values;
}
Differences are:
ifilter is called grep, and can take an expression instead of a block as first argument. These are roughly equivalent to a lambda. The current item is given in the $_ variable. The same applies to map.
Perl doesn't emphasize laziness and seldom uses iterators. There are instances where this is required, but usually the whole list is evaluated at once.
In the next example, the following will be added:
Regexes don't have to be precompiled; Perl is very good with regex optimizations.
Instead of extracting key/values with regexes, we use split. It takes an optional third argument that limits the number of resulting fragments.
The whole map/filter stuff can be written in one expression. This doesn't make it more efficient, but it emphasizes the flow of data. Read the map-map-grep from bottom upwards (actually right to left; think of APL).
sub parse {
    my %splitted_values =
        map  { $_->[0], [ split /\s*,\s*/, $_->[1] ] }
        map  { [ split /\s*=\s*/, $_, 2 ] }
        grep { length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
        @_;
    return \%splitted_values;    # returning a reference improves efficiency
}
But I think a more elegant solution here is to use a traditional loop:
sub parse {
    my %splitted_values;
    LINE: for (@_) {
        next LINE if !length or /^\s*#/;
        s/\A\s*|\s*\z//g;    # Trim the string (omitted in previous examples)
        my ( $key, $vals ) = split /\s*=\s*/, $_, 2;
        defined $vals or next LINE;    # check if $vals was assigned
        @{ $splitted_values{$key} } = split /\s*,\s*/, $vals;    # automatically creates the array in $splitted_values{$key}
    }
    return \%splitted_values;
}
If we decide to pass a filehandle instead, the loop would be replaced with
my $fh = shift;
LOOP: while (<$fh>) {
    chomp;
    ...;
}
which would use an actual iterator.
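Assuming parse has been rewritten around that while (<$fh>) loop, calling it might look like this (config.txt is just a placeholder name):
open my $fh, '<', 'config.txt' or die "config.txt: $!";
my $data = parse($fh);
close $fh;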
You could now go and add function parameters, but do this only if you are optimizing for flexibility and nothing else. I already used a code reference in the first example. You can invoke them with the $code->(@args) syntax.
use Carp;    # Error handling for writing APIs

sub parse {
    my $args          = shift;
    my $interesting   = $args->{interesting}   or croak qq("interesting" callback required);
    my $kv_splitter   = $args->{kv_splitter}   or croak qq("kv_splitter" callback required);
    my $val_transform = $args->{val_transform} || sub { $_[0] };    # identity by default

    my %splitted_values;
    LINE: for (@_) {
        next LINE unless $interesting->($_);
        s/\A\s*|\s*\z//g;
        my ( $key, $vals ) = $kv_splitter->($_);
        defined $vals or next LINE;
        $splitted_values{$key} = $val_transform->($vals);
    }
    return \%splitted_values;
}
This could then be called like
my $data = parse(
    {
        interesting   => sub { length( $_[0] ) and not $_[0] =~ /^\s*#/ },
        kv_splitter   => sub { split /\s*=\s*/, $_[0], 2 },
        val_transform => sub { [ split /\s*,\s*/, $_[0] ] },    # returns an anonymous arrayref
    },
    @lines
);
I think the most modern approach is to take advantage of CPAN modules. In your example, Config::Properties may help:
use strict;
use warnings;
use Config::Properties;
my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
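Note that Config::Properties hands back the raw value string, so splitting a comma-separated list is still up to you; a one-line sketch using a key from the sample file:
my @values = split /\s*,\s*/, $config->getProperty('key2');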
As indicated in the posts linked to by @collapsar, Higher-Order Perl is a great read for exploring functional techniques in Perl.
Here is an example that hits your bullet points:
use strict;
use warnings;
use Data::Dumper;
my @filt_rx = ( qr{^\s*\#},
                qr{^[\r\n]+$} );
my $kv_rx  = qr{^\s*(\w+)\s*=\s*([^\r\n]*)};
my $spl_rx = qr{\s*,\s*};

my $iterator = sub {
    my ($fh) = @_;
    return sub {
        my $line = readline($fh);
        return $line;
    };
};

my $filter = sub {
    my ( $it, @r ) = @_;
    return sub {
        my $line;
        do {
            $line = $it->();
        } while ( defined $line
            && grep { $line =~ m/$_/ } @r );
        return $line;
    };
};

my $kv = sub {
    my ( $line, $rx ) = @_;
    return ( $line =~ m/$rx/ );
};

my $spl = sub {
    my ( $values, $rx ) = @_;
    return split $rx, $values;
};

my $it = $iterator->( \*DATA );
my $f  = $filter->( $it, @filt_rx );

my %map;
while ( my $line = $f->() ) {
    my ( $k, $v ) = $kv->( $line, $kv_rx );
    $map{$k} = [ $spl->( $v, $spl_rx ) ];
}

print Dumper \%map;
__DATA__
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
It produces the following hash on the provided input:
$VAR1 = {
'key2' => [
'value1',
'value2',
'value3'
],
'key1' => [
'value'
]
};
You might be interested in this SO question as well as this one.
The following code is a self-contained Perl script intended to give you an idea of how to implement this in Perl (only partially in a functional style; if the particular coding style and/or language constructs don't put you off, I can refine the solution somewhat).
Miguel Prz is right that in most cases you'd search CPAN for solutions to match your requirements.
my (
    $is_interesting_line,
    $re_is_comment_line,
    $re_key_values,
    $re_splitter,
);

$re_is_comment_line = qr(^\s*#);
$re_key_values      = qr(^\s*(\w+)\s*=\s*(.*)$);
$re_splitter        = qr(\s*,\s*);

$is_interesting_line = sub {
    my $line = shift;
    return (
        ( !( !defined($line) || ( $line eq '' ) ) )
        && ( $line =~ /$re_key_values/ )
    );
};

sub strip {
    my $line = shift;
    # your implementation goes here
    return $line;
}

sub parse {
    my @lines = @_;

    my (
        $dict,
        $interesting_lines,
        $k,
        $v,
    );

    @$interesting_lines =
        grep {
            &{$is_interesting_line}($_);
        } ( map { strip($_); } @lines );

    $dict = {};
    map {
        if ( $_ =~ /$re_key_values/ ) {
            ( $k, $v ) = ( $1, [ split( /$re_splitter/, $2 ) ] );
            $$dict{$k} = $v;
        }
    } @$interesting_lines;

    return $dict;
}    # parse

#
# sample execution goes here
#
my $parse = <<EOL;
# comment
what = is, this, you, wonder
it = is, perl
EOL

parse( split( /[\r\n]+/, $parse ) );

Relative Record Separator in Perl

I have a data that looks like this:
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
So each record is separated by id:[somenumber] --
What's the way to access the data so that we can have a hash of arrays:
$VAR = {
    'id:40108689' => [ 'chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),' ],
    'id:40108116' => [
        'chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),',
        'chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),',
    ],
    # ...etc
};
I tried to approach this using the record separator, but I'm not sure how to generalize it.
{
    local $/ = " --\n";    # How to include variable content id:[number] ?
    while ( $content = <INFILE> ) {
        chomp $content;
        print "$content\n" if $content;    # Skip empty records
    }
}
my $result = {};
my $last_id;

while ( my $line = <INFILE> ) {
    if ( $line =~ /(id:\d+) --/ ) {
        $last_id = $1;
        next;
    }
    next unless $last_id;    # Just in case the file doesn't start with an id line
    push @{ $result->{$last_id} }, $line;
}

use Data::Dumper;
print Dumper $result;
This uses the normal record separator. $last_id keeps track of the last id row encountered and is updated whenever another id line appears. Non-id rows are pushed onto an array under the hash key of the last matched id line.
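If you prefer something closer to the record-separator idea from your attempt, you could instead slurp the file and split on the id headers with a lookahead. This is only a sketch and assumes the whole file fits comfortably in memory:
my %records;
my $content = do { local $/; <INFILE> };    # slurp the whole file

# Split before every "id:NNN --" header so each chunk starts with its id line.
for my $block ( grep { /\S/ } split /(?=^id:\d+ --)/m, $content ) {
    my ( $header, @lines ) = split /\n/, $block;
    ( my $id = $header ) =~ s/\s*--\s*$//;    # "id:40108689 --" -> "id:40108689"
    $records{$id} = [ grep { /\S/ } @lines ];
}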