Perl script (or anything) to total up a CSV column

I wrote (with lots of help from others) an awk command to total up a column in a CSV file. Unfortunately, I learned after some Googling that awk isn't great at handling CSV files, because the separator is not always a plain comma (commas inside quoted fields should be ignored).
It seems that perhaps a Perl script could do better. Would it be possible to have a one-line Perl script (or something nearly as succinct) that achieves the same thing as this awk command that totals up the 5th column of a CSV file?
cat file.csv | awk -F "\"*,\"*" '{s+=$5} END {printf("%01.2f\n", s)}'
I'm not married to Perl in particular but I was hoping to avoid writing a full-blown PHP script. By this time I could have easily written a PHP script, but now that I've come this far, I want to see if I can follow it through.
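For a concrete illustration of the quoting problem (a made-up sample row, not from my actual data): given a line such as
1,"Doe, John",100,200,300
the awk command above also splits on the comma inside the quotes, so $5 is no longer the real fifth field.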

You need to use a decent CSV parser to deal with all the complexities of the CSV format. Text::CSV_XS (or Text::CSV if that's not available) is one of the preferred ones.
perl -e '{use Text::CSV_XS; my $csv=Text::CSV_XS->new(); open my $fh, "<", "file.csv" or die "file.csv: $!"; my $sum = 0; while (my $row = $csv->getline ($fh)) {$sum += $row->[4]}; close $fh; print "$sum\n";}'
Here's the same code laid out for readability:
use Text::CSV_XS;                                   # the CSV parser library
my $csv = Text::CSV_XS->new();                      # create the parser object
open my $fh, "<", "file.csv" or die "file.csv: $!"; # open the file
my $sum = 0;
while (my $row = $csv->getline($fh)) {              # $row is an arrayref of field values
    $sum += $row->[4];
}
close $fh;
print "$sum\n";
The above could be shortened into denser (if slightly lower-quality) Perl:
cat file.csv | perl -MText::CSV_XS -nae '$csv=Text::CSV_XS->new();
$csv->parse($_); @f=$csv->fields(); $s+=$f[4]} { print "$s\n"'

Are you opposed to using a Perl module? You can use Text::CSV to do this easily without rolling your own parser.
Tutorial snippet changed to compute the total:
# ... some tutorial code omitted ...
while (<CSV>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        $total += $columns[4];
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
print "total: $total\n";

Python
import csv
with open("some_file.csv", "rb") as source:
    rdr = csv.reader(source)
    col_5 = 0
    for row in rdr:
        # the 5th column is index 4; csv yields strings, so convert
        col_5 += float(row[4])
    print col_5
Not a one-liner, but pretty terse.

There are a number of tools that do this. A quick search for 'cli csvparser' led me to several tools (which I apparently can't link to--possibly to prevent spamming).
I installed the first one I found--csvtool--and was able to run a command line similar to yours and get a total.

Pretty short (and fast) solution:
perl -MText::CSV_XS -E'$c=new Text::CSV_XS;$s+=$r->[4]while$r=$c->getline(*ARGV);say$s' file.csv
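De-golfed (my own expansion of the same one-liner, for readability):
use strict;
use warnings;
use feature 'say';
use Text::CSV_XS;

my $csv = Text::CSV_XS->new;
my $sum = 0;
# *ARGV is the magic handle for the file(s) named on the command line
while (my $row = $csv->getline(*ARGV)) {
    $sum += $row->[4];
}
say $sum;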

Related

How do I copy a CSV file, but skip the first line?

I want to write a script that takes a CSV file, deletes its first row and creates a new output CSV file.
This is my code:
use Text::CSV_XS;
use strict;
use warnings;
my $csv = Text::CSV_XS->new({sep_char => ','});
my $file = $ARGV[0];
open(my $data, '<', $file) or die "Could not open '$file'\n";
my $csvout = Text::CSV_XS->new({binary => 1, eol => $/});
open my $OUTPUT, '>', "file.csv" or die "Can't able to open file.csv\n";
my $tmp = 0;
while (my $line = <$data>) {
    # if ($tmp == 0)
    # {
    #     $tmp = 1;
    #     next;
    # }
    chomp $line;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
        $csvout->print($OUTPUT, \@fields);
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
On the command line I run c:\test.pl csv.csv and it doesn't create the file.csv output, but when I double-click the script it creates a blank CSV file. What am I doing wrong?
Your program isn't ideally written, but I can't tell why it doesn't work if you pass the CSV file on the command line as you have described. Do you get the error Could not open 'csv.csv' or Can't able to open file.csv? If not, then the file must be created in your current directory. Perhaps you are looking in the wrong place?
If all you need to do is drop the first line, then there is no need to use a module to process the CSV data: you can handle it as a simple text file.
If the file is specified on the command line, as in c:\test.pl csv.csv, you can read from it without explicitly opening it using the <> operator.
This program reads the lines from the input file and prints them to the output only if the line counter (the $. variable) isn't equal to one.
use strict;
use warnings;
open my $out, '>', 'file.csv' or die $!;
while (my $line = <>) {
    print $out $line unless $. == 1;
}
You don't actually need any modules for this task, since CSV (comma-separated values) files are simply text files: just open the file and iterate over its lines, writing out all lines except the one with the given number (here, the first). A task this simple is probably better done with a command-line one-liner than a dedicated script.
A quick search turns up numerous tutorials about Perl input/output operations; see e.g. this link for an example:
http://learn.perl.org/examples/read_write_file.html
PS. Perl scripts (programs) are usually not "compiled" into a binary file; they are compiled, of course, but on the fly, which is why /usr/bin/perl is called an "interpreter" rather than a "compiler" like gcc or g++. I guess what you're looking for is an editor with syntax highlighting and other development goodies; you could try Eclipse with the EPIC Perl plugin for that (cross-platform).
http://www.eclipse.org/downloads/
http://www.epic-ide.org/download.php/
This:
user@localhost:~$ cat blabla.csv | perl -ne 'print $_ if $x++;'
skips the first line (it prints only when $x is true, and since $x is incremented after each test, it is false only for the first line).
You are missing your first (and only) argument due to Windows.
I think this question will help you: @ARGV is empty using ActivePerl in Windows 7

getting "2032 - EIF - CR char inside unquoted, not part of EOL # pos" error with my perl script

I have written a simple Perl script to read a line from a .csv file. The code is below:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 });
open my $fh, "<", "testresults.csv" or die "testresults.csv $!";
while ( my $row = $csv->getline( $fh ) ) {
    my @fields = @$row;
}
$csv->eof or $csv->error_diag;
close $fh or die "testresults.csv $!";
And the testresults.csv file looks like this:
Node1,Node2,Node3,Node4,map1,map2,map3,map4,map5,map6,map7,map8,DM,LAT,AVG,product
on the first line followed by the results on each line:
name1,name2,name3,name4,Node1,Node2,Node3,Node4,Node5,Node6,Node7,Node8,0%,
0.002835480002,0.1714008533,4.86003691857886E-04
and so on.
I am getting the following error with my code when I do a ./filename.pl from the command prompt:
CSV_XS ERROR: 2032 - EIF - CR char inside unquoted, not part of EOL @ pos 420
I tried to Google this error but could not fathom much from it.
It would seem from our conversation in the comments that the error comes from the strings in the input being interlaced with null characters, made visible by using:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
while (<$fh>) {
    print Dumper $_;
}
A quick hack is to strip the null characters in the input file with something like:
perl -i.bak -pwe 'tr/\0//d' testresults.csv
NOTE: But as has been pointed out in comments by people more experienced in encoding matters, this should really be solved by decoding your data instead. Just stripping the bad symbols might break your data in subtle ways and is not an ideal solution.
I'm sorry, I do not know much about that, but using Text::CSV::Encoded does sound like a good start, as cjm suggested.
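If the NULs mean the file is really UTF-16 (a guess on my part, not something confirmed in the comments), a minimal sketch of the decoding approach would be:
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });
# ':encoding(UTF-16LE)' is the assumed encoding here; use whatever
# layer matches the file's real encoding
open my $fh, '<:encoding(UTF-16LE)', 'testresults.csv'
    or die "testresults.csv: $!";
while (my $row = $csv->getline($fh)) {
    # ... work with the properly decoded fields in @$row ...
}
close $fh;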

Search large CSV files with multiple search criteria on unix

I have several large CSV files that I need to search with one to many parameters; when I find a hit, I need to save that line to another file. Below is an example of Perl code that runs successfully but is very slow against a 5 GB file. Any suggestions on speeding it up would be greatly appreciated.
#!/usr/bin/env perl
use Text::CSV_XS;
$numArgs = $#ARGV;
# First parameter is the input file name
$Finput = $ARGV[0];
chomp($Finput);
# Second parameter is the output file name
$Foutput = $ARGV[1];
chomp($Foutput);
# Open the control file but quit if it doesn't exist
open(INPUT1, $Finput) or die "The Input File $Finput could not be found.\n";
open(OUTPUT1, ">$Foutput") or die "Cannot open output $Foutout file.\n";
my $csv = Text::CSV_XS->new();
open my $FH, "<", $Finput;
while (<$FH>) {
    $csv->parse($_);
    my @fields = $csv->fields;
    if ($fields[0] = 10000) {
        if ($fields[34] = 'abcdef') {
            if ($fields[103] = 9999) {
                print OUTPUT1 "$_\n";
            }
        }
    }
}
I don't know your data, or your criteria.
But if we could use your example given above, then I would try trivial tests against the lines BEFORE doing the CSV handling.
For example (note, my perl is terrible, this is meant to be exemplar, not correct):
if (/10000.*abcdef.*9999/) {
    $csv->parse($_);
    my @fields = $csv->fields;
    if ($fields[0] == 10000) {
        ...
    }
}
Basically, you do some simpler, faster checks to more quickly DISQUALIFY rows before performing the additional processing necessary to qualify them.
Clearly if more of your rows match than do not, or if the check for simple qualification isn't really practical, then this technique won't work.
Done right, CSV parsing is a bit expensive. (In fact, you have an error here in assuming that a single line of CSV is a single record; that may be true for your data, but CSV allows embedded newlines, so it's not a generic assumption that holds for all CSV.)
So, it's good to not have to pay the price of parsing it if, "at a glance", the line isn't going to match anyway.
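To see why, here is a tiny self-contained demonstration (mine, not part of the original answer) that a single CSV record can span two physical lines, which getline() handles but line-at-a-time parse() cannot:
use strict;
use warnings;
use Text::CSV_XS;

my $data = qq{1,"first line\nsecond line",3\n};
open my $fh, '<', \$data or die $!;    # read from an in-memory "file"
my $csv = Text::CSV_XS->new({ binary => 1 });
my $row = $csv->getline($fh);
print scalar(@$row), " fields\n";      # prints "3 fields"
print "field 2 contains a newline\n" if $row->[1] =~ /\n/;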
This is code that runs "successfully"? I find that hard to believe.
if ($fields[0] = 10000) {
if ($fields[34] = 'abcdef') {
if ($fields[103] = 9999) {
These are not checks for equality, but assignments. All of these if-clauses will always return true. What you probably wanted here was == and eq, not =.
You also open two filehandles on the input file, and use the CSV module in the wrong way. I'm not convinced that these minor errors should make the script too slow, but as written it would be printing all the records in that 5 GB file.
Here's a revised version of your script.
use strict;
use warnings;
use Text::CSV;
use autodie;
my $Finput = $ARGV[0];
my $Foutput = $ARGV[1];
open my $FH, "<", $Finput;
open my $out, ">", $Foutput;
my $csv = Text::CSV->new();
while (my $row = $csv->getline($FH)) {
    my @fields = @$row;
    if ($fields[0] == 10000) {
        if ($fields[34] eq 'abcdef') {
            if ($fields[103] == 9999) {
                $csv->print($out, $row);
            }
        }
    }
}
The autodie pragma will take care of checking the return value from open for us (and other things). use strict; use warnings; will make our brains hurt less. Oh, and I am using Text::CSV, not the _XS version.
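To illustrate what autodie buys you (my example, not part of the answer above):
use strict;
use warnings;
use autodie;   # open, close, etc. now die on failure automatically

# No "or die ..." needed; a missing file raises an error such as
# "Can't open 'no_such.csv' for reading" on its own.
open my $fh, '<', 'no_such.csv';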
You could use grep "{searchstring}" filename1.csv filename2.csv > savefile.txt on each file. Or maybe you want to read filename.csv line by line:
#!/bin/bash
exec 3<filename.csv
while read -r haystack <&3
do
    echo "$haystack" | grep "{needle}" >> result.txt
done

Perl script that works the same as unix command "history | grep keyword"

In Unix, what I want to do is history | grep keyword. It takes quite a few steps if I want to grep many kinds of keywords, so I want to automate it with a Perl script that does everything, instead of repeating the command and just changing the keyword. Then whenever I want to see those commands, I can simply run the Perl script.
The keywords I would like to grep for are things like source, ls, cd, etc.
The output can be printed in any format, as long as I know how to do it.
Thanks! I appreciate any comments.
modified (thanks to @chas-owens)
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = ".bash.history";
open FILE, "<", $historyFile or die "could not open $historyFile: $!";
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
    if ($_ =~ /$searchString/) {
        print "$_\n";
    }
}
original
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = "<.bash.history";
open FILE, $historyFile;
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
    if ($_ =~ /$searchString/) {
        print "$_\n";
    }
}
To be honest, history | grep whatever is clean and simple and nice ;)
note: the code may not be perfect
because it takes quite some steps if i wanna grep many types of keywords
history | grep -E 'ls|cd|source'
-P will switch on the Perl compatible regular expression library, if you have a new enough version of grep.
This being Perl, there are many ways to do it. The simplest is probably:
#!/usr/bin/perl
use strict;
use warnings;
my $regex = shift;
print grep { /$regex/ } `cat ~/.bash_history`;
This runs the shell command cat ~/.bash_history and returns the output as a list of lines. The list of lines is then consumed by the grep function. The grep function runs the code block for every item and only returns the ones that have a true return value, so it will only return lines that match the regex.
This code has several things wrong with it (it spawns a shell to run cat, it holds the entire file in memory, $regex could contain dangerous things, etc.), but in a safe environment where speed/memory isn't an issue, it isn't all that bad.
A better script would be
#!/usr/bin/perl
use strict;
use warnings;
use constant HISTORYFILE => "$ENV{HOME}/.bash_history";
my $regex = shift;
open my $fh, "<", HISTORYFILE
or die "could not open ", HISTORYFILE, ": $!";
while (<$fh>) {
next unless /$regex/;
print;
}
This script uses a constant to make it easier to change which history file it is using at a later date. It opens the history file directly and reads it line by line, so the whole file is never in memory. This can be very important if the file is very large. It still has the problem that $regex might contain a harmful regex, but so long as you are the person running it, you only have yourself to blame (I wouldn't let outside users pass arguments to a command like this through, say, a web application).
I think you are better off writing a Perl script which does your fancy matching (i.e. replaces the grep) but does not read the history file. I say this because the history does not appear to be flushed to the .bash_history file until I exit the shell. Now there are probably settings and/or environment variables to control this, but I don't know what they are. So if you just write a Perl script which scans STDIN for your favourite commands, you can invoke it like
history | findcommands.pl
If it's less typing you are after, set up a shell function or alias to do this for you.
As requested by @keifer, here is a sample Perl script which searches for a specified (or default) set of commands in your history. Obviously you should change @dflt_cmds to whichever commands you search for most frequently.
#!/usr/bin/perl
my @dflt_cmds = qw( cd ls echo );
my $cmds = \@ARGV;
if ( !scalar(@$cmds) )
{
    $cmds = \@dflt_cmds;
}
while ( my $line = <STDIN> )
{
    my ( $num, $cmd, @args ) = split( ' ', $line );
    if ( grep( $cmd eq $_, @$cmds ) )
    {
        print join( ' ', $cmd, @args ) . "\n";
    }
}

Why do I get the error message »Failed to parse line« with Text::CSV?

I'm trying to parse a CSV file in Perl, but don't really understand the examples I found on the Internet. Could someone explain this example to me?
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file = 'dhcp.csv';
my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
    next if ($. == 1);
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "Name: $columns[0]\n\tContact: $columns[4]\n";
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close CSV;
When I run it, I get Failed to parse line. What does the $. stand for? And $_?
My goal is to find the line containing the computer name I'm searching for, and then pull out the corresponding MAC address. I hope this is comprehensible; thanks.
EDIT:
My CSV file looks like:
172.30.72.22,DEC-16.rec.local,001676b755d6,Bart SIMPSONS,Ordinateur de bureau,DEC/DECVA,002,SR2 0.12,,Accès complet,N/D,Aucun
172.30.72.23,DEC-20.rec.local,001688b7bfdc,Larry Wall,Ordinateur de bureau,DEC/DECVA,003,?,,Accès complet,N/D,Aucun
Field #2 is the hostname, I want to resolve field #3 (MAC address) by field #2.
EDIT n°2:
In fact, I don't need to parse the CSV file for my purpose. I found a bash solution that is fast enough for my application.
my $macAdd = `cat dhcp.csv | grep {computerName} | cut -d ',' -f 5`
Done !
Thanks for your help; one day I'll surely have to parse a CSV file properly.
3rd edit: I don't know who edited my post and the topic question, but that's not it at all!
$. is the input line number. $_ is the "magic" default variable which many Perl operators (including <>) act upon unless instructed otherwise.
Look them up in perldoc perlvar for details.
BTW if you stuff $. into the error message you'll at least know which line fails.
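Concretely, that would be (my one-line tweak to the script above):
print "Failed to parse line $.: $err";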
EDIT: I replaced error_input with error_diag and now it says: 2037 - EIF - Binary character in unquoted field, binary off @ pos 106. After adding my $csv = Text::CSV->new({binary => 1}); the lines parsed OK.
So it looks like the accented characters confused Text::CSV.
It is good practice these days to make scripts UTF-8 compliant, so:
use strict;
use warnings;
use Carp;
#use utf8; # uncomment if this script itself contains UTF-8 characters
use Text::CSV;
my $csv = Text::CSV->new();
my $file = 'dhcp.csv';
open(my $fh, "<:encoding(UTF-8)", $file) || croak "can't open $file: $!";
while (<$fh>) {
    #next if ($. == 1); # uncomment if your data file has a header line too
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "Name: $columns[0]\n\tContact: $columns[4]\n";
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close $fh;
You could try using the $csv->error_diag method to find out what the module doesn't like about your input.
And then you could turn on binary data handling to get it working. But I strongly suspect you should be looking at Text::CSV::Encoded instead.
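A minimal sketch of that approach (the encoding_in value is an assumption; adjust it to whatever your file actually uses):
use strict;
use warnings;
use Text::CSV::Encoded;

# decode each field from the input encoding as it is parsed
my $csv = Text::CSV::Encoded->new({ encoding_in => 'iso-8859-1' });
open my $fh, '<', 'dhcp.csv' or die "dhcp.csv: $!";
while (my $row = $csv->getline($fh)) {
    print "Name: $row->[0]\n\tContact: $row->[4]\n";
}
close $fh;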
I disagree with the folks about rolling your own CSV parsing.
Instead, my suggestion is to use a simpler CSV parser like Parse::CSV. It's an easy-to-use module, and the very first example in its documentation should be enough to give you a painless start.
It is always good to open CSV files in binary mode:
my $csv = Text::CSV->new({ binary => 1});
Maybe your CSV file is encoded in UTF-8 or some other charset. Read the Text::CSV documentation for more info.