How can I extract fields from a CSV file in Perl? - perl

I want to extract particular fields from a CSV file (830k records) and store them in a hash. Is there a fast and easy way to do this in Perl without using any external modules?
How can I achieve that?

Use Text::CSV_XS. It's fast, moderately flexible, and extremely well-tested. The answer to many of these questions is something on CPAN. Why spend the time to make something not as good as what a lot of people have already perfected and tested?
If you don't want to use external modules, which is a silly objection, look at the code in Text::CSV_XS and do that. I'm constantly surprised that people think that even though they think they can't use a module they won't use a known and tested solution as example code for the same task.
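A minimal sketch of the Text::CSV_XS approach, assuming the hash key and the wanted value each live in a known column. The sub name csv_to_hash and the column positions are illustrative and not taken from the question.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: read CSV rows from $fh and map column $key_col
# to column $val_col (both zero-based) in a hash.
sub csv_to_hash {
    my ($fh, $key_col, $val_col) = @_;

    # binary => 1 allows embedded newlines and non-ASCII data;
    # auto_diag => 1 reports parse errors instead of failing silently.
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

    my %value_for;
    while (my $row = $csv->getline($fh)) {
        $value_for{ $row->[$key_col] } = $row->[$val_col];
    }
    return \%value_for;
}
```

Called as csv_to_hash($fh, 0, 2) on an opened file handle, this reads one row at a time, so the 830k records never have to sit in memory as raw text; only the resulting hash does.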

Assuming normal CSV (i.e., no embedded commas), to get the 2nd field, for example:
$ perl -F"," -lane 'print $F[1];' file

See also this code fragment taken from The Perl Cookbook which is a great book in itself for Perl solutions to common problems

Using split would do the job, I guess (assuming the columns are separated by commas and no commas are present in the fields).
while (my $line = <INPUTFILE>) {
chomp $line;
my @columns = split /,/, $line; # field separator is ","
}
And then from the elements of the @columns array you can construct whatever hash you like.
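The split approach can be sketched with core Perl only. The sub name fields_to_hash and its zero-based index arguments are assumptions made for illustration, and the sketch is only safe when no field contains a comma, a quote, or a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: for every line read from $fh, map field $key_idx
# to field $val_idx (both zero-based) in a hash.
sub fields_to_hash {
    my ($fh, $key_idx, $val_idx) = @_;

    my %value_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;              # skip blank lines
        # A limit of -1 keeps trailing empty fields
        my @columns = split /,/, $line, -1;
        $value_for{ $columns[$key_idx] } = $columns[$val_idx];
    }
    return \%value_for;
}
```

If two lines share the same key, the later line wins; store an array ref per key instead if every record has to be kept.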

Related

Is there added value in using Text::CSV for writing output?

If I am writing tabular output to a file as CSV, what advantage does loading an extra module, Text::CSV, and converting my data to an object get me over a basic loop and string manipulation? I have seen a couple of answers that suggest doing this: How can I get column names and row data in order with DBI in Perl and How do I create a CSV file using Perl.
Loading up an entire module seems like overkill and significant overhead for something I can write in four lines of Perl (ignoring data retrieval, etc.):
my $rptText = join(',', map { qq/"$_"/ } @head) . "\n";
foreach my $person ( @$data ) {
$rptText .= join(',', map { qq/"$person->{$_}"/ } @head) . "\n";
}
So what does loading Text::CSV get me over the above code?
For simple, trivial data (which, admittedly, is quite common) there's no advantage. You can join, print, and go on your merry way. For a quick throw-away script that's likely all you need (and faster, too, if you'd need to consult the Text::CSV documentation).
The advantages start when your data is non-trivial.
Does it contain commas?
Does it contain double-quotes?
Does it contain newlines?
Does it contain non-ASCII characters?
Is there a distinction between undef (no value) and '' (empty string)?
Might you need to use a different separator (or let the user specify it)?
For production code, the standard pro-module advice applies:
Reuse instead of reinvent (particularly if you'll be generating more than one CSV file)
It keeps your code cleaner (more consistent and with better separation of concerns)
CPAN modules are often faster (better optimized), more robust (edge-case handling), and have a cleaner API than in-house solutions.
The overhead of loading a module is almost certainly a non-issue.
Your CSV won't be valid if $person->{col1} contains ". Also, all columns will be wrapped in double quotes, which might not be desired (e.g. numbers). You'll also get "" for undefined values, which might not work if you plan to load the CSV into a database that makes a distinction between null and ''.
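To make the failure cases above concrete, here is a rough sketch of what the four-line version would have to grow into just to handle embedded quotes, commas, and newlines. The sub names csv_field and csv_line are hypothetical, and this is not a substitute for Text::CSV: it treats undef and '' identically and ignores encodings and alternative separators.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a single field only when it needs quoting.
sub csv_field {
    my ($value) = @_;
    return '' unless defined $value;          # undef becomes an empty, unquoted field
    if ($value =~ /[",\r\n]/) {
        (my $escaped = $value) =~ s/"/""/g;   # double any embedded quotes
        return qq{"$escaped"};
    }
    return $value;                            # plain values (e.g. numbers) stay unquoted
}

# Hypothetical helper: escape each field and join them into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}
```

Each further requirement from the list above (a configurable separator, distinguishing undef from '') adds another branch, which is the point at which the module starts paying for itself.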

Date manipulation in Perl

My script is passed a date parameter in the format YYYYMMDD, like for example 20130227. I need to check whether it's a Monday. If yes then I need to retrieve the previous four days date values otherwise I should retrieve the two previous days' date values and store them in array.
For example if the parameter is 20130227 and it's a Monday, then I need to store ('20130227' '20130226' '20130225' '20130224') in an array. If it's not a Monday then I need to store only ('20130227' '20130226') in an array.
What perl function can I use for doing this? I am using perl on solaris 10.
Not all standard Perl commands are listed in the standard Perl list of commands. This is a big confusion for beginners and the main reason you end up seeing beginners use a lot of system commands to do things that could be done directly in Perl.
Many Perl commands are available if you include the Perl Module for that command. For example, you want to copy a file from one place to another, but there's no Perl copy command listed as a standard function. Many people end up doing this:
system ("cp $my_file $new_location");
However, there is a standard Perl module called File::Copy that includes the missing copy command:
use File::Copy;
copy ($my_file, $new_location);
The File::Copy module is included with Perl, but you have to know about modules and how they're used. Unfortunately, although they're a major part of Perl, they're simply not included in many Perl beginner books.
I am assuming your confusion comes from the fact you're looking for some command in Perl and not finding it. However, the Time::Piece module is a standard Perl module since Perl 5.10 that is used for date manipulation. Unfortunately, it's an object oriented module which can make its syntax a bit strange to users who aren't familiar with Object Oriented Perl. Fortunately, it's really very simple to use.
In object oriented programming, you create an object that contains your data. The object itself cannot easily be displayed, but contains all of the features of your data, and various methods (basically subroutines) can be used to get information on that object.
First thing you need to do is create a time object of your date:
use Time::Piece;
my $time_obj = Time::Piece->strptime("20130227", "%Y%m%d");
Now, $time_obj represents your date. The 20130227 represents your date string. The %Y%m%d represents the format of your string (YYYYMMDD). Unfortunately the Time::Piece documentation doesn't tell you how the format works, but the format characters are documented in the Unix strftime manpage.
Once you have your time object, you can query it with all sorts of methods (aka subroutines):
if ( $time_obj->day_of_week == 1 ) {
print "This is a Monday\n";
}
Read the documentation and try it out.
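Putting those pieces together, one possible sketch of the whole task follows. It assumes, as in the question's example, that the returned list includes the given date itself (four dates in total for a Monday, two otherwise); the sub name dates_to_process is made up for illustration.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;    # exports ONE_DAY

# Hypothetical helper: return the given date followed by the preceding
# days, four dates in total for a Monday and two otherwise.
sub dates_to_process {
    my ($yyyymmdd) = @_;
    my $date  = Time::Piece->strptime($yyyymmdd, '%Y%m%d');
    my $count = $date->day_of_week == 1 ? 4 : 2;    # 0 is Sunday, 1 is Monday

    my @dates;
    for my $offset (0 .. $count - 1) {
        # Subtracting seconds from a Time::Piece object gives another one
        my $day = $date - $offset * ONE_DAY;
        push @dates, $day->strftime('%Y%m%d');
    }
    return @dates;
}
```

Note that 20130227 is in fact a Wednesday, so dates_to_process('20130227') returns two dates, while the Monday 20130225 returns four. Month and year boundaries need no special handling because the arithmetic is done in seconds.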
The generic toolkit for handling dates in Perl would be the DateTime module. It comes with a huge range of date parsing choices to get your strings formats in and out, and can easily query e.g. day-of-week.
A more lightweight, fast and recommended alternative might be Date::ISO8601 - your formats are quite close to that ISO format, but you would need to be willing to do a bit of manipulation on the variables e.g. my ($yyyy, $mm, $dd) = ( substr($d, 0,4), substr( $d, 4, 2 ), substr( $d, 6, 2 ) ); will grab the year month and day strings from your examples to feed to the module's constructor.
Please give these at least a try, and if you get stuck post some code on your question. Once you have some attempted code in the question, it is much quicker for someone to answer by filling in just the bits you don't know - you probably know a lot more about the solution you want than you think!
I'm not keen on helping someone who appears to have made no effort to help themselves. So I'm not going to give you an answer, but I'll suggest that you look at the Time::Piece module (a standard part of Perl since version 5.10). And, in particular, its strftime method.
That should be enough to get you started.

Which module to use for storing small cache files in Perl?

I've written a script that needs to store a small string in-between runs. Which CPAN module could I use to make the process as simple as possible? Ideally I'd want something like:
use That::Module;
my $static_data = read_config( 'script-name' ); # read from e.g. ~/.script-name.data
$static_data++;
write_config( 'script-name', $static_data ); # write to e.g. ~/.script-name.data
I don't need any parsing of the file, just storage. There's a lot of different OSes and places to store these files in out there, which is why I don't want to do that part myself.
Just use Storable for portable persistence of Perl data structures and File::HomeDir for portable "general config place" finding:
use File::HomeDir;
use File::Spec;
use FindBin qw($Script);
use Storable qw(nstore retrieve);
# Generate absolute path like:
# /home/stas/.local/share/script.pl.data
my $file = File::Spec->catfile(File::HomeDir->my_data(), "$Script.data");
# Network order for better endianness compatibility
nstore \%table, $file;
my $hashref = retrieve($file);
If it's just a single string (eg, 'abcd1234'), just use a normal file and write to it with open.
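A minimal sketch of the plain-file approach, using only core Perl. The helper names read_value and write_value are hypothetical, and choosing the path (for example with File::HomeDir, as in the answer above) is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: store a single string in the file at $path.
sub write_value {
    my ($path, $value) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $value;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: return the stored string, or undef if nothing
# has been stored yet.
sub read_value {
    my ($path) = @_;
    return unless -e $path;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;                 # slurp the whole file in one read
    my $value = <$fh>;
    close $fh;
    return $value;
}
```

A counter like the one in the question then becomes my $n = read_value($path) // 0; write_value($path, $n + 1);.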
If you're looking for something a bit more advanced, take a look at Config::Simple or JSON::XS. Config::Simple has its own function to write out to a file, and JSON can just use a plain open.
Maybe this can help you: http://www.stonehenge.com/merlyn/UnixReview/col53.html. But I think you cannot avoid working with files and directories.
The easiest way that I know how to do this (rather than rolling it by hand) is to use DBM::Deep.
Every time I post about this module I get hate posts responding that it's too slow, so please don't do that.

A module to generate a table automatically in perl

I have some content on my STDOUT and I want that content to be arranged into a decent table.
Can anyone suggest a Perl module that handles this kind of requirement?
Thanks in Advance, any small help is appreciated.
Thanks!
Aditya
Text::Table and Text::ASCIITable make two different outputs, the latter having outlines. I'm sure there are more hanging around CPAN. You also might look at formats, a little-used bit of Perl functionality, meant for formatting reports.
From CPAN, you can use Text::Table
Assuming you want to pipe the STDOUT of the existing program into something else to format it, you can do something like this using printf.
Create a Perl script called process.pl:
#!/usr/bin/perl
use strict;
while (<>) {
chomp(my $unformatted_input = $_);
# Assuming you want to split on spaces; adjust if the input is in fixed format.
my @elements = split / +/, $unformatted_input, 4;
# printf format string; you can adjust the lengths here. This takes the
# items in the elements array and makes each field 10 characters wide.
# See http://perldoc.perl.org/functions/sprintf.html for options
my $format_string = "%10s%10s%10s%10s\n";
printf($format_string, @elements);
}
Then pipe your STDOUT to this and it will format it on screen:
$ yourProcessThatDoesStdout | perl process.pl
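If fixed 10-character columns are too rigid, the widths can be computed from the data instead. The following is only a sketch with core modules; the sub name format_table and the two-space column gap are assumptions, and Text::Table from CPAN covers the same ground more thoroughly.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: format rows (array refs of strings) as left-aligned
# columns, each as wide as the longest cell found in that column.
sub format_table {
    my (@rows) = @_;

    # First pass: find the widest cell in every column
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            $width[$i] = max($width[$i] // 0, length $row->[$i]);
        }
    }

    # Second pass: pad every cell to its column width
    my $format = join('  ', map { "%-${_}s" } @width) . "\n";
    return join '', map {
        my $row = $_;
        sprintf $format, map { $row->[$_] // '' } 0 .. $#width;
    } @rows;
}
```

Because the widths are only known after all rows have been seen, the input has to be collected first, for example with my @rows = map { chomp; [ split / +/ ] } <>; before calling print format_table(@rows);.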

How can I count the respective lines for each sub in my Perl code?

I am refactoring a rather large body of code and a sort of esoteric question came to me while pondering where to go on with this. What this code needs in large parts is shortening of subs.
As such it would be very advantageous to point some sort of statistics collector at the directory, which would go through all the .pm, .cgi and .pl files, find all subs (I'm fine if it only gets the named ones) and give me a table of all of them, along with their line counts.
I gave PPI a cursory look, but could not find anything directly relevant; some of its tools might be appropriate, but they are rather complex to use.
Are there any easier modules that do something like this?
Failing that, how would you do this?
Edit:
Played around with PPI a bit and created a script that collects relevant statistics on a code base: http://gist.github.com/514512
use PPI;
my $document = PPI::Document->new($file);
# Strip out comments and documentation
$document->prune('PPI::Token::Pod');
$document->prune('PPI::Token::Comment');
# Find all the named subroutines (find returns a false value when there are none)
my $sub_nodes = $document->find(
sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name } ) || [];
# Print each sub's name followed by its line count
print map { sprintf "%s %s\n", $_->name, scalar split /\n/, $_->content } @$sub_nodes;
I'm dubious that simply identifying long functions is the best way to identify what needs to be refactored. Instead, I'd run the code through perlcritic at increasing levels of harshness and follow the suggestions.