Keeping only certain fields in a CSV file - perl

I have a couple of CSV files which have a lot of fields, but I only need to keep a few of them, so I wanted to get rid of the extra data before importing them.
I thought of running:
perl -i.bak -F, -ane 'BEGIN {$,=","} print @F[3..6], @F[9..12]' file.csv
However, the text fields are quoted and some of them contain commas, so this simple split on commas does not work.

Use Text::CSV. It handles fields containing the delimiter, among many other nice features.
use strict;
use warnings;
use File::Copy;
use Text::CSV;

my $csv = Text::CSV->new({
    binary       => 1,
    auto_diag    => 1,
    eol          => $/,
    always_quote => 1,
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag();

my $file   = 'input.csv';
my $backup = "$file.bak";
copy $file, $backup or die "Copy failed: $!";

open my $in_fh,  '<', $backup or die "$backup: $!";
open my $out_fh, '>', $file   or die "$file: $!";

while (my $row = $csv->getline($in_fh)) {
    my @wanted = @{$row}[3 .. 6, 9 .. 12];
    $csv->print($out_fh, \@wanted);
}

close $in_fh;
close $out_fh;


Perl Script: sorting through log files.

I'm trying to write a script which opens a directory, reads multiple log files line by line, and searches for information such as:
"Attendance = 0". Previously I have used grep "Attendance =" * to find this information, but now I'm trying to write a script to do the search.
I need your help to finish this task.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/path/';
opendir (DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
    print "$file\n";
}
closedir(DIR);
exit 0;
What's your perl experience?
I'm assuming each file is a text file. I'll give you a hint. Try to figure out where to put this code.
# Now to open and read a text file.
my $fn = 'file.log';
# $! is a variable which holds a possible error msg.
open(my $INFILE, '<', $fn) or die "ERROR: could not open $fn. $!";
my @filearr = <$INFILE>;   # Read the whole file into an array.
close($INFILE);
# Now look in @filearr, which has one entry per line of the original file.
exit;   # Normal exit
I prefer to use File::Find::Rule for things like this. It preserves path information, and it's easy to use. Here's an example that does what you want.
use strict;
use warnings;
use File::Find::Rule;
my $dir = '/path/';
my $type = '*';
my @files = File::Find::Rule->file()
                            ->name($type)
                            ->in($dir);

for my $file (@files) {
    print "$file\n\n";
    open my $fh, '<', $file or die "can't open $file: $!";
    while (my $line = <$fh>) {
        if ($line =~ /Attendance =/) {
            print $line;
        }
    }
}

In Perl, how can I filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my @files;
my $files;
opendir(DIR, $path) or die "$!";
@files = grep { /\.log$/ } readdir(DIR);
my @out;
my $out;
opendir(OUT, $outpath) or die "$!";
my $line;
foreach $files (@files) {
    open (FILE, "$files");
    my @line = <FILE>;
    my $regex = Unbound;
    open (OUT, ">>$out");
    print grep { $line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir(OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
A few things I'd suggest to improve this code:
declare your loop iterators within the loop: foreach my $file ( @files ) {
use 3-arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep: foreach my $file ( <$path/*.txt> ) {
grep is good for extracting things into arrays. Your grep reads the whole file just to print it, which isn't necessary (though it doesn't matter much if the file is short). See the short grep sketch after this list.
perltidy is great for reformatting code.
you're opening 'OUT' on a directory path (I think?), which isn't going to work: $outpath is a directory, not a file, and opendir isn't really valid for output. You need to do something different to write to different output files.
because you're using opendir, readdir gives you bare filenames, not full paths, so you may be in the wrong place to actually open the files. Prepending the path name or doing a chdir are possible solutions, but this is one of the reasons I like glob: it returns a path as well.
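A minimal sketch of that grep idea (the filename here is hypothetical):
use strict;
use warnings;

# Collect only the lines containing "Attendance =" into an array,
# then work with that array instead of the whole file.
open my $fh, '<', 'example.log' or die "can't open example.log: $!";
my @attendance_lines = grep { /Attendance =/ } <$fh>;
close $fh;

print for @attendance_lines;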
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless ( defined $input_path
     and -d $input_path
     and defined $output_path
     and -d $output_path )
{
    die "Usage: $0 <input_path> <output_path>\n";
}

foreach my $filename (<$input_path/*.log>) {

    # extract the 'name' bit of the filename.
    # be slightly careful with this - it's based
    # on an assumption which isn't always true.
    # File::Spec is a more powerful way of accomplishing this,
    # but this should grab 'number####' from /path/to/file/number####.log
    my $output_file = basename( $filename, '.log' );

    # open input and output filehandles.
    open( my $input_fh,  "<", $filename ) or die $!;
    open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
    print "Processing $filename -> $output_path/$output_file.txt\n";

    # iterate input, extracting into $line
    while ( my $line = <$input_fh> ) {

        # check if $line matches your RE.
        if ( $line =~ m/Unbound/ ) {

            # write it to output.
            print {$output_fh} $line;
        }
    }

    # tidy up our filehandles, although technically they'll
    # close automatically when they leave scope.
    close($output_fh);
    close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. At this stage of your learning process, you are probably better off understanding @Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one-off scripts quickly and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\@ARGV);

sub run {
    my $argv = shift;
    unless (@$argv == 2) {
        die "Need source and destination paths\n";
    }

    my $it = path($argv->[0])->realpath->iterator({
        recurse         => 0,
        follow_symlinks => 0,
    });

    my $outdir = path($argv->[1])->realpath;

    while (my $path = $it->()) {
        next unless -f $path;
        next unless $path =~ /[.]log\z/;

        my $logfh = $path->openr;
        my $outfile = $outdir->child($path->basename('.log') . '.txt');
        my $outfh;

        while (my $line = <$logfh>) {
            next unless $line =~ /Unbound/;
            unless ($outfh) {
                $outfh = $outfile->openw;
            }
            print $outfh $line;
        }

        if ($outfh) {
            close $outfh
                or die "Cannot close output '$outfile': $!";
        }
    }
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will match it.
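As a small illustration of that last note (the filename is made up, just to show the difference between the two anchors):
use strict;
use warnings;

# A file name that ends in a newline is legal on Unix.
my $tricky = "not-really-a.log\n";

print "matches with \$\n"  if $tricky =~ /[.]log$/;   # matches: $ allows a trailing newline
print "matches with \\z\n" if $tricky =~ /[.]log\z/;  # does not match: \z anchors at the true end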

Split string into variables and use output as element

Well, I'm stuck again. I've read quite a few topics with similar problems but haven't found a solution for mine. I have a ;-delimited CSV file, and the string in the 8th column ($elements[7]) looks like this: "aaaa;bb;cccc;ddddd;eeee;fffff;gg;". What I'm trying to do is split that string on ; and capture the output into variables, then put those values into the main CSV line as columns of their own.
So now the file is like:
3d;2f;7j;8k;4s;2b;5g;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
3c;9k;5l;4g;1a;5g;3d;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
4g;1a;5g;2g;7h;3d;8k;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";3d;2f;7j;8k;4s;2b;4g;1a
And I want it like:
3d;2f;7j;8k;4s;2b;5g;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg
3c;9k;5l;4g;1a;5g;3d;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
4g;1a;5g;2g;7h;3d;8k;3d;2f;7j;8k;4s;2b;4g;1a;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
This is the code I've been trying. I know it's terrible, but I'm hoping someone can help me.
use strict;
use warnings;
my $inputfile = shift || die "Give files\n";
my $outputfile = shift || die "Give output\n";
open my $INFILE, '<', $inputfile or die "In use / Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use :$!\n";
while (<$INFILE>) {
    s/"//g;
    my @elements = split /;/, $_;
    my ($varA, $varB, $varC, $varD, $varE, $varF, $varG, $varH) = split(';', $elements[10]);
    $elements[16] = $varA;
    $elements[17] = $varB;
    $elements[18] = $varC;
    $elements[19] = $varD;
    $elements[20] = $varE;
    $elements[21] = $varF;
    $elements[22] = $varG;
    $elements[23] = $varH;
    my $output_line = join(";", @elements);
    print $OUTFILE $output_line;
}
close $INFILE;
close $OUTFILE;
exit 0;
I'm also confused about the my statement; that shouldn't be possible, right? I mean, the $var variables are declared in an enclosed scope, so it shouldn't be possible to write them into @elements?
EDIT
This is how I adjusted the code with TLP's suggestions:
use strict;
use warnings;
use Text::CSV;
my $inputfile = shift || die "Give files\n";
my $outputfile = shift || die "Give output\n";
open my $INFILE, '<', $inputfile or die "In use / Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use :$!\n";
my $csv = Text::CSV->new({   # create a csv object
    sep_char => ";",         # delimiter
    eol      => "\n",        # adds newline to print
});

while (my $row = $csv->getline($INFILE)) {   # $row is an array ref
    my $line = splice(@$row, 10, 1);         # remove the quoted field
    $csv->parse($line);                      # parse the line
    push @$row, $csv->fields();              # push newly parsed fields onto main array
    $csv->print($OUTFILE, $row);
}
close $INFILE;
close $OUTFILE;
exit 0;
You should use a CSV module, e.g. Text::CSV to parse your data. Here's a brief example on how it can be done. You can replace the file handles I used below with your own.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({   # create a csv object
    sep_char => ";",         # delimiter
    eol      => "\n",        # adds newline to print
});

while (my $row = $csv->getline(*DATA)) {   # $row is an array ref
    my $line = splice(@$row, 7, 1);        # remove the 8th field
    $csv->parse($line);                    # parse the line
    push @$row, $csv->fields();            # push newly parsed fields onto main array
    $csv->print(*STDOUT, $row);
}
__DATA__
3d;2f;7j;8k;4s;2b;5g;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
3c;9k;5l;4g;1a;5g;3d;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
4g;1a;5g;2g;7h;3d;8k;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";3d;2f;7j;8k;4s;2b;4g;1a
Output:
3d;2f;7j;8k;4s;2b;5g;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
3c;9k;5l;4g;1a;5g;3d;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
4g;1a;5g;2g;7h;3d;8k;3d;2f;7j;8k;4s;2b;4g;1a;aaaa;bb;cccc;ddddd;eeee;fffff;gg;

What am I doing wrong in my Perl script written to parse a CSV file?

I have two scripts in which I'm experimenting with CSV_XS. In the first, I hard-coded everything: source directory, filename, and the csv delimiter I wanted to look for. The script works great. In the second, however, I try to dynamically discover as much as possible. That script seems to run, but it outputs nothing.
I'm having trouble figuring out why, and I was hoping you fine Perl folks wouldn't mind lending a second set of eyes to the problem:
First, the successful script:
#!/usr/bin/perl -w
use Text::CSV_XS;
my @records;
my $file = 'Data/space.txt';
my $csv = Text::CSV_XS->new({ sep_char => " " });

open(FILE, $file) || die "Couldn't open $file: $!\n";
while (<FILE>) {
    $csv->parse($_);
    push(@records, [$csv->fields]);
}
close FILE;

foreach (@records) {
    print $_->[0], ",", $_->[1], ",", $_->[2], ",", $_->[3], ",", $_->[4], "\n";
}
And second, the "failing" script:
#!/usr/bin/perl -w
use Text::CSV_XS;
$input_dir = $ARGV[0];   # I pass "Data" on the command line
my @records;

opendir(DIR, $input_dir) || die "cannot open dir $input_dir: $!";
my @filelist = grep { $_ ne '.' && $_ ne '..' } readdir DIR;
closedir DIR;

foreach $file (@filelist) {
    print "Input file='", $input_dir, "/", $file, "'\n";
    if    ($file =~ /comma/) { $sep = ',' }
    elsif ($file =~ /pipe/)  { $sep = '|' }
    elsif ($file =~ /space/) { $sep = ' ' }
    else  { die "Cannot identify separator in $file: $!"; }
    print "Delimiter='", $sep, "'\n";

    open(FILE,$input_dir||"/"||$file) || die "Couldn't open $file: $!\n";
    my $csv = Text::CSV_XS->new({ sep_char => $sep });
    while (<FILE>) {
        $csv->parse($_);
        push(@records, [$csv->fields]);
        print "File Input Line:'", $_, $csv->fields, "'\n";
    }
    close FILE;
}

foreach $record (@records) {
    print $record->[0], ",", $record->[1], ",", $record->[2], ",", $record->[3], ",", $record->[4], "\n";
}
This line looks kind of suspect:
open(FILE,$input_dir||"/"||$file) || die "Couldn't open $file: $!\n";
I don't think you want to put those || in there. What that does is check whether $input_dir is true, and if it isn't, it checks whether "/" is true (which it always is). Your $input_dir is likely always true, so you're just opening $input_dir.
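To see what actually gets opened (just an illustrative snippet with made-up values):
use strict;
use warnings;

my $input_dir = "Data";
my $file      = "space.txt";

# || returns its first true operand, so this evaluates to just "Data",
# not a joined path like "Data/space.txt".
my $opened = $input_dir || "/" || $file;
print "$opened\n";   # prints: Data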
You should be using File::Spec to create your fully-qualified files:
my $fullfile = File::Spec->catfile( $input_dir, $file );
open( FILE, $fullfile ) || die "Couldn't open $fullfile: $!\n";
This will "do the right thing" in putting a / where appropriate (or, if you're on Windows, \). Then pass that in to your open() command.
Further, you should be using lexical filehandles and directory handles, along with the three-argument open():
open my $fh, '<', $fullfile or die "Could not open file $fullfile: $!\n";
Lexical filehandles are much safer, as they can't get overridden by some other module defining a FILE filehandle. Three-argument open() is easier to understand and isn't prone to error when you have a filename that has a > or < or | in it.
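A contrived example of the kind of filename that trips up two-argument open (the filename is made up):
use strict;
use warnings;

# Suppose a data file is literally named '>odd name.txt' (legal on Unix).
my $filename = '>odd name.txt';

# Two-argument open parses the leading '>' as "open for writing",
# so this would clobber a file named 'odd name.txt' instead of reading it:
#   open my $fh, $filename or die $!;

# Three-argument open keeps the mode and the name separate,
# so the '>' is just part of the filename:
open my $fh, '<', $filename
    or warn "Could not open '$filename': $!\n";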
If you want to get really crazy, put use autodie; at the top, so you don't even have to check for the return value of open() or opendir():
use autodie;
open my $fh, '<', $fullfile;

perl: Writing file at Nth position

I am trying to write to a file at the Nth position. I have tried the example below, but it writes at the end. Please help me achieve this.
#!/usr/bin/perl
open(FILE,"+>>try.txt")
or
die ("Cant open file try.txt");
$POS=5;
seek(FILE,$POS,0);
print FILE "CP1";
You are opening the file in read-write appending mode. Try opening the file in read-write mode:
my $file = "try.txt";
open my $fh, "+<", $file
or die "could not open $file: $!";
Also, note the use of the three argument open, the lexical filehandle, and $!.
#!/usr/bin/perl
use strict;
use warnings;
#create an in-memory file
my $fakefile = "1234567890\n";
open my $fh, "+<", \$fakefile
or die "Cant open file: $!";
my $offset = 5;
seek $fh, $offset, 0
or die "could not seek: $!";
print $fh "CP1";
print $fakefile;
The code above prints:
12345CP190
If I understand you correctly, if the file contents are
123456789
you want to change that to
12345CP16789
You cannot achieve that using modes supplied to open (regardless of programming language).
You need to open the source file and a temporary file (see File::Temp). Read up to the insertion point from the source and write that to the temporary file, write what you want to insert, then write the remainder of the source file to the temporary file. Finally, close the source and rename the temporary file to the source.
If you are going to do this using seek, both files must be opened in binary mode.
Here is an example using line oriented input and text mode:
#!/usr/bin/perl
use strict; use warnings;
use File::Temp qw( :POSIX );
my $source = 'test.test';
my $temp = tmpnam;
open my $source_h, '<', $source
    or die "Failed to open '$source': $!";
open my $temp_h, '>', $temp
    or die "Failed to open '$temp' for writing: $!";

while ( my $line = <$source_h> ) {
    if ( $line =~ /^[0-9]+$/ ) {
        $line = substr($line, 0, 5) . "CP1" . substr($line, 5);
    }
    print $temp_h $line;
}

close $temp_h
    or die "Failed to close '$temp': $!";
close $source_h
    or die "Failed to close '$source': $!";

rename $temp => $source
    or die "Failed to rename '$temp' to '$source': $!";
This works for me:
use strict;
use warnings;
open( my $fh, '+<', 'foo.txt' ) or die $!;
seek( $fh, 3, 0 );
print $fh "WH00t?";
This is also a more "modern" use of open(); see http://perldoc.perl.org/functions/open.html
The file will be closed when $fh goes out of scope.
"Inserting" a string into a function can (mostly) be done in place. See the lightly used truncate built-in function.
open my $fh, '+<', $file or die $!;
seek $fh, 5, 0;
$/ = undef;           # slurp mode
my $x = <$fh>;        # read everything after the 5th byte into $x
truncate $fh, 5;      # chop the file down to the first 5 bytes
print $fh "CP1";
print $fh $x;
close $fh;
If your file is line- or record-oriented, you can insert lines or modify individual lines easily with the core module Tie::File. This allows the file to be treated as an array, and Perl string and array manipulation to be used to modify the file in memory. You can safely operate on huge files larger than your RAM with this method.
Here is an example:
use strict; use warnings;
use Tie::File;

# create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while (<DATA>) { print $out "$_"; }
close $out or die $!;

tie my @data, 'Tie::File', "nums.txt" or die $!;
my $offset = 5;
my $insert = "INSERTED";

# insert in a string:
$data[0] = substr($data[0], 0, $offset) . $insert . substr($data[0], $offset)
    if (length($data[0]) > $offset);

# insert a new array element that becomes a new file line:
splice @data, $offset, 0, join(':', split(//, $insert));

# insert vertically:
$data[$_] = substr($data[$_], 0, $offset) .
            substr(lc $insert, $_, 1) .
            substr($data[$_], $offset) for (0 .. length($insert));

untie @data;   # close the file too...
__DATA__
123456789
234567891
345678912
456789123
567891234
678912345
789123456
891234567
912345678
Output:
12345iINSERTED6789
23456n7891
34567s8912
45678e9123
56789r1234
I:N:St:E:R:T:E:D
67891e2345
78912d3456
891234567
912345678
The file modifications with Tie::File are made in place and as the array is modified. You could use Tie::File on just the first line of your file to modify and insert as you requested. You can put a sleep between the array modifications and use tail -n +0 -f on the file to watch it change if you wish.
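For the specific case in this question (inserting "CP1" at byte 5 of the first line), a minimal sketch along those lines might look like this (the filename is assumed):
use strict;
use warnings;
use Tie::File;

# Treat try.txt as an array of lines; only the first line is touched.
tie my @lines, 'Tie::File', 'try.txt' or die $!;

my $offset = 5;
$lines[0] = substr($lines[0], 0, $offset) . "CP1" . substr($lines[0], $offset)
    if length($lines[0]) >= $offset;

untie @lines;   # writes the change back and closes the file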
Alternatively, if your file is of reasonable size and you want to treat it as characters, you can read the entire file into memory, do string operations on the data, then write the modified data back out. Consider:
use strict; use warnings;

# create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while (<DATA>) { print $out "$_"; }
close $out or die $!;

# slurp the whole file into memory:
my $data;
open (my $in, '<', "nums.txt") or die $!;
{ local $/ = undef; $data = <$in>; }
close $in or die $!;

my $offset = 5;
my $insert = "INSERTED";

# splice the insertion into the string and write it back out:
open ($out, '>', "nums.txt") or die $!;
print $out substr($data, 0, $offset) . $insert . substr($data, $offset);
close $out or die $!;
__DATA__
123456789
2
3
4
5
6
7
8
9
Output:
12345INSERTED6789
2
3
4
5
6
7
8
9
If you treat files as characters, beware that under Windows files use \r\n for a newline: in text mode Perl translates that to a single "\n", but in binary mode it is two characters.
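A small sketch of why that matters for byte offsets (the behavior differs by platform; the filename is made up):
use strict;
use warnings;

# Write a two-line file; on Windows, text mode stores each "\n" as "\r\n".
open my $out, '>', 'crlf-demo.txt' or die $!;
print $out "12345\n67890\n";
close $out;

# In binary mode the file's true size includes the \r bytes, so byte
# offsets computed from the text you printed will be off by one per
# line unless you account for the \r\n translation.
open my $in, '<', 'crlf-demo.txt' or die $!;
binmode $in;
my $size = -s $in;
print "file size on disk: $size bytes\n";   # 14 on Windows, 12 on Unix
close $in;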