Parsing CSV with Text::CSV

I am trying to parse a file whose header row is row 8; rows 9 through n are my data. How can I use Text::CSV to do this? I am having trouble; my code is below:
my @cols = @{$csv->getline($io, 8)};
my $row = {};
$csv->bind_columns (\@{$row}{@cols});
while($csv->getline($io, 8)){
my $ip_addr = $row->{'IP'};
}

use Text::CSV;
my $csv = Text::CSV->new( ) or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $io, "<", "test.csv" or die "test.csv: $!";
my $array_ref = $csv->getline_all($io, 8);
foreach my $record (@$array_ref) {
print "$record->[0] \n";
}
close $io or die "test.csv: $!";

Are you dead-set on using bind_columns? I think I see what you're trying to do, and it's notionally very creative, but if all you want is a way to reference the column by the header name, how about something like this:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1 } );
my (%header);
open my $io, "<", '/var/tmp/foo.csv' or die $!;
while (my $row = $csv->getline ($io)) {
next unless $. > 7;
my @fields = @$row;
unless (%header) {
$header{$fields[$_]} = $_ for 0..$#fields;
next;
}
my $ip_addr = $fields[$header{'IP'}];
print "$. => $ip_addr\n";
}
close $io;
Sample Input:
Test Data,,,
Trash,,,
Test Data,,,
Trash,,,
Beans,Joe,10.224.38.189,XYZ
Beans,Joe,10.224.38.190,XYZ
Beans,Joe,10.224.38.191,XYZ
Last Name,First Name,IP,Computer
Beans,Joe,10.224.38.192,XYZ
Beans,Joe,10.224.38.193,XYZ
Beans,Joe,10.224.38.194,XYZ
Beans,Joe,10.224.38.195,XYZ
Beans,Joe,10.224.38.196,XYZ
Beans,Joe,10.224.38.197,XYZ
Output:
9 => 10.224.38.192
10 => 10.224.38.193
11 => 10.224.38.194
12 => 10.224.38.195
13 => 10.224.38.196
14 => 10.224.38.197
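
If you would rather receive each row as a hash keyed by header name, which is what the bind_columns attempt in the question was after, one possible sketch is to skip the leading lines, register line 8 with column_names, and read the rest with getline_hr. The in-memory sample data and the variable names below are made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample: 7 junk lines, header on line 8, data from line 9 on.
my $data = join '', map { "$_\n" } (
    ('Trash,,,') x 7,
    'Last Name,First Name,IP,Computer',
    'Beans,Joe,10.224.38.192,XYZ',
    'Beans,Joe,10.224.38.193,XYZ',
);

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $io, '<', \$data or die "Cannot open in-memory file: $!";

# Skip the 7 lines that precede the header row.
$csv->getline($io) for 1 .. 7;

# Line 8 is the header; register its fields as column names.
$csv->column_names(@{ $csv->getline($io) });

# Each remaining row comes back as a hash reference keyed by header name.
my @ips;
while (my $row = $csv->getline_hr($io)) {
    push @ips, $row->{IP};
}
close $io;

print "$_\n" for @ips;
```

For a real file, replace the in-memory handle with open my $io, '<', 'test.csv'.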

Related

Sorting CSV file column value based on column headers

Hi, I am new to Perl scripting and need help implementing logic to reorder the columns of a CSV file based on its header names.
Example:
S.NO,NAME,S2,S5,S3,S4,S1
1,aaaa,88,99,77,55,66
2,bbbb,66,77,88,99,55
3,cccc,55,44,77,88,66
4,dddd,77,55,66,88,99
Now I want to reorder this file as shown below.
The headers may arrive in any order (for example s.no,s2,s4,s5,s1,s0,name). I want to put them in an order I define, such as s.no,name,s1,s2,s3,s4,s5, and each column's values should move along with its header. How can I do this in Perl?
The required output looks like this:
S.NO,NAME,S1,S2,S3,S4,S5
1,aaaaaaa,66,88,77,55,99
2,bbbbbbb,55,66,88,77,99
3,ccccccc,66,55,77,88,44
4,ddddddd,99,77,66,88,55
Or in whatever other header order I need, for example:
S.NO,NAME,S5,S4,S3,S2,S1 (the headers are reordered as required, and each column's values move with its header)
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file = 'a1.csv';
my $size = 3;
my @files;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, sep_char => ';' });
open my $in, "<:encoding(utf8)", $file or die "$file: $!";
while (my $row = $csv->getline($in)) {
if (not @files) {
my $file_counter = int @$row / $size;
$file_counter++ if @$row % $size;
for my $i (1 .. $file_counter) {
my $outfile = "output$i.csv";
open my $out, ">:encoding(utf8)", $outfile or die "$outfile: $!";
push @files, $out;
}
}
my @fields = @$row;
foreach my $i (0 .. $#files) {
my $from = $i*$size;
my $to = $i*$size+$size-1;
$to = $to <= $#fields ? $to : $#fields;
my @data = @fields[$from .. $to];
$csv->print($files[$i], \@data);
print {$files[$i]} "\n";
}
}

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Text::CSV qw();
my @headers = qw(s.no name s1 s2 s3 s4 s5);
my $csv_in = Text::CSV->new({binary => 1, auto_diag => 1});
my $csv_out = Text::CSV->new({binary => 1, auto_diag => 1});
open my $in, '<:encoding(UTF-8)', 'a1.csv';
open my $out, '>:encoding(UTF-8)', 'output1.csv';
$csv_in->header($in);
$csv_out->say($out, [@headers]);
while (my $row = $csv_in->getline_hr($in)) {
$csv_out->say($out, [$row->@{@headers}]);
}

The handy Text::AutoCSV module lets you rearrange the column order as a one-liner:
$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(in_file=>"in.csv",out_file=>"out.csv",out_fields=>["SNO","NAME","S1","S2","S3","S5"])->write()'
$ cat out.csv
s.no,name,s1,s2,s3,s5
1,aaaa,66,55,77,99
2,bbbb,55,99,88,77
3,cccc,66,88,77,44
4,dddd,99,88,66,55
I'm not sure what your actual desired order of fields is because you have two and both of them include columns that aren't in the sample input file (It has two s2 columns; is one of them supposed to be s4?), but you should get the idea. Column names have to be all caps with special characters like . removed, but the actual names are used for the output.

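Empty header fields can be given generated names (E001, E002, ...) through the munge_column_names option:
my $eix = "001";
$csv_in->header ($in, { munge_column_names => sub { s/^$/"E".$eix++/er } });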

Perl: How to print a random section (word definition) from a dictionary file

I want to print a random English word from a dictionary file in a Unix terminal using Perl. I want to select a random line and print it along with the 2 lines that follow it.
But my code doesn't do this.
Please help me improve it.
An example of the output I want:
#inspire: ....
ghk
lko...
Dictionary file:
#inspiration: mean....
abc def...
ghk lmn
...
#inspire: ....
ghk
lko...
#people: ...
...
The complete dictionary file is anhviet109K.txt; it is about 14 MB.
My code:
use strict;
use warnings;
use File::Copy qw(copy move);
my $files = 'anhviet109K.txt';
my $fh;
my $linewanted = 16 + int( rand( 513796 - 16 ) );
# 513796: number of lines of file dic.txt
open( $fh, "<", $files ) or die "cannot open < $fh: $!";
my $del = " {2,}";
my $temp = 0;
my $count = 0;
while ( my $line = <$fh> ) {
if ( ( $line =~ "#" ) && ( $. > $linewanted ) ) {
$count = 4;
}
else {
next;
}
if ( $count > 0 ) {
print $line;
$count--;
}
else {
last;
}
}
close $fh;

Something like this, perhaps?
Your data has helped me to exclude the header entries in your dictionary file.
This program finds the location of every entry (a line beginning with #) in the file, then chooses one at random and prints it.
Good luck learning English!
use strict;
use warnings 'all';
use Fcntl ':seek';
use constant FILE => 'anhviet109K.txt';
open my $fh, '<', FILE or die qq{Unable to open "@{[FILE]}" for input: $!};
my @seek; # Locations of all the definitions
my $addr = tell $fh;
while ( <$fh> ) {
push @seek, $addr if /^\#(?!00-)/;
$addr = tell $fh;
}
my $choice = $seek[rand @seek];
seek $fh, $choice, SEEK_SET;
print scalar <$fh>;
while ( <$fh> ) {
last if /^\#/;
print;
}
output
#finesse /fi'nes/
* danh từ
- sự khéo léo, sự phân biệt tế nhị
- mưu mẹo, mánh khoé
* động từ
- dùng mưu đoạt (cái gì); dùng mưu đẩy (ai) làm gì; dùng mưu, dùng kế
=to finesse something away+ dùng mưu đoạt cái gì

A single pass approach:
use strict;
use warnings;
use autodie;
open my $fh, '<:utf8', 'anhviet109K.txt';
my $definition = '';
my $count;
my $select;
while (my $line = <$fh>) {
if ($line =~ /^#(?!00-)/) {
++$count;
$select = rand($count) < 1;
if ($select) {
$definition = $line;
}
}
elsif ($select) {
$definition .= $line;
}
}
# remove blank line that some entries have
$definition =~ s/^\s+\z//m;
binmode STDOUT, ':utf8';
print $definition;
This iterative random selection always selects the first item, has a 1/2 chance of replacing it with the second item, a 1/3 for the third, and so on.

Remove/Extract rows based on Unique/duplicate Id from a CSV file

Depending on how you look at it, I need to either remove rows whose Id is unique or extract rows whose Id has duplicates (keeping all duplicates).
I don't know enough Perl to accomplish this. I've found similar topics but didn't have much success. These are the examples I'm using: example 1, example 2 and example 3. In a previous problem someone showed me a solution with the List::MoreUtils module that merged values with a common Id. That is not the case here: this time I want to remove rows whose Id is unique. I know I can probably do this with List::MoreUtils, but I want to do it without it. This is my dummy data (example data copied from the other question, since the data itself doesn't matter); it shows what I'm after. Order is not important.
Before:
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla
101;Fruits;50010;Grape;500;Red;1
101;Fruits;50020;Strawberry;500;Red;1
201;Vegetables;60010;Carrot;500;White;1
101;Fruits;50060;Apple;1000;Red;1
101;Fruits;50030;Banana;1000;Green;1
101;Fruits;50060;Apple;500;Green;1
101;Fruits;50020;Strawberry;1000;Red;1
201;Vegetables;60010;Carrot;100;Purple;1
101;Fruits;50020;Strawberry;200;Red;1
After:
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla
101;Fruits;50020;Strawberry;500;Red;1
201;Vegetables;60010;Carrot;500;White;1
101;Fruits;50060;Apple;1000;Red;1
101;Fruits;50060;Apple;500;Green;1
101;Fruits;50020;Strawberry;1000;Red;1
201;Vegetables;60010;Carrot;100;Purple;1
101;Fruits;50020;Strawberry;200;Red;1
You can see that the Grape and Banana rows (ids 50010 and 50030) have been removed because each id has only one entry.
This is my script. I'm struggling with the part where I select the ids that occur more than once in the hash and write out their rows (taking the Text::CSV_XS module into account). Can someone show me how to do this?
#!/usr/bin/perl -w
use strict;
use warnings;
use Text::CSV_XS;
my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";
open (my $infile, '<:encoding(iso-8859-1)', $inputfile) or die "Sourcefile in use / not found :$!\n";
open (my $outfile, '>:encoding(UTF-8)', $outputfile) or die "Outputfile in use :$!\n";
my $csv_in = Text::CSV_XS->new({binary => 1,sep_char => ";",auto_diag => 1,always_quote => 1,eol => $/});
my $csv_out = Text::CSV_XS->new({binary => 1,sep_char => "|",auto_diag => 1,always_quote => 1,eol => $/});
my $header = $csv_in->getline($infile);
$csv_out->print($outfile, $header);
my %data;
while (my $elements = $csv_in->getline($infile)){
my @columns = @{ $elements };
my $id = $columns[2];
push @{ $data{$id} }, \@columns;
}
for my $id ( sort keys %data ){ # Sort not important
if @{ $data{$id} } > 1 # Here I have no idea anymore..
$csv_out->print($outfile, \@columns); #
}

Rather than loading a hash with the entire dataset, I think I'd go ahead and read the file twice, loading a hash with just your ID values. This will definitely take longer, but as your file grows, there may be disadvantages of having all of that data in memory.
That said, I did not use Text::CSV_XS but this is a notional idea of what I had in mind.
my %count;
open (my $infile, '<:encoding(iso-8859-1)', $inputfile) or die;
open (my $outfile, '>:encoding(UTF-8)', $outputfile) or die;
while (<$infile>) {
next if $. == 1;
my ($id) = (split /;/, $_, 4)[2];
$count{$id}++;
}
seek $infile, 0, 0;
$. = 0; # seek does not reset the line counter
while (<$infile>) {
my @fields = split /;/;
print $outfile join '|', @fields if $count{$fields[2]} > 1 or $. == 1;
}
close $infile;
close $outfile;
The $. == 1 at the end is so you don't lose your header row.
-- EDIT --
#!/usr/bin/perl -w
use strict;
use warnings;
use Text::CSV_XS;
my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";
open (my $infile, '<:encoding(iso-8859-1)', $inputfile) or die;
open (my $outfile, '>:encoding(UTF-8)', $outputfile) or die;
my $csv_in = Text::CSV_XS->new({binary => 1,sep_char => ";",
auto_diag => 1,always_quote => 1,eol => $/});
my $csv_out = Text::CSV_XS->new({binary => 1,sep_char => "|",
auto_diag => 1,always_quote => 1,eol => $/});
my ($count, %count) = (1);
while (my $elements = $csv_in->getline($infile)){
$count{$$elements[2]}++;
}
seek $infile, 0, 0;
while (my $elements = $csv_in->getline($infile)){
$csv_out->print($outfile, $elements)
if $count{$$elements[2]} > 1 or $count++ == 1;
}
close $infile;
close $outfile;
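
If the file comfortably fits in memory, the single-pass idea from the question can also be completed. The following is only a sketch of that alternative: it groups rows by Id in a hash and prints every group that holds more than one row. The in-memory sample data is a shortened, made-up stand-in for the real file, and always_quote is left out to keep the output short.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical in-memory sample in the same layout as the question's data.
my $input = <<'END';
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla
101;Fruits;50010;Grape;500;Red;1
101;Fruits;50020;Strawberry;500;Red;1
201;Vegetables;60010;Carrot;500;White;1
101;Fruits;50020;Strawberry;1000;Red;1
201;Vegetables;60010;Carrot;100;Purple;1
END

my $csv_in  = Text::CSV->new({ binary => 1, sep_char => ';', auto_diag => 1 });
my $csv_out = Text::CSV->new({ binary => 1, sep_char => '|', auto_diag => 1, eol => "\n" });

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot open in-memory output: $!";

# Copy the header row straight through.
$csv_out->print($out, $csv_in->getline($in));

# Group every row under its Id (third column).
my %data;
while (my $row = $csv_in->getline($in)) {
    push @{ $data{ $row->[2] } }, $row;
}

# Write out only the groups that hold more than one row.
for my $id (sort keys %data) {
    next unless @{ $data{$id} } > 1;
    $csv_out->print($out, $_) for @{ $data{$id} };
}
close $in;
close $out;

print $output;
```

The rows come out grouped by Id rather than in their original order, which the question says is acceptable. The two-pass version above remains the better fit for files too large to hold in memory.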

perl hash mapping/retrieval issues with split and select columns

Perl find and replace multiple (huge) strings in one shot
P.S. This question is related to the answer to the question above.
When I try to replace this code:
Snippet-1
open my $map_fh, '<', 'map.csv' or die $!;
my %replace = map { chomp; split /,/ } <$map_fh>;
close $map_fh;
with this code:
Snippet-2
my %replace = map { chomp; (split /,/)[0,1] } <$map_fh>;
even though the key exists (as shown in the Dumper output), the exists test doesn't find it and no value is returned for the key.
For the same input file, it works perfectly with split alone (Snippet-1), whereas nothing is returned when I select specific columns after the split (Snippet-2).
Is there some integer/string datatype mix-up happening here?
Input Mapping File
483329,Buffalo
483330,Buffalo
483337,Buffalo
Script Output
$VAR1 = {
'483329' => 'Buffalo',
'46546' => 'Chicago_CW',
'745679' => 'W. Washington',
};
1 search is ENB
2 search is 483329 **expected Buffalo here**
3 search is 483330
4 search is 483337
Perl Code
open my $map_fh, '<', $MarketMapFile or die $!;
if ($MapSelection =~ /eNodeBID/i) {
my %replace = map { chomp; (split /,/)[0,1] } <$map_fh>;
use Data::Dumper;
print Dumper(\%replace);
}
close $map_fh;
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => $/,quote_space => 0 });
my $tmpCSVFile = $CSVFile."tmp";
open my $in_fh, '<', $CSVFile or die $!;
open my $out_fh, '>', $tmpCSVFile or die $!;
my $cnt=1;
while (my $row = $csv->getline($in_fh)) {
my $search = $row->[5];
$search =~ s/[^[:print:]]+//g;
if ($MapSelection =~ /eNodeBID/i) {
$search =~ s/(...)-(...)-//g;
$search =~ s/\(M\)//g;
}
my $match = (exists $replace{$search}) ? $replace{$search} : undef;
print "\n$cnt search is $search ";
if (defined($match)) {
$match =~ s/[^[:print:]]+//g;
print "and match is $match";
}
push @$row, $match;
#print " match is $match";
$csv->print($out_fh, $row);
$cnt++;
}
# untie %replace;
close $in_fh;
close $out_fh;

You have a problem of scope. Your code:
if ($MapSelection =~ /eNodeBID/i) {
my %replace = map { chomp; (split /,/)[0,1] } <$map_fh>;
use Data::Dumper;
print Dumper(\%replace);
}
declares %replace within the if block. Move it outside so that it can also be seen by later code:
my %replace;
if ($MapSelection =~ /eNodeBID/i) {
%replace = map { chomp; (split /,/)[0,1] } <$map_fh>;
use Data::Dumper;
print Dumper(\%replace);
}
Putting use strict and use warnings at the top of your code helps you find these kinds of issues.
Also, you can just use my $match = $replace{$search} since it's equivalent to your ?: operation.
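
As a small illustration of both points, the sketch below declares %replace in the enclosing scope and compares the exists-based ternary with a plain hash lookup. The values of $MapSelection and @map_lines are hypothetical stand-ins for the question's file and settings.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the question's variables.
my $MapSelection = 'eNodeBID';
my @map_lines    = ("483329,Buffalo,extra\n", "483330,Buffalo\n");

# Declared in the enclosing scope so the lookups below can see it.
my %replace;
if ($MapSelection =~ /eNodeBID/i) {
    %replace = map { chomp; (split /,/)[0, 1] } @map_lines;
}

# A missing key yields undef either way, so the ternary adds nothing.
my @results;
for my $search ('483329', '999999') {
    my $with_exists = exists $replace{$search} ? $replace{$search} : undef;
    my $direct      = $replace{$search};
    push @results, [ $search, $with_exists, $direct ];
}

for my $r (@results) {
    my ($search, $with_exists, $direct) = @$r;
    printf "%s => %s / %s\n", $search,
        $with_exists // 'undef', $direct // 'undef';
}
```

The two forms only differ when a key exists with an undefined value, and even then both return undef.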

Always include use strict; and use warnings; at the top of EVERY Perl script. If you had done that and maintained the good practice of declaring your variables, you would have gotten the error:
Global symbol "%replace" requires explicit package name at
That would have told you there was a scoping issue in your code. One way to avoid it is to use a ternary when initializing %replace:
my %replace = ($MapSelection =~ /eNodeBID/i)
? map { chomp; (split /,/)[0,1] } <$map_fh>
: ();

Perl - csv parsing - rearrange csv data when fields are dynamics

Using Perl, I need to parse and rearrange CSV files that have some dynamic fields (devices and associated values).
Here is the original CSV (the header is shown for description only):
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,sda,sda1,sda2,sda3,sdb,sdb1,sdb2,sdb3
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,0.0,0.0,0.0,0.0,18.0,0.0,18.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:49,T0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:51,T0003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:53,T0004,0.0,0.0,0.0,0.0,369.8,0.0,369.8,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:55,T0005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I need it to be transformed into:
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,device,value
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,sda,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,sda1,0.0
... and so on
Here is the sample code that generates the csv file based on original data:
if (((rindex $l,"DISKBUSY,") > -1)) {
#Open destination file
if( ! open(FILE,">>".$dstfile_DISKBUSY) ) {
exit(1);
}
(my @line) = split(",",$l);
my $section = "DISKBUSY";
my $write = $section.",".$SerialNumber.",".$hostnameT.",".
$timestamp.",".$line[1];
my $i = 2;
while ($i <= $#line) {
$write = $write.','.$line[$i];
$i = $i + 1;
}
print (FILE $write."\n");
close( FILE );
}
I need to rearrange it as described so that I can work with the data in a generic way, but the dynamic fields (the device names) are driving me crazy :-)
Many thanks for any help!

You can use Text::CSV:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1,
eol => "\n"
}) or die "Cannot use CSV: " . Text::CSV->error_diag();
open my $fh, '<', 'file.csv' or die $!;
my @columns = @{ $csv->getline($fh) };
my @device_columns = @columns[5..$#columns];
my @header = (@columns[0..4], "device", "value");
$csv->print(\*STDOUT, \@header);
while (my $row = $csv->getline($fh)) {
foreach my $i (0..$#device_columns) {
my @output = (@$row[0..4], $device_columns[$i], $row->[5+$i]);
$csv->print(\*STDOUT, \@output);
}
}
close $fh;
Output:
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,device,value
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda1,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda2,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda3,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb,18.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb1,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb2,18.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb3,0.0
(this is only the output for the first row of your input data)
Better solution
The following uses getline_hr to return each row in the input CSV as a hashref, which makes the code a bit cleaner:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1,
eol => "\n"
}) or die "Cannot use CSV: " . Text::CSV->error_diag();
open my $fh, '<', 'file.csv' or die $!;
$csv->column_names($csv->getline($fh));
my @cols = ( $csv->column_names );
my @devices = splice @cols, 5;
my @header = ( @cols, "device", "value" );
$csv->print(\*STDOUT, \@header);
while (my $hr = $csv->getline_hr($fh)) {
foreach my $device (#devices) {
my @output = ( @$hr{@cols}, $device, $hr->{$device} );
$csv->print(\*STDOUT, \@output);
}
}
close $fh;

Use the Text::CSV module.
You can assign header names with $csv->column_names(@column_names) and then use $csv->getline_hr to get each line as a hash reference keyed by your column names. This will make it much easier to parse your file.
You don't have to use Text::CSV to write back your file (although it makes sure your file is written correctly), but you should use it to parse your data.
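
To illustrate why writing through Text::CSV is the safer choice, here is a minimal sketch comparing a plain join with combine and string. The row contents are invented for the example; the second field deliberately contains both the separator and double quotes.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Hypothetical row: the second field contains the separator and quotes.
my @fields = ('host001', 'disk "sda", primary', '18.0');

# A plain join produces a line that no longer parses back into three fields.
my $naive = join ',', @fields;

# combine() quotes and escapes fields as needed; string() returns the result.
$csv->combine(@fields) or die "combine failed: " . $csv->error_diag;
my $safe = $csv->string;

print "naive: $naive\n";
print "safe:  $safe\n";
```

Parsing the combined string again with the same object yields the original three fields, which is not true of the joined line.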
