Parsing fields with special characters using Perl Text::CSV - perl

I am using the Text::CSV module to parse lines into various fields from a tab-separated value file.
Examples of special characters in strings are
"CEZARY ŁUKASZEWICZ, PAWEŁ WIETESKA","BÜRO FÜR"
My code goes as below:
my $file = $ARGV[0] or die "Need to get TSV file on the command line\n";
my $csv = Text::CSV->new({sep_char => "\t"});
open(my $data,'<', $file) or die "Could not open '$file' $!\n";
while (my $line= <$data>) {
if($csv->parse($line)){
my @curr_arr = $csv->fields();
}
} # end of while
close $data;
The above is some of the important parts of my code. The error I get is as follows:
cvs_xs error : 2026 - EIQ - Binary Character inside quoted field, binary off @ pos 15

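The error message says that binary mode is off. Enable it with binary => 1 when you construct the parser:
my $csv = Text::CSV->new({ binary => 1, sep_char => "\t"});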
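As a minimal sketch of that fix, the snippet below parses a single tab-separated line with a non-ASCII character inside a quoted field. The sample line is made up for illustration; the binary and sep_char options and the parse, fields and error_diag methods are the Text::CSV names that appear in this thread.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 lets fields contain characters outside printable ASCII
my $csv = Text::CSV->new({ binary => 1, sep_char => "\t" })
    or die "Cannot create parser: " . Text::CSV->error_diag();

# Hypothetical sample line: two quoted fields separated by a tab,
# the first one containing U-umlaut (0xDC)
my $line = qq{"B\x{DC}RO F\x{DC}R"\t"plain"};

my @fields;
if ($csv->parse($line)) {
    @fields = $csv->fields();
    print scalar(@fields), " fields parsed\n";
} else {
    warn "Parse failed: " . $csv->error_diag() . "\n";
}
```

Dropping binary => 1 from the constructor should bring back the 2026 error for the same line.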

Related

Problems with parsing CSV file in Perl

I have a CSV file like this:
id,item,itemtype,date,service,level,message,action,user
"344","-1","IRM","2008-08-22 13:01:57","login","1","Failed login: \'irm\', database \'irmD\'",NULL,NULL
"346","-1","IRM","2008-08-27 10:58:59","login","1","Ошибка входа:\'\', база данных \'irmD\'",NULL,NULL
The second line parses fine, but Text::CSV just skips the third one. The third line contains Cyrillic characters, but the file is encoded in UTF-8 and Perl shouldn't have any problems with that.
And the code:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use utf8;
my $file = 'Test.csv'; my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
if ($csv->parse($_)) {
if ($. == 1) {
next;
}
my @columns = $csv->fields();
my $id=$columns[0];
print $id." ";
}
}
print "\n";
close CSV;
Any help or hint will be appreciated.
Did you read the documentation of Text::CSV?
If your data contains newlines embedded in fields, or characters above 0x7e (tilde), or binary data, you must set "binary => 1"
Also, use utf8 tells Perl you're going to use UTF-8 in the source code, not in the data. Remove it.
Using <> to read in CSV is also mentioned in the documentation:
while (<>) { # WRONG!
Here is a working version:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $file = 'Test.csv';
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
open my $CSV, '<', $file or die $!;
while (my $line = $csv->getline($CSV)) {
next if 1 == $.;
my @columns = @$line;
my $id = $columns[0];
print $id . " ";
}
print "\n";
close $CSV;
I think your problem is that although you've used the utf8 pragma, it only affects how Perl reads your source code, not your data.
From:
http://perldoc.perl.org/utf8.html
utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code
Looking at Text::CSV
You probably want:
$csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
You will also - probably - need to specify that you're opening a UTF-8 file. You can either do this as part of the open or with binmode
open ( my $filehandle, "<:encoding(UTF-8)", "Test.csv" );
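To illustrate what the encoding layer changes, here is a small sketch that reads the same UTF-8 bytes with and without it. The sample word and the in-memory file handles are assumptions standing in for Test.csv; only core Perl and the Encode module are used.

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 bytes as they would sit in a file on disk
# (hypothetical sample: the six Cyrillic letters of "Ошибка")
my $bytes = encode('UTF-8', "\x{41E}\x{448}\x{438}\x{431}\x{43A}\x{430}");

# Without a layer, Perl hands back the raw bytes
open my $raw_fh, '<', \$bytes or die "Cannot open in-memory file: $!";
my $undecoded = <$raw_fh>;
close $raw_fh;

# With the layer, the bytes are decoded into characters while reading
open my $utf8_fh, '<:encoding(UTF-8)', \$bytes
    or die "Cannot open in-memory file: $!";
my $decoded = <$utf8_fh>;
close $utf8_fh;

printf "without layer: %d, with layer: %d\n",
    length($undecoded), length($decoded);
```

Each of these letters takes two bytes in UTF-8, so the undecoded string is twice as long as the decoded one. The utf8 pragma plays no part here, because the source code itself is plain ASCII.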

Keeping only certain fields on a CSV file

I have a couple of CSV files which have a lot of fields, but I only need to keep a few of them, so I wanted to get rid of the extra data before importing them.
I thought of running:
perl -i.bak -F, -ane 'BEGIN {$,=","} print @F[3..6], @F[9..12]' file.csv
Although text fields are quoted, some fields contain commas and this simple solution does not work.
Use Text::CSV. It handles fields containing the delimiter, among many other nice features.
use strict;
use warnings;
use File::Copy;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1,
eol => $/,
always_quote => 1
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag();
my $file = 'input.csv';
my $backup = "$file.bak";
copy $file, $backup or die "Copy failed: $!";
open my $in_fh, '<', $backup or die "$backup: $!";
open my $out_fh, '>', $file or die "$file: $!";
while (my $row = $csv->getline($in_fh)) {
my @wanted = @$row[3..6,9..12];
$csv->print($out_fh, \@wanted);
}
close $in_fh;
close $out_fh;
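To see why splitting on commas breaks, here is a short sketch comparing a plain split with Text::CSV on one record. The record is a made-up example with a comma inside a quoted field.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical record: the first field contains a comma inside quotes
my $line = '"Smith, John",42,"Paris"';

# Naive approach: split on every comma, ignoring the quotes
my @naive = split /,/, $line;

# Text::CSV treats the quoted comma as part of the field
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();
$csv->parse($line) or die "Parse failed: " . $csv->error_diag();
my @fields = $csv->fields();

printf "split gives %d pieces, Text::CSV gives %d fields\n",
    scalar(@naive), scalar(@fields);
```

The plain split cuts "Smith, John" in two, which shifts every later column by one. That is the failure the -F, one-liner runs into.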

Parsing Tab Delimited File into an array

I am attempting to read a CSV into an array in a way that I can access each column in a row. However when I run the following code with the goal of printing a specific column from each row, it only outputs empty lines.
#set command line arguments
my ($infi, $outdir, $idcol) = @ARGV;
#lead file of data to get annotations for
open FILE, "<", $infi or die "Can't read file '$infi' [$!]\n";
my @data;
foreach my $row (<FILE>){
chomp $row;
my @cells = split /\t/, $row;
push @data, @cells;
}
#fetch genes
foreach (@data){
print "@_[$idcol]\n";
# print $geneadaptor->fetch_by_dbID($_[$idcol]);
}
With a test input of
a b c
1 2 3
d e f
4 5 6
I think the issue here isn't so much loading the file, but in treating the resulting array. How should I be approaching this problem?
First of all you need to push @data, \@cells, otherwise you will get all the fields concatenated into a single list.
Then you need to use the loop value in the second for loop.
foreach (@data){
print $_->[$idcol], "\n";
}
@_ is a completely different variable from $_ and is unpopulated here.
You should also consider using
while (my $row = <FILE>) { ... }
to read your file. It reads only a single line at a time whereas for will read the entire file into a list of lines before iterating over it.
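Putting these points together, a minimal sketch might look like the following. The in-memory input and the value of $idcol are assumptions standing in for the real file and command-line argument.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the tab-separated input file
my $input = "a\tb\tc\n1\t2\t3\nd\te\tf\n4\t5\t6\n";
my $idcol = 1;    # assumed column index to print (0-based)

open my $fh, '<', \$input or die "Can't read in-memory input: $!";

my @data;
while (my $row = <$fh>) {      # reads one line at a time
    chomp $row;
    my @cells = split /\t/, $row;
    push @data, \@cells;       # store a reference: one element per row
}
close $fh;

# Each element of @data is an array reference, so dereference with ->
my @picked = map { $_->[$idcol] } @data;
print "$_\n" for @picked;
```

Because @cells is declared with my inside the loop, each iteration creates a fresh array, so the stored references do not overwrite one another.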
I recommend using the Text::CSV module rather than parsing the CSV file by hand.
use Text::CSV;
use Carp;
#set command line arguments
my ($infi, $outdir, $idcol) = @ARGV;
my $csv = Text::CSV->new({
sep_char => "\t"
});
open(my $fh, "<:encoding(UTF-8)", $infi) || croak "can't open $infi: $!";
# Uncomment if you need to skip header line
# <$fh>;
while (<$fh>) {
if ($csv->parse($_)) {
my @columns = $csv->fields();
print "$columns[0]\t$columns[1]\t$columns[2]\n";
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close $fh;

Why does my Perl script say "Can't call method parse on an undefined value"?

I am new to Perl and still trying to figure out how to code in this language.
I am currently trying to split a long single string of csv into multiple lines.
Data example
a,b,c<br />x,y,x<br />
which I so far have manage to split up, adding in quotes, to add into a CSV file again later on:
"a,b,c""x,y,z"
By having the quotes it just signifies which sets of CSV are together as such.
The problem I am having is that when I try to create a CSV file, passing in the data as a string, I get this error:
"Can't call method "parse" on an undefined value".
When I print out the string which I am passing in, it is defined and holds data. I am hoping that this is something simple which I am doing wrong through lack of experience.
The CSV code which I am using is:
use warnings;
use Text::CSV;
use Data::Dumper;
use constant debug => 0;
use Text::CSV;
print "Running CSV editor......\n";
#my $csv = Text::CSV->new({ sep_char => ',' });
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
my $fileextension = substr($file, -4);
#If the file is a CSV file then read in the file.
if ($fileextension =~ m/csv/i)
{
print "Reading and formating: $ARGV[0] \n";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
my @fields;
my $testline;
my $line;
while ($line = <$data>)
{
#Clears the white space at the end of the line.
chomp $line;
#Splits the line up and removes the <br />.
$testline = join "\" \" ", split qr{<br\s?/>}, $line;
#my $newStr = join $/, @lines;
#print $newStr;
my $q1 = "\"";
$testline = join "", $q1,$testline,$q1;
print "\n printing testline: \n $testline \n";
}
$input_string = $testline;
print "\n Testing input string line:\n $input_string";
if ($csv->parse ($input_string))
{
my @field = $csv->fields;
foreach my $col (0 .. $#field) {
my $quo = $csv->is_binary ($col) ? $csv->{quote_char} : "";
printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;#
}
}
else
{
print STDERR "parse () failed on argument: ",
$csv->error_input, "\n";
$csv->error_diag ();
}
#print $_,$/ for @lines;
print "\n Finished reading and formating: $ARGV[0] \n";
}else
{
print "Error: File is not a CSV file\n"
}
You did not create a Text::CSV object, but you try to use it.
"Can't call method "parse" on an undefined value"
This means that your $csv is not there, thus it does not have a method called parse. Simply create a Text::CSV object first, at the top of your code below all the use lines.
my $csv = Text::CSV->new;
Please take a look at the CPAN documentation of Text::CSV.
Also, did I mention you should use strict?
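As a small sketch, the same message can be reproduced with core Perl alone by calling a method on a variable that was declared but never assigned. The eval block is only there to catch the error so it can be printed.

```perl
use strict;
use warnings;

my $csv;    # declared but never assigned, so it holds undef

# Calling a method on undef dies at run time; eval catches the error
my $ok = eval { $csv->parse('a,b,c'); 1 };
my $error = $@;

print "Caught: $error" unless $ok;
```

With use strict and no declaration at all, as in the question's script, Perl would refuse to compile and report the undeclared $csv before anything runs.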

Split string into variables and use output as element

Well, I'm stuck again. I've read quite a few topics on similar problems but haven't found a solution for mine. I have a ;-delimited CSV file, and the string in the 8th column ($elements[7]) looks like this: "aaaa;bb;cccc;ddddd;eeee;fffff;gg;". What I'm trying to do is split that string on ; and capture the pieces in variables, then write those variables back to the main CSV file, each in its own column.
So now the file is like:
3d;2f;7j;8k;4s;2b;5g;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
3c;9k;5l;4g;1a;5g;3d;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
4g;1a;5g;2g;7h;3d;8k;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";3d;2f;7j;8k;4s;2b;4g;1a
And i want it like:
3d;2f;7j;8k;4s;2b;5g;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg
3c;9k;5l;4g;1a;5g;3d;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
4g;1a;5g;2g;7h;3d;8k;3d;2f;7j;8k;4s;2b;4g;1a;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
This is the code I've been trying it with. I know, it's terrible! But I'm hoping someone can help me.
use strict;
use warnings;
my $inputfile = shift || die "Give files\n";
my $outputfile = shift || die "Give output\n";
open my $INFILE, '<', $inputfile or die "In use / Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use :$!\n";
while (<$INFILE>) {
s/"//g;
my @elements = split /;/, $_;
my ($varA, $varB, $varC, $varD, $varE, $varF, $varG, $varH) = split (';', $elements[10]);
$elements[16] = $varA;
$elements[17] = $varB;
$elements[18] = $varC;
$elements[19] = $varD;
$elements[20] = $varE;
$elements[21] = $varF;
$elements[22] = $varG;
$elements[23] = $varH;
my $output_line = join(";", @elements);
print $OUTFILE $output_line;
}
close $INFILE;
close $OUTFILE;
exit 0;
I'm confused about the my statement as well, it shouldn't be possible right? I mean the $vars are in a closed part so it shouldn't be possible to write them to $elements?
EDIT
This is how I adjusted the code with TLP's suggestions:
use strict;
use warnings;
use Text::CSV;
my $inputfile = shift || die "Give files\n";
my $outputfile = shift || die "Give output\n";
open my $INFILE, '<', $inputfile or die "In use / Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use :$!\n";
my $csv = Text::CSV->new({ # create a csv object
sep_char => ";", # delimiter
eol => "\n", # adds newline to print
});
while (my $row = $csv->getline($INFILE)) { # $row is an array ref
my $line = splice(@$row, 10, 1); # remove 8th line
$csv->parse($line); # parse the line
push @$row, $csv->fields(); # push newly parsed fields onto main array
$csv->print($OUTFILE, $row);
}
close $INFILE;
close $OUTFILE;
exit 0;
You should use a CSV module, e.g. Text::CSV to parse your data. Here's a brief example on how it can be done. You can replace the file handles I used below with your own.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ # create a csv object
sep_char => ";", # delimiter
eol => "\n", # adds newline to print
});
while (my $row = $csv->getline(*DATA)) { # $row is an array ref
my $line = splice(@$row, 7, 1); # remove the 8th field (index 7)
$csv->parse($line); # parse the removed field
push @$row, $csv->fields(); # push newly parsed fields onto main array
$csv->print(*STDOUT, $row);
}
__DATA__
3d;2f;7j;8k;4s;2b;5g;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
3c;9k;5l;4g;1a;5g;3d;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";4g;1a;5g;2g;7h;3d;2f;7j
4g;1a;5g;2g;7h;3d;8k;"aaaa;bb;cccc;ddddd;eeee;fffff;gg;";3d;2f;7j;8k;4s;2b;4g;1a
Output:
3d;2f;7j;8k;4s;2b;5g;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
3c;9k;5l;4g;1a;5g;3d;4g;1a;5g;2g;7h;3d;2f;7j;aaaa;bb;cccc;ddddd;eeee;fffff;gg;
4g;1a;5g;2g;7h;3d;8k;3d;2f;7j;8k;4s;2b;4g;1a;aaaa;bb;cccc;ddddd;eeee;fffff;gg;