Problems with parsing CSV file in Perl - perl

I have a CSV file like this:
id,item,itemtype,date,service,level,message,action,user
"344","-1","IRM","2008-08-22 13:01:57","login","1","Failed login: \'irm\', database \'irmD\'",NULL,NULL
"346","-1","IRM","2008-08-27 10:58:59","login","1","Ошибка входа:\'\', база данных \'irmD\'",NULL,NULL
The second line parses fine, but Text::CSV just skips the third one. The third line contains Cyrillic characters, but the file is encoded in UTF-8 and Perl shouldn't have any problems with that.
And the code:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use utf8;
my $file = 'Test.csv'; my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
if ($csv->parse($_)) {
if ($. == 1) {
next;
}
my @columns = $csv->fields();
my $id=$columns[0];
print $id." ";
}
}
print "\n";
close CSV;
Any help or hint will be appreciated.

Did you read the documentation of Text::CSV?
If your
data contains newlines embedded in fields, or characters above 0x7e
(tilde), or binary data, you must set "binary => 1"
Also, use utf8 tells Perl you're going to use UTF-8 in the source code, not in the data. Remove it.
Using <> to read in CSV is also mentioned in the documentation:
while (<>) { # WRONG!
Here is a working version:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $file = 'Test.csv';
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
open my $CSV, '<', $file or die $!;
while (my $line = $csv->getline($CSV)) {
next if 1 == $.;
my @columns = @$line;
my $id = $columns[0];
print $id . " ";
}
print "\n";
close $CSV;

I think your problem is that, although you've used use utf8, that pragma only affects how Perl reads your source code, not your data.
From:
http://perldoc.perl.org/utf8.html
utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code
Looking at Text::CSV
You probably want:
$csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
You will also - probably - need to specify that you're opening a UTF-8 file. You can either do this as part of the open or with binmode
open ( my $filehandle, "<:encoding(UTF-8)", "Test.csv" );
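The difference between reading raw bytes and reading decoded characters can be seen without Text::CSV at all. The following is a minimal sketch using only core modules; the temporary file and the helper name read_line are made up for illustration. It writes the UTF-8 bytes of the Cyrillic word from the question's data and reads them back through two different layers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes of the Cyrillic word "Ошибка" (6 characters, 12 bytes).
my ($out, $name) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xD0\x9E\xD1\x88\xD0\xB8\xD0\xB1\xD0\xBA\xD0\xB0\n";
close $out or die "Can't close $name: $!";

# Hypothetical helper: read the first line of $name through the given layer.
sub read_line {
    my ($layer) = @_;
    open my $fh, "<$layer", $name or die "Can't open $name: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

my $raw     = read_line(':raw');              # undecoded bytes
my $decoded = read_line(':encoding(UTF-8)');  # decoded characters

printf "raw length: %d, decoded length: %d\n", length $raw, length $decoded;
# prints "raw length: 12, decoded length: 6"
```

Neither result depends on use utf8 being present, which is consistent with the pragma only concerning the source code.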

Related

Perl read .DAT file with UTF-8 BOM format and write it with UTF-8 format without BOM

I have a .DAT file with CR LF line endings, encoded in UTF-8 with a BOM, and I'm trying to convert it to CR LF UTF-8 without a BOM using Perl. I'm currently using the following code, and even though the output file is generated without the BOM, the header line is missing from it. I need the final output file to be UTF-8 without a BOM, with the header included along with the rest of the data.
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
sub encodeWithoutBOM
{
my $src = $_[1];
my $des = $_[2];
my @array;
open(SRC,'<',$src) or die $!;
# open destination file for writing
open(DES,'>',$des) or die $!;
print("copying content from $src to $des\n");
while(<SRC>){
@array = <SRC>;
}
foreach (@array){
print DES;
}
close(SRC);
close(DES);
}
use open ':std', ':encoding(UTF-8)';
while (<>) {
s/^\N{BOM}// if $. == 1;
print;
}
Another option is to use File::BOM from CPAN, which lets you transparently handle the byte order mark:
#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::BOM qw/open_bom/;
sub encode_without_bom {
my ($src, $dst) = @_;
open_bom(my $infile, $src, ":encoding(UTF-8)");
open my $outfile, ">:utf8", $dst;
say "Copying from $src to $dst";
while (<$infile>) {
print $outfile $_;
}
}
encode_without_bom "input.txt", "output.txt";
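For reference, the header goes missing in the question's code because while(<SRC>) reads the first line into $_, and the loop body then slurps all remaining lines, so the first line is never printed. Below is a minimal core-only sketch of a copy routine that avoids this, assuming the input really is UTF-8; the helper name copy_without_bom and the sample data are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $src to $dst as UTF-8, dropping a leading BOM.
sub copy_without_bom {
    my ($src, $dst) = @_;
    open my $in,  '<:encoding(UTF-8)', $src or die "Can't read $src: $!";
    open my $out, '>:encoding(UTF-8)', $dst or die "Can't write $dst: $!";
    while (my $line = <$in>) {
        # A BOM can only appear at the very start of the file.
        $line =~ s/\A\x{FEFF}// if $. == 1;
        print {$out} $line;
    }
    close $in;
    close $out or die "Can't close $dst: $!";
    return;
}

# Demo with made-up data: BOM, a header line and one row, CR LF line endings.
my ($tmp, $src) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xEF\xBB\xBFid,name\r\n1,foo\r\n";
close $tmp or die "Can't close $src: $!";

my (undef, $dst) = tempfile(UNLINK => 1);
copy_without_bom($src, $dst);

# Read the result back as raw bytes to inspect it.
open my $check, '<:raw', $dst or die "Can't read $dst: $!";
my $bytes = do { local $/; <$check> };
close $check;
print "header kept\n" if $bytes =~ /\Aid,name/;
```

Every line, including the first, passes through the same loop, so the header is copied and only the BOM is removed.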

How to open a file that has a special character in it such as $?

Seems fairly simple, but the "$" in the name seems to cause the name to split. I tried escaping the character, but when I try to open the file I get GLOB().
my $path = 'C:\dir\name$.txt';
open my $file, '<', $path || die
print "file = $file\n";
It should open the file so I can traverse the entries.
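It has nothing to do with the "$"; inside single quotes it is not interpolated. GLOB(0x...) is simply what a filehandle looks like when you print it, so read from the handle instead. Just follow standard file handling procedure.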
use strict;
use warnings;
my $path = 'C:\dir\name$.txt';
open my $file_handle, '<', $path or die "Can't open $path: $!";
# read and print the file line by line
while (my $line = <$file_handle>) {
# the <> in scalar context gets one line from the file
print $line;
}
# reset the handle
seek $file_handle, 0, 0;
# read the whole file at once, print it
{
# enclose in a block to localize the $/
# $/ is the line separator, so when it's set to undef,
# it reads the whole file
local $/ = undef;
my $file_content = <$file_handle>;
print $file_content;
}
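Both points can be checked in isolation. This is a minimal sketch, with a temporary directory and file contents made up for illustration: the single-quoted "$" stays literal in the file name, and the GLOB(...) text is just the stringified handle rather than a sign of failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/" . 'name$.txt';   # single quotes keep "$." literal

# Create the file so there is something to read.
open my $out, '>', $path or die "Can't create $path: $!";
print {$out} "first entry\n";
close $out or die "Can't close $path: $!";

open my $in, '<', $path or die "Can't open $path: $!";
my $as_string  = "$in";   # stringified handle, looks like GLOB(0x...)
my $first_line = <$in>;   # actual file content
close $in;

print "handle: $as_string\n";
print "line:   $first_line";
```

In a double-quoted string, by contrast, $. would be interpolated as Perl's input line number variable, which is one reason to keep such paths in single quotes.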
Consider using the CPAN modules File::Slurper or Path::Tiny which will handle the exact details of using open and readline, checking for errors, and encoding if appropriate (most text files are encoded to UTF-8).
use strict;
use warnings;
use File::Slurper 'read_text';
my $file_content = read_text $path;
use Path::Tiny 'path';
my $file_content = path($path)->slurp_utf8;
If it's a data file, use read_binary or slurp_raw.

how to assign data into hash from an input file

I am new to perl.
Inside my input file is :
james1
84012345
aaron5
2332111 42332
2345112 18238
wayne[2]
3505554
Question: I am not sure what the correct way is to read the input and set the name as the key and the numbers as the values. For example, "james" is the key and "84012345" is the value.
This is my code:
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $input= $ARGV[0];
my %hash;
open my $data , '<', $input or die " cannot open file : $_\n";
my @names = split ' ', $data;
my @values = split ' ', $data;
@hash{@names} = @values;
print Dumper \%hash;
I'mma go over your code real quick:
#!/usr/bin/perl -w
-w is not recommended. You should use warnings; instead (which you're already doing, so just remove -w).
use strict;
use warnings;
Very good.
use Data::Dumper;
my $input= $ARGV[0];
OK.
my %hash;
Don't declare variables before you need them. Declare them in the smallest scope possible, usually right before their first use.
open my $data , '<', $input or die " cannot open file : $_\n";
You have a spurious space at the beginning of your error message and $_ is unset at this point. You should include $input (the name of the file that failed to open) and $! (the error reason) instead.
my @names = split ' ', $data;
my @values = split ' ', $data;
Well, this doesn't make sense. $data is a filehandle, not a string. Even if it were a string, this code would assign the same list to both @names and @values.
@hash{@names} = @values;
print Dumper \%hash;
My version (untested):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
@ARGV == 1
or die "Usage: $0 FILE\n";
my $file = $ARGV[0];
my %hash;
{
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
local $/ = '';
while (my $paragraph = readline $fh) {
my @words = split ' ', $paragraph;
my $key = shift @words;
$hash{$key} = \@words;
}
}
print Dumper \%hash;
The idea is to set $/ (the input record separator) to "" for the duration of the input loop, which makes readline return whole paragraphs, not lines.
The first (whitespace separated) word of each paragraph is taken to be the key; the remaining words are the values.
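Paragraph mode can be tried without a file on disk. The sketch below reads from an in-memory string and assumes, as the paragraph-mode approach requires, that the records in the input are separated by blank lines; the sample data mirrors the question's input with those blank lines added.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data; the blank lines between records are an assumption.
my $input = "james1\n84012345\n\n"
          . "aaron5\n2332111 42332\n2345112 18238\n\n"
          . "wayne[2]\n3505554\n";

open my $fh, '<', \$input or die "Can't open in-memory file: $!";

my %hash;
{
    local $/ = '';   # paragraph mode: one record per read
    while (my $paragraph = <$fh>) {
        # First word is the key, the rest are its values.
        my ($key, @values) = split ' ', $paragraph;
        $hash{$key} = \@values;
    }
}
close $fh;

for my $key (sort keys %hash) {
    print "$key => @{ $hash{$key} }\n";
}
```

Because split ' ' splits on any run of whitespace, including newlines, values spread over several lines of one paragraph end up in the same array.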
You have opened a file with open() and attached the file handle to $data. The regular way of reading data from a file is to loop over each line, like so:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $input = $ARGV[0];
my %hash;
open my $data, '<', $input or die "Cannot open $input: $!\n";
while (my $line = <$data>) {
chomp $line; # Removes the trailing newline (\n)
if ($line) { # Skips empty lines
my ($key, $value) = split ' ', $line;
$hash{$key} = $value;
}
}
print Dumper \%hash;
OK, +1 for using strict and warnings.
First, take a look at the $/ variable, which controls how a file is broken into records when it's read in.
$data is a file handle, so you need to extract the data from the file. If it's not too big, you can load it all into an array; if it's a large file, you can loop over one record at a time. See the <> operator in perlop.
Looking at your code, it appears that you want to end up with the following data structure from your input file:
%hash = (
'james1' => [
84012345
],
'aaron5' => [
2332111,
42332,
2345112,
18238
],
'wayne[2]' => [
3505554,
]
)
See perldsc on how to do that.
All the documentation can be read using the perldoc command which comes with Perl. Running perldoc on its own will give you some tips on how to use it and running perldoc perldoc will give you possibly far more info than you need at the moment.

Extract data from file

I have data like
"scott
E -45 COLLEGE LANE
BENGALI MARKET
xyz -785698."
"Tomm
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
I wrote one Perl program to extract names i.e;
open(my$Fh, '<', 'printable address.txt') or die "!S";
open(my$F, '>', 'names.csv') or die "!S";
while (my @line = <$Fh>) {
for(my$i =0;$i<=13655;$i++){
if ($line[$i]=~/^"/) {
print $F $line[$i];
}
}
}
It works fine and extracts the names exactly. Now my aim is to extract the address, like this:
BENGALI MARKET
xyz -785698."
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
into a CSV file. Please tell me how to do this.
There are a lot of flaws in your original code. They should be addressed before suggesting any enhancements:
Always have use strict; and use warnings; at the top of every script.
Your or die "!S" statements are broken. The error message is actually in $!. However, you can skip the need for that by just adding use autodie;
Give your filehandles more meaningful names. $Fh and $F say nothing about what they are for. At minimum, label them as $infh and $outfh.
The while (my @line = <$Fh>) { loop is flawed, as it can be reduced to my @line = <$Fh>;. Because you're calling readline in list context, it slurps the entire file on the first iteration, and the loop exits on the next one. Instead, assign to a scalar, and you won't even need the inner for loop.
If you did want to slurp your entire file into @line, your use of for(my$i =0;$i<=13655;$i++){ is also flawed. You should iterate to the last index of @line, which is $#line.
if ($line[$i]=~/^"/) { is also flawed, as it leaves the quote character " at the beginning of the names you're matching. Instead, add a capture group to pull out the name.
With the suggested changes, the code reduces to:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'names.csv';
while (my $line = <$infh>) {
if ($line =~ /^"(.*)/) {
print $outfh "$1\n";
}
}
Now if you also want to isolate the address, you can use a similar method as you did with the name. I'm going to assume that you might want to build the whole address in a variable so you can do something more complicated with it than throwing them blindly at a file. However, mirroring the file setup for now:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'printable address.txt';
open my $namefh, '>', 'names.csv';
open my $addressfh, '>', 'address.dat';
my $address = '';
while (my $line = <$infh>) {
if ($line =~ /^"(.*)/) {
print $namefh "$1\n";
} elsif ($line =~ /(.*)"$/) {
$address .= $1;
print $addressfh "$address\n";
$address = '';
} else {
$address .= $line;
}
}
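If the goal is to do something more with each address than write it straight to a file, the same quote-based matching can collect whole records in memory first. This is a minimal sketch under that assumption; it reads the question's sample data from an in-memory string, and the record layout (a hash with name and address keys) is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample data from the question, held in a string for the demo.
my $input = <<'END';
"scott
E -45 COLLEGE LANE
BENGALI MARKET
xyz -785698."
"Tomm
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
END

open my $infh, '<', \$input or die "Can't open in-memory file: $!";

my (@records, $name, @address);
while (my $line = <$infh>) {
    chomp $line;
    if ($line =~ /^"(.*)/) {          # opening quote: a new record starts
        ($name, @address) = ($1);     # set the name, reset the address
    } elsif ($line =~ /(.*)"$/) {     # closing quote: the record ends
        push @address, $1;
        push @records, { name => $name, address => join(' ', @address) };
    } else {                          # anything else is an address line
        push @address, $line;
    }
}
close $infh;

print "$_->{name}: $_->{address}\n" for @records;
```

Joining the address lines with a space keeps each record on one output line; a different separator could be used if the line breaks matter.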
Ultimately, no matter what you want to use your data for, your best solution is probably to output it to a real CSV file using Text::CSV. That way it can be imported into a spreadsheet or some other system very easily, and you won't have to parse it again.
use strict;
use warnings;
use autodie;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1, eol => "\n" } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'address.csv';
my @data;
while (my $line = <$infh>) {
# Name Field
if ($line =~ /^"(.*)/) {
@data = ($1, '');
# End of Address
} elsif ($line =~ /(.*)"$/) {
$data[1] .= $1;
$csv->print($outfh, \@data);
# Address lines
} else {
$data[1] .= $line;
}
}

Unterminated `s' command

I am working on a Perl script that takes customer service information and changes the service IDs. The code I have now can take a two-column CSV file for reference and swap the numbers, but when I try to use "\n" or "\|" like I need to, it gives me this error:
"sed: -e expression #1, char 29: unterminated `s' command"
Here's my original code that works for changing JUST the number within the quotes:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV; #load Text::CSV module for parsing CSV
use LWP::Simple; #load LWP module for doing HTTP get
my $file = 'sid_swap.csv'; #CSV file to parse and load
my $csv = Text::CSV->new(); #create a new Text::CSV object
open (CSV, "<", $file) or die $!; #open CSV file for parsing
while (<CSV>) {
if ($csv->parse($_)) {
my @columns = $csv->fields(); #parse csv files and load into an array for each row
my $newSID = $columns[0];
my $oldSID = $columns[1];
system("sed -i 's/\<row SERVICE_MENU_ID=\"$oldSID\"/\<row SERVICE_MENU_ID=\"$newSID\"/g' customer_data.txt");
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close CSV;
And here is the new code that throws the error:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV; #load Text::CSV module for parsing CSV
use LWP::Simple; #load LWP module for doing HTTP get
my $file = 'sid_swap.csv'; #CSV file to parse and load
my $csv = Text::CSV->new(); #create a new Text::CSV object
open (CSV, "<", $file) or die $!; #open CSV file for parsing
while (<CSV>) {
if ($csv->parse($_)) {
my @columns = $csv->fields(); #parse csv files and load into an array for each row
my $newSID = $columns[0];
my $oldSID = $columns[1];
system("sed -i 's/\<row SERVICE_MENU_ID=\"$oldSID\"/\n$newSID\|/g' customer_data.txt");
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close CSV;
Thanks for any help you can provide!
To debug, change system("sed ...") to die("sed ...");. You'll see what you're trying to execute:
sed -i 's/<row SERVICE_MENU_ID="OLDSID"/
NEWSID|/g' customer_data.t
I guess sed doesn't like actual newlines in the middle of its arguments? That can be fixed using proper escaping, but your approach is... insane. You're processing the entire file for each row of the CSV!
open(my $CSV, "<", $file) or die $!; # Let's not use global vars!
my %fixes;
while (<$CSV>) {
$csv->parse($_)
or die "Failed to parse line: ".$csv->error_input."\n";
my ($newSID, $oldSID) = $csv->fields();
$fixes{$oldSID} = $newSID;
}
my $pat = join "|", map quotemeta, keys(%fixes);
my $re = qr/$pat/;
{
local @ARGV = 'customer_data.txt';
local $^I = ''; # "perl -i" mode
while (<>) {
s/<row SERVICE_MENU_ID="($re)"/\n$fixes{$1}|/g;
print;
}
}
Doing it in one pass also solves the problem of
1111,2222
2222,3333
being the same as
1111,3333
2222,3333
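The chaining effect is easy to reproduce on a small string. The sketch below is only an illustration: the two rows of sample text are made up, and the pairs are explicitly treated as [old, new], which is an assumption about the column order. It compares one substitution per pair against a single pass with a hash lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up replacement pairs, written as [old, new].
my @pairs = ([1111 => 2222], [2222 => 3333]);
my $text  = qq{<row SERVICE_MENU_ID="1111"/>\n<row SERVICE_MENU_ID="2222"/>\n};

# One substitution per pair: the second pass rewrites the first pass's output.
my $sequential = $text;
for my $pair (@pairs) {
    my ($old, $new) = @$pair;
    $sequential =~ s/SERVICE_MENU_ID="\Q$old\E"/SERVICE_MENU_ID="$new"/g;
}

# Single pass: every ID is looked up exactly once.
my %fixes = map { $_->[0] => $_->[1] } @pairs;
my $pat   = join '|', map quotemeta, keys %fixes;
(my $single = $text) =~ s/SERVICE_MENU_ID="($pat)"/SERVICE_MENU_ID="$fixes{$1}"/g;

print "sequential:\n$sequential";
print "single pass:\n$single";
```

With one substitution per pair, both rows end up as 3333; in the single pass, 1111 becomes 2222 and 2222 becomes 3333, which is what the mapping asks for.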