Using Perl, I need to parse and rearrange CSV files that have some dynamic fields (devices and associated values).
Here is the original CSV (the header is shown here for description only):
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,sda,sda1,sda2,sda3,sdb,sdb1,sdb2,sdb3
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,0.0,0.0,0.0,0.0,18.0,0.0,18.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:49,T0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:51,T0003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:53,T0004,0.0,0.0,0.0,0.0,369.8,0.0,369.8,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:55,T0005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I need it to be transformed into:
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,device,value
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,sda,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,sda1,0.0
... and so on
Here is the sample code that generates the CSV file based on the original data:
if ((rindex $l, "DISKBUSY,") > -1) {
    # Open destination file
    if ( !open(FILE, ">>" . $dstfile_DISKBUSY) ) {
        exit(1);
    }
    (my @line) = split(",", $l);
    my $section = "DISKBUSY";
    my $write = $section . "," . $SerialNumber . "," . $hostnameT . "," .
                $timestamp . "," . $line[1];
    my $i = 2;
    while ($i <= $#line) {
        $write = $write . ',' . $line[$i];
        $i = $i + 1;
    }
    print (FILE $write . "\n");
    close(FILE);
}
I need to rearrange it as described so I can work with the data in a generic way, but the dynamic fields (device names) are driving me crazy :-)
Many thanks for any help !
You can use Text::CSV:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({
    binary    => 1,
    auto_diag => 1,
    eol       => "\n"
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, '<', 'file.csv' or die $!;

my @columns        = @{ $csv->getline($fh) };
my @device_columns = @columns[5 .. $#columns];
my @header         = (@columns[0 .. 4], "device", "value");
$csv->print(\*STDOUT, \@header);

while (my $row = $csv->getline($fh)) {
    foreach my $i (0 .. $#device_columns) {
        my @output = (@$row[0 .. 4], $device_columns[$i], $row->[5 + $i]);
        $csv->print(\*STDOUT, \@output);
    }
}
close $fh;
Output:
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,device,value
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda1,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda2,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda3,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb,18.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb1,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb2,18.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb3,0.0
(this is only the output for the first row of your input data)
Better solution
The following uses getline_hr to return each row in the input CSV as a hashref, which makes the code a bit cleaner:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
    binary    => 1,
    auto_diag => 1,
    eol       => "\n"
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, '<', 'file.csv' or die $!;

$csv->column_names($csv->getline($fh));
my @cols    = ( $csv->column_names );
my @devices = splice @cols, 5;
my @header  = ( @cols, "device", "value" );
$csv->print(\*STDOUT, \@header);

while (my $hr = $csv->getline_hr($fh)) {
    foreach my $device (@devices) {
        my @output = ( @$hr{@cols}, $device, $hr->{$device} );
        $csv->print(\*STDOUT, \@output);
    }
}
close $fh;
Use the Text::CSV module.
You can assign header names with $csv->column_names(@column_names) and then use $csv->getline_hr to get each line as a hash reference keyed by your column names. This will make it much easier to parse your file.
You don't have to use Text::CSV to write back your file (although it makes sure your file is written correctly), but you should use it to parse your data.
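For instance, here is a minimal self-contained sketch of that column_names/getline_hr pattern, reading from an in-memory CSV string (the host/device data is invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Invented sample data, held in memory so the sketch is self-contained.
my $data = "host,device,value\nhost001,sda,0.0\nhost001,sdb,18.0\n";
open my $fh, '<', \$data or die $!;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
$csv->column_names($csv->getline($fh));    # header row becomes the hash keys

my @rows;
while (my $hr = $csv->getline_hr($fh)) {
    push @rows, "$hr->{host} $hr->{device} $hr->{value}";
}
close $fh;
print "$_\n" for @rows;
```

Each record is now addressable by column name rather than position, which is what makes the dynamic device columns manageable.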
Related
Hi, I am a newbie at Perl scripting, and I need help implementing logic for sorting a CSV file's columns based on the header values.
Example:
S.NO,NAME,S2,S5,S3,S4,S1
1,aaaa,88,99,77,55,66
2,bbbb,66,77,88,99,55
3,cccc,55,44,77,88,66
4,dddd,77,55,66,88,99
Now I want to sort this file as below:
s.no,name,s1,s2,s3,s4,s5 => that's how I want it: I define the order of the headers, and the respective whole columns of values should also change based on the header exchange. How do I do this in Perl?
The required output is the following:
S.NO,NAME,S1,S2,S3,S4,S5
1,aaaaaaa,66,88,77,55,99
2,bbbbbbb,55,66,88,77,99
3,ccccccc,66,55,77,88,44
4,ddddddd,99,77,66,88,55
or whatever order I want in the column headers, like below:
S.NO,NAME,S5,S4,S3,S2,S1 -> as per my requirement I need to reorder the column headers, and their respective column values as well.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = 'a1.csv';
my $size = 3;
my @files;

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, sep_char => ';' });
open my $in, "<:encoding(utf8)", $file or die "$file: $!";

while (my $row = $csv->getline($in)) {
    if (not @files) {
        my $file_counter = int @$row / $size;
        $file_counter++ if @$row % $size;
        for my $i (1 .. $file_counter) {
            my $outfile = "output$i.csv";
            open my $out, ">:encoding(utf8)", $outfile or die "$outfile: $!";
            push @files, $out;
        }
    }
    my @fields = @$row;
    foreach my $i (0 .. $#files) {
        my $from = $i * $size;
        my $to   = $i * $size + $size - 1;
        $to = $to <= $#fields ? $to : $#fields;
        my @data = @fields[$from .. $to];
        $csv->print($files[$i], \@data);
        print {$files[$i]} "\n";
    }
}
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Text::CSV qw();
my @headers = qw(s.no name s1 s2 s3 s4 s5);
my $csv_in  = Text::CSV->new({ binary => 1, auto_diag => 1 });
my $csv_out = Text::CSV->new({ binary => 1, auto_diag => 1 });

open my $in,  '<:encoding(UTF-8)', 'a1.csv';
open my $out, '>:encoding(UTF-8)', 'output1.csv';

$csv_in->header($in);
$csv_out->say($out, [@headers]);
while (my $row = $csv_in->getline_hr($in)) {
    $csv_out->say($out, [ $row->@{@headers} ]);
}
The handy Text::AutoCSV module lets you rearrange the column order as a one-liner:
$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(in_file=>"in.csv",out_file=>"out.csv",out_fields=>["SNO","NAME","S1","S2","S3","S5"])->write()'
$ cat out.csv
s.no,name,s1,s2,s3,s5
1,aaaa,66,55,77,99
2,bbbb,55,99,88,77
3,cccc,66,88,77,44
4,dddd,99,88,66,55
I'm not sure what your actual desired order of fields is, because you give two and both of them include columns that aren't in the sample input file (one has two s2 columns; is one of them supposed to be s4?), but you should get the idea. Column names have to be all caps with special characters like . removed, but the actual names are used for the output.
my $eix = "001"; $csv_in->header ($in, munge_column_names => sub { s/^$/"E".$eix++/er });
I would like to read in a .csv file using Text::CSV_XS, then select columns from it by header to match what is stored in an array, outputting a new .csv.
use strict;
use warnings;
use Text::CSV_XS;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
my $file;
my @headers;

foreach $file (@args) {
    my @CSVFILE;
    my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
    for my $line (@csvfileIN) {
        $csvparser->parse($line);
        my @fields = $csvparser->fields;
        $line = $csvparser->combine(@fields);
    }
}
The following example just parses a CSV file into a variable; then you can match, remove, or add lines in that variable and write it back to the same CSV file.
In this example I just remove one entry line from the CSV.
First, parse the CSV file:
use Text::CSV_XS qw( csv );
my $parsed_file_array_of_hashesv = csv(
    in      => $input_csv_filename,
    sep     => ';',
    headers => "auto"
);    # as an array of hashes
Second, once you have $parsed_file_array_of_hashesv, you can loop over that array in Perl and detect the line you want to remove.
Then remove it using
splice ARRAY, OFFSET, LENGTH
which removes the elements from index OFFSET through index OFFSET+LENGTH-1.
Let's assume it is at index 0:
my @extracted_array = @$parsed_file_array_of_hashesv;    # dereference the array reference
splice @extracted_array, 0, 1;                           # remove entry 0
my $ref_removed_line_parsed = \@extracted_array;         # reference to the array
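As a quick aside, here is what that splice call does on a plain array (pure core Perl, values invented):

```perl
use strict;
use warnings;

my @entries = ('row0', 'row1', 'row2', 'row3');

# Remove LENGTH=1 element starting at OFFSET=0, i.e. index 0 only.
my @removed = splice @entries, 0, 1;

# @entries is now ('row1', 'row2', 'row3'); @removed holds ('row0').
print scalar(@entries), "\n";
```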
Third, write the array back to the CSV file:
my $current_metric_file = csv(
    in  => $ref_removed_line_parsed,    # only accepts a reference
    out => $output_csv_filename,
    sep => ';',
    eol => "\n",                        # \r, \n, \r\n, or undef
    # headers => \@sorted_column_names, # only accepts a reference
    headers => "auto"
);
Notice that if you use \@sorted_column_names you will be able to control the order of the columns:
my @sorted_column_names;
# All hashes have the same column names, so we take the keys of the first one.
foreach my $name (sort { lc $a cmp lc $b } keys %{ $parsed_file_array_of_hashesv->[0] }) {
    push @sorted_column_names, $name;
}
That should write the CSV file without your line.
use open ":std", ":encoding(UTF-8)";
use Text::CSV_XS qw( );

# Names of the columns to copy to the new file.
my @col_names_out = qw( ... );

my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });

for (...) {
    my $qfn_in  = ...;
    my $qfn_out = ...;

    open(my $fh_in, "<", $qfn_in)
        or die("Can't open \"$qfn_in\": $!\n");
    open(my $fh_out, ">", $qfn_out)
        or die("Can't create \"$qfn_out\": $!\n");

    $csv->column_names(@{ $csv->getline($fh_in) });
    $csv->say($fh_out, \@col_names_out);
    while (my $row = $csv->getline_hr($fh_in)) {
        $csv->say($fh_out, [ @$row{@col_names_out} ]);
    }
}
I've asked before how to do this with AWK, but it doesn't handle it all that well.
The data has semicolons inside quoted fields, which AWK doesn't take into account. So I'm trying it in Perl with the Text::CSV module so I don't have to think about that. The problem is that I don't know how to output to files based on a column value.
Short example from previous question, the data:
10002394;"""22.98""";48;New York;http://testdata.com/bla/29012827.jpg;5.95;93962094820
10025155;27.99;65;Chicago;http://testdata.com/bla/29011075.jpg;5.95;14201021349
10003062;19.99;26;San Francisco;http://testdata.com/bla/29002816.jpg;5.95;17012725049
10003122;13.0;53;"""Miami""";http://testdata.com/bla/29019899.jpg;5.95;24404000059
10029650;27.99;48;New York;http://testdata.com/bla/29003007.jpg;5.95;3692164452
10007645;20.99;65;Chicago;"""http://testdata.com/bla/28798580.jpg""";5.95;10201848233
10025825;12.99;65;Chicago;"""http://testdata.com/bla/29017837.jpg""";5.95;93962025367
The desired result:
File --> 26.csv
10003062;19.99;26;San Francisco;http://testdata.com/bla/29002816.jpg;5.95;17012725049
File --> 48.csv
10002394;22.98;48;New York;http://testdata.com/bla/29012827.jpg;5.95;93962094820
10029650;27.99;48;New York;http://testdata.com/bla/29003007.jpg;5.95;3692164452
File --> 53.csv
10003122;13.0;53;Miami;http://testdata.com/bla/29019899.jpg;5.95;24404000059
File --> 65.csv
10025155;27.99;65;Chicago;http://testdata.com/bla/29011075.jpg;5.95;14201021349
10007645;20.99;65;Chicago;http://testdata.com/bla/28798580.jpg;5.95;10201848233
10025825;12.99;65;Chicago;http://testdata.com/bla/29017837.jpg;5.95;93962025367
This is what I have so far. EDIT: Modified code:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
#use Data::Dumper;
use Time::Piece;

my $inputfile = shift || die "Give input and output names!\n";
open my $infile, '<', $inputfile or die "Sourcefile in use / not found: $!\n";
#binmode($infile, ":encoding(utf8)");

my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ";", quote_space => 0, eol => $/ });
my %fh;
my %count;
my $country;
my $date = localtime->strftime('%y%m%d');
open(my $fh_report, '>', "report$date.csv");

$csv->getline($infile);    # skip the header row
while (my $elements = $csv->getline($infile)) {
EDITED IN:
__________
    next unless ($elements->[29] =~ m/testdata/);
    for (@$elements) {
        next if ($elements =~ /apple|orange|strawberry/);
    }
__________
    for (@$elements) {
        s/\"+/\"/g;
    }
    my $filename = $elements->[2];
    $country = $elements->[3] . ";" . $elements->[2];
    $count{$country}++;
    $fh{$filename} ||= do {
        open(my $fh, '>:encoding(UTF-8)', $filename . ".csv") or die "Could not open file '$filename'";
        $fh;
    };
    $csv->print($fh{$filename}, $elements);
}
#print $fh_report Dumper(\%count);
foreach my $name (reverse sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count) {
    print $fh_report "$name;$count{$name}\n";
}
close $fh_report;
Errors:
Can't call method "print" on an undefined value at sort_csv_delimiter.pl line 28, <$infile> line 2
I've been messing around with this but I'm totally at a loss. Can someone help me?
My guess is that you want a hash of cached file handles:
my %fh;
while (my $elements = $csv->getline($infile)) {
    my $filename = $elements->[2];
    $fh{$filename} ||= do {
        open my $fh, ">", "$filename.csv" or die $!;
        $fh;
    };
    # $csv->combine(@$elements);
    $csv->print($fh{$filename}, $elements);
}
I don't see an instance of your stated problem -- occurrences of the semicolon separator character ; within quoted fields -- but you are correct that Text::CSV will handle it correctly.
This short program reads your example data from the DATA file handle and prints the result to STDOUT. I presume you know how to read from or write to different files if you wish.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ';', eol => $/ });

my @data;
while (my $row = $csv->getline(\*DATA)) {
    push @data, $row;
}

my $file;
for my $row (sort { $a->[2] <=> $b->[2] or $a->[0] <=> $b->[0] } @data) {
    unless (defined $file and $file == $row->[2]) {
        $file = $row->[2];
        printf "\nFile --> %d.csv\n", $file;
    }
    $csv->print(\*STDOUT, $row);
}
__DATA__
10002394;22.98;48;http://testdata.com/bla/29012827.jpg;5.95;93962094820
10025155;27.99;65;http://testdata.com/bla/29011075.jpg;5.95;14201021349
10003062;19.99;26;http://testdata.com/bla/29002816.jpg;5.95;17012725049
10003122;13.0;53;http://testdata.com/bla/29019899.jpg;5.95;24404000059
10029650;27.99;48;http://testdata.com/bla/29003007.jpg;5.95;3692164452
10007645;20.99;65;http://testdata.com/bla/28798580.jpg;5.95;10201848233
10025825;12.99;65;http://testdata.com/bla/29017837.jpg;5.95;93962025367
output
File --> 26.csv
10003062;19.99;26;http://testdata.com/bla/29002816.jpg;5.95;17012725049
File --> 48.csv
10002394;22.98;48;http://testdata.com/bla/29012827.jpg;5.95;93962094820
10029650;27.99;48;http://testdata.com/bla/29003007.jpg;5.95;3692164452
File --> 53.csv
10003122;13.0;53;http://testdata.com/bla/29019899.jpg;5.95;24404000059
File --> 65.csv
10007645;20.99;65;http://testdata.com/bla/28798580.jpg;5.95;10201848233
10025155;27.99;65;http://testdata.com/bla/29011075.jpg;5.95;14201021349
10025825;12.99;65;http://testdata.com/bla/29017837.jpg;5.95;93962025367
Update
I have just realised that your "desired result" isn't the output that you expect to see, but rather the way separate records are written to different files. This program solves that.
It looks from your question as though you want the data sorted in order of the first field as well, and so I have read all of the file into memory and printed a sorted version to the relevant files. I have also used autodie to avoid having to code status checks for all the IO operations.
use strict;
use warnings;
use autodie;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ';', eol => $/ });

my @data;
while (my $row = $csv->getline(\*DATA)) {
    push @data, $row;
}

my ($file, $fh);
for my $row (sort { $a->[2] <=> $b->[2] or $a->[0] <=> $b->[0] } @data) {
    unless (defined $file and $file == $row->[2]) {
        $file = $row->[2];
        open $fh, '>', "$file.csv";
    }
    $csv->print($fh, $row);
}
close $fh;
__DATA__
10002394;22.98;48;http://testdata.com/bla/29012827.jpg;5.95;93962094820
10025155;27.99;65;http://testdata.com/bla/29011075.jpg;5.95;14201021349
10003062;19.99;26;http://testdata.com/bla/29002816.jpg;5.95;17012725049
10003122;13.0;53;http://testdata.com/bla/29019899.jpg;5.95;24404000059
10029650;27.99;48;http://testdata.com/bla/29003007.jpg;5.95;3692164452
10007645;20.99;65;http://testdata.com/bla/28798580.jpg;5.95;10201848233
10025825;12.99;65;http://testdata.com/bla/29017837.jpg;5.95;93962025367
FWIW I have done this using Awk (gawk):
awk --assign col=2 'BEGIN { if(!(col ~/^[1-9]/)) exit 2; outname = "part-%s.txt"; } !/^#/ { out = sprintf(outname, $col); print > out; }' bigfile.txt
other_process data | awk --assign col=2 'BEGIN { if(!(col ~/^[1-9]/)) exit 2; outname = "part-%s.txt"; } !/^#/ { out = sprintf(outname, $col); print > out; }'
Let me explain the awk script:
BEGIN { # execution block before reading any file (once)
if(!(col ~/^[1-9]/)) exit 2; # assert the `col` variable is a positive number
outname = "part-%s.txt"; # formatting string of the output file names
}
!/^#/ { # only process lines not starting with '#' (header/comments in various data files)
out = sprintf(outname, $col); # format the output file name, given the value in column `col`
print > out; # put the line to that file
}
If you like you can add a variable to specify a custom filename or use the current filename (or STDIN) as prefix:
NR == 1 { # at the first file (not BEGIN, as we might need FILENAME)
if(!(col ~/^[1-9]/)) exit 2; # assert the `col` variable is a positive number
if(!outname) outname = (FILENAME == "-" ? "STDIN" : FILENAME); # if `outname` variable was not provided (with `-v/--assign`), use current filename or STDIN
if(!(outname ~ /%s/)) outname = outname ".%s"; # if `outname` is not a formatting string - containing %s - append it
}
!/^#/ { # only process lines not starting with '#' (header/comments in various data files)
out = sprintf(outname, $col); # format the output file name, given the value in column `col`
print > out; # put the line to that file
}
Note: if you provide multiple input files, only the first file's name will be used as output prefix. To support multiple input files and multiple prefixes, you can use FNR == 1 instead and add another variable to distinguish between user-provided outname and the auto-generated one.
I am trying to parse a file where the header row is at row 8. From row 9-n is my data. How can I use Text::CSV to do this? I am having trouble, my code is below:
my @cols = @{ $csv->getline($io, 8) };
my $row = {};
$csv->bind_columns(\@{$row}{@cols});
while ($csv->getline($io, 8)) {
    my $ip_addr = $row->{'IP'};
}
use Text::CSV;

my $csv = Text::CSV->new() or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $io, '<', "test.csv" or die "test.csv: $!";
my $array_ref = $csv->getline_all($io, 8);    # skip the first 8 rows

foreach my $record (@$array_ref) {
    print "$record->[0] \n";
}
close $io or die "test.csv: $!";
Are you dead-set on using bind_columns? I think I see what you're trying to do, and it's notionally very creative, but if all you want is a way to reference the column by the header name, how about something like this:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 });
my %header;

open my $io, "<", '/var/tmp/foo.csv' or die $!;
while (my $row = $csv->getline($io)) {
    next unless $. > 7;
    my @fields = @$row;
    unless (%header) {
        $header{$fields[$_]} = $_ for 0 .. $#fields;
        next;
    }
    my $ip_addr = $fields[$header{'IP'}];
    print "$. => $ip_addr\n";
}
close $io;
Sample Input:
Test Data,,,
Trash,,,
Test Data,,,
Trash,,,
Beans,Joe,10.224.38.189,XYZ
Beans,Joe,10.224.38.190,XYZ
Beans,Joe,10.224.38.191,XYZ
Last Name,First Name,IP,Computer
Beans,Joe,10.224.38.192,XYZ
Beans,Joe,10.224.38.193,XYZ
Beans,Joe,10.224.38.194,XYZ
Beans,Joe,10.224.38.195,XYZ
Beans,Joe,10.224.38.196,XYZ
Beans,Joe,10.224.38.197,XYZ
Output:
9 => 10.224.38.192
10 => 10.224.38.193
11 => 10.224.38.194
12 => 10.224.38.195
13 => 10.224.38.196
14 => 10.224.38.197
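For completeness, bind_columns itself can also be made to work. Here is a hedged sketch under the same idea, with in-memory data invented to mirror the sample input (note that getline does not take a row-offset argument, so the junk rows above the header are read and discarded explicitly):

```perl
use strict;
use warnings;
use Text::CSV;

# Invented data: two junk rows, then the header, then one record.
my $data = "Trash,,,\nTrash,,,\nLast Name,First Name,IP,Computer\nBeans,Joe,10.224.38.192,XYZ\n";
open my $io, '<', \$data or die $!;

my $csv = Text::CSV->new({ binary => 1 });
$csv->getline($io) for 1 .. 2;          # discard the two junk rows above the header
my @cols = @{ $csv->getline($io) };     # the header row

my %row;
$csv->bind_columns(\@row{@cols});       # every getline() now fills %row in place

my @ips;
while ($csv->getline($io)) {
    push @ips, $row{IP};
}
close $io;
print "$_\n" for @ips;
```

The `\@row{@cols}` hash-slice-of-references trick is what the question's `bind_columns` call was reaching for.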
I have a CSV file that I am searching for lines that contain a certain model. The program works perfectly when searching for the '2GM' model but NOT for '2GM(F)'.
This is the program:
#!/usr/bin/perl
# Searches modeltest.txt for all instances of model
# Writes a file called <your model>.txt with all lines
# in modeltest.txt where the model is found
# Edit $model for different uses
use strict;
use warnings;
use Text::CSV;
my $input_file = 'modeltest.txt';
my #lines = ();
# my $model = '2GM'; # Search for 2GM - WORKS PERFECTLY
my $model = '2GM(F)'; # Search for 2GM(F) - DOES NOT WORK!
# my $model = '2GM\(F\)'; # Does not work either!
print "Search pattern is $model\n";
my $output_file = $model . '.txt';
my $csv = Text::CSV->new({binary => 1, auto_diag => 1, eol=> "\012"})
or die "Cannot use CSV: ".Text::CSV->error_diag ();
print "Searching modeltest.txt for $model....\n";
open my $infh, '<', $input_file or die "Can't open '$input_file':$!" ;
open my $outfh, '>', $output_file or die "Can't open '$output_file':$!" ;
while (my $row = $csv->getline($infh))
{
    my @fields = $csv->fields();
    if (/^($model)$/ ~~ @fields)    # search for pattern
    {
        $csv->print($outfh, ["Y $fields[1]", $model]) or $csv->error_diag;
    }
}
close $infh;
close $outfh;
$csv->eof or die "Processing of '$input_file' terminated prematurely\n";
print "All Done see output files...\n";
Here is the modeltest.txt file:
3,721575-42702,121575-42000,"PUMP ASSY, WATER",,26,COOLING SEA WATER PUMP,-,2GM(F),3GM(F),-,3HM,3HMF,,
1,721575-42702,121575-42000,"PUMP ASSY, WATER",,73,COOLING SEA WATER PUMP,-,2GM,3GM,-,3HM,-,,
45,103854-59191,,"BOLT ASSY, JOINT M12",W,38,FUEL PIPE,1GM,2GM(F),3GM(F),3GMD,3HM,3HMF,,
21,104200-11180,,"RETAINER, SPRING",,11,CYLINDER HEAD,1GM,2GM(F),3GM(F),3GMD,-,-,,
24,23414-080000,,"GASKET, 8X1.0",,77,FUEL PIPE,-,2GM,3GM,-,3HM,-,,
3,124223-42092,124223-42091,IMPELLER,,73,COOLING SEA WATER PUMP,-,2GM,3GM,-,3HM,-,,
Here is the output for 2GM.txt
"Y 721575-42702",2GM
"Y 23414-080000",2GM
"Y 124223-42092",2GM
There is no output for 2GM(F): the program does not work, and I have no idea why.
Can anyone throw some light on my problem?
YES, this worked, thank you again!!
Happy not to be using smartmatch...
I did the following:
Changed the search expression to
my $model = "2GM\(F\)";
Used the following code
while (my $row = $csv->getline($infh))
{
    my @fields = $csv->fields();
    foreach my $field (@fields)
    {
        if ($model eq $field)    # search for an exact match in any field
        {
            $csv->print($outfh, ["Y $fields[1]", $model]) or $csv->error_diag;
        }
    }
}
Parentheses have a special meaning in regular expressions: they create capture groups.
If you want to match literal parentheses (or any other special character) in a regular expression, you need to escape them with backslashes, so your search pattern needs to be 2GM\(F\).
You can also use \Q and \E to disable special characters in your pattern match and leave your search pattern the same:
if (/^(\Q$model\E)$/ ~~ @fields)    # search for pattern
...
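A self-contained sketch of the difference (the field value is invented for illustration):

```perl
use strict;
use warnings;

my $model = '2GM(F)';
my $field = '2GM(F)';

# Unescaped: the parentheses form a capture group, so /^2GM(F)$/ would
# match the literal string "2GMF", not "2GM(F)".
my $raw_match    = ($field =~ /^$model$/)     ? 1 : 0;

# \Q...\E (or quotemeta) escapes the metacharacters first.
my $quoted_match = ($field =~ /^\Q$model\E$/) ? 1 : 0;

print "raw=$raw_match quoted=$quoted_match\n";
```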
The smartmatch operator ~~ is deprecated, I believe; it would be more straightforward to loop over @fields:
foreach my $field ( $csv->fields() ) {
    if ($field =~ /^($model)$/)    # search for pattern
    ...
}
And really there is no reason to pattern match when you can compare directly:
foreach my $field ( $csv->fields() ) {
    if ($model eq $field)    # compare directly
    ...
}
It is best to use \Q in the regex so that you don't have to mess with escaping characters when you define $model.
The data is already in the array referred to by $row - there is no need to call fields to fetch it again.
It is much clearer, and may be slightly faster, to use any from List::Util
It's tidier to use autodie if all you want to do is die on an IO error
Setting auto_diag to a value greater than one will cause it to die in the case of any errors instead of just warning
This is a version of your own program with these issues altered
use strict;
use warnings;
use autodie;
use Text::CSV;
use List::Util 'any';
my $input_file = 'modeltest.txt';
my $model = '2GM(F)';
my $output_file = "$model.txt";
my $csv = Text::CSV->new({ binary => 1, eol => $/, auto_diag => 2 })
or die "Cannot use CSV: " . Text::CSV->error_diag;
open my $infh, '<', $input_file;
open my $outfh, '>', $output_file;
print qq{Searching "$input_file" for "$model"\n};
while (my $row = $csv->getline($infh)) {
    if (any { /\Q$model/ } @$row) {
        $csv->print($outfh, ["Y $row->[1]", $model]);
    }
}
close $outfh;