I'm trying to write the output of a database query to a CSV file, but I can't get each row onto a separate line. How can I do this?
Here's the code:
use warnings;
use DBI;
use strict;
use Text::CSV;
#set up file
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
#set up query
my $dbh = DBI->connect("DBI:mysql:db", "name") or die ("Error: $DBI::errstr");
my $sql = qq(select * from one_table join two_table using (primary_key));
my $query = $dbh->prepare($sql);
$query->execute;
#loop through returned rows of query and save each row as an array
while ( my (@row ) = $query->fetchrow_array ) {
#print each row to the csv file
$csv->print ($fh, [@row]);
# every line seems to be appended to same line in "new.csv"
# tried adding "\n" to no avail
}
close $fh or die "new.csv: $!";
This must be a common use case, but I couldn't find anything about issues with newlines.
I assume your problem is that all your CSV data ends up on the same line?
You should set the eol option in your CSV object:
my $csv = Text::CSV->new ( {
binary => 1, # should set binary attribute.
eol => $/, # end of line character
}) or die "Cannot use CSV: ".Text::CSV->error_diag ();
This character will be appended to the end of line in print. You might also consider not copying the values from your fetchrow call every iteration, since print takes an array ref. Using references will be more straightforward.
while (my $row = $query->fetchrow_arrayref) {
....
$csv->print($fh, $row);
First of all, you have a missing semicolon at the end of the line
my $sql = qq(select * from one_table join two_table using (primary_key))
By default, Text::CSV uses the current value of $\, the output record separator at end of line. And, again by default, this is set to undef, so you won't get any separator printed.
You can either set up your $csv object with
my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
or just print the newline explicitly, like this. Note that there's no need to fetch the row into an array and then copy it to an anonymous array to get this to work. fetchrow_arrayref will return an array reference that you can pass directly to print.
while (my $row = $query->fetchrow_arrayref) {
$csv->print($fh, $row);
print $fh "\n";
}
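For readers unfamiliar with $\, here is a minimal core-Perl sketch of what an output record separator does. It uses plain print on in-memory filehandles rather than Text::CSV, and the variable names are only illustrative.

```perl
use strict;
use warnings;

my ($without, $with) = ('', '');

# With $\ left at its default (undef), nothing is appended after each print.
open my $fh1, '>', \$without or die "Cannot open in-memory file: $!";
print {$fh1} "a,b";
print {$fh1} "1,2";
close $fh1;

# With $\ set to a newline, every print call ends its record with "\n".
open my $fh2, '>', \$with or die "Cannot open in-memory file: $!";
{
    local $\ = "\n";    # the change is limited to this block
    print {$fh2} "a,b";
    print {$fh2} "1,2";
}
close $fh2;

# $without now holds "a,b1,2" and $with holds "a,b\n1,2\n"
```

Setting eol on the Text::CSV object is usually the tidier choice, since it does not change the behaviour of every other print in the program.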
try this sql query
select * from one_table join two_table using (primary_key)
INTO OUTFILE '/tmp/new.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
How can I split a value on newline (\n) in one column, extract it into new rows, and fill in the other columns?
My Example CSV Data (data.csv)
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,FTP
HTTP
HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP
SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
The Service column can hold multiple values, separated by newlines.
I want to extract each value into its own row and fill in the other columns, like this:
1,test@email.com,192.168.10.110,FTP,,
1,test@email.com,192.168.10.110,HTTP,,
1,test@email.com,192.168.10.110,HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP,,
2,webmaster@email.com,192.168.10.111,SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
I tried parsing with Text::CSV. I can only split the multiple IPs and services, but I don't know how to fill in the other values as in the example above.
#!/usr/bin/perl
use Text::CSV;
my $file = "data.csv";
my @csv_value;
open my $fh, '<', $file or die "Could not open $file: $!";
my $csv = Text::CSV->new;
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
push @csv_value, $fields;
}
close $data;
Thank you in advance for any help you can provide.
To expand on my comment
perl -ne 'if (!/^\d/){print "$line$_";} else {print $_;} /(.*,).*/; $line=$1;' file1
Use the perl command line options
e = inline command
n = implicit loop, i.e. for every line in the file do the script
Each line of the file is now in the $_ default variable
if (!/^\d/){print "$line$_";} - if the line does not start with a digit print the $line (more later) variable, followed by default variable which is the line from the file
else {print $_;} - else just print the line
Now, after we've done this, if the line matches anything followed by a comma followed by anything, capture it with the regex brackets so it's put in $1. So for the first line $1 will be '1,test@email.com,192.168.10.109,'
/(.*,).*/; $line=$1;
Because we do this after the first line has been printed $line will always be the previous full line.
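The same carry-the-prefix-forward idea can be sketched as a small script. This is a variant, not a transcription of the one-liner: it assumes the saved prefix should only be refreshed on lines that start with a digit, and it works on a hard-coded list of lines (@lines, $prefix and @out are illustrative names).

```perl
use strict;
use warnings;

# Sample lines in the shape of the question's broken CSV (single quotes so
# that @email is not interpolated).
my @lines = (
    '1,test@email.com,192.168.10.109,FTP',
    'HTTP',
    'HTTPS,,',
    '2,webmaster@email.com,192.168.10.111,SFTP',
    'SNMP,,',
);

my $prefix = '';
my @out;
for my $line (@lines) {
    if ($line =~ /^\d/) {
        # A full record: remember everything up to and including the last comma.
        if ($line =~ /^(.*,)/) {
            $prefix = $1;
        }
        push @out, $line;
    }
    else {
        # A continuation line: glue the remembered prefix in front of it.
        push @out, $prefix . $line;
    }
}

print "$_\n" for @out;
```

Note that the first line of each group keeps whatever trailing fields it had in the input, so the output is only as regular as the broken file allows.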
Your input CSV is broken. I would suggest to fix the generator.
With correctly formatted input CSV you will have to enable the binary option in Text::CSV, because the quoted Service fields contain embedded newlines.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# input has embedded newlines inside quoted fields
my $csv_in = Text::CSV->new({ binary => 1 });
my $csv_out = Text::CSV->new();
$csv_out->eol("\n");
while (my $row = $csv_in->getline(\*STDIN)) {
for my $protocol (split("\n", $row->[3])) {
$row->[3] = $protocol;
$csv_out->print(\*STDOUT, $row);
}
}
exit 0;
Test with fixed input data:
$ cat dummy.csv
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,"FTP
HTTP
HTTPS",,
2,webmaster@email.com,192.168.10.111,"SFTP
SNMP",,
3,admin@email.com,192.168.10.112,HTTP,,
$ perl dummy.pl <dummy.csv
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,FTP,,
1,test@email.com,192.168.10.109,HTTP,,
1,test@email.com,192.168.10.109,HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP,,
2,webmaster@email.com,192.168.10.111,SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
So let's say I have a file.txt whose syntax looks like this:
"1;22;333;'4444';55555",
I now want my code to do the following:
open the file (already done)
read a line and save each parameter, separated by ;, into a variable, like ( $one = 1, $two = 22, $three = 333, $four = '4444', $five = 55555; )
write the variables into a DB (already done)
loop until all lines of the file are done
So I actually need help with step 2; I think I can do the loop and the DB code myself. Do you have any ideas or tips on how I could do this? A beginner-friendly answer would be nice so I can learn from it.
foreach $file (@file){
$currentfile = "$currentdir\\$file";
open(my $reader, "<", $currentfile) or die "Failed to open file: $!\n";
?????
close $reader;
}
If you're just doing 'numbered fields' then you should be thinking 'array':
use Data::Dumper;
while ( <$reader> ) {
chomp;
my @row = split /;/;
print Dumper \@row;
}
This will give you an array that you can access - e.g. $row[0] for the first element.
$VAR1 = [
'1',
'22',
'333',
'\'4444\'',
'55555'
];
If you know what the headers are 'named' and prefer to work on names you can do something similar with a hash:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @cols = qw ( id value fish name sprout );
while ( <DATA> ) {
my %row;
chomp;
@row{@cols} = split /;/;
print Dumper \%row;
}
__DATA__
1;22;333;'4444';55555
This gives instead:
$VAR1 = {
'fish' => '333',
'name' => '\'4444\'',
'id' => '1',
'value' => '22',
'sprout' => '55555'
};
Note - hashes are unordered, but their whole point is that you don't need to care about the 'order' - just print $row{name},"\n";
You need to read from the filehandle $reader, line by line. See the tutorial perlopentut and the full reference open.
Then you split each line by the separator ;, which returns a list that you assign to an array.
open my $reader, "<", $currentfile or die "Failed to open file: $!\n";
while (my $line = <$reader>) {
chomp($line);
my @params = split ';', $line;
# do something with @params, it will be overwritten on next iteration
}
close $reader;
The diamond operator <> reads from a filehandle, <$fh>, returning a line at a time. See about it in perlop. When there are no more lines it returns undef and looping stops. You may assign the string that it returns to a variable which you declare (my $line), which then exists only within the body of the while loop. If you don't, but do while (<$fh>) instead, the line is assigned to the special variable $_, which is default for many things in Perl.
The chomp removes the linefeed (new line) from the end of the line.
Note that '4444' from your example is not a number and cannot be used as such.
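If the quoted field does need to be used as a number, one option is to strip the surrounding single quotes after splitting. This is only a sketch, assuming the quotes always wrap the whole field; left as it is, the quoted string would be treated as 0 in numeric context, with a warning.

```perl
use strict;
use warnings;

my $line = "1;22;333;'4444';55555\n";
chomp $line;

my @params = split /;/, $line;

# Remove one pair of enclosing single quotes from any field that has them.
s/^'(.*)'$/$1/ for @params;

# The fourth field can now be used in arithmetic without a warning.
my $sum = $params[3] + 1;

print "$params[3] + 1 = $sum\n";
```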
Alternatively, you can take a reference to the array with parameters on each line, and put it in another array which thus will in the end contain all lines.
my @all_params;
while (my $line = <$reader>) {
chomp($line);
my @params = split ';', $line;
push @all_params, \@params;
}
Now @all_params has elements which are references, each to an array with parameters for one line. For how to work with references see the tutorial perlreftut and the Cookbook on complex data structures, perldsc.
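As a small illustration of working with such an array of array references, here is a sketch that reads from an in-memory string instead of a real file. The sample data and the variable names other than @all_params are made up for the example.

```perl
use strict;
use warnings;

# Two sample lines standing in for the file contents.
my $text = "1;22;333\n4;55;666\n";
open my $reader, '<', \$text or die "Cannot open in-memory file: $!";

my @all_params;
while (my $line = <$reader>) {
    chomp $line;
    push @all_params, [ split /;/, $line ];    # anonymous array per line
}
close $reader;

# Element access: row index first, then column index.
my $second_of_first = $all_params[0][1];

# Dereference a whole row back into a plain array.
my @last_row = @{ $all_params[-1] };

# Number of lines that were read.
my $count = scalar @all_params;

print "rows: $count, first row second field: $second_of_first\n";
```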
The following is more complex but let me mention it since it's a bit of an idiom. You can do the above in one statement
my @all_params = map { [ split ';', $_ ] } <$reader>;
This uses map, which applies the code in { ... } to each element of the list that is submitted to it, returning a list. So it takes a list and returns the processed list. The [...] inside makes an anonymous array, equivalent to the reference we took of an array previously. The filehandle <$reader> returns all lines of the file in one list when invoked in the list context, which is in this case imposed by map (since it must receive a list).
An important one: always start your programs with
use warnings 'all';
use strict;
The order of these doesn't really matter. Mostly you'll see use strict; first.
Then your loop over filenames needs to be foreach my $file (@file) { ... } and you must declare all variables, so my $currentfile = ....
My script is fairly large but I'll simplify the code here.
Suppose that I create a CSV and I write the header like this:
my $csv = Text::CSV->new ({binary => 1}, eol => "\n");
open(my $out, ">", "$dir/out.csv") or die $!; #create
$out->print("CodeA,CodeB,Name,Count,Pos,Orientation\n"); #I write the header
Suppose that I got the some values stored in different variables and I want to write those variables as a line in the CSV.
I cannot figure out how, because the Text::CSV documentation doesn't explain print clearly, there are no direct examples, and I don't know what an array ref is.
Here's a trivial example of using Text::CSV to write a CSV file. It generates a header line and a data line, and does so from fixed data.
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({binary => 1, eol => $/ })
or die "Failed to create a CSV handle: " . Text::CSV->error_diag();
my $filename = "output.csv";
open my $fh, ">:encoding(utf8)", $filename or die "failed to create $filename: $!";
my(@heading) = ("CodeA", "CodeB", "Name", "Count", "Pos", "Orientation");
$csv->print($fh, \@heading); # Array ref!
my(@datarow) = ("A", "B", "Abelone", 3, "(6,9)", "NW");
$csv->print($fh, \@datarow); # Array ref!
close $fh or die "failed to close $filename: $!";
The row of data is collected in an array; I used @heading and @datarow. If I was outputting several rows, each row could be collected or created in @datarow and then output. The first argument to $csv->print should be the I/O handle (here, $fh, a file handle for the output file). The second should be an array ref. Using \@arrayname is one way of creating an array ref; the input routines for the Text::CSV module also create and return array refs.
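Since the question mentions not knowing what an array ref is, here is a short core-Perl sketch of the two common ways to make one and how to use it. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @datarow = ("A", "B", 3);

# 1. A reference to an existing, named array.
my $ref_to_named = \@datarow;

# 2. A reference to a new anonymous array, built with square brackets.
my $ref_to_anon = [ "A", "B", 3 ];

# ref() reports what kind of thing a reference points to.
my $kind = ref $ref_to_named;

# Element access through a reference uses the arrow.
my $first = $ref_to_named->[0];

# @$ref (or @{ $ref }) gives back the whole array.
my $joined = join ',', @$ref_to_anon;

# Changes made through the reference affect the original array.
push @$ref_to_named, "NW";

print "$kind $first $joined ", scalar(@datarow), "\n";
```

Either form can be passed as the second argument to $csv->print; the anonymous form is handy when the values live in separate scalar variables.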
Note the difference between the notation used here in the Text::CSV->new call and the notation used in your example. Also note that your $out->print("…"); call is using the basic file I/O and nothing to do with Text::CSV. Contrast with $csv->print($fh, …).
The rest of the code is more or less boilerplate.
output.csv
CodeA,CodeB,Name,Count,Pos,Orientation
A,B,Abelone,3,"(6,9)",NW
Note that the value with an embedded comma was surrounded by quotes by the Text::CSV module. The other values did not need quotes so they did not get them. You can tweak the details of the CSV output format with the options to Text::CSV->new.
For the headers you can use
$status = $csv->print ($out,[qw(CodeA CodeB Name Count Pos Orientation)]);
and for a row of values use
$status = $csv->print ($out,[$valueA,$valueB,$valueName,$valueCount,$valuePos,$valueOrientation]);
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
my $csv = Text::CSV->new ({
binary => 1,
auto_diag => 1,
sep_char => ',' # not really needed as this is the default
});
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
$sum += $fields->[2];
}
if (not $csv->eof) {
$csv->error_diag();
}
close $data;
print "$sum\n";
I would like to read in a .csv file using Text::CSV_XS, then select columns by header to match the names stored in an array, and output a new .csv file.
use strict;
use warnings;
use Text::CSV_XS;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
my $file;
my @headers;
foreach $file (@args){
my @CSVFILE;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
for my $line (@csvfileIN) {
$csvparser->parse($line);
my @fields = $csvparser->fields;
$line = $csvparser->combine(@fields);
}
}
The following example parses a CSV file into a variable. You can then match, remove, or add lines in that variable and write it back to the same CSV file.
In this example I just remove one line from the CSV.
First, parse the CSV file.
use Text::CSV_XS qw( csv );
$parsed_file_array_of_hashesv = csv(
in => "$input_csv_filename",
sep => ';',
headers => "auto"
); # as array of hash
Second, once you have $parsed_file_array_of_hashesv, you can loop over that array in Perl and find the line you want to remove,
and then remove it using
splice ARRAY, OFFSET, LENGTH
which removes LENGTH elements starting at index OFFSET, that is, indexes OFFSET through OFFSET+LENGTH-1.
Let's assume index 0:
my @extracted_array = @$parsed_file_array_of_hashesv; # dereference the array reference
splice @extracted_array, 0, 1; # remove entry 0
$ref_removed_line_parsed = \@extracted_array; # reference to the array
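To make the OFFSET and LENGTH arguments concrete, here is a minimal core-Perl sketch on a small array of hashes. The data and the names @rows and @removed are invented for the illustration.

```perl
use strict;
use warnings;

# Four rows, shaped like what a parsed CSV with headers might look like.
my @rows = ( { id => 1 }, { id => 2 }, { id => 3 }, { id => 4 } );

# Remove 2 elements starting at index 1, i.e. the rows at indexes 1 and 2.
# splice returns the removed elements and shrinks @rows in place.
my @removed = splice @rows, 1, 2;

my $kept_ids    = join ',', map { $_->{id} } @rows;
my $removed_ids = join ',', map { $_->{id} } @removed;

print "kept: $kept_ids, removed: $removed_ids\n";
```

When the rows to drop are chosen by a condition rather than by position, filtering with grep into a new array is often simpler than computing offsets for splice.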
Third, write back the array to the CSV file
$current_metric_file = csv(
in => $ref_removed_line_parsed, # only accepts a reference
out => "$output_csv_filename",
sep => ';',
eol => "\n", # \r, \n, or \r\n or undef
#headers => \@sorted_column_names, # only accepts a reference
headers => "auto"
);
Notice that if you use \@sorted_column_names you will be able to control the order of the columns:
my @sorted_column_names;
foreach my $name (sort {lc $a cmp lc $b} keys %{ $parsed_file_array_of_hashesv->[0] }) { # all hashes have the same column names, so we use the first one
push(@sorted_column_names,$name);
}
That should write the CSV file without your line.
use open ":std", ":encoding(UTF-8)";
use Text::CSV_XS qw( );
# Name of columns to copy to new file.
my @col_names_out = qw( ... );
my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
for (...) {
my $qfn_in = ...;
my $qfn_out = ...;
open(my $fh_in, "<", $qfn_in)
or die("Can't open \"$qfn_in\": $!\n");
open(my $fh_out, ">", $qfn_out)
or die("Can't create \"$qfn_out\": $!\n");
$csv->column_names(@{ $csv->getline($fh_in) });
$csv->say($fh_out, \@col_names_out);
while (my $row = $csv->getline_hr($fh_in)) {
$csv->say($fh_out, [ @$row{@col_names_out} ]);
}
}
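The column selection in the loop above relies on a hash slice taken through a hash reference. Here is a core-Perl sketch of just that step, with a hand-written $row standing in for what getline_hr returns and column names borrowed from the earlier sample data.

```perl
use strict;
use warnings;

# One parsed row, keyed by header name (single quotes so that @email is
# not interpolated).
my $row = {
    No      => 1,
    Email   => 'test@email.com',
    IP      => '192.168.10.109',
    Service => 'FTP',
};

# The columns to keep, in the order they should be written.
my @col_names_out = qw( Email Service );

# A hash slice returns the values for several keys at once, in key order.
my @picked = @$row{@col_names_out};

# The same slice written with explicit braces around the reference.
my @same = @{$row}{@col_names_out};

print join(',', @picked), "\n";
```

Because the slice follows the order of @col_names_out, the same array controls both which columns are kept and the order they appear in.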
I have two files both of them are delimited by pipe.
First file:
It has around 10 columns, but I am interested only in the first two, which are used to update a column value in the second file.
first file detail:
1|alpha|s3.3|4|6|7|8|9
2|beta|s3.3|4|6|7|8|9
20|charlie|s3.3|4|6|7|8|9
6|romeo|s3.3|4|6|7|8|9
Second file detail:
a1|a2|**bob**|a3|a4|a5|a6|a7|a8|**1**|a10|a11|a12
a1|a2|**ray**|a3|a4|a5|a6|a7|a8||a10|a11|a12
a1|a2|**kate**|a3|a4|a5|a6|a7|a8|**20**|a10|a11|a12
a1|a2|**bob**|a3|a4|a5|a6|a7|a8|**6**|a10|a11|a12
a1|a2|**bob**|a3|a4|a5|a6|a7|a8|**45**|a10|a11|a12
My requirement is to find the unique values in the 3rd column and also to replace the 4th column from the end. The 4th column from the end may or may not contain a number. This number would also appear in the first field of the first file. I need to replace this number (in the second file) with the corresponding value from the second column of the first file.
expected output:
unique string : ray kate bob
a1|a2|bob|a3|a4|a5|a6|a7|a8|**alpha**|a10|a11|a12
a1|a2|ray|a3|a4|a5|a6|a7|a8||a10|a11|a12
a1|a2|kate|a3|a4|a5|a6|a7|a8|**charlie**|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|**romeo**|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|45|a10|a11|a12
I am able to pick the unique string using below command
awk -F'|' '{a[$3]++}END{for(i in a){print i}}' filename
I don't want to read the second file twice (once to pick the unique strings and again to replace the 4th column from the end), because the file is huge. It is around 500 MB and there are many such files.
Currently I am using the Perl Text::CSV module to read the first file (which is small) and load the first two columns into a hash, with the first column as the key and the second as the value. Then I read the second file and replace the 4th column from the end with the hash value. But this is time-consuming, as Text::CSV parsing seems to be slow.
Any awk/perl solution keeping speed in mind would be really helpful :)
Note: Ignore the ** asterisks around the text; they are just for highlighting and are not part of the data.
UPDATE : Code
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use Text::CSV;
my %hash;
my $csv = Text::CSV->new({ sep_char => '|' });
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
chomp $line;
if ($csv->parse($line)) {
my @fields = $csv->fields();
$hash{$fields[0]}=$fields[1];
} else {
warn "Line could not be parsed: $line\n";
}
}
close($data);
$csv = Text::CSV->new({ sep_char => '|' , blank_is_undef => 1 , eol => "\n"});
my $file2 = $ARGV[1] or die "Need to get CSV file on the command line\n";
open ( my $fh,'>','/tmp/outputfile') or die "Could not open file $!\n";
open(my $data2, '<', $file2) or die "Could not open '$file2' $!\n";
while (my $line = <$data2>) {
chomp $line;
if ($csv->parse($line)) {
my @fields = $csv->fields();
if (defined ($fields[-4]) && looks_like_number($fields[-4]))
{
$fields[-4]=$hash{$fields[-4]};
}
$csv->print($fh,\@fields);
} else {
warn "Line could not be parsed: $line\n";
}
}
close($data2);
close($fh);
Here's an option that doesn't use Text::CSV:
use strict;
use warnings;
@ARGV == 3 or die 'Usage: perl firstFile secondFile outFile';
my ( %hash, %seen );
local $" = '|';
while (<>) {
my ( $key, $val ) = split /\|/, $_, 3;
$hash{$key} = $val;
last if eof;
}
open my $outFH, '>', pop or die $!;
while (<>) {
my @F = split /\|/;
$seen{ $F[2] } = undef;
$F[-4] = $hash{ $F[-4] } if exists $hash{ $F[-4] };
print $outFH "@F";
}
close $outFH;
print 'unique string : ', join( ' ', reverse sort keys %seen ), "\n";
Command-line usage: perl firstFile secondFile outFile
Contents of outFile from your datasets (asterisks removed):
a1|a2|bob|a3|a4|a5|a6|a7|a8|alpha|a10|a11|a12
a1|a2|ray|a3|a4|a5|a6|a7|a8||a10|a11|a12
a1|a2|kate|a3|a4|a5|a6|a7|a8|charlie|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|romeo|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|45|a10|a11|a12
STDOUT:
unique string : ray kate bob
Hope this helps!
Use getline instead of parse, it is much faster. The following is a more idiomatic way of performing this task. Note that you can reuse the same Text::CSV object for multiple files.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Text::CSV;
my $csv = Text::CSV->new({
auto_diag => 1,
binary => 1,
blank_is_undef => 1,
eol => $/,
sep_char => '|'
}) or die "Can't use CSV: " . Text::CSV->error_diag;
open my $map_fh, '<', 'map.csv' or die "map.csv: $!";
my %mapping;
while (my $row = $csv->getline($map_fh)) {
$mapping{ $row->[0] } = $row->[1];
}
close $map_fh;
open my $in_fh, '<', 'input.csv' or die "input.csv: $!";
open my $out_fh, '>', 'output.csv' or die "output.csv: $!";
my %seen;
while (my $row = $csv->getline($in_fh)) {
$seen{ $row->[2] } = 1;
my $key = $row->[-4];
$row->[-4] = $mapping{$key} if defined $key and exists $mapping{$key};
$csv->print($out_fh, $row);
}
close $in_fh;
close $out_fh;
say join ',', keys %seen;
map.csv
1|alpha|s3.3|4|6|7|8|9
2|beta|s3.3|4|6|7|8|9
20|charlie|s3.3|4|6|7|8|9
6|romeo|s3.3|4|6|7|8|9
input.csv
a1|a2|bob|a3|a4|a5|a6|a7|a8|1|a10|a11|a12
a1|a2|ray|a3|a4|a5|a6|a7|a8||a10|a11|a12
a1|a2|kate|a3|a4|a5|a6|a7|a8|20|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|6|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|45|a10|a11|a12
output.csv
a1|a2|bob|a3|a4|a5|a6|a7|a8|alpha|a10|a11|a12
a1|a2|ray|a3|a4|a5|a6|a7|a8||a10|a11|a12
a1|a2|kate|a3|a4|a5|a6|a7|a8|charlie|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|romeo|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|45|a10|a11|a12
STDOUT
kate,bob,ray
This awk should work.
$ awk '
BEGIN { FS = OFS = "|" }
NR==FNR { a[$1] = $2; next }
{ !unique[$3]++ }
{ $(NF-3) = (a[$(NF-3)]) ? a[$(NF-3)] : $(NF-3) }1
END {
for(n in unique) print n > "unique.txt"
}' file1 file2 > output.txt
Explanation:
We set the input and output field separators to |.
We iterate through the first file, creating an array that stores column one as the key and column two as the value.
Once the first file is loaded in memory, we create another array while reading the second file. This array stores the unique values from column three of the second file.
While reading the file, we check whether the fourth value from the end is present in our array from the first file. If it is, we replace it with the value from the array. If not, we leave the existing value as is.
In the END block we iterate through our unique array and print it to a file called unique.txt. This holds all the unique entries seen in column three of the second file.
The entire output of the second file is redirected to output.txt, which now has the modified fourth column from the end.
$ cat output.txt
a1|a2|bob|a3|a4|a5|a6|a7|a8|alpha|a10|a11|a12
a1|a2|ray|a3|a4|a5|a6|a7|a8||a10|a11|a12
a1|a2|kate|a3|a4|a5|a6|a7|a8|charlie|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|romeo|a10|a11|a12
a1|a2|bob|a3|a4|a5|a6|a7|a8|45|a10|a11|a12
$ cat unique.txt
kate
bob
ray
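For comparison with the Perl answers, the steps of the awk explanation can be sketched in core Perl on in-memory data. This is an illustration under assumptions, not a tuned solution: the arrays @file1 and @file2 stand in for the two files, and it uses exists for the lookup rather than awk's truthiness test.

```perl
use strict;
use warnings;

# Stand-ins for a few lines of the two pipe-delimited files.
my @file1 = ( '1|alpha|s3.3', '20|charlie|s3.3' );
my @file2 = (
    'a1|a2|bob|a3|a4|a5|a6|a7|a8|1|a10|a11|a12',
    'a1|a2|ray|a3|a4|a5|a6|a7|a8||a10|a11|a12',
    'a1|a2|bob|a3|a4|a5|a6|a7|a8|45|a10|a11|a12',
);

# Step 1: map column one of the first file to column two.
my %map;
for my $line (@file1) {
    my ( $key, $val ) = split /\|/, $line;
    $map{$key} = $val;
}

# Steps 2 and 3: in a single pass over the second file, record the unique
# values of column three and replace the fourth field from the end when
# it is a known key.
my ( %unique, @output );
for my $line (@file2) {
    my @f = split /\|/, $line, -1;    # -1 keeps trailing empty fields
    $unique{ $f[2] }++;
    my $key = $f[-4];
    $f[-4] = $map{$key} if exists $map{$key};
    push @output, join '|', @f;
}

print "$_\n" for @output;
print join( ' ', sort keys %unique ), "\n";
```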