How to print an array so that it looks like a hash [duplicate] - perl

This question already has an answer here:
I need help in perl, how to write a code to get the output of my csv file in the form of a hash [closed]
(1 answer)
Closed 10 years ago.
I am new to Perl, and have to write code which takes the contents of a file into an array and prints the output so that it looks like a hash. Here is an example entry:
my %amino_acids = (F => ["Phenylalanine", "Phe", ["TTT", "TTC"]])
Output should be exactly in the above format.
Lines of the file are like this...
"Methionine":"Met":"M":"AUG":"ATG"
"Phenylalanine":"Phe":"F":"UUU, UUC":"TTT, TTC"
"Proline":"Pro":"P":"CCU, CCC, CCA, CCG":"CCT, CCC, CCA, CCG"
I have to take the last group of codons (after the final colon) and ignore the first group.

Is it your intention to build the equivalent hash? Or do you really want the string format? This program uses Text::CSV to build the hash from the file and then dumps it using Data::Dump so that you have the string format as well.
use strict;
use warnings;
use Text::CSV;
use Data::Dump 'dump';
my $csv = Text::CSV->new({ sep_char => ':' });

open my $fh, '<', 'amino.txt' or die $!;

my %amino_acids;

while (my $data = $csv->getline($fh)) {
    $amino_acids{$data->[2]} = [
        $data->[0],
        $data->[1],
        [ $data->[4] =~ /[A-Z]+/g ],
    ];
}

print '$amino_acids = ', dump \%amino_acids;
output
$amino_acids = {
F => ["Phenylalanine", "Phe", ["TTT", "TTC"]],
M => ["Methionine", "Met", ["ATG"]],
P => ["Proline", "Pro", ["CCT", "CCC", "CCA", "CCG"]],
}
Update
If you really don't want to install modules (it is a very straightforward process and makes the code much more concise and reliable) then this does what you need.
use strict;
use warnings;
open my $fh, '<', 'amino.txt' or die $!;
print "my %amino_acids = (\n";
while (<$fh>) {
    chomp;
    my @data   = /[^:"]+/g;
    my @codons = $data[4] =~ /[A-Z]+/g;
    printf qq{    %s => ["%s", "%s", [%s]],\n},
        @data[2,0,1],
        join ', ', map qq{"$_"}, @codons;
}
print ")\n";
output
my %amino_acids = (
M => ["Methionine", "Met", ["ATG"]],
F => ["Phenylalanine", "Phe", ["TTT", "TTC"]],
P => ["Proline", "Pro", ["CCT", "CCC", "CCA", "CCG"]],
)

Assuming you actually want valid perl as the output, this will do it:
open(my $IN, "<", "input.txt") or die $!;
while (<$IN>) {
    chomp;
    my @tmp = split(':', $_);
    if (@tmp != 5) {
        # error on this line
        next;
    }
    my $group = join('","', split(/,\s*/, $tmp[4]));
    print "\$amino_acids{$tmp[2]} = [$tmp[0],$tmp[1],[$group]];\n";
}
close $IN;
Using your sample lines, the output is:
$amino_acids{"M"} = ["Methionine","Met",["ATG"]];
$amino_acids{"F"} = ["Phenylalanine","Phe",["TTT","TTC"]];
$amino_acids{"P"} = ["Proline","Pro",["CCT","CCC","CCA","CCG"]];

@Borodin Thank you very much for your answer. Actually I don't have to use Text::CSV or Data::Dump. I have to open a file and build the equivalent hash from it. I am trying to do it without using either; hopefully this helps. Thanks again!

Perl has no special method to print hashes. What you should probably do is create a hash when reading the file:
while (<FILE>) {
    chomp;
    my @line = split ':';                              # split the line into an array
    $amino_acids{$line[0]} = [ @line[1 .. $#line] ];   # take elements 1..end
}
And then print out the hash one entry at a time:
foreach (keys %amino_acids) {
    print "$_ => [", (join ",", @{$amino_acids{$_}}), "]\n";
}
Note that I didn't compile this, so it may need a small amount of work to get it done.
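For reference, here is a hedged sketch that builds %amino_acids in the exact structure shown in the question (one-letter code as the key, DNA codons taken from the last field), without any modules; the file name amino.txt is an assumption, and the field-extraction regex is the same one used earlier in this thread:
use strict;
use warnings;

my %amino_acids;

open my $fh, '<', 'amino.txt' or die $!;
while (<$fh>) {
    chomp;
    my @fields = /[^:"]+/g;                    # strip quotes and split on colons
    my ($name, $abbr, $code) = @fields[0, 1, 2];
    my @codons = $fields[4] =~ /[A-Z]+/g;      # last field: the DNA codons
    $amino_acids{$code} = [ $name, $abbr, [ @codons ] ];
}
close $fh;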

Related

Including a perl file that is generated in current file

I am working on a Perl script that successfully generates output files containing hashes. I want to use those hashes in the same script. Is it possible to include a file that is generated by that script, or will I have to create another file?
Technically, it might be cleaner to start a new .pl file that uses those hashes, but I would like to keep everything in a single script if possible. Is it even possible to do so?
Edit: I'm just unsure whether I can "circle" it back around so I can use those hashes in my script, because the hashes are generated on a weekly basis. I don't want my script to mistakenly reach for last week's hashes instead of the newly generated ones. I have not yet written my script in a manner that classifies each week's generated hashes.
In summary, here is what my script does. It extracts a table from another file and removes the columns and rows that are not needed. Once left with the only two columns needed, it puts them into a hash, one column being the key and the other being the value. For this reason, I've found Data::Dumper to be the best option for my hashes. I'm intermediate in Perl and this is a script I'm putting together for an internship.
Here is an example of how you can save a hash as JSON to a file and later read the JSON back into a Perl hash. This example uses JSON::XS:
use strict;
use warnings;
use Data::Dumper;
use JSON::XS;
{
    my %h = (a => 1, b => 2);
    my $fn = 'test.json';
    save_json( $fn, \%h );
    my $h2 = read_json( $fn );
    print Dumper( $h2 );
}

sub read_json {
    my ( $fn ) = @_;
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    my $str = do { local $/; <$fh> };
    close $fh;
    my $h = decode_json $str;
    return $h;
}

sub save_json {
    my ( $fn, $hash ) = @_;
    my $str = encode_json( $hash );
    open ( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    print $fh $str;
    close $fh;
}
Output:
$VAR1 = {
          'a' => 1,
          'b' => 2
        };
Some alternatives to JSON are YAML and Storable.
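For example, a minimal Storable sketch along the same lines (Storable is a core module; the file name test.stor is just a placeholder):
use strict;
use warnings;
use Storable qw(store retrieve);

my %h = (a => 1, b => 2);

store \%h, 'test.stor';             # serialize the hash to a file
my $h2 = retrieve('test.stor');     # read it back as a hash reference

print "$_ => $h2->{$_}\n" for sort keys %$h2;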

Record separator within a record separator

How can I make use of a record separator, and then simultaneously use a sub-record separator? Perhaps that isn't the best way to think about what I am trying to do. Here is my goal:
I want to perform a while loop on a single tab-delimited item at a time, in a specified row of items. For every line (row) of tab-separated items, I need to print the outcomes of all the while loops into a unique file. Allow the following examples to help clarify.
My input file will be something like the following. It will be called "Clustered_Barcodes.txt"
TTTATGC TTTATGG TTTATCC TTTATCG
TTTATAA TTTATAA TTTATAT TTTATAT TTTATTA
CTTGTAA
My perl code looks like the following:
#!/usr/bin/perl
use warnings;
use strict;
open(INFILE, "<", "Clustered_Barcodes.txt") or die $!;
my %hash = (
"TTTATGC" => "TATAGCGCTTTATGCTAGCTAGC",
"TTTATGG" => "TAGCTAGCTTTATGGGCTAGCTA",
"TTTATCC" => "GCTAGCTATTTATCCGCTAGCTA",
"TTTATCG" => "TAGCTAGCTTTATCGCGTACGTA",
"TTTATAA" => "TAGCTAGCTTTATAATAGCTAGC",
"TTTATAA" => "ATCGATCGTTTATAACGATCGAT",
"TTTATAT" => "TCGATCGATTTATATTAGCTAGC",
"TTTATAT" => "TAGCTAGCTTTATATGCTAGCTA",
"TTTATTA" => "GCTAGCTATTTATTATAGCTAGC",
"CTTGTAA" => "ATCGATCGCTTGTAACGATTAGC",
);
while(<INFILE>) {
    $/ = "\n";
    my @lines = <INFILE>;
    open my $out, '>', "Clustered_Barcode_$..fasta" or die $!;
    foreach my $sequence (@lines) {
        if (exists $hash{$sequence}) {
            print $out ">$sequence\n$hash{$sequence}\n";
        }
    }
}
My desired output would be three different files.
The first file will be called "Clustered_Barcode_1.fasta" and will look like:
>TTTATGC
TATAGCGCTTTATGCTAGCTAGC
>TTTATGG
TAGCTAGCTTTATGGGCTAGCTA
>TTTATCC
GCTAGCTATTTATCCGCTAGCTA
>TTTATCG
TAGCTAGCTTTATCGCGTACGTA
Note that this is formatted so that each key is preceded by a '>' character, and the longer associated sequence (the value) appears on the next line. This file includes all the sequences from the first line of Clustered_Barcodes.txt.
My third file should be named "Clustered_Barcode_3.fasta" and look like the following:
>CTTGTAA
ATCGATCGCTTGTAACGATTAGC
When I run my code, it only processes the second and third lines of sequences in the input file. How can I start with the first line (by getting rid of the \n record separator requirement)? How can I then process one item at a time and print each line's worth of results into one file? Also, if there is a way to incorporate the number of sequences into the file name, that would be great; it would help me organize the files by size later. For example, the name could be something like "Clustered_Barcodes_1_File_3_Sequences.fasta".
Thank you all.
OK, so here's one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
Standard preamble.
my %hash = (
"TTTATGC" => "TATAGCGCTTTATGCTAGCTAGC",
"TTTATGG" => "TAGCTAGCTTTATGGGCTAGCTA",
"TTTATCC" => "GCTAGCTATTTATCCGCTAGCTA",
"TTTATCG" => "TAGCTAGCTTTATCGCGTACGTA",
"TTTATAA" => "TAGCTAGCTTTATAATAGCTAGC",
"TTTATAA" => "ATCGATCGTTTATAACGATCGAT",
"TTTATAT" => "TCGATCGATTTATATTAGCTAGC",
"TTTATAT" => "TAGCTAGCTTTATATGCTAGCTA",
"TTTATTA" => "GCTAGCTATTTATTATAGCTAGC",
"CTTGTAA" => "ATCGATCGCTTGTAACGATTAGC",
);
Set up the hash of sequences.
my $infile = 'Clustered_Barcodes.txt';
open my $infh, '<', $infile or die "$0: $infile: $!\n";
Open file for reading.
chomp(my @rows = readline $infh);
my $row_count = @rows;
Slurp all lines into memory in order to get the number of sequences. If you have too many sequences, this approach is not going to work, because you'll run out of memory (though that depends on how much RAM you have).
my $i = 1;
for my $row (@rows) {
Loop over the lines.
my @fields = split /\t/, $row;
Split each line into fields separated by tabs.
my $outfile = "Clustered_Barcodes_${i}_File_${row_count}_Sequences.fasta";
$i++;
open my $outfh, '>', $outfile or die "$0: $outfile: $!\n";
Open current output file and increment counter.
for my $field (@fields) {
    print $outfh ">$field\n$hash{$field}\n" if exists $hash{$field};
}
Write each field (and its mapping) to outfile.
}
And we're done. The main difference to your original code is using split /\t/ and foreach to loop over fields within a line.
We can do it without slurping, too:
while (my $row = readline $infh) {
    chomp $row;
Loop over the lines, one by one. This replaces the four lines from chomp(my @rows = readline $infh); to for my $row (@rows) {.
But now we've lost the $i and $row_count variables, so we have to change the initialization of $outfile:
my $outfile = "Clustered_Barcodes_$..fasta";
That should be all the changes you need. (You can get $row_count back in this scenario by reading $infh twice (the first time just for counting, then seeking back to the start); this is left as an exercise for the reader.)
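For completeness, a minimal sketch of that two-pass idea (assuming $infh is a regular, seekable file rather than a pipe):
my $row_count = 0;
$row_count++ while <$infh>;                  # first pass: just count the lines
seek $infh, 0, 0 or die "Cannot seek: $!";   # rewind to the beginning
$. = 0;                                      # reset the line-number variable as well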
There's no need to read in the whole file, as far as I can see. You just need to loop over the contents of each line:
while (my $line = <INFILE>) {
    chomp $line;
    open my $out, '>', "Clustered_Barcode_$..fasta" or die $!;
    foreach my $sequence ( split /\t/, $line ) {
        if (exists $hash{$sequence}) {
            print $out ">$sequence\n$hash{$sequence}\n";
        }
    }
}

Count the number of items derived from split without putting into an array

I am looking to spare the use of an array for memory's sake, but still get the number of items derived from the split function for each pass of a while loop.
The ultimate goal is to filter the output files according to the number of their sequences, which could be deduced from the number of rows in the file, the number of '>' characters that appear, the number of line breaks, etc.
Below is my code:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
open(INFILE, "<", "Clustered_Barcodes.txt") or die $!;
my %hash = (
"TTTATGC" => "TATAGCGCTTTATGCTAGCTAGC",
"TTTATGG" => "TAGCTAGCTTTATGGGCTAGCTA",
"TTTATCC" => "GCTAGCTATTTATCCGCTAGCTA",
"TTTATCG" => "AGTCATGCTTTATCGCGATCGAT",
"TTTATAA" => "TAGCTAGCTTTATAATAGCTAGC",
"TTTATAA" => "ATCGATCGTTTATAACGATCGAT",
"TTTATAT" => "TCGATCGATTTATATTAGCTAGC",
"TTTATAT" => "TAGCTAGCTTTATATGCTAGCTA",
"TTTATTA" => "GCTAGCTATTTATTATAGCTAGC",
"CTTGTAA" => "ATCGATCGCTTGTAACGATTAGC",
);
while (my $line = <INFILE>) {
    chomp $line;
    open my $out, '>', "Clustered_Barcode_$..txt" or die $!;
    foreach my $sequence (split /\t/, $line) {
        if (exists $hash{$sequence}) {
            print $out ">$sequence\n$hash{$sequence}\n";
        }
    }
}
The input file, "Clustered_Barcodes.txt" when opened, looks like the following:
TTTATGC TTTATGG TTTATCC TTTATCG
TTTATAA TTTATAA TTTATAT TTTATAT TTTATTA
CTTGTAA
There will be three output files from the code, "Clustered_Barcode_1.txt", "Clustered_Barcode_2.txt", and "Clustered_Barcode_3.txt". An example of what the output files would look like could be the 3rd and final file, which would look like the following:
>CTTGTAA
ATCGATCGCTTGTAACGATTAGC
I need some way to modify my code to identify the number of rows, '>' characters, or sequences that appear in the file and work that into the title of the file. The new title for the above sequence could be something like "Clustered_Barcode_Number_3_1_Sequence.txt"
PS - I made the hash in the above code manually in an attempt to make things simpler. If you want to see the original code, here it is. The input file format is something like:
>TAGCTAGC
GCTAAGCGATGCTACGGCTATTAGCTAGCCGGTA
Here is the code for setting up the hash:
my $dir = ("~/Documents/Sequences");
open(INFILE, "<", "~/Documents/Clustered_Barcodes.txt") or die $!;
my %hash = ();
my @ArrayofFiles = glob "$dir/*"; # put all files from the specified directory into an array
#print join("\n", @ArrayofFiles), "\n"; # this is a diagnostic test print statement

foreach my $file (@ArrayofFiles) { # make hash of barcodes and sequences
    open (my $sequence, $file) or die "can't open file: $!";
    while (my $line = <$sequence>) {
        if ($line !~ /^>/) {
            my $seq = $line;
            $seq =~ s/\R//g;
            #print $seq;
            $seq =~ m/(CATCAT|TACTAC)([TAGC]{16})([TAGC]+)([TAGC]{16})(CATCAT|TACTAC)/;
            $hash{$2} = $3;
        }
    }
}
while(<INFILE>){
etc
You can use regex to get the count:
my $delimiter = "\t";
my $line  = "zyz\tpqr\tabc\txyz";           # tab-separated example data
my $count = () = $line =~ /$delimiter/g;    # $count is now 3 (the number of delimiters)
print $count;
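Two asides worth noting: the regex above counts delimiters, so the number of fields is $count + 1; and on Perl 5.12 or later, split in scalar context returns the number of fields directly, so you can skip the array entirely:
my $line   = "zyz\tpqr\tabc\txyz";
my $fields = split /\t/, $line;   # split in scalar context: the number of fields (4 here)
print $fields;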
Your hash structure is not right for your problem, as you have multiple entries for the same ID. For example, the key TTTATAA appears twice in your %hash; a hash keeps only one value per key, so the second assignment overwrites the first.
To solve this, use a hash of arrays.
Change your hash creation code from
$hash{$2} = $3;
to
push(@{$hash{$2}}, $3);
Now change the code in your while loop:
while (my $line = <INFILE>) {
    chomp $line;
    open my $out, '>', "Clustered_Barcode_$..txt" or die $!;
    my %id_list;
    foreach my $sequence (split /\t/, $line) {
        $id_list{$sequence} = 1;
    }
    foreach my $sequence (keys %id_list) {
        foreach my $val (@{$hash{$sequence}}) {
            print $out ">$sequence\n$val\n";
        }
    }
}
I have assumed that:
The first digit in the output file name is the input file line number
The second digit in the output file name is the input file column number
That the input hash is a hash of arrays to cover the case of several sequences "matching" the one barcode as mentioned in the comments
When a barcode has a match in the hash, that the output file will lists all the sequences in the array, one per line.
The simplest way to do this that I can see is to build the output file using a temporary filename and then rename it when you have all the data. According to the Perl Cookbook, the easiest way to create temporary files is with the module File::Temp.
The key to this solution is to move through the list of barcodes that appear on a line by column index rather than the usual Perl way of simply iterating over the list itself. To get the actual barcodes, the column number $col is used to index back into @barcodes, which is created by splitting the line on whitespace. (Note that splitting on a single space is special-cased by Perl to emulate the behaviour of one of its predecessors, awk: leading whitespace is removed and the split is on runs of whitespace, not on a single space.)
This way we have the column number (indexed from 1) and the line number we can get from the perl special variable, $. We can then use these to rename the file using the builtin, rename().
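As a small aside, a tiny illustration of that split-on-" " special case (the example string here is made up):
my $line = "  TTTATGC\tTTTATGG  TTTATCC";
my @awk_style = split " ", $line;   # awk emulation: skips leading whitespace, splits on any whitespace run
my @literal   = split / /, $line;   # literal single-space pattern: keeps leading empty fields, doesn't split on tabs
print scalar @awk_style, "\n";      # 3
print scalar @literal,   "\n";      # 5 ("", "", "TTTATGC\tTTTATGG", "", "TTTATCC")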
use warnings;
use strict;
use diagnostics;
use feature 'say';    # say is used below
use File::Temp qw(tempfile);

open(INFILE, "<", "Clustered_Barcodes.txt") or die $!;
my %hash = (
"TTTATGC" => [ "TATAGCGCTTTATGCTAGCTAGC" ],
"TTTATGG" => [ "TAGCTAGCTTTATGGGCTAGCTA" ],
"TTTATCC" => [ "GCTAGCTATTTATCCGCTAGCTA" ],
"TTTATCG" => [ "AGTCATGCTTTATCGCGATCGAT" ],
"TTTATAA" => [ "TAGCTAGCTTTATAATAGCTAGC", "ATCGATCGTTTATAACGATCGAT" ],
"TTTATAT" => [ "TCGATCGATTTATATTAGCTAGC", "TAGCTAGCTTTATATGCTAGCTA" ],
"TTTATTA" => [ "GCTAGCTATTTATTATAGCTAGC" ],
"CTTGTAA" => [ "ATCGATCGCTTGTAACGATTAGC" ]
);
my $cbn = "Clustered_Barcode_Number";
my $trailer = "Sequence.txt";
while (my $line = <INFILE>) {
    chomp $line;
    my $line_num = $.;
    my @barcodes = split " ", $line;
    for my $col ( 1 .. @barcodes ) {
        my $barcode = $barcodes[ $col - 1 ];   # arrays are indexed from 0
        # skip this one if it's not in the hash
        next unless exists $hash{$barcode};
        my @sequences = @{ $hash{$barcode} };
        # Have a hit - create temp file and output sequences
        my ($out, $temp_filename) = tempfile();
        say $out ">$barcode";
        say $out $_ for @sequences;
        close $out;
        # Rename based on input line and column
        my $new_name = join "_", $cbn, $line_num, $col, $trailer;
        rename($temp_filename, $new_name)
            or warn "Couldn't rename $temp_filename to $new_name: $!\n";
    }
}
close INFILE;
All of the barcodes in your sample input data have a match in the hash, so when I run this, I get 4 files for line 1, 5 for line 2 and 1 for line 3.
Clustered_Barcode_Number_1_1_Sequence.txt
Clustered_Barcode_Number_1_2_Sequence.txt
Clustered_Barcode_Number_1_3_Sequence.txt
Clustered_Barcode_Number_1_4_Sequence.txt
Clustered_Barcode_Number_2_1_Sequence.txt
Clustered_Barcode_Number_2_2_Sequence.txt
Clustered_Barcode_Number_2_3_Sequence.txt
Clustered_Barcode_Number_2_4_Sequence.txt
Clustered_Barcode_Number_2_5_Sequence.txt
Clustered_Barcode_Number_3_1_Sequence.txt
Clustered_Barcode_Number_1_2_Sequence.txt for example has:
>TTTATGG
TAGCTAGCTTTATGGGCTAGCTA
and Clustered_Barcode_Number_2_5_Sequence.txt has:
>TTTATTA
GCTAGCTATTTATTATAGCTAGC
Clustered_Barcode_Number_2_3_Sequence.txt - which matched a hash key with two sequences - had the following:
>TTTATAT
TCGATCGATTTATATTAGCTAGC
TAGCTAGCTTTATATGCTAGCTA
I was speculating here about what you wanted when a supplied barcode had two matches. Hope that helps.

Two csv files: Change one csv by the other and pull out that line

I have two CSV files. The first is a list file, it contains the ID and names. For example
1127100,Acanthocolla cruciata
1127103,Acanthocyrta haeckeli
1127108,Acanthometra fusca
The second is the file in which I want to exchange the ID for the name and extract the line when the first number matches. The first column of numbers corresponds between the two files. For example
1127108,1,0.60042
1127103,1,0.819671
1127100,2,0.50421,0.527007
10207,3,0.530422,0.624466
So I want to end up with CSV file like this
Acanthometra fusca,1,0.60042
Acanthocyrta haeckeli,1,0.819671
Acanthocolla cruciata,2,0.50421,0.527007
I tried Perl, but opening two files at once proved messy. So I tried converting one of the CSV files to a string and parsing it that way, but that didn't work. Then I was reading about grep and other one-liners, but I am not familiar with them. Would this be possible with grep?
This is the Perl code I tried
use strict;
use warnings;
open my $csv_score, '<', "$ARGV[0]" or die qq{Failed to open "$ARGV[0]" for input: $!\n};
open my $csv_list, '<', "$ARGV[1]" or die qq{Failed to open "$ARGV[1]" for input: $!\n};
open my $out, ">$ARGV[0]_final.txt" or die qq{Failed to open for output: $!\n};
my $string = <$csv_score>;

while ( <$csv_list> ) {
    my ($find, $replace) = split /,/;
    $string =~ s/$find/$replace/g;
    if ($string =~ m/^$replace/) {
        print $out $string;
    }
}
close $csv_score;
close $csv_list;
close $out;
The general purpose text processing tool that comes with all UNIX installations is named awk:
$ awk -F, -v OFS=, 'NR==FNR{m[$1]=$2;next} $1=m[$1]' file1 file2
Acanthometra fusca,1,0.60042
Acanthocyrta haeckeli,1,0.819671
Acanthocolla cruciata,2,0.50421,0.527007
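If you would rather stay in Perl, a roughly equivalent one-liner might look like this (just a sketch; %name is a made-up hash name, and the two files are passed in the same order as above, list first, then data):
$ perl -F, -lane 'if (@ARGV) { $name{$F[0]} = $F[1]; next } next unless exists $name{$F[0]}; $F[0] = $name{$F[0]}; print join ",", @F' file1 file2
While the first file is being read, @ARGV still holds the second file name, which is how the one-liner tells the two inputs apart.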
Your code was failing because you only read the first line from the $csv_score file, and you tried to print $string every time it is changed. You also failed to remove the newline from the end of the lines from your $csv_list file. If you fix those things then it looks like this
use strict;
use warnings;
open my $csv_score, '<', "$ARGV[0]" or die qq{Failed to open "$ARGV[0]" for input: $!\n};
open my $csv_list, '<', "$ARGV[1]" or die qq{Failed to open "$ARGV[1]" for input: $!\n};
open my $out, ">$ARGV[0]_final.txt" or die qq{Failed to open for output: $!\n};
my $string = do {
    local $/;
    <$csv_score>;
};

while ( <$csv_list> ) {
    chomp;
    my ( $find, $replace ) = split /,/;
    $string =~ s/$find/$replace/g;
}

print $out $string;
close $csv_score;
close $csv_list;
close $out;
output
Acanthometra fusca,1,0.60042
Acanthocyrta haeckeli,1,0.819671
Acanthocolla cruciata,2,0.50421,0.527007
10207,3,0.530422,0.624466
However, that's not a safe way of doing things, as IDs may be found elsewhere than at the start of a line.
I would build a hash out of the $csv_list file like this, which also makes the program more concise:
use strict;
use warnings;
use v5.10.1;
use autodie;
my %ids;

{
    open my $fh, '<', $ARGV[1];
    while ( <$fh> ) {
        chomp;
        my ($id, $name) = split /,/;
        $ids{$id} = $name;
    }
}

open my $in_fh,  '<', $ARGV[0];
open my $out_fh, '>', "$ARGV[0]_final.txt";

while ( <$in_fh> ) {
    s{^(\d+)}{$ids{$1} // $1}e;
    print $out_fh $_;
}
The output is identical to that of the first program above
The problem with the code as written is that you only do this once:
my $string = <$csv_score>;
This reads one line from $csv_score and you don't ever use the rest.
I would suggest that you need to:
Read the first file into a hash
Iterate the second file, and do a replace on the first column.
Using Text::CSV is generally a good idea for processing CSV, but it doesn't seem to be necessary for your example.
So:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my $csv = Text::CSV->new( { binary => 1 } );

my %replace;
while ( my $row = $csv->getline( \*DATA ) ) {
    last if $row->[0] =~ m/NEXT/;
    $replace{ $row->[0] } = $row->[1];
}
print Dumper \%replace;

my $search = join( "|", map {quotemeta} keys %replace );
$search = qr/$search/;    # compile the alternation once

while ( my $row = $csv->getline( \*DATA ) ) {
    $row->[0] =~ s/^($search)$/$replace{$1}/;
    $csv->print( \*STDOUT, $row );
    print "\n";
}
__DATA__
1127100,Acanthocolla cruciata
1127103,Acanthocyrta haeckeli
1127108,Acanthometra fusca
NEXT
1127108,1,0.60042
1127103,1,0.819671
1127100,2,0.50421,0.527007
10207,3,0.530422,0.624466
Note - this still prints the unmatched last line of your source content (the 10207 row), along with the converted rows:
"Acanthometra fusca ",1,"0.60042 "
"Acanthocyrta haeckeli ",1,"0.819671 "
"Acanthocolla cruciata ",2,0.50421,"0.527007 "
(Your data contained whitespace, so Text::CSV wraps it in quotes)
If you want to discard that, then you could test if the replace actually occurred:
if ( $row->[0] =~ s/^($search)$/$replace{$1}/ ) {
    $csv->print( \*STDOUT, $row );
    print "\n";
}
(And you can, of course, keep on using split /,/ if you're sure you won't have any of the wacky things that CSV normally supports.)
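If you do go the plain split route, a minimal sketch might look like this (the file names list.csv and scores.csv are placeholders for your two inputs):
use strict;
use warnings;

# Build the ID-to-name lookup from the first file
my %name_for;
open my $list_fh, '<', 'list.csv' or die "Cannot open list.csv: $!";
while (<$list_fh>) {
    chomp;
    my ($id, $name) = split /,/, $_, 2;
    $name_for{$id} = $name;
}
close $list_fh;

# Replace the ID in the first column of the second file, keeping matching lines only
open my $score_fh, '<', 'scores.csv' or die "Cannot open scores.csv: $!";
while (<$score_fh>) {
    my ($id, $rest) = split /,/, $_, 2;
    print "$name_for{$id},$rest" if exists $name_for{$id};
}
close $score_fh;
The limit of 2 on split keeps everything after the first comma intact, so the rest of each data line passes through untouched.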
I would like to provide a very different approach.
Let's say you are way more comfortable with databases than with Perl's data structures. You can use DBD::CSV to turn your CSV files into a kind of relational database. It uses Text::CSV under the hood (hat tip to @Sobrique). You will need to install it from CPAN, as it's not bundled in the default DBI distribution.
use strict;
use warnings;
use Data::Printer; # for p
use DBI;
my $dbh = DBI->connect( "dbi:CSV:", undef, undef, { f_ext => '.csv' } );
$dbh->{csv_tables}->{names} = { col_names => [qw/id name/] };
$dbh->{csv_tables}->{numbers} = { col_names => [qw/id int float/] };
my $sth_select = $dbh->prepare(<<'SQL');
SELECT names.name, numbers.int, numbers.float
FROM names
JOIN numbers ON names.id = numbers.id
SQL
# column types will be silently discarded
$dbh->do('CREATE TABLE result ( name CHAR(255), int INTEGER, float INTEGER )');
my $sth_insert =
$dbh->prepare('INSERT INTO result ( name, int, float ) VALUES ( ?, ?, ? ) ');
$sth_select->execute;
while ( my @res = $sth_select->fetchrow_array ) {
    p @res;
    $sth_insert->execute(@res);
}
What this does is set up column names for the two tables (your CSV files) as those do not have a first row with names. I made the names up based on the data types. It will then create a new table (CSV file) named result and fill it by writing one row at a time.
At the same time it will output data (for debugging purposes) to STDERR through Data::Printer.
[
[0] "Acanthocolla cruciata",
[1] 2,
[2] 0.50421
]
[
[0] "Acanthocyrta haeckeli",
[1] 1,
[2] 0.819671
]
[
[0] "Acanthometra fusca",
[1] 1,
[2] 0.60042
]
The resulting file looks like this:
$ cat scratch/result.csv
name,int,float
"Acanthocolla cruciata",2,0.50421
"Acanthocyrta haeckeli",1,0.819671
"Acanthometra fusca",1,0.60042

How to print all values of an array in Perl

I am trying to print all of the values of an array from a CSV file. I am sort of manually doing this in the example below. Can someone show me the code for doing this for all of the fields of the array, no matter how many fields there are? I'm basically just trying to print each field on a new line.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
my $file = 'test.csv';
my $csv = Text::CSV_XS->new ({
quote_char => '"',
escape_char => '#',
binary => 1,
keep_meta_info => 0,
allow_loose_quotes => 1,
allow_whitespace => 1,
});
open (CSV, "<", $file) or die $!;
while (<CSV>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "$columns[0]\r\n";
        print "$columns[1]\r\n";
        print "$columns[2]\r\n";
        print "$columns[3]\r\n";
        print "$columns[4]\r\n";
        print "$columns[5]\r\n";
        print "$columns[6]\r\n";
        print "$columns[7]\r\n";
    }
    else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close CSV;
foreach (@columns)
{
    print "$_\r\n";
}
instead of all the $columns[number] prints.
For debugging purposes, Data::Dump is my weapon of choice. It basically pretty-prints data structures.
use strict;
use warnings;
use Data::Dump 'dump';
# Do some stuff....
dump @array;    # Unlike Data::Dumper, there's no need to backslash ('\@array')
dump %hash;     # Same goes for hashes
dump $arrayref;
dump $hashref;  # References are handled just as well
There are many other ways to print arrays, of course:
say foreach @columns;              # If you have Perl 5.10+
print $_, "\n" foreach @columns;   # If you don't
print "@columns";                  # Prints all elements, space-separated by default
The 'best' answer depends on the situation. Why do you need it? What are you working with? And what do you want it for? Then season the code accordingly.
If you just want to print the elements separated by spaces:
print "@columns";
If you want to be a bit more fancy, you can use join:
print join("\n", @columns);
If you need to do something more, iterate over it:
foreach (@columns) {
# do stuff with $_
}
If you're doing this for diagnostic purposes (as opposed to presentation) you might consider Data::Dumper. In any case it's a good tool to know about if you want a quick printout of more-or-less arbitrary data.
{ $"="\n"; print $fh "#files"; }