How to write a multidimensional array as a tab-delimited .txt file in Perl

I have a multidimensional array called @main
and I want to write this array to a tab-delimited .txt file in Perl.
Can anyone help me with this?

open my $fh, '>', "out.txt" or die $!;
print $fh (join("\t", @$_), "\n") for @array;

I'm guessing that your multi-dimensional array is actually an array of references to arrays, since that's the only way that Perl will let you embed an array-in-an-array.
So for example:
@array1 = ('20020701', 'Sending Mail in Perl', 'Philip Yuson');
@array2 = ('20020601', 'Manipulating Dates in Perl', 'Philip Yuson');
@array3 = ('20020501', 'GUI Application for CVS', 'Philip Yuson');
@main = (\@array1, \@array2, \@array3);
To print them to a file:
open(my $out, '>', 'somefile.txt') || die("Unable to open somefile.txt: $!");
foreach my $row (@main) {
    # join with "\t" because the question asks for a tab-delimited file
    print $out join("\t", @{$row}) . "\n";
}
close($out);

That is not a multidimensional array, it is an array that was formed by concatenating three other arrays.
perl -e '@f=(1,2,3); @g=(4,5,6); @h=(@f,@g); print join("\t",@h)."\n";'
Please provide desired output if you want further help.

Two dimensions I hope:
foreach my $row (@array) {
    print join("\t", @{$row}) . "\n";
}
Perl doesn't have true multidimensional arrays. Instead, one of its three native data types is the one-dimensional array (which I'll call a list here). If you need a more complex structure in Perl, you can store references to other data structures in your list. For example, each item in your list can be a reference to another list. The primary list can represent the rows, and each secondary list holds the column values of one row.
The foreach loop above iterates over the primary list (the one that represents the rows), and $row holds the reference to the list of column values for the current row.
To get a Perl list rather than a reference to it, I dereference the reference by prefixing it with an @ sign. I like writing @{$row} because I think it's a little cleaner than just @$row.
Now that I can refer to my list of column values as @{$row}, I can use join to build a string that separates the values in @{$row} with tab characters, and print it out.
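To make the dereference-and-join step concrete, here is a minimal sketch. The sample rows and the helper name rows_to_tsv are made up for illustration; the resulting string can be printed to any filehandle opened for writing.

```perl
use strict;
use warnings;

# Hypothetical sample data: an array of array references (rows of columns).
my @main = (
    [ '20020701', 'Sending Mail in Perl',       'Philip Yuson' ],
    [ '20020601', 'Manipulating Dates in Perl', 'Philip Yuson' ],
);

# rows_to_tsv is a name invented for this sketch: it dereferences each row
# and joins its columns with tabs, producing one line per row.
sub rows_to_tsv {
    my ($rows) = @_;
    return join '', map { join("\t", @{$_}) . "\n" } @{$rows};
}

my $tsv = rows_to_tsv(\@main);

# Individual cells are reached with two subscripts: row first, then column.
my $title = $main[1][1];

print $tsv;
```

Building the whole string first keeps the formatting separate from the file handling, which makes it easy to check the output before writing it anywhere.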

If "multidimensional" in your question means n > 2, the tab delimited format might be infeasible.
Is this a case where you want to solve a more general problem: to serialize a data structure?
Look for instance, at the YAML Module (install YAML::XS). There is a DumpFile(filepath, list) and a LoadFile(filepath) method. The output will not be a tab-delimited file, but still be human readable.
You could also use a JSON serializer instead, e.g. JSON::XS.
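As a rough illustration of the serialization route, the sketch below round-trips a three-level structure through JSON. It uses the core module JSON::PP as a stand-in for JSON::XS (an assumption made so the example runs without installing anything), and the sample data is invented.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; stands in for JSON::XS in this sketch

# Hypothetical three-level structure that a tab-delimited file cannot hold.
my $data = [
    [ [ 1, 2 ], [ 3, 4 ] ],
    [ [ 5, 6 ], [ 7, 8 ] ],
];

my $json = JSON::PP->new->canonical;
my $text = $json->encode($data);    # serialize to a JSON string
my $copy = $json->decode($text);    # parse it back into nested arrays

print "$text\n";
print $copy->[1][0][1], "\n";       # element at indices 1, 0, 1
```

The encoded string can be written to a file and read back later, at the cost of no longer being a flat table.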

Related

I want to merge multiple CSV files by a specific condition using Perl

I have multiple CSV files and I want to merge all of them.
Some of my sample CSV files are shown below.
M1DL1_Interpro_sum.csv
IPR017690,Outer membrane, omp85 target,821
IPR014729,Rossmann,327
IPR013785,Aldolase,304
IPR015421,Pyridoxal,224
IPR003594,ATPase,179
IPR000531,TonB receptor,150
IPR018248,EF-hand,10
M1DL2_Interpro_sum.csv
IPR017690,Outer membrane, omp85 target,728
IPR013785,Aldolase,300
IPR014729,Rossmann,261
IPR015421,Pyridoxal,189
IPR011991,Winged,113
IPR000873,AMP-dependent synthetase/ligase,111
M1DL3_Interpro_sum.csv
IPR017690,Outer membrane,905
IPR013785,Aldolase,367
IPR014729,Rossmann,338
IPR015421,Pyridoxal,271
IPR003594,ATPase,158
IPR018248,EF-hand,3
To merge these files I have tried the following code:
@ARGV = <merge_csvfiles/*.csv>;
print $ARGV[0],"\n";
open(PAGE,">outfile.csv") || die"Can't open outfile.csv\n";
while($i<scalar(@ARGV))
{
    open(FILE,$ARGV[$i]) || die"Can't open ...$ARGV[$i]...\n";
    $data.=join("",<FILE>);
    close FILE;
    print"file completed...",$i+1,"\n";
    $i++;
}
@data=split("\n",$data);
@data2=@data;
print scalar(@data);
for($i=0;$i<scalar(@data);$i++)
{
    @id1=split(",",$data[$i]);
    $id_1=$id1[0];
    $data[$j]=~s/\n//;
    if($data[$i] ne "")
    {
        print PAGE "\n$data[$i],";
        for($j=$i+1;$j<scalar(@data2);$j++)
        {
            @id2=split(",",$data2[$j]);
            $id_2=$id2[0];
            if($id_1 eq $id_2)
            {
                $data[$j]=~s/\n//;
                print PAGE "$data2[$j],";
                $data2[$j]="";
                $data[$j]="";
                print "match found at ",$i+1," and ",$j+1,"\n";
            }
        }
    }
    print $i+1,"\n";
}
merge_csvfiles is a folder which contains all the files.
The output of the above code is:
IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR003594,ATPase,179,IPR003594,ATPase,158
IPR000531,TonB receptor,150
IPR018248,EF-hand,10,IPR018248,EF-hand,3
IPR011991,Winged,113
IPR000873,AMP-dependent synthetase/ligase
But I want the output in the following format:
IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158
IPR000531,TonB receptor,150,0,0,0,0,0,0
IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3
0,0,0,IPR011991,Winged,113,0,0,0
0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0
Does anybody have an idea how I can do this?
Thank you for the help
As mentioned in Miguel Prz's comment, you haven't explained how you want the merge to be performed, but, judging by the "desired output" sample, it appears that what you want is to concatenate lines with matching IDs from all three input files into a single line in the output file, with "0,0,0" taking the place of any lines which don't appear in a given file.
So, then:
#!/usr/bin/env perl
use strict;
use warnings;
my @input_files = glob 'merge_csvfiles/*.csv';
my %data;
for my $i (0 .. $#input_files) {
    open my $infh, '<', $input_files[$i]
        or die "Failed to open $input_files[$i]: $!";
    while (<$infh>) {
        chomp;
        my $id = (split ',', $_, 2)[0];
        $data{$id}[$i] = $_;
    }
    print "Input file read: $input_files[$i]\n";
}
open my $outfh, '>', 'outfile.csv' or die "Failed to open outfile.csv: $!";
for my $id (sort keys %data) {
    my @merge_data;
    for my $i (0 .. $#input_files) {
        push @merge_data, $data{$id}[$i] || '0,0,0';
    }
    print $outfh join(',', @merge_data) . "\n";
}
The first loop collects all the lines from each file into a hash of arrays. The hash keys are the IDs, so the lines for that ID from all files are kept together, and the value for each key is (a reference to) an array of the line associated with that ID in each file; using an array for this allows us to keep track of values which are missing as well as those which are present.
The second loop then takes the keys of that hash (in alphabetical order) and, for each one, creates a temporary array of the values associated with that ID, substituting "0,0,0" for missing values, joins them into a single string, and prints that to the output file.
The results, in outfile.csv, are:
IPR000531,TonB receptor,150,0,0,0,0,0,0
0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0
IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158
0,0,0,IPR011991,Winged,113,0,0,0
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR017690,Outer membrane, omp85 target,821,IPR017690,Outer membrane, omp85 target,728,IPR017690,Outer membrane,905
IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3
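The results above come out sorted by ID, whereas the desired output in the question lists IDs in the order they are first seen. If that order matters, one possible variation is to record each ID the first time it appears. This is only a sketch: the in-memory @files array is a hypothetical stand-in for the lines of the three input files, and @order and @merged are names introduced here.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the lines of three input files.
my @files = (
    [ 'IPR014729,Rossmann,327', 'IPR003594,ATPase,179' ],
    [ 'IPR014729,Rossmann,261', 'IPR011991,Winged,113' ],
    [ 'IPR014729,Rossmann,338', 'IPR003594,ATPase,158' ],
);

my (%data, @order);
for my $i (0 .. $#files) {
    for my $line (@{ $files[$i] }) {
        my $id = (split ',', $line, 2)[0];
        push @order, $id unless exists $data{$id};   # remember first sighting
        $data{$id}[$i] = $line;
    }
}

# Build one merged line per ID, in first-seen order, padding gaps with 0,0,0.
my @merged;
for my $id (@order) {
    push @merged, join ',', map { $data{$id}[$_] // '0,0,0' } 0 .. $#files;
}
print "$_\n" for @merged;
```

The // (defined-or) operator needs Perl 5.10 or later; it only substitutes the placeholder when no line was stored for that file.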
Edit: Added explanations requested by OP in comments
Can you explain how my $id = (split ',', $_, 2)[0]; and $# work in this program?
my $id = (split ',', $_, 2)[0]; gets the text prior to the first comma in the last line of text that was read:
Because I didn't specify what variable to put the data in, while (<$infh>) reads it into the default variable $_.
split ',', $_, 2 splits up the value of $_ into a list of comma-separated fields. The 2 at the end tells it to only produce at most 2 fields; the code will work fine without the 2, but, since I only need the first field, splitting into more parts isn't necessary.
Putting (...)[0] around the split command takes a list slice of the returned list of fields and picks out its first element. It's the same as if I'd written my @fields = split ',', $_, 2; my $id = $fields[0];, but shorter and without the extra variable.
$#array returns the highest-numbered index in the array @array, so for my $i (0 .. $#array) just means "loop over the indexes for all elements in @array". (Note that, if I hadn't needed the value of the index counter, I would have instead looped over the array's data directly, by using for my $filename (@input_files), but it would have been less convenient to keep track of the missing values if I'd done it that way.)
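A small standalone sketch of both constructs, using one of the sample lines from the question; the file names in @files are hypothetical.

```perl
use strict;
use warnings;

my $line = 'IPR017690,Outer membrane, omp85 target,821';

# List slice: index straight into the list that split returns.
my $id = (split ',', $line, 2)[0];

# The limit of 2 leaves everything after the first comma in one piece.
my @parts = split ',', $line, 2;

# $#files is the last valid index, i.e. one less than the element count.
my @files   = ('a.csv', 'b.csv', 'c.csv');   # hypothetical file names
my @indexes = (0 .. $#files);

print "$id\n$parts[1]\n$#files\n";
```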

new to Perl - CSV - find a string and print all numbers in that column

I've got a bunch of data in a CSV file, first row is all strings (all text and underscores), all subsequent rows are filled with numbers relating to said strings.
I'm trying to parse through the first line and find particular strings, remember which column that string was in, and then go through the rest of the file and get the data in the same column. I need to do this to three strings.
I've been using Text::CSV but I can't figure out how to get it to increment a counter until it finds the string in the first line and then go to the next line, get the data from that same column, etc. etc. Here's what I've tried so far:
while (<CSV>) {
    if ($csv->parse($data)) {
        my @field = $csv->fields;
        my $count = 0;
        for $column (@field) {
            print ++$count, " => ", $column, "\n";
        }
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
Since $data is in line 1, it prints "1 $data" 25 times (# of lines in CSV file). How do I get it to remember which column it found $data in? Also, since I know all of the strings are in line 1, how do I get it to only parse through line 1, find all of the strings in @data, and then parse through the rest of the file, grabbing data from the necessary columns and putting it into a matrix or array of arrays?
Thanks for the help!
edit: I realized my questions were a bit poorly phrased. I don't know how to get the column number from CSV. How is this done?
Also, once I've got the column number, how do I tell Text::CSV to run through the subsequent lines and grab data from only that column?
Try something like this:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({binary=>1});
my $thing_to_match = "blah";
my $matched_index;
my @stored_data = ();
while(my $row= $csv->getline(*DATA)) #grabs lines below __DATA__
                                     #(near the end of the script)
{
    my @fields = @$row;
    #If we haven't found the matched index, yet, search for it.
    if(not defined $matched_index)
    {
        foreach my $i(0..$#fields)
        {
            $matched_index = $i if($fields[$i] eq $thing_to_match);
        }
    }
    #NOTE: We're pushing a *reference* to an array!
    #Look at perldoc perldata
    push @stored_data,\@fields;
}
die "Column for '$thing_to_match' not found!" unless defined $matched_index;
foreach my $row(@stored_data)
{
    print $row->[$matched_index] . "\n";
}
__DATA__
stuff,more stuff,yet more stuff
"yes, this thing, is one item",blah,blarg
1,2,3
The output is:
more stuff
blah
2
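Since the question asks for three named columns from a header row, the same idea can be extended with a hash that maps each header name to its index. The following is only a sketch under stated assumptions: the column names and data are invented, and plain split is used solely because this sample has no quoted fields (for real files, parse each line with Text::CSV as above).

```perl
use strict;
use warnings;

# Hypothetical data: a header row followed by numeric rows.
my @lines = (
    'time_s,temp_c,pressure_kpa,flow_rate',
    '0,20.5,101.3,7',
    '1,20.7,101.1,9',
);
my @wanted = ('temp_c', 'flow_rate');   # column names to extract

# Plain split is only safe because this sample has no quoted fields.
my @header = split /,/, shift @lines;

# Map each header name to its column index with a hash slice.
my %index_of;
@index_of{@header} = (0 .. $#header);

for my $name (@wanted) {
    die "Column '$name' not found" unless exists $index_of{$name};
}

# One array of values per wanted column, keyed by column name.
my %column;
for my $line (@lines) {
    my @fields = split /,/, $line;
    push @{ $column{$_} }, $fields[ $index_of{$_} ] for @wanted;
}

print "$_: @{ $column{$_} }\n" for @wanted;
```

Looking the indexes up once from the header means the data rows never have to be searched again, only indexed.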
I don't have time to write up a full example, but I wrote a module that might help you do this. Tie::Array::CSV uses some magic to make your csv file act like a Perl array of arrayrefs. In this way you can use your knowledge of Perl to interact with the file.
A word of warning though! One benefit of my module is that it is read/write. Since you only want read, be careful not to assign to it!

Variable with multiple lines: delete the first two lines in Perl

I have the result of an SQL query. It returns some 10 rows like below.
If I do the following in my Perl script:
print $result
it gives me this output:
key value
----------- ------------------------------
1428116300 0003000
560779655 0003001
173413463 0003002
315642 0003003
1164414857 0003004
429589116 0003005
I just want to delete the first two lines and store each of the remaining lines in an array.
Could anybody please tell me how to achieve this?
With something like:
my @lines = split /\n/, $result;
splice @lines, 0, 2;
Explanations:
split /\n/, $result cuts your variable into an array of lines.
splice @lines, 0, 2 removes the first two elements of that array (the header line and the dashed separator). A data-dependent alternative is grep /^[\s\d]+$/, @lines, which keeps only the lines made up entirely of spaces and digits and so also drops the first two lines.
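Here is a short runnable sketch of the split-then-splice approach. The contents of $result are a shortened, hypothetical copy of the query output, and @removed and @rows are names introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the query result held in $result.
my $result = join "\n",
    'key         value',
    '----------- ------------------------------',
    '1428116300  0003000',
    '560779655   0003001',
    '173413463   0003002';

my @lines   = split /\n/, $result;
my @removed = splice @lines, 0, 2;   # drops and returns the two header lines

# Optionally break each remaining line into its whitespace-separated fields.
my @rows = map { [ split ' ', $_ ] } @lines;

print scalar(@lines), " data lines, first key $rows[0][0]\n";
```

splice returns what it removed, so the header lines remain available if they are needed later.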
A data-independent, slightly roundabout way: if you write $result out to a file, you can
use Tie::File;
tie my @lines, 'Tie::File', $file or die "can't update $file: $!";
splice @lines, 0, 2;    # removes the first two lines (indexes 0 and 1) from the file itself
(untested)

perl Get indexes of matches in array

I want to search for an element in an array. What I want to get from this search is all the indices of the array where I find a match.
So, for example the word I want to search is :
$myWord = cat
@allMyWords = my whole file with multiple occurrences of cat in random positions in file
So, if cat occurs at 3rd, 19th and 110th position, I want those indices as a result of it. I was wondering if there is a small and simple method to do this.
Thanks!
With List::MoreUtils:
use List::MoreUtils qw(indexes);
my @indexes = indexes { $_ eq 'cat' } @words;
If you haven't read the file yet, you can read it using "slurp mode":
local $/; # enable slurp mode
my @words = split(/\s+/, <>);
I got the answer. This is the code that will return all the indices in the array where an element we are searching for is found.
my( @index )= grep { $allMyWords[$_] eq $word } 0..$#allMyWords;
print "Index : @index\n";
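One thing worth keeping in mind is that the indices found this way are zero-based, while the question talks about the "3rd" and "19th" position. A minimal sketch with a made-up word list shows the difference; @position is a name introduced here.

```perl
use strict;
use warnings;

# Hypothetical word list; in the question this would come from the file.
my @allMyWords = qw(the cat sat on the cat mat);
my $myWord     = 'cat';

# Zero-based indices, as Perl arrays count.
my @index = grep { $allMyWords[$_] eq $myWord } 0 .. $#allMyWords;

# One-based positions, as the question phrases it.
my @position = map { $_ + 1 } @index;

print "Index    : @index\n";
print "Position : @position\n";
```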

Read CSV and create different arrays

I am creating a script to read values from CSV files and use the values for other tasks. I have written the code below to read the values.
sample file:
site,type,2009-01-01,2009-01-02,....
X,A,12,10,...
X,B,10,23,...
Y,A,20,33,...
Y,B,3,12,...
and so on....
Code:
my @value;
while (<INFILE>) {
    next if $_ !~ /B/;
    my ($v1, $v2, @v3) = split /[,]/, $_;
    push(@value, @v3);
}
It gives me all the values of type B in one flat array. I need help creating a separate array for each set of type B values.
Reading CSV files is harder than most of us thought at first. It even turns out that reading CSV files is frustratingly hard. Thus my recommendation is to not do this yourself, but to use Text::CSV_XS instead.
From what I comprehend, you want to use a list of lists:
my @value;
while (<INFILE>) {
    next if $_ !~ /B/;
    chomp;
    my ( $v1, $v2, @v3 ) = split /[,]/, $_;
    push @value, [@v3]; # This creates a list of lists
}
use Data::Dumper::Simple;
print Dumper @value;
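If the separate arrays need to be found again by site, a hash of array references may be more convenient than a list of lists. The sketch below is one possible variation, not the code above: the in-memory @lines and the name %values_for are assumptions, and it compares the type column itself rather than matching /B/ anywhere in the line.

```perl
use strict;
use warnings;

# Hypothetical sample rows in the layout from the question.
my @lines = (
    'site,type,2009-01-01,2009-01-02',
    'X,A,12,10',
    'X,B,10,23',
    'Y,A,20,33',
    'Y,B,3,12',
);

my %values_for;   # site => array reference of that site's type B values
for my $line (@lines) {
    my ($site, $type, @values) = split /,/, $line;
    next unless $type eq 'B';      # compare the type column only
    $values_for{$site} = [@values];
}

for my $site (sort keys %values_for) {
    print "$site: @{ $values_for{$site} }\n";
}
```

Each site's values can then be used on their own, for example @{ $values_for{X} }.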