How do I set up the data structure to make pie charts in GD::Graph? - perl

I am writing a Perl script to create a pie graph using GD::Graph::pie with these arrays:
@Array1 = ("A", "B", "C", "D");
$array2 = [
['upto 100 values'],
['upto 100 values'],
['upto 100 values'],
['upto 100 values']
];
As per my understanding, to get this done I have to create an array holding references to the above arrays, like:
my @graph_data = (\@Array1, @$array2);
I have also tried to use a foreach loop, but I am not getting good results. I want to create a pie graph with the first value in @Array1 against the first value in $array2, the second value in @Array1 against the second value in $array2, and so on. I also want to title each graph with the corresponding value in @Array1.
e.g.
my @graph_data1 = (\@Array1[0], @$array2[0]);
Can anyone suggest a better way to do this?

Before getting into pie charts and stuff like that, I suggest you get yourself updated on basic Perl data structures and references. Please read perlreftut; you should be able to solve this problem yourself afterwards.

I'm not sure I understand what you are trying to do, but this example will produce 3 pie charts, all of them using the same set of categories. I would second Manni's advice: spend some time with perlreftut and perldsc. Also, if you download the GD::Graph module, it provides many examples, including pie charts (see the samples subdirectory).
use strict;
use warnings;
use GD::Graph::pie;

my @categories = qw(foo bar fubb buzz);
my @data = (
    [  25,  32,  10,  44 ], # Data values for chart #1
    [ 123, 221, 110, 142 ], # Data values for chart #2
    [ 225, 252, 217, 264 ], # etc.
);

for my $i (0 .. $#data){
    my $chart = GD::Graph::pie->new;
    my @pie_data = ( \@categories, $data[$i] );
    $chart->plot(\@pie_data);
    open(my $fh, '>', "pie_chart_$i.gif") or die $!;
    binmode $fh;
    print $fh $chart->gd->gif;
    close $fh;
}

To state in plainer English what the other answers say less directly:
my @graph_data = (\Array1, $#array2);
my @graph_data1 = (\Array1[0], $#array2[0]);
looks mad. You almost certainly mean:
my @graph_data = (\@Array1, $array2);
# you want the first element of each list in the same data structure?
my @graph_data1 = ([$Array1[0]], [$array2->[0]]); # (['A'], [[..numbers..]])
# Note *two* [ and ] in 2nd bit
# ... or you want a different data structure?
my @graph_data1 = ($Array1[0], $array2->[0]); # ('A', [..numbers..])
@Array1 is an array; you want a reference to it, and that would be \@Array1.
$array2 is already a reference to an array. It contains references to arrays, and I assume you want a list containing the reference to the array at index 0. Thus $array2->[0] is the element at index 0, reached through the array reference, and it is itself already an array reference.
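To make the pairing concrete, here is a minimal runnable sketch; the numbers are made-up stand-ins for the question's 'upto 100 values' rows:

```perl
use strict;
use warnings;

# Made-up stand-ins for the question's data
my @Array1 = ("A", "B", "C", "D");
my $array2 = [
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ],
    [ 10, 11, 12 ],
];

# Pair each label with its value list: the label goes in its own
# arrayref, and $array2->[$i] is already an arrayref of values.
for my $i (0 .. $#Array1) {
    my @graph_data = ( [ $Array1[$i] ], $array2->[$i] );
    print "$graph_data[0][0]: @{ $graph_data[1] }\n";
}
```

On the first pass, @graph_data is ( ['A'], [1, 2, 3] ). Whether one label against many values is meaningful for a pie chart is a separate question; a pie normally wants one label per slice.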

I found the solution to this problem using the code below.
my @pairs = map { "$Array1[$_]@$array2[$_]," } 0 .. $#Array1;
After this, the values from array @pairs can be used to create the graphs.

Related

How to access a customized hash structure in Perl?

I have a custom Perl hash data structure. The structure is like below:
%myhash = (
    1 => {
        'scf1' => [
            1,3,0,4,6,7,8,
        ],
        'sef2' => [
            10,15,20,30,
        ]
    },
    2 => {
        'scf1' => [
            10,3,0,41,6,47,81,
        ],
        'scf3' => [
            1,66,0,123,4,1,2435,33445,1
        ]
    },
);
How can I access this kind of Perl structure?
I'm afraid your code ... is showing signs that you are misunderstanding what hashes do, and how they work. Specifically, when you're referencing @{$myhash} - this is NOT the same as the %myhash that you undef.
Likewise - what's going on with @features? It looks like you're trying to build an array of arrays, but doing so by iterating through fetchrow_array and then pushing. Multidimensional arrays are sometimes the right tool for the job, but it is unclear why one would be suitable for what you're doing. (After all, you don't use it for anything else in this piece of code.)
You've also got $line[2] - which is also not doing what you might think - it does NOT refer to $line, it's the third element of an array called @line - which doesn't exist.
You are also trying to process a list of database entries, and set each to '-1' if it's undef.
We need some more detail about what data you're getting out of your database - $sth->fetchrow_array() could be anything. However, I'd strongly suggest that what you want to do is name each of the fields as you go. I'd suggest you DON'T want to be using $line there, because it's ... well, wrong. You're iterating columns in the row you've just fetched.
Which field in your fetched array are the keys to your hash? It looks like you're trying to key on 'field 5' 'field 7' and trying to insert values of 'field 1' and 'field 2'. Is that correct?
Oh, and turn on use strict; use warnings whilst you're at it.
get the inner array:
my @array = @{ $myhash{1}->{'scf1'} };
# is the same as
# my $array_ref = $myhash{1}->{'scf1'};
# my @array = @{ $array_ref };
# then you can
my $some_thing = $array[0];
or get one element:
$myhash{1}->{'scf1'}->[0];
From your Data::Dumper dump, I see that you have a hash called %myhash. Each element in that hash contains a reference to another hash. And, each element in that inner hash contains a reference to an array.
Let's take your Data::Dumper, and restate it like this:
$myhash{1}->{scf1} = [1, 3, 0, 4, 6, 7, 8];
$myhash{1}->{sef2} = [10, 15, 20, 30];
$myhash{2}->{scf1} = [10, 3, 0, 41, 6, 47, 81];
$myhash{2}->{scf3} = [1, 66, 0, 123, 4, 1, 2435, 33445, 1];
Same thing. It's just a bit more compact.
To print this out, we'll need to loop through each of these layers of references:
#
# First loop: The outer hash, which is a plain normal hash
#
for my $outer_key ( sort keys %myhash ) {
    #
    # Each element in that hash points to another hash reference. Dereference
    #
    my %inner_hash = %{ $myhash{$outer_key} };
    for my $inner_key ( sort keys %inner_hash ) {
        #
        # Finally, this is our array reference in the inner hash. Let's dereference and print
        #
        print "\$myhash{$outer_key}->{$inner_key}: ";
        my @array = @{ $myhash{$outer_key}->{$inner_key} };
        for my $value ( @array ) {
            print "$value ";
        }
        print "\n";
    }
}
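For comparison, the same traversal can be condensed by dereferencing in place rather than copying each level into its own variable; a self-contained version using the question's data:

```perl
use strict;
use warnings;

my %myhash = (
    1 => { scf1 => [ 1, 3, 0, 4, 6, 7, 8 ],
           sef2 => [ 10, 15, 20, 30 ] },
    2 => { scf1 => [ 10, 3, 0, 41, 6, 47, 81 ],
           scf3 => [ 1, 66, 0, 123, 4, 1, 2435, 33445, 1 ] },
);

# Dereference each level where it is used: %{ ... } for the inner
# hash, @{ ... } for the innermost array.
for my $outer (sort keys %myhash) {
    for my $inner (sort keys %{ $myhash{$outer} }) {
        print "$outer/$inner: ", join(",", @{ $myhash{$outer}{$inner} }), "\n";
    }
}
```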

Regarding getting the size of an array in a hash-of-arrays structure

I wrote the following Perl function
sub Outputing
{
    my $featureMatrix = shift;
    my $indexRow = shift;
    my $fileName = "/projectworkspace/input.dat";
    open(DATA, "> $fileName");
    printf DATA "%d", $#$indexRow;
    print DATA "\n";
    my $numDataPoints = $#{$featureMatrix{$indexRow->[1]}};
    printf DATA "%d", $numDataPoints;
    print DATA "\n";
    close DATA;
}
I call Outputing as follows:
Outputing($matrix, $Rows);
$matrix is a hash of arrays, whose structure is like this:
my $matrix
    = { 200 => [ 0.023, 0.035, 0.026 ],
        110 => [ 0.012, 0.020, 0.033 ],
      };
Rows is an array storing the sorted keys of the matrix; it is obtained as follows:
my @Rows = sort keys %matrix;
Both matrix and Rows are passed as parameters to Outputing.
The printed $numDataPoints is -1, which is not correct. I do not know what might be causing this problem. Using the above example, and assuming $indexRow->[1] = 110, $numDataPoints should be 2. I am not sure whether $#{$featureMatrix{$indexRow->[1]}} is the correct way to get the size of this array.
Assuming that you've included all the relevant code, this:
my @indexRow = sort keys %featureMatrix;
should be this:
my @indexRow = sort keys %$featureMatrix;
and this:
my $numDataPoints = $#{$featureMatrix{$indexRow->[1]}};
should be this:
my $numDataPoints = $#{$featureMatrix->{$indexRow->[1]}};
That is, the problem is that in some places, you're using a hash named %featureMatrix, and in others, you're using a hashref named $featureMatrix that refers to an anonymous hash.
You should be using use warnings and use strict to prevent such mistakes: those would have prevented you from using %featureMatrix when you've only declared $featureMatrix. (Actually, use warnings might not help in this case — it could detect if you used %featureMatrix exactly once, but in your case, you use it a few times — but use strict would almost certainly have helped.)
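Here is a short runnable illustration of that distinction, using a made-up $featureMatrix shaped like the question's $matrix:

```perl
use strict;
use warnings;

# Hashref shaped like the question's $matrix (values are made up)
my $featureMatrix = {
    200 => [ 0.023, 0.035, 0.026 ],
    110 => [ 0.012, 0.020, 0.033 ],
};

# keys() needs a real hash, so dereference the hashref first.
my @indexRow = sort keys %$featureMatrix;    # ('110', '200')

# Reach through the reference with ->, then take $# of the
# dereferenced array: the last index, i.e. element count minus one.
my $numDataPoints = $#{ $featureMatrix->{ $indexRow[0] } };
print "$numDataPoints\n";    # prints 2 for the three-element array
```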

How to write a multidimensional array as a tab-delimited .txt file in Perl

I have a multidimensional array called @main
and I want to write this array to a tab-delimited .txt file in Perl.
Can anyone help me with this issue?
open my $fh, '>', "out.txt" or die $!;
print $fh (join("\t", @$_), "\n") for @array;
I'm guessing that your multi-dimensional array is actually an array of references to arrays, since that's the only way that Perl will let you embed an array-in-an-array.
So for example:
@array1 = ('20020701', 'Sending Mail in Perl', 'Philip Yuson');
@array2 = ('20020601', 'Manipulating Dates in Perl', 'Philip Yuson');
@array3 = ('20020501', 'GUI Application for CVS', 'Philip Yuson');
@main = (\@array1, \@array2, \@array3);
To print them to a file:
open(my $out, '>', 'somefile.txt') || die("Unable to open somefile.txt: $!");
foreach my $row (@main) {
    print $out join("\t", @{$row}) . "\n";
}
close($out);
That is not a multidimensional array, it is an array that was formed by concatenating three other arrays.
perl -e '@f=(1,2,3); @g=(4,5,6); @h=(@f,@g); print join("\t",@h)."\n";'
Please provide desired output if you want further help.
Two dimensions I hope:
foreach my $row (@array) {
    print join("\t", @{$row}) . "\n";
}
Perl doesn't have multidimensional arrays. Instead, one of its three native datatypes is a one-dimensional array called a list. If you need a more complex structure in Perl, you can use references to other data structures in your list. For example, each item in your list is a reference to another list. The primary list can represent the rows, and the secondary lists are the column values in that row.
The above foreach loop loops through the primary list (the one that represents each row), and $row is set to the reference to the list that represents the column values.
In order to get a Perl list and not a reference to the list, I dereference the reference to the list. I do that by prefixing it with an @ sign. I like using @{$row} because I think it's a little cleaner than just @$row.
Now that I can refer to my list of column values as @{$row}, I can use a join to create a string that separates each of the values in @{$row} with a tab character and print it out.
If "multidimensional" in your question means n > 2, the tab delimited format might be infeasible.
Is this a case where you want to solve a more general problem: to serialize a data structure?
Look, for instance, at the YAML module (install YAML::XS). There is a DumpFile(filepath, list) and a LoadFile(filepath) function. The output will not be a tab-delimited file, but it will still be human-readable.
You could also use a JSON serializer instead, e.g. JSON::XS.
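As a sketch of the serialization route, here is the core JSON::PP module round-tripping the sample rows from the answer above (JSON::XS offers the same interface, only faster):

```perl
use strict;
use warnings;
use JSON::PP;    # in core since Perl 5.14

my @main = (
    [ '20020701', 'Sending Mail in Perl',       'Philip Yuson' ],
    [ '20020601', 'Manipulating Dates in Perl', 'Philip Yuson' ],
    [ '20020501', 'GUI Application for CVS',    'Philip Yuson' ],
);

# Serialize the whole array-of-arrays, whatever its depth, and
# read it back; no hand-rolled delimiter handling required.
my $json = JSON::PP->new->canonical;
my $text = $json->encode(\@main);
my $copy = $json->decode($text);
print $copy->[1][1], "\n";    # prints "Manipulating Dates in Perl"
```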

How can I use PDL rcols in a subroutine with pass-by-reference?

Specifically, I want to use rcols with the PERLCOLS option.
Here's what I want to do:
my @array;
getColumn(\@array, $file, 4); # get the fourth column from file
I can do it if I use \\@array, but for backward compatibility I'd prefer not to do this. Here's how I'd do it using an array-ref-ref:
sub getColumn {
    my ($arefref, $file, $colNum) = @_;
    my @read = rcols $file, { PERLCOLS => [$colNum] };
    $$arefref = $read[-1];
    return;
}
But I don't see how to make a subroutine that takes an array ref as an argument without saying something like @$aref = @{$read[-1]}, which, AFAICT, copies each element individually.
PS: reading the PDL::IO::Misc documentation, it seems like the Perl array ought to be $read[0], but it's not.
PERLCOLS
- an array of column numbers which are to be read into perl arrays
rather than piddles. Any columns not specified in the explicit list
of columns to read will be returned after the explicit columns.
(default B).
I am using PDL v2.4.4_05 with Perl v5.10.0 built for x86_64-linux-thread-multi
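The copying-versus-swapping distinction can be demonstrated without PDL; this sketch uses a plain arrayref as a stand-in for what rcols' PERLCOLS option would return:

```perl
use strict;
use warnings;

# Stand-in for the arrayref a PERLCOLS column would come back as
my $column = [ 4, 8, 15, 16 ];

# Variant 1: caller passes \@array; the sub copies element by element.
my @by_copy;
fill_copy(\@by_copy, $column);

# Variant 2: caller passes a ref to a scalar; the sub swaps the whole
# arrayref in with a single assignment, no per-element copying.
my $by_ref;
fill_ref(\$by_ref, $column);

print "@by_copy\n";         # 4 8 15 16
print "@{ $by_ref }\n";     # 4 8 15 16

sub fill_copy {
    my ($aref, $data) = @_;
    @$aref = @$data;        # per-element copy into the caller's array
}

sub fill_ref {
    my ($srefref, $data) = @_;
    $$srefref = $data;      # caller's scalar now holds $data itself
}
```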
I don't understand why this wouldn't work:
my $arr_ref;
getColumn( $arr_ref, $file, 4 );

sub getColumn {
    my ( $arr_ref, $file, $colNum ) = @_;
    my @read = rcols $file, { PERLCOLS => [ $colNum ] };
    # At this point, @read is a list of PDLs and array references.
    $arr_ref = $read[-1];
}
Looking at the rcols() documentation, it looks like if you add the PERLCOLS option it returns whatever column you request as an array reference, so you should be able to just assign it to the array reference you passed in.
And as for the documentation question, what I understand from that is you haven't specified any explicit columns, therefore rcols() will return all of the columns in the file as PDLs first, and then return the columns you requested as Perl arrayrefs, which is why your arrayref is coming out in $read[-1].
I believe part of the difficulty with using rcols here is that the user is running PDL-2.4.4 while the rcols docs version was from PDL-2.4.7 which may have version skew in functionality. With the current PDL-2.4.10 release, it is easy to use rcols to read in a single column of data as a perl array which is returned via an arrayref:
pdl> # cat data
1 2 3 4
1 2 3 4
1 2 3 4
pdl> $col = rcols 'data', 2, { perlcols=>[2] }
ARRAY(0x2916e60)
pdl> @{$col}
3 3 3
Notice that in the current release, the perlcols option allows one to specify the output type of a column rather than just adding a perl-style column at the end.
Use pdldoc rcols or do help rcols in the PDL shell to see more documentation.
A good resource is the perldl mailing list.

Reading a large file into Perl array of arrays and manipulating the output for different purposes

I am relatively new to Perl and have only used it for converting small files into different formats and feeding data between programs.
Now, I need to step it up a little. I have a file of DNA data that is 5,905 lines long, with 32 fields per line. The fields are not delimited by anything and vary in length within the line, but each field is the same size on all 5905 lines.
I need each line fed into a separate array from the file, and each field within the line stored as its own variable. I am having no problems storing one line, but I am having difficulties storing each line successively through the entire file.
This is how I separate the first line of the full array into individual variables:
my $SampleID = substr("@HorseArray", 0, 7);
my $PopulationID = substr("@HorseArray", 9, 4);
my $Allele1A = substr("@HorseArray", 14, 3);
my $Allele1B = substr("@HorseArray", 17, 3);
my $Allele2A = substr("@HorseArray", 21, 3);
my $Allele2B = substr("@HorseArray", 24, 3);
...etc.
My issues are: 1) I need to store each of the 5905 lines as a separate array. 2) I need to be able to reference each line based on the sample ID, or a group of lines based on population ID, and sort them.
I can sort and manipulate the data fine once it is defined in variables; I am just having trouble constructing a multidimensional array with each of these fields so I can reference each line at will. Any help or direction is much appreciated. I've pored over the Q&A sections on here, but have not found the answer to my questions yet.
Do not store each line in its own array. You need to construct a data structure. Start by reading the following tutorials from perldoc:
perlreftut
perldsc
perllol
Here's some starter code:
use strict;
use warnings;

# Array of data samples. We could use a hash as well; which is better
# depends on how you want to use the data.
my @sample;

while (my $line = <DATA>) {
    chomp $line;

    # Parse the input line
    my ($sample_id, $population_id, $rest) = split(/\s+/, $line, 3);

    # extract A/B allele pairs
    my @pairs;
    while ($rest =~ /(\d{1,3})(\d{3})|(\d{1,3}) (\d{1,2})/g) {
        push @pairs, {
            A => defined $1 ? $1 : $3,
            B => defined $2 ? $2 : $4,
        };
    }

    # Add this sample to the list of samples. Store it as a hashref so
    # we can access attributes by name
    push @sample, {
        sample     => $sample_id,
        population => $population_id,
        alleles    => \@pairs,
    };
}

# Print out all the values of alleles 2A and 2B for the samples in
# population py18. Note that array indexing starts at 0, so allele 2
# is at index 1.
foreach my $sample (grep { $_->{population} eq 'py18' } @sample) {
    printf("%s: %d / %d\n",
        $sample->{sample},
        $sample->{alleles}[1]{A},
        $sample->{alleles}[1]{B},
    );
}
__DATA__
00292-97 py17 97101 129129 152164 177177 100100 134136 163165 240246 105109 124124 166166 292292 000000 000000 000000
00293-97 py18 89 97 129139 148154 179179 84 90 132134 167169 222222 105105 126128 164170 284292 000000 000000 000000
00294-97 py17 91 97 129133 152154 177183 100100 134140 161163 240240 103105 120128 164166 290292 000000 000000 000000
00295-97 py18 97 97 131133 148162 177179 84100 132134 161167 240252 111111 124128 164166 284290 000000 000000 000000
I'd start by looping through the lines and parsing each into a hash of fields, and I'd build a hash for each index along the way.
my %by_sample_id;     # this will be a hash of hashes
my %by_population_id; # a hash of lists of hashes

foreach (<FILEHANDLE>) {
    chomp; # remove newline
    my %h; # new hash
    $h{SampleID} = substr($_, 0, 7);
    $h{PopulationID} = substr($_, 9, 4);
    # etc...
    $by_sample_id{ $h{SampleID} } = \%h;                   # a reference to %h
    push @{ $by_population_id{ $h{PopulationID} } }, \%h;  # pushes hashref onto list
}
Then, you can use either index to access the data in which you're interested:
say "Allele1A for sample 123123: ", $by_sample_id{123123}->{Allele1A};
say "all the Allele1A values for population 432432: ",
join(", ", map {$_->{Allele1A}} #{$by_population_id{432432}});
I'm going to assume this isn't a one-off program, so my approach would be slightly different.
I've done a fair amount of data-mashing, and after a while, I get tired of writing queries against data structures.
So -
I would feed the data into a SQLite database (or another SQL DB), and then write Perl queries off of that, using Perl DBI. This cranks up the complexity to well past a simple 'parse-and-hack', but after you've written several scripts doing queries on the same data, it becomes obvious that this is a pain and there must be a better way.
You would have a schema that looks similar to this
create table brians_awesome_data (id integer, population_id varchar(32), chunk1 integer, chunk2 integer...);
Then, after you used some of mobrule's and Michael's excellent parsing, you'd loop and do some INSERT INTOs on your awesome_data table.
Then, you could use a CLI for your SQL program and do "select ... where ..." queries to quickly get the data you need.
Or, if it's more analytical/pipeliney, you could Perl up a script with DBI and get the data into your analysis routines.
Trust me, this is the better way to do it than writing queries against data structures over and over.
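A minimal sketch of that workflow with DBI, assuming DBD::SQLite is available; the cut-down table and column names are illustrative, not taken from the question:

```perl
use strict;
use warnings;
use DBI;    # with DBD::SQLite installed from CPAN

# In-memory database for the sketch; use a file name to persist.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do("CREATE TABLE samples (sample_id TEXT, population_id TEXT, allele1a INTEGER)");

# Insert parsed rows; in real code this would sit inside the parse loop.
my $ins = $dbh->prepare("INSERT INTO samples VALUES (?, ?, ?)");
$ins->execute('00292-97', 'py17', 97);
$ins->execute('00293-97', 'py18', 89);

# The "query against a data structure" becomes plain SQL.
my $rows = $dbh->selectall_arrayref(
    "SELECT sample_id, allele1a FROM samples WHERE population_id = ?",
    undef, 'py18');
print "$_->[0]: $_->[1]\n" for @$rows;    # prints "00293-97: 89"
```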