Perl - Spreadsheet::WriteExcel function explanation

Perl - Spreadsheet::WriteExcel function explanation - perl

I'd like to use the function "autofit_columns" as found here:CPAN
Here's my program so far(I skipped the DB connect and query part)
my $workbook = Spreadsheet::WriteExcel->new("TEST.xls");
my $bold = $workbook->add_format();
$bold->set_bold();
my $number = $workbook->add_format();
$number->set_num_format(0x01);
$worksheet = $workbook->add_worksheet('Sheet1');
my #headings = ('Blabla...');
foreach $i (#headings){
$worksheet->write(0, $col++, $i, $bold);
};
$col=0;
$lrow=1;
while (#row = $sth->fetchrow_array()) {
$worksheet->write($lrow,$col,\#row);
$lrow++;
};
$sth->finish;
$dbh->disconnect;
autofit_columns($worksheet);
$workbook->close();
sub autofit_columns {
my $worksheet = shift;
my $col = 0;
for my $width (#{$worksheet->{__col_widths}}) {
$worksheet->set_column($col, $col, $width) if $width;
$col++;
}
}
PROBLEM: My columns are not autofitted in the xls file... Any idea why?
I don't get the peice of code:
for my $width (#{$worksheet->{__col_widths}}) {
$worksheet->set_column($col, $col, $width) if $width;
$col++;
}

You need to look at that example again and implement add_write_handler part too before you write anything to your worksheet.
Please take a look at
$worksheet->add_write_handler(qr[\w], \&store_string_widths);
line and then at store_string_widths subroutine implementation.
Answer is that you need to store absolute width of the string at each write. Then, after you wrote all data to your worksheet, you need to walk through rows and find the biggest string's 'length' for each column - that would be desired column width.
Wish you luck.

You are missing the part of the example code that adds the callback function:
$worksheet->add_write_handler(qr[\w], \&store_string_widths);
You are also missing the store_string_widths() function.
In relation to your second question, the callback stores the maximum string length used for each column. The code snippet is using these lengths to set the column width for each column from the first to the last column that has a length stored. If a column hasn't an autfit width stored then its width isn't adjusted.
This is all a little hacky in Spreadsheet::WriteExcel. It will be more integrated into the module in Excel::Writer::XLSX which is the replacement for WriteExcel.

Related

Parsing CSV files, finding columns and remembering them

I am trying to figure out a way to do this, I know it should be possible. A little background first.
I want to automate the process of creating the NCBI Sequin block for submitting DNA sequences to GenBank. I always end up creating a table that lists the species name, the specimen ID value, the type of sequences, and finally the location of the the collection. It is easy enough for me to export this into a tab-delimited file. Right now I do something like this:
while ($csv) {
foreach ($_) {
if ($_ =! m/table|species|accession/i) {
#csv = split('\t', $csv);
print NEWFILE ">[species=$csv[0]] [molecule=DNA] [moltype=genomic] [country=$csv[2]] [spec-id=$csv[1]]\n";
}
else {
next;
}
}
}
I know that is messy, and I just typed up something similar to what I have by memory (don't have script on any of my computers at home, only at work).
Now that works for me fine right now because I know which columns the information I need (species, location, and ID number) are in.
But is there a way (there must be) for me to find the columns that are for the needed info dynamically? That is, no matter the order of the columns the correct info from the correct column goes to the right place?
The first row will usually as Table X (where X is the number of the table in the publication), the next row will usually have the column headings of interest and are nearly universal in title. Nearly all tables will have standard headings to search for and I can just use | in my pattern matching.

First off, I would be remiss if I didn’t recommend the excellent Text::CSV_XS module; it does a much more reliable job of reading CSV files, and can even handle the column-mapping scheme that Barmar referred to above.
That said, Barmar has the right approach, though it ignores the "Table X" row being a separate row entirely. I recommend taking an explicit approach, perhaps something like this (and this is going to have a bit more detail just to make things clear; I would probably write it more tightly in production code):
# Assumes the file has been opened and that the filehandle is stored in $csv_fh.
# Get header information first.
my $hdr_data = {};
while( <$csv_fh> ) {
if( ! $hdr_data->{'table'} && /Table (\d+)/ ) {
$hdr_data->{'table'} = $1;
next;
}
if( ! $hdr_data->{'species'} && /species/ ) {
my $n = 0;
# Takes the column headers as they come, creating
# a map between the column name and column number.
# Assumes that column names are case-insensitively
# unique.
my %columns = map { lc($_) => $n++ } split( /\t/ );
# Now pick out exactly the columns we want.
foreach my $thingy ( qw{ species accession country } ) {
$hdr_data->{$thingy} = $columns{$thingy};
}
last;
}
}
# Now process the rest of the lines.
while( <$csv_fh> ) {
my $col = split( /\t/ );
printf NEWFILE ">[species=%s] [molecule=DNA] [moltype=genomic] [country=%s] [spec-id=%s]\n",
$col[$hdr_data->{'species'}],
$col[$hdr_data->{'country'}],
$col[$hdr_data->{'accession'}];
}
Some variation of that will get you close to what you need.

Create a hash that maps column headings to column numbers:
my %columns;
...
if (/table|species|accession/i) {
my #headings = split('\t');
my $col = 0;
foreach my $col (#headings) {
$columns{"\L$col"} = $col++;
}
}
Then you can use $csv[$columns{'species'}].

Perl - assigning hashes - why assigned variable is changed?

I have the following problem:
I have a perl program which is extracting csv files, reads them and outputing result.
The information about the csv structure is in XML files, provided in the archives mentioned above.
In the old version of the program i have read those XML files for each line of the CSV file and everithing worked fine:
...;
foreach $b (#gz_files)
{
if ( index($b, 'condition1') >= 0
|| index($b, 'condition2') >= 0
|| index($b, 'condition3') >= 0 )
{
$lt = localtime;
open (my $outputfile, '>>'.'/path_to_output/'.$dir_file.'.tmp')
|| die print $lfh "$lt - /path_to_output/$dir_file\.tmp - $!\n";
if ($b ne "")
{
# this is the procedure, which reads xml_files
%cv_tmp = eventstype::initialize($complex_xml_path, $rating_input_xml_path);
#EXPORT=qw(%cv_tmp);
...;
This code adds the structure from XML files into %cv_tmp variable.
After that foreach row in CSV file I'am assigning the value of %cv_tmp to %complex_vals which is manipulated further.
...
%complex_vals=%mainfile::cv_tmp;
...
But after this manipulation i notice that %cv_tmp has changed - which is strange because this is the right-side of the assignment.
I do not want to change the %cv_tmp on each CSV row.
Sorry for the bad description but I am absolutely novice.
Thank you in advance.

Do you perhaps have something like
my %h1;
$h1{foo}{bar} = 123;
my %h2 = %h1;
$h2{foo}{bar} = 456;
print "$h1{foo}{bar}\n"; # 456
If so, you're not changing %h1 or %h2; you're changing the (anonymous) hash referenced by both $h1{foo} and $h2{foo}. You need to copy the referenced hash (not the reference to the hash) to solve this problem.
use Storable qw( dclone );
my %h1;
$h1{foo}{bar} = 123;
my %h2 = %{ dclone(\%h1) };
$h2{foo}{bar} = 456;
print "$h1{foo}{bar}\n"; # 123

new to Perl - CSV - find a string and print all numbers in that column

I've got a bunch of data in a CSV file, first row is all strings (all text and underscores), all subsequent rows are filled with numbers relating to said strings.
I'm trying to parse through the first line and find particular strings, remember which column that string was in, and then go through the rest of the file and get the data in the same column. I need to do this to three strings.
I've been using Text::CSV but I can't figure out how to get it to increment a counter until it finds the string in the first line and then go to the next line, get the data from that same column, etc. etc. Here's what I've tried so far:
while (<CSV>) {
if ($csv->parse($data)) {
my #field = $csv->fields;
my $count = 0;
for $column (#field) {
print ++$count, " => ", $column, "\n";
}
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
Since $data is in line 1, it prints "1 $data" 25 times (# of lines in CSV file). How do I get it to remember which column it found $data in? Also, since I know all of the strings are in line 1, how do I get it to only parse through line 1, find all of the strings in #data, and then parse through the rest of the file, grabbing data from the necessary columns and putting it into a matrix or array of arrays?
Thanks for the help!
edit: I realized my questions were a bit poorly phrased. I don't know how to get the column number from CSV. How is this done?
Also, once I've got the column number, how do I tell it CSV to run through the subsequent lines and grab data from only that column?

Try something like this:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({binary=>1});
my $thing_to_match = "blah";
my $matched_index;
my #stored_data = ();
while(my $row= $csv->getline(*DATA)) #grabs lines below __DATA__
#(near the end of the script)
{
my #fields = #$row;
#If we haven't found the matched index, yet, search for it.
if(not defined $matched_index)
{
foreach my $i(0..$#fields)
{
$matched_index = $i if($fields[$i] eq $thing_to_match);
}
}
#NOTE: We're pushing a *reference* to an array!
#Look at perldoc perldata
push #stored_data,\#fields;
}
die "Column for '$thing_to_match' not found!" unless defined $matched_index;
foreach my $row(#stored_data)
{
print $row->[$matched_index] . "\n";
}
__DATA__
stuff,more stuff,yet more stuff
"yes, this thing, is one item",blah,blarg
1,2,3
The output is:
more stuff
blah
2

I don't have time to write up a full example, but I wrote a module that might help you do this. Tie::Array::CSV uses some magic to make your csv file act like a Perl array of arrayrefs. In this way you can use your knowledge of Perl to interact with the file.
A word of warning though! One benefit of my module is that it is read/write. Since you only want read, be careful not to assign to it!

Perl Loop Output to Excel Spreadsheet

I have a Perl script, the relevant bits of which are posted below.
# Pull values from cells
ROW:
for my $row ( $row_min + 1 .. $row_max ) {
my $target_cell = $worksheet->get_cell( $row, $target_col);
my $response_cell = $worksheet->get_cell( $row, $response_col);
if ( defined $target_cell && defined $response_cell ) {
my $target = $target_cell->value();
my $response = $response_cell->value();
# Determine relatedness
my $value = $lesk->getRelatedness($target, $response);
# Copy output to new Excel spreadhseet, 'data.xls'
my $workbook1 = Spreadsheet::WriteExcel->new('data.xls');
my $worksheet1 = $workbook1->add_worksheet();
$worksheet1->set_column(0, 3, 18);
my $row = 0;
foreach ($target) {
$row++;
$worksheet1->write( $row, 0, "Target = $target\n");
$worksheet1->write( $row, 1, "Response = $response\n");
$worksheet1->write( $row, 2, "Relatedness = $value\n");
}
}
}
This script uses the Perl modules ParseExcel and WriteExcel. The input data spreadsheet is a list of words under two columns, one labelled 'Target' and the other labelled 'Response.' The script takes each target word and each response word and computes a value of relatedness between them (that's what the
$lesk->getRelatedness
section of code is doing. It is calling a perl module called WordNet::Similarity that computes a measure of relatedness between words).
All of this works perfectly fine. The problem is I am trying to write the output (the measure of similarity, or $value in this script) into a new Excel file. No matter what I do with the code, the only output it will give me is the relatedness between the LAST target and response words. It ignores all of the rest.
However, this only occurs when I am trying to write to an Excel file. If I use the 'print' function instead, I can see all of the outputs in the command window. I can always just copy and paste this into Excel, but it would be much easier if I could automate this. Any idea what the problem is?

You're resetting the value of $row each time to 0.

Problem is solved. I just needed to move the
my $workbook1 = Spreadsheet::WriteExcel->new('data.xls');
my $worksheet1 = $workbook1->add_worksheet();
lines to another part of the script. Since they were in the 'for' statement, the program kept overwriting the 'data.xls' file every time it ran through the loop.

Why does my Perl for loop exit early?

I am trying to get a perl loop to work that is working from an array that contains 6 elements. I want the loop to pull out two elements from the array, perform certain functions, and then loop back and pull out the next two elements from the array until the array runs out of elements. Problem is that the loop only pulls out the first two elements and then stops. Some help here would be greatly apperaciated.
my open(infile, 'dnadata.txt');
my #data = < infile>;
chomp #data;
#print #data; #Debug
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
my $i=0;
my $j=0;
my #matrix =();
for(my $i=0; $i<2; $i++){
for( my $j=0; $j<$aalen; $j++){
$matrix[$i][$j] = 0;
}
}
The guidelines for this program states that the program should ignore the presence of gaps in the program. which means that DNA code that is matched up with a gap should be ignored. So the code that is pushed through needs to have alignments linked with gaps removed.
I need to modify the length of the array by two since I am comparing two sequence in this part of the loop.
#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these saclar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";
This is the loop that is causeing issues. I belive that the first loop should move back to the array and pull out the next element each time it loops but it doesn't.
for($i=0; $i<$lenarray; $i++){
#This code should remove the the last value of the array once and
#then a second time. The sequences should be the same length at this point.
my $last1 =pop(#data1);
my $last2 =pop(#data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
if(($letter1 eq '-')|| ($letter2 eq '-')){
#I need to put the sequences I am getting rid of somewhere. Here is a good place as any.
$junk1 = $letter1 . $junk1;
$junk2 = $letter1 . $junk2;
}
else{
$seq1 = $letter1 . $seq1;
$seq2 = $letter2 . $seq2;
}
}
}
print "$seq1\n";
print "$seq2\n";
print "#data1\n";
I am actually trying to create a substitution matrix from scratch and return the data. The reason why the code looks weird, is because it isn't actually finished yet and I got stuck.
This is the test sequence if anyone is curious.
YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF

First off, if you're going to work with sequence data, use BioPerl. Life will be so much easier. However...
Since you know you'll be comparing the lines from your input file as pairs, it makes sense to read them into a datastructure that reflects that. As elsewhere suggested, an array like #data[[line1, line2],[line3,line4]) ensures that the correct pairs of lines are always together.
What I'm not clear on what you're trying to do is:
a) are you generating a consensus
sequence where the 2 sequences are
difference only by gaps
b) are your 2 sequences significantly
different and you're trying to
exclude the non-aligning parts and
then generate a consensus?
So, does the first pair represent your data, or is it more like the second?
ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC
ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The problem is that you're using $i as the counter variable for both your loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or using my to localize them properly.

Don't store your values as an array, store as a two-dimensional array:
my #dataset = ([$val1, $val2], [$val3, $val4]);
or
my #dataset;
push (#dataset, [$val_n1, $val_n2]);
Then:
for my $value (#dataset) {
### Do stuff with $value->[0] and $value->[1]
}

There are lots of strange things in your code: you are initializing a matrix then not using it; reading a whole file into an array; scanning a string C style but then not doing anything with the unmatched values; and finally, just printing the two last processed values (which, in your case, are the two first elements of your array, since you are using pop.)
Here's a guess.
use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);
# Our work function.
sub do_something {
my ($a, $b) = #_;
$a =~ s/$regex//g; # removing unwanted characters
$b =~ s/$regex//g; # ditto
# Printing, saving, whatever...
print "Something: $a - $b\n";
return ($a, $b);
}
my $prev;
while (<>) {
chomp;
if ($prev) {
do_something($prev, $_);
$prev = undef;
} else {
$prev = $_;
}
}
print STDERR "Warning: trailing data: $prev\n"
if $prev;

Since you are a total Perl/programming newbie, I am going to show a rewrite of your first code block, then I'll offer you some general advice and links.
Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to remember more than a few things at a time, so I chop problems into small pieces that I can understand. This is (was) known as 'chunking'.
One easy way to chunk your program is use write subroutines. Take any particular action or idea that is likely to be repeated or would make the current section of code long and hard to understand, and wrap it up into a nice neat package and get it out of the way.
It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, why make things harder than necessary? Grouping like things, using _ in names, blank lines and indentation all help. There are also conventions that can help, like making constant values (values that cannot or should not change) all capital letters.
use strict; # Using strict will help catch errors.
use warnings; # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages
# Put constants at the top of your program.
# It makes them easy to find, and change as needed.
my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);
my $DATA_FILE = 'dnadata.txt';
# Here I am using subroutines to encapsulate complexity:
my #data = read_data_file( $DATA_FILE );
my #matrix = initialize_matrix( 2, $amino_count, 0 );
# now we are done with the first block of code and can do more stuff
...
# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.
# Here are the actual subs that I abstracted out above.
# It helps to document your subs:
# - what they do
# - what arguments they take
# - what they return
# Read a data file and returns an array of dna strings read from the file.
#
# Arguments
# data_file => path to the data file to read
sub read_data_file {
my $data_file = shift;
# Here I am using a 3 argument open, and a lexical filehandle.
open( my $infile, '<', $data_file )
or die "Unable to open dnadata.txt - $!\n";
# I've left slurping the whole file intact, even though it can be very inefficient.
# Other times it is just what the doctor ordered.
my #data = <$infile>;
chomp #data;
# I return the data array rather than a reference
# to keep things simple since you are just learning.
#
# In my code, I'd pass a reference.
return #data;
}
# Initialize a matrix (or 2-d array) with a specified value.
#
# Arguments
# $i => width of matrix
# $j => height of matrix
# $value => initial value
sub initialize_matrix {
my $i = shift;
my $j = shift;
my $value = shift;
# I use two powerful perlisms here: map and the range operator.
#
# map is a list contsruction function that is very very powerful.
# it calls the code in brackets for each member of the the list it operates against.
# Think of it as a for loop that keeps the result of each iteration,
# and then builds an array out of the results.
#
# The range operator `..` creates a list of intervening values. For example:
# (1..5) is the same as (1, 2, 3, 4, 5)
my #matrix = map {
[ ($value) x $i ]
} 1..$j;
# So here we make a list of numbers from 1 to $j.
# For each member of the list we
# create an anonymous array containing a list of $i copies of $value.
# Then we add the anonymous array to the matrix.
return #matrix;
}
Now that the code rewrite is done, here are some links:
Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing software projects from specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.
For a beginning programmer, beginning with Perl, there is no better book than Learning Perl.
I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active Perl specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.
Good luck!

Instead of using a C-style for loop, you can read data from an array two elements at a time using splice inside a while loop:
while (my ($letter1, $letter2) = splice(#data, 0, 2))
{
# stuff...
}
I've cleaned up some of your other code below:
use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt');
my #data = <$infile>;
close $infile;
chomp #data;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
# initialize a 2 x 21 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
foreach my $j (0 .. $aalen-1)
{
$matrix->[$i][$j] = 0;
}
}
# Process all letters in the DNA data
while (my ($letter1, $letter2) = splice(#data, 0, 2))
{
# do something... not sure what?
# you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Perl - Spreadsheet::WriteExcel function explanation - perl

Related

Parsing CSV files, finding columns and remembering them

Perl - assigning hashes - why assigned variable is changed?

new to Perl - CSV - find a string and print all numbers in that column

Perl Loop Output to Excel Spreadsheet

Why does my Perl for loop exit early?

Categories

Resources