sprintf format and print inconsistencies whilst creating fixed width column - perl

(I have already cross posted onto another site and will update either with the solution but so far struggling with an answer)
19th Dec 2013 7:06pm PT --- I found the solution and I updated below.
I am outputting two items of data per line. The first column of data is not fixed length, and I want the second item of data to be correctly aligned in the same position each time so I am using sprintf to format the data and then mail out the data
The print command output illustrates that the data is formatted correctly.
Yet, when the output in my email is different, the alignment is all wrong.
I initially thought it was the mailer (MIME::Lite) program but I am not sure it is.
Reason why I think that is because I using eclipse Perl environment, when I look at the debug variable list, I see that the strings are padded out exactly like the output in my email, yet the print statement shows the data correctly aligned!!!
Please help me understand what is going on here and how to fix it.
use MIME::Lite;
$smtp = "mailserver";
$internal_email_address = 'myemailaddess';
$a = sprintf ("%-60s %-s\n", "the amount for Apple is ","34");
$b = sprintf ("%-60s %-s\n", "the amount for Lemons is", "7");
print $a;
print $b;
$c = $a.$b;
mailer( $internal_email_address,"issue", $c);
sub mailer {
my ( $addr, $subj, $output ) = #_;
print "$_\n" for $addr;
print "$_\n" for $subj;
print "$_\n" for $output;
$msg = MIME::Lite->new(
From => 'xxxx',
To => $addr,
Subject => $subj,
Data => $output
);
MIME::Lite->send( 'smtp', $smtp, Timeout => 60 );
eval { $msg->send };
$mailerror = "Status: ERROR email - MIME::Lite->send failed: $#\n" if $#;
if ( $mailerror eq '' ) {
$status = "Status: Mail sent\n";
}
else {
$status = $mailerror;
}
}

$a = sprintf ("%-10s %-s\n", "the amount for Apple is ","34");
The argument "the amount for Apple is" is too long for the format specifier %-10s, so the actual amount of space used for that argument will be the length of the string.
You could use a format specifier with a larger value (e.g., %-25s) that can accomodate any value you're likely to apply to it.
Or if you want sprintf to truncate the argument at 10 characters, use the format specifier %-10.10s.

Related

Appending values to Hash if key is same in Perl

Problem is to read a file with value at every new line. Content of file looks like
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
With the string before first comma being the Key and string in between last and second last comma the Value
i.e. for the first line 3ssdwyeim3 becomes the key and 0.476 Value.
Now as we are looping over each line if the key exists we have to concatenate the values separated by comma.
Hence for the next new line as key already exists key remains 3ssdwyeim3 but the value is updated to 0.476,0.421.
Finally we have to print the keys and values in a file.
I have written a code to achieve the same, which is as follows.
sub findbreakdown {
my ( $out ) = #_;
my %timeLogger;
open READ, "out.txt" or die "Cannot open out.txt for read :$!";
open OUTBD, ">$out\_breakdown.csv" or die "Cannot open $out\_breakdown.csv for write :$!";
while ( <READ> ) {
if ( /(.*),.*,.*,.*,(.*),.*/ ) {
$btxnId = $1;
$time = $2;
if ( !$timeLogger{$btxnId} ) {
$timeLogger{$btxnId} = $time;
}
else {
$previousValue = $timeLogger{$btxnId};
$newValue = join ",", $previousValue, $time;
$timeLogger{$btxnId} = $newValue;
}
}
foreach ( sort keys %timeLogger ) {
print OUTBD "$_ ,$timeLogger{$_}\n";
}
}
close OUTBD;
close READ;
}
However Something is going wrong and its printing like this
3ssdwyeim3,0.476
3ssdwyeim3,0.476,0.421
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Whereas expected is:
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Your program is behaving correctly, but you are printing the current state of the entire hash after you process each line.
Therefore you are printing hash keys before they have the complete set of values, and you have many duplicated lines.
If you move the foreach loop that prints to the end of your program (or simply use the debugger to inspect the variables) you will find that the final state of the hash is exactly what you expect.
Edit: I previously thought the problem was the below, but it's because I misread the sample data in your question.
This regular expression is not ideal:
if (/(.*),.*,.*,.*,(.*),.*/) {
The .* is greedy and will match as much as possible (including some content with commas). So if any line contains more than six comma-separated items, more than one item will be included in the first matching group. This may not be a problem in your actual data, but it's not an ideal way to write the code. The expression is more ambiguous than necessary.
It would be better written like this:
if (/^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),[^,]*$/) {
Which would only match lines with exactly six items.
Or consider using split on the input line, which would be a cleaner solution.
This is much simpler than you have made it. You can just split each line into fields and use push to add the value to the list corresponding to the key
I trust you can modify this to read from an external file instead of the DATA file handle?
use strict;
use warnings 'all';
my %data;
while ( <DATA> ) {
my #fields = split /,/;
push #{ $data{$fields[0]} }, $fields[-2];
}
for my $key ( sort keys %data ) {
print join(',', $key, #{ $data{$key} }), "\n";
}
__DATA__
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
output
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436

perl: Is there a heuristic for finding mismatched braces/brackets/parens etc

I'm using perl to parse a json file. When all is OK, I find matching braces fine. But if there is a mismatch, I can't think of a good way of finding where it is.
My data at this point is a sorted array (#merged)
of offsets of braces in the file, with offsets of closing braces set negative.
Here's the section that does the matching:
#stack=();
foreach $val (#merged) # go through merged array
{ if ($val>0) { push #stack, $val;} # push every opener onto a stack
else { $opn = pop #stack; # when a closer comes up, pop previous opener
#tmp = ($opn, abs $val); # array of one match
push #matches, [#tmp]; # the array of all matches
}
}
I also have information about the column, but I don't want the algorithm to depend on compulsive formatting.
I'd like to adapt this to perl text as well, for those times when the translator just says there's an unmatched brace at the end.
Are there any good heuristics for finding the location of the mismatch?
Use a parser, do not try to reinvent the wheel. Here is an example:
#!/usr/bin/env perl
use strict;
use warnings;
use JSON qw( decode_json encode_json );
my $data = { foo => 'bar', baz => [1,2,3], qux => { abc => 1, def => 2, ghi => 3} };
my $json = encode_json($data);
my $error_json = $json;
$error_json =~ s|\]||; # Remove a closing square bracket
eval {
my $error_data = decode_json($error_json); # Will throw an error
};
my $error = $#;
if ($error) {
print "JSON Error : $error";
my ($char_pos) = $error =~ m|at character offset (\d+)|;
print "Original : '$json'\n";
print "Error : '$error_json'\n";
print "..............";
print "."x($char_pos) . "^\n";
} else {
die "should not get here...something went wrong";
}
Output
JSON Error : , or ] expected while parsing array, at character offset 31 (before ":{"abc":1,"ghi":3,"d...") at foo.pl line 15.
Original : '{"foo":"bar","baz":[1,2,3],"qux":{"abc":1,"ghi":3,"def":2}}'
Error : '{"foo":"bar","baz":[1,2,3,"qux":{"abc":1,"ghi":3,"def":2}}'
.............................................^
I found an heuristic which works much of the time, especially if you're compulsive about lining up opening and closing brackets.
I scan the file, find matching brackets, and find the difference between the columns. The error usually stands out as a larger difference than most of the matches.
Of course, I have to ignore brackets in comments or in quotes.
I have used this with .pl and .js files, and it works well.

Perl Text::CSV $csv->fields() property not populated

I've got a script that reformats incoming data from a CSV into a readable format by a vended system. I may be going crazy, but I'm pretty sure it worked a week or two ago in the production environment. However, at some point in the last week or two, it stopped working. I tracked the problem down to the Text::CSV module not populating the $csv->fields() property.
my $csv = Text::CSV->new({sep_char => '|', allow_loose_quotes => 1});
$csv->column_names($csv->getline(*READ));
my #keys = $csv->fields;
Now, on my local machine (and, at least in my head, in the production environment two weeks ago, too), this would populate #keys with the parsed header fields. However, now, in both production and pre-production, this fails. The only difference I can tell is that my machine is running perl 5.12.4, while the prod/pprd is 5.8.8. The Text::CSV module on both is 1.21.
On my machine, if I use Data::Dumper and dump the $csv object, part of the properties is
'_FIELDS' => [
'ID',
'IDCARD_TYPE',
'FIRST_NAME',
'MIDDLE_NAME',
'LAST_NAME',
...
'EMAIL',
],
On the other machines:
'_FIELDS' => undef,
I've worked around this by using $csv->column_names to populate #keys, but something doesn't seem right and I'd really like to figure out what's going on. Any ideas?
Per the Text::CSV documentation, returning undef is the expected result of fields() after calling getline(). Try using parse() first. You might be using a different version of this module on your local machine. You can check the version using perl -MText::CSV -e 'print $Text::CSV::VERSION'.
Note that the return value is undefined after using getline (), which
does not fill the data structures returned by parse ().
Following alternate sequence worked for me:
$file = "test.csv" ;
if(!open($fh, "<", $file )) {
# Cannot call getline is a symptom of a bad open()
printf("### Error %s: could not open file %s\n", $ws, $file) ;
close($fh) ;
exit 1 ;
}
while(my $row = $csv->getline($fh)) {
# $row is a pointer to an Array
# The array is already parsed.
#items = #{$row} ;
for(my $i=0 ; $i<=$#items; $i++) {
printf("Field %d: (%s)\n", $i, $items[$i] ) ;
}
}
close $fh ;

perl printing variables

The common code is :
use strict;
use warnings;
my $f = $ARGV[0];
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::ApiVersion;
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org',
-user => 'anonymous',
-port => '5306'
);
my $adaptor = $registry->get_adaptor( 'Homo sapiens', 'core', 'transcript' );
my $transcript =
$adaptor->fetch_by_translation_stable_id($f);
LAST LINE
#
For the LAST LINE, I am having trouble printing out the two values as two columns in the same line :
Attempt 1 code: print $f . $transcript->display_id. "\n";
Result : api.test.pl ENSP00000418690
ENSP00000418690ENST00000488594
Attempt 2 code :print $f, $transcript->display_id. "\n";
Result : perl api.test.pl ENSP00000418690 :
ENSP00000418690ENST00000488594
Any other attempt messes with accessing the display_id. What I want the format to be is :
ENSP00000418690 ENST00000488594
If you want a space between the values, print a space between the values.
print $f, " ", $transcript->display_id, "\n";
The dot joins two strings into one string, which isn't necessary here, because print accepts a comma-separated list of values.
Just to mention another possibility, you can two built-in perl variables.
{
local $, = " "; # Output field separator
local $\ = "\n"; # Output record separator
print $f, $transcript->display_id;
}
The output field separator is automatically printed between each argument in the list passed to print. The output record separator is printed after the final argument to print. Both are undefined by default, and interpreted to mean "don't print anything".
I declared them as local to a block as the simplest way of restricting them to the single print statement.

Converting code to perl sub, but not sure I'm doing it right

I'm working from a question I posted earlier (here), and trying to convert the answer to a sub so I can use it multiple times. Not sure that it's done right though. Can anyone provide a better or cleaner sub?
I have a good deal of experience programming, but my primary language is PHP. It's frustrating to know how to execute in one language, but not be able to do it in another.
sub search_for_key
{
my ($args) = #_;
foreach $row(#{$args->{search_ary}}){
print "#$row[0] : #$row[1]\n";
}
my $thiskey = NULL;
my #result = map { $args->{search_ary}[$_][0] } # Get the 0th column...
grep { #$args->{search_in} =~ /$args->{search_ary}[$_][1]/ } # ... of rows where the
0 .. $#array; # first row matches
$thiskey = #result;
print "\nReturning: " . $thiskey . "\n";
return $thiskey;
}
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
---Edit---
From the answers so far, I've cobbled together the function below. I'm new to Perl, so I don't really understand much of the syntax. All I know is that it throws an error (Not an ARRAY reference at line 26.) about that grep line.
Since I seem to not have given enough info, I will also mention that:
I am calling this function like this (which may or may not be correct):
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
And $ref_cam_make is an array I collect from a database table like this:
$ref_cam_make = $sth->fetchall_arrayref;
And it is in the structure like this (if I understood how to make the associative fetch work properly, I would like to use it like that instead of by numeric keys):
Reference Array
Associative
row[1][cam_make_id]: 13, row[1][name]: Sony
Numeric
row[1][0]: 13, row[1][1]: Sony
row[0][0]: 19, row[0][1]: Canon
row[2][0]: 25, row[2][1]: HP
sub search_for_key
{
my ($args) = #_;
foreach my $row(#{$args->{search_ary}}){
print "#$row[0] : #$row[1]\n";
}
print grep { $args->{search_in} =~ #$args->{search_ary}[$_][1] } #$args->{search_ary};
}
You are moving in the direction of a 2D array, where the [0] element is some sort of ID number and the [1] element is the camera make. Although reasonable in a quick-and-dirty way, such approaches quickly lead to unreadable code. Your project will be easier to maintain and evolve if you work with richer, more declarative data structures.
The example below uses hash references to represent the camera brands. An even nicer approach is to use objects. When you're ready to take that step, look into Moose.
use strict;
use warnings;
demo_search_feature();
sub demo_search_feature {
my #camera_brands = (
{ make => 'Canon', id => 19 },
{ make => 'Sony', id => 13 },
{ make => 'HP', id => 25 },
);
my #test_searches = (
"Sony's Cyber-shot DSC-S600",
"Canon cameras",
"Sony HPX-32",
);
for my $ts (#test_searches){
print $ts, "\n";
my #hits = find_hits($ts, \#camera_brands);
print ' => ', cb_stringify($_), "\n" for #hits;
}
}
sub cb_stringify {
my $cb = shift;
return sprintf 'id=%d make=%s', $cb->{id}, $cb->{make};
}
sub find_hits {
my ($search, $camera_brands) = #_;
return grep { $search =~ $_->{make} } #$camera_brands;
}
This whole sub is really confusing, and I'm a fairly regular perl user. Here are some blanket suggestions.
Do not create your own undef ever -- use undef then return at the bottom return $var // 'NULL'.
Do not ever do this: foreach $row, because foreach my $row is less prone to create problems. Localizing variables is good.
Do not needlessly concatenate, for it offends the style god: not this, print "\nReturning: " . $thiskey . "\n";, but print "\nReturning: $thiskey\n";, or if you don't need the first \n: say "Returning: $thiskey;" (5.10 only)
greping over 0 .. $#array; is categorically lame, just grep over the array: grep {} #{$foo[0]}, and with that code being so complex you almost certainly don't want grep (though I don't understand what you're doing to be honest.). Check out perldoc -q first -- in short grep doesn't stop until the end.
Lastly, do not assign an array to a scalar: $thiskey = #result; is an implicit $thiskey = scalar #result; (see perldoc -q scalar) for more info. What you probably want is to return the array reference. Something like this (which eliminates $thiskey)
printf "\nReturning: %s\n", join ', ', #result;
#result ? \#result : 'NULL';
If you're intending to return whether a match is found, this code should work (inefficiently). If you're intending to return the key, though, it won't -- the scalar value of #result (which is what you're getting when you say $thiskey = #result;) is the number of items in the list, not the first entry.
$thiskey = #result; should probably be changed to $thiskey = $result[0];, if you want mostly-equivalent functionality to the code you based this off of. Note that it won't account for multiple matches anymore, though, unless you return #result in its entirety, which kinda makes more sense anyway.