Perl printf Spacing - perl

I'm using printf to print out data in 2 columns. The first column is max 12 characters, but the data in the second column can get quite long. Is there a way to make it start from the same indentation that it starts on the first line after it line-wraps?
printf("%-12s\t%s", $key, $result);

I suggest that you use the core library Text::Wrap.
The following would implement what you're talking about:
use strict;
use warnings;
use Text::Wrap;
local $Text::Wrap::columns = 72;
while (<DATA>) {
my ($word, $paragraph) = split ' ', $_, 2;
print wrap(sprintf("%-12s", $word), ' 'x12, $paragraph), "\n";
}
__DATA__
one The fallen python hurts behind your entering delight. A leader defects within the birth! The torture overflows? The verdict beams behind the energy.
two A convinced undergraduate seasons the bonus. The present alert mends inside the gesture. How will the publicized coordinate swallow a log panic?
three A tourist faints? An alive biography behaves on top of a grief. A storm scares a conductor throughout an anxious initiate.
Outputs:
one The fallen python hurts behind your entering delight. A
leader defects within the birth! The torture overflows?
The verdict beams behind the energy.
two A convinced undergraduate seasons the bonus. The present
alert mends inside the gesture. How will the publicized
coordinate swallow a log panic?
three A tourist faints? An alive biography behaves on top of a
grief. A storm scares a conductor throughout an anxious
initiate.

I don't think printf can do what you want by itself, but you can do the wrapping yourself. The following example is primitive but usable:
sub wrap {
my ($str, $first_col_size, $max_col_size) = #_;
my $ret = $str;
$ret =~ s/(.{$max_col_size})/"$1\n" . (' ' x $first_col_size) . "\t"/ge;
$ret;
}
printf("%-12s\t%s\n", $key, wrap($result, 12, 60));
Or maybe you could use something like Text::ASCIITable on CPAN to do what you need.

Related

Sorting upper case and lower case with Perl

I tried to sorting upper case and lower case in the perl language. A bunch of text are save in as "electricity.txt"
in the .txt file:
Today's scientific question is: What in the world is electricity and
where does it go after it leaves the toaster?
Here is a simple experiment that will teach you an important
electrical lesson: On a cool dry day, scuff your feet along a carpet,
then reach your hand into a friend's mouth and touch one of his dental
fillings. Did you notice how your friend twitched violently and cried
out in pain? This teaches one that electricity can be a very powerful
force, but we must never use it to hurt others unless we need to learn
an important lesson about electricity.
Somehow, I can't get any uppercase word
and my code is
my %count;
my $openFileile = "electricity.txt";
open my $openFile, '<', $openFileile;
while (my $list = <$openFile>) {
chomp $list;
foreach my $word (split /\s+/, $list) {
$count{lc($word)}++;
}
}
printf "\n\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
foreach my $word (sort keys %count){
printf "%-31s \n", sort {"\$a" cmp uc"\$b"} lc($word);
}
Issue 1
First problem is the statement below means you are only storing the lower-case versions of all the words
$count{lc($word)}++;
After the initial while loop %count has only lower-case words. That means your foreach loop can never retrieve the upper-case words.
Issue 2
Second issue is this statement
printf "%-31s \n", sort {"\$a" cmp uc"\$b"} lc($word);
I have no idea what you think that the sort will achieve -- it is sorting a list with only one element, lc($word), so doesn't actually do anything.
A working example
Taking the comments above into account, here is a version that outputs both upper & lower-case words (abbreviated)
use strict;
use warnings;
my %count;
#my $openFileile = "electricity.txt";
#open my $openFile, '<', $openFileile;
while (my $list = <DATA>) {
chomp $list;
foreach my $word (split /\s+/, $list) {
$count{$word}++;
}
}
printf "\n\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
foreach my $word (sort keys %count){
printf "%-31s \n", $word;
}
__DATA__
Today's scientific question is: What in the world is electricity and where does it go after it leaves the toaster?
Here is a simple experiment that will teach you an important electrical lesson: On a cool dry day, scuff your feet along a carpet, then reach your hand into a friend's mouth and touch one of his dental fillings. Did you notice how your friend twitched violently and cried out in pain? This teaches one that electricity can be a very powerful force, but we must never use it to hurt others unless we need to learn an important lesson about electricity.
That print this
Sorting Alphabetically with upper case words in front of lower-case words with the same initial characters
Did
Here
On
This
Today's
What
a
about
after
along
...
use
very
violently
we
where
will
world
you
As Hunter McMillen's comment says, you are using lc on the words when creating the hash, therefore all of your original capitalization will be lost. Lets go through your code, as I spot some other mistakes.
First off, always use use strict; use warnings. Especially if you have a preference for long and complicated variable names. It will save you from typos and weird bugs.
open my $openFile, '<', $openFileile;
With open statements, it is idiomatic to check the return value of the open, to see if anything went wrong. And if it did, to report the error. I.e. add ..., or die "Cannot open '$openFileile': $!".
foreach my $word (split /\s+/, $list) {
Typically, if you split on whitespace you usually want to split on ' ' -- a single space. This is a special case for split, also the default split mode, it will split on \s+, but also remove leading whitespace.
$count{lc($word)}++;
Here is your problem. All the words lose their original case.
printf "\n\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
printf is a special formatting print. If you do not intend to use that formatting, use the regular print to avoid problems.
printf "%-31s \n", sort {"\$a" cmp uc"\$b"} lc($word);
You cannot sort just one (1) word. You need at least 2 words to be able to sort.
Why are you using double quotes, and then escaping the variable sigil? I am guessing this is you testing different things to see what works. This looks very unlikely to do what you want.
"\$a" will just become $a -- a dollar sign plus an "a". This is what you do when you want to print the variable name, e.g. print "\$a is $a" (prints $a is 12, for example).
lc will have no effect, since all your words are already in lower case.
Even if lc and uc would work here, you cannot use uc like that in the sort subroutine. The sort function will choose one word in the comparison at random and capitalize it. Effectively destroying your sort.
Also uc will change all the letters to upper case (cat => CAT). You want ucfirst (cat => Cat).
When I clean up your code, and also make the variable names somewhat more reasonable, I get this below. Also, I removed your file open, since I use the internal DATA file handle to facilitate testing. You can just put back your own open, with the additions I described above.
use strict;
use warnings;
my %words;
while (my $line = <DATA>) {
for my $word (split ' ', $line) { # split on ' ' a single space removes leading and trailing whitespace
my $key = lc $word; # save lowercase word as key
$words{$key}{count}++; # separate count
$words{$key}{value} = $word; # word original formatting as value
}
}
# printf is used for special formatting, if you are not using that formatting, use regular print to avoid unnecessary interpolation of %
print "\nSorting Alphabetically with upper case words in front of lower-case words with the same initial characters\n";
for my $word (sort keys %words) {
printf "%-31s : %s\n", $words{$word}{value}, $words{$word}{count};
}
__DATA__
Today's scientific question is: What in the world is electricity and where does it go after it leaves the toaster?
Here is a simple experiment that will teach you an important electrical lesson: On a cool dry day, scuff your feet along a carpet, then reach your hand into a friend's mouth and touch one of his dental fillings. Did you notice how your friend twitched violently and cried out in pain? This teaches one that electricity can be a very powerful force, but we must never use it to hurt others unless we need to learn an important lesson about electricity.
And it prints
a : 5
about : 1
after : 1
along : 1
an : 2
and : 3
be : 1
but : 1
can : 1
carpet, : 1
cool : 1
...etc
As can be noticed, this differentiates between carpet and carpet, since you are only splitting on whitespace. It keeps the non-word characters and includes them in the hash. There are different ways to find words in a text. For example, instead of split you could use a regex:
my #words = $line =~ /\w+/g; # \w is word characters, plus numbers, and underscore _
Even this is simplistic, but will work better than your split. You can add characters to the regex as your needs require, for example: /[\w\-]+/ -- include dash for hyphenated words, e.g. mega-carpet. (Note that dash - has to be escaped when placed between other characters inside a character class bracket, otherwise it will be interpreted as a range, e.g. a-z.)

Echo progress bar for while external process executing and take STDOUT when it done

How I can echo a progress bar while an external process is executing and capture its STDOUT when it's done, using only standard modules. And not using fork?
Run external process, something like: #array = `ls -l`;
While it executing, do printing progress bar, like: print '.';
Capture STDOUT of the process into array, when it done
Continue works main script
I'm reading about IPC::Open2, IPC::Open3, but I don't understand how to use them for this task. Maybe it's not the right direction?
What do you have so far? If you have having trouble with the interprocess communication, forget about the progress bar for the moment and ask just about that.
You can't really have a progress bar for something that has an indeterminate end. If you don't know how much input you will read, you don't know what fraction of it you have read. People tend to think of progress bars as a representation of fraction of work done, just not activity. That is, unless you use macOS and understand that "less than one minute" means "more than three hours". ;)
I tend to do something simple, where I output a dot every so often. I don't know how many dots I'll see, but I know that I'll see new ones.
$|++; # unbuffer stdout to see dots as they are output
while( <$fh> ) {
print '.' if $. % $chunk_size; # $. is the line number
print "\n[$.] " if $. % $chunk_size * $row_length;
...
}
That $fh can be anything that you want to read from, including a pipe. perlopentut has examples of reading from external processes. Those are doing a fork, though. And, the other modules will fork as well. What's the constraint that makes you think you can't use fork?
You can get more fancy with your display by using curses and other things (a carriage return is handy :), but I'm not inclined to type those out.
Perhaps OP is looking for something of next kind just to indicate that external process is running.
Define a handler for $SIG{ALRM} and set alarm 1 to run handler every second. Once process complete reset alarm 0 to turn off alarm handler.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $ls_l; # variable to store output of external command
$| = 1; # unbuffered output
$SIG{ALRM} = \&handler;
alarm 1; # run handler every second
say 'Running sig_alarm_sleep';
$ls_l=`./sig_alarm_sleep`;
say ' done';
alarm 0;
my #fields = qw(rwx count user group size month day time name);
my #lines = split("\n",$ls_l);
my(#array);
for( #lines ) {
my $x->#{#fields} = split(' ',$_);
push #array, $x;
}
say Dumper(\#array);
exit 0;
sub handler {
print '.';
$SIG{ALRM} = \&handler;
alarm 1;
}
Bash script sig_alarm_sleep sample
#!/usr/bin/bash
sleep 20
ls -al

sleep function stops script [duplicate]

Today in my college a teacher asked me a question. He wrote this code on the paper and said
"What will be the output of this code?"
use warnings;
for (1 .. 20)
{
print ".";
}
I found it easy and said that it will loop 20 times and at each iteration it will print a dot (.) and hence total 20 dots will be the output.
He said you are right and then he made some changes in the code. The code was:
use warnings;
for (1 .. 20)
{
print ".";
sleep 1;
}
He said the what will be the output now? I didn't know about the sleep function, I guessed that at each iteration it will print the dot (.) and then it will wait for 1 second (because of the sleep function) and then again it will iterate and then again it will print (.) then it will wait for 1 second and so on...
The teacher told me to check it at home. I tried it at home and I came to know that the second code waits for 20 seconds and then it prints all dots (20 dots) at once. I want to know how this happened? Why isn't the dot (.) is getting print on each iteration?
The real issue has nothing to do with sleep, but rather that............
You are Suffering from Buffering. The link provided takes you to an excellent article from The Perl Journal circa 1998 from Marc Jason Dominus (the author of Higher-Order Perl). The article may be over a decade old, but the topic is as relevant today as it was when he wrote it.
Others have explained the $| = 1; technique. I would add to those comments that in the predominant thinking of the Perl community seems to be that $| = 1 is preferable over $|++ simply because it is clearer in its meaning. I know, autoincrement is pretty simple too, but does everyone who will ever look at your code know $|'s behavior when ++ or -- are applied (without looking it up in perlvar). I happen to also prefer to localize any modification of Perl's "special variables" so that the effects are not washing over into other portions of code that may not play nice with a particular change to default Perl behavior. So that being the case, I would write it as:
use strict;
use warnings;
{
local $| = 1;
for ( 1 .. 20 ) {
print '.';
sleep 1;
}
}
Perl, and many other programs, line-buffer output by default. You can set $| to 1 if you need unbuffered output.
It's not clearing the buffer. If there is a newline at the end of the print statement it will do that for you automatically:
use warnings;
for (1 .. 20) {
print ".\n";
sleep 1;
}
If you don't want the newline (I don't imagine you do) you can use the special autoflush variable $|. Try setting it to 1 or incrementing it.
use warnings;
$|++;
for (1 .. 20) {
print ".";
sleep 1;
}

Term::TermKey: How to enable wide mouse support?

When I run this script, the position of the mouse works up to the column 255 - then the count begins by 0. Does this mean that my terminal does not support SGR/mode 1006?
(edited due ak2's answer)
#!/usr/bin/env perl
use warnings;
use 5.12.0;
use utf8;
use Term::TermKey qw(FLAG_UTF8 FORMAT_LONGMOD FORMAT_MOUSE_POS);
my $tk = Term::TermKey->new( \*STDIN );
binmode STDOUT, ':encoding(UTF-8)' if $tk->get_flags & FLAG_UTF8;
$|++;
print "\e[?1003h";
print "\e[?1006h";
say "Quit with \"q\"";
while( 1 ) {
$tk->waitkey( my $key );
say $tk->format_key( $key, FORMAT_LONGMOD | FORMAT_MOUSE_POS );
last if $tk->format_key( $key, 0 ) eq 'q';
}
print "\e[?1006l";
print "\e[?1003l";
No.
It means you're not using the very lastest libtermkey library yet, the one that supports positions greater than column 255. Possibly because I haven't actually released it yet ;)
I'll let you know once that's up, along with the extra CSI capture support for position reporting, etc..
Also: If you have more libtermkey-specific questions, you might want to let me know more directly. E.g. you could email me to let me know you've posted a question; I don't always make a habit of searching them out. :)
Edit 2012/04/26: I've now released libtermkey 0.15 and Term::TermKey 0.14, which supports these columns above 255, along with the position report API.
Switching on mode 1006 changes the mouse event encoding, but it doesn't actually enable mouse reporting. For that, you'll need to switch on mode 1000 (click and release only), 1002 (click, release and drag), or 1003 (click, release, and any mouse movement).

Perl Program to Mimic RNA Synthesis

Looking for suggestions on how to approach my Perl programming homework assignment to write an RNA synthesis program. I've summed and outlined the program below. Specifically, I'm looking for feedback on the blocks below (I'll number for easy reference). I've read up to chapter 6 in Elements of Programming with Perl by Andrew Johnson (great book). I've also read the perlfunc and perlop pod-pages with nothing jumping out on where to start.
Program Description: The program should read an input file from the command line, translate it into RNA, and then transcribe the RNA into a sequence of uppercase one-letter amino acid names.
Accept a file named on the command line
here I will use the <> operator
Check to make sure the file only contains acgt or die
if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }
Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C)
not sure how to do this
Take this transcription & break it into 3 character 'codons' starting at the first occurance of "AUG"
not sure but I'm thinking this is where I will start a %hash variables?
Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)
Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)
If a gap is encountered a new line is started and process is repeated
not sure but we can assume that gaps are multiples of threes.
Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?
Note
Must be self contained program (stored values for codon names & symbols).
Whenever the program reads a codon that has no symbol this is a gap in the RNA, it should start a new line of output and begin at the next occurance of "AUG". For simplicity we can assume that gaps are always multiples of threes.
Before I spend any additional hours on research I am hoping to get confirmation that I'm taking the right approach. Thanks for taking time to read and for sharing your expertise!
1. here I will use the <> operator
OK, your plan is to read the file line by line. Don't forget to chomp each line as you go, or you'll end up with newline characters in your sequence.
2. Check to make sure the file only contains acgt or die
if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }
In a while loop, the <> operator puts the line read into the special variable $_, unless you assign it explicitly (my $line = <>).
In the code above, you're reading one line from the file and discarding it. You'll need to save that line.
Also, the ne operator compares two strings, not one string and one regular expression. You'll need the !~ operator here (or the =~ one, with a negated character class [^acgt]. If you need the test to be case-insensitive, look into the i flag for regular expression matching.
3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).
As GWW said, check your biology. T->U is the only step in transcription. You'll find the tr (transliterate) operator helpful here.
4. Take this transcription & break it into 3 character 'codons' starting at the first occurance of "AUG"
not sure but I'm thinking this is where I will start a %hash variables?
I would use a buffer here. Define an scalar outside the while(<>) loop. Use index to match "AUG". If you don't find it, put the last two bases on that scalar (you can use substr $line, -2, 2 for that). On the next iteration of the loop append (with .=) the line to those two bases, and then test for "AUG" again. If you get a hit, you'll know where, so you can mark the spot and start translation.
5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)
Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)
Again, as GWW said, build a hash table:
%codons = ( AUG => 'M', ...).
Then you can use (for eg.) split to build an array of the current line you're examining, build codons three elements at a time, and grab the correct aminoacid code from the hash table.
6.If a gap is encountered a new line is started and process is repeated
not sure but we can assume that gaps are multiples of threes.
See above. You can test for the existence of a gap with exists $codons{$current_codon}.
7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?
You know, looking at the above, it seems way too complex. I built a few building blocks; the subroutines read_codon and translate: I think they help the logic of the program immensely.
I know this is a homework assignment, but I figure it might help you get a feel for other possible approaches:
use warnings; use strict;
use feature 'state';
# read_codon works by using the new [state][1] feature in Perl 5.10
# both #buffer and $handle represent 'state' on this function:
# Both permits abstracting reading codons from processing the file
# line-by-line.
# Once read_colon is called for the first time, both are initialized.
# Since $handle is a state variable, the current file handle position
# is never reset. Similarly, #buffer always holds whatever was left
# from the previous call.
# The base case is that #buffer contains less than 3bp, in which case
# we need to read a new line, remove the "\n" character,
# split it and push the resulting list to the end of the #buffer.
# If we encounter EOF on the $handle, then we have exhausted the file,
# and the #buffer as well, so we 'return' undef.
# otherwise we pick the first 3bp of the #buffer, join them into a string,
# transcribe it and return it.
sub read_codon {
my ($file) = #_;
state #buffer;
open state $handle, '<', $file or die $!;
if (#buffer < 3) {
my $new_line = scalar <$handle> or return;
chomp $new_line;
push #buffer, split //, $new_line;
}
return transcribe(
join '',
shift #buffer,
shift #buffer,
shift #buffer
);
}
sub transcribe {
my ($codon) = #_;
$codon =~ tr/T/U/;
return $codon;
}
# translate works by using the new [state][1] feature in Perl 5.10
# the $TRANSLATE state is initialized to 0
# as codons are passed to it,
# the sub updates the state according to start and stop codons.
# Since $TRANSLATE is a state variable, it is only initialized once,
# (the first time the sub is called)
# If the current state is 'translating',
# then the sub returns the appropriate amino-acid from the %codes table, if any.
# Thus this provides a logical way to the caller of this sub to determine whether
# it should print an amino-acid or not: if not, the sub will return undef.
# %codes could also be a state variable, but since it is not actually a 'state',
# it is initialized once, in a code block visible form the sub,
# but separate from the rest of the program, since it is 'private' to the sub
{
our %codes = (
AUG => 'M',
...
);
sub translate {
my ($codon) = #_ or return;
state $TRANSLATE = 0;
$TRANSLATE = 1 if $codon =~ m/AUG/i;
$TRANSLATE = 0 if $codon =~ m/U(AA|GA|AG)/i;
return $codes{$codon} if $TRANSLATE;
}
}
I can give you a few hints on a few of your points.
I think your first goal should be to parse the file character by character, ensuring each one is valid, group them into sets of three nucleotides and then work on your other goals.
I think your biology is a bit off as well, when you transcribe DNA to RNA you need to think about what strands are involved. You may not need to "complement" your bases during your transcription step.
2. You should check this as your parse the file character by character.
3. You could do this with a loop and some if statements or hash
4. This could probably be done with a counter as you read the file character by character. Since you need to insert a space after every 3rd character.
5. This would be a good place to use a hash that's based on the amino acid codon table.
6. You'll have to look for the gap character as you parse the file. This seems to contradict your #2 requirement since the program says your text can only contain ATGC.
There are a lot of perl functions that could make this easier. There are also perl modules such as bioperl. But I think using some of these could defeat the purpose of your assignment.
Look at BioPerl and browse the source-modules for indicators on how to go about it.