How to centrally justify in printf function in Perl - perl

I know the printf function by default uses right-justification. - will make it left justify. But is it possible to make it centrally justify the format?

The printf function cannot center text.
However, there is a very old, and almost forgotten mechanism that can do this. You can create format statements in Perl that tells write statements how to print. By using format and write, you can center justify text.
This was something sort of done back in the days of Perl 3.x back in 1989, but sort of abandoned by the time Perl 4 came out. Perl 5, with its stronger variable scoping really put a crimp in the use of formats since using them would violate the way Perl 5 likes to scope variables (formats are global in nature).
You can learn more about it by looking at perldoc perlform. I haven't seen them used in years.

my #lines = (
"It is true that printf and sprintf",
"do not have a conversion to center-justify text.",
"However, you can achieve the same effect",
"by padding left-justified text",
"with an appropriate number of spaces."
);
my $max_length = 0;
foreach my $line (#lines) {
$max_length = (length $line > $max_length) ? length $line : $max_length;
}
foreach my $line (#lines) {
printf "%s%-${max_length}s\n", ' ' x int(($max_length - length $line)/2), $line;
}

You need to use two variables for each value you'd like to print and dynamically set the width of each around the value width. The problem becomes a little trickier if you want consistent total widths when your value has an odd/even string length. The following seems to do the trick!
use POSIX;
printf( "<%*s%*s>\n",
((10+length($_))/2), $_,
ceil((10-length($_))/2), "" )
for( qw( four five5 six666 7seven7 ) );
which prints
< four >
< five5 >
< six666 >
< 7seven7 >

You need to know the line width for this. For example, printing centered lines to the terminal:
perl -lne 'BEGIN {$cols=`tput cols`} print " " x (($cols-length)/2),$_;' /etc/passwd
Of course, this is not a printf formatting tag.

my #lines = (
"Some example lines",
"of differing length",
"to show a different approach",
"that actually",
"prints your content",
"centered in its max-width block",
"with minimal padding",
"(which the answer",
"by Sam Choukri does NOT do)."
);
# Get the required field width
my $max_length = 0;
foreach my $line ( #lines )
{
$max_length = length( $line ) if ( length( $line ) > $max_length );
}
foreach my $line ( #lines )
{
# First step, find out how much padding is required
my $padding = $max_length - length( $line );
# Get half that amount (rounded up) in spaces
$padding = ( ' ' x ( ( $padding + $padding % 2 ) / 2 ) );
# Print your output, with padding appended, right-justified to
# a max-width field.
# (Pipe character added to show that trailing padding is correct)
printf "%${max_length}s|\n", $line . $padding;
}

Related

Efficient way to read columns in a file using Perl

I have an input file like so, separated by newline characters.
AAA
BBB
BBA
What would be the most efficient way to count the columns (vertically), first with first, second with second etc etc.
Sample OUTPUT:
ABB
ABB
ABA
I have been using the following, but am unable to figure out how to remove the scalar context from it. Any hints are appreciated:
while (<#seq_prot>){
chomp;
my #sequence = map substr (#seq_prot, 1, 1), $start .. $end;
#sequence = split;
}
My idea was to use the substring to get the first letter of the input (A in this case), and it would cycle for all the other letters (The second A and B). Then I would increment the cycle number + 1 so as to get the next line, until I reached the end. Of course I can't seem to get the first part going, so any help is greatly appreciated, am stumped on this one.
Basically, you're trying to transpose an array.
This can be done easily using Array::Transpose
use warnings;
use strict;
use Array::Transpose;
die "Usage: $0 filename\n" if #ARGV != 1;
for (transpose([map {chomp; [split //]} <>])) {
print join("", map {$_ // " "} #$_), "\n"
}
For an input file:
ABCDEFGHIJKLMNOPQRS
12345678901234
abcdefghijklmnopq
ZYX
Will output:
A1aZ
B2bY
C3cX
D4d
E5e
F6f
G7g
H8h
I9i
J0j
K1k
L2l
M3m
N4n
O o
P p
Q q
R
S
You'll have to read in the file once for each column, or store the information and go through the data structure later.
I was originally thinking in terms of arrays of arrays, but I don't want to get into References.
I'm going to make the assumption that each line is the same length. Makes it simpler that way. We can use split to split your line into individual letters:
my = $line = "ABC"
my #split_line = split //, $line;
This will give us:
$split_line[0] = "A";
$split_line[1] = "B";
$split_line[2] = "C";
What if we now took each letter, and placed it into a #vertical_array.
my #vertical_array;
for my $index ( 0..##split_line ) {
$vertical_array[$index] .= "$split_line[$index];
}
Now let's do this with the next line:
$line = "123";
#split_line = split //, $line;
for my $index ( 0..##split_line ) {
$vertical_array[$index] .= "$split_line[$index];
}
This will give us:
$vertical_array[0] = "A1";
$vertical_array[1] = "B2";
$vertical_array[2] = "C3";
As you can see, I'm building the $vertical_array with each interation:
use strict;
use warnings;
use autodie;
use feature qw(say);
my #vertical_array;
while ( my $line = <DATA> ) {
chomp $line;
my #split_line = split //, $line;
for my $index ( 0..$#split_line ) {
$vertical_array[$index] .= $split_line[$index];
}
}
#
# Print out your vertical lines
#
for my $line ( #vertical_array ) {
say $line;
}
__DATA__
ABC
123
XYZ
BOY
FOO
BAR
This prints out:
A1XBFB
B2YOOA
C3ZYOR
If I had used references, I could probably have built an array of arrays and then flipped it. That's probably more efficient, but more complex. However, that may be better at handling lines of different lengths.

replace a string of characters with the line number

I have a text file that has approximately 3,000 lines. 99% of the time I need all 3,000 lines. However, periodically I will grep out the lines I need and direct the output to another text file to use.
The only problem I have in doing so, is: Embedded in the text file is a 6 character string of numbers that indicate the line number. In order to use the file, this area needs to be correctly renumbered...(I don't need to re-sort the data, but I need to replace the current six characters with the new line number. and it must be padded with zeros! Unfortuantely the entire rows is one long row of data with no field separators!
For example, my first three rows might look something like:
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000999MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000027SILLMORERANDOMDATAFOLLOWSAFTERTHIS
The six characters at positions 17-22 (Immediately following the "ZZ"), need be renumbered based on the current row number...so the above needs to look like:
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000002MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000003SILLMORERANDOMDATAFOLLOWSAFTERTHIS
Any ideas would be greatly appreciated!
Thanks,
KSL.
Here's the solution I came up with Perl. It assumes that the numbering is always 6 digits after the ZZ sequence.
In convert.pl:
use strict;
use warnings;
my $i = 1; # or the value you want to start numbering
while (<STDIN>) {
my $replace = sprintf("%06d", $i++);
$_ =~ s/ZZ\d{6}/ZZ$replace/g;
print $_;
}
In data.dat:
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000999MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000027SILLMORERANDOMDATAFOLLOWSAFTERTHIS
To run:
cat data.dat | perl convert.pl
Output
20130918082020ZZ000001RANDOMDATAFOLLOWSAFTERTHISABCDEFGH
20130810112000ZZ000002MORERANDOMDATAFOLLOWSAFTERTHISABCD
20130810112000ZZ000003SILLMORERANDOMDATAFOLLOWSAFTERTHIS
If I would solve this, I would create a simple python script to read those lines by filtering as grep does and using a internal counter from inside the python script.
As simple hints you can read each line in a string and access them using variablename[17:22] (17:22 is the position of the string you are trying to use).
Now, there is a method in the string in python which does the replace, just replace the values by the counter you create.
I hope this helps.
To do this in awk:
awk '{print substr($0,1,16) sprintf("%06d", NR) substr($0,23)}'
or
gawk 'match($0,/^(.*ZZ)[0-9]{6}(.*)/,a) {print a[1] sprintf("%06d",NR) a[2]}'
This is exactly the type of thing where unpack is useful.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
while( my $line = <> ){
chomp $line;
my #elem = unpack 'A16 A6 A*', $line;
$elem[1] = sprintf '%06d', $.;
# $. is the line number for the last used file handle
say #elem;
}
Actually looking at the lines, it looks like there is date information stored in the first 14 characters.
Assuming that at some point you might want to parse the lines for some reason you can use the following as an example of how you could use unpack to split up the lines.
#!/usr/bin/env perl
use v5.10.0; # say()
use strict;
use warnings;
use DateTime;
my #date_elem = qw'
year month day
hour minute second
';
my #elem_names = ( #date_elem, qw'
ZZ
line_number
random_data
');
while( my $line = <> ){
chomp $line;
my %data;
#data{ #elem_names } = unpack 'A4 (A2)6 A6 A*', $line;
# choose either this:
$data{line_number} = sprintf '%06d', $.;
say #data{#elem_names};
# or this:
$data{line_number} = $.;
printf '%04d' . ('%02d'x5) . "%2s%06d%s\n", #data{ #elem_names };
# the choice will affect the contents of %data
# this just shows the contents of %data
for( #elem_names ){
printf qq'%12s: "%s"\n', $_, $data{$_};
}
# you can create a DateTime object with the date elements
my $dt = DateTime->new(
(map{ $_, $data{$_} } #date_elem),
time_zone => 'floating',
);
say $dt;
print "\n";
}
Although it would be better to use a regular expression, so that you could throw out bogus data.
use v5.14; # /a modifier
...
my $rdate = join '', map{"(\\d{$_})"} 4, (2)x5;
my $rx = qr'$rdate (ZZ) (\d{6}) (.*)'xa;
while( my $line = <> ){
chomp $line;
my %data;
unless( #data{ #elem_names } = $line =~ $rx ){
die qq'unable to parse line "$line" ($.)';
}
...
It would be better still; to use named capture groups added in 5.10.
...
my $rx = qr'
(?<year> \d{4} ) (?<month> \d{2} ) (?<day> \d{2} )
(?<hour> \d{2} ) (?<minute> \d{2} ) (?<second> \d{2} )
ZZ
(?<line_number> \d{6} )
(?<random_data> .* )
'xa;
while( my $line = <> ){
chomp $line;
unless( $line =~ $rx ){
die qq'unable to parse line "$line" ($.)';
}
my %data = %+;
# for compatibility with previous examples
$data{ZZ} = 'ZZ';
...

How to "pad" a variable-length string to have an aligned last column

I have an input of the following format:
09:08:11 XXXXXXXXXXXXX 1.1.1.1
09:09:03 YYYYYYYY 2.2.2.2
09:12:37 ZZZZ 3.3.3.3
I am able to extract these individuals fields easily using the regex /(\S+)\s+(\S+)\s+(\S+)\s+/. I named them $time, $name, and $number. My problem is I want to display this so that the $number aligns perfectly. Because the $name can be of any length, what is the best solution for this?
I would like for the output to look like this. Please note that I had to use dots to align the last field since I wasn't able use the spacebar to do this, not sure why. Anyhoo.
09:08:11 XXXXXXXXXXXXX 1.1.1.1
09:09:03 YYYYYYYY 2.2.2.2
09:12:37 ZZZZ 3.3.3.3
I thought about putting the $name into an array. And then use a function to find the one with the longest character count. Finally I would pad out the shorter name to match the longest name. Is there a better and more efficient way to do this?
For formatting, try using the sprintf function, as in
$line = sprintf "%-12s %-20s %-s\n", $time, $name, $number;
(the minus sign means left-justify)
It was given by someone here:
my($name, $telephone, $stime);
format Infos =
#<<<<<<<<<<<<<<<<<<#<<<<<<<<<<<<<<<<<<<#<<<<<<<<<<<<<<<
$name, $telephone, $stime
.
$~ = 'Infos';
($name, $telephone, $stime) = ('Name:', 'Telephone:', 'Start Time:');
write;
($name, $telephone, $stime) = ('Mike Bax', 'tel-41596', 'Fri 8/22 13:31');
write;
You can change variables names, and it should work for you.
Have a look at Perl6::Form.
This is the recommended approach over using format. See this previous Stack Overflow answer: What other languages have features and/or libraries similar to Perl's format?
Here is an example of Perl6::Form doing just what you want with your data:
use Perl6::Form;
my ($time, $name, $number);
my #data = (
[ qw(09:08:11 XXXXXXXXXXXXX 1.1.1.1) ],
[ qw(09:09:03 YYYYYYYY 2.2.2.2) ],
[ qw(09:12:37 ZZZZ 3.3.3.3) ],
);
for my $line (#data) {
push #$time, $line->[0];
push #$name, $line->[1];
push #$number, $line->[2];
}
print form
'{[[[[[[} {[[[[[[[[[[[[[[[[[[[[} {[[[[[[}',
$time, $name, $number;
NB. If your data fields ever become larger than the form fields specified then you will run into formatting issues (and same thing also applies to sprintf and format solutions).
It's easy to fix depending on what your output requirements are. For example, if you want to maintain tabular stability then...
print form { layout => 'tabular' },
'{[[[[[[} {[[[[[[[[[[[[[[[[[[[[} {[[[[[[}',
$time, $name, $number;
...will word wrap each column for you.
If you do not know the maximum width of the middle field, you will have to make two passes over the data as #ijw notes.
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw( :seek );
my $max_length = 0;
my $data_pos = tell DATA;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my $name = (split ' ', $line)[1];
my $name_length = length $name;
$max_length = $name_length if $name_length > $max_length;
}
seek DATA, $data_pos, SEEK_SET;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my ($date, $name, $ip) = split ' ', $line;
printf "%s %-${max_length}s %s\n", $date, $name, $ip;
}
__DATA__
09:08:11 XXXXXXXXXXXXX 1.1.1.1
09:09:03 YYYYYYYY 2.2.2.2
09:12:37 ZZZZ 3.3.3.3
Whatever you do, you're going to have to do with two passes - one to find the length, one to print, using that length in the formatting. I would go with the sprintf answer already given for the formatting, personally, just varying the number in it.
Exactly how you do the two passes rather depends on where the data is coming from - reading a file twice might be wiser than preprocessing it into memory if the file is huge, but the array solution you're proposing sounds good to me.
$lineLength = length($line1);
write;
}
close FD1;
format STDOUT =
#>>>>> #*
$lineLength,$line1
.
;
So, in above example, "$lineLength" is pushed to the right with leading spaces.

How can I print a matching line and the next three lines in Perl?

I need to search for a pattern and write that line as well as the next 3 lines into a file (FILE). Is this a correct way to do it? Thanks.
print FILE if /^abc/;
$n=3 if /^abc/;
print FILE if ($n-- > 0);
I like .. operator:
perl -ne 'print if (/abc/ and $n=3) .. not $n--'
but you doesn't have described what should happen if abc pattern is repeated in following three lines. If you want restart counter, your approach is correct if fix a little bug with double print.
perl -ne'$n=4 if/abc/;print if$n-->0'
This is a feature of the command-line grep(1). No programming needed:
grep abc --after-context=3
You do get '--' lines between groups of context, but those are easy enough to strip. It's also easy enough to do the whole thing in Perl. :)
The trick is what you want to do when one of the following three lines also contains the pattern you're looking for. grep(1) will reset the counter and keep printing lines.
You could simplify it to using a flag variable to know if you should print a line:
while( <$input> ) {
$n=4 if /^abc/;
print FILE if ($n-- > 0);
}
Besides simplification, it also fixes a problem: in your version the abc string will be printed twice.
There is no need to slurp the file in or try to write your code on a single line:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $line = <DATA> ) {
if ( $line =~ /^abc/ ) {
print $line;
print scalar <DATA> for 1 .. 3;
}
}
__DATA__
x
y
z
abc
1
2
3
4
5
6
Another possible solution...
#!/usr/bin/perl
use strict;
my $count = 0;
while (<DATA>) {
$count = 1 if /abc/;
if ($count >= 1 and $count <= 3) {
next if /abc/;
print;
$count++;
}
}
__DATA__
test
alpha
abc
1
234123
new_Data
test
I'd rather take a few extra lines of code and make everything more clear. Something like this should work:
my $count = 0;
while ( my $line = pop #file ) {
if ( /^abc/ ) {
$count = 4;
}
if ( $count > 0 ) {
print FILE $line;
$count--;
}
}
Edit to respond to comments:
missing the regex was a bug, fixed now.
printing the newlines or not is certainly optional, as is slurping the file in. Different people have different styles, that's one of the things that people like about Perl!

How can I extract numeric data from a text file?

I want the Perl script to extract a data from a text file and save it as another text file. Each line of the text file contains an URL to a jpg like "http://pics1.riyaj.com/thumbs/000/082/104//small.jpg". I want the script to extract the last 6 numbers of each jpg URL, (i.e 082104) to a variable. I want the variable to be added to a different location on each line of the new text.
Input text:
text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text
text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text
Output text:
text php?id=82104 text
text php?id=569315 text
Thanks
What have you tried so far?
Here's a short program that gives you the meat of the problem, and you can add the rest of it:
while( )
{
s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
print;
}
This is very close to the command-line program the handles the looping and printing for you with the -p switch (see the perlrun documentation for the details):
perl -pi.old -e 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile > outputfile
I didn't know whether to answer according to what you described ("last 6 digits") or just assume that it all fits the pattern you showed. So I decided to answer both ways.
Here is a method that can handle lines more diverse than your examples.
use FileHandle;
my $jpeg_RE = qr{
(.*?) # Anything, watching out for patterns ahead
\s+ # At least one space
(?> http:// ) # Once we match "http://" we're onto the next section
\S*? # Any non-space, watching out for what follows
( (?: \d+ / )* # At least one digit, followed by a slash, any number of times
\d+ # another group of digits
) # end group
\D*? # Any number of non-digits looking ahead
\.jpg # literal string '.jpg'
\s+ # At least one space
(.*) # The rest of the line
}x;
my $infile = FileHandle->new( "<$file_in" );
my $outfile = FileHandle->new( ">$file_out" );
while ( my $line = <$infile> ) {
my ( $pre_text, $digits, $post_text ) = ( $line =~ m/$jpeg_RE/ );
$digits =~ s/\D//g;
$outfile->printf( "$pre_text php?id=%s $post_text\n", substr( $digits, -6 ));
}
$infile->close();
However, if it's just as regular as you show, it gets a lot easier:
use FileHandle;
my $jpeg_RE = qr{
(?> \Qhttp://pics1.riyaj.com/thumbs/\E )
\d{3}
/
( \d{3} )
/
( \d{3} )
\S*?
\.jpg
}x;
my $infile = FileHandle->new( "<$file_in" );
my $outfile = FileHandle->new( ">$file_out" );
while ( my $line = <$infile> ) {
$line =~ s/$jpeg_RE/php?id=$1$2/g;
$outfile->print( $line );
}
$infile->close();