POSIX: strtod - question

Can someone explain how strtod works, or rather why I get 10,2 here despite the en_EN locale?
#!/usr/bin/env perl
use warnings;
use 5.012;
use POSIX qw(locale_h strtod);
setlocale( LC_NUMERIC, 'en_EN.UTF-8' );
my $str = '5,6';
$! = 0;
my ( $num, $n_unparsed ) = strtod( $str );
if ( $str eq '' or $n_unparsed != 0 or $! ) {
die "Non-numeric input $str" . ( $! ? ": $!\n" : "\n" );
}
say $num + 4.6;
# 10,2

Your ISO 3166 country code is wrong: EN is not a country code, so en_EN is not a valid locale name. Try US, i.e. en_US.UTF-8.
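A minimal sketch of how the failed setlocale call could have been noticed. It assumes that setlocale returns undef for an unknown locale name (some C libraries accept any name, so this is not guaranteed everywhere); in that case the previously active locale, presumably one with a decimal comma taken from the environment, stays in effect. The helper name parse_with_locale is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(locale_h strtod);

# Hypothetical helper: parse $str under the LC_NUMERIC locale $locale.
# Dies if the locale cannot be set instead of silently keeping the old one.
sub parse_with_locale {
    my ( $str, $locale ) = @_;
    defined setlocale( LC_NUMERIC, $locale )
        or die "Locale '$locale' is not available on this system\n";
    my ( $num, $n_unparsed ) = strtod( $str );
    return ( $num, $n_unparsed );
}

# The "C" locale always exists and uses '.' as the decimal point,
# so the ',6' part of '5,6' is reported as unparsed.
my ( $num, $n_unparsed ) = parse_with_locale( '5,6', 'C' );
print "parsed $num, $n_unparsed characters left unparsed\n";
```

Calling parse_with_locale( '5,6', 'en_EN.UTF-8' ) would be expected to die on systems that reject the name, which points directly at the typo.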

Perl: How to print a random section (word definition) from a dictionary file

I want to print a random English word from a dictionary file in a Unix terminal using Perl. I want to select a random line and print it together with the 2 following lines.
But my code doesn't do this.
Please help me to improve it.
An example of the output I want:
#inspire: ....
ghk
lko...
Dictionary file:
#inspiration: mean....
abc def...
ghk lmn
...
#inspire: ....
ghk
lko...
#people: ...
...
The complete dictionary file is here anhviet109K.txt. It's about 14MB
My code:
use strict;
use warnings;
use File::Copy qw(copy move);
my $files = 'anhviet109K.txt';
my $fh;
my $linewanted = 16 + int( rand( 513796 - 16 ) );
# 513796: number of lines of file dic.txt
open( $fh, "<", $files ) or die "cannot open < $files: $!";
my $del = " {2,}";
my $temp = 0;
my $count = 0;
while ( my $line = <$fh> ) {
if ( ( $line =~ "#" ) && ( $. > $linewanted ) ) {
$count = 4;
}
else {
next;
}
if ( $count > 0 ) {
print $line;
$count--;
}
else {
last;
}
}
close $fh;

Something like this, perhaps?
Your data has helped me to exclude the header entries in your dictionary file.
This program finds the location of all of the entries (lines beginning with #) in the file, then chooses one at random and prints it.
Good luck with learning English.
use strict;
use warnings 'all';
use Fcntl ':seek';
use constant FILE => 'anhviet109K.txt';
open my $fh, '<', FILE or die qq{Unable to open "@{[FILE]}" for input: $!};
my @seek; # Locations of all the definitions
my $addr = tell $fh;
while ( <$fh> ) {
push @seek, $addr if /^\#(?!00-)/;
$addr = tell $fh;
}
my $choice = $seek[rand @seek];
seek $fh, $choice, SEEK_SET;
print scalar <$fh>;
while ( <$fh> ) {
last if /^\#/;
print;
}
output
#finesse /fi'nes/
* danh từ
- sự khéo léo, sự phân biệt tế nhị
- mưu mẹo, mánh khoé
* động từ
- dùng mưu đoạt (cái gì); dùng mưu đẩy (ai) làm gì; dùng mưu, dùng kế
=to finesse something away+ dùng mưu đoạt cái gì

A single pass approach:
use strict;
use warnings;
use autodie;
open my $fh, '<:utf8', 'anhviet109K.txt';
my $definition = '';
my $count;
my $select;
while (my $line = <$fh>) {
if ($line =~ /^#(?!00-)/) {
++$count;
$select = rand($count) < 1;
if ($select) {
$definition = $line;
}
}
elsif ($select) {
$definition .= $line;
}
}
# remove blank line that some entries have
$definition =~ s/^\s+\z//m;
binmode STDOUT, ':utf8';
print $definition;
This iterative random selection always selects the first item, has a 1/2 chance of replacing it with the second item, a 1/3 for the third, and so on.
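The selection rule can be isolated from the file handling. The following is only a sketch of the same idea on a plain list; the function name pick_one is hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Pick one element uniformly at random in a single pass.
# The k-th element replaces the current choice with probability 1/k,
# so the first element is always taken initially (rand(1) < 1).
sub pick_one {
    my @items = @_;
    my ( $choice, $count );
    for my $item ( @items ) {
        ++$count;
        $choice = $item if rand( $count ) < 1;
    }
    return $choice;
}

print pick_one( qw( inspiration inspire people ) ), "\n";
```

Each of n items ends up selected with probability 1/n: item k is picked with probability 1/k and then survives each later item j with probability (j-1)/j, and the product of these factors is 1/n.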

How to print data in column form in Perl?

I have a program that prints the contents of arrays in rows. I would like it to print each array in a column next to each other.
This is the code:
#!/usr/local/bin/perl
use strict;
use warnings;
my @M_array;
my @F_array;
open (my $input, "<", 'ssbn1898.txt');
while ( <$input> ) {
chomp;
my ( $name, $id ) = split ( /,/ );
if ( $id eq "M" ) {
push ( @M_array, $name );
}
else {
push ( @F_array, $name );
}
}
close ( $input );
print "M: @M_array \n";
print "F: @F_array \n";
Is this possible or am I trying to do something that can't be done?
Desired format:
M F
Namem1 Namef1
Namem2 Namef2

You can put whatever separator you like between your data by using the join function. The example below prints the data from your arrays separated by tabs:
...
use List::MoreUtils qw/pairwise/;
my $separator = "\t";
print join($separator, qw(M F)), "\n";
print join(
"\n",
pairwise { ( $a // '') . $separator . ( $b // '') } @M_array, @F_array
), "\n";
...

I think you should use Perl formats. Have a look at the Perl documentation (perlform). You may want to use the @* format field in your case.
I extended your code in order to print the desired output at the end
use strict;
use warnings;
my @M_array;
my @F_array;
open (my $input, "<", 'ssbn1898.txt');
while ( <$input> ) {
chomp;
my ( $name, $id ) = split ( /,/ );
if ( $id eq "M" ) {
push ( @M_array, $name );
}
else {
push ( @F_array, $name );
}
}
close ( $input );
unshift @M_array, 'M';
unshift @F_array, 'F';
my $namem;
my $namef;
my $max = 0;
$max = (length($_) > $max ? length($_) : $max) for @M_array;
my $w = '@' . '<' x $max;
eval "
format STDOUT =
$w @*
\$namem, \$namef
.
";
while ( @M_array or @F_array) {
$namem = shift @M_array || '';
$namef = shift @F_array || '';
write;
}

join is probably the simplest approach to take; tabs will align your columns nicely.
join ( "\t", @array ),
Alternatively, perl allows formatting via (s)printf:
printf ( "%-10s %-10s", "first", "second" );
Or via a more detailed 'format'.
Given that what you're trying to do is put your two arrays into columns, though:
#!/usr/local/bin/perl
use strict;
use warnings;
my $format = "%-10s\t%-10s\n";
my @M_array = qw ( M1 M2 M3 M4 M5 );
my @F_array = qw ( F1 F2 F3 );
my $maxrows = $#M_array > $#F_array ? $#M_array : $#F_array;
printf ( $format, "M", "F" );
for my $rownum ( 0..$maxrows ) {
printf ( $format, $M_array[$rownum] // '', $F_array[$rownum] // '' );
}
This will print a header row, and then loop through your arrays printing one line at a time. // is the defined-or operator: it returns its left operand if that is defined, otherwise its right operand. It's only available in newer perls though*. In older versions || will do the trick. It's almost the same, but it also replaces '' and 0, because it tests for truth rather than definedness.
* Perl 5.10 onward, so it is pretty safe, but worth mentioning because some systems are still running perl 5.8.
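To illustrate the difference between the two operators, here is a small sketch; the helper names fill_defined and fill_true and the '-' placeholder are made up for the example.

```perl
use strict;
use warnings;
use 5.010;

# // replaces only undef; || replaces every false value (undef, '' and 0).
sub fill_defined { my ( $v ) = @_; return $v // '-'; }
sub fill_true    { my ( $v ) = @_; return $v || '-'; }

for my $v ( 0, '', undef, 'F1' ) {
    my $shown = defined $v ? "'$v'" : 'undef';
    printf "%-6s  // gives '%s'   || gives '%s'\n",
        $shown, fill_defined( $v ), fill_true( $v );
}
```

For name columns the two behave the same unless a name happens to be '0' or an empty string, which is the case where // is the safer choice.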

You can format the output with the sprintf function, but there are some more problems to solve: What if the arrays don't have the same number of entries? For this, you need a placeholder. How many characters must fit into a column? How should it be aligned? Some code for illustration:
#!/usr/bin/perl
use strict;
use warnings;
my @m = (1, 2, 3);
my @f = (11, 22, 33, 44);
# calculate how many rows to display
my $max = @m;
if (@m < @f) {
$max = @f;
}
# placeholder for missing data
my $none = '-';
# formatting 20 chars per column, left aligned
my $fmt = "%-20s%-20s\n";
# print header
print sprintf($fmt, "M", "F");
# print data rows
foreach my $i (0..$max-1) {
print sprintf($fmt, ($m[$i] // $none), ($f[$i] // $none));
}
If you are interested in more sophisticated formatting (for instance center-aligned text), you should switch to the special formatting capabilities Perl provides for report generation.

Borrowing from @HunterMcMillen
use strict;
use warnings;
use feature "say";
local $, = "\t"; # separator when printing list
my $i = (@F_array > @M_array) ? $#F_array : $#M_array;
say qw(M F);
say $M_array[$_] // "", $F_array[$_] // "" for 0 .. $i;

I guess Text::Table is the module you need. It is available from CPAN (you just need to install it). Go through the documentation:
Documentation of Text::Table
Pass the content of each row as a list to the add() method and it will do the rest for you.

Win32::Console: InputChar and codepage 65001

When I run this script in a Windows console where the active codepage is 65001 InputChar returns undef if I enter an ö (U+00F6). Does this mean that InputChar doesn't work with cp65001?
#!perl
use warnings;
use strict;
use 5.10.0;
use Devel::Peek;
use Win32::Console;
my $in = Win32::Console->new( STD_INPUT_HANDLE );
$in->Mode( ENABLE_PROCESSED_INPUT );
my $char = $in->InputChar();
Dump $char;
say "{$char}";
C:>chcp 65001
Active code page: 65001
C:>perl.pl
SV = NULL(0x0) at 0x12b6fac
REFCNT = 1
FLAGS = (PADMY)
Use of uninitialized value $char in concatenation (.) or string at ... line 21.
{}

If you look inside sub InputChar, you can see that it uses _ReadConsole, which doesn't handle Unicode (I think because it works with char * buffers rather than wide characters).
It also doesn't handle Unicode because of the way the Windows ReadConsole function is called; at least that is what the documentation hints at :)
update: OTOH, if I edit Win32-Console-0.10\Makefile.PL to add
DEFINE => ' -DUNICODE ',
and then recompile/reinstall Win32::Console, I can get AöBöCöDö10 into the program using the following
my $chars = ShInputChar( $in, 10 );
sub ShInputChar {
package Win32::Console;
my($self, $number) = @_;
return undef unless ref($self);
$number = 1 unless defined($number);
my $onumber = $number;
## double up or free to wrong pool, char versus wchar
$number = 2 * $number;
my $buffer = (" " x $number);
my $readed = _ReadConsole($self->{'handle'}, $buffer, $number) ;
my $err = sprintf "ErrSet \$!(%d)(%s)\n\$^E(%d)(%s)\n", $!,$!,$^E,$^E;
use Encode;
$buffer = Encode::decode('UTF-16LE', $buffer );
if ( $readed == $number or $onumber == $readed ) {
return $buffer;
}
else {
warn "wanted $number but read $readed returning buffer anyway";
return $buffer;
}
}
You should report this to the author, who is more knowledgeable about Win32.

I would be very cautious with libwin32 (of which Win32::Console is a part) as it was last updated over six years ago, in the early days of Windows Vista.
You may want to try Win32::Unicode::Console which has a very different API but is designed for your purpose.

How to write from n-th row to a file using perl

I have a source text in a file and am looking for code that would take the second (or, in general, the n-th) row from this file and print it to a separate file.
Any idea how to do this?

You can do this natively in Perl with the flip-flop operator and the special variable $. (used internally by ..), which contains the current line number:
# prints lines 3 to 8 inclusive from stdin:
while (<>)
{
print if 3 .. 8;
}
Or from the command line:
perl -wne'print if 3 .. 8' < filename.txt >> output.txt
To extract a single line (here line 3), you can also do this without Perl: head -n3 filename.txt | tail -n1 >> output.txt

You could always:
Read the whole file in and put it into one variable.
Split the variable at the newlines and store the result in an array.
Write the value at index 1 (for the second row), or at index n-1 in general, to the separate file.
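The three steps above can be sketched as follows. The helper name copy_nth_line and the file names in the usage note are hypothetical; reading the whole file is fine for small inputs, while line-by-line reading is preferable for large ones.

```perl
use strict;
use warnings;

# Hypothetical helper: copy line number $n (1-based) from $in_file to $out_file.
sub copy_nth_line {
    my ( $in_file, $out_file, $n ) = @_;

    # 1. Read the whole file into one variable.
    open my $in, '<', $in_file or die "Can't open '$in_file': $!";
    my $content = do { local $/; <$in> };
    close $in;

    # 2. Split the variable at the newlines and store the lines in an array.
    my @lines = split /\n/, $content;

    # 3. Write the element at index n-1 to the separate file.
    die "File has fewer than $n lines\n" if $n < 1 or $n > @lines;
    open my $out, '>', $out_file or die "Can't open '$out_file': $!";
    print {$out} $lines[ $n - 1 ], "\n";
    close $out or die "Can't close '$out_file': $!";
    return;
}

# Only runs when called as: script.pl <infile> <outfile> <n>
copy_nth_line( @ARGV ) if @ARGV == 3;
```

For example, copy_nth_line( 'input.txt', 'output.txt', 2 ) would write the second line of input.txt to output.txt.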

Use it like this: script.pl <infile> <line_num> > outfile (or >> outfile to append).
This uses lexical filehandles and three-argument open, which are preferred to global filehandles and two-argument open.
#!/usr/bin/perl
use strict;
use warnings;
use English qw( -no_match_vars );
use Carp qw( croak );
my ( $fn, $line_num ) = @ARGV;
open ( my $in_fh, '<', "$fn" ) or croak "Can't open '$fn': $OS_ERROR";
while ( my $line = <$in_fh> ) {
if ( $INPUT_LINE_NUMBER == $line_num ) {
print "$line";
}
}
note: $INPUT_LINE_NUMBER == $.
Here's a slightly improved version that handles an arbitrary number of line numbers and prints to a file.
script.pl <infile> <outfile> <num1> <num2> <num3> ...
#!/usr/bin/perl
use strict;
use warnings;
use English qw( -no_match_vars );
use Carp qw( croak );
use List::MoreUtils qw( any );
my ( $ifn, $ofn, @line_nums ) = @ARGV;
open ( my $in_fh , '<', "$ifn" ) or croak "can't open '$ifn': $OS_ERROR";
open ( my $out_fh, '>', "$ofn" ) or croak "can't open '$ofn': $OS_ERROR";
while ( my $line = <$in_fh> ) {
if ( any { $INPUT_LINE_NUMBER == $_ } @line_nums ) {
print { $out_fh } "$line";
}
}

I think this will do what you want:
line_transfer_script.pl:
open(READFILE, "<", "file_to_read_from.txt") or die "Cannot open input file: $!";
open(WRITEFILE, ">", "file_to_write_to.txt") or die "Cannot open output file: $!";
# The line to transfer; pass it in as the first argument to the script
my $line_to_print = $ARGV[0];
my $current_line_counter = 1; # line numbers are 1-based
while( my $current_line = <READFILE> ) {
if( $current_line_counter == $line_to_print ) {
print WRITEFILE $current_line;
}
$current_line_counter++;
}
close(WRITEFILE);
close(READFILE);
Then you'd call it like: perl line_transfer_script.pl 2 and that would write the 2nd line from file_to_read_from.txt into file_to_write_to.txt.

my $content = `tail -n +$line $input`;
open OUTPUT, ">$output" or die $!;
print OUTPUT $content;
close OUTPUT;

Find multiple substrings in strings and record location

The following is the script for finding consecutive substrings in strings.
use strict;
use warnings;
my $file="Sample.txt";
open(DAT, $file) || die("Could not open file!");
#worry about these later
#my $regexp1 = "motif1";
#my $regexp2 = "motif2";
#my $regexp3 = "motif3";
#my $regexp4 = "motif4";
my $sequence;
while (my $line = <DAT>) {
if ($line=~ /(HDWFLSFKD)/g){
{
print "its found index location: ",
pos($line), "-", pos($line)+length($1), "\n";
}
if ($line=~ /(HD)/g){
print "motif found and its locations is: \n";
pos($line), "-", pos($line)+length($1), "\n\n";
}
if ($line=~ /(K)/g){
print "motif found and its location is: \n";
pos($line), "-",pos($line)+length($1), "\n\n";
}
if ($line=~ /(DD)/g){
print "motif found and its location is: \n";
pos($line), "-", pos($line)+length($1), "\n\n";
}
}else {
$sequence .= $line;
print "came in else\n";
}
}
It matches substring1 with string and prints out position where substring1 matched. The problem lies in finding the rest of the substrings. For substrings2 it starts again from the beginning of the string (instead of starting from the position where substring1 was found). The problem is that every time it calculates position it starts from the beginning of string instead of starting from the position of the previously found substring. Since substrings are consecutive substring1, substring2, substring3, substring4, their positions have to occur after the previous respectively.

Try this perl program
use strict;
use warnings;
use feature qw'say';
my $file="Sample.txt";
open( my $dat, '<', $file) || die("Could not open file!");
my @regex = qw(
HDWFLSFKD
HD
K
DD
);
my $sequence;
while( my $line = <$dat> ){
chomp $line;
say 'Line: ', $.;
# reset the position of variable $line
# pos is an lvalue subroutine
pos $line = 0;
for my $motif ( @regex ){
# escape metacharacters in a copy, so that @regex itself is not modified
my $regex = quotemeta $motif;
if( scalar $line =~ / \G (.*?) ($regex) /xg ){
say $regex, ' found at location (', $-[2], '-', $+[2], ')';
if( $1 ){
say " but skipped: \"$1\" at location ($-[1]-$+[1])";
}
}else{
say 'Unable to find ', $regex;
# end loop
last;
}
}
}

I'm not a perl expert, but you can use the arrays @- and @+ (accessed as $-[0] and $+[0]) to track the index location of the last regex match found.
Below is code built on top of your code that explains this.
use strict;
use warnings;
my $file="sample.txt";
open(DAT, $file) || die("Could not open file!");
open (OUTPUTFILE, '>data.txt');
my $sequence;
my $someVar = 0;
my $sequenceNums = 1;
my $motif1 = "(HDWFLSFKD)";
my $motif2 = "(HD)";
my $motif3 = "(K)";
my $motif4 = "(DD)";
while (my $line = <DAT>)
{
$someVar = 0;
print "\nSequence $sequenceNums: $line\n";
print OUTPUTFILE "\nSequence $sequenceNums: $line\n";
if ($line=~ /$motif1/g)
{
&printStuff($sequenceNums, "motif1", $motif1, "$-[0]-$+[0]");
$someVar = 1;
}
if ($line=~ /$motif2/g and $someVar == 1)
{
&printStuff($sequenceNums, "motif2", $motif2, "$-[0]-$+[0]");
$someVar = 2;
}
if ($line=~ /$motif3/g and $someVar == 2)
{
&printStuff($sequenceNums, "motif3", $motif3, "$-[0]-$+[0]");
$someVar = 3;
}
if ($line=~ /$motif4/g and $someVar == 3)
{
&printStuff($sequenceNums, "motif4", $motif4, "$-[0]-$+[0]");
}
else
{
$sequence .= $line;
if ($someVar == 0)
{
&printWrongStuff($sequenceNums, "motif1", $motif1);
}
elsif ($someVar == 1)
{
&printWrongStuff($sequenceNums, "motif2", $motif2);
}
elsif ($someVar == 2)
{
&printWrongStuff($sequenceNums, "motif3", $motif3);
}
elsif ($someVar == 3)
{
&printWrongStuff($sequenceNums, "motif4", $motif4);
}
}
$sequenceNums++;
}
sub printStuff
{
print "Sequence: $_[0] $_[1]: $_[2] index location: $_[3] \n";
print OUTPUTFILE "Sequence: $_[0] $_[1]: $_[2] index location: $_[3]\n";
}
sub printWrongStuff
{
print "Sequence: $_[0] $_[1]: $_[2] was not found\n";
print OUTPUTFILE "Sequence: $_[0] $_[1]: $_[2] was not found\n";
}
close (OUTPUTFILE);
close (DAT);
Sample input:
MLTSHQKKFHDWFLSFKDSNNYNHDSKQNHSIKDDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSKQNHSIKDIFNRFNHYIYNDLGIRTIA
MLTSHQKKFSNNYNSKHDWFLSFKDQNHSIKDIFNRFNHYIYNDL

You really should read
perldoc perlre
perldoc perlreref
perldoc perlretut
You need the special variables @- and @+ if you need the positions. No need to try to compute them yourself.
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw( each_array );
my $source = 'AAAA   BBCCC   DD  E      FFFFF';
my $pattern = join '\s*', map { "($_+)" } qw( A B C D E F );
if ( $source =~ /$pattern/ ) {
my $it = each_array @-, @+;
$it->(); # discard overall match information;
while ( my ($start, $end) = $it->() ) {
printf "Start: %d - Length: %d\n", $start, $end - $start;
}
}
Start: 0 - Length: 4
Start: 7 - Length: 2
Start: 9 - Length: 3
Start: 15 - Length: 2
Start: 19 - Length: 1
Start: 26 - Length: 5

In list context, the result of a construct like
$line=~ /(HD)/g
is a list of all matches. In scalar context, each evaluation finds the next match, so use while to step through the hits.
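A minimal sketch of stepping through all hits with while, recording the offsets from @- and @+. The function name match_positions is hypothetical, and the test string is a shortened piece of the sample input.

```perl
use strict;
use warnings;

# Return [start, end] offsets of every match of $motif in $line.
# The end offset is the position just after the last matched character.
sub match_positions {
    my ( $line, $motif ) = @_;
    my @positions;
    # In scalar context, each //g match resumes where the previous one ended.
    while ( $line =~ /\Q$motif\E/g ) {
        push @positions, [ $-[0], $+[0] ];
    }
    return @positions;
}

for my $hit ( match_positions( 'MLTSHQKKFHDWFLSFKDSNNYNHDSKQNHSIKDD', 'HD' ) ) {
    print "HD found at $hit->[0]-$hit->[1]\n";
}
```

\Q...\E quotes the motif so that it is matched literally even if it contains regex metacharacters.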

To match where the last match left off, use \G. perldoc perlre says (but consult your own installation's version's manual first):
The "\G" assertion can be used to
chain global matches (using "m//g"),
as described in "Regexp Quote-Like
Operators" in perlop. It is also
useful when writing "lex"-like
scanners, when you have several
patterns that you want to match
against consequent substrings of your
string, see the previous reference.
The actual location where "\G" will
match can also be influenced by using
"pos()" as an lvalue: see "pos" in
perlfunc. Note that the rule for
zero-length matches is modified
somewhat, in that contents to the left
of "\G" is not counted when
determining the length of the match.
Thus the following will not match
forever:
$str = 'ABC';
pos($str) = 1;
while (/.\G/g) {
print $&;
}
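Applied to the motif question, \G can be combined with a lazy .*? so that each motif is searched for only after the end of the previous hit. This is a sketch under the assumption that the motifs are literal strings and must occur in the given order; the function name find_in_order is hypothetical.

```perl
use strict;
use warnings;

# Find each motif in turn, starting where the previous motif ended.
# Returns a list of [motif, start, end]; stops at the first motif not found.
sub find_in_order {
    my ( $line, @motifs ) = @_;
    my @hits;
    pos( $line ) = 0;    # start at the beginning of the string
    for my $motif ( @motifs ) {
        # \G anchors at pos($line); .*? skips ahead to the next occurrence
        last unless $line =~ /\G.*?(\Q$motif\E)/g;
        push @hits, [ $motif, $-[1], $+[1] ];
    }
    return @hits;
}

my $line = 'MLTSHQKKFHDWFLSFKDSNNYNHDSKQNHSIKDDIFNRFNHYIYNDLGIRTIA';
for my $hit ( find_in_order( $line, qw( HDWFLSFKD HD K DD ) ) ) {
    print "$hit->[0] found at $hit->[1]-$hit->[2]\n";
}
```

Because the motif sits in capture group 1, $-[1] and $+[1] give its own offsets rather than those of the skipped text.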