Populating Automatic Perl Variables when using Quantifiers - perl

I was trying to match the following line
5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80
with
if(/[[:xdigit:]{8}[:xdigit:]{8}\s]{4}/)
Is there any way I can populate the automatic variables $1, $2, $3 .. $8 with half of each of those words?
i.e
$1=5474c2ef
$2=012a759a
$3=c11ab88a
$4=e8daa276
$5=63693b53
$6=799c91f1
$7=be1d8c87
$8=38733d80

You could capture them in an array:
use strict;
use warnings;
use Data::Dumper;
$_ = '5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80 ';
my @nums = /\G(?:([[:xdigit:]]{8})([[:xdigit:]]{8})\s)/g;
if (@nums >= 8) {
print Dumper(\@nums);
}
(This may behave differently from the original pattern if there are more than four words, or if earlier 16-hex-digit sequences are separated by more than a single space.)

How about:
my $pat = '([[:xdigit:]]{8})\s?' x 8;
# produces: ([[:xdigit:]]{8})\s?([[:xdigit:]]{8})\s?....
/$pat/;
Update: if you need to be strict about the spacing requirement:
my $pat = join('\s', map{'([[:xdigit:]]{8})' x 2} (1..4));
# produces: ([[:xdigit:]]{8})([[:xdigit:]]{8})\s....
/$pat/;
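As a minimal, self-contained sketch of the strict-spacing pattern above in use: the variable names $line and @halves, and the \A and \z anchors, are assumptions added here for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

my $line = '5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80';

# Two 8-digit capture groups per word, words separated by one whitespace character.
my $pat = join '\s', map { '([[:xdigit:]]{8})' x 2 } 1 .. 4;

my @halves;
if ( $line =~ /\A$pat\z/ ) {
    # $1 .. $8 are set here; copy them before another match overwrites them.
    @halves = ( $1, $2, $3, $4, $5, $6, $7, $8 );
    print "\$1=$1 \$8=$8\n";
}
```

Copying the captures into an array right away is a defensive habit, since the next successful match resets $1 and friends.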

use strict;
use warnings;
use Data::Dumper;
$_ = '5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80 ';
if (/((?:[[:xdigit:]]{16}\s){4})/) {
my @nums = map { /(.{8})(.{8})/ } split /\s/, $1;
print Dumper(\@nums);
}
__END__
$VAR1 = [
'5474c2ef',
'012a759a',
'c11ab88a',
'e8daa276',
'63693b53',
'799c91f1',
'be1d8c87',
'38733d80'
];

Yes, there is, but you don’t want to.
You just want to do this:
while ( /(\p{ahex}{8})/g ) { print "got $1\n" }

Related

How to print comma "," in the middle of a sentence when necessary, Part 1, Perl

I need to print a comma "," when the list has more than one item, but I don't want a comma after the last item. I know I can use join to do this, but I can't simply join @NAMES with commas when there is another list, @FAMILIES, to combine with it.
#!/usr/bin/perl
use strict;
use warnings;
my @NAMES = qw(ALLIES BOBBY CAKRA);
my @FAMILIES = qw(A B C);
foreach my $names (@NAMES)
{
foreach my $families (@FAMILIES)
{
print "$names, // $families\n";
}
}
Expected Outcome:
ALLIES, // A
ALLIES, // B
ALLIES, // C
BOBBY, // A
BOBBY, // B
BOBBY, // C
CAKRA, // A
CAKRA, // B
CAKRA // C
I don't see an elegant and clean way, since you need to drop the comma only when both arrays are at their last element. So add an explicit condition, and iterate over indices so that the last elements can be singled out:
use warnings;
use strict;
use feature 'say';
my @names = qw(ALLIES BOBBY CAKRA);
my @families = qw(A B C);
for my $n (0..$#names) {
for my $f (0..$#families) {
say $names[$n],
( ($n == $#names and $f == $#families) ? ' // ' : ', // '),
$families[$f];
}
}
The parentheses around the condition of the ternary operator ( ? : ) are needed for precedence. Another way is to use && instead of and, which binds more tightly, but I didn't want the code to rely on a specific operator.
The syntax $#ary gives the index of the last element of @ary.
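To make the precedence point concrete, here is a small sketch comparing the three spellings. The flag variables $last_n and $last_f are hypothetical stand-ins for the two index comparisons in the answer.

```perl
use strict;
use warnings;

# Hypothetical flags: "at last name?" is false, "at last family?" is true.
my ( $last_n, $last_f ) = ( 0, 1 );

# Parenthesized `and`: the whole condition feeds the ternary.
my $with_parens = ( $last_n and $last_f ) ? ' // ' : ', // ';

# `&&` binds more tightly than ?:, so no parentheses are needed.
my $with_andand = $last_n && $last_f ? ' // ' : ', // ';

# Unparenthesized `and` binds more loosely than ?:, so this parses as
# $last_n and ( $last_f ? ' // ' : ', // ' ) and yields $last_n itself.
my $bare_and = ( $last_n and $last_f ? ' // ' : ', // ' );

print "parens: [$with_parens]  &&: [$with_andand]  bare and: [$bare_and]\n";
```

The first two forms pick the separator as intended; the third silently returns the false flag instead of a separator string.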
Special-casing the last element is always messy. There are a bunch of trade-offs, and you just end up choosing whichever looks least bad to you.
Here is another option, in addition to @zdim's perfectly good solution.
Note that this modifies @names and @families during execution (another trade-off); copying the arrays first is the easy fix if that is a problem.
#!/usr/bin/perl
use strict;
use warnings;
my @names = qw(ALLIES BOBBY CAKRA);
my @families = qw(A B C);
my $last_name = pop(@names);
foreach my $names (@names)
{
foreach my $families (@families)
{
print "$names, // $families\n";
}
}
my $last_family = pop(@families);
foreach my $families (@families)
{
print "$last_name, // $families\n";
}
print "$last_name // $last_family\n";
Using join would generally be the best answer, but that only works when the goal is to avoid a separator at the end of the line. (At least for a straightforward answer; I'm sure you could hack it.)
You can make use of Perl's $#array_name variables inside a for loop to check when you're at the end of both lists, like so:
#!/usr/bin/perl
use strict;
use warnings;
my @names = qw(ALLIES BOBBY CAKRA);
my @families = qw(A B C);
for my $i (0..$#names) {
for my $j (0..$#families) {
print "$names[$i]" . ($i == $#names && $j == $#families ? ' ' : ', ') . "// $families[$j]\n";
}
}
Also, just a note on style: the Perl Style Guide (try perldoc perlstyle) recommends using all-capital variable names only when they're constants. It's not a big deal, and definitely not required, but it can make it a little easier for others to follow your code. :)
The alternative is to separate the output from the cross-product generation, and handle the last case specially.
my @cross_product;
for my $n (0..$#names) {
for my $f (0..$#families) {
# store the values, not the indices, so they can be printed directly
push @cross_product, [ $names[$n], $families[$f] ];
}
}
if (@cross_product) {
say "$_->[0], // $_->[1]" for @cross_product[0..$#cross_product-1];
say "$_->[0] // $_->[1]" for $cross_product[-1];
}
You can even avoid using up any memory as follows:
use Set::CrossProduct qw( );
my $i = Set::CrossProduct->new([ \@names, \@families ]);
my $N = $i->cardinality;
say sprintf '%1$s%3$s // %2$s', $i->get(), $_?',':'' for -$N+1..0;
I thought of a variation to ikegami's (storing the results in a temporary array). There would be too many changes to comfortably fit in a comment, so here:
You could store the comma to the temporary list, too, and then remove it from only the last line.
#!/usr/bin/perl
use strict;
use warnings;
my @names = qw(ALLIES BOBBY CAKRA);
my @families = qw(A B C);
my @output_lines;
foreach my $name (@names) {
foreach my $family (@families) {
push @output_lines, [$name, ',', ' // ' . $family . "\n"];
}
}
if (@output_lines) {
$output_lines[-1][1] = ''; # remove comma from last line
print map { @$_ } @output_lines;
}

How to substring a string at several positions with Perl?

I have several places where I want to cut my string in several parts.
For example:
$string= "AACCAAGTAA";
@cut_places= {0,4, 8 };
My $string should look like this: AACC AAGT AA;
How can I do that?
To populate an array, use round parentheses, not curly brackets (they're used for hash references).
One possible way is to use substr where the first argument is the position, so you can use the array elements. You just need to compute the length by subtracting the position from the following one; and to be able to compute the last length, you need the length of the whole string, too:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'AACCAAGTAA';
my @cut_places = (0, 4, 8);
push @cut_places, length $string;
my @parts = map {
substr $string, $cut_places[$_], $cut_places[$_+1] - $cut_places[$_]
} 0 .. $#cut_places - 1;
say for @parts;
If the original array contained lengths instead of positions, the code would be much easier.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'AACCAAGTAA';
my @lengths = (4, 4, 2); # 4, 4, 4 would work, too
my @parts = unpack join("", map "A$_", @lengths), $string;
say for @parts;
See unpack for details.
Here's a solution that starts by calculating the forward differences in the list of positions. The length of the string is first appended to the end of the list if it doesn't already span the full string.
The differences are then used to build an unpack format string, which is used to build the required sequence of substrings.
I have written the functionality as a do block, which would be simple to convert to a subroutine if desired.
use strict;
use warnings 'all';
use feature 'say';
my $string = 'AACCAAGTAA';
my @cut_places = ( 0, 4, 8 );
my @parts = do {
my @places = @cut_places;
my $len = length $string;
push @places, $len unless $places[-1] >= $len;
my @w = map { $places[$_]-$places[$_-1] } 1 .. $#places;
my $patt = join ' ', map { "A$_" } @w;
unpack $patt, $string;
};
say "@parts";
output
AACC AAGT AA
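The answer notes that the do block would be simple to convert to a subroutine. One possible sketch of that conversion follows; the name cut_at is hypothetical, and it uses substr rather than unpack, so it illustrates the same position-difference idea rather than reproducing the answer's exact code.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string at the given start positions.
sub cut_at {
    my ( $string, @places ) = @_;
    my $len = length $string;

    # Append the string length unless the list already spans the whole string.
    push @places, $len unless @places && $places[-1] >= $len;

    # Each part runs from one position up to (not including) the next.
    return map {
        substr( $string, $places[ $_ - 1 ], $places[$_] - $places[ $_ - 1 ] )
    } 1 .. $#places;
}

my @parts = cut_at( 'AACCAAGTAA', 0, 4, 8 );
print "@parts\n";
```

Because the positions are copied into @places inside the sub, the caller's array is left untouched.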
Work out the lengths of needed parts first, then all methods are easier. Here regex is used
use warnings;
use strict;
use feature 'say';
my $string = 'AACCAAGTAA';
my @pos = (0, 4, 8);
my @lens = do {
my $prev = shift @pos;
"$prev", map { my $e = $_ - $prev; $prev = $_; $e } @pos;
};
my $patt = join '', map { '(.{'.$_.'})' } @lens;
my $re = qr/$patt/;
my @parts = grep { /./ } $string =~ /$re(.*)/g;
say for @parts;
The lengths @lens are computed by subtracting successive positions (second minus first, third minus second, and so on). I use do merely so that the $prev variable, unneeded elsewhere, doesn't "pollute" the rest of the code.
The "$prev" is quoted so that it is evaluated first, before it changes in map.
The matches returned by regex are passed through grep to filter out empty string(s) due to the 0 position (or whenever successive positions are the same).
This works for position arrays of any lengths, as long as positions are consistent with a string.

How do I create a set from a multi-line string in Perl?

I have a multiline string as input. For example: my $input="a\nb\nc\nd"
I would like to create a set from this input, so that I can determine whether elements from a vector of strings are present in the set. My question is, how do I create a set from a multi-line string in Perl?
split can be used to store lines into an array variable:
use warnings;
use strict;
use Data::Dumper;
my $input = "a\nb\nc\nd";
my @lines = split /\n/, $input;
print Dumper(\@lines);
__END__
$VAR1 = [
'a',
'b',
'c',
'd'
];
@toolic is right; split does the trick to grab the input.
But you might want to go a step further and put those values into a hash, if you want to check set membership later on. Something like this:
use warnings;
use strict;
my $input = "a\nb\nc\nd";
my @lines = split /\n/, $input;
my %set_contains;
# set a flag for each line in the set
for my $line (@lines) {
$set_contains{ $line } = 1;
}
Then you can quickly check set membership like this:
if ( $set_contains{ $my_value } ) {
do_something( $my_value );
}
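A more compact variant of the same idea, sketched under the assumption that only membership matters: a hash slice creates all the keys at once, and exists tests membership. The @candidates list is a hypothetical stand-in for the "vector of strings" from the question.

```perl
use strict;
use warnings;

my $input = "a\nb\nc\nd";

# Hash slice: every line becomes a key; the values are irrelevant (undef).
my %set;
@set{ split /\n/, $input } = ();

my @candidates = qw(b x d);    # hypothetical "vector of strings" to test
my @present = grep { exists $set{$_} } @candidates;
my @missing = grep { !exists $set{$_} } @candidates;

print "present: @present\n";
print "missing: @missing\n";
```

Using exists rather than a truth test means the stored values never matter, so undef is enough.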

perl: naming hash variables read from YAML

I'm reading some information from a YAML file
groceries.yaml
# comment
fruit:
  apples: 1
  oranges: 1
  grapes: 1
vegetables:
  potatoes: 1
  onions: 1
  leeks: 1
into a perl script
myscript.pl
#!/usr/bin/perl
use strict;
use warnings;
use YAML::Tiny;
# Create a YAML file
my $stuff = YAML::Tiny->new;
# Open the config
$stuff = YAML::Tiny->read( 'groceries.yaml' );
print "Fruit: ", %{($stuff->[0]->{fruit})},"\n";
print "Vegetables: ", %{($stuff->[0]->{vegetables})},"\n";
exit
This works fine, but I would like to have one hash for fruit and one for vegetables. My naive attempt was
my @keys = keys %{($stuff->[0])};
foreach my $key (@keys){
my %{ $key } = %{($stuff->[0]->{$key})},"\n";
}
but clearly this doesn't work.
I'd love to understand what I'm doing wrong, and am open to different work flows that accomplish the same idea :)
Try this :
#!/usr/bin/perl
use strict;
use warnings;
use YAML::Tiny;
use Data::Dumper;
# Create a YAML file
my $stuff = YAML::Tiny->new;
# Open the config
$stuff = YAML::Tiny->read( 'groceries.yaml' );
my %fruits = %{ $stuff->[0]->{fruit} };
my %vegetables = %{ $stuff->[0]->{vegetables} };
The extra parentheses in your code are not needed:
%{($stuff->[0]->{$key})},"\n";
They are harmless, though. The real problem is my %{ $key }: my cannot declare a variable whose name is only known at run time.
To iterate over the HASHes,
use Data::Dumper;
# ...
foreach my $key (keys %{ $stuff->[0] }) {
print Dumper $stuff->[0]->{$key};
}
Edit2
#!/usr/bin/perl
use strict;
use warnings;
use YAML::Tiny;
use Data::Dumper;
# Create a YAML file
my $stuff = YAML::Tiny->new;
# Open the config
$stuff = YAML::Tiny->read( 'groceries.yaml' );
my %top_h;
foreach my $key (keys %{ $stuff->[0] }) {
$top_h{$key} = $stuff->[0]->{$key};
}
print Dumper \%top_h;
This solution enables you to access %fruit and %vegetables. They are declared as package global variables using our so that they will be in the symbol table, which then allows you to use symbolic references or glob assignments. You'll also need to turn off strict refs to enable this. Also see this reference.
use strict;
use warnings;
use YAML::Tiny;
use Data::Dumper;
my $stuff = YAML::Tiny->read('groceries.yaml');
my %groceries = %{$stuff->[0]};
our %fruit;
our %vegetables;
{
no strict 'refs';
#no strict 'vars'; # don't need above 'our' declarations with this
while (my ($key, $val) = each %groceries) {
%$key = %$val;
# or *$key = $val;
}
}
print Dumper \%fruit;
If you don't know the keys upfront, then you'll also need to turn off strict vars so you don't need to declare the hashes before assigning to them. But then you might get a warning when you use the hash directly.
But having said all of that, I think it would be simplest to just use %groceries.
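As a sketch of what "just use %groceries" can look like in practice: the hash literal below is a stand-in for $stuff->[0] as read from groceries.yaml (an assumption, so the example runs without YAML::Tiny), and @report is a hypothetical name.

```perl
use strict;
use warnings;

# Stand-in for the first YAML document ($stuff->[0]) from groceries.yaml.
my %groceries = (
    fruit      => { apples   => 1, oranges => 1, grapes => 1 },
    vegetables => { potatoes => 1, onions  => 1, leeks  => 1 },
);

my @report;
for my $category ( sort keys %groceries ) {
    my $items = $groceries{$category};    # hash reference for this category
    push @report, ucfirst($category) . ': ' . join( ', ', sort keys %$items );
}
print "$_\n" for @report;

# Individual values are reachable without any per-category variables.
my $apples = $groceries{fruit}{apples};
```

New top-level keys in the YAML file then show up automatically, with no new variables to declare.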
my ( $fruit, $vegetables) = @{$stuff->[0]}{ qw<fruit vegetables> };
If you want to do this in a loop, first, I would save the first "document" to a local reference.
my $yaml = $stuff->[0];
And then in a while loop, do this:
while ( my ( $k, $v ) = each %$yaml ) {
say ucfirst( $k ) . ': ' . join( ', ', %$v );
}
You could also use List::Pairwise and do this:
mapp { say ucfirst( $a ) . ': ' . join( ', ', %$b ) } %{ $stuff->[0] };

How can I add values in each row and column and print at the end in Perl?

Below is the sample csv file
date,type1,type2,.....
2009-07-01,n1,n2,.....
2009-07-02,n21,n22,....
and so on...
I want to add the values in each row and each column and print at the end and bottom of each line. i.e.
date,type1,type2
2009-07-01,n1,n2,.....row_total1
2009-07-02,n21,n22,....row_total2
Total,col_total1,col_total2,......total
Please suggest.
Less elegant and shorter:
$ perl -plaF, -e '$r=0;$r+=$F[$_],$c[$_]+=$F[$_]for 1..$#F;$_.=",$r";END{$c[0]="Total";print join",",@c}'
Quick and dirty, but should do the trick in basic cases. For anything more complex, use Text::CSV and an actual script.
An expanded version as it's getting a little hairy:
#! perl -plaF,
$r=0;
$r+=$F[$_], $c[$_]+=$F[$_] for 1..$#F;
$_.=",$r";
END { $c[0]="Total"; print join ",", @c }
Here is a straightforward way which you can easily build upon depending on your requirements:
use strict;
use warnings;
use 5.010;
use List::Util qw(sum);
use List::MoreUtils qw(pairwise);
use Text::ParseWords;
our ($a, $b);
my @header = parse_csv( scalar <DATA> );
my @total = (0) x @header;
output_csv( @header, 'row_total' );
for my $line (<DATA>) {
my @cols = parse_csv( $line );
my $label = shift @cols;
push @cols, sum @cols;
output_csv( $label, @cols );
@total = pairwise { $a + $b } @total, @cols;
}
output_csv( 'Total', @total );
sub parse_csv {
chomp( my $data = shift );
quotewords ',', 0, $data;
}
sub output_csv { say join ',' => @_ }
__DATA__
date,type1,type2
2009-07-01,1,2
2009-07-02,21,22
Outputs the expected:
date,type1,type2,row_total
2009-07-01,1,2,3
2009-07-02,21,22,43
Total,22,24,46
Some things to take away from the above are the uses of List::Util and List::MoreUtils:
# using List::Util::sum
my $sum_of_all_values_in_list = sum @list;
# using List::MoreUtils::pairwise
my @two_arrays_added_together = pairwise { $a + $b } @array1, @array2;
Also, while I've used Text::ParseWords in my example, you should really look into using Text::CSV. This module covers more bizarre CSV edge cases and also provides correct CSV composition (my output_csv() sub is pretty naive!).
/I3az/
Like JB's perlgolf candidate, except prints the end line totals and labels.
#!/usr/bin/perl -alnF,
use List::Util qw(sum);
chomp;
push @F, $. == 1 ? "total" : sum(@F[1..$#F]);
print "$_,$F[-1]";
for (my $i=1;$i<@F;$i++) {
$totals[$i] += $F[$i];
}
END {
$totals[0] = "Total";
print join(",",@totals);
};
Is this something that needs to be done for sure in a Perl script? There is no "quick and dirty" method to do this in Perl. You will need to read the file in, accumulate your totals, and write the file back out (processing input and output line by line would be the cleanest).
If this is a one-time report, or you are working with a competent user base, the data you want can most easily be produced with a spreadsheet program like Excel.
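If it does have to be Perl, the read, accumulate, and write steps described above might look roughly like this. It is a sketch using only core Perl; the sub name add_totals is hypothetical, and it assumes simple CSV with no quoted fields or embedded commas (otherwise a CSV module is the safer choice).

```perl
use strict;
use warnings;

# Sketch: takes chomped CSV lines (header first), returns the lines with
# a row total appended to each and a final column-totals line.
sub add_totals {
    my @lines = @_;
    my @out   = ( shift(@lines) . ',row_total' );    # header line
    my @col_totals;
    for my $line (@lines) {
        my ( $label, @values ) = split /,/, $line;
        my $row_total = 0;
        $row_total += $_ for @values;
        push @values, $row_total;    # the row total is summed as a column too
        $col_totals[$_] += $values[$_] for 0 .. $#values;
        push @out, join ',', $label, @values;
    }
    push @out, join ',', 'Total', @col_totals;
    return @out;
}

my @result = add_totals( 'date,type1,type2', '2009-07-01,1,2', '2009-07-02,21,22' );
print "$_\n" for @result;
```

For a real file, the same loop body could run inside while (<$fh>) after a chomp, printing each line as it goes instead of collecting them.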
Whenever I work with CSV, I use the AnyData module. It may add a bit of overhead, but it keeps me from making mistakes ("Oh crap, that date column is quoted and has commas in it!?").
The process for you would look something like this:
use AnyData;
my @columns = qw/date type1 type2 type3/; ## Define your input columns.
my $input = adTie( 'CSV', 'input_file.csv', 'r', {col_names => join(',', @columns)} );
push @columns, 'total'; ## Add the total columns.
my $output = adTie( 'CSV', 'output_file.csv', 'o', {col_names => join(',', @columns)} );
my %totals;
while ( my $row = each %$input ) {
next if ($. == 1); ## Skip the header row. AnyData will add it to the output.
my $sum = 0;
foreach my $col (@columns[1..3]) {
$totals{$col} += $row->{$col};
$sum += $row->{$col};
}
$totals{total} += $sum;
$row->{total} = $sum;
$output->{$row->{date}} = $row;
}
$output->{Total} = \%totals;
print adDump( $output ); ## Prints a little table to see the data. Not required.
undef $input; ## Close the file.
undef $output;
Input:
date,type1,type2,type3
2009-07-01,1,2,3
2009-07-03,31,32,33
2009-07-06,61,62,63
"Dec 31, 1969",81,82,83
Output:
date,type1,type2,type3,total
2009-07-01,1,2,3,6
2009-07-03,31,32,33,96
2009-07-06,61,62,63,186
"Dec 31, 1969",81,82,83,246
Total,174,178,182,534
The following in Perl does what you want; it's not elegant, but it works :-)
Call the script with the inputfile as argument, results in stdout.
chop($_ = <>);
print "$_,Total\n";
while (<>) {
chop;
@_ = split(/,/); # assign explicitly: split no longer fills @_ implicitly in modern Perl
shift(@_);
$sum = 0;
for ($n = 0; 0 < scalar(@_); $n++) {
$c = shift(@_);
$sum += $c;
$sums[$n] += $c;
}
$total += $sum;
print "$_,$sum\n";
}
print "Total";
for ($n = 0; $n <= $#sums; $n++) {
print "," . $sums[$n];
}
print ",$total\n";
Edit: fixed for 0 values.
The output is like this:
date,type1,type2,type3,Total
2009-07-01,1, 2, 3,6
2009-07-02,4, 5, 6,15
Total,5,7,9,21