Add character to all nodes in phylogenetic tree - perl

My goal is to add a double underscore ("__") at all nodes (internal and branches) or at least to the internal nodes of a tree in Newick format.
The tree looks like this:
(EP00698_Gefionella_okellyi,(EP00699_Malawimonas_sp_californiana,EP00700_Malawimonas_jakobiformis)),((EP00004_Rigifila_ramosa,(EP00001_Collodictyon_triciliatum,EP00002_Diphylleia_rotans)),EP00003_Mantamonas_plastica)
I tried to add them with perl:
#!/usr/bin/perl
use Bio::Phylo::IO 'parse';
my $newick = '(EP00698_Gefionella_okellyi,(EP00699_Malawimonas_sp_californiana,EP00700_Malawimonas_jakobiformis)),((EP00004_Rigifila_ramosa,(EP00001_Collodictyon_triciliatum,EP00002_Diphylleia_rotans)),EP00003_Mantamonas_plastica);';
my $tree = parse( '-format' => 'newick', '-string' => $newick )->first;
$tree->visit(
sub {
my $node = shift;
$node->set_name( __ ) if $node->is_internal;
}
);
print $tree->to_newick();;
As a result, it gives me the exact same tree, without the modifications that I am trying to get.

Related

How to add a scalar value (string) to an existing hash in Perl

I just want to know what the process is behind merging a value into a hash.
I have a hash which has 5 to 6 keys depending on if the error outputs runtime values. The method that takes in the arguments also take an error message string in first, also. I want it to be able to add this error message string into the hash, to make one big hash basically.
This is how the method would be called:
ASC::Builder::Error->new("Simple error message here", code => "UNABLE_TO_PING_SWITCH_ERROR", switch_ip => $ip3, timeout => $t1);
The last two values assign runtime parameters/values to keys inside the context key in the error hash.
Here is a look at the error hash:
use constant ERROR_CODE => {
UNABLE_TO_PING_SWITCH_ERROR => {
category => 'Connection Error',
template => 'Could not ping switch %s in %s seconds.',
context => [qw(switch_ip timeout)],
tt => {template => 'disabled'},
fatal => 1,
wiki_page => 'www.error-solution.com/ERROR_CODE_A',
}
};
Here is my method to manipulate the error hash and construct the message
sub _create_error_hash {
my $error_string = shift; if(defined($params{code}) {
my $first_param = delete $params{code};
foreach my $key (#{$first_param->{context}}) {
$first_param->{$key} = $key;
}
my #template_args = map { $first_param->{$_}} #{$first_param->{context} };
$first_param->{message} = sprintf($first_param->{template}, #template_args); }
return bless $first_param;
}
sub _merge_hashes {
my ($message = {message => $messsage}, $first_param = {first_param => $first_param}) = # _;
#merge these two hashes and bless into $class (don't know how to merge hashes)
my %merged_hash = ($message, $first_param);
return bless $merged_hash, $class;
}
The output of _create_hash should be the input for _merge_hashes
Not sure if I have handled that properly. These methods will be use inside the new method (which is a mess right now) hence why it's not included.
That's just an attempt , of an example I seen on perlmonks, Here is the link:
http://www.perlmonks.org/?node_id=14263
I'm going to start with the simple explanation of how to merge a hash in perl, it's pretty simple.
use strict;
use warnings;
use Data::Printer;
my (%a, %b, %c);
%a = (a => 1, b => 2);
%b = (a => 0, c => 3, d => 4);
%c = (%a, %b);
p(%c); # { a => 0, b => 2, c => 3, d => 4 }
You'll note with the a keys that if there are duplicates whatever value appears in the second set will be the one that 'wins'.
I honestly have no idea what that second function is doing because it references variables that don't exist on practically every line. (The create one also does this but only on a couple lines).
From your description I think you only want a single key added so you don't really need to do that though, you should be able to just add the key to the original object: $error->{messsage} = $message
But assuming you did want to pass two hash references in and merge them, it would look something like this:
sub merge {
my ($first, $second) = #_;
my %merged = (%$first, %$second);
return bless \%merged, $class;
}

XML::Twig perl SVG filter with minimum parsing possible? start_tag_handler?

The filter code snippet below does something very simple.
If a top level SVG group is tagged with a label contained in a matchlist, its visibility is enforced (< g style="display:) and the group, as well as non grouped code, is copied as is to standard output (flush). Else, the group is removed from the stream (purge).
The question is this:
As I understand it, each whole group (which may be large) is parsed in turn and stored in memory, which is not necessary as only the start tag is examined and modified. What would be the code modifications to stop reading the file after the start tag and then to fly down to the matching < /g> end tag, copying the traversed code or not, without parsing anything than the nesting of tags?
I read that start_tag_handler is supposed to do that, but I couldn't manage to use flush or purge correctly.
As a bonus, return if $g->parent->gi eq "g"; which may be incorrect would no longer be necessary as interior groups would no longer be handled (before the exterior group).
Reports of any error would of course be appreciated: trifle: $indent seems to do nothing; anything wrong?
Full code and examples to play with # http://www.papou.byethost9.com/notes/svg-sieve/
svg-sieve -d all to test the inline example with debug.
svg-sieve -d ma -l Ixelles,language=ru,names=ru Brussels-municipalities.svg > Brussels-Ixelles-ru.svg
-d mi to discover the layers.
< thanks times=1000 />
André.
my $twig = new XML::Twig( twig_roots => { 'g' => \&g },
twig_print_outside_roots => 1, pretty_print => "$indent",);
sub g { my ($t, $g) = #_;
return if $g->parent->gi eq "g";
my $label=$g->att("inkscape:label");
$label or $label=$g->att("id");
if ( $label ~~ #matchlist | $label =~ /^BASE-/) {
my $style=$g->att("style");
$style =~ s/display:[^;]*(;?)/display:inline\1/;
$g->set_att( style => "$style");
$g-> flush;
} else {
$g-> purge;
}
}
Firstly, $indent won't do anything if, as a Perl var, it is not set to one of the property values Twig expects ('nice', 'indented' etc.)
Secondly, I'm not quite sure what you're aiming at. It seems like your memory concerns lie solely with the elements you don't want. If so, you may not need to use tag_handlers.
You can explicitly match the elements you want by joining your #matching on "|" in the Xpath of a user-defined handler. I would also flush $twig just after your parse line.
You also don't need this #return if $g->parent->gi eq "g"; if your XPath to <g> is absolute, e.g. '/root/parent/g'.
my $twig_handlers = {
'g[string(label) =~ /['.join('|',#matchlist).']/]' => sub { g_handler($_) },
};
my $twig = new XML::Twig(
twig_roots => { 'g' => 1 },
twig_handlers => $twig_handlers,
twig_print_outside_roots => 1, pretty_print => $indent,);
sub g_handler {
my $g = shift;
my $label=$g->att("label");
$label or $label=$g->att("id");
if ( $label ~~ #matchlist | $label =~ /^BASE-/) {
my $style=$g->att("style");
$style =~ s/display:[^;]*(;?)/display:inline$1/;
$g->set_att( style => "$style");
$g-> flush;
}
}

Creating subs on the fly from eval-ed string in perl

I need to transform data structures from a list of arrays into a tree-like one. I know the depth of the tree before I start processing the data, but I want to keep things flexible so I can re-use the code.
So I landed upon the idea of generating a subref on the fly (from within a Moose-based module) to go from array to tree. Like this (in a simplified way):
use Data::Dump qw/dump/;
sub create_tree_builder {
my $depth = shift;
return eval join '', 'sub { $_[0]->{$_[',
join(']}->{$_[', (1..$depth)),
']} = $_[', $depth + 1 , '] }';
}
my $s = create_tree_builder(5);
my $tree = {};
$s->($tree, qw/one two three four five/, 'a value');
print dump $tree;
# prints
# {
# one => { two => { three => { four => { five => "a value" } } } },
# }
This opened up worlds to me, and I'm finding cool uses for this process of eval-in a parametrically generated string into a function all over the place (clearly, a solution in search of problems).
However, it feels a little too good to be true, almost.
Any advice against this practice? Or suggestion for improvements?
I can see clearly that eval-ing arbitrary input might not be the safest thing, but what else?
Follow up
Thanks for all the answers. I used amon's code and benchmarked a bit, like this:
use Benchmark qw(:all) ;
$\ = "\n";
sub create_tree_builder {
my $depth = shift;
return eval join '', 'sub { $_[0]->{$_[',
join(']}->{$_[', (1..$depth)),
']} = $_[', $depth + 1 , '] }';
}
my $s = create_tree_builder(5);
$t = sub {
$_[0] //= {};
my ($tree, #keys) = #_;
my $value = pop #keys;
$tree = $tree->{shift #keys} //= {} while #keys > 1;
$tree->{$keys[0]} = $value;
};
cmpthese(900000, {
'eval' => sub { $s->($tree, qw/one two three four five/, 'a value') },
'build' => sub { $t->($tree, qw/one two three four five/, 'a value') },
});
The results are clearly in favour of building the tree, not of the eval'ed factory:
Rate build eval
build 326087/s -- -79%
eval 1525424/s 368% --
I'll admit I could have done that before. I'll try with more random trees (rather than assigning the same element over and over) but I see no reason that the results should be different.
Thanks a lot for the help.
It is very easy to write a generalized subroutine to build such a nested hash. It is much simpler that way than writing a factory that will produce such a subroutine for a specific number of hash levels.
use strict;
use warnings;
sub tree_assign {
# Create an empty tree if one was not given, using an alias to the original argument
$_[0] //= {};
my ($tree, #keys) = #_;
my $value = pop #keys;
$tree = $tree->{shift #keys} //= {} while #keys > 1;
$tree->{$keys[0]} = $value;
}
tree_assign(my $tree, qw/one two three four five/, 'a value');
use Data::Dump;
dd $tree;
output
{
one => { two => { three => { four => { five => "a value" } } } },
}
Why this might be a bad idea
Maintainability.
Code that is eval'd has to be eval'd inside the programmers head first– not always an easy task. Essentially, evaling is obfuscation.
Speed.
eval re-runs the perl parser and compiler, before normal execution resumes. However, the same technique can be used to gain start-up time by deferring compilation of subroutines until they are needed. This is not such a case.
There is more than one way to do it.
I like anonymous subroutines, but you don't have to use an eval to construct them. They are closures anyway. Something like
...;
return sub {
my ($tree, $keys, $value) = #_;
$#$keys >= $depth or die "need moar keys";
$tree = $tree->{$keys->[$_]} for 0 .. $depth - 1;
$tree->{$keys->[$depth]} = $value;
};
and
$s->($tree, [qw(one two three four five)], "a value");
would do something suprisingly similar. (Actually, using $depth now looks like a design error; the complete path is already specified by the keys. Therefore, creating a normal, named subroutine would probably be best.)
Understanding what the OP is doing a little better based on their comments, and riffing on Borodin's code, I'd suggest an interface change. Rather than writing a subroutine to apply a value deep in a tree, I'd write a subroutine to create an empty subtree and then work on that subtree. This allows you to work efficiently on the subtree without having to walk the tree on every operation.
package Root;
use Mouse;
has root =>
is => 'ro',
isa => 'HashRef',
default => sub { {} };
sub init_subtree {
my $self = shift;
my $tree = $self->root;
for my $key (#_) {
$tree = $tree->{$key} //= {};
}
return $tree;
}
my $root = Root->new;
my $subtree = $root->init_subtree(qw/one two three four/);
# Now you can quickly work with the subtree without having
# to walk down every time. This loop's performance is only
# dependent on the number of keys you're adding, rather than
# the number of keys TIMES the depth of the subtree.
my $val = 0;
for my $key ("a".."c") {
$subtree->{$key} = $val++;
}
use Data::Dump;
dd $root;
Data::Diver is your friend:
use Data::Diver 'DiveVal', 'DiveRef';
my $tree = {};
DiveVal( $tree, qw/one two three four five/ ) = 'a value';
# or if you hate lvalue subroutines:
${ DiveRef( $tree, qw/one two three four five/ ) } = 'a value';
use Data::Dump 'dump';
print dump $tree;

How to define multiple subsections for methods with Pod::Weaver?

I have some Moose classes that define several small groups of related methods. I would like to make these groups obvious in the package POD.
I use Dist::Zilla and Pod::Weaver with the =method command. Is it possible to insert some =head2-like commands between my =method commands to achieve the desired effect?
I wrote a post on how I did it for Redis::Client here: Falling in Love with Pod::Weaver.
The simplest thing to do is add custom Collect directives to your weaver.ini and organize your methods by giving each type a different custom POD command, like so:
[Collect / FOO METHODS]
command = foo_method
[Collect / BAR METHODS]
command = bar_method
[Collect / BAZ METHODS]
command = baz_method
Then write your POD like this
=foo_method blah blah
and Weaver will automatically collect them under their own =head1.
If you want to do something more complicated than that, you can write your own Pod::Weaver plugin. The gist is to search through the parsed POD for a custom command name and transform them by returning Pod::Elemental objects. Here's the plugin I wrote:
package Pod::Weaver::Plugin::RedisLinks;
# ABSTRACT: Add links to Redis documentation
use Moose;
with 'Pod::Weaver::Role::Transformer';
use Data::Dumper;
use Scalar::Util 'blessed';
use aliased 'Pod::Elemental::Element::Pod5::Ordinary';
sub transform_document {
my ( $self, $doc ) = #_;
my #children = $doc->children;
my #new_children;
foreach my $child( #{ $children[0] } ) {
if ( $child->can( 'command' )
&& $child->command =~ /^(?:key|str|list|hash|set|zset|conn|serv)_method/ ) {
my $meth_name = $child->content;
$meth_name =~ s/^\s*?(\S+)\s*$/$1/;
my $cmd_name = uc $meth_name;
$cmd_name =~ tr/_/ /;
my $link_name = $meth_name;
$link_name =~ tr/_/-/;
my $new_para = Ordinary->new(
content => sprintf 'Redis L<%s|%s> command.',
$cmd_name, 'http://redis.io/commands/' . $link_name );
push #new_children, $child, $new_para;
next;
}
push #new_children, $child;
}
$doc->children( \#new_children );
}
__PACKAGE__->meta->make_immutable;
1;
The transform_document method gets passed the parsed document as a parameter. It then goes through the top-level commands looking for elements labeled /^(?:key|str|list|hash|set|zset|conn|serv)_method/, munges the name a bit, and then builds a new POD paragraph containing the formatted POD content that I want.

Perl hash of hashes of hashes of hashes... is there an 'easy' way to get an element at the end of the list?

I have a Perl hash of hashes of ... around 11 or 12 elements deep. Please forgive me for not repeating the structure below!
Some of the levels have fixed labels, e.g. 'NAMES', 'AGES' or similar so accessing these levels are fine as I can use the labels directly, but I need to loop over the other variables which results in some very long statements. This is an example of half of one set of loops:
foreach my $person (sort keys %$people) {
foreach my $name (sort keys %{$people->{$person}{'NAMES'}}) {
foreach my $age (sort keys %{$people->{$person}{'NAMES'}{$name}{'AGES'}}) {
. . . # and so on until I get to the push #list,$element; part
This is just an example, but it follows the structure of the one I have. It might be shorter not to have the fixed name sections (elements in caps) but they are required for reference purposes else where.
I tried to cast the elements as hashes to shorten it at each stage,
e.g. for the second foreach I tried various forms of:
foreach my $name (sort keys %{$person->{'NAMES'}})
but this didn't work. I'm sure I've seen something similar before, so the semantics may be incorrect.
I've studied pages regarding Hash of Hashes and references to hashes and their elements and so on without luck. I've seen examples of while each loops but they don't seem to be particularly shorter or easier to implement. Maybe there is just a different method of doing this and I'm missing the point. I've written out the full set of foreach loops once and it would be great if I don't have to repeat it another six times or so.
Of course, there may be no 'easy' way, but all help appreciated!
$person is the key, to shorten things for the inner loops you need to assign the value to something:
foreach my $person_key (sort keys %$people) {
my $person = $people->{$person_key};
my $names = $person->{NAMES};
foreach my $name (sort keys %$names) {
Also you can work with each keyword. This definetly should help.
while( my ($person, $val1) = each(%$people) ) {
while( my ($name, $val2) = each(%$val1) ) {
while( my ($age, $val3) = each(%$val2) ) {
print $val3->{Somekey};
You could use Data::Walk, which is kind of File::Find for data structures.
If you want to build a somewhat more flexible solution, you could traverse the data tree recursively. Consider this example data tree (arbitrary depth):
Example data
my %people = (
memowe => {
NAMES => {
memo => {AGE => 666},
we => {AGE => 667},
},
},
bladepanthera => {
NAMES => {
blade => {AGE => 42},
panthera => {AGE => 17},
},
},
);
From your question I concluded you just want to work on the leaves (AGEs in this case). So one could write a recursive traverse subroutine that executes a given subref on all leaves it could possibly find in key-sorted depth-first order. This subref gets the leave itself and a path of hash keys for convenience:
Preparations
sub traverse (&$#) {
my ($do_it, $data, #path) = #_;
# iterate
foreach my $key (sort keys %$data) {
# handle sub-tree
if (ref($data->{$key}) eq 'HASH') {
traverse($do_it, $data->{$key}, #path, $key);
next;
}
# handle leave
$do_it->($data->{$key}, #path, $key);
}
}
I think it's pretty clear how this guy works from the inlined comments. It would be no big change to execute the coderef on all nodes and not the leaves only, if you wanted. Note that I exceptionally added a prototype here for convenience because it's pretty easy to use traverse with the well-known map or grep syntax:
Executing stuff on your data
traverse { say shift . " (#_)" } \%people;
Also note that it works on hash references and we initialized the #path with an implicit empty list.
Output:
42 (bladepanthera NAMES blade AGE)
17 (bladepanthera NAMES panthera AGE)
666 (memowe NAMES memo AGE)
667 (memowe NAMES we AGE)
The given subroutine (written as a { block }) could do anything with the given data. For example this more readable push subroutine:
my #flattened_people = ();
traverse {
my ($thing, #path) = #_;
push #flattened_people, { age => $thing, path => \#path };
} \%people;