HashLists in Perl

#!/usr/bin/perl -w
use strict;
my $string = $ARGV[0];
my @caracteresSeparados = split(//,$string);
my $temp;
my @complementoADN;
foreach my $i (@caracteresSeparados) {
    if ($i eq 'a') {
        $temp = 't';
        push(@complementoADN, $temp);
    } elsif ($i eq 'c') {
        $temp = 'g';
        push(@complementoADN, $temp);
    } elsif ($i eq 'g') {
        $temp = 'c';
        push(@complementoADN, $temp);
    } elsif ($i eq 't') {
        $temp = 'a';
        push(@complementoADN, $temp);
    }
}
printf("@complementoADN\n");
I have this code, which receives one string made of the letters A, C, G and T as an argument.
The user should type only those letters, and the script should then print the same string to the console with each letter replaced, that is:
A is replaced by T
C is replaced by G
G is replaced by C
T is replaced by A
I'm not preventing the user from entering other letters, but that's not a problem for now.
An example:
The user passes the argument: ACAACAATGT
The program should print: TGTTGTTACA
My script does this correctly.
My question is: can I do it with hashes? If so, can you show me a working version of the script that uses a hash? Thanks a lot :)

It doesn't involve hashes, but if you're looking to simplify your program, look up the tr// ("transliteration") operator. I believe the below will be identical to yours:
#!/usr/bin/perl -w
use strict;
my $string = $ARGV[0];
my $complementoADN = $string;
$complementoADN =~ tr/ACGT/TGCA/;
print $complementoADN, "\n";
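A small variation on the script above, as a sketch only: the /r flag (Perl 5.14 or newer) makes tr return the changed copy instead of modifying the variable in place, and listing both cases keeps lowercase input working too. The sub name complement_tr is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;    # the /r flag on tr/// needs Perl 5.14 or newer

# complement_tr is a hypothetical helper name, not part of the answer above.
sub complement_tr {
    my ($dna) = @_;
    # /r returns the transliterated copy and leaves $dna untouched;
    # characters that are not listed pass through unchanged.
    return $dna =~ tr/ACGTacgt/TGCAtgca/r;
}

say complement_tr('ACAACAATGT');    # TGTTGTTACA
```

Letters outside A, C, G, T are left as they are, which matches the behaviour of the plain tr version.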

IMSoP's answer is correct - in this case, tr is the most appropriate tool for the job.
However, yes, it can also be done with a hash:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
my $original = 'ACAACAATGT';
my $replaced;
my %flip = (
A => 'T',
C => 'G',
G => 'C',
T => 'A',
);
for (split '', $original) {
$replaced .= $flip{$_};
}
say $replaced;
Output:
TGTTGTTACA
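The loop above emits a warning if the input contains a letter that is not a key of %flip. One possible way around that, as a sketch: look each character up with map and fall back to the character itself. The sub name complement_hash and the choice to pass unknown letters through are assumptions, not part of the original answer.

```perl
use strict;
use warnings;
use 5.010;    # for say and the // (defined-or) operator

my %flip = ( A => 'T', C => 'G', G => 'C', T => 'A' );

# complement_hash is a hypothetical helper name used only for this sketch.
sub complement_hash {
    my ($dna) = @_;
    # Look each character up in the hash; when there is no entry,
    # keep the original character instead of appending undef.
    return join '', map { $flip{$_} // $_ } split //, $dna;
}

say complement_hash('ACAACAATGT');    # TGTTGTTACA
say complement_hash('ACNT');          # TGNA
```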

Related

How can I combine Data::Dumper and Statistics::Diversity::Shannon into a whole loop?

I want to combine these two scripts to get the Shannon Diversity Index.
How can I do that?
The first script uses Data::Dumper to get the count of each unique name.
#!perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys=1;
my @names = qw(A A A A B B B C D);
my %counts;
$counts{$_}++ for @names;
printf "\$VAR1 = { %s};\n",
join ' ',
map "$_ ",
sort { $b <=> $a }
values(%counts);
exit;
This is the output
$VAR1 = { 4 3 1 1 };
Then I can input it into the second function.
The second function is using Statistics::Diversity::Shannon to get Shannon Diversity Index.
#!perl
use warnings;
use strict;
use Statistics::Diversity::Shannon;
my @data = qw( 4 3 1 1 );
my $d = Statistics::Diversity::Shannon->new( data => \@data );
my $H = $d->index();
my $E = $d->evenness();
print "$d/$H/$E";
exit;
How can I combine these two scripts into one, so that the original data set (A A A A B B B C D) is used to get the Shannon Diversity Index?
Data::Dumper is a debugging tool, not a serializing tool. Not a good one, at least.
But you aren't even using Data::Dumper. You're using something far worse.
Let's start by using something acceptable like JSON.
#!/usr/bin/perl
use strict;
use warnings;
use Cpanel::JSON::XS qw( encode_json );
{
my @names = qw( A A A A B B B C D );
my %counts; ++$counts{$_} for @names;
my @data = sort { $b <=> $a } values(%counts);
print encode_json(\@data);
}
(Note that the sort { $b <=> $a } doesn't appear required.)
And this is one way to read it back in:
#!/usr/bin/perl
use strict;
use warnings;
use Cpanel::JSON::XS qw( decode_json );
use Statistics::Diversity::Shannon qw( );
{
my $json = do { local $/; <> };
my $data = decode_json($json);
my $d = Statistics::Diversity::Shannon->new( data => $data );
my $H = $d->index();
my $E = $d->evenness();
print "$H/$E\n";
}
Above, I assumed you meant "work together" when you said "combine into whole loop".
On the other hand, maybe you meant "combine into a single file". If that's the case, then you can use the following:
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Diversity::Shannon qw( );
{
my @names = qw( A A A A B B B C D );
my %counts; ++$counts{$_} for @names;
my @data = values(%counts);
my $d = Statistics::Diversity::Shannon->new( data => \@data );
my $H = $d->index();
my $E = $d->evenness();
print "$H/$E\n";
}
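For reference, the index can also be computed by hand without the module. This is a sketch that assumes the usual textbook definitions, H = -sum(p * ln p) and evenness = H / ln(S), with natural logarithms. I have not checked that Statistics::Diversity::Shannon uses the same logarithm base, so its numbers may differ by a constant factor. The sub names are made up.

```perl
use strict;
use warnings;

# shannon_index and shannon_evenness are hypothetical names for this sketch.
sub shannon_index {
    my @counts = grep { $_ > 0 } @_;
    my $total  = 0;
    $total += $_ for @counts;
    return 0 unless $total;
    my $h = 0;
    for my $n (@counts) {
        my $p = $n / $total;      # proportion of this category
        $h -= $p * log($p);       # log() is the natural logarithm in Perl
    }
    return $h;
}

sub shannon_evenness {
    my @counts = grep { $_ > 0 } @_;
    return 0 if @counts < 2;      # ln(1) is 0, so report 0 rather than divide by it
    return shannon_index(@counts) / log( scalar @counts );
}

my @names = qw( A A A A B B B C D );
my %counts;
++$counts{$_} for @names;

printf "%.4f/%.4f\n",
    shannon_index( values %counts ),
    shannon_evenness( values %counts );
```

The order of the counts does not matter here, which is consistent with the note above that the sort is not required.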
Your first code snippet does not use Data::Dumper correctly. Data::Dumper mainly provides one function, Dumper, which outputs any data in a format that can be interpreted as Perl code.
# instead of printf "\$VAR1 = ...
print Dumper([values %counts]);
Since the output of Data::Dumper::Dumper is Perl code, you can read it by evaluating it as Perl code (with eval).
So if your first script writes output to a file called some.data, your second script can call
my $VAR1;
open my $fh, "<", "some.data";
eval do { local $/; <$fh> }; # read data from $fh and call eval on it
# now the data from the first script is in $VAR1
my $d = Statistics::Diversity::Shannon->new( data => $VAR1 );
...

how can I display a variable's name in perl, along with the value of the variable?

To diagnose or debug my perl code, I would like to easily display the name of a variable along with its value. In bash, one types the following:
#!/bin/bash
dog=pitbull
declare -p dog
In perl, consider the following script, junk.pl:
#!/usr/bin/perl
use strict; use warnings;
my $dog="pitbull";
my $diagnosticstring;
print STDERR "dog=$dog\n";
sub checkvariable {
foreach $diagnosticstring (@_) { print "nameofdiagnosticstring=$diagnosticstring\n"; }
}
checkvariable "$dog";
If we call this script, we obtain
bash> junk.pl
dog=pitbull
nameofdiagnosticstring=pitbull
bash>
But instead, when the subroutine checkvariable is called, I would like the following to be printed:
dog=pitbull
This would make coding easier and less error-prone, since one would not have to type the variable's name twice.
You can do something like this with PadWalker (which you'll need to install from CPAN). But it's almost certainly far more complex than you'd like it to be.
#!/usr/bin/perl
use strict;
use warnings;
use PadWalker 'peek_my';
my $dog="pitbull";
print STDERR "dog=$dog\n";
sub checkvariable {
my $h = peek_my(0);
foreach (@_) {
print '$', $_,'=', ${$h->{'$'. $_}}, "\n";
}
}
checkvariable "dog";
Data::Dumper::Names may be what you're looking for.
#! perl
use strict;
use warnings;
use Data::Dumper::Names;
my $dog = 'pitbull';
my $cat = 'lynx';
my @mice = qw(jumping brown field);
checkvariable($dog, $cat, \@mice);
sub checkvariable {
print Dumper @_;
}
1;
Output:
perl test.pl
$dog = 'pitbull';
$cat = 'lynx';
@mice = (
'jumping',
'brown',
'field'
);
(not an answer, a formatted comment)
The checkvariable sub receives only a value, and there's no (simple or reliable) way to find out what variable holds that value.
This is why Data::Dumper forces you to specify the varnames as strings:
perl -MData::Dumper -E '
my $x = 42;
my $y = "x";
say Data::Dumper->Dump([$x, $y]);
say Data::Dumper->Dump([$x, $y], [qw/x y/])
'
$VAR1 = 42;
$VAR2 = 'x';
$x = 42;
$y = 'x';
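The two-argument form of Data::Dumper->Dump shown above can be wrapped in a small helper so that names and values are passed as pairs. This is only a sketch: show_vars is a made-up name, and the variable name still has to be typed twice at the call site, so it tidies the output rather than solving the original problem.

```perl
use strict;
use warnings;
use Data::Dumper;

# show_vars is a hypothetical helper: pass alternating name => value pairs.
sub show_vars {
    my ( @names, @values );
    while (@_) {
        push @names,  shift;
        push @values, shift;
    }
    # Dump(\@values, \@names) labels each value with the matching name
    return Data::Dumper->Dump( \@values, \@names );
}

my $dog = 'pitbull';
my $cat = 'lynx';
print show_vars( dog => $dog, cat => $cat );
```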
Something like the following usually helps:
use strict;
use warnings;
use Data::Dumper;
my $debug = 1;
my $container = 20;
my %hash = ( 'a' => 7, 'b' => 2, 'c' => 0 );
my @array = ( 1, 7, 9, 8, 21, 16, 37, 42 );
debug('container',$container) if $debug;
debug('%hash', \%hash) if $debug;
debug('@array', \@array) if $debug;
sub debug {
my $name = shift;
my $value = shift;
print "DEBUG: $name [ARRAY]\n", Dumper($value) if ref $value eq 'ARRAY';
print "DEBUG: $name [HASH]\n", Dumper($value) if ref $value eq 'HASH';
print "DEBUG: $name = $value\n" if ref $value eq '';
}
But why not run the Perl script under the built-in debugger, with the -d option?
See The Perl Debugger (perldebug).

Bug with parsing by Text::CSV_XS?

I tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected, which is to split the line into pieces on the separator " ".
The funny thing is that if I remove the double quotes from the string $a, it does split the line.
I wonder whether it's a bug or whether I missed something. Thanks!
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
EDIT:
In the above code snippet, if I change the = (after time) to a space, then it works fine. I've started to wonder whether this is a bug after all.
$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';
You have confused the module by leaving both the quote character and the escape character set to double quote ", and then left them embedded in the fields you want to split.
Disable both quote_char and escape_char, like this
use strict;
use warnings;
use Text::CSV_XS;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my $space_sep = Text::CSV_XS->new({
sep_char => ' ',
quote_char => undef,
escape_char => undef,
});
$space_sep->parse($string);
for my $field ($space_sep->fields) {
print "$field\n";
}
output
id=firewall
time="2010-05-09
16:07:21
UTC"
But note that you have achieved exactly the same things as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.
In addition, you must always use strict and use warnings; and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
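The split alternative mentioned above can be written out as follows. One difference worth knowing, stated as my understanding rather than something tested against the module: split with the special pattern ' ' collapses runs of whitespace, whereas a CSV parser would normally report an empty field between two adjacent separators.

```perl
use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';

# The special pattern ' ' splits on runs of whitespace
# and ignores leading whitespace.
my @fields = split ' ', $string;

print "$_\n" for @fields;
```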
Update
As @ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so, then this method puts the values straight into a hash.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;
use Data::Dump;
dd \%data;
output
{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
Update
This program will extract the two name=value strings and print them on separate lines.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for @fields;
output
id=firewall
time="2010-05-09 16:07:21 UTC"
If you are not actually trying to parse csv data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit to using this module is that it handles quotes very well.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = quotewords(' ', 0, $str);
print Dumper \@fields;
my %hash = map split(/=/, $_, 2), @fields;
print Dumper \%hash;
Output:
$VAR1 = [
'id=firewall',
'time=2010-05-09 16:07:21 UTC'
];
$VAR1 = {
'time' => '2010-05-09 16:07:21 UTC',
'id' => 'firewall'
};
I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so you need a new hash for each new time key.
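To illustrate the last point, here is a sketch that builds one hash per log line and collects them in an array. The second log line and the sub name parse_line are made up for the example.

```perl
use strict;
use warnings;
use Text::ParseWords qw( quotewords );

# The second line is invented for illustration.
my @lines = (
    'id=firewall time="2010-05-09 16:07:21 UTC"',
    'id=firewall time="2010-05-09 16:08:02 UTC"',
);

# parse_line is a hypothetical helper name.
sub parse_line {
    my ($line) = @_;
    my %record = map {
        my ( $key, $value ) = split /=/, $_, 2;    # split on the first '=' only
        ( $key => $value );
    } quotewords( ' ', 0, $line );                 # keep = 0 strips the quotes
    return \%record;
}

# One hash reference per line, so repeated keys never collide
my @records = map { parse_line($_) } @lines;

print "$_->{id} at $_->{time}\n" for @records;
```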

Populating Automatic Perl Variables when using Quantifiers

I was trying to match the following line
5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80
with
if(/[[:xdigit:]{8}[:xdigit:]{8}\s]{4}/)
Is there any way I can populate the automatic variables $1, $2, $3 .. $8 with one half of each of those words?
i.e.
$1=5474c2ef
$2=012a759a
$3=c11ab88a
$4=e8daa276
$5=63693b53
$6=799c91f1
$7=be1d8c87
$8=38733d80
You could capture them in an array:
use strict;
use warnings;
use Data::Dumper;
$_ = '5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80 ';
my @nums = /\G(?:([[:xdigit:]]{8})([[:xdigit:]]{8})\s)/g;
if (@nums >= 8) {
print Dumper(\@nums);
}
(may behave differently than the original if there are more than four or if there're earlier 16-hex-digit sequences separated by more than just a space).
How about:
my $pat = '([[:xdigit:]]{8})\s?' x 8;
# produces: ([[:xdigit:]]{8})\s?([[:xdigit:]]{8})\s?....
/$pat/;
Update if you need to be strict on the spacing requirement:
my $pat = join('\s', map{'([[:xdigit:]]{8})' x 2} (1..4));
# produces: ([[:xdigit:]]{8})([[:xdigit:]]{8})\s....
/$pat/;
use strict;
use warnings;
use Data::Dumper;
$_ = '5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80 ';
if (/((?:[[:xdigit:]]{16}\s){4})/) {
my @nums = map { /(.{8})(.{8})/ } split /\s/, $1;
print Dumper(\@nums);
}
__END__
$VAR1 = [
'5474c2ef',
'012a759a',
'c11ab88a',
'e8daa276',
'63693b53',
'799c91f1',
'be1d8c87',
'38733d80'
];
Yes, there is, but you don’t want to.
You just want to do this:
while ( /(\p{ahex}{8})/g ) { print "got $1\n" }
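The same global match can be used in list context to collect all the pieces into an array in one step. This is a sketch using the [[:xdigit:]] class from the question; note that it is looser than the original pattern, because it does not check the spacing between the words.

```perl
use strict;
use warnings;

my $line = '5474c2ef012a759a c11ab88ae8daa276 63693b53799c91f1 be1d8c8738733d80';

# In list context, a /g match returns every captured group at once.
my @halves = $line =~ /([[:xdigit:]]{8})/g;

if ( @halves == 8 ) {
    # Number the pieces from 1, mirroring $1 .. $8 in the question
    printf "%d=%s\n", $_ + 1, $halves[$_] for 0 .. $#halves;
}
```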

Easy way to print Perl array? (with a little formatting)

Is there an easy way to print out a Perl array with commas in between each element?
Writing a for loop to do it is pretty easy but not quite elegant....if that makes sense.
Just use join():
# assuming @array is your array:
print join(", ", @array);
You can use Data::Dump:
use Data::Dump qw(dump);
my @a = (1, [2, 3], {4 => 5});
dump(@a);
Produces:
"(1, [2, 3], { 4 => 5 })"
If you're coding for the kind of clarity that would be understood by someone who is just starting out with Perl, this traditional construct says what it means, with a high degree of clarity and legibility:
$string = join ', ', @array;
print "$string\n";
This construct is documented in perldoc -f join.
However, I've always liked how simple $, makes it. The special variable $" is for interpolation, and the special variable $, is for lists. Combine either one with dynamic scope-constraining 'local' to avoid having ripple effects throughout the script:
use 5.012_002;
use strict;
use warnings;
my @array = qw/ 1 2 3 4 5 /;
{
local $" = ', ';
print "@array\n"; # Interpolation.
}
OR with $,:
use feature q(say);
use strict;
use warnings;
my @array = qw/ 1 2 3 4 5 /;
{
local $, = ', ';
say @array; # List
}
The special variables $, and $" are documented in perlvar. The local keyword, and how it can be used to constrain the effects of altering a global punctuation variable's value is probably best described in perlsub.
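As a sketch of how local keeps the change contained, the separator can be set inside a small sub; the old value is restored as soon as the sub returns. The name comma_list is made up for this example.

```perl
use strict;
use warnings;

# comma_list is a hypothetical helper name.
sub comma_list {
    local $" = ', ';    # restored automatically when the sub returns
    return "@_";        # interpolating an array joins it with $"
}

my @array = qw/ 1 2 3 4 5 /;
print comma_list(@array), "\n";    # 1, 2, 3, 4, 5
print "@array\n";                  # 1 2 3 4 5 (the default separator is back)
```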
Enjoy!
Also, you may want to try Data::Dumper. Example:
use Data::Dumper;
# simple procedural interface
print Dumper($foo, $bar);
For inspection/debugging check the Data::Printer module. It is meant to do one thing and one thing only:
display Perl variables and objects on screen, properly formatted (to
be inspected by a human)
Example usage:
use Data::Printer;
p @array; # no need to pass references
The code above might output something like this (with colors!):
[
[0] "a",
[1] "b",
[2] undef,
[3] "c",
]
You can simply print it.
my @a = qw(abc def hij);
print "@a";
You will get:
abc def hij
# better than Dumper --you're ready for the WWW....
use JSON::XS;
print encode_json \@some_array;
Using Data::Dumper :
use strict;
use Data::Dumper;
my $GRANTstr = 'SELECT, INSERT, UPDATE, DELETE, LOCK TABLES, EXECUTE, TRIGGER';
$GRANTstr =~ s/, /,/g;
my @GRANTs = split /,/ , $GRANTstr;
print Dumper(@GRANTs) . "===\n\n";
print Dumper(\@GRANTs) . "===\n\n";
print Data::Dumper->Dump([\@GRANTs], [qw(GRANTs)]);
Generates three different output styles:
$VAR1 = 'SELECT';
$VAR2 = 'INSERT';
$VAR3 = 'UPDATE';
$VAR4 = 'DELETE';
$VAR5 = 'LOCK TABLES';
$VAR6 = 'EXECUTE';
$VAR7 = 'TRIGGER';
===
$VAR1 = [
'SELECT',
'INSERT',
'UPDATE',
'DELETE',
'LOCK TABLES',
'EXECUTE',
'TRIGGER'
];
===
$GRANTs = [
'SELECT',
'INSERT',
'UPDATE',
'DELETE',
'LOCK TABLES',
'EXECUTE',
'TRIGGER'
];
This might not be what you're looking for, but here's something I did for an assignment:
$" = ", ";
print "@ArrayName\n";
map can also be used, but it is sometimes hard to read when you have lots of things going on.
map { print "element $_\n" } @array;
I have not tried to run the code below, though; I think it is a rather tricky way to do it.
map { print $_; } @array;