Why does escaping of # disappear when read from an array? - perl

I have a string,
my $element="abc#$def"
I escape # using,
$element=~s/#/\\#/g;
It is printed as: abc\#$def, which is perfect.
Next part of the code is:
push(#arr,$element);
foreach $val (#arr)
{
print $val;
}
And the value printed within the foreach loop is: abc#$def.
Why is # not escaped here? And how can I retain the escaping?

You're not quite showing us everything. To get your claimed result, I had to create the variable $def initialized as shown below. But, when I do that, I get the result you expect, not the result you show.
$ cat xx.pl
use strict;
use warnings;
my $def = '$def';
my $element = "abc#$def";
$element =~ s/#/\\#/g;
print "$element\n";
my #arr;
push(#arr, $element);
foreach my $val (#arr)
{
print $val;
print "\n";
}
$ perl xx.pl
abc\#$def
abc\#$def
$
This was tested with Perl 5.14.1 on MacOS X 10.6.8, but I don't think the behaviour would vary with any other version of Perl 5.
Given this, can you update your question to show a script similar to mine (in particular, with both use strict; and use warnings;) but which produces the result you show?

With something like this:
$element=~s/#/\\#/g;
You have to escape the \
Edit
this code works on my machine as you expect:
use strict;
use warnings;
use Data::Dumper;
my $element='abc#$def';
my #arr;
$element=~s/#/\\#/g;
print $element."\n";
push(#arr,$element);
foreach my $val (#arr)
{
print $val;
}

Related

How to remove array's newlines and add an element at the beginning of it in Perl?

First of I have to apologize for editing my initial post. But after I provide my code I did the question fuzzy.
So, I have this an array (#start_cod) containing lines separated by /n as follows:
print #start_cod;
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
I need to remove the newlines and add ">text" ONLY at the beginning of the array as follow:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
I tried:
s/\s+\z// for #start_cod;
print ">text#start_cod";
I tried also with chomp
chomp #start_cod;
print ">text#start_cod";
and
my #start_cod = split("\n",$start_cod);
$start_cod = join("",#start_cod);
print ">text$start_cod";
but I get
aaaaaaaaaaaaaaaaaaa>textcacacacacaacaccacaac>textaattatatattataattatatttat
Any suggestions on how to handle this in Perl Programming?
Here is my code which works 100%.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my %alliloux =();
$/="\n>";
while (<>) {
s/>//g;
my ($onoma, #seq) = split (/\n/, $_);
my ($sp, $head) = split (/\./, $onoma);
push #{ $alliloux{$sp} }, join "\n", ">$onoma", #seq;
}
foreach my $sp (keys %alliloux) {
chomp $sp;
my ($head, $dna) = split(/\t/, $sp);
my #start_cod = substr($dna, 3);
say #start_cod;
Input file:
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
>namex aattatatattataattatatttat
output after Perl run
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
Desired output:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
If I understand your question correctly, this should do what you want:
use strict;
use warnings;
my #start_cod = (
'aaaaaaaaaaaaaaaaaa',
'acacacacacaacaccacaac',
'aattatatattataattatatttat',
);
print ">text\n", #start_cod, "\n";
The print first prints ">text" and a newline once, then you get the #start_cod items on a line, and the last "\n" makes sure you have a newline after the last element.
Output:
>text
aaaaaaaaaaaaaaaaaaacacacacacaacaccacaacaattatatattataattatatttat
You might want to see Read FASTA into Hash. It's the same problem and very close to the code I wrote before I read it. Also, there are modules on CPAN that can handle FASTA.
I think you want to combine the sequences that start with the same name, disregarding the numbers. The sequences shouldn't have interior whitespace. In your code, you are constantly adding whitespace. You even join on a newline. So, you go to the doctor and say "My arm hurts when I do this", and the doctor says "So don't do that". :)
When you run into these sort of problems, check the results of your operations at each step to see if you get what you expect. Here's a much simplified version of a program that I think does what you want. I've removed most of the data structure because they are complicating your process.
In short, read a line and remove the newline at the end. That's one source of your newlines. Then, extract the sequence and concatenate that to the previous sequence. When you join with newlines, you are adding newlines. So, don't do that:
use v5.14;
use warnings;
use Data::Dumper;
my %alliloux = ();
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
# now split on whitespace, but only up to two parts.
# There's no array here.
my( $name, $seq ) = split /\s+/, $_, 2;
# remove the numbers at the end to get the prefix of the
# name.
my $prefix = $name =~ s/\d+\z//r;
# append the current sequence for this prefix to what we
# have already seen.f
$alliloux{$prefix} .= $seq;
}
say Dumper( \%alliloux );
foreach my $base ( keys %alliloux ) {
say ">text $alliloux{$base}";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
You don't need the intermediate array. You can build up your string as you go. You don't need to have all the parts before you do that.
Now, to figure out where you might be going wrong, do a little at once. Ensure that you've extracted the right thing. It's handle to put characters around the variables you interpolate so you can see whitespace at the beginning or end:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
}
Then, add another step, and ensure that works:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
my $prefix = $name =~ s/\d+\z//r;
say "Prefix: <$prefix>";
}
Repeat this process for each step. Then, when you come with a question, you've pinpointed the point where things diverge. Here's the same technique in your program:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
s/>//g;
my ($onoma, #seq) = split (/\n/, $_);
say "Onoma: <$onoma>";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
The output shows that you never had anything in #seq. You are splitting on a newline, but unless you've changed the default line ending, you'll only get a newline at the end:
Onoma: <name aaa>
Onoma: <name2 cccc>
Onoma: <name99 aattaatt>
Now there's nothing in #seq, so a line like join "\n", ">$onoma", #seq; is really just join "\n", ">$onoma". You could have seen that with a little checking.
The description lacks clarity of the problem.
By looking at the desired output the following code comes to mind. Please see if it does what you was looking for.
Even looking at your code it is not clear what you try to do -- some part of the code does not make much sense.
use strict;
use warnings;
use feature 'say';
my #start_cod;
while( <DATA> ) {
chomp;
next unless />\s?name.?\s+(.*)/;
push #start_cod, $1;
}
print ">text\n " . join('',#start_cod);
__DATA__
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
.
.
.
> namex aattatatattataattatatttat

Perl: Printing out the file where a word occurs

I am trying to write a small program that takes from command line file(s) and prints out the number of occurrence of a word from all files and in which file it occurs. The first part, finding the number of occurrence of a word, seems to work well.
However, I am struggling with the second part, namely, finding in which file (i.e. file name) the word occurs. I am thinking of using an array that stores the word but don’t know if this is the best way, or what is the best way.
This is the code I have so far and seems to work well for the part that counts the number of times a word occurs in given file(s):
use strict;
use warnings;
my %count;
while (<>) {
my $casefoldstr = lc $_;
foreach my $str ($casefoldstr =~ /\w+/g) {
$count{$str}++;
}
}
foreach my $str (sort keys %count) {
printf "$str $count{$str}:\n";
}
The filename is accessible through $ARGV.
You can use this to build a nested hash with the filename and word as keys:
use strict;
use warnings;
use List::Util 'sum';
while (<>) {
$count{$word}{$ARGV}++ for map +lc, /\w+/g;
}
foreach my $word ( keys %count ) {
my #files = keys %$word; # All files containing lc $word
print "Total word count for '$word': ", sum( #{ $count{$word} }{#files} ), "\n";
for my $file ( #files ) {
print "$count{$word}{$file} counts of '$word' detected in '$file'\n";
}
}
Using an array seems reasonable, if you don't visit any file more than once - then you can always just check the last value stored in the array. Otherwise, use a hash.
#!/usr/bin/perl
use warnings;
use strict;
my %count;
my %in_file;
while (<>) {
my $casefoldstr = lc;
for my $str ($casefoldstr =~ /\w+/g) {
++$count{$str};
push #{ $in_file{$str} }, $ARGV
unless ref $in_file{$str} && $in_file{$str}[-1] eq $ARGV;
}
}
foreach my $str (sort keys %count) {
printf "$str $count{$str}: #{ $in_file{$str} }\n";
}

Split, insert and join

Here's I want to archive. I want to split a one-liner comma-separated and insert #domain.com then join it back as comma-separated.
The one-liner contains something like:
username1,username2,username3
and I want to be something like:
username1#domain.com,username2#domain.com,username3#domain.com
So my Perl script that I tried which doesn't not work properly:
my $var ='username1,username2,username3';
my #tkens = split /,/, $var;
my #user;
foreach my $tken (#tkens) {
push (#user, "$tken\#domain.com");
}
my $to = join(',',#user);
Is there any shortcut on this in Perl and please post sample please. Thanks
Split, transform, stitch:
my $var ='username1,username2,username3';
print join ",", map { "$_\#domain.com" } split(",", $var);
# ==> username1#domain.com,username2#domain.com,username3#domain.com
You could also use a regular expression substitution:
#!/usr/bin/perl
use strict;
use warnings;
my $var = "username1,username2,username3";
# Replace every comma (and the end of the string) with a comma and #domain.com
$var =~ s/$|,/\#domain.com,/g;
# Remove extra comma after last item
chop $var;
print "$var\n";
You already have good answers. Here I am just telling why your script is not working. I didn't see any print or say line in your code, so not sure how you are trying to print something. No need of last line in your program. You can simply suffix #domain.com with each value, push to an array and print it with join.
#!/usr/bin/perl
use strict;
use warnings;
my $var = 'username1,username2,username3';
my #tkens = split ',', $var;
my #user;
foreach my $tken (#tkens)
{
push #user, $tken."\#domain.com"; # `.` after `$tken` for concatenation
}
print join(',', #user), "\n"
Output:
username1#domain.com,username2#domain.com,username3#domain.com

Perl looping through a hash produces strange value

I'm using Perl to parse the output from objdump. I have the following code:
#!/usr/bin/perl
%count = {};
while (<>) {
if (/^\s+[[:xdigit:]]+:\s+[[:xdigit:]]+\s+([a-z]+).+$/) {
++$count{"$1"};
}
}
while (($key, $val) = each %count) {
print "$key $val\n";
}
In the resulting output, most parts are okay like this:
strhib 2
strcc 167
stmlsda 4
swivc 21
ldmlsia 4
But there is one strange line:
HASH(0x8ae2158)
What's going on here? I expect $1 to be a string, and ++$count{"$1"} should be perfectly fine.
Thank you.
So the correct code should be:
#!/usr/bin/perl
use strict;
my %count;
while (<>) {
if (/^\s+[[:xdigit:]]+:\s+[[:xdigit:]]+\s+([a-z]+).+$/) {
++$count{"$1"};
}
}
while (my ($key, $val) = each %count) {
print "$key $val\n";
}
If you had use warnings; you would have seen: "Reference found where even-sized list expected". Instead of
%count = {};
you should have said
my %count;
What you wrote was equivalent to this:
%count = ({} => undef);
That is, you initialized your hash with an empty hashref as a key with no associated value. The hashref stringified to "HASH(0x8ae2158)" (the number may change). To clear out a hash, you use parens () not braces {}. Braces construct a hash reference.
Even short programs like this should start with:
use strict;
use warnings;
The bugs you catch will be your own. :-)
The warnings pragma is preferred to the -w switch because it acts lexically. See What's wrong with -w and $^W in perllexwarn.

How can I make Perl functions that use $_ by default?

I have an array and a simple function that trims white spaces:
my #ar=("bla ", "ha 1")
sub trim { my $a = shift; $a =~ s/\s+$//; $a}
Now, I want to apply this to an array with the map function. Why can't I do this by just giving the function name like one would do with built-in functions?
For example, you can do
print map(length, #ar)
But you can't do
print map(trim, #ar)
You have to do something like:
print map {trim($_)} #ar
print map(trim($_), #ar)
If you are using 5.10 or later, you can specify _ as the prototype for trim. If you are using earlier versions, use Axeman's answer:
As the last character of a prototype, or just before a semicolon, you can use _ in place of $ : if this argument is not provided, $_ will be used instead.
use strict; use warnings;
my #x = ("bla ", "ha 1");
sub trim(_) { my ($x) = #_; $x =~ s!\s+$!!; $x }
print map trim, #x;
Incidentally, don't use $a and $b outside of a sort comparator: They are immune from strict checking.
However, I prefer not to use prototypes for functions I write mainly because their use makes it harder to mentally parse the code. So, I would prefer using:
map trim($_), #x;
See also perldoc perlsub:
This is all very powerful, of course, and should be used only in moderation to make the world a better place.
The prototype that Sinan talks about is the best current way. But for earlier versions, there is still the old standby:
sub trim {
# v-- Here's the quick way to do it.
my $str = #_ ? $_[0] : $_;
# That was it.
$str =~ s/^\s+|\s+$//;
return $str;
}
Of course, I have a trim function with more features and handles more arguments and list context, but it doesn't demonstrate the concept as well. The ternary expression is a quick way to do what the '_' prototype character now does.
My favorite way to optionally use $_ without needing 5.10+ is as follows:
sub trim {
my ($s) = (#_, $_);
$s =~ s/\s+$//;
$s
}
This assigns the first element of #_ to $s if there is one. Otherwise it uses $_.
Many Perl built-in functions operate on $_ if given no arguments.
If your function did the same, it would work:
my #ar = ("bla ", "ha 1");
sub trim { my $s = #_ ? $_[0] : $_; $s =~ s/\s+$//; $s}
print map(trim, #ar), "\n";
And yes, Perl is kind of gross.