How to get the last item of a split in Perl? - perl

$k="1.3.6.1.4.1.1588.2.1.1.1.6.2.1.37.32";
#a= split('\.',$k);
print #a[-1]; # WORKS!
print (split '\.',$k)[-1]; # Fails: not proper syntax.`
I'd like to print the last element of a split without having to use an intermediary variable. Is there a way to do this? I'm using Perl 5.14.

Perl is attributing the open parenthesis( to the print function. The syntax error comes from that the print() cannot be followed by [-1]. Even if there is whitespace between print and (). You need to prefix the parenthesis with a + sign to force list context if you do not want to add parens to your print.
print +(split'\.', $k)[-1];
If you are not using your syntax as the parameter to something that expects to have parens, it will also work the way you tried.
my $foo = (split '\.', $k)[-1];
print $foo;

Instead of creating a complete list and slicing it to get the last element, you could use a regex capture:
use strict;
use warnings;
my $k = "1.3.6.1.4.1.1588.2.1.1.1.6.2.1.37.32";
my ($last) = $k =~ /(\d+)$/;
print $last;
Output:
32

rindex() split last position while index() split from first position found
print substr( $k, rindex($k, '.')+1 );

Related

How to remove array's newlines and add an element at the beginning of it in Perl?

First of I have to apologize for editing my initial post. But after I provide my code I did the question fuzzy.
So, I have this an array (#start_cod) containing lines separated by /n as follows:
print #start_cod;
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
I need to remove the newlines and add ">text" ONLY at the beginning of the array as follow:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
I tried:
s/\s+\z// for #start_cod;
print ">text#start_cod";
I tried also with chomp
chomp #start_cod;
print ">text#start_cod";
and
my #start_cod = split("\n",$start_cod);
$start_cod = join("",#start_cod);
print ">text$start_cod";
but I get
aaaaaaaaaaaaaaaaaaa>textcacacacacaacaccacaac>textaattatatattataattatatttat
Any suggestions on how to handle this in Perl Programming?
Here is my code which works 100%.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my %alliloux =();
$/="\n>";
while (<>) {
s/>//g;
my ($onoma, #seq) = split (/\n/, $_);
my ($sp, $head) = split (/\./, $onoma);
push #{ $alliloux{$sp} }, join "\n", ">$onoma", #seq;
}
foreach my $sp (keys %alliloux) {
chomp $sp;
my ($head, $dna) = split(/\t/, $sp);
my #start_cod = substr($dna, 3);
say #start_cod;
Input file:
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
>namex aattatatattataattatatttat
output after Perl run
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
Desired output:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
If I understand your question correctly, this should do what you want:
use strict;
use warnings;
my #start_cod = (
'aaaaaaaaaaaaaaaaaa',
'acacacacacaacaccacaac',
'aattatatattataattatatttat',
);
print ">text\n", #start_cod, "\n";
The print first prints ">text" and a newline once, then you get the #start_cod items on a line, and the last "\n" makes sure you have a newline after the last element.
Output:
>text
aaaaaaaaaaaaaaaaaaacacacacacaacaccacaacaattatatattataattatatttat
You might want to see Read FASTA into Hash. It's the same problem and very close to the code I wrote before I read it. Also, there are modules on CPAN that can handle FASTA.
I think you want to combine the sequences that start with the same name, disregarding the numbers. The sequences shouldn't have interior whitespace. In your code, you are constantly adding whitespace. You even join on a newline. So, you go to the doctor and say "My arm hurts when I do this", and the doctor says "So don't do that". :)
When you run into these sort of problems, check the results of your operations at each step to see if you get what you expect. Here's a much simplified version of a program that I think does what you want. I've removed most of the data structure because they are complicating your process.
In short, read a line and remove the newline at the end. That's one source of your newlines. Then, extract the sequence and concatenate that to the previous sequence. When you join with newlines, you are adding newlines. So, don't do that:
use v5.14;
use warnings;
use Data::Dumper;
my %alliloux = ();
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
# now split on whitespace, but only up to two parts.
# There's no array here.
my( $name, $seq ) = split /\s+/, $_, 2;
# remove the numbers at the end to get the prefix of the
# name.
my $prefix = $name =~ s/\d+\z//r;
# append the current sequence for this prefix to what we
# have already seen.f
$alliloux{$prefix} .= $seq;
}
say Dumper( \%alliloux );
foreach my $base ( keys %alliloux ) {
say ">text $alliloux{$base}";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
You don't need the intermediate array. You can build up your string as you go. You don't need to have all the parts before you do that.
Now, to figure out where you might be going wrong, do a little at once. Ensure that you've extracted the right thing. It's handle to put characters around the variables you interpolate so you can see whitespace at the beginning or end:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
}
Then, add another step, and ensure that works:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
my $prefix = $name =~ s/\d+\z//r;
say "Prefix: <$prefix>";
}
Repeat this process for each step. Then, when you come with a question, you've pinpointed the point where things diverge. Here's the same technique in your program:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
s/>//g;
my ($onoma, #seq) = split (/\n/, $_);
say "Onoma: <$onoma>";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
The output shows that you never had anything in #seq. You are splitting on a newline, but unless you've changed the default line ending, you'll only get a newline at the end:
Onoma: <name aaa>
Onoma: <name2 cccc>
Onoma: <name99 aattaatt>
Now there's nothing in #seq, so a line like join "\n", ">$onoma", #seq; is really just join "\n", ">$onoma". You could have seen that with a little checking.
The description lacks clarity of the problem.
By looking at the desired output the following code comes to mind. Please see if it does what you was looking for.
Even looking at your code it is not clear what you try to do -- some part of the code does not make much sense.
use strict;
use warnings;
use feature 'say';
my #start_cod;
while( <DATA> ) {
chomp;
next unless />\s?name.?\s+(.*)/;
push #start_cod, $1;
}
print ">text\n " . join('',#start_cod);
__DATA__
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
.
.
.
> namex aattatatattataattatatttat

Perl `split` does not `split` to default array

I have this strange problem with split in that it does not by default split into the default array.
Below is some toy code.
#!/usr/bin/perl
$A="A:B:C:D";
split (":",$A);
print $_[0];
This does not print anything. However if I explicitly split into the default array like
#!/usr/bin/perl
$A="A:B:C:D";
#_=split (":",$A);
print $_[0];
It's correctly prints A. My perl version is v5.22.1.
split does not go to #_ by default. #_ does not work like $_. It's only for arguments to a function. perlvar says:
Within a subroutine the array #_ contains the parameters passed to that subroutine. Inside a subroutine, #_ is the default array for the array operators pop and shift.
If you run your program with use strict and use warnings you'll see
Useless use of split in void context at
However, split does use $_ as its second argument (the string that is split up) if nothing is supplied. But you must always use the return value for something.
You have to assign the split to an array:
use strict;
use warnings;
my $string = "A:B:C:D";
my #array = split(/:/, $string);
print $array[0] . "\n";

Not an ARRAY reference error in "pop($str)"

I am learning Perl for work and I'm trying to practise with some basic programs.
I want my program to take a string from STDIN and modify it by taking the last character and putting it at the start of the string.
I get an error when I use variable $str in $str = <STDIN>.
Here is my code:
my $str = "\0";
$str = <STDIN>;
sub last_to_first {
chomp($str);
pop($str);
print $str;
}
last_to_first;
Exec :
Matrix :hi
Not an ARRAY reference at matrix.pl line 13, <STDIN> line 1.
Why your approach doesn't work
The pop keyword does not work on strings. Strings in Perl are not automatically cast to character arrays, and those array keywords only work on arrays.
The error message is Not an ARRAY reference because pop sees a scalar variable. References are scalars in Perl (the scalar here is something like a reference to the address of the actual array in memory). The pop built-in takes array references in Perl versions between 5.14 and 5.22. It was experimental, but got removed in the (currently latest) 5.24.
Starting with Perl 5.14, an experimental feature allowed pop to take a scalar expression. This experiment has been deemed unsuccessful, and was removed as of Perl 5.24.
How to make it work
You have to split and join your string first.
my $str = 'foo';
# turn it into an array
my #chars = split //, $str;
# remove the last char and put it at the front
unshift #chars, pop #chars;
# turn it back into a string
$str = join '', #chars;
print $str;
That will give you ofo.
Now to use that as a sub, you should pass a parameter. Otherwise you do not need a subroutine.
sub last_to_first {
my $str = shift;
my #chars = split //, $str;
unshift #chars, pop #chars;
$str = join '', #chars;
return $str;
}
You can call that sub with any string argument. You should do the chomp to remove the trailing newline from STDIN outside of the sub, because it is not needed for switching the chars. Always build your subs in the smallest possible unit to make it easy to debug them. One piece of code should do exactly one functionality.
You also do not need to initialize a string with \0. In fact, that doesn't make sense.
Here's a full program.
use strict;
use warnings 'all';
my $str = <STDIN>;
chomp $str;
print last_to_first($str);
sub last_to_first {
my $str = shift;
my #chars = split //, $str;
unshift #chars, pop #chars;
$str = join '', #chars;
return $str;
}
Testing your program
Because you now have one unit in your last_to_first function, you can easily implement a unit test. Perl brings Test::Simple and Test::More (and other tools) for that purpose. Because this is simple, we'll go with Test::Simple.
You load it, tell it how many tests you are going to do, and then use the ok function. Ideally you would put the stuff you want to test into its own module, but for simplicity I'll have it all in the same program.
use strict;
use warnings 'all';
use Test::Simple tests => 3;
ok last_to_first('foo', 'ofo');
ok last_to_first('123', '321');
ok last_to_first('qqqqqq', 'qqqqqq');
sub last_to_first {
my $str = shift;
my #chars = split //, $str;
unshift #chars, pop #chars;
$str = join '', #chars;
return $str;
}
This will output the following:
1..3
ok 1
ok 2
ok 3
Run it with prove instead of perl to get a bit more comprehensive output.
Refactoring it
Now let's change the implementation of last_to_first to use a regular expression substitution with s/// instead of the array approach.
sub last_to_first {
my $str = shift;
$str =~ s/^(.+)(.)$/$2$1/;
return $str;
}
This code uses a pattern match with two groups (). The first one has a lot of chars after the beginning of the string ^, and the second one has exactly one char, after which the string ends $. You can check it out here. Those groups end up in $1 and $2, and all we need to do is switch them around.
If you replace your function in the program with the test, and then run it, the output will be the same. You have just refactored one of the units in your program.
You can also try the substr approach from zdim's answer with this test, and you will see that the tests still pass.
The core function pop takes an array, and removes and returns its last element.
To manipulate characters in a string you can use substr, for example
use warnings;
use strict;
my $str = <STDIN>;
chomp($str);
my $last_char = substr $str, -1, 1, '';
my $new_str = $last_char . $str;
The arguments to substr mean: search the variable $str, at offset -1 (one from the back), for a substring of length 1, and replace that with an empty string '' (thus removing it). The substring that is found, here the last character, is returned. See the documentation page linked above.
In the last line the returned character is concatenated with the remaining string, using the . operator.
You can browse the list of functions broken down by categories at Perl functions by category.
Perl documentation has a lot of goodies, please look around.
Strings are very often manipulated using regular expressions. See the tutorial perlretut, the quick start perlrequick, the quick reference perlreref, and the full reference perlre.
You can also split a string into a character array and work with that. This is shown in detail in the answer by simbabque, which packs a whole lot more of good advice.
This is for substring function used for array variables:
my #arrays = qw(jan feb mar);
last_to_first(#arrays);
sub last_to_first
{
my #lists = #_;
my $last = pop(#lists);
#print $last;
unshift #lists, $last;
print #lists;
}
This is for substring function used for scalar variables:
my $str = "";
$str = <STDIN>;
chomp ($str);
last_to_first($str);
sub last_to_first
{
my $chr = shift;
my $lastchar = substr($chr, -1);
print $lastchar;
}

Split, insert and join

Here's I want to archive. I want to split a one-liner comma-separated and insert #domain.com then join it back as comma-separated.
The one-liner contains something like:
username1,username2,username3
and I want to be something like:
username1#domain.com,username2#domain.com,username3#domain.com
So my Perl script that I tried which doesn't not work properly:
my $var ='username1,username2,username3';
my #tkens = split /,/, $var;
my #user;
foreach my $tken (#tkens) {
push (#user, "$tken\#domain.com");
}
my $to = join(',',#user);
Is there any shortcut on this in Perl and please post sample please. Thanks
Split, transform, stitch:
my $var ='username1,username2,username3';
print join ",", map { "$_\#domain.com" } split(",", $var);
# ==> username1#domain.com,username2#domain.com,username3#domain.com
You could also use a regular expression substitution:
#!/usr/bin/perl
use strict;
use warnings;
my $var = "username1,username2,username3";
# Replace every comma (and the end of the string) with a comma and #domain.com
$var =~ s/$|,/\#domain.com,/g;
# Remove extra comma after last item
chop $var;
print "$var\n";
You already have good answers. Here I am just telling why your script is not working. I didn't see any print or say line in your code, so not sure how you are trying to print something. No need of last line in your program. You can simply suffix #domain.com with each value, push to an array and print it with join.
#!/usr/bin/perl
use strict;
use warnings;
my $var = 'username1,username2,username3';
my #tkens = split ',', $var;
my #user;
foreach my $tken (#tkens)
{
push #user, $tken."\#domain.com"; # `.` after `$tken` for concatenation
}
print join(',', #user), "\n"
Output:
username1#domain.com,username2#domain.com,username3#domain.com

Why is my Perl code not omitting newlines?

I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
#arr = split /\s+/, $line;
foreach $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print #arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my #arr = split /\s+/, $line;
foreach my $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6 so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is a empty field before the first separator. Imagine if your data was comma-separated, then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my #arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split which is a single quoted space (not a regular expression)
my #arr = $line =~ split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my #arr = split;
foreach my $w (#arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
#arr=split(/\s+/,$line);
\s+ does a match just before the leading spaces. Use ' ' instead.
#arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
You wanted to ask if there's nothing in this row - don't print it.
But the "+" in the REGEX actually force it to have at least 1 space.
If you change the "\s+" to "\s*", you'll see that it's working. because * is 0 occurrences or more ...