perl substr to get a substring

#!/usr/bin/perl
my $str = "abc def yyy ghi";
print substr($str, 0 , index($str,' '));
I want substr to print def yyy, but
print substr($str, index($str, ' '), rindex($str, ' '));
does not work. Any idea?

You didn't specify EXACTLY what you want as far as logic but the best guess is you want to print characters between first and last spaces.
Your example code would print too many characters: its third argument is the number of characters before the last space (11 in your example, instead of the 7 you want). To fix it, subtract the number of characters before the first space from that length.
Also, you need to start one character to the right of your "index" value to avoid printing the first space; that is what the "+1" and "-1" in the example below are for.
$cat d:\scripts\test1.pl
my $str = "abc def yyy ghi";
my $first_space_index = index ($str, ' ');
my $substring = substr($str, $first_space_index + 1,
rindex($str, ' ') - $first_space_index - 1);
print "|$substring|\n";
test1.pl
|def yyy|

The third argument is a length, not an offset. It can be negative, though, meaning "leave that many characters off the end of the string", and that value is easy to get from rindex and length, like so:
my $str = "abc def yyy ghi";
print substr( $str, 1 + index( $str, ' ' ), rindex( $str, ' ' ) - length($str) );
(Note adding 1 to get the offset after the first space.)
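To see how a negative third argument behaves on its own, here is a small sketch; the string "abcdef" is just an illustrative value. A negative LENGTH stops that many characters before the end of the string, which is why rindex minus length lands exactly on the last space:

```perl
use strict;
use warnings;

my $s = "abcdef";
# Start at offset 1 ('b'); length -2 stops 2 characters before the end.
my $middle = substr($s, 1, -2);
print "$middle\n";    # prints "bcd"

# The answer's version: from just after the first space up to the last space.
my $str = "abc def yyy ghi";
my $between = substr($str, 1 + index($str, ' '), rindex($str, ' ') - length($str));
print "$between\n";   # prints "def yyy"
```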

If you want to print the text between the first and last space, wouldn't a regex be easier?
print $1 if "abc def yyy ghi" =~ / (.*) /;

Frankly, substr/index/rindex are not really the way to go here. You are better off doing something like:
my $str = "abc def yyy ghi";
my @row = split ' ', $str;
pop @row and shift @row;
print "@row";
This is less efficient, but captures the actual intent better.
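As a sanity check, the four approaches suggested in the answers can be run side by side on the question's string; this is only a comparison sketch, and the variable names are made up for the illustration:

```perl
use strict;
use warnings;

my $str   = "abc def yyy ghi";
my $first = index($str, ' ');

# 1. substr with an explicit (positive) length
my $by_length = substr($str, $first + 1, rindex($str, ' ') - $first - 1);

# 2. substr with a negative length, counted back from the end
my $by_negative = substr($str, $first + 1, rindex($str, ' ') - length($str));

# 3. greedy regex between the first and the last space
my ($by_regex) = $str =~ / (.*) /;

# 4. split into words, drop the first and last, rejoin with single spaces
my @row = split ' ', $str;
shift @row;
pop @row;
my $by_split = "@row";

print "$_\n" for $by_length, $by_negative, $by_regex, $by_split;
```

One difference worth knowing: the split version rebuilds the middle with single spaces, so runs of several spaces inside the string are collapsed, while the other three return the original characters unchanged.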

Related

Creating a hashmap using perl split function

I am attempting to create a hashmap from a text file. The way the text file is set up is as follows.
(integer)<-- varying white space --> (string value)
. . .
. . .
. . .
(integer)<-- varying white space --> (string value)
e.g.:
5 this is a test
23 this is another test
123 this is the final test
What I want to do is assign the key to the integer, and then the entire string following to the value. I was trying something along the lines of
my %myHashMap;
while(my $info = <$fh>){
chomp($info);
my ($int, $string) = split/ /,$info;
$myHashMap{$int} = $string;
}
This doesn't work though because I have spaces in the string. Is there a way to clear the initial white space, grab the integer, assign it to $int, then clear white space till you get to the string, then take the remainder of the text on that line and place it in my $string value?
You could replace
split / /, $info # Fields are separated by a space.
with
split / +/, $info # Fields are separated by spaces.
or the more general
split /\s+/, $info # Fields are separated by whitespace.
but you'd still face the problem of the leading spaces. To ignore those, use
split ' ', $info
This special case splits on whitespace, ignoring leading whitespace.
Don't forget to tell Perl that you expect at most two fields!
$ perl -E'say "[$_]" for split(" ", " 1 abc def ghi", 2)'
[1]
[abc def ghi]
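The difference between a regex separator and the special ' ' separator only shows up when a line starts with whitespace. The sketch below uses one of the question's sample lines with some leading spaces added for illustration:

```perl
use strict;
use warnings;

my $line = "   5   this is a test";

# A regex separator keeps a leading empty field: ("", "5", "this", "is", "a", "test")
my @by_regex = split /\s+/, $line;

# The special ' ' separator skips leading whitespace, and the limit of 2
# keeps the rest of the line in one field: ("5", "this is a test")
my ($key, $value) = split ' ', $line, 2;

print scalar(@by_regex), " fields from split /\\s+/\n";
print "[$key] [$value]\n";
```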
The other option would be to use the following:
$info =~ /^\s*(\S+)\s+(\S.*)/
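Used inside the question's loop, that regex could look like the sketch below. To keep it self-contained it reads from an in-memory list of the sample lines (leading whitespace added for illustration) instead of the question's $fh:

```perl
use strict;
use warnings;

# Sample lines in the question's format (leading whitespace is illustrative).
my @lines = (
    "  5    this is a test\n",
    " 23  this is another test\n",
    "123 this is the final test\n",
);

my %myHashMap;
for my $info (@lines) {
    chomp $info;
    # Optional leading whitespace, the key, more whitespace, then the rest of the line.
    next unless $info =~ /^\s*(\S+)\s+(\S.*)/;
    $myHashMap{$1} = $2;
}

print "$_ => $myHashMap{$_}\n" for sort { $a <=> $b } keys %myHashMap;
```

Lines that do not match (for example, blank lines) are simply skipped by the next unless.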
You just need to split each line of text on whitespace into two fields.
This example program assumes that the input file is passed as a parameter on the command line. I have used Data::Dump only to show the resulting hash structure.
use strict;
use warnings 'all';
my %data;
while ( <> ) {    # reads the file named on the command line
s/\s*\z//;
my ($key, $val) = split ' ', $_, 2;
next unless defined $val; # Ensure that there were two fields
$data{$key} = $val;
}
use Data::Dump;
dd \%data;
output
{
5 => "this is a test",
23 => "this is another test",
123 => "this is the final test",
}
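First, clear the leading whitespace:
$info =~ s/^\s+//;
Second, there can be more than one space between the integer and the string, so split on one or more spaces with " +". Pass a limit of 2 as well, otherwise only the first word of the string ends up in $string:
split / +/, $info, 2;
The code is
use strict;
use warnings;
# assumes the file name is the first command-line argument
open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
my %myHashMap;
while (my $info = <$fh>) {
    chomp($info);
    $info =~ s/^\s+//;                           # drop leading whitespace
    my ($int, $string) = split / +/, $info, 2;   # at most two fields
    $myHashMap{$int} = $string;
}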

How to rewind next-search start position by 1?

How can I rewind the start of the next search position by 1? For example, suppose I want to match all digits between #. The following will give me only odd numbers.
my $data="#1#2#3#4#";
while ( $data =~ /#(\d)#/g ) {
print $1, "\n";
}
But if I could rewind the start of the next position by 1, I would get both even and odd numbers.
This doesn't work: pos() = pos() - 1;
I know I can accomplish this using split. But this doesn't answer my question.
for (split /#/, $data) {
print $_, "\n";
}
One approach is to use a look-ahead assertion:
while ( $data =~ /#(\d)(?=#)/g ) {
print $1, "\n";
}
The characters in the look-ahead assertion are not part of the matched expression and do not update pos() past the \d part of the regular expression.
More demos:
use feature 'say';   # say needs Perl 5.10 or later
say "#1#2#3#4#" =~ /#(\d)/g; # 1234
say "#1#2#3#4" =~ /#(\d)/g; # 1234
say "#1#2#3#4#" =~ /#(\d)(?=#)/g; # 1234
say "#1#2#3#4" =~ /#(\d)(?=#)/g; # 123
You're calling pos() on $_ instead of on $data.
From perldoc -f pos:
Returns the offset of where the last m//g search left off for the variable in question ($_ is used when the variable is not specified)
So,
pos($data) = pos($data) - 1;
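Put back into the question's loop, the rewind looks like this; a minimal sketch that collects the digits into an array instead of printing them one by one:

```perl
use strict;
use warnings;

my $data = "#1#2#3#4#";
my @digits;
while ( $data =~ /#(\d)#/g ) {
    push @digits, $1;
    # Step back one character so the closing '#' can open the next match.
    pos($data) = pos($data) - 1;
}
print "@digits\n";    # prints "1 2 3 4"
```

Each match is three characters long and the rewind gives back one, so the loop still advances and terminates.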

Cannot split line and save to variable using whitespace in perl

I'm having some trouble with parsing a file.
Two lines in the file contain the word ' Mapped', and I would like to extract the number that is in those two lines.
And this is my code:
my %cellHash = ();
my $mapped = 0;
my $alnPairs = 0;
my @mappedReads = ();
while (<ALIGN_SUMMARY>) {
chomp($_);
if (/Mapped/) {
print "\n$_\n";
$mapped = (split / /, $_)[2];
push(@mappedReads, $mapped);
}
if (/Aligned pairs/) {
print "\n$_\n";
$alnPairs = (split / /, $_)[4];
}
}
%{ $cellHash{$cellDir} } = (
'MappedR1' => $mappedReads[0] ,
'MappedR2' => $mappedReads[1] ,
'AlnPairs' => $alnPairs ,
);
foreach my $cellName ( keys %cellHash){
print OUTPUT $cellName,
"\t", ${ $cellHash{$cellName} }{"LibSize"},
"\t", ${ $cellHash{$cellName} }{"MappedR1"},
"\t", ${ $cellHash{$cellName} }{"MappedR2"},
"\t", ${ $cellHash{$cellName} }{"AlnPairs"},
"\n";
}
But the OUTPUT file only has the 'AlignedPairs' column and never anything in MappedR1 or MappedR2.
What am I doing wrong? Thanks!
When I look at the file, it looks like there is more than a single space between the fields. Here is an example of what I mean, and what I did to extract the number.
my $test = "blah    :   123455";   # several spaces around the colon
my @test_ary = split(/ /, $test);
print "Size of array: ", scalar(@test_ary), "\n";   # every single space splits, so empty fields appear
my $number;
$number = $1 if $test =~ m/([0-9]+)/;
print "The extracted number: $number\n";
Output of run:
Size of array: 8
The extracted number: 123455
Hope this helps.
First off, paste in your actual input and output, not an image, if you want anyone to actually test something for you.
Second, you're not splitting on whitespace, you're splitting on a single literal space. Use the special case of
split ' ', $_;
to split on arbitrary length whitespace, discarding leading and trailing whitespace.
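The effect is easy to see on a line with runs of spaces. The string below is a made-up stand-in, since the real input was only posted as an image:

```perl
use strict;
use warnings;

# Illustrative line with runs of spaces around the colon.
my $test = "blah    :   123455";

my @single = split / /, $test;   # splits on every single space, so empty fields appear
my @words  = split ' ', $test;   # splits on runs of whitespace: ("blah", ":", "123455")

print scalar(@single), " fields with / /\n";   # 8
print scalar(@words),  " fields with ' '\n";   # 3
print "number: $words[2]\n";                   # 123455
```

Note that with split ' ' the field indices change, so the [2] and [4] subscripts in the question would need to be checked against the real line layout.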

stripping off numbers and alphabetics in perl

I have an input variable, say $a. $a can be a number, a string, or a mix of both.
My question is: how can I split the variable into its numeric digits and its alphabetic characters?
Example:
$a can be 'AB9'
Here I should be able to store 'AB' in one variable and '9' in another.
How can I do that?
Check this version; it works with one or more digits and letters in a variable.
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $var = '11a';
my (@digits, @alphabetics);
while ($var =~ /([a-zA-Z]+)/g) {
push @alphabetics, $1;
}
while ($var =~ /(\d+)/g) {
push @digits, $1;
}
print Dumper(\@alphabetics);
print Dumper(\@digits);
Here's a concise way to express it (say and // need Perl 5.10 or later, with use feature 'say'):
my ($digits) = $input =~ /(\d+)/;
my ($alpha) = $input =~ /([a-z]+)/i;
say 'digits: ' . ($digits // 'none');
say 'non-digits: ' . ($alpha // 'none');
It's important to use the match operator in list context here; otherwise it would only return whether the match succeeded, not the captured text.
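A quick sketch of that context difference, using the question's example value 'AB9':

```perl
use strict;
use warnings;

my $input = 'AB9';

my $scalar_result = $input =~ /(\d+)/;   # scalar context: only true or false
my ($list_result)  = $input =~ /(\d+)/;  # list context: the captured text

print "scalar: $scalar_result\n";   # 1
print "list:   $list_result\n";     # 9
```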
If you want to get all occurrences in the input string, simply change the scalar variables in list context to proper arrays:
my @digits = $input =~ /(\d+)/g;
my @alpha = $input =~ /([a-z]+)/gi;
say 'digits: ' . join ', ' => @digits;
say 'non-digits: ' . join ', ' => @alpha;
For my $input = '42AB17C', the output is
digits: 42, 17
non-digits: AB, C
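For the exact shape in the question (letters followed by digits, as in 'AB9'), a single anchored regex can fill both variables at once. This is a sketch under that assumption about the ordering:

```perl
use strict;
use warnings;

my $var = 'AB9';
# Assumes any letters come before any digits, as in 'AB9'.
my ($alpha, $digits) = $var =~ /^([A-Za-z]*)(\d*)$/;

print "alpha:  $alpha\n";    # AB
print "digits: $digits\n";   # 9
```

For an input such as '11a' the pattern does not match and both variables stay undef; the answers above handle letters and digits in any order.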

Why isn't Perl's tr/// doing what I want?

I am using Perl for a script that takes two short DNA strings as input. As output, I concatenate the two strings, then print the second string lined up under its copy at the end of the concatenated string. For example, if the input strings are AAAAA and TTTTT, it should print:
AAAAATTTTT
     TTTTT
I know there are other ways to do this but I am curious to know why my use of tr/// isn't working.
The code for the program is:
use strict;
use warnings;
print "enter a DNA sequence \n";
my $DNA1 = <>;    # <> reads STDIN when no file names are given
$DNA1 =~ s/\r?\n?$//;
print $DNA1, "\n\n";
print "enter second DNA sequence \n";
my $DNA2 = <>;
$DNA2 =~ s/\r?\n?$//;
print $DNA2, "\n\n";
my $DNA = join("", ($DNA1, $DNA2));
print "Both DNA sequences are \"$DNA\" \n\n";
my $DNA3 = $DNA1;
$DNA3 =~ tr/ATCGatcg//;
print $DNA3, "\n\n";
my $DNA4 = join("", ($DNA3, $DNA2));
print $DNA4, "\n\n";
exit;
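Your tr has an empty replacement list and no /d flag, so each of ATCGatcg is replaced by itself: tr only counts them and leaves $DNA3 unchanged. I think you want
$DNA3 =~ tr/atcgATCG/ /;
You need to put a space in the second half of the tr command. Because that replacement list is shorter than the search list, its last character (the space) is repeated, so every base becomes a space.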
Alternatively, it seems that what you're trying to do is create a variable containing as many spaces as there were characters in the first string:
my $spaces = ' ' x length($DNA1);
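The three behaviours discussed here can be checked side by side; a small sketch using a hard-coded 'AAAAA' in place of the user input:

```perl
use strict;
use warnings;

my $DNA1 = 'AAAAA';

# Empty replacement list without /d: each character maps to itself,
# so tr only counts matches and leaves the string alone.
my $copy  = $DNA1;
my $count = ($copy =~ tr/ATCGatcg//);

# A one-character replacement list is padded with its last character,
# so every listed base becomes a space.
my $blanked = $DNA1;
$blanked =~ tr/ATCGatcg/ /;

# The x operator builds the same padding directly.
my $spaces = ' ' x length($DNA1);

print "[$copy] count=$count\n";   # [AAAAA] count=5
print "[$blanked]\n";             # five spaces between the brackets
print "same padding: ", ($blanked eq $spaces ? "yes" : "no"), "\n";
```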
It might just be a simple syntax error. Try:
$DNA3 =~ tr/ATCGatcg/ /;
where the second slash separates your two translation entities, and you have a space character between the second and third slashes.
Good luck!
Edit: my mistake - misunderstood what you wanted to do. Answer adjusted accordingly :)
Is this the program that you want?
#!perl
my $s1 = 'AAAAAAAAA';
my $s2 = 'TCGAGCTA';
print
$s1, $s2, "\n",
' ' x length( $s1 ), $s2, "\n";