How to substring a string at several positions with Perl? - perl

I have several places where I want to cut my string into parts.
For example:
$string= "AACCAAGTAA";
@cut_places= {0,4, 8 };
My $string should look like this: AACC AAGT AA;
How can I do that?

To populate an array, use round parentheses, not curly brackets (they're used for hash references).
One possible way is to use substr, whose second argument is the starting position (the first argument is the string itself), so you can use the array elements directly. You just need to compute each length by subtracting the position from the following one; to compute the last length, you also need the length of the whole string:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'AACCAAGTAA';
my @cut_places = (0, 4, 8);
push @cut_places, length $string;
my @parts = map {
substr $string, $cut_places[$_], $cut_places[$_+1] - $cut_places[$_]
} 0 .. $#cut_places - 1;
say for @parts;
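The same substr approach can be wrapped in a subroutine. The sketch below uses a hypothetical name, cut_at_positions, and works on a copy of the positions so that, unlike the push above, the caller's array is left unchanged. It assumes the positions are in ascending order and lie within the string.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper name. Returns the pieces of $string that start at each
# offset in @positions; the offsets are assumed ascending and within the string.
sub cut_at_positions {
    my ($string, @positions) = @_;      # a copy, so the caller's array is not changed
    push @positions, length $string;    # sentinel: where the last piece ends
    return map {
        substr $string, $positions[$_], $positions[$_ + 1] - $positions[$_]
    } 0 .. $#positions - 1;
}

say for cut_at_positions('AACCAAGTAA', 0, 4, 8);
```

With positions (2, 6), the same call returns CCAA and GTAA, because each piece starts at its own offset.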
If the original array contained lengths instead of positions, the code would be much easier.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'AACCAAGTAA';
my @lengths = (4, 4, 2); # 4, 4, 4 would work, too
my @parts = unpack join("", map "A$_", @lengths), $string;
say for @parts;
See unpack for details.
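One detail to keep in mind when choosing the template letter: on unpack, A strips trailing spaces (and nulls) from each field, while a returns the bytes unchanged. For a DNA string like the one above this makes no difference, but for fixed-width text it can. A minimal illustration with a made-up record:

```perl
use strict;
use warnings;

my $record = 'AB  CD  ';    # made-up data: two 4-character fields padded with spaces

my @trimmed = unpack 'A4 A4', $record;    # 'A' strips trailing spaces and nulls
my @exact   = unpack 'a4 a4', $record;    # 'a' keeps the field bytes as they are

print join('|', @trimmed), "\n";
print join('|', @exact), "\n";
```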

Here's a solution that starts by calculating the forward differences in the list of positions. The length of the string is first appended to the end of the list if it doesn't already span the full string.
The differences are then used to build an unpack format string, which is used to build the required sequence of substrings.
I have written the functionality as a do block, which would be simple to convert to a subroutine if desired.
use strict;
use warnings 'all';
use feature 'say';
my $string = 'AACCAAGTAA';
my @cut_places = ( 0, 4, 8 );
my @parts = do {
my @places = @cut_places;
my $len = length $string;
push @places, $len unless $places[-1] >= $len;
my @w = map { $places[$_]-$places[$_-1] } 1 .. $#places;
my $patt = join ' ', map { "A$_" } @w;
unpack $patt, $string;
};
say "@parts";
output
AACC AAGT AA
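As the answer notes, the do block converts easily into a subroutine. The sketch below uses a hypothetical name, split_by_positions, and adds one thing the do block does not do: if the first position is not 0, it prefixes the unpack x (skip) code, because the do block as written always starts its first field at offset 0.

```perl
use strict;
use warnings;
use feature 'say';

# Sub version of the do block (the name is hypothetical). Assumes at least one
# position, in ascending order, within the string.
sub split_by_positions {
    my ($string, @places) = @_;
    my $len = length $string;
    push @places, $len unless $places[-1] >= $len;
    my @widths   = map { $places[$_] - $places[$_ - 1] } 1 .. $#places;
    my @template = map { "A$_" } @widths;
    # 'x' skips bytes, so any text before a non-zero first position is dropped.
    unshift @template, "x$places[0]" if $places[0] > 0;
    return unpack join(' ', @template), $string;
}

my @parts = split_by_positions('AACCAAGTAA', 0, 4, 8);
say "@parts";
```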

Work out the lengths of the needed parts first; then every method is easier. Here a regex is used:
use warnings;
use strict;
use feature 'say';
my $string = 'AACCAAGTAA';
my @pos = (0, 4, 8);
my @lens = do {
my $prev = shift @pos;
"$prev", map { my $e = $_ - $prev; $prev = $_; $e } @pos;
};
my $patt = join '', map { '(.{'.$_.'})' } @lens;
my $re = qr/$patt/;
my @parts = grep { /./ } $string =~ /$re(.*)/g;
say for @parts;
The lengths @lens are computed by subtracting successive positions (second minus first, third minus second, and so on). I use do merely so that the $prev variable, unneeded elsewhere, doesn't "pollute" the rest of the code.
The "$prev" is quoted so that its current value is copied into the list before map changes the variable; unquoted, the list would hold the variable itself and end up with its final value.
The captures returned by the regex are passed through grep to filter out the empty strings caused by the 0 position (or by any two successive positions that are equal).
This works for position arrays of any length, as long as the positions are in ascending order and lie within the string.
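If the aliasing subtlety behind "$prev" feels fragile, the lengths can also be computed from array indices, so that no variable changes while the list is being built. A sketch of that variant; anchoring the pattern with ^ and dropping /g (only one match is wanted) are my choices, not part of the answer:

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'AACCAAGTAA';
my @pos    = (0, 4, 8);

# The first "length" is the first position itself; every later one is a
# position minus the position before it. Indexing avoids a changing $prev.
my @lens = ($pos[0], map { $pos[$_] - $pos[$_ - 1] } 1 .. $#pos);

my $patt  = join '', map { '(.{' . $_ . '})' } @lens;
my @parts = grep { length } $string =~ /^$patt(.*)/s;

say for @parts;
```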

Related

Data value of array not printing properly

I have written a script which collects the marks of students and prints the ones who scored above 50.
The script is below:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @array = (
'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
');
print Dumper(\@array);
my $class = "3";
foreach my $each_value (@array) {
print "EACH: $each_value\n";
my ($name, $score ) = split (/,/, $each_value);
if ($score lt 50) {
next;
} else {
print "$name, \"GOOD SCORE\", $score, $class";
}
}
Here I want to print the data of STUDENT1, since his score is greater than 50.
So the output should be:
STUDENT1, "GOOD SCORE", 90, 3
But it prints this instead:
STUDENT1, "GOOD SCORE", 90
STUDENT2, 3
It seems the newline between 90 and STUDENT2 is not being used to separate the records.
I know this is because I never split the data on the newline character; the array @array has only a single element.
How can I split that element on newlines, so that inside the for loop I can split each line again on the comma (,) to get the values into $name and $score?
Actually, @array is passed as an argument to this script, so I have to modify the script to parse the values correctly.
As you already know, your "array" only has one "element", a string with the actual records in it, so it is essentially more a scalar than an array.
And as you suspect, you can split this scalar just as you already did with the newline as a separator instead of a comma. You can then put a foreach around the result of split() to iterate over the records.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $records = 'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
';
my $class = "3";
foreach my $record (split("\n", $records)) {
my ($name, $score) = split(',', $record);
if ($score >= 50) {
print("$name, \"GOOD SCORE\", $score, $class\n");
}
}
As a small note, lt is a string comparison operator. The numeric comparisons use symbols, such as <.
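A quick way to see the difference: lt compares character by character, so "100" lt "50" is true because "1" sorts before "5", even though 100 is numerically larger. A minimal check:

```perl
use strict;
use warnings;

# String comparison goes character by character: '1' sorts before '5'.
my $string_less  = ('100' lt '50') ? 1 : 0;
my $numeric_less = (100 < 50)      ? 1 : 0;

print "string lt: $string_less\n";     # prints 1
print "numeric <: $numeric_less\n";    # prints 0
```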
Although you have an array, you only have a single string value in it:
my @array = (
'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
');
That's not a big deal. Dave Cross has already shown you how you can break that up into multiple values, but there's another way I like to handle multi-line strings. You can open a filehandle on a reference to the string, then read lines from the string as you would from a file:
my $string = 'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
';
open my $string_fh, '<', \$string;
while( <$string_fh> ) {
chomp;
...
}
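To connect this back to the question, here is one way the loop body (the ... placeholder) could look, applying the question's score filter with a numeric comparison. It is a sketch: the matching records are collected in an array named @good (my name, not the answer's) so they can be printed or checked afterwards.

```perl
use strict;
use warnings;

my $string = "STUDENT1,90\nSTUDENT2,40\nSTUDENT3,30\nSTUDENT4,30\n";
my $class  = 3;
my @good;    # hypothetical name: the lines that pass the filter

open my $string_fh, '<', \$string or die "Cannot open in-memory filehandle: $!";
while ( my $line = <$string_fh> ) {
    chomp $line;
    my ($name, $score) = split /,/, $line;
    next unless $score >= 50;    # numeric comparison, not lt
    push @good, qq{$name, "GOOD SCORE", $score, $class};
}
close $string_fh;

print "$_\n" for @good;
```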
One of the things to consider while programming is how many times you are duplicating the data. If you have it in a big string and then split it into an array, you've now stored the data twice. That might be fine, and it's usually expedient. You can't always avoid it, but you should have some tools in your toolbox that let you avoid it.
And here's a chance to use indented here-docs (available since Perl 5.26):
use v5.26;
my $string = <<~"HERE";
STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
HERE
open my $string_fh, '<', \$string;
while( <$string_fh> ) {
chomp;
...
}
For your particular problem, I think you have a single string where the lines are separated by the '|' character. You don't show how you call this program or get the data, though.
You can choose any line ending you like by setting the value for the input record separator, $/. Set it to a pipe and this works:
use v5.10;
my $string = 'STUDENT1,90|STUDENT2,40|STUDENT3,30|STUDENT4,30';
{
local $/ = '|'; # input record separator
open my $string_fh, '<', \$string;
while( <$string_fh> ) {
chomp;
say "Got $_";
}
}
Now the structure of your program isn't too far away from taking the data from standard input or a file. That gives you a lot of flexibility.
@array contains only one element. The for loop itself works correctly; you can fix the script without any change to the for block, just by replacing the array with:
my @array = (
'STUDENT1,90',
'STUDENT2,40',
'STUDENT3,30',
'STUDENT4,30');
Otherwise, you can iterate over the records by splitting the string on newlines (\n).
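Since the question says @array arrives as an argument, another option is to normalize it once before the loop by splitting every element on newlines. This sketch assumes the records are separated by \n; it works whether the caller passes one multi-line element or one record per element, and it uses the numeric < instead of lt:

```perl
use strict;
use warnings;

# As in the question: one element holding all records, one per line.
my @array = ("STUDENT1,90\nSTUDENT2,40\nSTUDENT3,30\nSTUDENT4,30\n");
my $class = "3";

# Turn every element into its individual lines, so each element is one record.
@array = map { split /\n/, $_ } @array;

foreach my $each_value (@array) {
    my ($name, $score) = split /,/, $each_value;
    next if $score < 50;
    print "$name, \"GOOD SCORE\", $score, $class\n";
}
```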

How to extract a value from a variable within if condition

I am trying to extract the timestamp portion from a variable, but somehow the substr function isn't working.
This is what I have tried:
while(<INFILE>){
chomp;
if(/timestamp:(.+$)/){
$ts = $1;
$ts =~ substr($ts, 10);
print $ts;
}
}
close(INFILE);
This is how the line is in the file
timestamp: 25JUN2019_02:55:02.234
somedata..
..
..
..
timestamp: 25JUN2019_07:00:28.718
I want the output to be
02:55:02.234
07:00:28.718
But instead the output is
25JUN2019_02:55:02.234
25JUN2019_07:00:28.718
Several issues:
You are using the bind operator =~ instead of the assignment operator =
You should always use strict and use warnings
You should allow for the whitespace after the colon, so that it is not part of the capture
If there is additional data on the line, substr will return it as well. You should limit your substr to only the part you want.
Revised code:
use strict;
use warnings;
while(<DATA>) {
chomp;
if (/timestamp:\s*(.+$)/) {
my $ts = substr($1, 10, 12); # only include length of data you want
print $ts;
}
}
__DATA__
timestamp: 25JUN2019_02:55:02.234
Output:
02:55:02.234
Two problems:
=~ is the binding operator, you probably want normal assignment.
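substr $ts, 10 returns the substring from position 10 to the end of $ts. To extract only 12 characters, use
$ts = substr $ts, 10, 12;
Note that your pattern /timestamp:(.+$)/ also captures the space after the colon, so within that capture the time starts at offset 11, not 10 (alternatively, skip the space with \s* before the capture).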
You can also extract the timestamp directly:
if(my ($ts) = /timestamp: [^_]+_(\S+)/){
print $ts, "\n";
}
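Put together, the direct-capture approach needs neither substr nor a fixed offset. A self-contained sketch, with the file replaced by an in-memory list of lines in the question's format, and \s* used so that the amount of space after the colon does not matter (an assumption about the input):

```perl
use strict;
use warnings;

# Stand-in for the input file: a few lines in the question's format.
my @lines = (
    "timestamp: 25JUN2019_02:55:02.234\n",
    "somedata..\n",
    "timestamp: 25JUN2019_07:00:28.718\n",
);

my @stamps;
for (@lines) {
    chomp(my $line = $_);
    # Skip the date and the underscore, capture the time up to the next whitespace.
    if ( my ($ts) = $line =~ /timestamp:\s*[^_]+_(\S+)/ ) {
        push @stamps, $ts;
    }
}
print "$_\n" for @stamps;
```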

Perl int to array

I have a number stored in a Perl variable and I want to 'pass/convert/store' its digits into the different positions of an array. An example to make it clearer:
I have, let's say, this number stored:
$hello = 429384
And I need a new array with the digits stored in it, so:
$hello2[0] = 4
$hello2[1] = 2
$hello2[2] = 9
Etc..
I could probably do it with a couple of loops, but I want to know if there is an efficient and fast way to do it. Thanks in advance!
my @hello = split //, $hello;
In Perl, if you use a number where a string is expected (as split does with its second argument), the conversion is done automatically:
$hello = 429384;
@hello = split //, $hello;
print $hello[0];
Using only a regex, without any built-in function:
#!/usr/bin/perl
use strict;
use warnings;
my $string=429384;
my @numbers = $string =~ /./g; # dot matches a single character at a time
# and returns it
print "@numbers \n";
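One caveat with both split // and /./g: every character is returned, so a negative number yields a leading "-". Matching \d instead skips anything that is not a digit. A sketch with a made-up negative value:

```perl
use strict;
use warnings;

my $hello = -429384;    # made-up example value with a sign

# \d matches only digit characters, so the minus sign is skipped.
my @digits = $hello =~ /\d/g;

print "@digits\n";
```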
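This is faster than split // in the benchmark below (unpack 'C*' returns the character codes, and subtracting 48, the code of '0', turns them into digits):
$string = '1234567890';
$_-=48 for @digits = unpack 'C*',$string;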
benchmark:
use Time::HiRes;
$string = '1234567890';
$start_time = [Time::HiRes::gettimeofday()];
for (1.. 100000){
$_-=48 for @digits= unpack 'C*',$string;
}
$diff = Time::HiRes::tv_interval($start_time);
print "\n\n$diff\n";
$start_time = [Time::HiRes::gettimeofday()];
for (1.. 100000){
@digits = split //, $string;
}
$diff = Time::HiRes::tv_interval($start_time);
print "\n\n$diff\n";
output:
0.265814
0.314735
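The same comparison can be run with the core Benchmark module, whose cmpthese runs each candidate for a given amount of CPU time and prints a table of relative rates. A sketch; the numbers depend on the machine and perl version, so none are shown here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = '1234567890';

# Both approaches should produce the same digits before we time them.
my @via_unpack = map { $_ - 48 } unpack 'C*', $string;
my @via_split  = split //, $string;
die "mismatch\n" unless "@via_unpack" eq "@via_split";

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, {
    unpack => sub { my @d; $_ -= 48 for @d = unpack 'C*', $string; },
    split  => sub { my @d = split //, $string; },
});
```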

extract multiple substr and match & replace using perl

I need to extract multiple substrings at fixed positions from a line and the same time replace whitespaces at another position.
For example, I have a string '01234567890 '. I want to extract characters at positions 1,2,6,7,8 and the same time if position 12, 13 are whitespaces, I want to replace them with 0101. It is all position based.
What is the best way to achieve this using perl ?
I can use substr and string comparison and then concatenate the results, but the code looks rather chunky....
I would probably split (or: explode) the string into an array of single chars:
my @chars = split //, $string; # // is special with split
Now we can use array slices to extract multiple elements at once.
use List::MoreUtils qw(all);
if (all {/\s/} @chars[12, 13]) {
@chars[12, 13] = (0, 1);
my @extracted_chars = @chars[1, 2, 6..8];
# do something with extracted data.
}
We can then turn the #chars back into a string like
$string = join "", @chars;
If you want to remove certain chars instead of extracting them, you would have to use slices inside a loop, an ugly undertaking.
A complete sub with a nicer interface for this kind of thing:
use strict;
use warnings;
use feature 'say';
use List::MoreUtils qw(all);
sub extract (%) {
my %args = @_;
my ($at, $ws, $ref) = @args{qw(at if_whitespace from)};
$ws //= [];
my @chars = split //, $$ref;
if (all {/\s/} @chars[@$ws]) {
@chars[@$ws] = (0, 1) x int(@$ws / 2 + 1);
$$ref = join "", @chars;
return @chars[@$at];
}
return +();
}
my $string = "0123456789ab \tef";
my @extracted = extract from => \$string, at => [1,2,6..8], if_whitespace => [12, 13];
say "@extracted";
say $string;
Output:
1 2 6 7 8
0123456789ab01ef
These are two separate operations, and they should be coded as such. This code seems to do what you need.
use strict;
use warnings;
my $str = 'abcdefghijab  efghij'; # two spaces, at offsets 12 and 13
my @extracted = map { substr $str, $_, 1 } 1, 2, 6, 7, 8;
print "@extracted\n";
for (substr $str, 12, 2) {
$_ = '01' if $_ eq '  '; # both characters must be spaces
}
print $str, "\n";
output
b c g h i
abcdefghijab01efghij
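The replacement step can also be written with four-argument substr, which replaces a range in place without the for aliasing trick. A sketch; testing /^\s{2}\z/ instead of comparing with two literal spaces is my choice, so that tabs also count, as with the /\s/ test in the first answer:

```perl
use strict;
use warnings;

# Same test string: two whitespace characters at offsets 12 and 13.
my $str = 'abcdefghijab' . ' ' x 2 . 'efghij';

# Extraction: one character at each wanted offset.
my @extracted = map { substr $str, $_, 1 } 1, 2, 6 .. 8;

# Replacement: four-argument substr edits the string in place,
# but only when both characters are whitespace (spaces or tabs).
substr($str, 12, 2, '01') if substr($str, 12, 2) =~ /^\s{2}\z/;

print "@extracted\n";
print "$str\n";
```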

Cutting apart string in Perl

I have a string in Perl that is 23 digits long. I need to cut it apart into different pieces. First 2 digits in one variable, next 3 in another variable, next 4 into another variable, etc. Basically the 23 digits needs to end up as 6 separate variables (2,3,4,4,3,7) characters, in that order.
Any ideas how I can cut the string up like this?
There are lots of ways to do it, but the shortest is probably unpack:
my $string = '1' x 23;
my #values = unpack 'A2A3A4A4A3A7', $string;
If you need separate variables, you can use a list assignment:
my ($v1, $v2, $v3, $v4, $v5, $v6) = unpack 'A2A3A4A4A3A7', $string;
Expanding on Alex's method: rather than specifying each start and end, use the list of lengths you gave.
#!/usr/bin/env perl
use strict;
use warnings;
my $string = "abcdefghijklmnopqrstuvw";
my $pos = 0;
my #split = map {
my $start = $pos;
my $end = $_;
$pos += $end;
substr( $string, $start, $end);
} (2,3,4,4,3,7);
print "$_\n" for #split;
That said, you should probably look at unpack, which is designed for fixed-width fields. I have no experience with it, though.
You could use a regex, viz:
$string =~ /\d{2}\d{3}\d{4}\d{4}\d{3}\d{7}/
and capture each part by surrounding it with parentheses ().
You then find each capture in the variables $1, $2, ...
or get them all from the returned list.
See perldoc perlre.
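With capture groups added, a single list assignment gets all six fields at once. A sketch with a made-up 23-digit string; the ^ and \z anchors are an addition that makes the match fail unless the input is exactly 23 digits:

```perl
use strict;
use warnings;

my $string = '12345678901234567890123';    # made-up 23-digit input

# One capture group per field; in list context the match returns the captures.
my ($v1, $v2, $v3, $v4, $v5, $v6) =
    $string =~ /^(\d{2})(\d{3})(\d{4})(\d{4})(\d{3})(\d{7})\z/
    or die "Input is not exactly 23 digits\n";

print "$v1 $v2 $v3 $v4 $v5 $v6\n";
```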
You want substr (see perldoc -f substr):
$substring = substr($string, $start, $length);
I'd also use map on a list of [start, length] pairs to make your life easier:
$string = "123456789";
@values = map {substr($string, $_->[0], $_->[1])} ([1, 3], [4, 2] , ...);
Here's a sub that will do it, using the already discussed unpack.
sub string_slices {
my $str = shift;
return unpack( join( 'A', '', @_ ), $str );
}
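A usage sketch for string_slices, using the field lengths from the question and a made-up 23-character string. The trick is that join('A', '', 2, 3) produces 'A2A3', one A field per length:

```perl
use strict;
use warnings;

sub string_slices {
    my $str = shift;
    # join('A', '', @lengths) builds a template such as 'A2A3A4...'.
    return unpack( join( 'A', '', @_ ), $str );
}

my @fields = string_slices('abcdefghijklmnopqrstuvw', 2, 3, 4, 4, 3, 7);
print "$_\n" for @fields;
```

Because the template uses A, trailing spaces inside a field are stripped; use a if they must be kept.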