Chopping the last sequence of a pattern - perl

I have this series of values
rd_8KB_rms
rd_8KB_rms_qd1
rd_8KB_wh
rd_8KB_wh_q1
rd_8KB_wms
rd_8KB_wms_qd1
rd_256K_rms
rd_256K_rms_1
and where a value has 3 underscores, I would like to chop the last underscore and the characters that trail it (which vary in number). I have tried variations of substr, split, and regexes, but can't find anything that works.

You can use transliteration tr/_// to count the number of underscores and substitution s/_[^_]*$// to remove the part from the last underscore to the end.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
while (<DATA>) {
    chomp;
    s/_[^_]*$// if tr/_// == 3;
    say;
}
__DATA__
rd_8KB_rms
rd_8KB_rms_qd1
rd_8KB_wh
rd_8KB_wh_q1
rd_8KB_wms
rd_8KB_wms_qd1
rd_256K_rms
rd_256K_rms_1
If there can be even more underscores, use a variant like
s/_[^_]*$// until tr/_// < 3;
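As a minimal sketch of the same idea wrapped in a reusable sub, assuming the goal is to end up with at most two underscores however many the input has. The name trim_to_two_underscores is hypothetical and not part of the answer above:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (name is illustrative): repeatedly remove the last
# underscore and everything after it until at most two underscores remain.
sub trim_to_two_underscores {
    my ($name) = @_;
    # tr/_// with an empty replacement list only counts the underscores.
    $name =~ s/_[^_]*$// while ($name =~ tr/_//) > 2;
    return $name;
}

print trim_to_two_underscores($_), "\n"
    for qw(rd_8KB_rms rd_8KB_rms_qd1 rd_256K_rms_1 a_b_c_d_e);
```

Working on a copy inside the sub leaves the caller's variable untouched, which may or may not be what you want.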

Related

How to get the index of the last digit in a string in Perl?

In Perl, how do you find the index of the last digit in a string?
Example: Hello123xyz index of last digit is 7
Use a regex to match the last digit in the string and the @- variable to get the index of the start of the match:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
sub last_digit_index($) {
    if ($_[0] =~ /\d\D*\z/) {
        return $-[0];
    } else {
        return -1;
    }
}
say last_digit_index("Hello123xyz"); # 7
I'd probably use the pos function here. Match with /g and Perl remembers where the match left off. The next global match on that string will start where the last match left off, so isolate this in a sub or block to avoid weird effects on subsequent matches on the same variable.
Since positions count from 0, the position after the match will be one greater than the 0-based index of the final digit. You decide if you want to subtract 1 or not:
use v5.10;
say last_digit_pos('Hello123xyz');
sub last_digit_pos {
    my( $string ) = @_;
    $string =~ m/^.*\d/sg;
    return pos($string); # 8
}
And, if the string doesn't match, pos doesn't return a defined value.
You can also leverage last_index from List::MoreUtils:
use List::MoreUtils qw(last_index);
my $last_digit_index = last_index { /[0-9]/ } split '', $string;
I find this simple: break the string into a list of characters with a typical use of split, and use a library to find the last one which is a digit, via a trivial regex.
Note that this is "expensive" as it creates a scalar for each character and runs regex multiple times. So if efficiency matters -- if this is done on an absolutely gigantic string, or many many many times on smaller strings -- then better seek other approaches, or at least benchmark it before deciding on it. (Note, that would have to be really a lot of strings to see degraded efficiency.)
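To compare the two core-only approaches side by side, here is a small sketch. The sub names are hypothetical, and returning -1 for "no digit" in both is an assumption made so the results line up:

```perl
use strict;
use warnings;

# Hypothetical helpers: both return the 0-based index of the last digit,
# or -1 when the string contains no digit.
sub last_digit_index_offsets {
    my ($string) = @_;
    # $-[0] is the offset at which the most recent successful match started.
    return $string =~ /\d\D*\z/ ? $-[0] : -1;
}

sub last_digit_index_pos {
    my ($string) = @_;    # a copy, so the caller's pos() is not disturbed
    # After a successful /g match, pos() is the offset just past the match,
    # which is one more than the index of the digit that ended it.
    return $string =~ /^.*\d/sg ? pos($string) - 1 : -1;
}

for my $s ('Hello123xyz', 'no digits', '7') {
    printf "%-12s %2d %2d\n", $s,
        last_digit_index_offsets($s), last_digit_index_pos($s);
}
```

For 'Hello123xyz' both subs report 7, which matches the index given in the question.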

How do I search and replace with "OR" condition

This is a trivial issue, but I hope someone can point me to the right way to do it. I have a string "Thunderstorms" which I replace with "T/storms".
s/Thunderstorms/T\/Storms/gi
It so happens that "Thunderstorms" is sometimes written as "Thunder Storms". Instead of writing two search and replace commands, I am looking for replacing "Thunderstorms" or "Thunder storms" with "T/Storms" in one command.
You can use \s* to match zero or more whitespaces:
use warnings;
use strict;
while (<DATA>) {
    s{Thunder\s*storms}{T/Storms}gi;
    print;
}
__DATA__
Thunderstorms
Thunder Storms
Thunder storms
Outputs:
T/Storms
T/Storms
T/Storms
I used different delimiters for the substitution operator (s{}{}) to avoid escaping the /.
Use | to separate the alternatives.
s/Thunderstorms|Thunder\sStorms/T\/Storms/gi
sample code:
use strict;
use warnings;
my $str = 'Thunderstorms foo Thunder Storms bar';
$str =~ s/Thunderstorms|Thunder\sStorms/T\/Storms/gi;
print $str;
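The two answers can be combined into one small sub. This is only a sketch: the name abbreviate_thunderstorms is hypothetical, and it assumes at most one whitespace character separates the two words (use \s* if several should be allowed):

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite both spellings to "T/Storms".
sub abbreviate_thunderstorms {
    my ($text) = @_;
    # \s? allows zero or one whitespace character between the two words;
    # the s{}{} delimiters avoid escaping the slash in the replacement.
    $text =~ s{Thunder\s?storms}{T/Storms}gi;
    return $text;
}

print abbreviate_thunderstorms('Thunderstorms foo Thunder Storms bar'), "\n";
```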

Perl if Statement only specific length of numbers

Is it possible to have an if statement that checks whether my $expression has fewer than 12 digits and consists only of digits? Like
if($expression> less than 12numbers and only integers).
You can match it using regex. Below is the code snippet.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
my $exp = "1234567898711";
if ($exp =~ /^\d{12}$/) {
    say "Matched expression: $exp";
} else {
    say "Not matched";
}
EDIT:
If you want to match 12 digits or fewer, use the expression below:
^\d{1,12}$
Note: This expression only works when the value consists purely of digits. If it is alphanumeric, the expression needs to be changed accordingly.
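Read literally, "less than 12" means 1 to 11 digits. A minimal sketch under that assumption follows; the sub name is_short_integer is hypothetical, and signs, whitespace and a trailing newline are all rejected:

```perl
use strict;
use warnings;

# Hypothetical helper: true when $value consists of 1 to 11 ASCII digits
# and nothing else. \A and \z anchor to the very start and end of the
# string, so a trailing newline is rejected too (unlike with $).
sub is_short_integer {
    my ($value) = @_;
    return defined $value && $value =~ /\A[0-9]{1,11}\z/ ? 1 : 0;
}

for my $candidate ('12345', '12345678901', '123456789012', '12a45') {
    printf "%-14s %s\n", $candidate,
        is_short_integer($candidate) ? 'ok' : 'rejected';
}
```

Change the upper bound to 12 if "12 digits or fewer" is what you actually need.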

How to combine two regex pattern in perl?

I want to combine two regex patterns to split a string and get an array of integers.
This is the example:
$string= "1..1188,1189..14,14..15";
$first_pattern = /\../;
$second_pattern = /\,/;
I want to get an array like this:
[1,1188,1189,14,14,15]
Use | to connect alternatives. Also, use qr// to create regex objects: a plain /.../ matches against $_ and assigns the result of the match to $first_pattern and $second_pattern.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = '1..1188,1189..14,14..15';
my $first_pattern = qr/\.\./;
my $second_pattern = qr/,/;
my @integers = split /$first_pattern|$second_pattern/, $string;
say for @integers;
You probably need \.\. to match two dots, as \.. matches a dot followed by anything but a newline. Also, there's no need to backslash a comma.
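For comparison, here is a short sketch of two equivalent ways to get the list, assuming the input only ever contains digits, ".." and ",". The variable names @by_split and @by_match are illustrative:

```perl
use strict;
use warnings;

my $string = '1..1188,1189..14,14..15';

# Split on either separator: two literal dots, or a comma.
my @by_split = split /\.\.|,/, $string;

# Alternative: ignore the separators and collect every run of digits.
my @by_match = $string =~ /\d+/g;

print "@by_split\n";   # 1 1188 1189 14 14 15
print "@by_match\n";   # 1 1188 1189 14 14 15
```

The match-based version is more forgiving of stray separators, while the split-based version keeps whatever sits between the separators, digits or not.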

Why does split return an array with every second element empty?

I'm trying to split a string every 5 characters. The array I'm getting back from split isn't how I'm expecting it: all the even indexes are empty, the parts I'm looking for are on odd indexes.
This version doesn't output anything:
use warnings;
use strict;
my @ar = <DATA>;
foreach (@ar){
    my @mkh = split (/(.{5})/,$_);
    print $mkh[2];
}
__DATA__
aaaaabbbbbcccccdddddfffff
If I replace the print line with this (odd indexes 1 and 3):
print $mkh[1],"\n", $mkh[3];
The output is the first two parts:
aaaaa
bbbbb
I don't understand this, I expected to be able to print the first two parts with this:
print $mkh[0],"\n", $mkh[1];
Can someone explain what is wrong in my code, and help me fix it?
The first argument in split is the pattern to split on, i.e. it describes what separates your fields. If you put capturing groups in there (as you do), those will be added to the output of the split as specified in the split docs (last paragraph).
This isn't what you want - your separator isn't a group of five characters. You're looking to split a string every X characters. For that, better use:
my @mkh = (/...../g);
# or
my @mkh = (/.{5}/g);
or one of the other options you'll find in: How can I split a string into chunks of two characters each in Perl?
Debug using Data::Dump
To observe exactly what your split operation is doing, use a module like Data::Dump:
use warnings;
use strict;
while (<DATA>) {
    my @mkh = split /(.{5})/;
    use Data::Dump;
    dd @mkh;
}
__DATA__
aaaaabbbbbcccccdddddfffff
Outputs:
("", "aaaaa", "", "bbbbb", "", "ccccc", "", "ddddd", "", "fffff", "\n")
As you can see, your code is splitting on groups of 5 characters, and leaving empty strings between them. This is obviously not what you want.
Use Pattern Matching instead
Instead, you simply want to capture groups of 5 characters. Therefore, you just need a pattern match with the /g Modifier:
use warnings;
use strict;
while (<DATA>) {
    my @mkh = /(.{5})/g;
    use Data::Dump;
    dd @mkh;
}
__DATA__
aaaaabbbbbcccccdddddfffff
Outputs:
("aaaaa", "bbbbb", "ccccc", "ddddd", "fffff")
You can also use a zero-width delimiter, which splits the string at each position preceded by 5 characters (using \K, which here acts like a positive lookbehind):
my @mkh = split (/.{5}\K/, $_);
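If the string length is not a multiple of 5, /.{5}/g silently drops the leftover characters. A minimal sketch that keeps a shorter final chunk follows; the sub name chunks_of and the sample input are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into chunks of up to $size
# characters; only the final chunk may be shorter.
sub chunks_of {
    my ($string, $size) = @_;
    # In list context, a /g match returns every match; /s lets . match newlines.
    return $string =~ /.{1,$size}/gs;
}

my $line = "aaaaabbbbbcccccdddddfffffgg\n";
chomp $line;    # drop the newline so it does not end up in a chunk
my @mkh = chunks_of($line, 5);
print join(',', @mkh), "\n";
```

Chomping first matters here: without it the trailing newline would become part of the last chunk.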