Perl character searching in a string with patterns - perl

Is there a built-in function that determines whether a character is present in a string? Also, how can you determine whether a value is a string or a number?

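Perl has a built-in function called index, and you can also use pattern matching.
To use index: index($stringvariable, "char to search"); it returns -1 when the character is not found.
To check whether a value consists only of digits, use m/^\d+$/ (a bare m/\d/ only tests whether the value contains a digit somewhere).
To check whether a value contains a non-digit character, and so is not a plain unsigned integer, use m/\D/.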

Perl's scalars are a string and a number at the same time. To test whether a scalar can be used as a number without any warnings:
use Scalar::Util qw/looks_like_number/;
my $variable = ...;
if (not defined $variable) {
# it is not usable as either a number or a string, as it is "undef"
}
elsif (looks_like_number $variable) {
# it is a number, but can also be used as a string
}
else {
# you can use it as a string
}
Actually, the story is a bit more complex with objects which may or may not be usable as numbers or strings. Furthermore, looks_like_number can return a true value for Infinity and NaN (not a number), which may not be what you consider to be a number.
To test whether a string contains some substring, you can use regexes or the index function:
my $haystack = "foo";
my $needle = "o";
if (0 <= index $haystack, $needle) {
# the $haystack contains the $needle
}
Some people prefer the equivalent test -1 != index ... instead.
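As a rough sketch of the two approaches mentioned above (the sub names contains_index and contains_regex are made up for illustration), note that the regex version needs \Q...\E so that a needle such as "+" or "." is matched literally rather than as a metacharacter:

```perl
use strict;
use warnings;

# Hypothetical helper: substring test with index, no regex involved.
sub contains_index {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

# Hypothetical helper: the same test with a regex; \Q...\E quotes
# metacharacters so the needle is treated as plain text.
sub contains_regex {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

print contains_index("foo", "o") ? "yes\n" : "no\n";   # yes
print contains_regex("a+b", "+") ? "yes\n" : "no\n";   # yes
print contains_index("foo", "x") ? "yes\n" : "no\n";   # no
```

For a fixed needle, index avoids the regex engine entirely; the regex form is mostly useful when the test later grows into a real pattern.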

Related

How to get the index of the last digit in a string in Perl?

In Perl, how do you find the index of the last digit in a string?
Example: Hello123xyz index of last digit is 7
A RE to match the last digit in the string and the @- variable to get the index of the start of the match:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
sub last_digit_index($) {
if ($_[0] =~ /\d\D*\z/) {
return $-[0];
} else {
return -1;
}
}
say last_digit_index("Hello123xyz"); # 7
I'd probably use the pos function here. Match with /g and Perl remembers where the match left off. The next global match on that string will start where the last match left off, so isolate this in a sub or block to avoid weird effects on subsequent matches on the same variable.
Since positions count from 0, pos returns the position just after the match, which is one greater than the 0-based index of the final digit (and equal to its 1-based position). You decide if you want to subtract 1 or not:
use v5.10;
say last_digit_pos('Hello123xyz');
sub last_digit_pos {
my( $string ) = @_;
$string =~ m/^.*\d/sg;
return pos($string); # 8
}
And, if the string doesn't match, pos doesn't return a defined value.
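A minimal sketch of how the no-match case might be handled, assuming you want the 0-based index and a -1 sentinel in the style of index (the sub name last_digit_index_pos is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical variant of last_digit_pos: returns the 0-based index of
# the last digit, or -1 when the string contains no digit at all.
sub last_digit_index_pos {
    my ($string) = @_;          # a copy, so the caller's pos() is untouched
    return -1 unless $string =~ m/^.*\d/sg;
    return pos($string) - 1;    # pos is one past the matched digit
}

print last_digit_index_pos('Hello123xyz'), "\n";   # 7
print last_digit_index_pos('no digits'), "\n";     # -1
```

Checking the result of the match itself avoids calling pos on a failed match and then having to test for undef.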
Can also leverage List::MoreUtils::last_index
use List::MoreUtils qw(last_index);
my $last_digit_index = last_index { /[0-9]/ } split '', $string;
I find this simple: break the string into a list of characters with a typical use of split, and use a library to find the last one which is a digit, via a trivial regex.
Note that this is "expensive" as it creates a scalar for each character and runs regex multiple times. So if efficiency matters -- if this is done on an absolutely gigantic string, or many many many times on smaller strings -- then better seek other approaches, or at least benchmark it before deciding on it. (Note, that would have to be really a lot of strings to see degraded efficiency.)

Extract substring using two delimiters and NO REGEX

I have a function whose aim is to extract a substring found between two delimiters. I would use regex but in this case I have explicit instructions not to use them.
I had a simpler and more elegant solution which was just one line but I cannot for the life of me remember or find it.
sub findBetween {
my ($theString,$delimiter1,$delimiter2) = @_;
my $tmp = substr($theString, index($theString,$delimiter1)+length($delimiter1));
$tmp = substr($tmp, 0, index($tmp,$delimiter2));
return $tmp;
}
Thank you for taking a look at this issue, I am aware it is very basic and somewhat redundant. What I need is a simpler solution involving perl basic functions and no regex.
You can use two index() calls to locate both delimiters, then use those positions to extract the string between them:
sub findBetween {
my ($theString,$delimiter1,$delimiter2) = @_;
my $i1 = index($theString, $delimiter1, 0) + length($delimiter1);
my $i2 = index($theString, $delimiter2, $i1);
return substr($theString, $i1, $i2-$i1);
}
print findBetween("111--2222~~333", "--", "~~"), "\n";
output
2222
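The version above assumes both delimiters are present. When one is missing, index returns -1 and the arithmetic silently produces a wrong offset. A possible stricter variant, still without regexes, is sketched here; the name find_between_checked and the choice to return nothing on failure are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical stricter variant: returns undef (in scalar context)
# when either delimiter cannot be found.
sub find_between_checked {
    my ($string, $open, $close) = @_;

    my $start = index($string, $open);
    return if $start < 0;                 # opening delimiter missing
    $start += length $open;

    my $end = index($string, $close, $start);
    return if $end < 0;                   # closing delimiter missing

    return substr($string, $start, $end - $start);
}

print find_between_checked("111--2222~~333", "--", "~~"), "\n";   # 2222

my $missing = find_between_checked("111--2222", "--", "~~");
print defined($missing) ? "found\n" : "not found\n";              # not found
```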
I would simply use index
use strict;
use warnings;
my $string = "hello my world";
my $substr = "my";
if (index($string, $substr) != -1) {
print "$substr found in $string";
}
Extract from perldoc
• index STR,SUBSTR,POSITION
• index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at zero. If the substring is not found, index returns -1.

Comparing empty string with number

I have a global value:
my $gPrevious ='';
# main();
func();
sub func {
my $localval = 52552;
if ($gPrevious != $localval) {
# x statements;
}
}
Output:
Argument "" isn't numeric in numeric ne (!=) at line x.
Why does the warning mention the ne operator?
Here I am only comparing with an empty string.
The value undef is there specifically so that you can test whether a variable has been defined yet. Initialising a variable to a string when it is to be compared to a number rather defeats this purpose.
You should leave your $gPrevious variable undefined, and test for that inside your subroutine.
Like this
my $gPrevious;
func();
sub func {
my $localval = 52552;
unless (defined $gPrevious and $gPrevious == $localval) {
# x statements;
}
}
Using the '==' or '!=' comparison operator requires having two numbers.
If one of the two is a string that perl can easily make a number such as "1", perl will do what you want and compare the 1.
But: If you compare a non-numeric string (such as the empty string) or undef with the '!=' operator, there will be a warning.
In your case, I suggest you simply use
$gPrevious = 0;
# ...
if ( $gPrevious != $localval ){ #...
which would get rid of the warning and work fine.
Testing for defined is another possibility, while testing for integers is not trivial.
You could use eq instead, but this is problematic, as then for example '1.0' ne '1'.
Perl provides two separate sets of operators: eq/ne for string comparison and ==/!= for numeric comparison. The operands should match the kind of operator being used. In your case, the left-hand side is an empty string and you are using a numeric operator for the comparison, hence the warning.
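To make the difference between the two operator families concrete, here is a small sketch. The helper name has_changed is hypothetical; it combines the defined test suggested above with a numeric comparison so that == and != never see undef:

```perl
use strict;
use warnings;

my ($x, $y) = ('1.0', '1');
print "numeric: ", ($x == $y ? "equal" : "different"), "\n";   # equal
print "string:  ", ($x eq $y ? "equal" : "different"), "\n";   # different

# Hypothetical helper: treats "no previous value yet" as a change and
# only compares numerically once the previous value is defined.
sub has_changed {
    my ($previous, $current) = @_;
    return 1 unless defined $previous;
    return $previous != $current ? 1 : 0;
}

print has_changed(undef, 52552) ? "changed\n" : "same\n";   # changed
print has_changed(52552, 52552) ? "changed\n" : "same\n";   # same
```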

Finding the data type of a scalar variable in Perl

I have a function which accepts an input from the user. The input may be an integer, a float or a string. I have three overloaded functions which should be called based on the DATA TYPE of the entered data. For example, if the user enters an integer (say 100), the function having integer parameter should be called. If the user enters a string (say "100") the function having the string parameter should be called.
So I need to find out the data type of the entered data. With regular expressions I am able to distinguish between an integer and a float (since I just need to find out the type, I would prefer not to use the library provided at cpan.org), but I am not able to figure out how to differentiate an integer from a string. Does Perl treat "100" and 100 as the same? Is there any way to work around this problem?
From perldoc perldata:
Scalars aren’t necessarily one thing or another. There’s no place to declare a scalar variable to be of type "string", type "number", type "reference", or anything else. Because of the automatic conversion of scalars, operations that return scalars don’t need to care (and in fact, cannot care) whether their caller is looking for a string, a number, or a reference. Perl is a contextually polymorphic language whose scalars can be strings, numbers, or references (which includes objects). Although strings and numbers are considered pretty much the same thing for nearly all purposes, references are strongly-typed, uncastable pointers with builtin reference-counting and destructor invocation.
So for integer scalars, you'll just need to decide ahead of time how you want to process them. Perl will cheerfully convert from a number to a string or vice versa depending on the context.
Perl does not make a useful distinction between numbers and string representations of those numbers. Your script should not either. You could write some code to differentiate between things that look like integers and floats, but the only way to know if it is a string is if the scalar does not look like an integer or a float.
Here is a simple routine that will return int, rat, or str for its argument. Note that 100 and '100' are both int, but something like 'asdf' will be str.
use feature 'say';
use Scalar::Util 'looks_like_number';
sub guess_type {
looks_like_number($_[0]) ? $_[0] =~ /\D/ ? 'rat' : 'int' : 'str'
}
say guess_type 1; # int
say guess_type "1"; # int
say guess_type 1.1; # rat
say guess_type 'asdf'; # str
Since you are working on mapping Perl variables to C functions, you could write something like this:
sub myfunction {
if (looks_like_number($_[0])) {
if ($_[0] =~ /\D/) {C_float($_[0])}
else { C_int($_[0])}
}
else {C_string($_[0])}
}
Which should "do the right thing" when given a Perl scalar. You may also want to add in a check to see if the argument is a reference, and then handle that case differently.
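One way the reference check might look is sketched below. The name guess_type_ref is hypothetical, and the anchored integer pattern is an assumption: unlike the /\D/ test above, it classifies signed values such as -5 as integers.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical extension of guess_type: undef and references get their
# own labels before the number/string decision is made.
sub guess_type_ref {
    my ($value) = @_;
    return 'undef' unless defined $value;
    return 'ref'   if ref $value;
    return 'str'   unless looks_like_number($value);
    # Optional sign followed only by digits counts as an integer;
    # every other numeric-looking value (1.5, 1e3, ...) is 'rat'.
    return $value =~ /^[+-]?\d+\z/ ? 'int' : 'rat';
}

print guess_type_ref($_), "\n" for 100, "100", -5, 1.5, "asdf", [1, 2];
```

The ref test comes before looks_like_number on purpose, so that array and hash references are never passed on as strings.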
#!perl6
use v6;
multi guess ( Int $a ) { say "got integer: $a" }
multi guess ( Str $a ) { say "got string: $a" }
multi guess ( Rat $a ) { say "got float: $a" }
guess(3);
guess("3");
guess(3.0);
Cheating, I know...
Paul
Have you considered passing the function a hash reference with the keys indicating what datatype the input is?
my $str_input = { string => "100" };
my $int_input = { integer => 100 };
my $float_input = { float => 100.0 };
You can check what type you got by checking which key the input has:
my ($datatype) = keys %{$input};
and take it from there. (Note the explicit dereferencing of $input.) The dispatch can then be written as an if/elsif chain:
if ($datatype eq 'string') {
C_string($input->{$datatype});
}
elsif ($datatype eq 'integer') {
C_integer($input->{$datatype});
}
elsif ($datatype eq 'float') {
C_float($input->{$datatype});
}

How do I tell if a variable has a numeric value in Perl?

Is there a simple way in Perl that will allow me to determine if a given variable is numeric? Something along the lines of:
if (is_number($x))
{ ... }
would be ideal. A technique that won't throw warnings when the -w switch is being used is certainly preferred.
Use Scalar::Util::looks_like_number() which uses the internal Perl C API's looks_like_number() function, which is probably the most efficient way to do this.
Note that the strings "inf" and "infinity" are treated as numbers.
Example:
#!/usr/bin/perl
use warnings;
use strict;
use Scalar::Util qw(looks_like_number);
my @exprs = qw(1 5.25 0.001 1.3e8 foo bar 1dd inf infinity);
foreach my $expr (@exprs) {
print "$expr is", looks_like_number($expr) ? '' : ' not', " a number\n";
}
Gives this output:
1 is a number
5.25 is a number
0.001 is a number
1.3e8 is a number
foo is not a number
bar is not a number
1dd is not a number
inf is a number
infinity is a number
See also:
perldoc Scalar::Util
perldoc perlapi for looks_like_number
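If the acceptance of "inf" and "infinity" is a problem, one possible workaround is to inspect the numeric value after looks_like_number has approved the string. This is only a sketch: is_finite_number is a hypothetical name, and it assumes that 9**9**9 overflows to infinity, which holds for the usual floating-point builds of perl.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical wrapper: numeric according to looks_like_number, but
# rejecting infinities and NaN.
sub is_finite_number {
    my ($value) = @_;
    return 0 unless defined $value && looks_like_number($value);

    my $n   = 0 + $value;
    my $inf = 9**9**9;               # assumed to overflow to +Inf

    return 0 if $n != $n;            # NaN never equals itself
    return 0 if $n == $inf || $n == -$inf;
    return 1;
}

for my $expr (qw(1 5.25 1.3e8 foo inf infinity)) {
    print "$expr is", is_finite_number($expr) ? '' : ' not', " a finite number\n";
}
```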
The original question was how to tell if a variable was numeric, not if it "has a numeric value".
There are a few operators that have separate modes of operation for numeric and string operands, where "numeric" means anything that was originally a number or was ever used in a numeric context (e.g. in $x = "123"; 0+$x, before the addition, $x is a string, afterwards it is considered numeric).
One way to tell is this:
if ( length( do { no warnings "numeric"; $x & "" } ) ) {
print "$x is numeric\n";
}
If the bitwise feature is enabled, & becomes a purely numeric operator and a separate string operator &. is added, so you must disable the feature:
if ( length( do { no if $] >= 5.022, "feature", "bitwise"; no warnings "numeric"; $x & "" } ) ) {
print "$x is numeric\n";
}
(The bitwise feature is available in perl 5.022 and above, and is enabled by default by use 5.028; or above.)
Check out the CPAN module Regexp::Common. I think it does exactly what you need and handles all the edge cases (e.g. real numbers, scientific notation, etc). e.g.
use Regexp::Common;
if ($var =~ /$RE{num}{real}/) { print q{a number}; }
Usually number validation is done with regular expressions. This code will determine if something is numeric as well as check for undefined variables as to not throw warnings:
sub is_integer {
defined $_[0] && $_[0] =~ /^[+-]?\d+$/;
}
sub is_float {
defined $_[0] && $_[0] =~ /^[+-]?\d+(\.\d+)?$/;
}
Here's some reading material you should look at.
A simple (and maybe simplistic) way to test whether the content of $x is numeric is the following:
if ($x eq $x+0) { .... }
It does a textual comparison of the original $x with the $x converted to a numeric value.
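The round-trip test only accepts strings that already are in Perl's canonical number format, which is worth seeing on a few inputs. The sketch below wraps it in a hypothetical helper named round_trips and silences the numeric warning, since non-numeric input is expected:

```perl
use strict;
use warnings;

# Hypothetical helper around the round-trip comparison.
sub round_trips {
    my ($x) = @_;
    return 0 unless defined $x;
    no warnings 'numeric';            # non-numeric input is expected here
    return $x eq $x + 0 ? 1 : 0;
}

for my $candidate ('42', '-7', '3.14', '1.0', '007', '1e3', 'abc') {
    printf "%-5s %s\n", $candidate,
        round_trips($candidate) ? 'round-trips' : 'does not round-trip';
}
```

Here '42', '-7' and '3.14' round-trip, while '1.0', '007' and '1e3' do not, even though Perl would happily use them as numbers. That is the sense in which the test is simplistic.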
Not perfect, but you can use a regex:
sub isnumber
{
shift =~ /^-?\d+\.?\d*$/;
}
A slightly more robust regex can be found in Regexp::Common.
It sounds like you want to know if Perl thinks a variable is numeric. Here's a function that traps that warning:
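sub is_number{
my $n = shift;
my $ret = 1;
use warnings "numeric"; # the check relies on this warning being enabled
local $SIG{"__WARN__"} = sub {$ret = 0}; # local restores the previous handler on exit
eval { my $x = $n + 1 };
return $ret
}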
Another option is to turn off the warning locally:
{
no warnings "numeric"; # Ignore "isn't numeric" warning
... # Use a variable that might not be numeric
}
Note that non-numeric variables will be silently converted to 0, which is probably what you wanted anyway.
Regex is not perfect... this is:
use Try::Tiny;
sub is_numeric {
my ($x) = @_;
my $numeric = 1;
try {
use warnings FATAL => qw/numeric/;
0 + $x;
}
catch {
$numeric = 0;
};
return $numeric;
}
Try this:
if (($x !~ /\D/) && ($x ne "")) { ... }
I found this interesting though
if ( $value + 0 eq $value) {
# A number
push @args, $value;
} else {
# A string
push @args, "'$value'";
}
Personally I think that the way to go is to rely on Perl's internal context to make the solution bullet-proof. A good regexp could match all the valid numeric values and none of the non-numeric ones (or vice versa), but as there is a way of employing the same logic the interpreter is using it should be safer to rely on that directly.
As I tend to run my scripts with -w, I had to combine the idea of comparing the result of "value plus zero" to the original value with the no warnings based approach of @ysth:
do {
no warnings "numeric";
if ($x + 0 ne $x) { return "not numeric"; } else { return "numeric"; }
}
You can use Regular Expressions to determine if $foo is a number (or not).
Take a look here:
How do I determine whether a scalar is a number
There is a highly upvoted accepted answer around using a library function, but it includes the caveat that "inf" and "infinity" are accepted as numbers. I see some regex stuff for answers too, but they seem to have issues. I tried my hand at writing some regex that would work better (I'm sorry it's long)...
/^0$|^[+-]?[1-9][0-9]*$|^[+-]?[1-9][0-9]*(\.[0-9]+)?([eE]-?[1-9][0-9]*)?$|^[+-]?[0-9]?\.[0-9]+$|^[+-]?[1-9][0-9]*\.[0-9]+$/
That's really 5 patterns separated by "or"...
Zero: ^0$
It's a kind of special case. It's the only integer that can start with 0.
Integers: ^[+-]?[1-9][0-9]*$
That makes sure the first digit is 1 to 9 and allows 0 to 9 for any of the following digits.
Scientific Numbers: ^[+-]?[1-9][0-9]*(\.[0-9]+)?([eE]-?[1-9][0-9]*)?$
Uses the same idea that the base number can't start with zero, since in proper scientific notation you start with the most significant digit (meaning the first digit won't be zero). However, my pattern allows for multiple digits left of the decimal point. That's incorrect, but I've already spent too much time on this... you could replace the [1-9][0-9]* with just [0-9] to force a single digit before the decimal point and allow for zeroes.
Short Float Numbers: ^[+-]?[0-9]?\.[0-9]+$
This is like a zero integer. It's special in that it can start with 0 if there is only one digit left of the decimal point. It does overlap the next pattern though...
Long Float Numbers: ^[+-]?[1-9][0-9]*\.[0-9]+$
This handles most float numbers and allows more than one digit left of the decimal point while still enforcing that the higher number of digits can't start with 0.
The simple function...
sub is_number {
my $testVal = shift;
return $testVal =~ /^0$|^[+-]?[1-9][0-9]*$|^[+-]?[1-9][0-9]*(\.[0-9]+)?([eE]-?[1-9][0-9]*)?$|^[+-]?[0-9]?\.[0-9]+$|^[+-]?[1-9][0-9]*\.[0-9]+$/;
}
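A quick way to see what the pattern accepts is to run it over a handful of samples. The sketch below stores the same pattern in a qr// object; the name is_number_re, the added defined check, and the sample list are all assumptions made for illustration:

```perl
use strict;
use warnings;

# The pattern from above, compiled once so it can be reused.
my $number_re = qr/^0$|^[+-]?[1-9][0-9]*$|^[+-]?[1-9][0-9]*(\.[0-9]+)?([eE]-?[1-9][0-9]*)?$|^[+-]?[0-9]?\.[0-9]+$|^[+-]?[1-9][0-9]*\.[0-9]+$/;

# Hypothetical wrapper that also guards against undef.
sub is_number_re {
    my ($value) = @_;
    return defined $value && $value =~ $number_re ? 1 : 0;
}

for my $sample ('0', '42', '-17', '0.5', '.5', '1.5e10', '007', '1e+5', 'inf', 'abc') {
    printf "%-7s %s\n", $sample, is_number_re($sample) ? 'accepted' : 'rejected';
}
```

The first six samples are accepted. '007', 'inf' and 'abc' are rejected as intended, but so is '1e+5', because the exponent part only allows an optional minus sign.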
if ( defined $x && $x !~ m/\D/ ) {}
or
$x = 0 if ! $x;
if ( $x !~ m/\D/) {}
This is a slight variation on Veekay's answer but let me explain my reasoning for the change.
Performing a regex match on an undefined value triggers an "uninitialized value" warning whenever warnings are enabled, and stops the program if those warnings have been made fatal. Testing whether the value is defined, or setting a default like I did in the alternative example before running the expression, will at a minimum save your error log.