index argument contains . perl - perl

If a string contains ., meant to stand for any character, index doesn't match on it. What can I do so that . is treated as any character?
For example:
index($str, $substr)
If $substr contains . anywhere (as a wildcard), index always returns -1.
thanks
carol

That is not possible. The documentation says:
The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
...
Keywords you can use for further searching:
perl regular expression wildcard
Update:
If you just want to know whether your string matches, a regular expression could look like this:
my $string = "Hello World!";
if( $string =~ /ll. Worl/ )
{
print "Ahoi! Position: ".($-[0])."\n";
}
This is matching a single character.
$-[0] is the offset into the string of the beginning of the entire
match.
-- http://perldoc.perl.org/perlvar.html
If you want a pattern that matches an arbitrary number of arbitrary characters, you could choose a pattern like...
...
if( $string =~ /ll.*orl/ )
{
...
See perlvar for further information about Perl's special variables. You will find the variable @LAST_MATCH_START and some explanation of $-[0] there. Several more variables can help you find submatches and gather other interesting information about your matches...
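To make the contrast with index concrete, here is a minimal sketch of an index-like lookup that treats its second argument as a regular expression. The helper name regex_index is hypothetical (it is not a Perl built-in); it simply returns $-[0] on a successful match and -1 otherwise, mirroring what index returns.

```perl
use strict;
use warnings;

# Hypothetical helper: behaves like index(), but $pattern is a regex,
# so "." stands for any single character.
sub regex_index {
    my ($str, $pattern) = @_;
    return $str =~ /$pattern/ ? $-[0] : -1;
}

print regex_index("Hello World!", "ll. Worl"), "\n";  # 2
print index("Hello World!", "ll. Worl"), "\n";        # -1, "." is literal here
```

Because the pattern is interpolated as a regex, every metacharacter in it is live, not only the dot. If only some of the characters should be wildcards, the rest would need escaping (for example with quotemeta).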

From perldoc -f index, you can see index() doesn't have any regex syntax:
index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-
expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If
POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after
its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at 0 (or
whatever you've set the $[ variable to--but don't do that). If the substring is not found, "index" returns one less than the
base, ordinarily "-1"
A simple test:
$ perl -e 'print index("1234567asdfghj.","j.")'
13

Use regex:
if ($str =~ /$substr/) {
    $index = $-[0];    # offset where the match starts
}
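A regex match records both ends of the match, and it is easy to mix them up. The sketch below assumes a hypothetical helper match_span that returns the start offset ($-[0]) and the end offset (pos($str), which is only set when the match uses /g in scalar context). Note that pos needs the variable as its argument, otherwise it reports on $_.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end) offsets of the first match,
# or an empty list if the pattern does not match.
sub match_span {
    my ($str, $pattern) = @_;
    if ($str =~ /$pattern/g) {
        return ($-[0], pos($str));   # start of match, offset just past its end
    }
    return;
}

my ($start, $end) = match_span("1234567asdfghj.", "f.h");
print "start=$start end=$end\n";   # start=10 end=13
```

So for an index-style result, $-[0] is the value to use; pos($str) points just past the matched text.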

Related

Extract substring using two delimiters and NO REGEX

I have a function whose aim is to extract a substring found between two delimiters. I would use regex but in this case I have explicit instructions not to use them.
I had a simpler and more elegant solution which was just one line but I cannot for the life of me remember or find it.
sub findBetween {
    my ($theString, $delimiter1, $delimiter2) = @_;
    my $tmp = substr($theString, index($theString, $delimiter1) + length($delimiter1));
    $tmp = substr($tmp, 0, index($tmp, $delimiter2));
    return $tmp;
}
Thank you for taking a look at this issue, I am aware it is very basic and somewhat redundant. What I need is a simpler solution involving perl basic functions and no regex.
You can use two index() calls to locate both delimiters, then use those positions to extract the string between them:
sub findBetween {
    my ($theString, $delimiter1, $delimiter2) = @_;
    my $i1 = index($theString, $delimiter1, 0) + length($delimiter1);
    my $i2 = index($theString, $delimiter2, $i1);
    return substr($theString, $i1, $i2 - $i1);
}
print findBetween("111--2222~~333", "--", "~~"), "\n";
output
2222
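One case the versions above do not cover is a missing delimiter, where index returns -1 and the arithmetic silently produces a wrong substring. A minimal sketch of a guarded variant follows; the name find_between and the choice to return undef on failure are assumptions for illustration, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical guarded variant: returns undef if either delimiter is missing.
sub find_between {
    my ($str, $d1, $d2) = @_;

    my $start = index($str, $d1);
    return undef if $start < 0;          # first delimiter not found
    $start += length $d1;                # skip past the first delimiter

    my $end = index($str, $d2, $start);  # search only after the first delimiter
    return undef if $end < 0;            # second delimiter not found

    return substr($str, $start, $end - $start);
}

print find_between("111--2222~~333", "--", "~~"), "\n";   # 2222
```

Still no regex involved: only index, length and substr.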
I would simply use index
use strict;
use warnings;
my $string = "hello my world";
my $substr = "my";
if (index($string, $substr) != -1) {
print "$substr found in $string";
}
Extract from perldoc
• index STR,SUBSTR,POSITION
• index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at zero. If the substring is not found, index returns -1.
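The POSITION argument described in the perldoc extract is what makes repeated searches possible. As a rough sketch, the hypothetical helper all_positions below collects every offset at which a substring occurs by restarting index one character past each hit.

```perl
use strict;
use warnings;

# Hypothetical helper: every offset at which $substr occurs in $str
# (overlapping occurrences included).
sub all_positions {
    my ($str, $substr) = @_;
    return () if $substr eq '';          # an empty needle would loop forever

    my @found;
    my $pos = index($str, $substr);
    while ($pos != -1) {
        push @found, $pos;
        $pos = index($str, $substr, $pos + 1);   # resume just after this hit
    }
    return @found;
}

print join(",", all_positions("hello my world, my friend", "my")), "\n";  # 6,16
```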

How can I preserve the uppercase/lower case of a string in search using perl?

I want to search for "Frequencies" (its first letter in uppercase) in my text files. And my code will print to the output file some columns including "Frequencies". But there are also occurrences of "frequencies" (its first letter in lowercase) in the text files. I am using this part $search_word = qr/Frequencies/; in the code. How can I make the first letter of the word "Frequencies" upper case in the $search_word = qr/Frequencies/; part to eliminate the occurrences of "frequencies" in the search?
In Perl, you have ucfirst to capitalize the first letter. For example:
$a = "freQuEncY";
$a = ucfirst(lc($a)); # $a <-- "Frequency";
Why not use a regex match to check, like this:
if ($string_to_be_searched =~ /Frequencies/) {
    # do something, e.g. print
}
Try this one:
if ( $$test_string[$i] =~ /\b(?i)f(?-i)requencies/ ) {
my $captured = ucfirst($&);
# process $captured
}
Explanation:
The regex match will be case-insensitive for the first letter of the word frequencies only. (?i) turns on case-insensitive matching at the position where it occurs, for the remainder of the pattern or until it is revoked by (?-i). This works for other flags too; see perldoc perlre.
$& contains the full match
\b denotes a word boundary (perhaps you don't need that but your problem description suggests you do).
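It may help to see that a Perl pattern is case-sensitive unless the /i flag (or an inline (?i)) is used, so qr/Frequencies/ on its own already skips the lowercase spelling. The sketch below uses made-up sample lines and a hypothetical helper matching_lines to show this.

```perl
use strict;
use warnings;

# Case-sensitive by default: matches "Frequencies" but not "frequencies".
my $search_word = qr/\bFrequencies\b/;

# Hypothetical helper: keep only the lines that match the pattern.
sub matching_lines {
    my ($pattern, @lines) = @_;
    return grep { $_ =~ $pattern } @lines;
}

# Sample input invented for illustration.
my @lines = (
    "Frequencies -- 10.5 20.1 30.7",
    "the frequencies are listed below",
);
print "$_\n" for matching_lines($search_word, @lines);
```

Only the first sample line is printed; adding /i to the qr// would make both match.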

Perl - partial pattern matching in a sequence of letters

I am trying to find a pattern using perl. But I am only interested with the beginning and the end of the pattern. To be more specific I have a sequence of letters and I would like to see if the following pattern exists. There are 23 characters. And I'm only interested in the beginning and the end of the sequence.
For example I would like to extract anything that starts with ab and ends with zt. There is always
So it can be
abaaaaaaaaaaaaaaaaaaazt
So that it detects this match
but not
abaaaaaaaaaaaaaaaaaaazz
So far I tried
if ($line =~ /ab[*]zt/) {
print "found pattern ";
}
thanks
* is a quantifier and meta character. Inside a character class bracket [ .. ] it just means a literal asterisk. You are probably thinking of .* which is a wildcard followed by the quantifier.
Matching entire string, e.g. "abaazt".
/^ab.*zt$/
Note the anchors ^ and $, and the wildcard character . followed by the zero or more * quantifier.
Match substrings inside another string, e.g. "a b abaazt c d"
/\bab\S*zt\b/
Using word boundary \b to denote beginning and end instead of anchors. You can also be more specific:
/(?<!\S)ab\S*zt(?!\S)/
Using a double negation to assert that no non-whitespace characters follow or precede the target text.
It is also possible to use the substr function
if (substr($string, 0, 2) eq "ab" and substr($string, -2) eq "zt")
You mention that the string is 23 characters, and if that is a fixed length, you can get even more specific, for example
/^ab.{19}zt$/
Which matches exactly 19 wildcards. The syntax for the {} quantifier is {min,max}; leaving max blank means no upper limit, i.e. {1,} is the same as + and {0,} is the same as *, meaning one/zero or more matches (respectively).
A * inside brackets, as in [*], won't match anything except a literal *; if you want to match anything, you need to use .*.
if ($line =~ /^ab.*zt$/) {
print "found pattern ";
}
If you really want to capture the match, wrap the whole pattern in a capture group:
if (my ($string) = $line =~ /^(ab.*zt)$/) {
print "found pattern $string";
}
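To compare the approaches side by side, here is a small sketch with two hypothetical helpers, matches_regex and matches_substr, both checking for a 23-character string that starts with ab and ends with zt. The test strings are built with the x operator so their length is unambiguous.

```perl
use strict;
use warnings;

# Regex version: "ab", exactly 19 of any character, then "zt".
sub matches_regex {
    my ($s) = @_;
    return $s =~ /^ab.{19}zt$/ ? 1 : 0;
}

# substr version: check length, prefix and suffix without a regex.
sub matches_substr {
    my ($s) = @_;
    return (length($s) == 23
        && substr($s, 0, 2) eq "ab"
        && substr($s, -2) eq "zt") ? 1 : 0;
}

my $good = "ab" . ("a" x 19) . "zt";   # 23 characters, ends in zt
my $bad  = "ab" . ("a" x 19) . "zz";   # 23 characters, ends in zz
for my $s ($good, $bad) {
    printf "%s regex=%d substr=%d\n", $s, matches_regex($s), matches_substr($s);
}
```

The two helpers agree on strings without newlines; the regex version differs slightly at a trailing newline because $ can match just before one.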

What does Perl's substr do?

My variable $var has the form 'abc.de'. What does this substr exactly do in this statement:
$convar = substr($var,0,index(".",$var));
index() finds one string within another and returns the index or position of that string.
substr() returns part of a string, given a starting offset (counted from 0) and a length.
Looking at the above, I suspect the index method is being used incorrectly (since its definition is index STR, SUBSTR), and it should be
index($var, ".")
to find the '.' within 'abc.de' and determine a substring of "abc.de"
The substr usage implied here is -
substr EXPR,OFFSET,LENGTH
Since the offset is 0, the intended operation returns the string up to but not including the first '.' position (as returned by index($var, ".")) into $convar.
Have a look at the substr and index functions in perldoc to clarify matters further.
The Perl substr function has format:
substr [string], [offset], [length]
which returns the string from the index offset to the index offset+length
index has format:
index [str], [substr]
which returns the index of the first occurrence of substr in str.
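so, with the arguments to index in the right order, substr('abc.de', 0, index($var, "."));
would return the substring starting at index 0 (i.e. 'a') and running for as many characters as precede the first occurrence of the string "."
So $convar would be "abc" in your example. As written, index(".", $var) looks for $var inside "." and returns -1, so substr returns everything except the last character.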
edit: damn, people are too fast :P
edit2: and Brian is right about index being used incorrectly
Why not run it and find out?
#!/usr/bin/perl
my $var = $ARGV[0];
my $index = index(".",$var);
print "index is $index.\n";
my $convar = substr($var, 0, $index);
print "convar is $convar.\n";
Run that on a bunch of words and see what happens.
Also, you may want to type:
perldoc -f index
perldoc -f substr
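For the value 'abc.de' from the question, the experiment comes out as sketched below. It compares the argument order used in the question with the order that perldoc documents (index STR,SUBSTR); the variable names $as_written and $swapped are just labels for this illustration.

```perl
use strict;
use warnings;

my $var = 'abc.de';

my $as_written = index(".", $var);   # looks for 'abc.de' inside '.': -1
my $swapped    = index($var, ".");   # looks for '.' inside 'abc.de': 3

# A negative LENGTH tells substr to leave that many characters off the end.
print substr($var, 0, $as_written), "\n";   # abc.d
print substr($var, 0, $swapped), "\n";      # abc
```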
Fabulously, you can write data into a substring using substr as the left hand side of an assignment:
$ perl -e '$a="perl sucks!", substr($a,5,5)="kicks ass"; print $a'
perl kicks ass!
You don't even need to stick to the same length - the string will expand to fit.
Technically, this is known as using substr as an lvalue.
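The same replacement can also be written with the four-argument form of substr, which takes the replacement as its last argument and returns the text that was replaced. A small sketch of both spellings, using an invented sample string:

```perl
use strict;
use warnings;

my $s = "perl is slow";

# lvalue form: assign into the selected part of the string
substr($s, 8, 4) = "fast";
print "$s\n";                          # perl is fast

# four-argument form: replacement is the last argument,
# and the return value is the text that was replaced
my $old = substr($s, 0, 4, "Perl");
print "$s (was: $old)\n";              # Perl is fast (was: perl)
```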

How do I get the length of a string in Perl?

What is the Perl equivalent of strlen()?
length($string)
perldoc -f length
length EXPR
length Returns the length in characters of the value of EXPR. If EXPR is
omitted, returns length of $_. Note that this cannot be used on an
entire array or hash to find out how many elements these have. For
that, use "scalar @array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the number
of characters, not the number of bytes. To get the length in
bytes, use "do { use bytes; length(EXPR) }", see bytes.
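The character/byte distinction can be seen with a short sketch. Instead of the "use bytes" approach quoted above, this swaps in the core Encode module to produce the UTF-8 bytes explicitly; the helper name char_and_byte_length is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: (length in characters, length in UTF-8 bytes).
sub char_and_byte_length {
    my ($str) = @_;
    return (length($str), length(encode("UTF-8", $str)));
}

my ($chars, $bytes) = char_and_byte_length("caf\x{E9}");   # "café"
print "characters: $chars, bytes: $bytes\n";   # characters: 4, bytes: 5

# For arrays and hashes, count elements instead of calling length:
my @array = (1, 2, 3);
my %hash  = (a => 1, b => 2);
print scalar(@array), " ", scalar(keys %hash), "\n";       # 3 2
```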
Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of characters replaced (so, effectively, the length of the string).
length($string)
The length() function:
$string ='String Name';
$size=length($string);
You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in scalar context to get the number of characters. The third works similarly, splitting the string into individual characters instead of regex-matching and using the resulting list in scalar context.
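For anyone who meets these idioms in existing code, a quick sketch that checks a couple of them against length may be reassuring. The helper name length_variants is hypothetical, and the sample string deliberately contains a newline, which is why the /s flag matters for the regex variant.

```perl
use strict;
use warnings;

# Hypothetical helper: the same count computed three ways.
sub length_variants {
    my ($str) = @_;
    my $plain = length $str;              # the readable way
    my $tr    = ($str =~ y///c);          # transliteration counts every character
    my $count = () = $str =~ /(.)/gs;     # list of matches counted via empty-list assignment
    return ($plain, $tr, $count);
}

print join(" ", length_variants("foo\nbar")), "\n";   # 7 7 7
```

In real code, length remains the one to use; the others are only worth recognizing.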