What does Perl's substr do?

My variable $var has the form 'abc.de'. What does this substr exactly do in this statement:
$convar = substr($var,0,index(".",$var));

index() finds one string within another and returns the index or position of that string.
substr() returns part of a string, given a starting offset (counted from 0) and an optional length.
Looking at the above, I suspect the index method is being used incorrectly (since its definition is index STR, SUBSTR), and it should be
index($var, ".")
to find the '.' within 'abc.de' and determine a substring of "abc.de"

The substr usage implied here is -
substr EXPR,OFFSET,LENGTH
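Since the offset is 0, the intended operation puts the string up to but not including the first '.' into $convar, with the length coming from index($var, "."). As written, index(".", $var) looks for $var inside "." and returns -1, and a negative LENGTH makes substr leave that many characters off the end, so $convar ends up as 'abc.d'.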
Have a look at the substr and index functions in perldoc to clarify matters further.
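To see the effect of the two argument orders side by side, here is a minimal sketch using the 'abc.de' value from the question. The variable names ($wrong, $right and so on) are only illustrative.

```perl
use strict;
use warnings;

my $var = 'abc.de';

# Arguments swapped: looks for 'abc.de' inside '.', which is never found.
my $wrong = index(".", $var);    # -1
# Documented order: looks for '.' inside 'abc.de'.
my $right = index($var, ".");    # 3

# A LENGTH of -1 leaves one character off the end of the string.
my $from_wrong = substr($var, 0, $wrong);    # 'abc.d'
my $from_right = substr($var, 0, $right);    # 'abc'

print "wrong: $wrong -> $from_wrong\n";
print "right: $right -> $from_right\n";
```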

The Perl substr function has format:
substr [string], [offset], [length]
which returns the string from the index offset to the index offset+length
index has format:
index [str], [substr]
which returns the index of the first occurrence of substr in str.
so substr('abc.de', 0, index('abc.de', "."));
would return the substring starting at index 0 (i.e. 'a') and running for as many characters as precede the first occurrence of the string "."
So $convar will have "abc" in the example you have, once index is given its arguments in that order
edit: damn, people are too fast :P
edit2: and Brian is right about index being used incorrectly

Why not run it and find out?
#!/usr/bin/perl
my $var = $ARGV[0];
my $index = index(".",$var);
print "index is $index.\n";
my $convar = substr($var, 0, $index);
print "convar is $convar.\n";
Run that on a bunch of words and see what happens.
Also, you may want to type:
perldoc -f index
perldoc -f substr

Fabulously, you can write data into a substring using substr as the left hand side of an assignment:
$ perl -e '$a="perl sucks!", substr($a,5,5)="kicks ass"; print $a'
perl kicks ass!
You don't even need to stick to the same length - the string will expand to fit.
Technically, this is known as using substr as an lvalue.
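As a rough illustration of the string growing and shrinking under lvalue substr, here is a small sketch. The sample text and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $s = "Hello, world!";

# Replace the 5 characters at offset 7 with a longer string; $s grows.
substr($s, 7, 5) = "everyone";
my $after_grow = $s;
print "$after_grow\n";    # Hello, everyone!

# Replace those 8 characters with a shorter string; $s shrinks.
substr($s, 7, 8) = "all";
print "$s\n";             # Hello, all!
```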

Extract substring using two delimiters and NO REGEX

I have a function whose aim is to extract a substring found between two delimiters. I would use regex but in this case I have explicit instructions not to use them.
I had a simpler and more elegant solution which was just one line but I cannot for the life of me remember or find it.
sub findBetween {
my ($theString,$delimiter1,$delimiter2) = (@_);
my $tmp = substr($theString, index($theString,$delimiter1)+length($delimiter1));
$tmp = substr($tmp, 0, index($tmp,$delimiter2));
return $tmp;}
Thank you for taking a look at this issue, I am aware it is very basic and somewhat redundant. What I need is a simpler solution involving perl basic functions and no regex.
You can use two index() calls to locate both delimiters, then use those positions to extract the string between them:
sub findBetween {
my ($theString,$delimiter1,$delimiter2) = @_;
my $i1 = index($theString, $delimiter1, 0) + length($delimiter1);
my $i2 = index($theString, $delimiter2, $i1);
return substr($theString, $i1, $i2-$i1);
}
print findBetween("111--2222~~333", "--", "~~"), "\n";
output
2222
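One thing the version above does not cover is a missing delimiter, where index() returns -1 and the arithmetic goes wrong quietly. A possible variant is sketched below; the name find_between_checked and the choice to return undef are my own assumptions, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical variant of findBetween: returns undef (in scalar context)
# when either delimiter is missing, instead of computing with -1.
sub find_between_checked {
    my ($string, $open, $close) = @_;

    my $start = index($string, $open);
    return if $start == -1;
    $start += length($open);    # first character after the opening delimiter

    my $end = index($string, $close, $start);
    return if $end == -1;

    return substr($string, $start, $end - $start);
}

my $hit  = find_between_checked("111--2222~~333", "--", "~~");
my $miss = find_between_checked("111--2222", "--", "~~");

print "$hit\n";                                      # 2222
print defined($miss) ? "found\n" : "not found\n";    # not found
```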
I would simply use index
use strict;
use warnings;
my $string = "hello my world";
my $substr = "my";
if (index($string, $substr) != -1) {
print "$substr found in $string";
}
Extract from perldoc
• index STR,SUBSTR,POSITION
• index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at zero. If the substring is not found, index returns -1.
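The POSITION argument makes it possible to walk through every occurrence of a substring, not just the first one. A minimal sketch, with a sample string and variable names invented for illustration:

```perl
use strict;
use warnings;

my $string = "hello my world, my friend";
my $substr = "my";

my @positions;
my $pos = index($string, $substr);
while ($pos != -1) {
    push @positions, $pos;
    # Resume the search one character past the previous hit.
    $pos = index($string, $substr, $pos + 1);
}

print "@positions\n";    # 6 16
```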

In Perl, how can I tell split not to strip empty trailing fields?

Was trying to count the number of lines in a string of text (including empty lines). A little surprised by the behavior of split. Had expected the following to output 2 but it printed 1 on my perl 5.14.2.
$str = "hello\nworld\n\n";
@a = split(/\n/, $str);
print $#a, "\n";
It seems that split() is insensitive to consecutive trailing \n (adding more \n's at the end of the string does not increase the printout). The only way I can get sort of close to the number of lines is
$str = "hello\nworld\n\n";
@a = split(/(\n)/, $str);
printf "%d\n", ($#a + 1)/2, "\n";
But it looks more like a workaround than a straight solution. Any ideas?
perldoc -f split:
If LIMIT is negative, it is treated as if it were instead
arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually
treated as if it were instead negative but with the exception that
trailing empty fields are stripped (empty leading fields are
always preserved); if all fields are empty, then all fields are
considered to be trailing (and are thus stripped in this case).
$ perl -E 'my $x = "1\n2\n\n"; my @x = split /\n/, $x, -1; say $#x'
3
Perhaps the problem is that you are using $#a when scalar @a is what you are actually looking for?
I apologize if you are already aware of this or if this is not the issue, but $#a returns the index of the last element of @a and (scalar @a) returns the number of elements that @a contains. Since array indexing starts at 0, $#a is one less than scalar @a.
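Putting the two answers together, here is a small sketch of how the LIMIT argument and the $#a versus scalar @a distinction play out on the string from the question. Counting "\n" with tr is an extra alternative I am adding, not something suggested in the thread.

```perl
use strict;
use warnings;

my $str = "hello\nworld\n\n";

my @default = split /\n/, $str;        # trailing empty fields stripped
my @all     = split /\n/, $str, -1;    # trailing empty fields kept

print scalar(@default), "\n";    # 2: "hello", "world"
print scalar(@all), "\n";        # 4: "hello", "world", "", ""
print $#all, "\n";               # 3: index of the last element

# Alternative: count newline characters directly, without splitting.
my $newlines = ($str =~ tr/\n//);
print "$newlines\n";             # 3
```

Since the string ends with a newline, the -1 split produces one more field than there are newline-terminated lines, which is why $#all and the tr count agree here.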

index argument contains . perl

If a string contains . representing any character, index doesn't match on it. What to do so that it takes . as any character?
For ex,
index($str, $substr)
if $substr contains . anywhere, index will always return -1
thanks
carol
That is not possible. The documentation says:
The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
...
Keywords you can use for further searching are:
perl regular expression wildcard
Update:
If you just want to know whether your string matches, a regular expression could look like this:
my $string = "Hello World!";
if( $string =~ /ll. Worl/ )
{
print "Ahoi! Position: ".($-[0])."\n";
}
Here the . matches a single character.
$-[0] is the offset into the string of the beginning of the entire
match.
-- http://perldoc.perl.org/perlvar.html
If you want a pattern that matches an arbitrary number of arbitrary characters, you could choose a pattern like...
...
if( $string =~ /ll.*orl/ )
{
...
See perlvar for further information about special Perl variables. You will find the variable @LAST_MATCH_START and some explanation of $-[0] there. Several more variables can help you find submatches and gather other interesting information about your matches.
From perldoc -f index, you can see index() doesn't have any regex syntax:
index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-
expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If
POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after
its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at 0 (or
whatever you've set the $[ variable to--but don't do that). If the substring is not found, "index" returns one less than the
base, ordinarily "-1"
A simple test:
$ perl -e 'print index("1234567asdfghj.","j.")'
13
Use regex:
$str =~ /$substr/g;
$index = pos($str);   # offset just past the end of the match; $-[0] holds where it starts
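For comparison, here is a hedged sketch that puts the literal and wildcard behaviours next to each other, reusing the string from the simple test above. The pattern /d.g/ and the variable names are illustrative only; \Q...\E is Perl's standard way to quote metacharacters inside a pattern.

```perl
use strict;
use warnings;

my $str = "1234567asdfghj.";

# index() treats "j." literally: a 'j' followed by a real dot.
my $literal_pos = index($str, "j.");                     # 13

# In a regex, an unescaped dot matches any character except a newline.
my $wild_pos = $str =~ /d.g/ ? $-[0] : -1;               # 9, matches "dfg"

# \Q...\E quotes metacharacters, so the regex behaves like index() again.
my $needle = "j.";
my $quoted_pos = $str =~ /\Q$needle\E/ ? $-[0] : -1;     # 13

print "$literal_pos $wild_pos $quoted_pos\n";
```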

Modify the last two characters of a string in Perl

I am looking for a solution to a problem:
I have the NSAP address which is 20 characters long:
39250F800000000000000100011921680030081D
I now have to replace the last two characters of this string with F0 and the final string should look like:
39250F80000000000000010001192168003008F0
My current implementation chops the last two characters and appends F0 to it:
my $nsap = "39250F800000000000000100011921680030081D";
chop($nsap);
chop($nsap);
$nsap = $nsap."F0";
Is there a better way to accomplish this?
You can use substr:
substr ($nsap, -2) = "F0";
or
substr ($nsap, -2, 2, "F0");
Or you can use a simple regex:
$nsap =~ s/..$/F0/;
This is from substr's manpage:
substr EXPR,OFFSET,LENGTH,REPLACEMENT
substr EXPR,OFFSET,LENGTH
substr EXPR,OFFSET
Extracts a substring out of EXPR and returns it.
First character is at offset 0, or whatever you've
set $[ to (but don't do that). If OFFSET is negative
(or more precisely, less than $[), starts
that far from the end of the string. If LENGTH is
omitted, returns everything to the end of the
string. If LENGTH is negative, leaves that many
characters off the end of the string.
Now, the interesting part is that the result of substr can be used as an lvalue, and be assigned:
You can use the substr() function as an lvalue, in
which case EXPR must itself be an lvalue. If you
assign something shorter than LENGTH, the string
will shrink, and if you assign something longer
than LENGTH, the string will grow to accommodate
it. To keep the string the same length you may
need to pad or chop your value using "sprintf".
or you can use the replacement field:
An alternative to using substr() as an lvalue is
to specify the replacement string as the 4th argument.
This allows you to replace parts of the
EXPR and return what was there before in one operation,
just as you can with splice().
$nsap =~ s/..$/F0/;
replaces the last two characters of a string with F0.
Use the substr( ) function:
substr( $nsap, -2, 2, "F0" );
chop( ) and the related chomp( ) are really intended for removing line ending characters - newlines and so on.
I believe that substr( ) will be quicker than using a regular expression.
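The three approaches mentioned in this thread can be checked against each other with a short script. This is only a sketch; the variable names are mine, and it makes no claim about which one is faster.

```perl
use strict;
use warnings;

my $original = "39250F800000000000000100011921680030081D";

# 1. substr as an lvalue
my $by_lvalue = $original;
substr($by_lvalue, -2) = "F0";

# 2. four-argument substr; returns the text that was replaced
my $by_four_arg = $original;
my $removed = substr($by_four_arg, -2, 2, "F0");

# 3. regex substitution
my $by_regex = $original;
$by_regex =~ s/..$/F0/;

print "$by_lvalue\n$by_four_arg\n$by_regex\n";
print "replaced: $removed\n";    # replaced: 1D
```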

How do I get the length of a string in Perl?

What is the Perl equivalent of strlen()?
length($string)
perldoc -f length
length EXPR
length
Returns the length in characters of the value of EXPR. If EXPR is
omitted, returns length of $_. Note that this cannot be used on an
entire array or hash to find out how many elements these have. For
that, use "scalar @array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the number
of characters, not the number of bytes. To get the length in
bytes, use "do { use bytes; length(EXPR) }", see bytes.
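To make the characters-versus-bytes point concrete, here is a small sketch. It uses the core Encode module to get a UTF-8 byte count, which is a different route from the "use bytes" idiom quoted above; the sample string is an assumption chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One non-ASCII character (U+263A) followed by three ASCII characters.
my $str = "\x{263A}abc";

my $chars = length($str);                     # 4 characters
my $bytes = length(encode('UTF-8', $str));    # 6 bytes: U+263A takes three in UTF-8

print "$chars characters, $bytes bytes\n";
```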
Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of characters replaced (so, effectively, the length of the string).
length($string)
The length() function:
$string ='String Name';
$size=length($string);
You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in a scalar context to get the number of characters. The third works similarly by splitting on each character instead of regex-matching and using the resulting list in scalar context.