How can I insert text into a string in Perl? - perl

If I had:
$foo= "12."bar bar bar"|three";
how would I insert in the text ".." after the text 12. in the variable?

Perl allows you to choose your own quote delimiters. If you find you need to use a double quote inside of an interpolating string (i.e. "") or single quote inside of a non-interpolating string (i.e. '') you can use a quote operator to specify a different character to act as the delimiter for the string. Delimiters come in two forms: bracketed and unbracketed. Bracketed delimiters have different beginning and ending characters: [], {}, (), [], and <>. All other characters* are available as unbracketed delimiters.
So your example could be written as
$foo = qq(12."bar bar bar"|three);
Inserting text after "12." can be done many ways (TIMTOWDI). A common solution is to use a substitution to match the text you want to replace.
$foo =~ s/^(12[.])/$1../;
the ^ means match at the start of the sting, the () means capture this text to the variable $1, the 12 just matches the string "12", and the [] mean match any one of the characters inside the brackets. The brackets are being used because . has special meaning in regexes in general, but not inside a character class (the []). Another option to the character class is to escape the special meaning of . with \, but many people find that to be uglier than the character class.
$foo =~ s/^(12\.)/$1../;
Another way to insert text into a string is to assign the value to a call to substr. This highlights one of Perl's fairly unique features: many of its functions can act as lvalues. That is they can be treated like variables.
substr($foo, 3, 0) = "..";
If you did not already know where "12." exists in the string you could use index to find where it starts, length to find out how long "12." is, and then use that information with substr.
Here is a fully functional Perl script that contains the code above.
#!/usr/bin/perl
use strict;
use warnings;
my $foo = my $bar = qq(12."bar bar bar"|three);
$foo =~ s/(12[.])/$1../;
my $i = index($bar, "12.") + length "12.";
substr($bar, $i, 0) = "..";
print "foo is $foo\nbar is $bar\n";
* all characters except whitespace characters (space, tab, carriage return, line feed, vertical tab, and formfeed) that is

If you want to use double quotes in a string in Perl you have two main options:
$foo = "12.\"bar bar bar\"|three";
or:
$foo = '12."bar bar bar"|three';
The first option escapes the quotes inside the string with backslash.
The second option uses single quotes. This means the double quotes are treated as part of the string. However, in single quotes everything is literal so $var or #array isn't treated as a variable. For example:
$myvar = 123;
$mystring = '"$myvar"';
print $mystring;
> "$myvar"
But:
$myvar = 123;
$mystring = "\"$myvar\"";
print $mystring;
> "123"
There are also a large number of other Quote-like Operators you could use instead.

$foo = "12.\"bar bar bar\"|three";
$foo =~s/12\./12\.\.\./;
print $foo; # results in 12...\"bar bar bar\"|three"

Related

Is the use of "||" in a substring search prohibited?

I have a little Perl script which includes a substring search as follows.
#!/usr/bin/perl
use strict;
use warnings;
my $line = "this && is || a test if && ||";
my $nb_if = findSymbols($line, "if ");
my $nb_and = findSymbols($line, "&&");
my $nb_or = findSymbols($line, "||");
print "\nThe result for this func is $nb_if=if , $nb_and=and, $nb_or=or\n";
sub findSymbols {
my $n = () = ($_[0] =~ m/$_[1]/g);
return $n;
}
It should return:
The result for this func is 1=if , 2=and, 2=or
but, instead it returns:
The result for this func is 1=if , 2=and, 30=or
I don't understand what's wrong with my code.
Use quotemeta to escape the special meaning of the regular expression containing || (and any other characters which you pass to the function):
sub findSymbols {
my $pat = quotemeta $_[1];
my $n = () = ($_[0] =~ m/$pat/g);
return $n;
}
The pipe character (|) has a special meaning in regular expressions. It means "or" (matching either the thing on its left or the thing on its right). So having a regex that consists of just two pipes is interpreted as meaning "match an empty string or an empty string or an empty string" - and that matches everywhere in your string (30 times!)
So you need to stop the pipe being interpreted as a special character and let it just represent an actual pipe character. Here are three ways to do that:
Escape the pipes with backslashes when you're creating the string that you pass to findSymbols().
# Note: I've also changed "..." to '...'
# to avoid having to double-escape
my $nb_or = findSymbols($line, '\|\|');
Use quotemeta() to automatically escape problematic characters in any string passed to findSymbols().
my $escaped_regex = quotemeta($_[0]);
my $n = () = ($_[0] =~ m/$escaped_regex/g);
Use \Q...\E to automatically escape any problematic characters used in your regex.
# Note: In this case, the \E isn't actually needed
# as it's at the end of the regex.
my $n = () = ($_[0] =~ m/\Q$_[0]\E/g);
For more detailed information on using regular expressions in Perl, see perlretut and perlre.
| is the alternation operator in the regular expression used by m//. You need to escape each | with a backslash to match literal |s.
my $nb_or = findSymbols($line, "\\|\\|"); # or '\|\|`
(but using quotemeta as suggested by #toolic is a much better idea, as it frees your caller from having to worry about details that should be part of the abstraction provided by findSymbols.)

Add a new line in a variable using perl

I am trying to add a new line in a variable after certain number of words. For example: If we have a variable:
$x = "This a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I do not agree to the formula "after certain number of words".
Note that the first target line has 4 words, whereas remaining 2 have
5 words each.
Actually you need to replace each comma and following sequence of
spaces (if any) with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
The simplest way is to split the string on comma followed by a space and then
join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For solving the general, how do I reformat this string with line breaks after n-columns? problem, use the Text::Wrap library (as suggested by #ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() needs an array of words
my #words = split /\s+/, $x;
# Initial tab, subsequent tab values set to '' (think indent amount)
print wrap('', '', #words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^([^,]+,+){2}\s*\K/\n/;
(in this case I also remove any potential space that would be after the comma)
You can also configure separately how many words or comma you want, by putting this in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn to use Regexps but just to explain the above:
\s is any space character (like space, tab, line feed, etc)
\S (uppercase) is any character except a space character
+ means any number of characters of that type described with what is before. So \s+ means any number of consecutive space characters.
{123} means 123 times that type of character ...
{3,80} means 3 to 80 times. So + is equivalent to {1,} (one to unlimited)
\K means that whatever is before will not be replaced, only what is after.

Perl tr operator is transliterating based on the variable's name not its value

I'm using Perl 5.16.2 to try to count the number of occurrences of a particular delimiter in the $_ string. The delimiter is passed to my Perl program via the #ARGV array. I verify that it is correct within the program. My instruction to count the number of delimiters in the string is:
$dlm_count = tr/$dlm//;
If I hardcode the delimiter, e.g. $dlm_count = tr/,//; the count comes out correctly. But when I use the variable $dlm, the count is wrong. I modified the instruction to say
$dlm_count = tr/$dlm/\t/;
and realized from how the tabs were inserted in the string that the operation was substituting every instance of any of the four characters "$", "d", "l", or "m" to \t — i.e. any of the four characters that made up my variable name $dlm.
Here is a sample program that illustrates the problem:
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm = ",";
my $dlm_count = tr/$dlm/\t/;
print "The count is $dlm_count\n";
print "The modified string is $_\n";
There are only two commas in the $_ string, but this program prints the following:
The count is 3
The modified string is abc efghij,k ,nopqrstuvwxyz
Why is the $dlm token being treated as a literal string of four characters instead of as a variable name?
You cannot use tr that way, it doesn't interpolate variables. It runs strictly character by character replacement. So this
$string =~ tr/a$v/123/
is going to replace every a with 1, every $ with 2, and every v with 3. It is not a regex but a transliteration. From perlop
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $# if $#;
eval "tr/$oldlist/$newlist/, 1" or die $#;
The above example from docs hints how to count. For $dlms in $string
$dlm_count = eval "\$string =~ tr/$dlm//";
The $string is escaped so to not be interpolated before it gets to eval. In your case
$dlm_count = eval "tr/$dlm//";
You can also use tools other than tr (or regex). For example, with string being in $_
my $dlm_count = grep { /$dlm/ } split //;
When split breaks $_ by the pattern that is empty string (//) it returns the list of all characters in it. Then the grep block tests each against $dlm so returning the list of as many $dlm characters as there were in $_. Since this is assigned to a scalar, $dlm_count is set to the length of that list, which is the count of all $dlm.
In the section of the docs on perlop 'Quote Like Operators', it states:
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval():
As documented and as you discovered, tr/// doesn't interpolate. The simple solution is to use s/// instead.
my $dlm = ",";
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm_count = s/\Q$dlm/\t/g;
If the transliteration is being performed in a loop, the following might speed things up noticeably:
my $dlm = ",";
my $tr = eval "sub { tr/\Q$dlm\E/\\t/ }";
for (...) {
my $dlm_count = $tr->();
...
}
Although several answers have hinted at the eval() idiom for tr///, none have the form that covers cases where the string has tr syntax characters in it, e.g.- (hyphen):
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm = ",";
my $dlm_count = eval sprintf "tr/%s/%s/", map quotemeta, $dlm, "\t";
But as others have noted, there are lots of ways to count characters in Perl that avoid eval(), here's another:
my $dlm_count = () = m/$dlm/go;

index argument contains . perl

If a string contains . representing any character, index doesn't match on it. What to do so that it takes . as any character?
For ex,
index($str, $substr)
if $substr contains . anywhere, index will always return -1
thanks
carol
That is not possible. The documentation says:
The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
...
The keywords, you can use for further googlings are:
perl regular expression wildcard
Update:
If you just want to know, if your string matches, using a regular expression could look like that:
my $string = "Hello World!";
if( $string =~ /ll. Worl/ )
{
print "Ahoi! Position: ".($-[0])."\n";
}
This is matching a single character.
$-[0] is the offset into the string of the beginning of the entire
match.
-- http://perldoc.perl.org/perlvar.html
If you want to have a pattern, that is matching an arbitary amount of arbitary characters, you could choose a pattern like...
...
if( $string =~ /ll.*orl/ )
{
...
See perlvar for further information about special perl variables. You will find the variable #LAST_MATCH_START and some explanation about $-[0] over there. There are several more variables, that can help you to find sub matches and to gather other interessting information about your matches...
From perldoc -f index, you can see index() doesn't have any regex syntax:
index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-
expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If
POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after
its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at 0 (or
whatever you've set the $[ variable to--but don't do that). If the substring is not found, "index" returns one less than the
base, ordinarily "-1"
A simple test:
$ perl -e 'print index("1234567asdfghj.","j.")'
13
Use regex:
$str =~ /$substr/g;
$index = pos();

How can I manually interpolate string escapes in a Perl string?

In perl suppose I have a string like 'hello\tworld\n', and what I want is:
'hello world
'
That is, "hello", then a literal tab character, then "world", then a literal newline. Or equivalently, "hello\tworld\n" (note the double quotes).
In other words, is there a function for taking a string with escape sequences and returning an equivalent string with all the escape sequences interpolated? I don't want to interpolate variables or anything else, just escape sequences like \x, where x is a letter.
Sounds like a problem that someone else would have solved already. I've never used the module, but it looks useful:
use String::Escape qw(unbackslash);
my $s = unbackslash('hello\tworld\n');
You can do it with 'eval':
my $string = 'hello\tworld\n';
my $decoded_string = eval "\"$string\"";
Note that there are security issues tied to that approach if you don't have 100% control of the input string.
Edit: If you want to ONLY interpolate \x substitutions (and not the general case of 'anything Perl would interpolate in a quoted string') you could do this:
my $string = 'hello\tworld\n';
$string =~ s#([^\\A-Za-z_0-9])#\\$1#gs;
my $decoded_string = eval "\"$string\"";
That does almost the same thing as quotemeta - but exempts '\' characters from being escaped.
Edit2: This still isn't 100% safe because if the last character is a '\' - it will 'leak' past the end of the string though...
Personally, if I wanted to be 100% safe I would make a hash with the subs I specifically wanted and use a regex substitution instead of an eval:
my %sub_strings = (
'\n' => "\n",
'\t' => "\t",
'\r' => "\r",
);
$string =~ s/(\\n|\\t|\\n)/$sub_strings{$1}/gs;