Perl regex: replace first name last name with first name last initial

I want the output of $var below to be "John D":
my $var = "John Doe";
I have tried:
$var =~ s/(.+\b.).+\z],'\1.'//g;

Here's a general solution (feel free to swap in '\w' where I used '.', and a single \s where I used \s+):
my $var = "John Doe";
my ($fname, $linitial) = $var =~ /(.*)\s+(.).*/;
Then you have the values:
$fname = 'John';
$linitial = 'D';
and you can do:
print "$fname $linitial";
to get
"John D"
EDIT
Each pair of capturing parentheses sets a numbered variable ($1 and $2, respectively), and these keep their values until the next successful match, so the whole thing can be shortened a bit as follows:
my $var = "John Doe";
$var =~ /(.*)\s+(.).*/;
print "$1 $2";

To replace the last sequence of non-whitespace characters with just its initial character, you could write this:
use strict;
use warnings;
my $var = "John Doe";
$var =~ s/(\S)\S*\s*$/$1/;
print $var;
Output:
John D

Assuming your string contains only ASCII names, this will work:
$var =~ s/([a-zA-Z]+)\s([a-zA-Z]+)/$1." ".substr($2,0,1)/ge;

$var = "John Doe";
s/^(\w+)\s+(\w)/$1 \u$2/ for $var;

A simple regex that solves this problem is the substitution
s/^\w+\s+\K(\w).*/\U$1/s
What does this do?
^ \w+ \s+ matches a word at the beginning of the string, plus whitespace towards the next word
\K is the keep escape. Everything matched before it is left out of the substring that the regex engine considers “matched”, so the substitution does not replace it. This avoids an extra capture group and works much like a (variable-length) look-behind.
(\w) matches and captures one “word” character. This is the leading character of the second word in the string.
.* matches the rest of the string. I do this to overwrite any other names that may come: you stated that Lester del Ray should be transformed to Lester D, not Lester D Ray as a solution with \w* instead of the .* part would have done. The /s modifier is relevant for this, as it enables . to match every character including newlines (who knows what's inside the string?).
The replacement uses the \U escape to uppercase what follows it, which here is just the captured initial.
Test:
$ perl -E'$_ = shift; s/^\w+\s+\K(\w).*/\U$1/s; say' "Lester del Ray"
Lester D
$ perl -E'$_ = shift; s/^\w+\s+\K(\w).*/\U$1/s; say' "John Doe"
John D
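To see the difference between the .* ending and a \w* ending described above, here is a small comparison sketch; the two helper names are hypothetical, and both reuse the substitution from this answer:

```perl
use strict;
use warnings;

# ".*" (with /s) drops everything after the initial,
# while "\w*" drops only the rest of the second word.
sub initial_drop_rest {
    (my $s = shift) =~ s/^\w+\s+\K(\w).*/\U$1/s;
    return $s;
}

sub initial_drop_word {
    (my $s = shift) =~ s/^\w+\s+\K(\w)\w*/\U$1/;
    return $s;
}

for my $name ("John Doe", "Lester del Ray") {
    printf "%-15s %-10s %s\n", $name, initial_drop_rest($name), initial_drop_word($name);
}
```

Both versions turn "John Doe" into "John D", but for "Lester del Ray" the first gives "Lester D" and the second "Lester D Ray".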

Something like this might be a little more usable/reusable in the long run. First, make a function that returns the initial of a word:
my $initial = sub { return substr shift, 0, 1; };
Then call it from the replacement side of the substitution, using the /e modifier so that the replacement is executed as code:
$var =~ s/^(\w+)\s+(\w+).*/$1 . ' ' . $initial->($2)/se;

Related

Finding index of white space in Perl

I'm trying to find the index of white space in a string in Perl.
For example, if I have the string
stuff/more stuffhere
I'd like to select the word "more" with a substring method. I can find the index of "/" but haven't figured out how to find the index of the whitespace. The length of the substring I'm trying to select will vary, so I can't hard-code the index. There will only be one whitespace character in the string (other than any trailing whitespace at the end of the string).
Also, if anybody has any better ideas of how to do this, I'd appreciate hearing them. I'm fairly new to programming so I'm open to advice. Thanks.
Just use index:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'stuff/more stuffhere';
my $index_of_slash = index $string, '/';
my $index_of_space = index $string, ' ';
say "Between $index_of_slash and $index_of_space.";
The output is
Between 5 and 10.
Which is correct:
0         1
01234567890123456789
stuff/more stuffhere
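The question ultimately asks for the word itself, so here is a sketch of the substr call that uses the two offsets from index; the helper name word_after_slash is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the first "/" and the
# next space after it, or undef if either character is missing.
sub word_after_slash {
    my ($string) = @_;
    my $slash = index $string, '/';
    return undef if $slash < 0;
    # The optional third argument makes index start searching after the slash
    my $space = index $string, ' ', $slash + 1;
    return undef if $space < 0;
    # Start one past the slash; the length is the gap between the offsets
    return substr $string, $slash + 1, $space - $slash - 1;
}

print word_after_slash('stuff/more stuffhere'), "\n";   # more
```

Checking for -1 matters because index returns -1 when the character is absent, and substr would then silently pick the wrong part of the string.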
If by "whitespace" you also mean tabs or whatever, you can use a regular expression match and the special variables #- and #+:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = "stuff/more\tstuffhere";
if ($string =~ m{/.*(?=\s)}) {
say "Between $-[0] and $+[0]";
}
The (?=\s) means "is followed by a whitespace character", but the character itself is not part of the match, so you don't need to do any arithmetic on the returned offsets.
As you stated, you want to select the word between the first / and the first space following it. If that is the case, you may not need any index at all (you just need the word). A regex is the ideal tool for finding something in a text.
Look at the following code:
$txt = 'stuff/more stuffxx here';
if ($txt =~ /\/(.+?) /) {
print "Match: $1.\n";
}
The regex used tries to match:
a slash,
a non-empty sequence of any characters (note the ?, which makes it the reluctant, i.e. non-greedy, version), enclosed in a capturing group,
a space.
So after the match, $1 contains what was captured by the first capturing group, i.e. "your" word.
But if for any reason you are interested in the starting and ending offsets of this word, you can read them from $-[1] and $+[1] (the starting and ending offsets of the first capturing group).
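A short sketch of reading those offsets after the match above; the helper name word_offsets is hypothetical, and the regex and input string are the ones from this answer:

```perl
use strict;
use warnings;

# Sketch: return the captured word plus the start and end offsets
# of capture group 1, taken from @- and @+.
sub word_offsets {
    my ($txt) = @_;
    return unless $txt =~ /\/(.+?) /;
    return ($1, $-[1], $+[1]);
}

my $txt = 'stuff/more stuffxx here';
my ($word, $start, $end) = word_offsets($txt);
printf "Match: %s at offsets %d..%d\n", $word, $start, $end;   # Match: more at offsets 6..10
# The same word can be recovered from the offsets with substr
print substr($txt, $start, $end - $start), "\n";               # more
```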
The arrays @- (@LAST_MATCH_START) and @+ (@LAST_MATCH_END) give the start and end offsets of the last successful submatches. See the regex-related variables in perlvar.
You can capture your real target and then read off the offset right after the whole match with $+[0]:
@+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against.
Example
use feature 'say';
my $str = 'target and target with spaces';
while ($str =~ /(target)\s/g)
{
say "Position after match: $+[0]";
}
prints
Position after match: 7
Position after match: 18
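These are the positions right after the whitespace character that follows each 'target', because the \s is part of the whole match. ($+[1], the end of the captured 'target', would give 6 and 17, which are the offsets of the spaces themselves.)
Or you can capture the \s instead and use $-[1] + 1 (the offset of the space, plus one).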
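To make the difference between the two end offsets concrete, this sketch collects both $+[0] and $+[1] for each match of the loop above; the helper name offsets_after_target is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: for each "target" followed by whitespace, collect
# $+[0] (end of the whole match) and $+[1] (end of the captured "target").
sub offsets_after_target {
    my ($str) = @_;
    my @found;
    while ($str =~ /(target)\s/g) {
        push @found, [ $+[0], $+[1] ];
    }
    return @found;
}

for my $pair (offsets_after_target('target and target with spaces')) {
    print "end of match: $pair->[0], end of 'target': $pair->[1]\n";
}
```

For this string it reports 7 and 6 for the first match and 18 and 17 for the second.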
You can use:
my $str = "stuff/more stuffhere";
if ($str =~ m{/\K\S+}) {
my $word = substr($str, $-[0], $+[0] - $-[0]);  # "more"
}
But why substr? That's an odd choice here. Maybe if you told us what you actually wanted to do, we could provide better alternatives. Here are three cases:
Data extraction:
my $str = "stuff/more stuffhere";
if ( my ($word) = $str =~ m{/(\S+)} ) {
say $word; # more
}
Data replacement:
my $str = "stuff/more stuffhere";
$str =~ s{/\K\S+}{REPLACED};
say $str; # stuff/REPLACED stuffhere
Data replacement (dynamic):
my $str = "stuff/more stuffhere";
$str =~ s{/\K(\S+)}{ uc($1) }e;
say $str; # stuff/MORE stuffhere

How can I prevent Perl from interpreting double-backslash as single-backslash character?

How can I print a (single-quoted) string containing double backslashes \\ as is, without Perl turning each pair into a single backslash \? I also don't want to alter the string by adding more escape characters.
my $string1 = 'a\\b';
print $string1; #prints 'a\b'
my $string1 = 'a\\\\b';
#I know I can alter the string to escape each backslash
#but I want to keep string as is.
print $string1; #prints 'a\\b'
#I can also use single-quoted here document
#but unfortunately this would make my code syntactically look horrible.
my $string1 = <<'EOF';
a\\b
EOF
print $string1; #prints a\\b, with newline that could be removed with chomp
The only quoting construct in Perl that doesn't interpret backslashes at all is the single-quoted here document:
my $string1 = <<'EOF';
a\\\b
EOF
print $string1; # Prints a\\\b, with newline
Because here-docs are line-based, it's unavoidable that you will get a newline at the end of your string, but you can remove it with chomp.
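A minimal sketch of the single-quoted here-doc combined with chomp; the helper name raw_string is hypothetical and only wraps the technique from this answer:

```perl
use strict;
use warnings;

# Single-quoted here-doc: no escape processing at all, so the three
# backslashes survive; chomp removes the trailing newline the here-doc adds.
sub raw_string {
    chomp(my $s = <<'EOF');
a\\\b
EOF
    return $s;
}

print raw_string(), "\n";          # a\\\b
print length(raw_string()), "\n";  # 5
```

Writing chomp(my $s = ...) declares, assigns and strips the newline in one statement, so the here-doc stays readable.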
Other techniques are simply to live with it and backslash your strings correctly (for small amounts of data), or to put them in a __DATA__ section or an external file (for large amounts of data).
If you are mildly crazy, and like the idea of using experimental software that mucks about with perl's internals to improve the aesthetics of your code, you can use the Syntax::Keyword::RawQuote module, on CPAN since this morning.
use syntax 'raw_quote';
my $string1 = r'a\\\b';
print $string1; # prints 'a\\\b'
Thanks to @melpomene for the inspiration.
Since backslash escapes are processed in string literals, perhaps you could write your literals using some other arbitrary symbol, then replace that symbol with a backslash later.
my $string = 'a!!!b';
$string =~ s{!}{\\}g;
print $string; #prints 'a\\\b'
Of course it doesn't have to be !, any symbol that does not conflict with a normal character in the string will do. You said you need to make a number of strings, so you could put the substitution in a function
sub bs {
$_[0] =~ s{!}{\\}gr
}
my $string = 'a!!!b';
print bs($string); #prints 'a\\\b'
P.S.
That function uses the non-destructive substitution modifier /r introduced in v5.14. If you are using an older version, the function would need to be written like this (note that it modifies the caller's variable in place, because @_ aliases the arguments):
sub bs {
$_[0] =~ s{!}{\\}g;
return $_[0];
}
Or if you like something more readable
sub bs {
my $str = shift;
$str =~ s{!}{\\}g;
return $str;
}

Perl tr operator is transliterating based on the variable's name not its value

I'm using Perl 5.16.2 to try to count the number of occurrences of a particular delimiter in the $_ string. The delimiter is passed to my Perl program via the @ARGV array. I verify that it is correct within the program. My instruction to count the number of delimiters in the string is:
$dlm_count = tr/$dlm//;
If I hardcode the delimiter, e.g. $dlm_count = tr/,//; the count comes out correctly. But when I use the variable $dlm, the count is wrong. I modified the instruction to say
$dlm_count = tr/$dlm/\t/;
and realized from how the tabs were inserted in the string that the operation was replacing every instance of any of the four characters "$", "d", "l", or "m" with \t, i.e. the four characters that make up my variable name $dlm.
Here is a sample program that illustrates the problem:
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm = ",";
my $dlm_count = tr/$dlm/\t/;
print "The count is $dlm_count\n";
print "The modified string is $_\n";
There are only two commas in the $_ string, but this program prints the following:
The count is 3
The modified string is abc efghij,k ,nopqrstuvwxyz
Why is the $dlm token being treated as a literal string of four characters instead of as a variable name?
You cannot use tr that way: it doesn't interpolate variables. It performs a strictly character-by-character replacement. So this
$string =~ tr/a$v/123/
is going to replace every a with 1, every $ with 2, and every v with 3. It is not a regex but a transliteration. From perlop:
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $# if $#;
eval "tr/$oldlist/$newlist/, 1" or die $#;
The example above from the docs hints at how to count. To count the $dlm characters in $string:
$dlm_count = eval "\$string =~ tr/$dlm//";
$string is escaped so that it is not interpolated before the code reaches eval. In your case:
$dlm_count = eval "tr/$dlm//";
You can also use tools other than tr (or a regex). For example, with the string in $_:
my $dlm_count = grep { $_ eq $dlm } split //;
When split splits $_ on the empty pattern (//), it returns a list of all of its characters. The grep block then compares each character with $dlm (using eq rather than /$dlm/ avoids surprises when $dlm is a regex metacharacter such as . or |). Because grep is evaluated in scalar context here, it returns the number of characters that matched, which is the count of $dlm.
The 'Quote Like Operators' section of perlop states:
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
As documented and as you discovered, tr/// doesn't interpolate. The simple solution is to use s/// instead.
my $dlm = ",";
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm_count = s/\Q$dlm/\t/g;
If the transliteration is being performed in a loop, the following might speed things up noticeably:
my $dlm = ",";
my $tr = eval "sub { tr/\Q$dlm\E/\\t/ }";
for (...) {
my $dlm_count = $tr->();
...
}
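A runnable version of the compile-once idea above, as a sketch: the helper name make_counter and the sample lines are hypothetical, and the tr/// is built with eval exactly once per delimiter, then reused:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the tr/// once, with $dlm pasted into the
# source. \Q...\E (quotemeta) escapes characters special to tr, such as "-".
sub make_counter {
    my ($dlm) = @_;
    my $code = "sub { tr/\Q$dlm\E/\\t/ }";
    my $sub  = eval $code or die $@;
    return $sub;
}

my $tr = make_counter(",");
for my $line ("a,b,c", "no delimiter", "x,y") {
    local $_ = $line;            # the compiled tr/// works on $_
    my $dlm_count = $tr->();     # also replaces each delimiter with a tab
    print "$dlm_count\n";        # 2, then 0, then 1
}
```

The string eval runs once in make_counter; each call of the returned sub is then an ordinary compiled tr///.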
Although several answers have hinted at the eval() idiom for tr///, none has the form that covers cases where the delimiter contains characters that are special to tr, e.g. - (hyphen):
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm = ",";
my $dlm_count = eval sprintf "tr/%s/%s/", map quotemeta, $dlm, "\t";
But as others have noted, there are lots of ways to count characters in Perl that avoid eval(); here's another (\Q keeps regex metacharacters in $dlm literal):
my $dlm_count = () = m/\Q$dlm/go;
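The two eval-free approaches from this thread can be checked side by side. A sketch with hypothetical helper names; the test string deliberately includes "." so that the effect of \Q is visible (without it, /./ would match every character):

```perl
use strict;
use warnings;

# Count matches: a list assignment in scalar context yields the number
# of elements on its right-hand side, i.e. the number of matches.
sub count_by_match {
    my ($str, $dlm) = @_;
    my $count = () = $str =~ /\Q$dlm\E/g;
    return $count;
}

# Count characters equal to $dlm: grep in scalar context returns a count.
sub count_by_grep {
    my ($str, $dlm) = @_;
    return scalar grep { $_ eq $dlm } split //, $str;
}

my $str = "a,b.c-d,e";
for my $dlm (",", ".", "-") {
    printf "%s: %d %d\n", $dlm, count_by_match($str, $dlm), count_by_grep($str, $dlm);
}
```

Both helpers report 2 commas, 1 dot and 1 hyphen for this string.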

How to remove a date identifier and a special character from a string

I want to remove the date identifier and the * from a string.
$string = "*102015 Supplied air hood";
$output = "Supplied air hood";
I have used:
$string =~ s/[#\%&\""*+]//g;
$string =~ s/^\s+//;
What should I use to get the string value "Supplied air hood"?
Thanks in advance
To remove everything from the string up to the first space, you can write
$str =~ s/^\S*\s+//;
Your pattern doesn't contain numbers. It would remove the *, but nothing else. If you want to remove a * followed by six digits and a blank at the beginning of the string, do it like this:
$string =~ s/^\*\d{6} //;
However, if that string always contains a pattern like this, you don't need a regular expression substitution. You can simply take a substring.
my $output = substr $string, 8;
That will assign the content of $string starting from the 9th character (offset 8).
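A small sketch comparing the regex and substr approaches from this answer; the helper names are hypothetical, and the sample strings come from the question:

```perl
use strict;
use warnings;

# Regex version: strips a leading "*", six digits and one space,
# and leaves strings without that prefix untouched.
sub strip_by_regex {
    (my $s = shift) =~ s/^\*\d{6} //;
    return $s;
}

# substr version: assumes the prefix is always exactly 8 characters long.
sub strip_by_substr {
    my ($s) = @_;
    return substr $s, 8;
}

print strip_by_regex("*102015 Supplied air hood"), "\n";   # Supplied air hood
print strip_by_substr("*102015 Supplied air hood"), "\n";  # Supplied air hood
print strip_by_regex("Supplied air hood"), "\n";           # Supplied air hood (unchanged)
```

The trade-off: substr is simpler but always removes 8 characters, even from a string that has no date prefix, whereas the anchored regex only removes text that actually looks like the prefix.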
The script below does what you want, assuming that the date always appears at the beginning of the line and that it is followed by exactly one space.
use strict;
use warnings;
while (<DATA>)
{
# skip one or more characters not a space
# then skip exactly one space
# then capture all remaining characters
# and assign them to $s
my ($s) = $_ =~ /[^ ]+ (.*)/;
print $s, "\n";
}
__DATA__
*110115 first date
*110115 second date
*110315 third date
Output is:
first date
second date
third date

How to convert single quoted string to double quoted string so it gets interpreted?

For example, I have:
my $str = '\t';
print "My String is ".$str;
I want the output to interpret the tab character and return something like:
"My String is \t"
I am actually getting the value of the string from the database, and it returns it as a single quoted string.
String::Interpolate does exactly that
$ perl -MString::Interpolate=interpolate -E 'say "My String is [". interpolate(shift) . "]"' '\t'
My String is [ ]
'\t' and "\t" are string literals, pieces of Perl code that produces strings ("\","t" and the tab character respectively). The database doesn't return Perl code, so describing the problem in terms of single-quoted literals and double-quoted literals makes no sense. You have a string, period.
The string is formed of the characters "\" and "t". You want to convert that sequence of characters into the tab character. That's a simple substitution.
s/\\t/\t/g
I presume you don't want to deal with just \t. You can create a table of the sequences.
my %escapes = (
"t" => "\t",
"n" => "\n",
"\\" => "\\",
);
my $escapes_pat = join('', map quotemeta, keys(%escapes));
$escapes_pat = qr/[$escapes_pat]/;
s/\\($escapes_pat)/$escapes{$1}/g;
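Wrapped in a function, the table-driven substitution can be applied to a value read from a database. A sketch; the function name unescape and the sample value are hypothetical, while the table and the substitution are the ones from this answer:

```perl
use strict;
use warnings;

# Escape table: maps the character after a backslash to the
# character it stands for.
my %escapes = (
    "t"  => "\t",
    "n"  => "\n",
    "\\" => "\\",
);
my $escapes_pat = join '', map quotemeta, keys %escapes;
$escapes_pat = qr/[$escapes_pat]/;

# Hypothetical helper wrapping the substitution
sub unescape {
    my ($str) = @_;
    $str =~ s/\\($escapes_pat)/$escapes{$1}/g;
    return $str;
}

my $from_db = 'My String is \t';        # a backslash followed by "t"
print "[", unescape($from_db), "]\n";   # the brackets make the real tab visible
```

Because the substitution scans left to right, an escaped backslash followed by t (\\t) becomes a literal backslash and t rather than a tab.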
You can follow the technique in perlfaq4's answer to How can I expand variables in text strings?:
If you can avoid it, don't, or if you can use a templating system, such as Text::Template or Template Toolkit, do that instead. You might even be able to get the job done with sprintf or printf:
my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
However, for the one-off simple case where I don't want to pull out a full templating system, I'll use a string that has two Perl scalar variables in it. In this example, I want to expand $foo and $bar to their variable's values:
my $foo = 'Fred';
my $bar = 'Barney';
$string = 'Say hello to $foo and $bar';
One way I can do this involves the substitution operator and a double /e flag. The first /e evaluates $1 on the replacement side and turns it into $foo. The second /e starts with $foo and replaces it with its value. $foo, then, turns into 'Fred', and that's finally what's left in the string:
$string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
The /e will also silently ignore violations of strict, replacing undefined variable names with the empty string. Since I'm using the /e flag (twice even!), I have all of the same security problems I have with eval in its string form. If there's something odd in $foo, perhaps something like @{[ system "rm -rf /" ]}, then I could get myself in trouble.
To get around the security problem, I could also pull the values from a hash instead of evaluating variable names. Using a single /e, I can check the hash to ensure the value exists, and if it doesn't, I can replace the missing value with a marker, in this case ??? to signal that I missed something:
my $string = 'This has $foo and $bar';
my %Replacements = (
foo => 'Fred',
);
# $string =~ s/\$(\w+)/$Replacements{$1}/g;
$string =~ s/\$(\w+)/
exists $Replacements{$1} ? $Replacements{$1} : '???'
/eg;
print $string;
Well, I just tried the workaround below and it worked. Please have a look:
my $str1 = "1234\n\t5678";
print $str1;
#it prints
#1234
# 5678
$str1 =~ s/\t/\\t/g;
$str1 =~ s/\n/\\n/g;
print $str1;
#it now prints the escape sequences literally:
#1234\n\t5678