Perl - How do I insert a space before each capital letter, except the first one or where a space already exists?

I have a string like:
SomeCamel WasEnteringText
I have found various ways of splitting up the string and inserting spaces with PHP's str_replace, but I need it in Perl.
Sometimes there is a space before the string, sometimes not. Sometimes there is a space inside the string, sometimes not.
I tried:
my $camel = "SomeCamel WasEnteringText";
#or
my $camel = " SomeCamel WasEntering Text";
$camel =~ s/^[A-Z]/\s[A-Z]/g;
#and
$camel =~ s/([\w']+)/\u$1/g;
and many more combinations of =~s//g; after much reading.
I need a guru to direct this camel towards an oasis of answers.
OK, based on the input below I now have:
$camel =~ s/([A-Z])/ $1/g;
$camel =~ s/^ //; # Strip out starting whitespace
$camel =~ s/([^[:space:]]+)/\u$1/g;
Which gets it done but seems excessive. Works though.

s/(?<!^)[A-Z][a-z]*+(?!\s+)\K/ /g;
And the less "screw this horsecrap" version:
s/
(?<!^) #Something not following the start of line,
[A-Z][a-z]*+ #That starts with a capital letter and is followed by
#Zero or more lowercased letters, not giving anything back,
(?!\s+) #Not followed by one or more spaces,
\K #Better explained here [1]
/ /gx; #"Replace" it with a space.
EDIT: I noticed that this also adds extra whitespace when you add punctuation into the mix, which probably isn't what the OP wants; thankfully, the fix is simply changing the negative look ahead from \s+ to \W+. Although now I'm beginning to wonder why I actually added those pluses. Drats, me!
EDIT2: Erm, apologies, originally forgot the /g flag.
EDIT3: Okay, someone downvote me. I had a brain lapse. No need for the negative lookbehind for ^ - I really dropped the ball on this one. Hopefully fixed:
s/[A-Z][a-z]*+(?!\W)\K/ /gx;
1: http://perldoc.perl.org/perlre.html
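To see what the final regex above does, here is a minimal sketch. The sub name space_after_words is made up for this illustration, and the sample string is the one from the question. It assumes Perl 5.10 or later, since \K and possessive quantifiers are not available before that.

```perl
use strict;
use warnings;
use 5.010;   # \K and possessive quantifiers need Perl 5.10 or later

# Hypothetical wrapper around the final regex from this answer.
sub space_after_words {
    my ($s) = @_;
    # Match a capital followed by lowercase letters (possessively),
    # require that the next character is not a non-word character,
    # then use \K so that only the space is inserted.
    $s =~ s/[A-Z][a-z]*+(?!\W)\K/ /g;
    return $s;
}

# Brackets make any leading or trailing whitespace visible.
printf "[%s]\n", space_after_words("SomeCamel WasEnteringText");
```

On this input the result seems to end with a trailing space, because (?!\W) also succeeds at the very end of the string; a follow-up s/ $// would trim it if that matters.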

Try:
$camel =~ s/(?<! )([A-Z])/ $1/g; # Search for "(?<!pattern)" in perldoc perlre
$camel =~ s/^ (?=[A-Z])//; # Strip out extra starting whitespace followed by A-Z
Please note that the obvious try of $camel =~ s/([^ ])([A-Z])/$1 $2/g; has a bug: it doesn't work if there are capital letters following one another, because each match consumes the character before the capital (e.g. "ABCD" will be transformed into "A BC D" and not "A B C D").
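A small sketch to compare the two-step approach with the "obvious" one. The sub names space_caps and naive_space_caps are invented for this example; the regexes are the ones quoted in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: the two-step approach from this answer.
sub space_caps {
    my ($s) = @_;
    $s =~ s/(?<! )([A-Z])/ $1/g;   # space before every capital not already preceded by one
    $s =~ s/^ (?=[A-Z])//;         # drop a single leading space that is followed by a capital
    return $s;
}

# Hypothetical helper: the "obvious" version, which consumes the preceding character.
sub naive_space_caps {
    my ($s) = @_;
    $s =~ s/([^ ])([A-Z])/$1 $2/g;
    return $s;
}

print space_caps("SomeCamel WasEnteringText"), "\n";    # Some Camel Was Entering Text
print space_caps(" SomeCamel WasEntering Text"), "\n";  # Some Camel Was Entering Text
print space_caps("ABCD"), "\n";                         # A B C D
print naive_space_caps("ABCD"), "\n";                   # A BC D
```

The lookbehind version does not consume the preceding character, which is why consecutive capitals are each handled.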

Try :
s/(?<=[a-z])(?=[A-Z])/ /g
This inserts a space after a lower-case character (i.e. not after a space or the start of the string) and before an upper-case character.
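A quick sketch of the lookaround substitution on the strings from the question. The sub name split_camel is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space at every lower-case/upper-case boundary.
sub split_camel {
    my ($s) = @_;
    # Both assertions are zero-width, so nothing is consumed or replaced;
    # a space is simply inserted at each matching position.
    $s =~ s/(?<=[a-z])(?=[A-Z])/ /g;
    return $s;
}

print split_camel("SomeCamel WasEnteringText"), "\n";   # Some Camel Was Entering Text
print split_camel("ABCD"), "\n";                        # ABCD (no lower-case letter before any capital)
```

Note that a leading space in the input is left in place, and that runs of capitals are not split, since the lookbehind requires a lower-case letter.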

Improving on Hughmeir's answer: this also works with numbers and with words starting with lower-case letters.
s/[a-z0-9]+(?=[A-Z])\K/ /gx
Tests (shown with _ in place of the inserted space, to make it visible)
myBrainIsBleeding => my_Brain_Is_Bleeding
MyBrainIsBleeding => My_Brain_Is_Bleeding
myBRAInIsBLLEding => my_BRAIn_Is_BLLEding
MYBrainIsB0leeding => MYBrain_Is_B0leeding
0My0BrainIs0Bleeding0 => 0_My0_Brain_Is0_Bleeding0
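The table above can be reproduced with a short script. This is only a sketch: the sub name mark_boundaries is made up, and it inserts an underscore rather than a space so the output lines up with the test table. It assumes Perl 5.10 or later for \K.

```perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or later

# Hypothetical helper: same regex as above, but inserting "_" instead of " ".
sub mark_boundaries {
    my ($s) = @_;
    $s =~ s/[a-z0-9]+(?=[A-Z])\K/_/g;
    return $s;
}

for my $input (qw(myBrainIsBleeding MyBrainIsBleeding myBRAInIsBLLEding
                  MYBrainIsB0leeding 0My0BrainIs0Bleeding0)) {
    printf "%s => %s\n", $input, mark_boundaries($input);
}
```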

Related

Perl - can't remove trailing characters at the end of string

I have some trailing characters at the end of a string peregrinevwap^_^_
print "JH 4 - app: $application \n";
app: peregrinevwap^_^_
Do you know why they are there and how I can remove them? I tried the chomp command, but that didn't work.
Check out the tr//cd operator to get rid of unwanted characters.
It's documented in "perldoc perlop"
$application =~ tr/a-zA-Z//cd;
Will remove everything except letters from the string and
$application =~ tr/^_//d;
Will remove all "^" and "_" characters.
If you only want to remove certain characters when they are at the end of the string, use the s// search/replace operator with regular expressions and the $ anchor to match the end of the string.
Here's an example:
s/[\^_]*$//;
Let's hope the underscores do not occur at the end of your strings, otherwise you can't automatically separate them from these unwanted characters.
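To contrast the two approaches, here is a minimal sketch. The sub names and the sample values are assumptions made for this example, and it treats the trailing characters as literal ^ and _.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything that is not an ASCII letter.
sub letters_only {
    my ($s) = @_;
    $s =~ tr/a-zA-Z//cd;   # c = complement the list, d = delete matched characters
    return $s;
}

# Hypothetical helper: delete only a trailing run of "^" and "_".
sub strip_trailing_marks {
    my ($s) = @_;
    $s =~ s/[\^_]*$//;
    return $s;
}

print letters_only("peregrinevwap^_^_"), "\n";      # peregrinevwap
print strip_trailing_marks("my_app^_^_"), "\n";     # my_app (inner underscore kept)
print strip_trailing_marks("app_"), "\n";           # app (a genuine trailing underscore is lost)
```

The last line shows the caveat mentioned above: a legitimate underscore at the end of the string cannot be told apart from the unwanted ones.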
Are you sure these characters are actually ^ and _ characters?
^_ could also indicate Ctrl-Underscore, ASCII character 0x1F (Unit Separator). (Not a character I've ever seen used, but you never know.)
If this is in fact the case, you can remove them with something like:
$application =~ s/\x1F//g;
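One way to check what the characters really are is to print their code points. The following is a sketch only: the sub names are invented, and the sample string assumes the trailing characters are in fact 0x1F.

```perl
use strict;
use warnings;

# Hypothetical helper: show each character as a two-digit hex code point.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

# Hypothetical helper: remove a trailing run of control characters.
# Note that [[:cntrl:]] also covers newline, so this acts like a stronger chomp.
sub strip_trailing_cntrl {
    my ($s) = @_;
    $s =~ s/[[:cntrl:]]+\z//;
    return $s;
}

my $application = "wap\x1F\x1F";                    # assumed sample value
print hex_dump($application), "\n";                 # 77 61 70 1F 1F
print strip_trailing_cntrl($application), "\n";     # wap
```

If the dump shows 5E and 5F instead of 1F, the characters are a literal ^ and _ after all.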

Need Regular expression - perl

I am looking for a regex for the expression below:
my $text = "1170 KB/s (244475 bytes in 2.204s)"; # I want to retrieve last ‘2.204’ from this String.
$text =~ m/\d+[^\d*](\d+)/; #Regexp
my $num = $1;
print " $num ";
Output:
204
But I need 2.204 as output, please correct me.
Can any one help me out?
The regex is doing exactly what you asked it to: it matches digits \d+, followed by one character that is neither a digit nor a literal * ([^\d*]), followed by digits \d+. The only place in your string where that matches is 2.204, and since only the last \d+ is inside the parentheses, $1 is 204.
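The difference between the whole match and the capture group can be made visible with a short sketch. The sub name match_parts is hypothetical; the regex and the input are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole match and the first capture group.
sub match_parts {
    my ($text) = @_;
    if ($text =~ m/\d+[^\d*](\d+)/) {
        my ($whole, $captured) = ($&, $1);   # copy before the regex variables change
        return ($whole, $captured);
    }
    return;   # empty list when nothing matches
}

my ($whole, $captured) = match_parts("1170 KB/s (244475 bytes in 2.204s)");
print "whole match: $whole\n";      # whole match: 2.204
print "capture 1:   $captured\n";   # capture 1:   204
```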
If you want a quick fix, you can just move the parentheses:
m/(\d+[^\d*]\d+)/
This would (with the above input) match what you want. A more exact way to put it would be:
m/(\d+\.\d+)/
Of course this will match any floating-point number, so if your string can contain more than one of those, it is not a good idea. You can shore it up by using an anchor, like so:
m/(\d+\.\d+)s\)/
Where s\) forces the match to occur at only that place. Further strictures:
m/\(\d+\D+(\d+\.\d+)s\)/
You might also want to account for the possibility of your target number not being a float:
m/\(\d+\D+(\d+\.?\d*)s\)/
By using ? and * we allow those parts not to match at all. This is not recommended unless you are using anchors. You can also replace everything in the capture group with [\d.]+.
If you are not fond of matching the parentheses, you can match the text:
m/bytes in ([\d.]+)s/
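Put together as a runnable sketch, the text-anchored variant might look like this. The sub name elapsed_seconds is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the elapsed time out of a transfer summary line.
sub elapsed_seconds {
    my ($text) = @_;
    # In list context the match returns its captures, so $seconds gets group 1.
    my ($seconds) = $text =~ /bytes in ([\d.]+)s/;
    return $seconds;   # undef when the pattern does not match
}

print elapsed_seconds("1170 KB/s (244475 bytes in 2.204s)"), "\n";   # 2.204
```

Assigning the match in list context avoids reading $1 after a failed match, where it would still hold a value from some earlier successful match.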
I'd go with the second marker as indicator where you are in the string:
my ($num) = ($text =~ /(\d+\.\d+)s/);
with explanations:
/( # start of matching group
\d+ # first digits
\. # a literal '.', take \D if you want non-numbers
\d+ # second digits
)/x # close the matching group and the regex
You had the matching groups wrong. Also, [^\d] is a bit excessive; generally you can negate the backslash-escaped character classes (\d, \h, \s and \w) by using their respective uppercase letter (\D, \H, \S and \W).
Try this regex:
$text =~ m/\d+[^\d]*(\d+\.?\d*)s/;
That should match one or more digits, a decimal point if there is one, and zero or more decimal places, and it makes sure the number is followed by an "s".

searching a word with a particular character in it in perl

I am trying to search for a word that starts with any capital letter and ends with a zero, in Perl.
For example
ABC0
XYZ0
EIU0
QW0
What I have tried -
$abc =~ /^[A-Z].+0$/
But I am not getting proper output for this. Can anybody help me please?
The ^ anchors at the start of a string, the $ at the end. .+ matches as many non-newline characters as possible. Therefore
"ABC0 XYZ0 EIU0 QW0" =~ /^[A-Z].+0$/
matches the whole string.
The \b assertion matches at word edges: everywhere a word character and a non-word character are adjacent. The \w charclass holds only word characters, the \S charclass all non-space characters. Either of these is better than . here.
So you may want to use /\b[A-Z]\w*0\b/.
This might work :
$abc =~ /\b[A-Z].*0\b/
\b matches word boundaries.
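A sketch of the word-boundary approach with \w* between the boundaries, applied to a line that contains several candidates. The sub name and the sample line are assumptions for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return every word that starts with a capital and ends in 0.
sub words_ending_in_zero {
    my ($line) = @_;
    # With /g in list context and no capture groups, the match
    # returns every matched substring.
    return $line =~ /\b[A-Z]\w*0\b/g;
}

my @found = words_ending_in_zero("ABC0 XYZ0 EIU0 QW0 abc0 ABC1");
print "@found\n";   # ABC0 XYZ0 EIU0 QW0
```

Because \w also covers digits, lower-case letters and underscores, a word like Ab_0 would match as well; use [A-Z]* in the middle if only capitals are allowed.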

How do I replace all occurrences of certain characters with their predecessors?

$s = "bla..bla";
$s =~ s/([^%])\./$1/g;
I think it should replace every occurrence of a character followed by . with just that character, as long as the character is not %.
But $s then becomes bla.bla, when
it should be blabla. Where is the problem? I know I can use quantifiers, but I need to do it this way.
When a global regular expression is searching a string it will not find overlapping matches.
The first match in your string will be a., which is replaced with a. When the regex engine resumes searching it starts at the next . so it sees .bla as the rest of the string, and your regex requires a character to match before the . so it cannot match again.
Instead, use a negative lookbehind to perform the assertion that the previous character is not %:
$s =~ s/(?<!%)\.//g;
Note that if you use a positive lookbehind like (?<=[^%]), you will not replace the . if it is the first character in the string.
The problem is that even with the /g flag, each substitution starts looking where the previous one left off. You're trying to replace a. with a and then a. with a, but the second replacement doesn't happen because the a has already been "swallowed" by the previous replacement.
One fix is to use a zero-width lookbehind assertion:
$s =~ s/(?<=[^%])\.//g;
which will remove any . that is not the first character in the string, and that is not preceded by %.
But you might actually want this:
$s =~ s/(?<!%)\.//g;
which will remove any . that is not preceded by %, even if it is the first character in the string.
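The three variants can be compared side by side in a short sketch. The sub names are invented for illustration; the regexes are the ones discussed above.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per regex.
sub consuming   { my ($s) = @_; $s =~ s/([^%])\./$1/g;     return $s; }
sub positive_lb { my ($s) = @_; $s =~ s/(?<=[^%])\.//g;    return $s; }
sub negative_lb { my ($s) = @_; $s =~ s/(?<!%)\.//g;       return $s; }

print consuming("bla..bla"), "\n";     # bla.bla  (the second dot has no unconsumed character before it)
print positive_lb("bla..bla"), "\n";   # blabla
print negative_lb("bla..bla"), "\n";   # blabla
print positive_lb(".bla"), "\n";       # .bla     (needs some character before the dot)
print negative_lb(".bla"), "\n";       # bla
print negative_lb("50%.off.."), "\n";  # 50%.off  (the dot after % is kept)
```

The lookbehinds inspect the preceding character without consuming it, so each dot is judged independently.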
Much simpler than look-behinds is to use:
$s =~ s/([^%])\.+/$1/g;
This replaces any string of one or more dots after a character other than % by nothing.

how to delete single quotes but not apostrophes in perl

I would like to know how to delete single quotes but not apostrophes in perl.
For example:
'It's raining again!'
print
It's raining again!
Thanks so much
If you assume that a single-quote is always preceded or followed by whitespace, the following pair of regular expressions should work:
$line =~ s/\s'/ /g; #preceded by whitespace
$line =~ s/'\s/ /g; #followed by whitespace
You also need to account for the string starting or ending with a single quote:
$str =~ s/^'//; #at the start of a string
$str =~ s/'$//; #at the end of a string
foreach (<DATA>) {
s/(?:^\s*'|'$)//g;
print;
}
__DATA__
'It's raining again!'
OUTPUT
It's raining again!
EXPLANATIONS
there's more than one way to do it
(?:...) groups without creating an unneeded capture
Tricky one. Some single quotes come before or after letters, but you want to keep only the ones that act as apostrophes and remove the rest. Perhaps something like this, using negative lookarounds:
s/(?<![\pL\s])'|'(?![\pL\s])//g;
This removes any single quote that has no letter or whitespace before it, or no letter or whitespace after it. Lots of negations to keep track of there. The expanded version:
s/
(?<![\pL\s])' # no letters or whitespace before single quote
| # or
'(?![\pL\s]) # no letters or whitespace after single quote
//gx;
This will cover words like - as Eli Algranti pointed out in a comment - boys' toys and that's, but language is always tricky to predict. For example, it will be next to impossible to solve something like:
'She looked at him and said, 'That's impossible!''
Of course, if you expect your single quotes to appear only at end or beginning of string, you don't need to be this fancy, you can just remove the last and first character, with any means necessary. Such as, for example, as sputnik just suggested:
s/^'|'$//g;
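A sketch comparing the lookaround version with the simple anchored one. The sub names are hypothetical, and the inputs are the examples mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove quotes that lack a letter or whitespace on one side.
sub strip_quotes_lookaround {
    my ($s) = @_;
    $s =~ s/(?<![\pL\s])'|'(?![\pL\s])//g;
    return $s;
}

# Hypothetical helper: remove a quote only at the very start or end.
sub strip_outer_quotes {
    my ($s) = @_;
    $s =~ s/^'|'$//g;
    return $s;
}

print strip_quotes_lookaround("'It's raining again!'"), "\n";   # It's raining again!
print strip_quotes_lookaround("boys' toys"), "\n";              # boys' toys
print strip_outer_quotes("'It's raining again!'"), "\n";        # It's raining again!
```

One limitation of the lookaround version, as far as I can tell: a quote that has whitespace on one side and a letter on the other, as in a quotation in the middle of a sentence, is left alone.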