I need to take a sentence in any case and convert it so that the first word, and every other word except the following, is capitalized: to, a, the, at, in, of, with, and, but, or.
Example: "hello how are you dan" should become "Hello How are You Dan".
I know this looks like homework, but I'm at the point where I need to learn by seeing correct script usage. I have put a lot of effort into figuring this out, but I need someone to bridge the gap by showing me the correct method; then I can review it and learn from it.
Windos' answer is spot on, but I'm bored, so here is a fully working implementation:
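function Get-CustomTitleCase {
    param(
        [string]$InputString
    )
    # Words left in lowercase unless they start the sentence
    $NoCapitalization = @(
        'are', 'to', 'a', 'the', 'at', 'in',
        'of', 'with', 'and', 'but', 'or'
    )
    $isFirst = $true
    ( $InputString -split ' ' | ForEach-Object {
        # -notin compares case-insensitively, so 'The' is exempt as well
        if ($isFirst -or $_ -notin $NoCapitalization) {
            "$([char]::ToUpper($_[0]))$($_.Substring(1))"
        } else { $_ }
        # ForEach-Object runs in the function's scope, so this sticks
        $isFirst = $false
    }) -join ' '
}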
Use it like this:
PS C:\> Get-CustomTitleCase "hello, how are you dan"
Hello, How are You Dan
$string = 'hello how are you dan to, a, the, at, in, of, with, and, but, or'
[Regex]::Replace($string, '\b(?!(are|to|a|the|at|in|of|with|and|but|or)\b)\w', { param($letter) $letter.Value.ToUpper() })
Regex Explanation:
\b #Start at the beginning of a word.
(?!(are|to|a|the|at|in|of|with|and|but|or) #Negative lookahead: match only if the word does not begin with "are, to, a, the, at, in, of, with, and, but, or"
\b #Second \b to signify that there are no characters after the words listed in the negative lookahead list.
\w #Match any single word character
$letter.Value.ToUpper() # convert the matched letter(value) to uppercase
I won't give you a complete script, but I can pseudo code this to hopefully put you on the right track.
First of all, create an array of strings containing all the words you don't want to capitalize.
Then split the input string ('hello how are you dan') by spaces. You should end up with an array similar to 'hello', 'how', 'are'...
Loop through the split-up string, and check whether each word is in the first array you created.
If it is, ignore it; if it isn't, take the first letter and use a string method to make sure it is in its uppercase form.
You then need to join the string back up (don't forget the spaces.) You could either reconstruct the string ready for output as you're looping through the split array or at the end.
(Emphasis added to hint towards certain keywords you'll be after.)
I have run into an issue where a Perl script we use to parse a text file is omitting lines containing the tilde (~) character, and I can't figure out why.
The sample below illustrates what I mean:
#!/usr/bin/perl
use warnings;
formline " testing1\n";
formline " ~testing2\n";
formline " testing3\n";
my $body_text = $^A;
$^A = "";
print $body_text
The output of this example is:
testing1
testing3
The line containing the tilde is dropped entirely from the accumulator. This happens whether there is any text preceding the character or not.
Is there any way to print the line with the tilde treated as a literal part of the string?
~ is special in forms (see perlform) and there's no way to escape it. But you can create a field for it and populate it with a tilde:
formline " \@testing2\n", '~';
The first argument to formline is the "picture" (template). That picture uses various characters to mean particular things. The ~ means to suppress output if the fields are blank. Since you supply no fields in your call to formline, your fields are blank and output is suppressed.
my @lines = ( '', 'x y z', 'x~y~z' );
foreach $line ( @lines ) { # forms don't use lexicals, so no my on control
    write;
}
format STDOUT =
~ ID: @*
      $line
.
The output doesn't have a line for the blank field because the ~ in the picture told it to suppress output when $line doesn't have anything:
ID: x y z
ID: x~y~z
Note that tildes coming from the data are just fine; they are like any other character.
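The sketch below runs single picture lines through formline and captures $^A, which shows both behaviours described above side by side. The helper render_line is hypothetical, not part of Perl. A ~ in the picture drops the line when every field is blank, while a ~ supplied as data passes through unchanged. The last call is the one-character @ field trick from the previous answer.

```perl
use strict;
use warnings;

# Hypothetical helper: format one picture line and return what formline
# appended to the accumulator $^A (localized so nothing leaks out).
sub render_line {
    my ($picture, @values) = @_;
    local $^A = '';
    formline $picture, @values;
    return $^A;
}

# ~ in the picture: the line is suppressed when all its fields are blank.
my $blank  = render_line("~ ID: \@*\n", '');
my $filled = render_line("~ ID: \@*\n", 'x y z');

# ~ as data: a one-character @ field emits the tilde literally.
my $tilde  = render_line(" \@testing2\n", '~');

print $filled, $tilde;
```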
Here's probably something closer to what you meant. Create a picture, @* (variable-width multiline text), and supply it with values to fill it:
while( <DATA> ) {
local $^A;
formline '@*', $_;
print $^A, "\n";
}
__DATA__
testing1
~testing2
testing3
The output shows the field with the ~:
testing1
~testing2
testing3
However, the question is odd, because the way you are using formline doesn't match what formats are designed for. Perhaps you have some tricky setup where you're trying to take the picture from input data. But if you aren't going to give it any values, what are you really formatting? Consider that you may not actually want formats.
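If the real goal is fixed-width text output, plain sprintf may be enough. It has no picture-line rules, so a tilde is just another character. This is a sketch under that assumption. The rows and the column widths (%-10s, %5d) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical rows: a label and a number per line.
my @rows = (['testing1', 1], ['~testing2', 2], ['testing3', 3]);

# sprintf pads the columns; '~' in the data needs no special handling.
my $body_text = '';
for my $row (@rows) {
    $body_text .= sprintf "%-10s %5d\n", @$row;
}
print $body_text;
```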
I want to search for "Frequencies" (with a capital F) in my text files, and my code prints some columns, including "Frequencies", to the output file. But the text files also contain "frequencies" (with a lowercase f). I am using $search_word = qr/Frequencies/; in the code. How can I make that pattern match only the capitalized "Frequencies", so that occurrences of "frequencies" are excluded from the search?
In Perl, ucfirst capitalizes the first letter. For example:
my $word = "freQuEncY";
$word = ucfirst(lc($word)); # $word is now "Frequency"
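Why not just use a regex match? Perl regexes are case-sensitive unless you add the /i modifier, so /Frequencies/ will not match "frequencies":
if ($string_to_be_searched =~ /Frequencies/) {
    print $string_to_be_searched; # or whatever processing you need
}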
Try this one:
if ( $$test_string[$i] =~ /\b(?i)f(?-i)requencies/ ) {
my $captured = ucfirst($&);
# process $captured
}
Explanation:
The regex match is case-insensitive for the first letter of the word "frequencies" only. (?i) turns on case-insensitive matching at the position where it occurs, for the remainder of the pattern or until it is revoked by (?-i). This works for other flags too; see the "Extended Patterns" section of perlre.
$& contains the full match
\b denotes a word boundary (perhaps you don't need that but your problem description suggests you do).
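The sketch below compares three patterns on a few sample lines, which makes the difference between them concrete. The sample lines are invented for illustration. A plain qr/Frequencies/ is already case-sensitive. The (?i)f(?-i) form relaxes only the first letter. /i relaxes the whole word.

```perl
use strict;
use warnings;

# Hypothetical sample lines; only the case of "frequencies" differs.
my @lines = (
    'Frequencies --   10.5   20.1',
    'the frequencies listed below',
    'FREQUENCIES IN CM-1',
);

my $search_word = qr/\bFrequencies\b/;           # case-sensitive (default)
my $either_f    = qr/\b(?i)f(?-i)requencies\b/;  # F or f, rest lowercase
my $any_case    = qr/\bfrequencies\b/i;          # any case at all

my @exact = grep { $_ =~ $search_word } @lines;
my @first = grep { $_ =~ $either_f } @lines;
my @all   = grep { $_ =~ $any_case } @lines;

printf "%d %d %d\n", scalar @exact, scalar @first, scalar @all;  # 1 2 3
```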
Here's the scenario -- One step of the process involves fixing city names when the data is obviously misspelled, along with some basic conversions like "MTN" to "Mountain" and so forth. I've built a variable containing several substitution strings, and I'm trying to apply that set of subs on one of the input fields later down the line.
my $citysub = <<'EOF';
s/DEQUEEN/DE QUEEN/;
s/ELDORADO/EL DORADO/;
... # there are about 100 such substitution strings
EOF
...
while ($line = <INFILE>)
{
...
@field = split(/","/,$line); # it's a comma-delimited file with quoted strings; this is splitting exactly like I intend; at the end, I'll piece it back together properly
...
# the 9th field and 12th field are city names, i.e., $field[8] and $field[12]
$field[8] =~ $citysub; # this is what I'm wanting to do, but it doesn't work!
# since that doesn't work, I'm using the following, but it's much slower, obviously
$field[8] = `echo $field[8]|sed -e "$citysub"`; # external calls to system commands
So, what's the proper syntax to insert a multi-line substitution string and apply it toward a single array value?
my %citysub = ( "DEQUEEN" => "DE QUEEN", "ELDORADO" => "EL DORADO" );
for my $find ( keys %citysub ) {
my $replace = $citysub{ $find };
$field[8] =~ s/$find/$replace/g;
}
Explanation: Create a hash of "thing to match" => "thing to replace with". then loop over that hash and run s/// with the thing to match and the thing to replace with.
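A variation on the hash approach is sketched below. It builds one alternation regex from the hash keys and applies it in a single s///g pass, instead of running about 100 separate substitutions per field. quotemeta makes sure any regex metacharacters in a key are matched literally. Longer keys go first, so a longer name wins over a shorter key that could also match at the same spot. Two of these choices are assumptions rather than part of the answer above: the \b word boundaries, and the MTN entry, which is illustrative only.

```perl
use strict;
use warnings;

my %citysub = (
    'DEQUEEN'  => 'DE QUEEN',
    'ELDORADO' => 'EL DORADO',
    'MTN'      => 'MOUNTAIN',    # illustrative entry
);

# One pattern for all keys: longest first, metacharacters escaped.
my $alternation = join '|',
    map { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %citysub;
my $city_re = qr/\b($alternation)\b/;

# Replace every listed name in one pass; $1 is the key that matched.
sub fix_city {
    my ($city) = @_;
    $city =~ s/$city_re/$citysub{$1}/g;
    return $city;
}

# e.g. inside the read loop: $field[8] = fix_city($field[8]);
print fix_city('MTN VIEW'), "\n";
```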
I am new to Perl and working on a regex to match only words with three or more letters. Here is the program I am trying. I tried adding \w{3,}, since it should match three or more characters, but it still matches words with fewer than three characters. For example, if I give "This is a Pattern", I want my $field to match only "This" and "Pattern" and skip "is" and "a".
#!/usr/bin/perl
while (<STDIN>) {
foreach my $reg_part (split(/\s+/, $_)) {
if ($reg_part =~ /([^\w\#\.]*)?([\w{3,}\#\(\)\+\$\.]+)(?::(.+))?/) {
print "reg_part = $reg_part \n";
my ($mod, $field, $pat) = ($1, $2, $3);
print "#$mod#$field#$pat#$negate#\n";
}
}
}
exit(0);
What am I missing?
You have
[\w{3,}...]+
which is the same as
[{},3\w...]+
I think you want
(?:\w{3,}|[\$\#()+.])+
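The sketch below shows why the original class did not work. Inside [...], {3,} loses its meaning as a quantifier and becomes four literal characters. The class therefore accepts any run of word characters, including "is" and "a".

```perl
use strict;
use warnings;

# Inside a character class: word chars plus literal '{', '3', ',', '}'.
my $class_version = qr/^[\w{3,}]+$/;
# Outside a class, {3,} quantifies \w: three or more word characters.
my $quantified    = qr/^\w{3,}$/;

for my $word (qw(This is a Pattern)) {
    my $old = $word =~ $class_version ? 'match' : 'no match';
    my $new = $word =~ $quantified    ? 'match' : 'no match';
    print "$word: class=$old quantified=$new\n";
}
```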
Break your regular expression up.
You know you want three word characters, so specify:
# Match three word characters.
\w{3}
After that, you don't really care if the word has more characters, but you won't block it either.
# Match 0 or more word characters
\w*
Finally, you want word boundaries to catch the start and end of words. Putting it all together, to match a word with at least three word characters, possibly more, use:
# Word boundaries at start and end
\b\w{3}\w*\b
Note: \w matches letters, digits and underscore; if you need letters only, use:
# Alpha only
\b[A-Za-z]{3}[A-Za-z]*\b
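The sketch below applies both patterns to the example sentence. \w{3}\w* is equivalent to the shorter \w{3,}, and a capture group with /g in list context returns every match.

```perl
use strict;
use warnings;

my $text = 'This is a Pattern';

# Whole words of three or more word characters.
my @words = $text =~ /\b(\w{3,})\b/g;
print "@words\n";   # This Pattern

# Letters-only variant: words containing digits are skipped.
my @alpha = 'abc 12345 ab1' =~ /\b([A-Za-z]{3,})\b/g;
print "@alpha\n";   # abc
```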
Consider the following string
String = "this is for test. i'm new to perl! Please help. can u help? i hope so."
In the above string, after . or ? or !, the next letter should be uppercase. How can I do that?
I'm reading from a text file line by line, and I need to write the modified data to another file.
Your help will be greatly appreciated.
You could use a regular expression.
Try this:
my $s = "...";
$s =~ s/([\.\?!]\s*[a-z])/uc($1)/ge; # of course $1 , thanks to plusplus
The /g flag replaces every match, and the /e flag evaluates the replacement as Perl code, so uc converts the matched text to uppercase.
Explanation:
[\.\?!] matches one of the punctuation marks
\s* matches any whitespace between the mark and the first letter of the next word
[a-z] matches a single lowercase letter (the first letter of the next word)
Together, the regular expression finds every punctuation mark followed by optional whitespace and a lowercase letter, and replaces the match with the result of uc, which converts it to uppercase.
For example:
my $s = "this is for test. i'm new to perl! Please help. can u help? i hope so.";
$s =~ s/([\.\?!]\s*[a-z])/uc($1)/ge;
print $s;
will find ". i", ". c" and "? i" and replace them ("! P" is not matched, because [a-z] only matches lowercase letters), so the printed result is:
this is for test. I'm new to perl! Please help. Can u help? I hope so.
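The question also asks about reading one file line by line and writing another. The sketch below wraps the substitution in a function and copies line by line between two filehandles. It swaps /e and uc for the \u escape, which uppercases the next character of the replacement; the result is the same. The function names and file names are hypothetical. Because each line is processed on its own, a sentence that ends at the end of one line does not capitalize the first word of the next line.

```perl
use strict;
use warnings;

# Uppercase the first lowercase letter after '.', '?' or '!' plus spaces.
sub capitalize_sentences {
    my ($text) = @_;
    $text =~ s/([.?!]\s*)([a-z])/$1\u$2/g;
    return $text;
}

# Hypothetical helper: copy $in to $out, fixing each line as it goes.
sub copy_capitalized {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} capitalize_sentences($line);
    }
}

# Usage with real files (names are placeholders):
#   open my $in,  '<', 'input.txt'  or die "input.txt: $!";
#   open my $out, '>', 'output.txt' or die "output.txt: $!";
#   copy_capitalized($in, $out);
```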
You can use the substitution operator s///:
$string =~ s/([.?!]\s*\S)/ uc($1) /ge;
Here's a split solution:
use feature 'say';
my $str = "this is for test. im new to perl! Please help. can u help? i hope so.";
say join "", map ucfirst, split /([?!.]\s*)/, $str;
If all you are doing is printing to a new file, you don't need to join the string back up. E.g.
while ($line = <$input>) {
print $output map ucfirst, split /([?!.]\s*)/, $line;
}
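The split solution works because a capturing group in split's pattern keeps the separators in the returned list. Joining the pieces therefore restores the original punctuation and spacing. The separators begin with a punctuation mark, so ucfirst leaves them unchanged. Unlike the s/// versions above, this approach also capitalizes the very first word of the string. The short input below is only for illustration.

```perl
use strict;
use warnings;

# The (...) in the pattern keeps each separator as its own list element.
my @pieces = split /([?!.]\s*)/, 'hi. there! ok';
print join('|', @pieces), "\n";                      # hi|. |there|! |ok

# ucfirst each piece, then glue everything back together.
my $result = join '', map { ucfirst($_) } @pieces;
print "$result\n";                                   # Hi. There! Ok
```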
Edit: I completely misread the question; for some reason I thought you were only asking to uppercase the i's. Apologies for any confusion!
As the answers so far state, you could look at regular expressions and the substitution operator (s///). No one has mentioned the \b (word boundary) assertion, though, which may be useful for finding the standalone i's. Otherwise you will have to keep adding the punctuation characters you find to the character class (the [ ... ]).
e.g.
my $x = "this is for test. i'm new to perl! Please help. can u help? i hope so. " .
        "\"i want it to work!\". Dave, Bob, Henry viii and i are friends. foo i bar.";
$x =~ s/\bi\b/I/g; # or could use the capture () and uc($1) in eugene's answer
gives:
# this is for test. I'm new to perl! Please help. can u help? I hope so.
# "I want it to work!". Dave, Bob, Henry viii and I are friends. foo I bar.