How to capitalize first word in every sentence with Swift

Taking the user's locale into account, how can I capitalize the first word of each sentence in a paragraph? What I want to achieve is that, no matter the case inside the sentence, the first letter of each sentence will be uppercase and the rest will be lowercase. I can handle a single sentence by first converting everything to lowercase, then uppercasing the first letter, and finally joining the two parts. My question is different from How to capitalize each word in a string using Swift iOS, since I don't want to capitalize each word; I just want to capitalize the first word of each sentence. capitalizedString turns
"this is first sentence. this is second sentence."
to
"This Is First Sentence. This Is Second Sentence."
What I want is
"This is first sentence. This is second sentence."
My question is also different from Capitalise first letter of every sentence, since #rintaro's code doesn't work on my example below: it leaves the capital letters in the original text intact. With #rintaro's code:
before
"someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ"
after
"SomeSentenceWith UTF text İŞğĞ. AnotherSentenceğüÜğ."
What I want to achieve,
"Somesentencewith utf text işğğ. Anothersentenceğüüğ."
My code below can only do partial conversion.
var description = "someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ"
description = description.lowercaseStringWithLocale(NSLocale.currentLocale())
let first = description.startIndex
let rest = advance(first,1)..<description.endIndex
let capitalised = description[first...first].uppercaseStringWithLocale(NSLocale.currentLocale()) + description[rest]
I would really appreciate it if you could read my question carefully, since this is the third time I have edited it. I am sorry if I couldn't ask it clearly; I am not a native speaker. Even though #rintaro answered a similar question, his answer doesn't solve my problem. #martin-r suggests an Objective-C answer, which again doesn't solve the problem I have. Another user, eric something, also suggested an answer but deleted it afterwards. I just can't understand why several people suggest different answers that don't answer my question.

Try:
let str = "someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ"
var result = ""
str.uppercaseString.enumerateSubstringsInRange(indices(str), options: .BySentences) { (sub, _, _, _) in
result += sub[sub.startIndex ... sub.startIndex]
result += sub[sub.startIndex.successor() ..< sub.endIndex].lowercaseString
}
println(result) // -> "Somesentencewith utf text i̇şğğ. Anothersentenceğüüğ"
ADDED: Swift2
let str = "someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ"
var result = ""
str.uppercaseString.enumerateSubstringsInRange(str.characters.indices, options: .BySentences) { (sub, _, _, _) in
result += String(sub!.characters.prefix(1))
result += String(sub!.characters.dropFirst(1)).lowercaseString
}
print(result)

Updating #rintaro's code for Swift 3:
let str = "someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ"
var result = ""
str.uppercased().enumerateSubstrings(in: str.startIndex..<str.endIndex, options: .bySentences) { (sub, _, _, _) in
result += String(sub!.characters.prefix(1))
result += String(sub!.characters.dropFirst(1)).lowercased()
}
print(result)

You can use Regular Expressions to achieve this. I'm adding this function as a String extension so it will be trivial to call in the future:
extension String {
func toUppercaseAtSentenceBoundary() -> String {
var string = self.lowercaseString
var capacity = string.utf16Count
var mutable = NSMutableString(capacity: capacity)
mutable.appendString(string)
var error: NSError?
if let regex = NSRegularExpression(
pattern: "(?:^|\\b\\.[ ]*)(\\p{Ll})",
options: NSRegularExpressionOptions.AnchorsMatchLines,
error: &error
) {
if let results = regex.matchesInString(
string,
options: NSMatchingOptions.allZeros,
range: NSMakeRange(0, capacity)
) as? [NSTextCheckingResult] {
for result in results {
let numRanges = result.numberOfRanges
if numRanges >= 1 {
for i in 1..<numRanges {
let range = result.rangeAtIndex(i)
let substring = mutable.substringWithRange(range)
mutable.replaceCharactersInRange(range, withString: substring.uppercaseString)
}
}
}
}
}
return mutable
}
}
var string = "someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ.".toUppercaseAtSentenceBoundary()

I wrote this extension in Swift 3 according to #Katy's code
extension String {
func toUppercaseAtSentenceBoundary() -> String {
var result = ""
self.uppercased().enumerateSubstrings(in: self.startIndex..<self.endIndex, options: .bySentences) { (sub, _, _, _) in
result += String(sub!.characters.prefix(1))
result += String(sub!.characters.dropFirst(1)).lowercased()
}
return result as String
}
}
How to use:
let string = "This is First sentence. This is second Sentence.".toUppercaseAtSentenceBoundary()
print(string) /* Output: "This is first sentence. This is second sentence." */
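The question asks for locale-aware casing, which the answers above do not apply. Below is a minimal sketch for Swift 5 that keeps the same technique (uppercase first so that sentence boundaries are detected, then lowercase everything after the first character) but passes a locale to the case conversions. The name sentenceCased(locale:) is hypothetical, and the exact sentence splitting depends on Foundation's .bySentences behaviour on your platform.

```swift
import Foundation

extension String {
    /// Hypothetical helper: upper-cases the first character of each sentence
    /// and lower-cases the rest, using the given locale for case mapping.
    func sentenceCased(locale: Locale? = .current) -> String {
        var result = ""
        // Uppercase first, as the answers above do, so that sentences that
        // start with a lowercase letter are still detected as sentences.
        let upper = uppercased(with: locale)
        upper.enumerateSubstrings(in: upper.startIndex..<upper.endIndex,
                                  options: .bySentences) { sub, _, _, _ in
            guard let sentence = sub else { return }
            // The first character is already uppercase; keep it as is.
            result += String(sentence.prefix(1))
            // Lowercase the remainder with the same locale.
            result += sentence.dropFirst().lowercased(with: locale)
        }
        return result
    }
}

let english = Locale(identifier: "en_US")
print("this is First sentence. this is second Sentence.".sentenceCased(locale: english))
```

With a Turkish locale such as Locale(identifier: "tr_TR"), the dotted and dotless i mappings should be applied by the locale-aware conversions, which is the case the question's sample text exercises.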

Related

Swift 5.1 - is there a clean way to deal with locations of substrings/ pattern matches

I'm very, very new to Swift and admittedly struggling with some of its constructs. I have to work with a text file and do many manipulations - here's an example to illustrate the point:
let's say I have a text like this (multi line)
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
x----------------x
I want to be able to do simple things like find the location of #name, then split it to get the name and so on. I've done this in javascript and it was pretty simple with the use of substr and the regex matches.
In swift, which is supposed to be swift and easy and what not, I'm finding this exceedingly confusing.
Can someone help with how one might do
Find the location of the start of a substring
Extract all text from the end of a substring to the end of the text
Sorry if this is trivial - but the Apple documentation feels very complicated, and lots of examples are years old. I can't also seem to find easy application of regex.
You can use the string's range(of:) method to find the range of your substring, take its upperBound, and search for the end of the line from that position:
Playground testing:
let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""
if let start = sentence.range(of: "#name:")?.upperBound,
let end = sentence[start...].range(of: "\n")?.lowerBound {
let substring = sentence[start..<end]
print("name:", substring)
}
If you need to get the string from there to the end of the string you can use PartialRangeFrom:
if let start = sentence.range(of: "#summary:")?.upperBound {
let substring = sentence[start...]
print("summary:", substring)
}
If you find yourself using that a lot you can extend StringProtocol and create your own method:
extension StringProtocol {
func substring<S:StringProtocol,T:StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> SubSequence? {
guard
let lower = range(of: start, options: options)?.upperBound,
let upper = self[lower...].range(of: end, options: options)?.lowerBound
else { return nil }
return self[lower..<upper]
}
func substring<S:StringProtocol>(after string: S, options: String.CompareOptions = []) -> SubSequence? {
guard
let lower = range(of: string, options: options)?.upperBound else { return nil }
return self[lower...]
}
}
Usage:
let name = sentence.substring(between: "#name:", and: "\n") // " a name"
let summary = sentence.substring(after: "#summary:") // " a paragraph of text\n{{something}}\na whole bunch of multi-line text"
You can use regular expressions as well:
let name = sentence.substring(between: "#\\w+:", and: "\\n", options: .regularExpression) // " a name"
You can do this with range() and distance():
let str = "Example string"
let range = str.range(of: "amp")!
print(str.distance(from: str.startIndex, to: range.lowerBound)) // 2
let lastStr = str[range.upperBound...]
print(lastStr) // "le string"
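For the tagged lines in the question (#name:, #summary:), another option is to walk the text line by line and split each tagged line at its first colon. This is only a sketch under the assumption that every tag sits on its own line in the form "#key: value"; the function name taggedFields(in:) is made up for illustration.

```swift
import Foundation

/// Hypothetical helper: collects "#key: value" lines into a dictionary.
func taggedFields(in text: String) -> [String: String] {
    var fields: [String: String] = [:]
    // Only look at lines that start with "#".
    for line in text.split(separator: "\n") where line.hasPrefix("#") {
        // Lines without a colon are skipped.
        guard let colon = line.firstIndex(of: ":") else { continue }
        // The key is everything between "#" and the first ":".
        let key = String(line[line.index(after: line.startIndex)..<colon])
        // The value is the rest of the line, without surrounding spaces.
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        fields[key] = value
    }
    return fields
}

let sample = "Mary had a little lamb\n#name: a name\n#summary: a paragraph of text\n{{something}}"
let parsed = taggedFields(in: sample)
print(parsed["name"] ?? "missing")
print(parsed["summary"] ?? "missing")
```

A dictionary keeps later lookups simple, at the cost of dropping the position of each tag; if positions matter, the range(of:) approach above is the better fit.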

Regular expressions in swift

I'm a bit confused by NSRegularExpression in Swift. Can anyone help me?
task:1 given ("name","john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets.
task:2 given ("name"," john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets and extra spaces and finally get array of strings.
task:3 given key = value // comment
then I should get ["key","value","comment"]. Here I should get only strings in the line by avoiding = and //
I have tried the code below for task 1, but it doesn't work.
let string = "(name,john,string for user name)"
let pattern = "(?:\\w.*)"
do {
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
for match in matches {
if let range = Range(match.range, in: string) {
let name = string[range]
print(name)
}
}
} catch {
print("Regex was bad!")
}
Thanks in advance.
RegEx in Swift
These posts might help you to explore regular expressions in swift:
Does a string match a pattern?
Swift extract regex matches
How can I use String slicing subscripts in Swift 4?
How to use regex with Swift?
Swift 3 - How do I extract captured groups in regular expressions?
How to group search regular expressions using swift?
Task 1 & 2
This expression might help you to match your desired outputs for both Task 1 and 2:
"(\s+)?([a-z\s]+?)(\s+)?"
Based on Rob's advice, you could much reduce the boundaries, such as the char list [a-z\s]. For example, here, we can also use:
"(\s+)?(.*?)(\s+)?"
or
"(\s+)?(.+?)(\s+)?"
to simply pass everything in between two " and/or space.
RegEx
If this wasn't your desired expression, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im:
JavaScript Demo
const regex = /"(\s+)?([a-z\s]+?)(\s+)?"/gm;
const str = `"name","john","name of john"
"name"," john","name of john"
" name "," john","name of john "
" name "," john"," name of john "`;
const subst = `\n$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Task 3
This expression might help you to design an expression for the third task:
(.*?)([a-z\s]+)(.*?)
const regex = /(.*?)([a-z\s]+)(.*?)/gm;
const str = `key = value // comment
key = value with some text // comment`;
const subst = `$2,`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Split the string on non-alphanumeric characters other than whitespace, then trim whitespace from each element.
extension String {
func words() -> [String] {
return self.components(separatedBy: CharacterSet.alphanumerics.inverted.subtracting(.whitespaces))
.filter({ !$0.isEmpty })
.map({ $0.trimmingCharacters(in: .whitespaces) })
}
}
let string1 = "(name,john,string for user name)"
let string2 = "(name, john,name of john)"
let string3 = "key = value // comment"
print(string1.words())//["name", "john", "string for user name"]
print(string2.words())//["name", "john", "name of john"]
print(string3.words())//["key", "value", "comment"]
Here is what I ended up with after working through all of the comments above.
let text = """
Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups later on in the tutorial
"""
extension String {
func rex (_ expr : String)->[String] {
return try! NSRegularExpression(pattern: expr, options: [.caseInsensitive])
.matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count)) // NSRange counts UTF-16 code units, not Characters
.map {
String(self[Range($0.range, in: self)!])
}
}
}
let r = text.rex("(?:\\w+-\\w+)") // pass any rex
A single pattern, works for test:1...3, in Swift.
let string =
//"(name,john,string for user name)" //test:1
//#"("name"," john","name of john")"# //test:2
"key = value // comment" //test:3
let pattern = #"(?:\w+)(?:\s+\w+)*"# //Swift 5+ only
//let pattern = "(?:\\w+)(?:\\s+\\w+)*"
do {
let regex = try NSRegularExpression(pattern: pattern)
let matches = regex.matches(in: string, range: NSRange(0..<string.utf16.count))
let matchingWords = matches.map {
String(string[Range($0.range, in: string)!])
}
print(matchingWords) //(test:3)->["key", "value", "comment"]
} catch {
print("Regex was bad!")
}
Let’s consider:
let string = "(name,José,name is José)"
I’d suggest a regex that looks for strings where:
It’s the substring either after the ( at the start of the full string or after a comma, i.e., look behind assertion of (?<=^\(|,);
It’s the substring that does not contain , within it, i.e., [^,]+?;
It’s the substring that is terminated by either a comma or ) at the end of the full string, i.e., look ahead assertion of (?=,|\)$), and
If you want to have it skip white space before and after the substrings, throw in the \s*+, too.
Thus:
let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
let regex = try! NSRegularExpression(pattern: pattern)
regex.enumerateMatches(in: string, range: NSRange(string.startIndex..., in: string)) { match, _, _ in
if let nsRange = match?.range(at: 1), let range = Range(nsRange, in: string) {
let substring = String(string[range])
// do something with `substring` here
}
}
Note, I’m using the Swift 5 extended string delimiters (starting with #" and ending with "#) so that I don’t have to escape my backslashes within the string. If you’re using Swift 4 or earlier, you’ll want to escape those back slashes:
let pattern = "(?<=^\\(|,)\\s*+([^,]+?)\\s*+(?=,|\\)$)"
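To collect the substrings into an array rather than handling them one by one, the look-behind/look-ahead pattern above can be wrapped in a small function. This is a sketch for Swift 5; the function name parenthesizedFields(in:) is hypothetical, and it returns an empty array if the pattern fails to compile.

```swift
import Foundation

/// Hypothetical helper: returns the comma-separated fields of a string such as
/// "(a, b,c)", trimmed of surrounding whitespace, using the pattern shown above.
func parenthesizedFields(in string: String) -> [String] {
    let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    // Build the NSRange from String indices so non-ASCII text is handled correctly.
    let whole = NSRange(string.startIndex..., in: string)
    return regex.matches(in: string, range: whole).compactMap { match in
        // Capture group 1 holds the field without the surrounding whitespace.
        Range(match.range(at: 1), in: string).map { String(string[$0]) }
    }
}

print(parenthesizedFields(in: "(name,José,name is José)"))
```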

Remove the first six characters from a String (Swift)

What's the best way to go about removing the first six characters of a string? Through Stack Overflow, I've found a couple of ways that were supposed to be solutions but I noticed an error with them. For instance,
extension String {
func removing(charactersOf string: String) -> String {
let characterSet = CharacterSet(charactersIn: string)
let components = self.components(separatedBy: characterSet)
return components.joined(separator: "")
}
}
If I type in a website like https://www.example.com, and store it as a variable named website, then type in the following
website.removing(charactersOf: "https://")
it removes the https:// portion but it also removes all h's, all t's, :'s, etc. from the text.
How can I just delete the first characters?
In Swift 4 it is really simple: just use dropFirst(_:), which takes the number of characters to drop
let myString = "Hello World"
myString.dropFirst(6)
//World
In your case: website.dropFirst(6)
Why not :
let stripped = String(website.characters.dropFirst(6))
Seems more concise and straightforward to me.
(it won't work with multi-char emojis either mind you)
[EDIT] Swift 4 made this even shorter:
let stripped = String(website.dropFirst(6))
length is the number of characters you want to remove (6 in your case)
extension String {
func toLengthOf(length:Int) -> String {
if length <= 0 {
return self
} else if let to = self.index(self.startIndex, offsetBy: length, limitedBy: self.endIndex) {
return self.substring(from: to)
} else {
return ""
}
}
}
It will remove first 6 characters from a string
var str = "Hello-World"
let range1 = str.characters.index(str.startIndex, offsetBy: 6)..<str.endIndex
str = str[range1]
print("the end time is : \(str)")
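Since the example in the question is really about stripping a known prefix ("https://", which is eight characters rather than six), it may be safer to drop the prefix only when the string actually starts with it. A minimal sketch for Swift 4 or later; removingPrefix(_:) is a hypothetical helper, not a standard library method.

```swift
extension String {
    /// Hypothetical helper: returns the string without `prefix` if it starts
    /// with it, otherwise returns the string unchanged.
    func removingPrefix(_ prefix: String) -> String {
        guard hasPrefix(prefix) else { return self }
        // Drop exactly as many characters as the prefix contains.
        return String(dropFirst(prefix.count))
    }
}

let website = "https://www.example.com"
print(website.removingPrefix("https://"))
```

Unlike a fixed dropFirst(6), this leaves strings that lack the prefix untouched and does not depend on counting the prefix by hand.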

Get numbers characters from a string [duplicate]

How do I get the numeric characters from a string? I don't want to convert to Int.
var string = "string_1"
var string2 = "string_20_certified"
My result has to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
var westernArabicNumeralsOnly: String {
let pattern = UnicodeScalar("0")..."9"
return String(unicodeScalars
.flatMap { pattern ~= $0 ? Character($0) : nil })
}
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The Unicode scalar pattern matching approach above is particularly useful when extended to match any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
var easternArabicNumeralsOnly: String {
let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
"\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant
return String(unicodeScalars
.flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift consists of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances in Swift do not have a fixed size in memory, so random access to a character within a collection of sequentially (contiguously) stored characters is not available in O(1), but rather in O(n).
Unicode scalars in Swift, on the other hand, are stored in fixed-size UTF-32 code units, which should allow O(1) random access. Now, I'm not entirely sure whether this is a fact, or the reason for what follows, but when benchmarking the methods above against an equivalent method using the CharacterView (.characters property) for some test String instances, it's very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor of 10-25 difference in execution times, steadily growing with String size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
There are drawbacks to the UnicodeScalar approach, however, namely when working with characters that cannot be represented by a single Unicode scalar, but where one of their Unicode scalars is contained in the pattern we want to match.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented here by two Unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of the two Unicode scalars to pass.
extension String {
var onlyLowercaseLettersAthroughE: String {
let patterns = [UnicodeScalar("1")..."e"]
return String(unicodeScalars
.flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // Cae
/* possibly we'd want "Ca" or "Caé"
as result here */
In the particular use case queried by the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching rather than UnicodeScalar.
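For comparison, here is a sketch of the Character-based variant for Swift 4 or later. It reuses the same bounds as the code above ("1" through "e") but filters whole Characters, so the decomposed "é" is treated as a single unit and is not split into its scalars. The variable names are only illustrative.

```swift
let cafe = "Cafe\u{301}"
// A range over Characters rather than Unicode scalars.
let bounds: ClosedRange<Character> = "1"..."e"
// Each grapheme cluster is tested as a whole; "e" + U+0301 is one Character.
let filtered = cafe.filter { bounds.contains($0) }
print(filtered)
```

Here the accented character falls outside the range as a whole, so nothing of it leaks through, at the cost of the slower Character-level iteration discussed above.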
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or borrowing from #dfri's idea to make it a String extension:
extension String {
var numbers: String {
return filter { "0"..."9" ~= $0 }
}
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
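If the separate runs of digits are needed rather than one concatenated string, the same Foundation-free idea works with split(whereSeparator:). This is a sketch for Swift 4 or later; digitRuns(in:) is a hypothetical name.

```swift
/// Hypothetical helper: returns each maximal run of ASCII digits as its own string.
func digitRuns(in string: String) -> [String] {
    let digits: ClosedRange<Character> = "0"..."9"
    // Every non-digit character acts as a separator; empty pieces are omitted by default.
    return string.split(whereSeparator: { !digits.contains($0) }).map { String($0) }
}

print(digitRuns(in: "a_1_b_2_3_c34"))
print(digitRuns(in: "string_20_certified"))
```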
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
NSCharacterSet
.decimalDigitCharacterSet()
.invertedSet)
.joinWithSeparator("") // Return a string with all the numbers
This method iterates through the string's characters and appends the digits to a new string:
// Declared with `class func`, so it is meant to live inside a class.
class func getNumberFrom(string: String) -> String {
    var number = ""
    for c in string.characters {
        // Int(String(c)) succeeds only for a single ASCII digit; accept 0 through 9 inclusive.
        if let n = Int(String(c)), n >= 0 && n <= 9 {
            number.append(c)
        }
    }
    return number
}
For example with regular expression
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count)) {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}
If there are multiple occurrences of the pattern use matches(in..
let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count))
for match in matches {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}

How to use NSStringEnumerationOptions.ByWords with punctuation

I'm using this code to find the NSRange and text content of the string contents of a NSTextField.
nstext.enumerateSubstringsInRange(NSMakeRange(0, nstext.length),
options: NSStringEnumerationOptions.ByWords, usingBlock: {
(substring, substringRange, _, _) -> () in
//Do something with substring and substringRange
})
The problem is that NSStringEnumerationOptions.ByWords ignores punctuation, so that
Stop clubbing, baby seals
becomes
"Stop" "clubbing" "baby" "seals"
not
"Stop" "clubbing," "baby" "seals"
If all else fails I could just check the characters before or after a given word and see if they are on the exempted list (where would I find which characters .ByWords exempts?); but there must be a more elegant solution.
How can I find the NSRanges of a set of words, from a string which includes the punctuation as part of the word?
You can use componentsSeparatedByString instead
var arr = nstext.componentsSeparatedByString(" ")
Output:
"Stop" "clubbing," "baby" "seals"
Inspired by Richa's answer, I used componentsSeparatedByString(" "). I had to add a bit of code to make it work for me, since I wanted the NSRanges from the output. I also wanted it to still work if there were two instances of the same word - e.g. 'please please stop clubbing, baby seals'.
Here's what I did:
var words: [String] = []
var ranges: [NSRange] = []
//nstext is a String I converted to a NSString
words = nstext.componentsSeparatedByString(" ")
//apologies for the poor naming
var nstextLessWordsWeHaveRangesFor = nstext
for word in words
{
let range:NSRange = nstextLessWordsWeHaveRangesFor.rangeOfString(word)
ranges.append(range)
//create a string the same length as word so that the 'ranges' don't change in the future (if I just replace it with "" then the future ranges will be wrong after removing the substring)
var fillerString:String = ""
for var i=0;i<word.characters.count;++i{
fillerString = fillerString.stringByAppendingString(" ")
}
nstextLessWordsWeHaveRangesFor = nstextLessWordsWeHaveRangesFor.stringByReplacingCharactersInRange(range, withString: fillerString)
}
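In current Swift the same result (whitespace-separated words with their punctuation, plus an NSRange for each) can be obtained in a single pass, which also handles repeated words without the filler-string workaround. This is a sketch assuming Swift 5 for Character.isWhitespace; wordsWithRanges(in:) is a hypothetical name.

```swift
import Foundation

/// Hypothetical helper: returns each whitespace-separated token, punctuation
/// included, together with its NSRange in the original string.
func wordsWithRanges(in text: String) -> [(word: String, range: NSRange)] {
    var output: [(word: String, range: NSRange)] = []
    var index = text.startIndex
    while index < text.endIndex {
        // Skip any whitespace before the next token.
        guard let start = text[index...].firstIndex(where: { !$0.isWhitespace }) else { break }
        // The token runs until the next whitespace character or the end of the text.
        let end = text[start...].firstIndex(where: { $0.isWhitespace }) ?? text.endIndex
        // NSRange(_:in:) converts String indices to UTF-16 offsets.
        output.append((word: String(text[start..<end]), range: NSRange(start..<end, in: text)))
        index = end
    }
    return output
}

for entry in wordsWithRanges(in: "Stop clubbing, baby seals") {
    print(entry.word, entry.range.location, entry.range.length)
}
```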