Remove substring from a string knowing first and last characters in Swift

Having a string like this:
let str = "In 1273, however, they lost their son in an accident;[2] the young Theobald was dropped by his nurse over the castle battlements.[3]"
I'm looking for a way to remove every pair of square brackets along with anything between them.
I tried String's replacingOccurrences(of:with:) method, but it requires the exact substring to be removed, so it doesn't work for me.

You can use:
let updated = str.replacingOccurrences(of: "\\[[^\\]]+\\]", with: "", options: .regularExpression)
The regular expression (without the extra escapes required in a Swift string) is:
\[[^\]]+\]
The \[ and \] match the literal characters [ and ]. The backslash removes the special meaning those characters normally have in a regular expression.
The [^\]] matches any character except ]. The + means match one or more.
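As a small sketch of the pattern in use, the replacement can be wrapped in a helper. The name removingBracketed is made up for this example and is not part of the answer above. Because of the +, an empty pair such as "[]" is left untouched; switching to * would remove those as well.

```swift
import Foundation

// Hypothetical helper (name chosen for illustration): strips every "[...]" segment.
func removingBracketed(_ text: String) -> String {
    // Regex \[[^\]]+\] : a "[", one or more characters that are not "]", then a "]".
    return text.replacingOccurrences(of: "\\[[^\\]]+\\]",
                                     with: "",
                                     options: .regularExpression)
}

let sample = "lost their son;[2] over the battlements.[3]"
print(removingBracketed(sample)) // lost their son; over the battlements.
print(removingBracketed("x[]y")) // x[]y (the + needs at least one character inside)
```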

You can use a while loop: take the lowerBound of the range of the opening delimiter and the upperBound of the range of the closing delimiter, and build a range from them. Then remove that subrange from your string and continue the search from where the removed text began.
var str = "In 1273, however, they lost their son in an accident;[2] the young Theobald was dropped by his nurse over the castle battlements.[3]"
var start = str.startIndex
while let from = str.range(of: "[", range: start..<str.endIndex)?.lowerBound,
let to = str.range(of: "]", range: from..<str.endIndex)?.upperBound,
from != to {
str.removeSubrange(from..<to)
start = from
}
print(str) // "In 1273, however, they lost their son in an accident; the young Theobald was dropped by his nurse over the castle battlements."

Related

How do you find all first letter capitalized words in a string? Like a name ("My Name and another His Name")?

GOAL: I want to have a regex that searches a string for all capitalized first letters of words/names in a string. Then replace those capitalized words with "Whatever", if the word is 4 or more characters long. Then replace words 4 or more letters that are lowercased with "whatever".
String Input:
let myString = "This is a regular string. How does this Work? Does any Name know How?"
Desired String Output:
let myString = "Whatever is a whatever whatever. How whatever this Whatever? Whatever any Whatever whatever whatever How?"
What I've tried:
let myString = "This is a regular string. How does this Work? Does any Name know How?"
let x = myString.replacingOccurrences(of: "\\b\\p{Lu}{4,}\\b",
with: "Whatever",
options: .regularExpression)
let finalX = x.replacingOccurrences(of: "\\b\\p{L}{4,}\\b",
with: "whatever",
options: .regularExpression)
print(finalX)
Problem:
The first check is for capitalized letters, the second is for lowercased letters, but it still returns all lowercased.
Would anyone know how to go about this with what I have?
The regular expression you want is the following:
\b[A-Z][a-z]{3,}\b
That will only work with the basic letters A-Z and a-z. If you want to fully support any alphabet then you would need something like this:
\b\p{Lu}\p{Ll}{3,}\b
The \b means "word boundary". The \p{Lu} means "uppercase letters". The \p{Ll} means "lowercase letters".
You only want to use this in the call to replacingOccurrences. There's little reason to do the initial check using contains. That will be looking for the literal text you pass in and of course the regular expression you are using won't be found as literal text in the string.
let myString = "This is a regular string. How does this Work? Does any Name know How?"
let x = myString.replacingOccurrences(of: "\\b\\p{Lu}\\p{Ll}{3,}\\b",
with: "Whatever",
options: .regularExpression)
print(x)
Output:
Whatever is a regular string. How does this Whatever? Whatever any Whatever know How?
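To see why the pattern from the question never matched capitalized words, it may help to run both patterns against the same input. This is only an illustrative sketch; the sample sentence is made up.

```swift
import Foundation

let text = "NASA sent This Name"

// \p{Lu}{4,} needs four or more uppercase letters in a row, so only "NASA" matches.
let allCaps = text.replacingOccurrences(of: "\\b\\p{Lu}{4,}\\b",
                                        with: "Whatever",
                                        options: .regularExpression)

// \p{Lu}\p{Ll}{3,} needs one uppercase letter followed by three or more lowercase letters.
let capitalized = text.replacingOccurrences(of: "\\b\\p{Lu}\\p{Ll}{3,}\\b",
                                            with: "Whatever",
                                            options: .regularExpression)

print(allCaps)     // Whatever sent This Name
print(capitalized) // NASA sent Whatever Whatever
```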
To do both you also need "\\b\\p{Ll}{4,}\\b".
The following changes both sets.
let myString = "This is a regular string. How does this Work? Does any Name know How?"
let x = myString
.replacingOccurrences(of: "\\b\\p{Ll}{4,}\\b",
with: "whatever",
options: .regularExpression)
.replacingOccurrences(of: "\\b\\p{Lu}\\p{Ll}{3,}\\b",
with: "Whatever",
options: .regularExpression)
print(x)
Output:
Whatever is a whatever whatever. How whatever whatever Whatever? Whatever any Whatever whatever How?

Trimming Substrings from String Swift/SwiftUI

Let's say I have the following strings:
"Chest Stretch (left)"
"Chest Stretch (right)"
How can I use SwiftUI to output only:
"Chest Stretch"
I thought this may be a possible duplicate of swift - substring from string.
However, I am seeking a way to do this inside var body: some View within an if conditional expression.
One possible way is a regular expression:
let string = "Chest Stretch (left)"
let trimmedString = string.replacingOccurrences(of: "\\s\\([^)]+\\)", with: "", options: .regularExpression)
The found pattern will be replaced with an empty string.
The pattern is:
One whitespace character \\s
An opening parenthesis \\(
One or more characters which are not a closing parenthesis [^)]+
and a closing parenthesis \\)
Or, more simply, if the delimiter character is always the opening parenthesis:
let trimmedString = String(string.prefix(while: {$0 != "("}).dropLast())
Or
let trimmedString = string.components(separatedBy: " (").first!
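Note that the prefix(while:) plus dropLast() variant drops the last character even when the string contains no parenthesis at all. A minimal sketch of a more defensive version is below; the helper name baseName(of:) is an assumption made for this example.

```swift
// Hypothetical helper: returns the text before the first "(", without trailing spaces.
func baseName(of title: String) -> String {
    var head = title.prefix(while: { $0 != "(" })
    // Trim trailing spaces by hand so that no Foundation import is needed.
    while head.last == " " {
        head.removeLast()
    }
    return String(head)
}

print(baseName(of: "Chest Stretch (left)")) // Chest Stretch
print(baseName(of: "Plank"))                // Plank
```

Since it is a plain function on String, it could presumably be called from inside var body: some View, for example as Text(baseName(of: name)), where name is whatever string the view already has.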

Count leading tabs in Swift string

I need to count the number of leading tabs in a Swift string. I know there are fairly simple solutions (e.g. looping over the string until a non-tab character is encountered) but I am looking for a more elegant solution.
I have attempted to use a regex such as ^\\t* along with the .numberOfMatches method but this detects all the tab characters as one match. For example, if the string has three leading tabs then that method just returns 1. Is there a way to write a regex that treats each individual tab character as a single match?
Also open to other ways of approaching this without using a regex.
Here is a non-regex solution
let count = someString.prefix(while: {$0 == "\t"}).count
You may use
\G\t
Here,
\G - matches a string start position or end of the previous match position, and
\t - matches a single tab.
Swift test:
let string = "\t\t123"
let regex = try! NSRegularExpression(pattern: "\\G\t", options: [])
let numberOfOccurrences = regex.numberOfMatches(in: string, range: NSRange(string.startIndex..., in: string))
print(numberOfOccurrences) // => 2
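Both approaches can be wrapped in small functions and compared on a string that also contains a tab further along. The function names here are made up for illustration. The regex is written as "\\G\\t", which passes the regex escape \t to the engine instead of a literal tab character; either form matches a tab.

```swift
import Foundation

// Hypothetical helper names; both count only the tabs at the very start of the string.
func leadingTabsByPrefix(_ s: String) -> Int {
    return s.prefix(while: { $0 == "\t" }).count
}

func leadingTabsByRegex(_ s: String) -> Int {
    // \G anchors each match to the start of the string or the end of the previous match.
    let regex = try! NSRegularExpression(pattern: "\\G\\t", options: [])
    return regex.numberOfMatches(in: s, options: [], range: NSRange(s.startIndex..., in: s))
}

let line = "\t\t\tname\tvalue"
print(leadingTabsByPrefix(line)) // 3
print(leadingTabsByRegex(line))  // 3
```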

Split a string on all characters except some with a regular expression

I have to split a long string with lyrics to a song into lines and then, for each line, split them into words. I'm going to hold this information in a 2 dimensional array.
I've seen some similar questions, and they were solved using NSRegularExpression (https://developer.apple.com/documentation/foundation/nsregularexpression),
but I can't seem to find any regular expression that equals "everything except something" which is what I want to split on when splitting a string into words.
More specifically, I want to split on everything except alphanumerics, ' or -. In Java this regular expression is [^\\w'-]+
Below is the string, followed by my Swift code attempting this task (I just split on whitespace instead of splitting into words with "[^\w'-]+", as I can't figure out how to do it).
1 Is this the real life?
2 Is this just fantasy?
3 Caught in a landslide,
4 No escape from reality.
5
6 Open your eyes,
7 Look up to the skies and see,
8 I'm just a poor boy, I need no sympathy,
9 Because I'm easy come, easy go,
10 Little high, little low,
11 Any way the wind blows doesn't really matter to me, to me.
12
13 Mama, just killed a man,
(etc.)
let lines = s?.components(separatedBy: "\n")
var all_words = [[String]]()
for i in 0..<lines!.count {
let words = lines![i].components(separatedBy: " ")
let new_words = words.filter {$0 != ""}
all_words.append(new_words)
}
I suggest using the reverse pattern, [\w'-]+, to match the strings you need, together with a matches(for:in:) helper function that returns every match of a pattern in a string.
Your code will look like:
for i in 0..<lines!.count {
let new_words = matches(for: "[\\w'-]+", in: lines![i])
all_words.append(new_words)
}
The following line of code:
print(matches(for: "[\\w'-]+", in: "11 Any way the wind blows doesn't really matter to me, to me."))
yields ["11", "Any", "way", "the", "wind", "blows", "doesn\'t", "really", "matter", "to", "me", "to", "me"].
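The matches(for:in:) helper is not defined in this answer. One possible implementation, offered as an assumption about what such a helper might look like and not as the original author's code, is:

```swift
import Foundation

// A hypothetical implementation of the matches(for:in:) helper used above.
func matches(for pattern: String, in text: String) -> [String] {
    // An invalid pattern simply yields no matches in this sketch.
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        // Convert each NSRange back to a Swift range before slicing the string.
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let lyrics = "1 Is this the real life?\n2 Is this just fantasy?"
let allWords = lyrics
    .components(separatedBy: "\n")
    .map { matches(for: "[\\w'-]+", in: $0) }
print(allWords)
```

Here allWords is a two-dimensional array with one inner array of words per line, which is the shape the question asks for.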
One simple solution is to replace the sequences with a special character first and then split on that character:
let words = string
.replacingOccurrences(of: "[^\\w'-]+", with: "|", options: .regularExpression)
.split(separator: "|")
print(words)
However, if you can, use the system function to enumerate words.

How to remove spaces from a string in Swift?

I have the need to remove leading and trailing spaces around a punctuation character.
For example: Hello , World ... I 'm a newbie iOS Developer.
And I'd like to have: Hello, World... I'm a newbie iOS Developer.
How can I do this? I tried to get the components of the string and enumerate it by sentences, but that is not what I need.
Rob's answer is great, but you can trim it down quite a lot by taking advantage of the \p{Po} regular expression class. Getting rid of the spaces around punctuation then becomes a single regular expression replace:
import Foundation
let input = "Hello , World ... I 'm a newbie iOS Developer."
let result = input.replacingOccurrences(of: "\\s*(\\p{Po}\\s?)\\s*",
with: "$1",
options: [.regularExpression])
print(result) // "Hello, World... I'm a newbie iOS Developer."
Rob's answer also tries to trim leading/trailing spaces, but your input doesn't have any of those. If you do care about that you can just call result.trimmingCharacters(in: .whitespacesAndNewlines) on the result.
Here's an explanation for the regular expression. Removing the double-escapes it looks like
\s*(\p{Po}\s?)\s*
This consists of the following components:
\s* - Match zero or more whitespace characters (and throw them away)
(…) - Capturing group. Anything inside this group is preserved by the replacement (the $1 in the replacement refers to this group).
\p{Po} - Match a single character in the "Other_Punctuation" unicode category. This includes things like ., ', and …, but excludes things like ( or -.
\s? - Match a single optional whitespace character. This preserves the space after periods (or ellipses).
\s* - Once again, match zero or more whitespace characters (and throw them away). This removes any extra whitespace after the punctuation, so a comma followed by several spaces ends up followed by just one.
For Swift 3 or 4 you can use:
let trimmedString = string.trimmingCharacters(in: .whitespaces)
This is a really wonderful problem and a shame that it isn't easier to do in Swift today (someday it will be, but not today).
I kind of hate this code, but I'm getting on a plane for 20 hours, and don't have time to make it nicer. This may at least get you started using NSMutableString. It'd be nice to work in String, and Swift hates regular expressions, so this is kind of hideous, but at least it's a start.
import Foundation
let input = "Hello, World ... I 'm a newbie iOS Developer."
let adjustments = [
(pattern: "\\s*(\\.\\.\\.|\\.|,)\\s*", replacement: "$1 "), // ellipsis or period or comma has trailing space
(pattern: "\\s*'\\s*", replacement: "'"), // apostrophe has no extra space
(pattern: "^\\s+|\\s+$", replacement: ""), // remove leading or trailing space
]
let mutableString = NSMutableString(string: input)
for (pattern, replacement) in adjustments {
let re = try! NSRegularExpression(pattern: pattern)
re.replaceMatches(in: mutableString,
options: [],
range: NSRange(location: 0, length: mutableString.length),
withTemplate: replacement)
}
mutableString // "Hello, World... I'm a newbie iOS Developer."
Regular expressions can be very confusing when you first encounter them. A few hints at reading these:
The specific language Foundation uses is described by ICU.
Backslash (\) means "the next character is special" for a regex. But inside a Swift string, backslash also means "the next character is special" for the string. So you have to double them all.
\s means "a whitespace character"
\s* means "zero or more whitespace characters"
\s+ means "one or more whitespace characters"
$1 means "the thing we matched in parentheses"
| means "or"
^ means "start of string"
$ means "end of string"
. means "any character" so to mean "an actual dot" you have to type "\\." in a Swift string.
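As a side note on the doubled backslashes: assuming Swift 5 or later, a raw string literal (#"..."#) lets the pattern be written with single backslashes. The short sketch below, using made-up variable names, shows that both spellings produce the same pattern.

```swift
import Foundation

let escaped = "\\s*'\\s*"   // regular literal: every regex backslash is doubled
let raw = #"\s*'\s*"#       // raw literal: backslashes are written once

print(escaped == raw) // true

// Either spelling can be passed to a regular expression API.
let fixed = "I 'm here".replacingOccurrences(of: raw, with: "'", options: .regularExpression)
print(fixed) // I'm here
```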
Notice that I check for both "..." and "." in the same regular expression. You kind of have to do something like that, or else the "." will match three times inside the "...". Another approach would be to first replace "..." with "…" (the single ellipsis character, typed on a Mac by pressing Opt-;). Then "…" is a one-character punctuation. (You could also decide to re-expand all ellipses back to dot-dot-dot at the end of the process.)
Something like this is probably how I'd do it in real life, get it done and shipped, but it may be worth the pain/practice to try to build this as a character-by-character state machine, walking one character at a time, and keeping track of your current state.
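For the character-by-character idea, a very small single-pass sketch might look like the following. It is not a full state machine: the only state it tracks is how many spaces are waiting to be written. The function name and the default set of punctuation marks are assumptions made for this example, and it only removes spaces before punctuation; it does not insert missing spaces after it.

```swift
// Hypothetical helper: drops any spaces that come directly before a punctuation mark.
func tightenPunctuation(_ input: String, marks: Set<Character> = [".", ",", "'"]) -> String {
    var output = ""
    var pendingSpaces = 0   // state: spaces seen but not yet written
    for ch in input {
        if ch == " " {
            pendingSpaces += 1
        } else {
            // Spaces before punctuation are discarded; before anything else they are kept.
            if !marks.contains(ch) {
                output += String(repeating: " ", count: pendingSpaces)
            }
            pendingSpaces = 0
            output.append(ch)
        }
    }
    // Spaces still pending at the end of the input are dropped.
    return output
}

print(tightenPunctuation("Hello , World ... I 'm a newbie iOS Developer."))
// Hello, World... I'm a newbie iOS Developer.
```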
You can try something like
string.replacingOccurrences(of: " ,", with: ",") for every punctuation...
Interesting problem; here's my stab at a non-Regex approach:
import Foundation
func correct(input: String) -> String {
typealias Correction = (punctuation: String, replacement: String)
let corrections: [Correction] = [
(punctuation: "...", replacement: "... "),
(punctuation: "'", replacement: "'"),
(punctuation: ",", replacement: ", "),
]
var transformed = input
for correction in corrections {
transformed = transformed
.components(separatedBy: correction.punctuation)
.map({ $0.trimmingCharacters(in: .whitespaces) })
.joined(separator: correction.replacement)
}
return transformed
}
let testInput = "Hello , World ... I 'm a newbie iOS Developer."
let testOutput = correct(input: testInput)
// Hello, World... I'm a newbie iOS Developer.
If you were doing this manually by processing characters arrays, you would merely need to check the previous and next characters around spaces. You can achieve the same result using functional style programming with zip, filter and map:
let testInput = "Hello , World ... I 'm a newbie iOS Developer."
let punctuation = Set(".\',")
let previousNext = zip( [" "] + testInput, String(testInput.dropFirst()) + [" "] )
let filteredChars = zip(Array(previousNext),testInput)
.filter{ $1 != " "
|| !($0.0 != " " && punctuation.contains($0.1))
}
let filteredInput = String(filteredChars.map{$1})
print(testInput) // Hello , World ... I 'm a newbie iOS Developer.
print(filteredInput) // Hello, World... I'm a newbie iOS Developer.
Swift 4, 4.2 and 5
let str = " Akbar Code "
let trimmedString = str.trimmingCharacters(in: .whitespaces)