get substrings from string - swift

I have the following string from a server:
I agree with the <a>((http://example.com)) Terms of Use</a> and I've read the <a>((http://example2.com)) Privacy</a>
Now I want to show it like this in a label:
I agree with the Terms of Use and I've read the Privacy
I tried to cut off the ((http://example.com)) from the string and save it in another String. I need the link because the text should be clickable later.
I tried this to get the text that I want:
//the link:
let firstString = "(("
let secondString = "))"
let link = (text.range(of: firstString)?.upperBound).flatMap { substringFrom in
    (text.range(of: secondString, range: substringFrom..<text.endIndex)?.lowerBound).map { substringTo in
        String(text[substringFrom..<substringTo])
    }
}
//the new text
if let link = link {
    newString = text.replacingOccurrences(of: link, with: kEmptyString)
}
I got this from here: Swift Get string between 2 strings in a string
The problem with this is that it only removes the text inside the (( )) brackets; the brackets are still there. I tried to play with the offsets of the indexes, but this didn't change anything. Moreover, this solution only works if there's one link in the text. If there are multiple links, I think they should be stored and I have to loop through the text, but I don't know how to do this. I tried many things but I can't get this working. Is there maybe an easier way to do what I want?

You can use a regular expression to do a quick search and replace.
let text = "I agree with the <a>((http://example.com)) Terms of Use</a> and I've read the <a>((http://example2.com)) Privacy</a>"
let resultStr = text.replacingOccurrences(of: "<a>\\(\\(([^)]*)\\)\\) ", with: "<a href=\"$1\">", options: .regularExpression, range: nil)
print(resultStr)
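Output:
I agree with the <a href="http://example.com">Terms of Use</a> and I've read the <a href="http://example2.com">Privacy</a>
When rendered as HTML, this displays as: I agree with the Terms of Use and I've read the Privacy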
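Since the question also needs the URLs afterwards (to make the text clickable) and may contain several links, here is a minimal sketch, assuming every link has exactly the <a>((url)) text</a> shape shown above. It collects each URL together with its visible text from all matches, and builds the plain label text by substituting only the visible text. The names links and labelText are illustrative.

```swift
import Foundation

// Server string from the question.
let text = "I agree with the <a>((http://example.com)) Terms of Use</a> and I've read the <a>((http://example2.com)) Privacy</a>"

// Group 1 = URL inside (( )), group 2 = visible link text before </a>.
let pattern = "<a>\\(\\(([^)]*)\\)\\)\\s*([^<]*)</a>"
let regex = try! NSRegularExpression(pattern: pattern)
// NSRange counts UTF-16 code units.
let fullRange = NSRange(location: 0, length: text.utf16.count)

// Collect every (url, title) pair, however many links the string contains.
var links: [(url: String, title: String)] = []
for match in regex.matches(in: text, range: fullRange) {
    if let urlRange = Range(match.range(at: 1), in: text),
       let titleRange = Range(match.range(at: 2), in: text) {
        links.append((url: String(text[urlRange]), title: String(text[titleRange])))
    }
}

// Plain label text: replace each whole <a>...</a> with just its visible title.
let labelText = regex.stringByReplacingMatches(in: text, range: fullRange, withTemplate: "$2")

print(labelText)  // I agree with the Terms of Use and I've read the Privacy
for link in links {
    print(link.title, "->", link.url)
}
```

Keeping the pairs separate from the label text means each URL can later be attached to its title, for example when building an attributed string for the label.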

You can use something like this to get the links:
let s = "I agree with the ((http://example.com)) Terms of Use and I've read the ((http://example2.com)) Privacy"
let firstDiv = s.split(separator: "(") // ["I agree with the ", "http://example.com)) Terms of Use and I\'ve read the ", "http://example2.com)) Privacy"]
let mid = firstDiv[1] // http://example.com)) Terms of Use and I've read the
let link1 = mid.split(separator: ")")[0] // http://example.com
let link2 = firstDiv[2].split(separator: ")")[0] // http://example2.com
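The snippet above hard-codes firstDiv[1] and firstDiv[2], so it handles exactly two links. A minimal sketch of the same splitting idea for any number of links, assuming each link is written as ((url)) followed by one space, as in s above:

```swift
import Foundation

let s = "I agree with the ((http://example.com)) Terms of Use and I've read the ((http://example2.com)) Privacy"

var links: [String] = []
var plainText = ""
// Every piece after the first one starts with a URL terminated by "))".
for (index, piece) in s.components(separatedBy: "((").enumerated() {
    if index == 0 {
        plainText += piece
        continue
    }
    let parts = piece.components(separatedBy: "))")
    links.append(parts[0])
    // Keep the text after "))", dropping the single space that follows the link.
    let rest = parts.dropFirst().joined(separator: "))")
    plainText += rest.hasPrefix(" ") ? String(rest.dropFirst()) : rest
}

print(links)      // ["http://example.com", "http://example2.com"]
print(plainText)  // I agree with the Terms of Use and I've read the Privacy
```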

Related

Match and extract href info using regex

I am trying to make a regex in Swift that matches and extracts href link information in more than one case: with double quotes, single quotes, and no quotes around the URL.
A regex to match href and extract info <a href=https://www.google.com>Google</a>.
<a href="https://www.google.com">Google</a>
<a href='https://www.google.com'>Google</a>
I have found this regex, but it only works with double quotes:
<a href="([^"]+)">([^<]+)<\/a>
Result:
Match 1: <a href="https://www.google.com">Google</a>
Group 1: https://www.google.com
Group 2: Google
What I want is to detect all of the three ways that I provided with the sample text.
Note: I know that regex shouldn't be used for parsing HTML, but I am using it for a very small use case so it's fine.
Assuming there are no other attributes in the anchor tags in the file you wish to parse, you can use the following pattern with the replacement $2 $3 (flags g and m): <a href=('|"|)([^'">]+)\1>([^<]+)<\/a>
It first captures a single quote, a double quote, or nothing, and then \1 refers back to that capturing group, so the closing quote must match the opening one. You can watch it live on regex101.
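In Swift, the same pattern works with NSRegularExpression: the forward slash needs no escaping, while the double quote and the backslash of \1 must be escaped in the string literal. A minimal sketch (extractLink is a hypothetical helper name) that returns the URL and link text from groups 2 and 3 for all three sample formats:

```swift
import Foundation

// Returns (url, text) for the first anchor in html, or nil if nothing matches.
func extractLink(from html: String) -> (url: String, text: String)? {
    // The answer's pattern: group 1 = quote (or nothing), group 2 = URL,
    // group 3 = link text; \1 forces the closing quote to equal the opening one.
    let pattern = "<a href=('|\"|)([^'\">]+)\\1>([^<]+)</a>"
    let regex = try! NSRegularExpression(pattern: pattern)
    let fullRange = NSRange(location: 0, length: html.utf16.count)
    guard let match = regex.firstMatch(in: html, range: fullRange),
          let urlRange = Range(match.range(at: 2), in: html),
          let textRange = Range(match.range(at: 3), in: html) else { return nil }
    return (url: String(html[urlRange]), text: String(html[textRange]))
}

let samples = [
    "<a href=https://www.google.com>Google</a>",
    "<a href=\"https://www.google.com\">Google</a>",
    "<a href='https://www.google.com'>Google</a>",
]
let results = samples.compactMap { extractLink(from: $0) }
for result in results {
    print(result.url, result.text)  // https://www.google.com Google
}
```

Because of \1, a tag with mismatched quotes such as href="...' is rejected rather than half-matched.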
The answer is already in the comments, but I'm posting this since the approach is a bit different.
In Swift 5.7+ and iOS 16+ you can use RegexBuilder for this.
import RegexBuilder
let link1 = "A regex to match href and extract info <a href=https://www.google.com>Google</a>."
let link2 = "<a href=\"https://www.google.com\">Google</a>"
let link3 = "<a href='https://www.google.com'>Google</a>"
let regex = Regex {
    Capture {
        "https://www."
        ZeroOrMore(.word)
        "."
        ZeroOrMore(.word)
    }
}
if let result1 = try? regex.firstMatch(in: link1) {
    print("link: \(result1.output.1)")
}
if let result2 = try? regex.firstMatch(in: link2) {
    print("link: \(result2.output.1)")
}
if let result3 = try? regex.firstMatch(in: link3) {
    print("link: \(result3.output.1)")
}
This works well for the three strings above, but depending on your scenario you might need to change the implementation; for example, this pattern only matches URLs of the form https://www.<word>.<word>.
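The RegexBuilder pattern above captures only the URL. A sketch of a variant that follows the anchor tag itself, accepting a double quote, a single quote or no quote, and capturing the link text as well. This is an adaptation, not the answerer's code, and unlike the \1 pattern it does not check that the opening and closing quotes are the same.

```swift
import RegexBuilder

let anchor = Regex {
    "<a href="
    Optionally {
        ChoiceOf {
            "\""
            "'"
        }
    }
    Capture {
        OneOrMore(CharacterClass.anyOf("'\">").inverted)  // the URL
    }
    Optionally {
        ChoiceOf {
            "\""
            "'"
        }
    }
    ">"
    Capture {
        OneOrMore(CharacterClass.anyOf("<").inverted)     // the link text
    }
    "</a>"
}

let samples = [
    "A regex to match href and extract info <a href=https://www.google.com>Google</a>.",
    "<a href=\"https://www.google.com\">Google</a>",
    "<a href='https://www.google.com'>Google</a>",
]

// Keep (url, text) for every sample that contains an anchor.
let results = samples.compactMap { sample -> (url: String, text: String)? in
    guard let match = sample.firstMatch(of: anchor) else { return nil }
    return (url: String(match.output.1), text: String(match.output.2))
}
for result in results {
    print("link: \(result.url), text: \(result.text)")
}
```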

Swift: How to identify and delete prepositions in a string

I am trying to identify keywords in user input to search for, so I thought of filtering out some parts of speech in order to extract keywords to query my database.
Currently I use the code below to remove the word "of" from a string:
let rawString = "I’m jealous of my parents. I’ll never have a kid as cool as theirs, one who is smart, has devilishly good looks, and knows all sorts of funny phrases."
var filtered = rawString.replacingOccurrences(of: "of", with: "")
What I want to do now is extend it to remove all prepositions from a string.
What I was thinking of doing is creating a huge list of known prepositions like:
let prepositions = ["in","through","after","under","beneath","before"......]
and then splitting the string by whitespace with
var wordList: [String] = filtered.components(separatedBy: " ")
and then looping through the word list to find prepositions and delete them. Creating the list will be ugly and might not be efficient.
What is the best way to detect and delete prepositions from a string?
Use NaturalLanguage:
import NaturalLanguage
let text = "The ripe taste of cheese improves with age."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
var newSentence = [String]()
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    guard let tag = tag, tag != .preposition else { return true }
    newSentence.append("\(text[tokenRange])")
    return true
}
print("Input: \(text)")
print("Output: \(newSentence.joined(separator: " "))")
This prints:
Input: The ripe taste of cheese improves with age.
Output: The ripe taste cheese improves age
Notice that the two prepositions "of" and "with" are removed. This approach also removes the punctuation; to keep it, drop .omitPunctuation from the options.
Alternatively, keep the list of prepositions from the question and split, filter and join:
let newString = rawString
    .split(separator: " ")
    .filter { !prepositions.contains(String($0)) }
    .joined(separator: " ")
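With a list, storing it in a Set makes each lookup constant time on average, which addresses the efficiency concern from the question; normalizing case and stripping attached punctuation also catches "Of" or "of,". A minimal sketch, assuming words are separated by single spaces and using a hypothetical, deliberately short list:

```swift
import Foundation

// Removes every space-separated word found in prepositions, ignoring case
// and any punctuation attached to the word ("Of" and "of," both count).
func removingPrepositions(from text: String, prepositions: Set<String>) -> String {
    text.split(separator: " ")
        .filter { word in
            let bare = String(word).trimmingCharacters(in: .punctuationCharacters).lowercased()
            return !prepositions.contains(bare)
        }
        .joined(separator: " ")
}

// Hypothetical short list; a real one would be much longer.
let prepositions: Set<String> = ["in", "of", "with", "through", "after", "under", "beneath", "before"]
let sentence = "The ripe taste of cheese improves with age."
print(removingPrepositions(from: sentence, prepositions: prepositions))
// The ripe taste cheese improves age.
```

Unlike NLTagger, this cannot tell from context whether a word is actually used as a preposition.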

Swift 5.1 - is there a clean way to deal with locations of substrings/ pattern matches

I'm very, very new to Swift and admittedly struggling with some of its constructs. I have to work with a text file and do many manipulations - here's an example to illustrate the point:
Let's say I have a multi-line text like this:
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
x----------------x
I want to be able to do simple things like find the location of #name, then split it to get the name and so on. I've done this in JavaScript and it was pretty simple with substr and regex matches.
In Swift, which is supposed to be swift and easy and whatnot, I'm finding this exceedingly confusing.
Can someone help with how one might:
Find the location of the start of a substring
Extract all text from the end of a substring to the end of the text
Sorry if this is trivial, but the Apple documentation feels very complicated, and lots of examples are years old. I also can't seem to find an easy way to use regexes.
You can use the String method range(of:) to find the range of your substring, take its upperBound, and search for the end of the line from that position:
Playground testing:
let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""
if let start = sentence.range(of: "#name:")?.upperBound,
   let end = sentence[start...].range(of: "\n")?.lowerBound {
    let substring = sentence[start..<end]
    print("name:", substring)
}
If you need the text from there to the end of the string, you can use a PartialRangeFrom:
if let start = sentence.range(of: "#summary:")?.upperBound {
    let substring = sentence[start...]
    print("summary:", substring)
}
If you find yourself using that a lot you can extend StringProtocol and create your own method:
extension StringProtocol {
    func substring<S: StringProtocol, T: StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> SubSequence? {
        guard
            let lower = range(of: start, options: options)?.upperBound,
            let upper = self[lower...].range(of: end, options: options)?.lowerBound
        else { return nil }
        return self[lower..<upper]
    }
    func substring<S: StringProtocol>(after string: S, options: String.CompareOptions = []) -> SubSequence? {
        guard let lower = range(of: string, options: options)?.upperBound else { return nil }
        return self[lower...]
    }
}
Usage:
let name = sentence.substring(between: "#name:", and: "\n") // " a name"
let summary = sentence.substring(after: "#summary:") // " a paragraph of text\n{{something}}\na whole bunch of multi-line text"
You can use regular expressions as well:
let name = sentence.substring(between: "#\\w+:", and: "\\n", options: .regularExpression) // " a name"
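If the text contains several #key: lines, a regular expression with the .anchorsMatchLines option can collect all of them into a dictionary at once. A minimal sketch, assuming each key is a single word and its value is the rest of that line (so, unlike substring(after:), the summary stops at the end of its line):

```swift
import Foundation

let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""

// One match per "#key: value" line: group 1 = key, group 2 = rest of that line.
// .anchorsMatchLines lets ^ and $ match at the start and end of every line.
let regex = try! NSRegularExpression(pattern: "^#(\\w+):[ \\t]*(.*)$", options: .anchorsMatchLines)
let fullRange = NSRange(location: 0, length: sentence.utf16.count)

var fields: [String: String] = [:]
for match in regex.matches(in: sentence, range: fullRange) {
    if let keyRange = Range(match.range(at: 1), in: sentence),
       let valueRange = Range(match.range(at: 2), in: sentence) {
        fields[String(sentence[keyRange])] = String(sentence[valueRange])
    }
}
print(fields["name"] ?? "missing")     // a name
print(fields["summary"] ?? "missing")  // a paragraph of text
```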
You can do this with range(of:) and distance(from:to:):
let str = "Example string"
let range = str.range(of: "amp")!
print(str.distance(from: str.startIndex, to: range.lowerBound)) // 2
let lastStr = str[range.upperBound...]
print(lastStr) // "le string"

How to replace a substring with a link(http) in swift 3?

I have a string that contains a substring starting with http. I want to replace that substring, but I don't know where it ends: it should run until the next space, and that whole part should be replaced.
Here is my example:
let string = "Hello.World everything is good http://www.google.com By the way its good"
The string can be dynamic. In the string above http is present, so I want to replace "http://www.google.com" with "website".
So it would be
string = "Hello.World everything is good website By the way its good"
A possible solution is a regular expression.
The pattern searches for http:// or https:// followed by one or more non-whitespace characters up to a word boundary.
let string = "Hello.World everything is good http://www.google.com By the way its good"
let trimmedString = string.replacingOccurrences(of: "https?://\\S+\\b", with: "website", options: .regularExpression)
print(trimmedString)
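Because the pattern ends at a word boundary, punctuation directly after a URL is left in place. A short sketch wrapping the same call in a hypothetical helper, replacingLinks(in:with:), shows this with a second input:

```swift
import Foundation

// Hypothetical helper around the answer's pattern.
func replacingLinks(in text: String, with replacement: String) -> String {
    text.replacingOccurrences(of: "https?://\\S+\\b", with: replacement, options: .regularExpression)
}

let original = "Hello.World everything is good http://www.google.com By the way its good"
print(replacingLinks(in: original, with: "website"))
// Hello.World everything is good website By the way its good

// \b stops the match before trailing "," or ".", so they survive.
let withPunctuation = "See http://www.google.com, or https://example.org/docs."
print(replacingLinks(in: withPunctuation, with: "website"))
// See website, or website.
```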
Splitting into words, replacing, and joining back should also solve this:
// split into array
let arr = string.components(separatedBy: " ")
// do checking and join
let newStr = arr.map { word in
    return word.hasPrefix("http") ? "website" : word
}.joined(separator: " ")
print(newStr)

How can we remove every characters other than numbers, dot and colon in swift?

I am stuck getting a string from an HTML body:
<html><head>
<title>Uaeexchange Mobile Application</title></head><body>
<div id='ourMessage'>
49.40:51.41:50.41
</div></body></html>
I would like to get the string 49.40:51.41:50.41. I don't want to do it by advancing string indexes. Can I get this string in Swift by specifying that I need only digits, dots (.) and colons (:), i.e. numbers plus some special characters?
I tried
let stringArray = response.componentsSeparatedByCharactersInSet(
NSCharacterSet.decimalDigitCharacterSet().invertedSet)
let newString = stringArray.joinWithSeparator("")
print("Trimmed\(newString)and count\(newString.characters.count)")
but this obviously removes the dots and colons too. Any suggestions?
The simple answer to your question is that you need to include "." and ":" in the set of characters you want to keep.
let response: String = "<html><head><title>Uaeexchange Mobile Application</title></head><body><div id='ourMessage'>49.40:51.41:50.41</div></body></html>"
var s: CharacterSet = CharacterSet.decimalDigits
s.insert(charactersIn: ".:")
let stringArray: [String] = response.components(separatedBy: s.inverted)
let newString: String = stringArray.joined(separator: "")
print("Trimmed '\(newString)' and count=\(newString.count)")
// "Trimmed '49.40:51.41:50.41' and count=17\n"
Without more information on what else your response might contain, I can't really give a better answer, but fundamentally this is not a good solution. What if the response had been:
<html><head><title>Uaeexchange Mobile Application</title></head><body>
<div id='2'>Some other stuff: like this</div>
<div id='ourMessage'>49.40:51.41:50.41</div>
</body></html>
Using a replace/remove solution to this is a hack, not an algorithm - it will work until it doesn't.
I think you should probably be looking for the <div id='ourMessage'> and reading from there to the next <, but again, we'd need more information on the specification of the format of the response.
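A minimal sketch of that marker-based approach, assuming the div is written exactly as <div id='ourMessage'> (same quotes, no extra attributes); message(in:after:) is a hypothetical helper name. It reads from the end of the marker to the next < and trims surrounding whitespace, so the other div in the example above does not interfere:

```swift
import Foundation

// The second example response from above, with an extra div before the one we want.
let response = "<html><head><title>Uaeexchange Mobile Application</title></head><body><div id='2'>Some other stuff: like this</div><div id='ourMessage'>49.40:51.41:50.41</div></body></html>"

// Returns the text between the marker and the next "<", trimmed, or nil.
func message(in html: String, after marker: String = "<div id='ourMessage'>") -> String? {
    guard let start = html.range(of: marker)?.upperBound,
          let end = html[start...].firstIndex(of: "<") else { return nil }
    // Trim in case the value sits on its own line, as in the question's HTML.
    return String(html[start..<end]).trimmingCharacters(in: .whitespacesAndNewlines)
}

print(message(in: response) ?? "not found")  // 49.40:51.41:50.41
```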
I'd recommend using an HTML parser; nevertheless, here is a simple solution with a regular expression:
let extractedString = response.replacingOccurrences(of: "[^\\d:.]+", with: "", options: .regularExpression)
Or a positive regex search, which is more code but also more reliable:
let pattern = ">\\s?([\\d:.]+)\\s?<"
let regex = try! NSRegularExpression(pattern: pattern)
// NSRange counts UTF-16 code units, so measure the string with utf16.count.
let fullRange = NSRange(location: 0, length: response.utf16.count)
// Range(_:in:) converts the NSRange of group 1 back to String indices.
if let match = regex.firstMatch(in: response, range: fullRange),
   let range = Range(match.range(at: 1), in: response) {
    let extractedString = String(response[range])
    print(extractedString)
}
While the simple (negative) regex search removes all characters other than digits, dots and colons, the positive search also requires the closing > and opening < of the surrounding tags, so a stray digit, dot or colon elsewhere in the response doesn't match the pattern.
You can also use String.replacingOccurrences(of:with:) without a regex, as follows:
import Foundation
var response: String = "<html><head><title>Uaeexchange Mobile Application</title></head><body><div id='ourMessage'>49.40:51.41:50.41</div></body></html>"
let charsNotToBeTrimmed = (0...9).map { String($0) } + [".", ":"] // you can add any character you want here, that's the advantage
// The loop iterates over a copy of response, so reassigning response inside it is safe.
for character in response {
    if !charsNotToBeTrimmed.contains(String(character)) {
        response = response.replacingOccurrences(of: String(character), with: "")
    }
}
print(response)
Basically, this creates an array of characters that should not be trimmed, and any character not in that array gets removed in the for loop. Note that each replacingOccurrences call scans the whole string again.
But be warned that what you're trying to do isn't quite right...
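If you do keep the whitelist approach, it can be done in a single pass instead of one replacingOccurrences call per unwanted character. A sketch using String's filter with a Set<Character> (a different technique from the loop above, with the same caveat about stray digits elsewhere in the HTML):

```swift
let response = "<html><head><title>Uaeexchange Mobile Application</title></head><body><div id='ourMessage'>49.40:51.41:50.41</div></body></html>"

// Characters to keep; add more here if needed.
let allowed: Set<Character> = Set("0123456789.:")

// String.filter returns a String (Swift 4+), visiting each character once.
let extracted = response.filter { allowed.contains($0) }
print(extracted)  // 49.40:51.41:50.41
```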