Get body text from message URL in Swift

This question follows on from the question:
Drag messages from Mail onto Dock using Swift
I can now receive the drop when dragging a message from Mail onto the Dock. All I get is the message title and the message URL, as follows:
message:%3C2004768713.4671#mail.stackoverflow.com%3E
How do I get the body text from this URL?
Thanks
Andrew

You can use a simple regular expression for this.
The .* pattern is greedy and will match as much as possible of the URL's subject:message pair before the (hopefully) last # character in the string.
I have also included a small helper function (rangeFromNSRange()) to convert an Objective-C NSRange struct into a proper Range<String.Index> Swift struct.
Paste the following in a Swift playground to see it working (and have fun!):
import Foundation

// Converts an NSRange (UTF-16 offsets) into a Range<String.Index>,
// or returns nil if the range does not map onto `string`.
func rangeFromNSRange(_ range: NSRange, in string: String) -> Range<String.Index>? {
    return Range(range, in: string)
}

func parseEmailURL(urlAsString url: String) -> (subject: String, message: String)? {
    let regexFormat = "^(.*):(.*)#"
    guard let regex = try? NSRegularExpression(pattern: regexFormat, options: .caseInsensitive) else { return nil }
    // NSRegularExpression counts UTF-16 units, so build the search range from the string itself.
    let fullRange = NSRange(url.startIndex..., in: url)
    guard let match = regex.firstMatch(in: url, options: [], range: fullRange),
          let subjectRange = rangeFromNSRange(match.range(at: 1), in: url),
          let messageRange = rangeFromNSRange(match.range(at: 2), in: url)
    else { return nil }
    return (String(url[subjectRange]), String(url[messageRange]))
}

parseEmailURL(urlAsString: "message:2004768713.4671#mail.stackoverflow.com")
// (subject: "message", message: "2004768713.4671")
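The URL in the question is percent-encoded (%3C and %3E are the angle brackets around the message ID), so it may help to decode it before parsing. The following is only a sketch: messageID(fromMessageURL:) is a hypothetical helper name, and it assumes the string always starts with the message: scheme. Note that it recovers the message ID only; the URL itself does not carry the body text.

```swift
import Foundation

// Hypothetical helper: extracts the bare message ID from a Mail "message:" URL string.
func messageID(fromMessageURL urlString: String) -> String? {
    let scheme = "message:"
    guard urlString.hasPrefix(scheme),
          let decoded = String(urlString.dropFirst(scheme.count)).removingPercentEncoding
    else { return nil }
    // The decoded ID is wrapped in angle brackets; drop them if present.
    return decoded.trimmingCharacters(in: CharacterSet(charactersIn: "<>"))
}

let id = messageID(fromMessageURL: "message:%3C2004768713.4671#mail.stackoverflow.com%3E")
print(id ?? "not a message URL") // 2004768713.4671#mail.stackoverflow.com
```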

Related

Split String or Substring with Regex pattern in Swift

First, let me point out that I want to split a String or Substring on any character that is not a letter, a number, # or #. That is, I want to split on whitespace (spaces and line breaks) and on special characters or symbols, excluding # and #.
In Android Java, I am able to achieve this with:
String[] textArr = text.split("[^\\w_##]");
Now, I want to do the same in Swift. I added an extension to String and Substring classes
extension String {}
extension Substring {}
In both extensions, I added a method that returns an array of Substring
func splitWithRegex(by regexStr: String) -> [Substring] {
//let string = self (for String extension) | String(self) (for Substring extension)
let regex = try! NSRegularExpression(pattern: regexStr)
let range = NSRange(string.startIndex..., in: string)
return regex.matches(in: string, options: .anchored, range: range)
.map { match -> Substring in
let range = Range(match.range(at: 1), in: string)!
return string[range]
}
}
When I tried to use it (only tested with a Substring, but I think String would give me the same result):
let textArray = substring.splitWithRegex(by: "[^\\w_##]")
print("substring: \(substring)")
print("textArray: \(textArray)")
This is the output:
substring: This,is a #random #text written for debugging
textArray: []
Can someone please help me? I don't know if the problem is in my regex [^\\w_##] or in the splitWithRegex method.
The main reason the code doesn't work is range(at: 1), which returns the content of the first capture group, but the pattern does not capture anything. (The .anchored option also restricts matching to the very start of the search range.)
With just range, the regex returns the ranges of the matches themselves, i.e. the separators, but I suppose you want the characters between them.
To accomplish that you need a moving index that starts at the first character. In the map closure, return the string from the current index to the lowerBound of the found range and set the index to its upperBound. Finally, you have to manually append the string from the upperBound of the last match to the end.
The Substring type is a helper type for slicing strings. It should not be used beyond a temporary scope.
extension String {
    func splitWithRegex(by regexStr: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regexStr) else { return [] }
        let range = NSRange(startIndex..., in: self)
        var index = startIndex
        var array = regex.matches(in: self, range: range)
            .map { match -> String in
                let range = Range(match.range, in: self)!
                let result = self[index..<range.lowerBound]
                index = range.upperBound
                return String(result)
            }
        array.append(String(self[index...]))
        return array
    }
}
let text = "This,is a #random #text written for debugging"
let textArray = text.splitWithRegex(by: "[^\\w_##]")
print(textArray) // ["This", "is", "a", "#random", "#text", "written", "for", "debugging"]
However, in macOS 13 and iOS 16 there is a new API quite similar to the Java API:
let text = "This,is a #random #text written for debugging"
let textArray = Array(text.split(separator: /[^\w_##]/))
print(textArray)
The forward slashes indicate a regex literal.
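If a regex is not strictly required, the same split can be sketched with split(whereSeparator:) from the standard library, which works on older OS versions too. This is a different technique from the answer above, not a drop-in equivalent: isWordCharacter is a hypothetical helper, isLetter and isNumber only approximate what \w matches, and empty pieces between adjacent separators are dropped.

```swift
let text = "This,is a #random #text written for debugging"

// Keep letters, digits, "_" and "#"; every other character acts as a separator.
func isWordCharacter(_ c: Character) -> Bool {
    return c.isLetter || c.isNumber || c == "_" || c == "#"
}

let words = text.split(whereSeparator: { !isWordCharacter($0) })
print(words) // ["This", "is", "a", "#random", "#text", "written", "for", "debugging"]
```

The result is an array of Substring; map each element through String(_:) if the values need to outlive the original text.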

Swift 5.1 - is there a clean way to deal with locations of substrings/ pattern matches

I'm very, very new to Swift and admittedly struggling with some of its constructs. I have to work with a text file and do many manipulations - here's an example to illustrate the point:
let's say I have a text like this (multi line)
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
x----------------x
I want to be able to do simple things like find the location of #name, then split it to get the name and so on. I've done this in javascript and it was pretty simple with the use of substr and the regex matches.
In swift, which is supposed to be swift and easy and what not, I'm finding this exceedingly confusing.
Can someone help with how one might:
Find the location of the start of a substring
Extract all text from the end of a substring to the end of the text
Sorry if this is trivial, but the Apple documentation feels very complicated, and lots of examples are years old. I also can't seem to find an easy way to apply regex.
You can use the string method range(of:) to find the range of your substring, take its upperBound, and search for the end of the line from that position in the string:
Playground testing:
let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""
if let start = sentence.range(of: "#name:")?.upperBound,
   let end = sentence[start...].range(of: "\n")?.lowerBound {
    let substring = sentence[start..<end]
    print("name:", substring)
}
If you need to get the string from there to the end of the string you can use PartialRangeFrom:
if let start = sentence.range(of: "#summary:")?.upperBound {
    let substring = sentence[start...]
    print("summary:", substring)
}
If you find yourself using that a lot you can extend StringProtocol and create your own method:
extension StringProtocol {
    func substring<S: StringProtocol, T: StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> SubSequence? {
        guard
            let lower = range(of: start, options: options)?.upperBound,
            let upper = self[lower...].range(of: end, options: options)?.lowerBound
        else { return nil }
        return self[lower..<upper]
    }
    func substring<S: StringProtocol>(after string: S, options: String.CompareOptions = []) -> SubSequence? {
        guard
            let lower = range(of: string, options: options)?.upperBound else { return nil }
        return self[lower...]
    }
}
Usage:
let name = sentence.substring(between: "#name:", and: "\n") // " a name"
let summary = sentence.substring(after: "#summary:") // " a paragraph of text\n\n{{something}}\n\na whole bunch of multi-line text"
You can use regular expressions as well:
let name = sentence.substring(between: "#\\w+:", and: "\\n", options: .regularExpression) // " a name"
You can do this with range() and distance():
let str = "Example string"
let range = str.range(of: "amp")!
print(str.distance(from: str.startIndex, to: range.lowerBound)) // 2
let lastStr = str[range.upperBound...]
print(lastStr) // "le string"
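For text laid out like the sample in the question, another option is to walk the lines and split each "#key: value" line at its first colon. This is a rough sketch rather than a general parser: the fields dictionary is a name introduced here, and it assumes every field sits on a single line that starts with #.

```swift
import Foundation

let text = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""

// Collect "#key: value" lines into a dictionary keyed by the field name.
var fields: [String: String] = [:]
for line in text.split(separator: "\n") where line.hasPrefix("#") {
    // Split only at the first colon so values may themselves contain colons.
    let parts = line.dropFirst().split(separator: ":", maxSplits: 1)
    guard parts.count == 2 else { continue }
    fields[String(parts[0])] = parts[1].trimmingCharacters(in: .whitespaces)
}

print(fields["name"] ?? "")    // a name
print(fields["summary"] ?? "") // a paragraph of text
```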

Range<String.Index> Versus String.Index

I am having trouble understanding the difference between Range<String.Index> and String.Index
For example:
func returnHtmlContent() -> String {
let filePath = URL(string:"xxx.htm")
filePath?.startAccessingSecurityScopedResource();
let htmlFile = Bundle.main.path(forResource: "getData", ofType: "htm");
let html = try! String(contentsOfFile: htmlFile!, encoding: String.Encoding.utf8);
filePath?.stopAccessingSecurityScopedResource();
return html;
};
func refactorHtml(content: String) -> String {
let StartingString = "<div id=\"1\">";
let EndingString = "</div></form>";
func selectString() -> String {
var htmlContent = returnHtmlContent();
let charLocationStart = htmlContent.range(of: StartingString);
let charLocationEnd = htmlContent.range(of: EndingString);
htmlContent.remove(at: charLocationStart);
return htmlContent;
};
let formattedBody = selectString();
return formattedBody;
};
refactorHtml(content: returnHtmlContent());
The idea in pseudocode
Generated HTMLBody
Pass To Function that formats
Remove all characters before StartingString
Remove all Characters After EndingString
Send NewString to Variable
Now, when I try to find the index position, I can't seem to get the right value type. This is the error I am getting:
Cannot convert value of type 'Range<String.Index>?' to expected argument type 'String.Index'
This is running in a playground
String indices aren't integers. They're opaque objects (of type String.Index) which can be used to subscript into a String to obtain a character.
Ranges aren't limited to only Range<Int>. If you look at the declaration of Range, you can see it's generic over any Bound, so long as the Bound is Comparable (which String.Index is).
So a Range<String.Index> is just that. It's a range of string indices, and just like any other range, it has a lowerBound, and an upperBound.
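Applied to the question's code: remove(at:) expects a single String.Index, whereas range(of:) returns an optional Range<String.Index>, so the range has to be unwrapped and then either subscripted or passed to removeSubrange(_:). Below is a minimal sketch of the "keep everything from the start marker through the end marker" idea; trimmed(_:from:through:) is a hypothetical helper and the HTML string is made-up sample data.

```swift
import Foundation

// Hypothetical helper: keeps only the text from `start` through `end`, both included.
func trimmed(_ content: String, from start: String, through end: String) -> String? {
    guard let startRange = content.range(of: start),
          let endRange = content.range(of: end, range: startRange.upperBound..<content.endIndex)
    else { return nil }
    // The bounds of each range are String.Index values, so they can form a new range.
    return String(content[startRange.lowerBound..<endRange.upperBound])
}

let html = "<html><body><form><div id=\"1\">Hello</div></form></body></html>"
let body = trimmed(html, from: "<div id=\"1\">", through: "</div></form>")
print(body ?? "markers not found") // <div id="1">Hello</div></form>

// To delete a range in place, use removeSubrange(_:) rather than remove(at:).
var copy = html
if let range = copy.range(of: "<html>") {
    copy.removeSubrange(range)
}
```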

Trying to parse HTML in Swift 4 using only the Standard Library

I'm trying to parse some HTML to pull all links that come after any occurrences of the string:
market_listing_row_link" href="
to gather a list of item URL's using only the Swift 4 Standard Library.
What I think I need is a loop that keeps checking characters until the full string is found, then reads the following item URL into an array until a double quote is reached, and repeats this process until the end of the file. I'm slightly familiar with C, where we had access to a function (I think it was fgetc) that did this while advancing a position indicator for the file. Is there any similar way to do this in Swift?
My code so far can only find the first occurrence of the string I'm looking for when there are 10 I need to find.
import Foundation
extension String {
func slice(from: String, to: String) -> String? {
return (range(of: from)?.upperBound).flatMap { substringFrom in
(range(of: to, range: substringFrom..<endIndex)?.lowerBound).map { substringTo in
String(self[substringFrom..<substringTo])
}
}
}
}
let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try String(contentsOf: itemListURL, encoding: .utf8)
let itemURL = URL(string: itemListHTML.slice(from: "market_listing_row_link\" href=\"", to: "\"")!)!
print(itemURL)
// Prints the current first URL found matching: http://steamcommunity.com/market/listings/252490/Wyrm%20Chest
You can use regex to find all string occurrences between two specific strings (check this SO answer) and use the extension method ranges(of:) from this answer to get all ranges of that regex pattern. You just need to pass options .regularExpression to that method.
extension String {
    func ranges(of string: String, options: CompareOptions = .literal) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var start = startIndex
        while let range = range(of: string, options: options, range: start..<endIndex) {
            result.append(range)
            // Continue after the match; for an empty match, step one character forward to avoid an endless loop.
            start = range.lowerBound < range.upperBound ? range.upperBound : index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
    func slices(from: String, to: String) -> [Substring] {
        // Escape both delimiters so regex metacharacters in them are matched literally.
        let from = NSRegularExpression.escapedPattern(for: from)
        let to = NSRegularExpression.escapedPattern(for: to)
        let pattern = "(?<=" + from + ").*?(?=" + to + ")"
        return ranges(of: pattern, options: .regularExpression)
            .map { self[$0] }
    }
}
Testing playground
let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try! String(contentsOf: itemListURL, encoding: .utf8)
let result = itemListHTML.slices(from: "market_listing_row_link\" href=\"", to: "\"")
result.forEach({print($0)})
Result
http://steamcommunity.com/market/listings/252490/Night%20Howler%20AK47
http://steamcommunity.com/market/listings/252490/Hellcat%20SAR
http://steamcommunity.com/market/listings/252490/Metal
http://steamcommunity.com/market/listings/252490/Volcanic%20Stone%20Hatchet
http://steamcommunity.com/market/listings/252490/Box
http://steamcommunity.com/market/listings/252490/High%20Quality%20Bag
http://steamcommunity.com/market/listings/252490/Utilizer%20Pants
http://steamcommunity.com/market/listings/252490/Lizard%20Skull
http://steamcommunity.com/market/listings/252490/Frost%20Wolf
http://steamcommunity.com/market/listings/252490/Cloth
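The "position indicator" idea from the question can also be sketched without a regex, by repeatedly calling range(of:range:) and moving the search start past each hit. allSlices(in:from:to:) is a hypothetical helper, and the sample string below is made-up data rather than the real page.

```swift
import Foundation

// Hypothetical helper: collects every piece of `text` found between `start` and the next `end`.
func allSlices(in text: String, from start: String, to end: String) -> [Substring] {
    var results: [Substring] = []
    var searchStart = text.startIndex // plays the role of the file position indicator
    while let startRange = text.range(of: start, range: searchStart..<text.endIndex),
          let endRange = text.range(of: end, range: startRange.upperBound..<text.endIndex) {
        results.append(text[startRange.upperBound..<endRange.lowerBound])
        searchStart = endRange.upperBound // resume searching after the closing delimiter
    }
    return results
}

let sample = "<a class=\"market_listing_row_link\" href=\"http://example.com/a\"><a class=\"market_listing_row_link\" href=\"http://example.com/b\">"
let links = allSlices(in: sample, from: "market_listing_row_link\" href=\"", to: "\"")
links.forEach { print($0) }
// http://example.com/a
// http://example.com/b
```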

Find String from String using NSRegularExpression Swift

I want to fetch url of images from the String using NSRegularExpression.
func findURlUsingExpression(urlString: String){
do{
let expression = try NSRegularExpression(pattern: "\\b(http|https)\\S*(jpg|png)\\b", options: NSRegularExpressionOptions.CaseInsensitive)
let arrMatches = expression.matchesInString(urlString, options: NSMatchingOptions(rawValue: 0), range: NSMakeRange(0, urlString.characters.count))
for match in arrMatches{
let matchText = urlString.substringWithRange(Range(urlString.startIndex.advancedBy(match.range.location) ..< urlString.startIndex.advancedBy(match.range.location + match.range.length)))
print(matchText)
}
}catch let error as NSError{
print(error.localizedDescription)
}
}
It works with a simple string but not with an HTML string.
Working Example:
let tempString = "jhgsfjhgsfhjgajshfgjahksfgjhs http://jhsgdfjhjhggajhdgsf.jpg jahsfgh asdf ajsdghf http://jhsgdfjhjhggajhdgsf.png"
findURlUsingExpression(tempString)
Output:
http://jhsgdfjhjhggajhdgsf.jpg
http://jhsgdfjhjhggajhdgsf.png
But not working with this one: http://www.writeurl.com/text/478sqami3ukuug0r0bdb/i3r86zlza211xpwkdf2m
Don't roll your own regex if you can help it. Easiest and safest way is to use NSDataDetector. By using NSDataDetector you leverage a pre-built, heavily used parsing tool which should already have most of the bugs shaken out of it.
Here is a good article on it: NSDataDetector
NSDataDetector is a subclass of NSRegularExpression, but instead of
matching on an ICU pattern, it detects semi-structured information:
dates, addresses, links, phone numbers and transit information.
import Foundation

let tempString = "jhgsfjhgsfhjgajshfgjahksfgjhs http://example.com/jhsgdfjhjhggajhdgsf.jpg jahsfgh asdf ajsdghf http://example.com/jhsgdfjhjhggajhdgsf.png"
let types: NSTextCheckingResult.CheckingType = [.link]
let detector = try? NSDataDetector(types: types.rawValue)
// NSDataDetector works in UTF-16 units, so derive the NSRange from the string itself.
let range = NSRange(tempString.startIndex..., in: tempString)
detector?.enumerateMatches(in: tempString, options: [], range: range) { result, _, _ in
    if let url = result?.url {
        print(url)
    }
}
// => "http://example.com/jhsgdfjhjhggajhdgsf.jpg"
// => "http://example.com/jhsgdfjhjhggajhdgsf.png"
The example is from that site, adapted to search for a link.
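Since the detector returns every link it finds, not just images, one way to get back to the original goal is to filter the detected URLs by path extension. This is a sketch under assumptions: imageURLs(in:extensions:) is a hypothetical helper, the extension list is only an example, and NSDataDetector is assumed to be available (it is on Apple platforms).

```swift
import Foundation

// Hypothetical helper: detects links in `text` and keeps those whose path ends in an image extension.
func imageURLs(in text: String, extensions: Set<String> = ["jpg", "jpeg", "png"]) -> [URL] {
    let linkType = NSTextCheckingResult.CheckingType.link.rawValue
    guard let detector = try? NSDataDetector(types: linkType) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return detector.matches(in: text, options: [], range: range)
        .compactMap { $0.url }
        .filter { extensions.contains($0.pathExtension.lowercased()) }
}

let sampleText = "see http://example.com/a.jpg and http://example.com/page.html and https://example.com/b.PNG"
imageURLs(in: sampleText).forEach { print($0) }
```

Comparing the lowercased pathExtension keeps the check case-insensitive, like the .CaseInsensitive option in the question's regex.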