URLSession in command line tool: control multiple tasks and convert HTML to text - swift

In my OS X Command Line Tool project, written in Swift 3.0 (Beta 2), I need to convert HTML data from multiple URLs to String. Is there an elegant way to detect when all the background tasks have completed and to read the HTML data in such a tool, with or without a parser, that works with Swift 3 on Mac OS X (and on Linux in the near future)? The conversion function I use does not work from the background tasks, only on the main thread:
func html2text (html: String, usedEncoding: String.Encoding) -> String {
let data = html.data(using: usedEncoding)!
if let htmlString = AttributedString(html: data, options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: usedEncoding.rawValue], documentAttributes: nil)?.string {
return htmlString
} else {
return ""
}
}
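The HTML importer behind this initializer depends on AppKit/WebKit. Apple documents that it should not be called from a background thread, and it is not available on Linux. If plain text is all that is needed, a crude regex-based tag stripper avoids both restrictions. This is a minimal sketch, not an HTML parser: it does not drop the contents of `<script>`/`<style>` elements or decode entities such as `&amp;`, and the name `stripTags` is hypothetical.

```swift
import Foundation

// Crude HTML-to-text conversion: replaces every tag with a space, collapses
// runs of whitespace into one space, and trims both ends. Uses only
// Foundation, so it can be called from any thread and also builds on Linux.
func stripTags(_ html: String) -> String {
    let noTags = html.replacingOccurrences(of: "<[^>]+>", with: " ",
                                           options: .regularExpression)
    let collapsed = noTags.replacingOccurrences(of: "\\s+", with: " ",
                                                options: .regularExpression)
    return collapsed.trimmingCharacters(in: .whitespacesAndNewlines)
}

print(stripTags("<html><body><p>Hello <b>world</b></p></body></html>"))  // Hello world
```

Replacing tags with a space rather than an empty string keeps words from separate elements (for example around `<br/>`) from running together.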
So I first read the data into an Array, wait until all data tasks have finished, and then convert it on the main thread. I also use a global variable (a Set of URLs) to track the completion of each task:
import Foundation
import WebKit
var urlArr = [String]()
var urlSet = Set<String>()
var htmlTup : [(url : String, html : String, encoding : String.Encoding)] = []
let session = URLSession.shared
// For-in loop with multiple URLSession DataTasks
for myurl in urlArr {
    if urlSet.insert(myurl).inserted {
        print("Loading \(myurl)...")
        let inputURL = URL(string: myurl)!
        let task = session.dataTask(with: inputURL, completionHandler: { mydata, response, error in
            // Read Encoding from HTML First
            var usedEncoding = String.Encoding.utf8
            if let encodingName = response!.textEncodingName {
                let encoding = CFStringConvertIANACharSetNameToEncoding(encodingName as CFString)
                if encoding != kCFStringEncodingInvalidId {
                    usedEncoding = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(encoding))
                }
            }
            // Do some work with HTML String and read data into an Array
            if let myString = String(data: mydata!, encoding: usedEncoding) {
                htmlTup += [(url: myurl, html: myString, encoding: usedEncoding)]
            }
            // The end of task: remove the URL from the Set
            urlSet.remove(myurl)
        })
        // Run Task
        task.resume()
    }
}
// Waiting for tasks to complete and convert HTML to text
while !urlSet.isEmpty {
    // Do nothing
}
for (url, html, encoding) in htmlTup {
    print("Writing data from \(url)...")
    print(html2text(html: html, usedEncoding: encoding))
}
Update 1: running the RunLoop on the main thread (from this answer), with code like this to check when each task has finished:
var taskArr = [Bool]()
let task = session.dataTask(with: request) { (data, response, error) in
    // handle the response here
    // The task is finished: remove its marker
    taskArr.removeLast()
}
taskArr.append(true)
task.resume()
// Waiting for tasks to complete
let theRL = RunLoop.current
while !taskArr.isEmpty && theRL.run(mode: .defaultRunLoopMode, before: .distantFuture) { }

You can't just spin in a busy loop waiting for results, because you're blocking the main run loop/thread/dispatch queue by doing that.
Instead, return at that point, thus allowing the main run loop to run. Then, in your completion handler, check to see if you've gotten all the responses you're expecting, and if so, do the stuff that you currently have after that busy wait while loop.
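The completion-handler approach described above can be sketched like this. Each task reports its result to a small collector, which serializes all access on a private queue (the question's code mutates the globals `urlSet` and `htmlTup` on URLSession's background queue while the main thread reads `urlSet` in its busy loop, which is a data race) and calls `onAllDone` once the expected number of distinct URLs have reported. `ResponseCollector` and `fakeFetch` are hypothetical names; `fakeFetch` simulates a `URLSession` data task so the sketch runs without a network. In the command line tool you would end `main.swift` with `dispatchMain()`, hop to `DispatchQueue.main` inside `onAllDone` before calling `html2text`, and finish with `exit(EXIT_SUCCESS)`; the semaphore at the bottom is only there so this demo terminates on its own.

```swift
import Foundation

// Gathers results from several asynchronous tasks. All mutation happens on a
// private serial queue, and `onAllDone` runs when the expected number of
// distinct URLs have reported back, so no busy-wait loop is needed.
final class ResponseCollector {
    private let queue = DispatchQueue(label: "ResponseCollector")
    private var results: [String: String] = [:]
    private let expected: Int
    private let onAllDone: ([String: String]) -> Void

    init(expected: Int, onAllDone: @escaping ([String: String]) -> Void) {
        self.expected = expected
        self.onAllDone = onAllDone
    }

    // Call from each task's completion handler, on any thread.
    func record(url: String, body: String) {
        queue.async {
            self.results[url] = body
            if self.results.count == self.expected {
                self.onAllDone(self.results)
            }
        }
    }
}

// Hypothetical stand-in for session.dataTask(with:completionHandler:):
// delivers a fake body on a background queue after a short delay.
func fakeFetch(_ url: String, completion: @escaping (String) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
        completion("<html>\(url)</html>")
    }
}

let urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
let finished = DispatchSemaphore(value: 0)   // demo only; a real tool would use dispatchMain()
var collected: [String: String] = [:]

let collector = ResponseCollector(expected: urls.count) { results in
    collected = results
    finished.signal()
}
for url in urls {
    fakeFetch(url) { body in collector.record(url: url, body: body) }
}
finished.wait()
print("Received \(collected.count) responses")   // Received 3 responses
```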

Related

No response when doing a GET-Request with Alamofire [duplicate]

In a Swift 2 command line tool (main.swift), I have the following:
import Foundation
print("yay")
var request = HTTPTask()
request.GET("http://www.stackoverflow.com", parameters: nil, completionHandler: {(response: HTTPResponse) in
if let err = response.error {
print("error: \(err.localizedDescription)")
return //also notify app of failure as needed
}
if let data = response.responseObject as? NSData {
let str = NSString(data: data, encoding: NSUTF8StringEncoding)
print("response: \(str)") //prints the HTML of the page
}
})
The console shows 'yay' and then exits (Program ended with exit code: 0), apparently without ever waiting for the request to complete. How can I prevent this from happening?
The code is using SwiftHTTP.
I think I might need an NSRunLoop, but I can't find a Swift example.
Adding RunLoop.main.run() to the end of the file is one option. There is more info on another approach, using a semaphore, here.
I realize this is an old question, but here is the solution I ended up with, using DispatchGroup:
let dispatchGroup = DispatchGroup()
for someItem in items {
dispatchGroup.enter()
doSomeAsyncWork(item: someItem) {
dispatchGroup.leave()
}
}
dispatchGroup.notify(queue: DispatchQueue.main) {
exit(EXIT_SUCCESS)
}
dispatchMain()
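A self-contained version of the DispatchGroup pattern above. Here `doSomeAsyncWork` is hypothetical and simulated with a delayed background closure, and results are written under an `NSLock` because the completions can run concurrently. The sketch calls `dispatchGroup.wait()` instead of `notify` plus `dispatchMain()` so that it returns on its own; in a command line tool that needs the main queue to keep running (for example, to do the HTML conversion there), keep `notify(queue: DispatchQueue.main)` and `dispatchMain()` as shown above.

```swift
import Foundation

// Hypothetical async operation standing in for a network request.
func doSomeAsyncWork(item: Int, completion: @escaping (Int) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.02 * Double(item)) {
        completion(item * item)
    }
}

let items = [1, 2, 3, 4]
let dispatchGroup = DispatchGroup()
let lock = NSLock()                 // completions run concurrently; guard the array
var squares = [Int](repeating: 0, count: items.count)

for (index, someItem) in items.enumerated() {
    dispatchGroup.enter()           // one enter() per task...
    doSomeAsyncWork(item: someItem) { result in
        lock.lock()
        squares[index] = result     // write by index so the original order is kept
        lock.unlock()
        dispatchGroup.leave()       // ...balanced by exactly one leave()
    }
}

dispatchGroup.wait()                // blocks until every task has called leave()
print(squares)                      // [1, 4, 9, 16]
```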
You can call dispatchMain() at the end of main. That runs the GCD main queue dispatcher and never returns so it will prevent the main thread from exiting. Then you just need to explicitly call exit() to exit the application when you are ready (otherwise the command line app will hang).
import Foundation
let url = URL(string:"http://www.stackoverflow.com")!
let dataTask = URLSession.shared.dataTask(with:url) { (data, response, error) in
// handle the network response
print("data=\(data)")
print("response=\(response)")
print("error=\(error)")
// explicitly exit the program after response is handled
exit(EXIT_SUCCESS)
}
dataTask.resume()
// Run GCD main dispatcher, this function never returns, call exit() elsewhere to quit the program or it will hang
dispatchMain()
Don't depend on timing. Try this instead:
let sema = DispatchSemaphore(value: 0)
let url = URL(string: "https://upload.wikimedia.org/wikipedia/commons/4/4d/Cat_November_2010-1a.jpg")!
let task = URLSession.shared.dataTask(with: url) { data, response, error in
print("after image is downloaded")
// signals the process to continue
sema.signal()
}
task.resume()
// sets the process to wait
sema.wait()
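`sema.wait()` with no argument blocks forever if the completion handler is never reached. A variant that gives up after a deadline, using `wait(timeout:)` and checking for `.success`, is sketched below. `fetchSync` and the simulated work closures are hypothetical; like the snippet above, it still blocks the calling thread, so it should not be called on an app's main thread.

```swift
import Foundation

// Runs an asynchronous operation and blocks the calling thread until it
// delivers a value or `timeout` seconds pass. Returns nil on timeout.
func fetchSync<T>(timeout: TimeInterval,
                  _ work: (@escaping (T) -> Void) -> Void) -> T? {
    let sema = DispatchSemaphore(value: 0)
    let lock = NSLock()
    var result: T?
    work { value in
        lock.lock(); result = value; lock.unlock()
        sema.signal()
    }
    guard sema.wait(timeout: .now() + timeout) == .success else { return nil }
    lock.lock(); defer { lock.unlock() }
    return result
}

// Simulated request that answers after 0.1 s.
let fast: Int? = fetchSync(timeout: 1.0) { done in
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.1) { done(42) }
}
// Simulated request that never calls its completion handler.
let stuck: Int? = fetchSync(timeout: 0.2) { _ in }

print(String(describing: fast), String(describing: stuck))  // Optional(42) nil
```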
If you don't need "production level" code but just want to run a quick experiment or try out a piece of code, you can do it like this:
Swift 3:
//put at the end of your main file
RunLoop.main.run(until: Date(timeIntervalSinceNow: 15)) //will run your app for 15 seconds only
More info : https://stackoverflow.com/a/40870157/469614
Please note that you shouldn't rely on fixed execution time in your architecture.
Swift 4: add RunLoop.main.run() at the end of your file.
// Step 1: Add isDone global flag
var isDone = false
// Step 2: Set isDone to true in callback
request.GET(...) {
...
isDone = true
}
// Step 3: Add waiting block at the end of code
while(!isDone) {
// run your code for 0.1 second
RunLoop.main.run(until: Date(timeIntervalSinceNow: 0.1))
}
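One caveat with the flag approach: if the callback runs on a background thread (as URLSession.shared's completion handlers do), `isDone` is written and read from different threads without synchronization. A minimal sketch that guards the flag with an `NSLock`; `DoneFlag` is a hypothetical helper and the background callback is simulated.

```swift
import Foundation

// A Bool that can be set and read safely from different threads.
final class DoneFlag {
    private let lock = NSLock()
    private var value = false
    var isDone: Bool {
        lock.lock(); defer { lock.unlock() }
        return value
    }
    func markDone() {
        lock.lock(); value = true; lock.unlock()
    }
}

let flag = DoneFlag()
var payload: String?

// Simulated callback arriving on a background thread after 0.3 s.
DispatchQueue.global().asyncAfter(deadline: .now() + 0.3) {
    payload = "response body"   // written before markDone(), read only after isDone is true
    flag.markDone()
}

// Run the main run loop in 0.1 s slices until the callback fires. If nothing
// is attached to the run loop, run(until:) returns at once and this spins.
while !flag.isDone {
    RunLoop.main.run(until: Date(timeIntervalSinceNow: 0.1))
}
print(payload ?? "none")   // response body
```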

Can you have synchronous but non-blocking URLSessions?

I am utilizing URLSessions which are asynchronous in nature.
This works well when there is only one call for a session.
When I need to execute multiple (serial) calls, where the results need to be combined in the order of execution, it makes program logic painful and error prone.
I am also blocking the main thread, which isn't good.
Constraints
The next task must not start until at least 4 seconds have elapsed; waiting longer is fine.
Using let (data, response) = try await session.data(from: url) would require upgrading to Monterey, so it is not an option.
If tasks are executed more often than every 4 seconds, the server returns an error, forcing a retry.
Task execution
1. Execute the task. If it concludes in less than 4 seconds, wait for the difference so the next task does not start before the 4 seconds have elapsed.
2. Repeat until all tasks have been completed.
3. Combine the results.
My current approach uses semaphores or dispatch groups, but both block the main thread.
Is there a way to get synchronous behavior without blocking the main thread?
func getDataFromInput_Sync(authToken: String, transformedText: String) -> (Data?, URLResponse?, Error?)
{
    /*
     By default, transactions are asynchronous, so they return data while the rest of the program continues.
     Changing the behavior to synchronous requires blocking to wait for the outcome. It affords us the
     ability to manage program flow inline. This method forces synchronous behavior and returns the
     output as a tuple of type (Data?, URLResponse?, Error?).
     */
    var outData : Data?
    var outError : Error?
    var urlResponse : URLResponse?
    let targetURL = "https://..."
    // ephemeral doesn't write cookies, cache or credentials to disk
    let urlconfig = URLSessionConfiguration.ephemeral
    // set the timeout to a high number, if we are prepared to wait that long. Otherwise the session will time out.
    urlconfig.timeoutIntervalForRequest = 120
    urlconfig.timeoutIntervalForResource = 120
    let urlSession = URLSession(configuration: urlconfig)
    // let dispatchGroup = DispatchGroup()
    let semaphore = DispatchSemaphore(value: 0)
    guard let url = URL(string: targetURL),
          let httpBodyData = transformedText.data(using: .utf8) else { return (nil, nil, nil) }
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.httpBody = httpBodyData
    request.addValue("Token " + authToken, forHTTPHeaderField: "Authorization")
    // Perform HTTP Request
    let task = urlSession.dataTask(with: request) { (data, response, error) in
        // Signal on every path, including the early returns below;
        // otherwise semaphore.wait() never returns after an error.
        defer {
            // dispatchGroup.leave()
            semaphore.signal()
        }
        urlResponse = response
        outError = error
        guard error == nil else {
            print("we have an error: \(error!.localizedDescription)")
            return
        }
        guard let data = data else { print("Empty data"); return }
        outData = data
    }
    task.resume()
    semaphore.wait()
    // dispatchGroup.enter()
    // task.resume()
    // dispatchGroup.wait()
    // Release the per-call session once its task has finished
    urlSession.finishTasksAndInvalidate()
    return (outData, urlResponse, outError)
}
func testServerRequest()
{
    let sentences = ["Example Sentence1", "Example Sentence2", "Example Sentence3"] //...array to process
    for thisString in sentences
    {
        let timeTestPoint = Date()
        let futureSecs = 4.0
        let (data, urlResponse, error) = getDataFromInput_Sync(authToken: authToken, transformedText: thisString)
        let elapsed = -timeTestPoint.timeIntervalSinceNow // time elapsed between request and result
        // act on the data received
        // executing the next request before futureSecs will cause an error, so pause
        let delayAmt = max(0, futureSecs - elapsed)
        Thread.sleep(forTimeInterval: delayAmt)
    }
}
Run it on a background queue, for example:
DispatchQueue.global(qos: .background).async {
testServerRequest()
}
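Moving the loop onto a background queue keeps the main thread free, but the caller then needs to be told when the whole batch is done. A minimal sketch of that shape, assuming each request is wrapped as a blocking closure (like `getDataFromInput_Sync`) and that the combined results are handed back on a queue of the caller's choice (normally `.main`). `runSpaced` and `simulatedRequest` are hypothetical names, and the demo uses a 0.2-second spacing with simulated requests instead of the real 4 seconds; it delivers on a global queue only because the demo's main thread waits on a semaphore.

```swift
import Foundation

// Runs blocking `jobs` one after another on a background queue, so that
// consecutive jobs start at least `minInterval` seconds apart. The combined
// results, in job order, are delivered on `callbackQueue`.
func runSpaced<T>(_ jobs: [() -> T],
                  minInterval: TimeInterval,
                  callbackQueue: DispatchQueue = .main,
                  completion: @escaping ([T]) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        var results: [T] = []
        for (index, job) in jobs.enumerated() {
            let start = Date()
            results.append(job())
            // Pause for the rest of the interval, except after the last job.
            if index < jobs.count - 1 {
                let elapsed = Date().timeIntervalSince(start)
                Thread.sleep(forTimeInterval: max(0, minInterval - elapsed))
            }
        }
        callbackQueue.async { completion(results) }
    }
}

// Hypothetical stand-in for getDataFromInput_Sync: blocks for 0.05 s.
func simulatedRequest(_ text: String) -> String {
    Thread.sleep(forTimeInterval: 0.05)
    return text.uppercased()
}

let sentences = ["Example Sentence1", "Example Sentence2", "Example Sentence3"]
let jobs: [() -> String] = sentences.map { sentence in return { simulatedRequest(sentence) } }

let done = DispatchSemaphore(value: 0)   // demo only; an app would just return here
var combined: [String] = []
let began = Date()
runSpaced(jobs, minInterval: 0.2, callbackQueue: .global()) { results in
    combined = results
    done.signal()
}
done.wait()
let total = Date().timeIntervalSince(began)
print(combined)
```

Because every job runs on the same background queue in sequence, the results come back in the order the requests were issued, which is what the "combine the results" step needs.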

Can't get data returned from dataTask()

For a week I have been trying to get a string returned from dataTask().
I have already read a lot about this here on Stack Overflow and on several other sites that tackle this topic, for example this one. So I understand that dataTask doesn't return values directly, because the work happens on a different thread, and so on. I have also read about closures and completion handlers, and I really had the feeling that I was starting to understand what this is about. But I can't get it to work.
This is my code. I'm posting all of it so no one has to wonder whether the problem is in a part I haven't shown. Everything works fine until I try to return a value and save it, for example, in a variable:
func requestOGD(code gtin: String, completion: @escaping (_ result: String) -> String) {
// MARK: Properties
var answerList: [String.SubSequence] = []
var answerDic: [String:String] = [:]
var product_name = String()
var producer = String()
// Set up the URL request
let ogdAPI = String("http://opengtindb.org/?ean=\(gtin)&cmd=query&queryid=400000000")
guard let url = URL(string: ogdAPI) else {
print("Error: cannot create URL")
return
}
let urlRequest = URLRequest(url: url)
// set up the session
let config = URLSessionConfiguration.default
let session = URLSession(configuration: config)
// make the request
let task = session.dataTask(with: urlRequest) {
(data, response, error) in
// check for any errors
guard error == nil else {
print("error calling GET on /todos/1")
print(error!)
return
}
// make sure we got data
guard let responseData = data else {
print("Error: did not receive data")
return
}
// parse the result, which is String. It willbecome split and placed in a dictionary
do {
let answer = (String(decoding: responseData, as: UTF8.self))
answerList = answer.split(separator: "\n")
for entry in answerList {
let entry1 = entry.split(separator: "=")
if entry1.count > 1 {
let foo = String(entry1[0])
let bar = String(entry1[1])
answerDic[foo] = "\(bar)"
}
}
if answerDic["error"] == "0" {
product_name = answerDic["detailname"]!
producer = answerDic["vendor"]!
completion(product_name)
} else {
print("Error-Code der Seite lautet: \(String(describing: answerDic["error"]))")
return
}
}
}
task.resume()
}
Here I call my function. And no, I also tried assigning the result directly to the var foo; that doesn't work either. The value only exists within the closure:
// Configure the cell...
var foo:String = ""
requestOGD(code: listOfCodes[indexPath.row]) { (result: String) in
print(result)
foo = result
return result
}
print("Foo:", foo)
cell.textLabel?.text = self.listOfCodes[indexPath.row] + ""
return cell
}
So my problem is that I don't seem to be able to get a value out of an HTTP request.
You used a completion handler in your call to requestOGD:
requestOGD(code: listOfCodes[indexPath.row]) {
(result: String) in
// result comes back here
}
But then you tried to capture and return that result:
foo = result
return result
So you're making the same mistake here that you tried to avoid making by having the completion handler in the first place. The call to that completion handler is itself asynchronous. So you face the same issue again. If you want to extract result at this point, you would need another completion handler.
To put it in simple terms, this is the order of operations:
requestOGD(code: listOfCodes[indexPath.row]) {
(result: String) in
foo = result // 2
}
print("Foo:", foo) // 1
You are printing foo before the asynchronous code runs and has a chance to set foo in the first place.
In the larger context: You cannot use any asynchronously gathered material in cellForRowAt. The cell is returned before the information is gathered. That's what asynchronous means. You can't work around that by piling on further levels of asynchronicity. You have to change your entire strategy.
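One common shape for that change of strategy: `cellForRowAt` returns immediately with whatever is already known (or a placeholder), the request runs in the background, and when it finishes the result is stored in the model and the affected row is reloaded. A minimal, UIKit-free sketch of the model part follows. `ProductNameStore`, `fetchName` and `onUpdate` are hypothetical names, the request is simulated, and the completion type is `(String) -> Void` because nothing can usefully receive a value returned from it. In the table view controller, `onUpdate` would call `tableView.reloadRows(at:with:)`, and the store would use its default `.main` callback queue; the demo uses a private serial queue in that role so it can wait with a semaphore.

```swift
import Foundation

// Holds product names by row index. A missing name is fetched in the
// background; when it arrives it is stored and `onUpdate` gets the index.
final class ProductNameStore {
    private var names: [Int: String] = [:]
    private var inFlight: Set<Int> = []
    private let callbackQueue: DispatchQueue
    var onUpdate: ((Int) -> Void)?

    init(callbackQueue: DispatchQueue = .main) {
        self.callbackQueue = callbackQueue
    }

    // Returns the cached name, or nil after starting a fetch for it.
    // Call only from `callbackQueue` (the main queue in an app).
    func name(at index: Int, code: String) -> String? {
        if let cached = names[index] { return cached }
        if inFlight.insert(index).inserted {
            fetchName(code: code) { result in
                self.callbackQueue.async {
                    self.names[index] = result
                    self.inFlight.remove(index)
                    self.onUpdate?(index)
                }
            }
        }
        return nil
    }
}

// Hypothetical stand-in for requestOGD: answers on a background queue.
func fetchName(code: String, completion: @escaping (String) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
        completion("Product \(code)")
    }
}

let queue = DispatchQueue(label: "demo")   // plays the role of the main queue
let store = ProductNameStore(callbackQueue: queue)
let updated = DispatchSemaphore(value: 0)
store.onUpdate = { _ in updated.signal() }   // an app would reload the row here

let first = queue.sync { store.name(at: 0, code: "4000000000000") }   // nil: fetch started
updated.wait()
let second = queue.sync { store.name(at: 0, code: "4000000000000") }  // now cached
print(String(describing: first), String(describing: second))
```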

How to tell the main thread that a URLSessionDataTask has finished

Using Swift 4, I have this code that attempts a POST request to a REST API:
spinner.startAnimation(self)
btnOk.isEnabled = false
btnCancel.isEnabled = false
attemptPost()
spinner.stopAnimation(self)
btnOk.isEnabled = true
btnCancel.isEnabled = true
The function that does this (Constants and Request are classes that I created that create the request objects and hold frequently used data):
func attemptPost() {
let url = Constants.SERVICE_URL + "account/post"
let body: [String : Any] =
["firstName": txtFirstName.stringValue,
"lastName": txtLastName.stringValue,
"email": txtEmail.stringValue,
"password": txtPassword.stringValue];
let req = Request.create(urlExtension: url, httpVerb: Constants.HTTP_POST, jsonBody: body)
let task = URLSession.shared.dataTask(with: req) { data, response, err in
guard let data = data, err == nil else {
// error
return
}
if let resp = try? JSONSerialization.jsonObject(with: data) {
// success
}
}
task.resume()
}
Since the task that does this runs asynchronously, there is no sequential way that I can update the UI once the call to attemptPost() returns. And since the UI components are on the main thread, I can't directly update the components from the task that makes the request.
In C# it works the same way; there is a BackgroundWorker class in which you can safely update the UI components to avoid a "Cross-thread operation not valid" error.
I'm trying to find an example that accomplishes more or less the same thing, in which a "wait" state is established, the task runs, and upon task completion, the main thread is notified that the task is done so that the wait state can be changed.
But I'm still having trouble understanding how this all comes together in Swift. I've looked around and seen information about the handlers that are invoked from within URLSessionDataTask and stuff about GCD, but I'm still not able to connect the dots.
And is GCD even relevant here since the URLSessionDataTask task is asynchronous to begin with?
Any help is appreciated.
If I understood correctly you might try this solution:
spinner.startAnimation(self)
btnOk.isEnabled = false
btnCancel.isEnabled = false
attemptPost { (success) in
    DispatchQueue.main.async {
        self.spinner.stopAnimation(self)
        self.btnOk.isEnabled = true
        self.btnCancel.isEnabled = true
        // UI wise, eventually you can do something with 'success' here
    }
}
func attemptPost(_ completion: @escaping (Bool) -> Void) {
    let url = Constants.SERVICE_URL + "account/post"
    let body: [String : Any] =
        ["firstName": txtFirstName.stringValue,
         "lastName": txtLastName.stringValue,
         "email": txtEmail.stringValue,
         "password": txtPassword.stringValue]
    let req = Request.create(urlExtension: url, httpVerb: Constants.HTTP_POST, jsonBody: body)
    let task = URLSession.shared.dataTask(with: req) { data, response, err in
        guard let data = data, err == nil else {
            completion(false)
            return
        }
        // Also report failure when the body isn't valid JSON, so the UI is always restored
        completion((try? JSONSerialization.jsonObject(with: data)) != nil)
    }
    task.resume()
}
So the idea is that attemptPost calls a completion block when the request finishes, and that block runs your UI updates asynchronously on the main thread.

Is it ok to use tid-kijyun Swift-HTML-Parser to parse a real world html file from webpage

So the thing is, I am new to programming, and to Swift in particular. I completed some courses and now want to build something really simple: an app that gets the news from a website and pushes it to a Table View.
Right now I am stuck on this error:
Optional(Error Domain=HTMLParserdomain Code=1 "The operation couldn’t be completed. (HTMLParserdomain error 1.)")
The code I wrote is really simple. It's the example from the tid-kijyun repo plus some code to get the content of the HTML (func perFormConnectionToGrabUrlContent):
func perFormConnectionToGrabUrlContent(# url: String) -> NSString {
let url = NSURL(string: url)
let request = NSURLRequest(URL: url!)
var htmlContentTemp: NSString = ""
NSURLConnection.sendAsynchronousRequest(request, queue: NSOperationQueue.mainQueue()) {
(response, data, error) in
htmlContentTemp = NSString(data: data, encoding: NSUTF8StringEncoding)!
println(htmlContentTemp)
}
return htmlContentTemp
}
let html = perFormConnectionToGrabUrlContent(url: "http://www.google.com")
println(html)
var err: NSError?
var parser = HTMLParser(html: html, error: &err)
if err != nil {
println(err)
exit(1)
}
var bodyNode = parser.body
if let inputNodes = bodyNode?.findChildTags("a") {
for node in inputNodes {
println(node.contents)
}
}
if let inputNodes = bodyNode?.findChildTags("a") {
for node in inputNodes {
println(node.contents)
println(node.getAttributeNamed("href"))
}
}
So my question is: should I change something in the code so that it really works as intended, or would I be better off accessing the database of the website I am trying to reach and running database queries, or something like that, instead?