Swift Load Website to Scrape Code Without Loading View | WebKit - swift

I have an array of Google News article URLs. Google News article URLs redirect immediately to the real URLs, e.g. CNBC.com/.... I am trying to extract the real, redirected URL. I thought I could loop through the list, load each Google News link in a WebView, and then read webView.url in a DispatchQueue block after 1 second to get the real URL, but this doesn't work.
How can I fetch a list of redirected URLs quickly?
Here's code you can use to reproduce the problem:
let webView = WKWebView()
let myList = [URL(string: "https://news.google.com/articles/CAIiEDthIxbgofssGWTpXgeJXzwqGQgEKhAIACoHCAow2Nb3CjDivdcCMJ_d7gU?hl=en-US&gl=US&ceid=US%3Aen"), URL(string: "https://news.google.com/articles/CAIiEP5m1nAOPt-LIA4IWMOdB3MqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen")]
for url in myList {
guard let link = url else {continue}
self.webView.loadUrl(string: link.absoluteString)
DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {
let redirectedLink = self.webView.url
print("HERE redirected url: ", redirectedLink) // this does not work
}
}

There are two problems with your attempt:
1) You're using one and the same web view in the loop and since nothing inside the loop blocks until the web view has finished loading, you just end up cancelling the previous request with every loop pass.
2) Even if you did block inside the loop, accessing the URL after a second won't work reliably since the navigation could easily take longer than that.
What I would recommend doing is to continue using a single web view (to save resources) but to use its navigation delegate interface for resolving the URLs one by one.
This is a crude example to give you a basic idea:
import UIKit
import WebKit
@objc class RedirectResolver: NSObject, WKNavigationDelegate {
private var urls: [URL]
private var resolvedURLs = [URL]()
private let completion: ([URL]) -> Void
private let webView = WKWebView()
init(urls: [URL], completion: @escaping ([URL]) -> Void) {
self.urls = urls
self.completion = completion
super.init()
webView.navigationDelegate = self
}
func start() {
resolveNext()
}
private func resolveNext() {
guard let url = urls.popLast() else {
completion(resolvedURLs)
return
}
let request = URLRequest(url: url)
webView.load(request)
}
func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
if let resolved = webView.url { resolvedURLs.append(resolved) }
resolveNext()
}
}
class ViewController: UIViewController {
private var resolver: RedirectResolver!
override func viewDidLoad() {
super.viewDidLoad()
resolver = RedirectResolver(
urls: [URL(string: "https://news.google.com/articles/CAIiEDthIxbgofssGWTpXgeJXzwqGQgEKhAIACoHCAow2Nb3CjDivdcCMJ_d7gU?hl=en-US&gl=US&ceid=US%3Aen")!, URL(string: "https://news.google.com/articles/CAIiEP5m1nAOPt-LIA4IWMOdB3MqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen")!],
completion: { urls in
print(urls)
})
resolver.start()
}
}
This outputs the following resolved URLs:
[https://amp.cnn.com/cnn/2020/04/09/politics/trump-coronavirus-tests/index.html, https://www.cnbc.com/amp/2020/04/10/asia-markets-coronavirus-china-inflation-data-currencies-in-focus.html]
One other thing to note is that the redirection of those URLs in particular seems to rely on JavaScript which means you indeed need a web view. Otherwise kicking off URLRequests manually and observing the responses would have been enough.
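For links that redirect with a plain HTTP 3xx response (unlike the Google News links above, which seem to need JavaScript), the URLRequest approach could look roughly like the sketch below. The names makeHeadRequest and resolveHTTPRedirect are made up for illustration. The sketch relies on URLSession following redirects by default, so the response's url is the final hop, and it assumes the server accepts HEAD requests.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

// Hypothetical helper: builds a HEAD request so no response body is downloaded.
func makeHeadRequest(for url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "HEAD"
    return request
}

// Hypothetical helper: reports the URL the server finally answered from.
// Only HTTP-level redirects are followed; JavaScript redirects are not.
func resolveHTTPRedirect(for url: URL,
                         session: URLSession = .shared,
                         completion: @escaping (URL?) -> Void) {
    let task = session.dataTask(with: makeHeadRequest(for: url)) { _, response, _ in
        // URLSession follows redirects by default, so response.url is the last hop.
        completion(response?.url)
    }
    task.resume()
}
```

Unlike the web view, these requests can run in parallel, since each data task is independent. The completion handler is called on a background queue, so hop to the main queue before touching UI.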

Related

How to Load Several Swift WKWebviews and Know When They Are All Done

I am using WKWebView to render several (around 100) web pages that I then need to render to PDF. I am using the createPDF method of WKWebView to accomplish this. The reason I'm doing each individual page in its own web view is because createPDF doesn't respect page breaks in the HTML (as far as I know).
So I have a class where I start the loop to render each page:
class PrintVC: ViewController, WKNavigationDelegate {
var pages = [Page]()
func start(){
//A "page" is a struct that has the string content to load each web view
for page in pages{
let webView = WKWebView()
webView.navigationDelegate = self
webView.loadHTMLString(page.content, baseURL: Bundle.main.bundleURL)
}
}
}
I know the page is ready to be saved to PDF in the didFinish navigation delegate method:
func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
let config = WKPDFConfiguration()
config.rect = CGRect(x: 0, y: 0, width: 792, height: 612)
//Create the PDF
webView.createPDF(configuration: config){ result in
switch result{
case .success(let data):
do{
try data.write(to: URL(fileURLWithPath: "file-???.pdf"))
}catch let error{
print(error)
}
case .failure(let error):
print(error)
}
}
}
}
The trouble I'm having is I don't know when each individual page is done rendering. I also don't know how to pass each page's name to be used in the file path to save it.
How can I start a bunch of WKWebView loads and know when they are all done? Or better still, how can I reuse the same WKWebView and load each individual page in the same way? I assume using the same web view would be a better use of memory.
How can I start a bunch of WKWebView loads and know when they are all done?
Well, you'd need to identify which web view caused the delegate method to be called. It is for this reason that the first parameter - webView: WKWebView - exists.
One way is to put each (web view, page) pair into a dictionary ([WKWebView: Page]). Then start the loading:
// assume you have declared a property "self.webViewDict"
for page in pages{
let webView = WKWebView()
webView.navigationDelegate = self
self.webViewDict[webView] = page
webView.loadHTMLString(page.content, baseURL: Bundle.main.bundleURL)
}
When one finishes loading, you can identify the page by doing webViewDict[webView]. You should then remove the web view from the dictionary:
webViewDict[webView] = nil
if webViewDict.isEmpty {
// everything is loaded!
}
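The bookkeeping above can also be wrapped in a small helper. This is only a sketch: PendingLoads, begin and finish are hypothetical names, and it keys on ObjectIdentifier so it works with any class type (View would be WKWebView, Item would be Page). A useful side effect is that storing the view keeps it alive; a web view that only exists as a local variable inside the loop has no other owner.

```swift
// Hypothetical helper that keeps each in-flight view alive and remembers
// which item it is loading.
final class PendingLoads<View: AnyObject, Item> {
    private var entries: [ObjectIdentifier: (view: View, item: Item)] = [:]

    var isEmpty: Bool { return entries.isEmpty }

    // Call right before starting the load.
    func begin(_ view: View, item: Item) {
        entries[ObjectIdentifier(view)] = (view: view, item: item)
    }

    // Call when the view is done; returns the item it was loading, if any.
    @discardableResult
    func finish(_ view: View) -> Item? {
        return entries.removeValue(forKey: ObjectIdentifier(view))?.item
    }
}
```

If "done" should mean "PDF written" rather than "page loaded", finish would be called from the createPDF completion handler instead of directly in didFinish. The returned item is also where the file name for each page can come from.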
how can I reuse the same WKWebView and load each individual page in the same way?
Note that if you use the same WKWebView, you'll have to load the pages sequentially. The same web view can't load multiple things at the same time.
You can simply remove the loaded pages from pages. If you don't want to mutate it, copy pages to another var first.
In start, load the first page:
if let firstPage = pages.first {
webView.loadHTMLString(firstPage.content, baseURL: Bundle.main.bundleURL)
}
When you successfully load a page, do the same thing again:
case .success(let data):
pages.removeFirst()
if let firstPage = pages.first {
webView.loadHTMLString(firstPage.content, baseURL: Bundle.main.bundleURL)
} else {
// we are done!
}
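The same "start the next one when the previous one finishes" idea, pulled out of the web view code, might look like the sketch below. SequentialLoader is a hypothetical name; in the web view case the load closure would call loadHTMLString and hold on to done until the createPDF completion handler runs.

```swift
// Hypothetical sketch of processing items strictly one after another.
final class SequentialLoader<Item> {
    private var remaining: [Item]
    private let load: (Item, @escaping () -> Void) -> Void
    private let allDone: () -> Void

    init(items: [Item],
         load: @escaping (Item, @escaping () -> Void) -> Void,
         allDone: @escaping () -> Void) {
        self.remaining = items
        self.load = load
        self.allDone = allDone
    }

    func start() {
        loadNext()
    }

    private func loadNext() {
        guard !remaining.isEmpty else {
            allDone()
            return
        }
        let item = remaining.removeFirst()
        // `load` must call the closure exactly once when the item is finished.
        load(item) { self.loadNext() }
    }
}
```

Working on a private copy of the items leaves the original pages array untouched, which covers the "copy pages to another var" variant.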

Swift launch view only when data received

I'm getting info from an API using the following function, to which I pass a word as a string. Sometimes the word isn't available in the API; if it isn't, I generate a new word and try that one.
The problem is that, because this is an asynchronous function, the page where the API value appears is sometimes empty when I launch it: the function is still running in the background, trying to generate a word that exists in the API.
How can I make sure the page launches only once the data has been received from the API?
static func wordDefin (word : String, completion: @escaping (_ def: String )->(String)) {
let wordEncoded = word.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
let uri = URL(string:"https://dictapi.lexicala.com/search?source=global&language=he&morph=false&text=" + wordEncoded! )
if let unwrappedURL = uri {
var request = URLRequest(url: unwrappedURL);request.addValue("Basic bmV0YXlhbWluOk5ldGF5YW1pbjg5Kg==", forHTTPHeaderField: "Authorization")
let dataTask = URLSession.shared.dataTask(with: request) { (data, response, error) in
do {
if let data = data {
let decoder = JSONDecoder()
let empty = try decoder.decode(Empty.self, from: data)
if (empty.results?.isEmpty)!{
print("oops looks like the word :" + word)
game.wordsList.removeAll(where: { ($0) == game.word })
game.floffWords.removeAll(where: { ($0) == game.word })
helper.newGame()
} else {
let definition = empty.results?[0].senses?[0].definition
_ = completion(definition ?? "test")
return
}
}
}
catch {
print("connection")
print(error)
}
}
dataTask.resume()
}
}
You can't stop a view controller from "launching" itself (except not to push/present/show it at all). Once you push/present/show it, its lifecycle cannot—and should not—be stopped. Therefore, it's your responsibility to load the appropriate UI for the "loading state", which may be a blank view controller with a loading spinner. You can do this however you want, including loading the full UI with .isHidden = true set for all view objects. The idea is to do as much pre-loading of the UI as possible while the database is working in the background so that when the data is ready, you can display the full UI with as little work as possible.
What I'd suggest is after you've loaded the UI in its "loading" configuration, download the data as the final step in your flow and use a completion handler to finish the task:
override func viewDidLoad() {
super.viewDidLoad()
loadData { (result) in
// load full UI
}
}
Your data method may look something like this:
private func loadData(completion: @escaping (_ result: Result) -> Void) {
...
}
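One thing to watch in the question's function is that the branch where the API returns no results never calls completion, so the caller never hears back. A minimal sketch of a data method that retries and always reports exactly once is shown below. All names here (WordDefinition, DefinitionError, firstDefinition, lookup) are hypothetical, and lookup stands in for the real network call.

```swift
// Hypothetical result types for the sketch.
struct WordDefinition {
    let word: String
    let definition: String
}

enum DefinitionError: Error {
    case noWordHasDefinition
}

// Tries the candidate words in order until one has a definition.
// `lookup` reports nil when the API has no entry for the word.
func firstDefinition(among candidates: [String],
                     lookup: @escaping (String, @escaping (String?) -> Void) -> Void,
                     completion: @escaping (Result<WordDefinition, DefinitionError>) -> Void) {
    guard let word = candidates.first else {
        completion(.failure(.noWordHasDefinition))
        return
    }
    lookup(word) { definition in
        if let definition = definition {
            completion(.success(WordDefinition(word: word, definition: definition)))
        } else {
            // Retry with the remaining words instead of returning silently.
            firstDefinition(among: Array(candidates.dropFirst()),
                            lookup: lookup,
                            completion: completion)
        }
    }
}
```

The view controller would stay in its loading state until completion fires, and would dispatch to the main queue before updating any views.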
EDIT
Consider creating a data manager that operates along the following lines. Because the data manager is a class (a reference type), when you pass it forward to other view controllers, they all point to the same instance of the manager. Therefore, changes that any of the view controllers make to it are seen by the other view controllers. That means when you push a new view controller and it's time to update a label, access it from the data property. And if it's not ready, wait for the data manager to notify the view controller when it is ready.
class GameDataManager {
// stores game properties
// updates game properties
// does all things game data
var score = 0
var word: String?
}
class MainViewController: UIViewController {
let data = GameDataManager()
override func viewDidLoad() {
super.viewDidLoad()
// when you push to another view controller, point it to the data manager
let someVC = SomeOtherViewController()
someVC.data = data
}
}
class SomeOtherViewController: UIViewController {
var data: GameDataManager?
override func viewDidLoad() {
super.viewDidLoad()
if let word = data?.word {
print(word)
}
}
}
class AnyViewController: UIViewController {
var data: GameDataManager?
}
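The "notify the view controller when it is ready" part is not shown above. One way it could work is sketched here; whenWordReady and the waiting list are assumptions for illustration, not part of the answer's design, and the sketch assumes the manager is only touched from the main thread.

```swift
// Sketch of a data manager that can tell interested parties when the word arrives.
final class GameDataManager {
    var score = 0
    private var waiting: [(String) -> Void] = []

    var word: String? {
        didSet {
            guard let word = word else { return }
            // Hand the word to everyone who was waiting, then forget them.
            let callbacks = waiting
            waiting.removeAll()
            callbacks.forEach { $0(word) }
        }
    }

    // Runs `action` immediately if the word is known, otherwise when it is set.
    func whenWordReady(_ action: @escaping (String) -> Void) {
        if let word = word {
            action(word)
        } else {
            waiting.append(action)
        }
    }
}
```

A view controller could call data?.whenWordReady { ... } in viewDidLoad and update its label inside the closure, whether or not the download has finished by then.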

Unable to inject JS into WKWebView in Swift/Cocoa/NextStep / Push user selection on web page in WKWebView to Swift / Cocoa

I'm working on a macOS app that needs to use the WKUserScript capability to send a message from the web page back to the app. I'm following the article https://medium.com/capital-one-tech/javascript-manipulation-on-ios-using-webkit-2b1115e7e405 which shows this working on iOS, where it works just fine.
However, I've been struggling for several weeks to get it to work on macOS. Here is my version of his code, which compiles and runs fine but does not print the message in the userContentController() handler:
import Cocoa
import WebKit
class ViewController: NSViewController, WKNavigationDelegate {
@IBOutlet weak var webView: WKWebView!
override func viewDidLoad() {
super.viewDidLoad()
let userContentController = WKUserContentController()
// Add script message handlers that, when run, will make the function
// window.webkit.messageHandlers.test.postMessage() available in all frames.
userContentController.add(self, name: "test")
// Inject JavaScript into the webpage. You can specify when your script will be injected and for
// which frames–all frames or the main frame only.
let scriptSource = "window.webkit.messageHandlers.test.postMessage(`Hello, world!`);"
let userScript = WKUserScript(source: scriptSource, injectionTime: .atDocumentEnd, forMainFrameOnly: true)
userContentController.addUserScript(userScript)
// let config = WKWebViewConfiguration()
// config.userContentController = userContentController
// let webView = WKWebView(frame: .zero, configuration: config)
webView.navigationDelegate = self
webView.configuration.userContentController = userContentController
// Make sure in Info.plist you set `NSAllowsArbitraryLoads` to `YES` to load
// URLs with an HTTP connection. You can run a local server easily with services
// such as MAMP.
let htmlStr = "<html><body>Hello world - nojs</body></html>"
webView.loadHTMLString(htmlStr, baseURL: nil)
}
}
extension ViewController: WKScriptMessageHandler {
// Capture postMessage() calls inside loaded JavaScript from the webpage. Note that a Boolean
// will be parsed as a 0 for false and 1 for true in the message's body. See WebKit documentation:
// https://developer.apple.com/documentation/webkit/wkscriptmessage/1417901-body.
func userContentController(_ userContentController: WKUserContentController, didReceive message: WKScriptMessage) {
if let messageBody = message.body as? String {
print(messageBody)
}
}
}
Another odd thing is that I don't seem to be able to create a simple WKWebView app that loads a page and displays it. These are all just simple tests; my main application loads and displays web pages just fine using Alamofire and loadHTMLString(). I just haven't been able to inject the JS I need.
Everything I've done in the conversion is quite straightforward and required little or no change, with the exception of the assignment of the userContentController, so perhaps that's the problem? This example works just fine on iOS with his original sample as a prototype: https://github.com/rckim77/WKWebViewDemoApp/blob/master/WKWebViewDemoApp/ViewController.swift
I'm guessing there must be something very simple I'm missing here. Any help would be greatly appreciated!
Here's how I have set up my WebView on Mac. Try something like this:
import Cocoa
import WebKit
class ViewController: NSViewController {
@IBOutlet weak var webView: WKWebView!
override func viewDidLoad() {
super.viewDidLoad()
let javascript = """
function printStatement() {
try {
window.webkit.messageHandlers
.callbackHandler.postMessage({'payload': 'Hello World!'})
} catch(err) {
console.log('The native context does yet exist')
}
}
"""
let script = WKUserScript(
source: javascript,
injectionTime: WKUserScriptInjectionTime.atDocumentEnd,
forMainFrameOnly: true
)
webView.configuration.userContentController.add(
name: "callbackHandler"
)
webView.configuration.userContentController
.addUserScript(script)
webView.navigationDelegate = self
let html = """
<div onClick='javascript:printStatement()'>Print Statement</div>
"""
webView.loadHTMLString(html, nil)
}
}
extension ViewController: WKScriptMessageHandler {
func userContentController(_ userContentController: WKUserContentController, didReceive message: WKScriptMessage) {
if(message.name == "callbackHandler") {
guard let body = message.body as? [String: Any] else {
print("could not convert message body to dictionary: \(message.body)")
return
}
guard let payload = body["payload"] as? String else {
print("Could not locate payload param in callback request")
return
}
print(payload)
}
}
}
Hopefully this answers your question and works. If not, let me know and I'll try to figure it out!
Well, as it turns out, a major part of the issue was that I needed to set the entitlements "App Sandbox" and "com.apple.security.files.user-selected.read-only" both to "no" in the WebTest.entitlements file.
This was not the case in previous versions of Xcode (I'm on v10.1), and the default values basically disabled the WKWebView for what I was trying to do with it (i.e., load a simple page either via URL or String).
However, Alex's fix did help once I got that solved, with a couple of small tweaks (I had to add 'self' to the userContentController.add() call). I also added my JS for its original purpose, which was to "push" to Swift every time the user changed the selection on the page.
Here's my final code:
import Cocoa
import WebKit
class ViewController: NSViewController, WKNavigationDelegate {
@IBOutlet var webView: WKWebView!
override func viewDidLoad() {
super.viewDidLoad()
let javascript = """
function printStatement() {
try {
var foo = window.getSelection().toString()
window.webkit.messageHandlers.callbackHandler.postMessage({'payload': foo})
} catch(err) {
console.log('The native context does yet exist')
}
}
function getSelectionAndSendMessage() {
try {
var currSelection = window.getSelection().toString()
window.webkit.messageHandlers.callbackHandler.postMessage({'payload': currSelection})
} catch(err) {
console.log('The native context does yet exist')
}
}
document.onmouseup = getSelectionAndSendMessage;
document.onkeyup = getSelectionAndSendMessage;
document.oncontextmenu = getSelectionAndSendMessage;
"""
let script = WKUserScript(
source: javascript,
injectionTime: WKUserScriptInjectionTime.atDocumentEnd,
forMainFrameOnly: true
)
webView.configuration.userContentController.add(self, name: "callbackHandler")
webView.configuration.userContentController.addUserScript(script)
webView.navigationDelegate = self
let html = """
<div onClick='javascript:printStatement()'>Print Statement</div>
This is some sample text to test select with
"""
webView.loadHTMLString(html, baseURL: nil)
}
}
extension ViewController: WKScriptMessageHandler {
func userContentController(_ userContentController: WKUserContentController, didReceive message: WKScriptMessage) {
if(message.name == "callbackHandler") {
guard let body = message.body as? [String: Any] else {
print("could not convert message body to dictionary: \(message.body)")
return
}
guard let payload = body["payload"] as? String else {
print("Could not locate payload param in callback request")
return
}
print(payload)
}
}
}
Thanks Alex for all your fantastic support!
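A note on the original attempt in this question: WKWebView's configuration property is documented as returning a copy of the configuration, so assigning a brand new userContentController to it after the web view exists probably has no effect, whereas calling add(_:name:) and addUserScript(_:) on the existing controller (as in the final code) does. If creating the web view in code is an option, the controller can be set up before the web view is built. The sketch below assumes that approach; MessageLogger and makeWebView are hypothetical names.

```swift
import WebKit

// Hypothetical handler that just prints string messages posted from JavaScript.
final class MessageLogger: NSObject, WKScriptMessageHandler {
    func userContentController(_ userContentController: WKUserContentController,
                               didReceive message: WKScriptMessage) {
        if let text = message.body as? String {
            print(text)
        }
    }
}

// Builds a web view whose configuration already carries the handler and script.
func makeWebView(handler: WKScriptMessageHandler) -> WKWebView {
    let controller = WKUserContentController()
    // Makes window.webkit.messageHandlers.test.postMessage() available to pages.
    controller.add(handler, name: "test")
    let script = WKUserScript(
        source: "window.webkit.messageHandlers.test.postMessage('Hello, world!');",
        injectionTime: .atDocumentEnd,
        forMainFrameOnly: true
    )
    controller.addUserScript(script)

    let config = WKWebViewConfiguration()
    config.userContentController = controller
    return WKWebView(frame: .zero, configuration: config)
}
```

The returned view still has to be added to the view hierarchy and sized by the caller. The content controller keeps a strong reference to the handler, which matters if the handler is the view controller itself.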

WKWebKit does not refresh webpage

I am using Xcode 8.3.3 and Swift 3 to develop an app for the iMac using Cocoa. My goal is to use VCgoToWebPage and display a webpage to the user. My program calls this function many times, but the only webpage I see is the last one called. How do I implement a window refresh inside this function and wait for the webpage to be fully rendered?
func VCgoToWebPage(theWebPage : String) {
let url = URL(string: theWebPage)!
let request = URLRequest(url: url)
webView.load(request)
/*The modal box allows the web pages to be seen. Without it, after a series of calls to VCgoToWebPage only the last page called is displayed. The modal box code is just for debugging and will be removed. */
let alert = NSAlert()
alert.messageText="calling EDGAR page"
alert.informativeText=theWebPage
alert.addButton(withTitle: "OK")
alert.runModal()
}
You can use the navigation delegate to make sure navigation to a page is complete before trying to load another. Have your class conform to WKNavigationDelegate and set webView.navigationDelegate to that class instance.
var allRequests = [URLRequest]()
func VCgoToWebPage(theWebPage : String) {
guard let url = URL(string: theWebPage) else {
return
}
let request = URLRequest(url: url)
if webView.isLoading{
allRequests.append(request)
} else {
webView.load(request)
}
}
func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
if let nextRequest = allRequests.first{
webView.load(nextRequest)
allRequests.removeFirst()
}
}

Saving WebView to PDF returns blank image?

I'm trying to figure out how to save a WebView to a PDF and totally stuck, would really appreciate some help?
I'm doing this in Cocoa & Swift on OSX, here's my code so far:
import Cocoa
import WebKit
class ViewController: NSViewController {
override func loadView() {
super.loadView()
}
override func viewDidLoad() {
super.viewDidLoad()
loadHTMLString()
}
func loadHTMLString() {
let webView = WKWebView(frame: self.view.frame)
webView.loadHTMLString("<html><body><p>Hello, World!</p></body></html>", baseURL: nil)
self.view.addSubview(webView)
createPDFFromView(webView, saveToDocumentWithFileName: "test.pdf")
}
func createPDFFromView(view: NSView, saveToDocumentWithFileName fileName: String) {
let pdfData = view.dataWithPDFInsideRect(view.bounds)
if let documentDirectories = NSSearchPathForDirectoriesInDomains(.DocumentDirectory, .UserDomainMask, true).first {
let documentsFileName = documentDirectories + "/" + fileName
debugPrint(documentsFileName)
pdfData.writeToFile(documentsFileName, atomically: false)
}
}
}
It's pretty simple, what I'm doing is creating a WebView and writing some basic html content to it which renders this:
And then takes the view and saves it to a PDF file but that comes out blank:
I've tried grabbing the contents from the webView and View but no joy.
I've found a similar problem here How to take a screenshot when a webview finished rending regarding saving the webview to an image, but so far no luck with an OSX Solution.
Could it be something to do with the document dimensions?
or that the contents is in a subview?
maybe if you capture the View you can't capture the SubView?
Any ideas?
On iOS 11.0 and above, Apple provides the following API to capture a snapshot of a WKWebView.
@available(iOS 11.0, *)
open func takeSnapshot(with snapshotConfiguration: WKSnapshotConfiguration?, completionHandler: @escaping (UIImage?, Error?) -> Swift.Void)
Sample usage:
func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
if #available(iOS 11.0, *) {
webView.takeSnapshot(with: nil) { (image, error) in
//Do your stuff with image
}
}
}
On iOS 10 and below, UIWebView has to be used to capture a snapshot. The following method can be used to achieve that.
func webViewDidFinishLoad(_ webView: UIWebView) {
let image = captureScreen(webView: webView)
//Do your stuff with image
}
func captureScreen(webView: UIWebView) -> UIImage {
UIGraphicsBeginImageContext(webView.bounds.size)
webView.layer.render(in: UIGraphicsGetCurrentContext()!)
let image: UIImage = UIGraphicsGetImageFromCurrentImageContext()!
UIGraphicsEndImageContext()
return image
}
Here's another relevant answer
So I kind of figured out how to solve it: it turns out you can't (especially on OS X) access and print a web view's content from a WKWebView.
You have to use a WebView and NOT a WKWebView (I originally started with WKWebView because a few of the articles I read said to use that).
A WebView object is pretty much similar to a WKWebView object, which is fun as hell :-)
But it gives you access to .mainFrame & .frameView, which you'll need to print its content.
Here's my code:
let webView = WebView(frame: self.view.frame)
let localfilePath = NSBundle.mainBundle().URLForResource(fileName, withExtension: "html");
let req = NSURLRequest(URL: localfilePath!);
webView.mainFrame.loadRequest(req)
self.view.addSubview(webView)
After loading, I added a 1 second delay just to make sure the content has rendered before I print it:
// needs 1 second delay
let delay = 1 * Double(NSEC_PER_SEC)
let time = dispatch_time(DISPATCH_TIME_NOW, Int64(delay))
dispatch_after(time, dispatch_get_main_queue()) {
// works!
let data = webView.dataWithPDFInsideRect(webView.frame)
let doc = PDFDocument.init(data: data)
doc.writeToFile("/Users/john/Desktop/test.pdf")
// works!
let printInfo = NSPrintInfo.sharedPrintInfo()
let printOperation = NSPrintOperation(view: webView.mainFrame.frameView, printInfo: printInfo)
printOperation.runOperation()
}
Here I'm printing it and saving it as a PDF, just so I'm doubly sure it works in all circumstances.
I'm sure it can be improved; I hate the delay hack and should replace it with some kind of callback or delegate that runs when the content has fully loaded.
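As a rough idea of what the delegate-based version could look like today: the sketch below swaps the legacy WebView for WKWebView and its createPDF method, which is only available from macOS 11 / iOS 14, so it is not a drop-in replacement for the code above. PDFExporter is a hypothetical name, and the 612 x 792 frame is an assumed US Letter size in points.

```swift
import WebKit

// Hypothetical helper: renders an HTML string and hands back PDF data once
// the page has finished loading, with no fixed delay.
@available(macOS 11.0, iOS 14.0, *)
final class PDFExporter: NSObject, WKNavigationDelegate {
    let webView = WKWebView(frame: CGRect(x: 0, y: 0, width: 612, height: 792))
    private var completion: ((Result<Data, Error>) -> Void)?

    func export(html: String, completion: @escaping (Result<Data, Error>) -> Void) {
        self.completion = completion
        webView.navigationDelegate = self
        webView.loadHTMLString(html, baseURL: nil)
    }

    // Called by WebKit when the main frame has finished loading.
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        webView.createPDF(configuration: WKPDFConfiguration()) { [weak self] result in
            self?.completion?(result)
            self?.completion = nil
        }
    }

    // Report load failures instead of waiting forever.
    func webView(_ webView: WKWebView, didFail navigation: WKNavigation!, withError error: Error) {
        completion?(.failure(error))
        completion = nil
    }
}
```

The exporter has to stay strongly referenced (for example as a property of the view controller) until the completion handler fires, because the web view only holds its navigation delegate weakly.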