Is there any way to get the HTML from a web page once the JavaScript is loaded in a Flutter app? - flutter

I'm working on a URL preview widget, so I'd like to extract the meta tags from the HTML of a given URL.
However, websites like Twitter don't return the entire HTML when they detect that no JavaScript engine is available (e.g. when doing a plain GET request with the http package).
So, I'd like to know if there's any workaround for these cases, for example, using some kind of headless browser to get the entire HTML.
Thanks!
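For reference, the meta tag extraction itself is the easy half once the rendered HTML is in hand. Below is a minimal sketch in Python rather than Dart, using only the standard library; the sample markup and the names MetaTagParser and extract_meta are made up for illustration. It does not solve the JavaScript rendering problem, it only shows what the preview widget would need from the final HTML.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> name/property -> content pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Open Graph tags use "property"; classic tags use "name".
        key = attr_map.get("property") or attr_map.get("name")
        content = attr_map.get("content")
        if key and content is not None:
            self.meta[key] = content


def extract_meta(html):
    """Return a dict of meta tag keys to their content values."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical sample markup, standing in for a fully rendered page.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example title">
<meta name="description" content="Example description">
</head><body></body></html>
"""

preview = extract_meta(SAMPLE_HTML)
```

Here preview maps og:title and description to their content strings, which is roughly what a URL preview needs.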

Related

AMP errors in web master tool

I have implemented AMP successfully for my web pages, and Google has started indexing them, which I learned from the Webmaster tool. I am seeing an issue that appears and then disappears within a short span of time.
The issue logged is:
User authored JavaScript found on page
The pages don't contain any script tags except the schema markup.
The error shows up for only a few of the 120 pages, even though they all follow the same template. Below is the image link:
I have some more questions:
I have observed that some AMP URLs get redirected to the original page when the same AMP URL is opened in a web browser.
Is Google taking care of that, or is it on us to do the redirection?
I am planning to add sign-in and share buttons to my web pages, which will use JavaScript. But if I do so, I get a validation error. What is the right approach?
Can anyone please help me with this?
Please ensure that all script tags are of type application/ld+json. There should be no executable code in these script tags.
Redirection is something that you must handle on your end. Google doesn't do any redirection from AMP to non-AMP pages if the URL is hit directly. In fact, the URL scheme that Google uses in its carousel is entirely its own and just includes the path to your page inside it, e.g. https://cdn.ampproject.org/v/www.yoursitehere.com/path/to/article.html
Social sharing using JavaScript inserted in the page is not allowed, as no JavaScript is allowed. If you want social sharing, use a non-JavaScript implementation, or try out amp-social-share.
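To track down which pages trip the "User authored JavaScript" check, a quick audit of the script tags can help. This is a rough sketch in Python, not the AMP validator: it only lists inline script tags whose type is not application/ld+json, and it assumes that scripts loaded through src (such as the AMP runtime) are checked separately. The names ScriptAuditor and find_inline_scripts are hypothetical.

```python
from html.parser import HTMLParser


class ScriptAuditor(HTMLParser):
    """Records inline <script> tags whose type is not application/ld+json."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr_map = dict(attrs)
        # Assumption: scripts loaded via src (e.g. the AMP runtime) are
        # checked separately, so only inline scripts are audited here.
        if "src" in attr_map:
            return
        script_type = (attr_map.get("type") or "").strip().lower()
        if script_type != "application/ld+json":
            # Store the declared type; an empty string means no type at all.
            self.offenders.append(script_type)


def find_inline_scripts(html):
    """Return the declared types of all offending inline script tags."""
    auditor = ScriptAuditor()
    auditor.feed(html)
    return auditor.offenders


# Hypothetical page head with one offending inline script.
SAMPLE = """
<head>
<script async src="https://cdn.ampproject.org/v0.js"></script>
<script type="application/ld+json">{"@type": "Article"}</script>
<script>console.log("user script")</script>
</head>
"""

offenders = find_inline_scripts(SAMPLE)
```

Running this over the handful of failing pages and comparing against a passing one may show a stray inline script that the shared template does not account for.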
Thanks for the response. Going through the points you raised:
"Please ensure that all script tags are of type application/ld+json. There should be no executable code in these script tags" - I am not using any scripts as of now except the AMP ones.
"Redirection is something that you must be doing on your end. Google doesn't do any sort of redirection from AMP to non-amp pages if the URL is hit directly. In fact that URL schema that Google uses in their carousel is entirely their own, and just includes the path to your page inside it. E.g. https://cdn.ampproject.org/v/www.yoursitehere.com/path/to/article.html" - Understood.
"Social sharing using Javascript inserted in the page is not allowed, as no Javascript is allowed. If you want to use social sharing, use a non-javascript implementation, or try out the amp-social-share" - I implemented amp-social-share and it's working fine.
Can we implement AMP for eCommerce sites, which tend to include a lot of JavaScript, forms, and plugins? As far as I know, AMP aims to keep pages simple and therefore restricts JavaScript heavily, and the form tag is not valid either. So is there any chance we can implement AMP on eCommerce sites?

How to manipulate the meta area of the HTML dom with Scala-JS for a single page application

General Scala-JS page building advice needed. Most of the examples seem to follow the pattern where the main element into which your single-page application will go sits between the tags in a landing page HTML file. How do you handle the need to insert something in the meta area of the DOM? Do I need to render my landing page dynamically from the server to accomplish this? My specific need is to inject a script tag into the meta area of an already defined static HTML page. I'm using scalajs-react.
Generally you will want a server-rendered "root page" for the SPA. This allows you to dynamically compute proper cache busting file names for your script and stylesheet tags and to easily manage the cache expiration of the root page. Also, for proper html5 push state support you'll want to serve that page at every URL, which is easily done with a server side route.
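As a rough illustration of that idea, here is a framework-free sketch in Python (not Scala) of a server-side function that renders the root page, computes a cache-busting file name from the script's content, and leaves a slot for extra head markup. Everything here is hypothetical: the function names, the /assets/ path, and the root element id are placeholders, not part of scalajs-react or any particular server framework.

```python
import hashlib


def busted_name(filename, content):
    """Return filename with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


def render_root_page(script_name, script_bytes, extra_head=""):
    """Render the SPA root page; extra_head is injected into <head>."""
    src = busted_name(script_name, script_bytes)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"{extra_head}\n"
        "</head>\n<body>\n"
        '<div id="root"></div>\n'
        f'<script src="/assets/{src}"></script>\n'
        "</body>\n</html>\n"
    )


# Hypothetical usage: inject a script tag into the head of the root page.
page = render_root_page(
    "app.js",
    b"console.log('spa bundle')",
    extra_head='<script src="/assets/analytics.js"></script>',
)
```

A server-side route would return this page for every URL, which is what makes push state navigation work; the hashed file name changes whenever the bundle content changes.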

YQL console not returning complete html page

OK, I will keep it simple.
I put the following query in the YQL console:
select * from html where url="https://twitter.com/laurenlemon/status/470403949980549121"
The Twitter page in the query is a list of tweets that I want to pull using YQL.
The response in the console contained only the HTML tags and a little of the content of some of them, but not a single tweet from any user was visible in any HTML element of the response in the YQL console window.
I don't know what I am doing wrong.
OK guys, I figured it out. YQL can only scrape content that is present in the loaded HTML, not content that is loaded afterwards by AJAX-like requests, so I had to go with Selenium plus PhantomJS, which can emulate a real browser, navigate to sites, and scrape them.
Anyone looking to scrape AJAX-loaded content can refer to the Selenium docs here; it's very easy to use, with a step-by-step guide to scraping AJAX content.

Grab contents of a webpage in GWT

Let's say that I have a link to a webpage that contains some text. What's the easiest way to grab this text to process?
Thanks.
Long story short, I don't think it's possible to make a request from the client js to grab the text from a url with a different domain.
It is possible to make requests to load json. This link describes how.
Basically, the steps are:
Embed a script tag in the GWT page.
After the GWT page is initialized, update the script tag's src to load the remote URL.
The remote URL returns some JSON data padded inside a callback JavaScript function, such as:
callback({blah:foo})
So, your only option may be writing a method on the server side that loads the URL and gets the text. You could then call this method from the GWT client using the normal RPC technique.
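To make the padded response from the steps above concrete, here is a small sketch in Python (not GWT/Java) of how a server-side helper might strip the callback wrapper and decode the payload. The name unwrap_jsonp and the regular expression are my own assumptions, and the sample uses strictly quoted JSON, unlike the loose callback({blah:foo}) shorthand above.

```python
import json
import re

# Matches "name( ... )" with an optional trailing semicolon.
_JSONP = re.compile(r"^\s*([A-Za-z_$][\w$.]*)\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(body):
    """Split a JSONP response into (callback_name, decoded_payload)."""
    match = _JSONP.match(body)
    if match is None:
        raise ValueError("response is not JSONP-shaped")
    return match.group(1), json.loads(match.group(2))


# Hypothetical response body from a remote JSONP endpoint.
name, payload = unwrap_jsonp('callback({"blah": "foo"});')
```

In the browser the script tag does this unwrapping implicitly by executing the callback; on the server the wrapper has to be removed by hand as shown.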
Assuming same origin: use the "RequestBuilder" class.
If you are trying to grab a webpage from a different origin, then it obviously won't work.

Just can't seem to fetch the mobile Gmail html, what is wrong?

I'm trying to cache the mobile Gmail webpage because UIWebView does not cache the content itself (mobile safari does, but not UIWebView).
I tried the methods listed in "Reading HTML content from a UIWebView", basically saving the HTML either directly from the URLRequest or from the UIWebView itself. When I try to put the saved HTML back into the UIWebView, it is not the same page!
This is the page that I want to save:
http://img39.imageshack.us/img39/5679/screenshot20090830at123.png
This is the page that the saved HTML displays:
http://img39.imageshack.us/img39/8734/screenshot20090830at122.png
If you're loading using loadData:MIMEType:textEncodingName:baseURL: make sure you're setting baseURL correctly - that way, the WebView will know where to look for relative stylesheets and so on.
Edit: For example, if I was saving this page, I'd set the base URL to this page's own URL.
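The effect of the base URL is easy to see outside of UIWebView. The sketch below uses Python's urllib.parse.urljoin (not Objective-C) to resolve the same relative stylesheet path against two different bases; both URLs and the stylesheet path are invented for illustration and are not Gmail's real paths.

```python
from urllib.parse import urljoin

# Hypothetical URLs for illustration only.
original_base = "https://mail.example.com/mail/mu/"
local_base = "http://localhost:8000/cache/"
stylesheet_href = "css/main.css"

# The same relative href points at different files depending on the base.
resolved_original = urljoin(original_base, stylesheet_href)
resolved_local = urljoin(local_base, stylesheet_href)
```

With the original base the stylesheet resolves to the remote server; with a local base it resolves to a local path that most likely has no such file, which would explain a page that renders with different styling.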
That looks like the same page to me, but with different stylesheets attached. If you're just re-displaying identical HTML from your local server, the relative stylesheet paths in Google's HTML would no longer be correct. Also, any AJAX requests meant to run after the page loads would no longer work (both because the relative paths to the scripts would be wrong, and because same-origin restrictions would prevent them from contacting Google).
Attempting to scrape content from an AJAX-enabled application is no small undertaking. You'd have to replicate a lot of Gmail's functionality to truly reproduce the exact page Google presents.