Chrome's documentation states:
The --dump-dom flag prints document.body.innerHTML to stdout.
As per the title, how can more of the DOM object (ideally all) be dumped with Chromium headless? I can manually save the entire DOM via the developer tools, but I want a programmatic solution.
Update 2019-04-23: Google has been very active on the headless front, and many updates have happened since.
The answer below was written for v62; the current version is v73, and Chrome keeps updating. For the current release schedule, see:
https://www.chromestatus.com/features/schedule
I highly recommend checking out Puppeteer for any future development with headless Chrome. It is maintained by Google, and the npm package installs a matching Chrome version, so you just use the Puppeteer API as documented and don't have to worry about Chrome versions or about wiring up the connection between headless Chrome and the DevTools API, which is what enables 99% of the magic.
Repo: https://github.com/GoogleChrome/puppeteer
Docs: https://pptr.dev/
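For example, dumping the full DOM of a page with Puppeteer takes only a few lines. A minimal sketch (page.content() returns the full serialized HTML, including <head> and <title>):

// Minimal sketch: dump the full DOM of a page with Puppeteer.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();   // bundled Chromium, headless by default
  const page = await browser.newPage();
  await page.goto('https://www.chromestatus.com/', { waitUntil: 'networkidle0' });
  const html = await page.content();          // full serialized DOM, not just <body>
  console.log(html);
  await browser.close();
})();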
Update 2017-10-29: Chrome's --dump-dom flag now returns the full HTML, not only the body.
v62 has it, and v62 is already on the stable channel.
The issue that fixed this: https://bugs.chromium.org/p/chromium/issues/detail?id=752747
Current Chrome status (version per channel): https://www.chromestatus.com/features/schedule
I'm leaving the old answer below for reference.
You can do it with chrome-remote-interface. I have tried it, and after wasting a couple of hours trying to launch Chrome and get the full HTML, including the title, I would say it just isn't ready yet.
It works sometimes, but when I ran it in a production environment I got errors from time to time: all kinds of random errors such as connection reset and no Chrome found to kill. They came up intermittently and were hard to debug.
I personally use --dump-dom to get the HTML when I need the body, and when I need the title I just use curl for now. Of course, Chrome can give you the title of an SPA, which curl alone cannot do if the title is set from JS. I will switch to headless Chrome once there is a stable solution.
I would love to have a --dump-html flag in Chrome that just returns all of the HTML. If a Google engineer is reading this, please add such a flag to Chrome.
I've created an issue on the Chromium issue tracker; please star it so it gets noticed by the Google developers:
https://bugs.chromium.org/p/chromium/issues/detail?id=752747
Here is a long list of all kinds of Chrome flags (I'm not sure whether it is complete):
https://peter.sh/experiments/chromium-command-line-switches/
Nothing there dumps the title tag.
This code is from Google's blog post; you can try your luck with it:
const CDP = require('chrome-remote-interface');

...

(async function() {
  const chrome = await launchChrome();
  const protocol = await CDP({port: chrome.port});

  // Extract the DevTools protocol domains we need and enable them.
  // See API docs: https://chromedevtools.github.io/devtools-protocol/
  const {Page, Runtime} = protocol;
  await Promise.all([Page.enable(), Runtime.enable()]);

  Page.navigate({url: 'https://www.chromestatus.com/'});

  // Wait for window.onload before doing stuff.
  Page.loadEventFired(async () => {
    const js = "document.querySelector('title').textContent";
    // Evaluate the JS expression in the page.
    const result = await Runtime.evaluate({expression: js});

    console.log('Title of page: ' + result.result.value);

    protocol.close();
    chrome.kill(); // Kill Chrome.
  });
})();
Source:
https://developers.google.com/web/updates/2017/04/headless-chrome
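The launchChrome() helper elided above comes from the same blog post; as far as I recall it is a thin wrapper around the chrome-launcher package, roughly like the sketch below (check the post for the exact version):

const chromeLauncher = require('chrome-launcher');

// Launch a headless Chrome instance; resolves with an object exposing
// the debugging port and a kill() method, as used in the snippet above.
function launchChrome(headless = true) {
  return chromeLauncher.launch({
    chromeFlags: [
      '--disable-gpu',
      headless ? '--headless' : ''
    ]
  });
}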
You are missing the --headless flag; it is needed to get the dump on stdout:
chromium --incognito \
--proxy-auto-detect \
--temp-profile \
--headless \
--dump-dom https://127.0.0.1:8080/index.html
Pipe it all into html2text to convert the HTML into plain text.
I managed to use PowerShell 7.1 with the Selenium WebDriver module to control the Chrome browser. I now need to access the network response of a POST request that is invoked after a button is clicked by Selenium.
I found some good info here, here, here, and here; however, it is for Python and Java, so I need to convert some of the code to PowerShell. I hope you can help me out.
The code snippet below, from one of the above-mentioned sources, is where I am having difficulties:
...
options = webdriver.ChromeOptions()
options.add_argument("--remote-debugging-port=8000")
driver = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=options)
dev_tools = pychrome.Browser(url="http://localhost:8000")
tab = dev_tools.list_tab()[0]
tab.start()
...
Specifically, this part: dev_tools = pychrome.Browser(url="http://localhost:8000")
Below is another code snippet I got from one of the above-mentioned sources:
ChromeDriver driver = new ChromeDriver();
DevTools devTool = driver.getDevTools();
devTool.createSession();
devTool.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty()));
devTool.addListener(Network.responseReceived(), <lambda-function>);
So it is clear that Selenium 4 has support for DevTools, but I cannot find the main C# documentation so that I can use it from PowerShell.
Finally, after accessing the response, I need to verify it using PowerShell Pester.
I appreciate your help.
I've built a wee program that works fine when I run it locally. I've deployed the backend to Heroku, and I can access it either by going straight to the URL (http://gymbud-tracker.herokuapp.com/users) or by running the frontend locally. So far so good.
However, when I run npm run-script build and deploy it to Netlify, something goes wrong, and any attempt to access the server gives me the following error in the console:
auth.js:37 Error: Network Error
at e.exports (createError.js:16)
at XMLHttpRequest.p.onerror (xhr.js:99)
The action that triggers the error is the following, in case it is relevant:
export const signin = (formData, history) => async (dispatch) => {
  try {
    const { data } = await api.signIn(formData);

    dispatch({ type: AUTH, data });
    history.push("../signedin");
  } catch (error) {
    console.log(error);
  }
};
I've been tearing my hair out trying to work out what is changing when I build and deploy, but cannot work it out.
As I say, if I run the front end locally then it accesses the Heroku backend with no problem: no errors, working exactly as I'd expect. The API call is correct, I believe: const API = axios.create({ baseURL: 'http://gymbud-tracker.herokuapp.com/' });
I wondered if it was an issue with network access to the MongoDB database that Heroku is linked to, but it is open to 0.0.0.0/0 (I've not taken any security precautions yet, don't kill me!). The MongoDB database is actually in the same cluster as other projects I've used, which haven't had this problem at all.
Any ideas what I need to do?
The front end is live here: https://gym-bud.netlify.app/
And the whole thing is deployed to GitHub here: https://github.com/gordonmaloney/gymbud
Your issue is CORS (Cross-Origin Resource Sharing). When I visit your site and inspect the page, I see a CORS error in the JavaScript console, which is how I know this.
This error essentially means that your public-facing application (running live on Netlify) is making an HTTP request from your JavaScript front end to your Heroku backend, which is deployed on a different domain.
CORS dictates which front-end origins are allowed to make requests to your backend API.
What you need to do to fix this is modify your Heroku application so it returns the appropriate Access-Control-Allow-Origin header. This article on MDN explains the header and how you can use it.
Here's a simple example of the header you could set on your Heroku backend to allow this to work:
Access-Control-Allow-Origin: *
Please be sure to read the MDN documentation, however, as this example will allow any front-end application to make requests to your Heroku backend when in reality, you'll likely want to restrict it to just the front-end domains you build.
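As a sketch of how that could look (assuming the Heroku backend is an Express app, which the question doesn't show), the cors middleware can set this header for you, restricted to the Netlify origin:

// Sketch only: assumes an Express backend (npm install express cors).
const express = require('express');
const cors = require('cors');

const app = express();

// Allow cross-origin requests only from the deployed front end.
app.use(cors({ origin: 'https://gym-bud.netlify.app' }));

// ...existing routes such as /users go here...

app.listen(process.env.PORT || 5000);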
God, I feel so daft, but at least I've worked it out.
I looked at the console in a different browser (Edge), and it said the request was being blocked as mixed content. I realised I had simply missed the s in https in my API call, so it wasn't actually a CORS issue (I don't think?), just a typo on my part!
So I changed:
const API = axios.create({baseURL: 'http://gymbud-tracker.herokuapp.com' });
To this:
const API = axios.create({baseURL: 'https://gymbud-tracker.herokuapp.com' });
And now it is working perfectly ☺️
Thanks for your help! Even if it wasn't the issue here, I've definitely learned a lot more about CORS along the way, so that's good.
I started learning data science with IBM and enrolled in a course where we had to sign up for IBM Cloud (Lite) and then create Watson Studio (Lite). I just followed the documentation, and after creating Watson Studio (Lite) I got this error:
api.eu-gb.dataplatform.cloud.ibm.com's server IP address could not be found
It prevents me from seeing the "Get started" button, which I must click on.
I tried to find a solution but didn't find anything; I also talked with Watson in the support chatbox, but the solution Watson gave didn't help me.
Link to course:
https://developer.ibm.com/digitalnation/africa/skills/innovator-predict-employee-turnover-using-ibm-watson-studio/?module=02.03
For more info, see the screenshot I attached.
Investigation
I am unable to reproduce the issue.
The Watson Studio Instance Dashboard loads successfully for me in the London data center.
I highly suspect that your current web browser settings might be preventing the iframe'd content area from loading.
Suggested Follow-up Actions
(a) Try disabling your Ad blocker and reloading the page (ad blockers are known to block content from loading in iframes).
(b) Try loading the page in a new incognito window (this will disable all browser extensions).
(c) Try loading the page in another web browser (e.g. Firefox, Edge, Safari).
(d) Try clearing your cookies for the cloud.ibm.com domain and reloading the page.
(e) In your web browser, type Control+Shift+J (on Windows) or Command+Option+J (on Mac) to open the Developer Tools and take a screenshot of any errors you see in the Console tab (and post it back here).
Reference: https://developers.google.com/web/tools/chrome-devtools/shortcuts
(f) Try visiting the direct URL below to see if the page loads.
Direct URL
https://api.eu-gb.dataplatform.cloud.ibm.com/dsx-service-broker/ui/console?plan_id=40073cbd-2d60-4a65-a32d-3b1d11794cc6
Work-around
Go directly to the Watson Studio Registration Page (which is the URL for the Get Started button)
https://eu-gb.dataplatform.cloud.ibm.com/registration/steptwo?redirectIfAccountExists=True&apps=watson_studio
Can you run dig api.eu-gb.dataplatform.cloud.ibm.com or nslookup api.eu-gb.dataplatform.cloud.ibm.com? I am guessing it's an issue with the DNS provider on your current network. Can you try another network or Wi-Fi connection? For reference, here is the nslookup I just ran:
nslookup api.eu-gb.dataplatform.cloud.ibm.com
Server: 10.xxx.xxx.xxx
Address: 10.xxx.xxx.xxx#53
Non-authoritative answer:
api.eu-gb.dataplatform.cloud.ibm.com canonical name = api.watson-data-prod-lon.eu-gb.containers.appdomain.cloud.
api.watson-data-prod-lon.eu-gb.containers.appdomain.cloud canonical name = watson-data-prod-lon.eu-gb.containers.appdomain.cloud.
Name: watson-data-prod-lon.eu-gb.containers.appdomain.cloud
Address: 158.175.100.214
Name: watson-data-prod-lon.eu-gb.containers.appdomain.cloud
Address: 158.176.125.222
Name: watson-data-prod-lon.eu-gb.containers.appdomain.cloud
Address: 141.125.66.126
Try using a browser with a built-in VPN, such as Opera. It worked for me; I'm from Egypt and had the same error.
I created a project in the Actions console and made a test action package for a smart home app. I want to try uploading the action package using gactions. However, every time I execute this command:
./gactions --verbose update --action_package action.json --project my_project_id
the result is always like this:
Unable to update: Patch https://actions.googleapis.com/v2/agents/my_project_id?updateMask=agent.draftActionPackage.actions%2Cagent.draftActionPackage.conversations&validateOnly=false: Post https://accounts.google.com/o/oauth2/token: dial tcp 216.58.200.45:443: i/o timeout
I checked the verbose log and noticed that it reads some data from creds.data:
Reading credentials from: creds.data
Then I noticed that creds.data contains the access token and its expiry time, but the expiry time is July 18, which is many days from now. I am not sure whether this is what causes the timeout error, and I also don't know how to refresh creds.data to get a new access token.
Alright, I noticed that part of this error was a network problem on my side. I was able to open Yahoo and other sites while the update just didn't work, but never mind, I switched to a different Wi-Fi.
Then I deleted creds.data and executed the update command again, and this came out:
Gactions needs access to your Google account. Please copy & paste the URL below into a web browser and follow the instructions there. Then copy and paste the authorization code from the browser back here.
Visit this URL:
https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=237807841406-o6vu1tjkq8oqjub8jilj6vuc396e2d0c.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fassistant+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Factions.builder&state=state
Enter authorization code:
Then I followed the instructions above, got the authorization code, copied and pasted it in the console, and everything works fine now.
I have set up the Facebook plugin following the instructions:
Downloaded the package and installed it in app/plugins/facebook
Created app/config/facebook.php with my app's ID, key, and secret, based on the example config file
Included $helpers = array('Facebook.Facebook') in my app_controller.php
Echoed $this->Facebook->html() in my layout (replacing the default html tag)
Echoed $this->Facebook->init() at the bottom of the layout, before the closing </body> tag
I run this code:
echo $this->Facebook->share('link');
If I go to http://myhomepage.com it works, but if I go to the same page over HTTPS I just get a plain text share link. In IE 10 and Chrome I get an unsafe (mixed) content warning; if I accept it, it works.
How do I run it over SSL? I have bought the certificate, so it is valid and not a home-made self-signed one. I am running CakePHP version 2.
I have tried searching the web, but I only find problems with the login function, not with the share button.
Please see the links below for reference:
http://www.afbtemplates.com/tutorials/common-problems-facebook-secure-connections
http://developers.facebook.com/blog/post/497/
Do I need to support ssl on my site that allows login through facebook connect
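The usual cause of that mixed-content warning is the Facebook JavaScript SDK (or the share widget's assets) being loaded over plain http on an https page. Below is a minimal sketch of loading the SDK with a protocol-relative URL so it always matches the page's scheme; the app ID is a placeholder, and the markup your CakePHP Facebook helper actually emits may differ:

// Sketch: load the Facebook JS SDK over the same scheme as the page.
window.fbAsyncInit = function () {
  FB.init({
    appId: 'YOUR_APP_ID',  // placeholder: use your real app ID
    xfbml: true            // parse social plugins such as the share button
  });
};

(function (d, s, id) {
  var js, fjs = d.getElementsByTagName(s)[0];
  if (d.getElementById(id)) return;
  js = d.createElement(s);
  js.id = id;
  // A protocol-relative URL resolves to https:// on an https page,
  // which avoids the mixed-content warning.
  js.src = '//connect.facebook.net/en_US/all.js';
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));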