I can access the entire HTML of any page by opening DevTools and typing:
document.documentElement
I am trying to replicate the same behavior using Puppeteer, but the snippet below returns {}:
const puppeteer = require('puppeteer'); // v 1.1.0
const iPhone = puppeteer.devices['Pixel 2 XL'];
async function start(canonical_url) {
  const browserURL = 'http://127.0.0.1:9222';
  const browser = await puppeteer.connect({browserURL});
  const page = await browser.newPage();
  await page.emulate(iPhone);
  await page.goto(canonical_url, {
    waitUntil: 'networkidle2',
  });
  const data = await page.evaluate(() => document.documentElement);
  console.log(data);
}
returns:
{}
Any idea on what I could be doing wrong here?
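A likely cause: the value returned from page.evaluate() has to be serialized and sent back from the browser to Node, and a DOM element such as document.documentElement does not survive that step, which matches the {} you see. A minimal sketch, assuming page is the Puppeteer Page from start() after page.goto() has resolved, returns the markup as a string instead (getPageHtml and getPageContent are hypothetical helper names):

```javascript
// Sketch: return the page's markup as a string rather than as a DOM node.
// Assumption: `page` is a Puppeteer Page that has already finished page.goto().
async function getPageHtml(page) {
  // The callback runs inside the browser; only its serializable return value
  // comes back to Node. outerHTML is a plain string, so it survives the trip,
  // whereas document.documentElement itself arrives as {}.
  return page.evaluate(() => document.documentElement.outerHTML);
}

// Alternative: page.content() returns the full serialized HTML, including the doctype.
async function getPageContent(page) {
  return page.content();
}
```

In start(), replacing the evaluate call with const data = await getPageHtml(page); should then log the HTML.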
I added my App ID to my React.js app like this:
import * as Realm from "realm-web";
const REALM_APP_ID = "memeified_data-knivd";
const app = new Realm.App({ id: REALM_APP_ID });
My getAllData function works fine when I run it in the MongoDB App terminal.
But when I use the following code:
const [mainData, setData] = useState(null)
useEffect(() => {
  const fetchData = async () => {
    const data = await user.functions.getAllData()
    setData(data);
  }
  fetchData()
    .catch(console.error);
}, [])
The code returns this console error:
TypeError: Cannot read properties of null (reading 'functions')
What could go wrong here?
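The snippet never shows where user comes from. Assuming it is meant to be app.currentUser, that property is null until someone has logged in, which matches the error. A minimal sketch, assuming anonymous authentication is enabled for the app (fetchAllData is a hypothetical helper name), logs in first when needed:

```javascript
// Sketch: make sure a user is logged in before calling a Realm function.
// Assumptions: `app` is the Realm.App created above, and `credentials` is something like
// Realm.Credentials.anonymous() (only valid if anonymous auth is enabled for the app).
async function fetchAllData(app, credentials) {
  // app.currentUser is null until someone has logged in, which is what makes
  // `user.functions` throw "Cannot read properties of null".
  const user = app.currentUser ?? (await app.logIn(credentials));
  return user.functions.getAllData();
}
```

Inside the component, the effect can then call fetchAllData(app, Realm.Credentials.anonymous()).then(setData).catch(console.error).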
I have a hapijs project which is using the hapi-mongodb plugin.
In the handler, I use the hapi-mongodb plugin to make database calls:
internals.getById = async (request, h) => {
  try {
    const db = request.mongo.db;
    const ObjectId = request.mongo.ObjectID;
    const query = {
      _id: ObjectId(request.params.id)
    };
    const record = await db.collection(internals.collectionName).findOne(query);
    //etc.....
I want to test this using server.inject(), but I am not sure how to stub request.mongo.db and request.mongo.ObjectID:
it('should return a 200 HTTP status code', async () => {
  const server = new Hapi.Server();
  server.route(Routes); //This comes from a required file
  const options = {
    method: 'GET',
    url: `/testData/1`
  };
  //stub request.mongo.db and request.mongo.ObjectID
  const response = await server.inject(options);
  expect(response.statusCode).to.equal(200);
});
Any ideas?
I worked this out: the mongo plugin decorates the server object, and that decoration can be stubbed.
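Since the handler reads request.mongo, a sketch of that idea decorates the request in the test instead of registering the plugin. It assumes hapi-mongodb is not registered on the test server, and that server.decorate('request', ...) accepts a plain object as the decoration value in your hapi version. createMongoStub and stubMongo are hypothetical helper names, and the stub only implements the calls the getById handler makes:

```javascript
// Sketch: a fake object shaped like the parts of request.mongo the handler uses.
// Hypothetical helper; `records` is an in-memory array standing in for the collection.
function createMongoStub(records) {
  return {
    // Stand-in for ObjectID: the handler calls ObjectId(id), so return the id unchanged.
    ObjectID: (id) => id,
    db: {
      // The collection name is ignored; every collection reads from `records`.
      collection: (name) => ({
        // Mimics findOne({ _id }) by matching on _id; resolves to null when nothing matches.
        findOne: async (query) => records.find((r) => r._id === query._id) ?? null,
      }),
    },
  };
}

// Assumption: `server` is the hapi server under test and hapi-mongodb is NOT registered,
// so this decoration is the only thing that defines request.mongo.
function stubMongo(server, records) {
  server.decorate('request', 'mongo', createMongoStub(records));
}
```

In the test above, calling stubMongo(server, [{ _id: '1' }]) after server.route(Routes) and before server.inject(options) should let /testData/1 resolve against the in-memory record.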
I am trying to change my request URL and see the new URL in the response:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', interceptedRequest => {
    if (interceptedRequest.url().includes('some-string')) {
      interceptedRequest.respond({
        status: 302,
        headers: {
          url: 'www.new.url.com'
        },
      })
    }
    interceptedRequest.continue()
  });
  page.on('response', response => {
    console.log(response.url())
  })
  await page.goto('www.orginal.url.com')
  // some code omitted
})();
In the interceptedRequest.respond() call I'm trying to update the URL. Originally I tried:
interceptedRequest.continue({url: 'www.new.url.com'})
but that approach is no longer supported in the current version of Puppeteer.
I expected to get www.new.url.com in the response, but instead I get the original URL with www.new.url.com appended to the end.
Thanks in advance for any help.
This worked for me: change the url header to location.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', interceptedRequest => {
    if (interceptedRequest.url().includes('some-string')) {
      interceptedRequest.respond({
        status: 302,
        headers: {
          // Use an absolute URL: without a scheme it is resolved relative to the original URL
          location: 'https://www.new.url.com'
        },
      })
    } else {
      // Every other intercepted request must still be continued, or it will hang
      interceptedRequest.continue()
    }
  });
  page.on('response', response => {
    console.log(response.url())
  })
  await page.goto('https://www.orginal.url.com')
  // some code omitted
})();
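The same fix can be packaged as a reusable listener. A sketch, assuming targetUrl is an absolute URL: a Location value without a scheme, such as www.new.url.com, is a relative reference, so the browser resolves it against the original URL, which would explain seeing it appended to the end. createRedirectHandler is a hypothetical name; respond() and continue() are the Puppeteer request methods already used above:

```javascript
// Sketch: redirect matching requests and let every other request through.
// Assumption: targetUrl is absolute (with a scheme); a bare "www.new.url.com"
// in a Location header is resolved relative to the original URL.
function createRedirectHandler(matchSubstring, targetUrl) {
  return (interceptedRequest) => {
    if (interceptedRequest.url().includes(matchSubstring)) {
      // Answer with a redirect; the browser then follows the Location header.
      interceptedRequest.respond({
        status: 302,
        headers: { location: targetUrl },
      });
      return; // a request can be handled only once, so skip continue()
    }
    // Unmatched requests must be continued (or aborted), otherwise they stall.
    interceptedRequest.continue();
  };
}
```

page.on('request', createRedirectHandler('some-string', 'https://www.new.url.com')) would then replace the inline listener.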
I am using Puppeteer to generate PDF files from HTML strings.
Reading the documentation, I found two ways of generating the PDF files:
The first is to navigate to a URL with the goto method:
await page.goto('https://example.com');
await page.pdf({format: 'A4'});
The second, which is my case, is to call the setContent method:
await page.setContent('<p>Hello, world!</p>');
await page.pdf({format: 'A4'});
The thing is that I have 3 different HTML strings sent from the client, and I want to generate a single PDF file with 3 pages (one page per HTML string).
Is there a way to do this with Puppeteer? I'm open to other suggestions, but I need to use headless Chrome.
I was able to do this as follows:
Generate 3 different PDFs with Puppeteer. You can either save each file locally or keep it in a variable.
I saved the files locally because all the PDF merge plugins I found only accept file paths, not buffers. After generating the PDFs locally, one after another, I merged them using PDF Easy Merge (the easy-pdf-merge package).
The code looks like this:
const path = require('path');
const puppeteer = require('puppeteer');
const pdfMerge = require('easy-pdf-merge');
const page1 = '<h1>HTML from page1</h1>';
const page2 = '<h1>HTML from page2</h1>';
const page3 = '<h1>HTML from page3</h1>';
(async () => {
  const browser = await puppeteer.launch();
  const tab = await browser.newPage();
  // Render each HTML string and write it to its own PDF file, one after another
  await tab.setContent(page1);
  await tab.pdf({ path: './page1.pdf' });
  await tab.setContent(page2);
  await tab.pdf({ path: './page2.pdf' });
  await tab.setContent(page3);
  await tab.pdf({ path: './page3.pdf' });
  await browser.close();
  // Merge the three files into a single PDF
  pdfMerge([
    './page1.pdf',
    './page2.pdf',
    './page3.pdf',
  ],
  path.join(__dirname, './mergedFile.pdf'), (err) => {
    if (err) return console.log(err);
    console.log('Successfully merged!');
  });
})().catch(console.error);
I was able to generate a single merged PDF from multiple URLs with the code below:
package.json
{
  ............
  ............
  "dependencies": {
    "puppeteer": "^1.1.1",
    "easy-pdf-merge": "0.1.3"
  }
  ..............
  ..............
}
index.js
const puppeteer = require('puppeteer');
const merge = require('easy-pdf-merge');
const pdfUrls = ["http://www.google.com", "http://www.yahoo.com"];
const mergeMultiplePDF = (pdfFiles) => {
  return new Promise((resolve, reject) => {
    merge(pdfFiles, 'samplefinal.pdf', function (err) {
      if (err) {
        console.log(err);
        return reject(err); // stop here so 'Success' is not logged after a failure
      }
      console.log('Success');
      resolve();
    });
  });
};
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const pdfFiles = [];
  for (let i = 0; i < pdfUrls.length; i++) {
    await page.goto(pdfUrls[i], {waitUntil: 'networkidle2'});
    const pdfFileName = 'sample' + (i + 1) + '.pdf';
    pdfFiles.push(pdfFileName);
    await page.pdf({path: pdfFileName, format: 'A4'});
  }
  await browser.close();
  await mergeMultiplePDF(pdfFiles);
})().catch(console.error);
Run it with: node index.js
pdf-merger-js is another option; it accepts the buffers returned by page.pdf(), so no temporary files are needed. page.setContent should work as a drop-in replacement for page.goto in the code below:
const PDFMerger = require("pdf-merger-js"); // 3.4.0
const puppeteer = require("puppeteer"); // 14.1.1
const urls = [
  "https://news.ycombinator.com",
  "https://en.wikipedia.org",
  "https://www.example.com",
  // ...
];
const filename = "merged.pdf";
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const merger = new PDFMerger();
  for (const url of urls) {
    await page.goto(url);
    merger.add(await page.pdf());
  }
  await merger.save(filename);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
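If the only goal is one PDF with one page per HTML string, a merge library may not be needed at all. A sketch, assuming each string is an HTML fragment rather than a full document: join the fragments into one document with CSS page breaks between them and print it once. combineHtmlPages and htmlStringsToPdf are hypothetical helper names, and a fragment longer than one page will still flow onto extra pages:

```javascript
// Sketch: render several HTML fragments into ONE PDF, one fragment per printed page,
// by joining them with CSS page breaks instead of merging separate PDF files.
// Assumption: each string is an HTML fragment (not a full <html> document).
function combineHtmlPages(htmlStrings) {
  const pages = htmlStrings.map((html, i) => {
    // Break after every fragment except the last, so no blank trailing page is added.
    const isLast = i === htmlStrings.length - 1;
    const style = isLast ? '' : ' style="page-break-after: always;"';
    return `<div${style}>${html}</div>`;
  });
  return `<!DOCTYPE html><html><body>${pages.join('')}</body></html>`;
}

// Assumption: `page` is a Puppeteer Page; resolves to the PDF buffer from page.pdf().
async function htmlStringsToPdf(page, htmlStrings) {
  await page.setContent(combineHtmlPages(htmlStrings));
  return page.pdf({ format: 'A4' });
}
```

With the three strings from the question, htmlStringsToPdf(page, [page1, page2, page3]) should produce a single three-page A4 PDF without writing intermediate files.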
I'm trying to get all styles for all nodes on a page, and for that I want to use CSS.getMatchedStylesForNode from the DevTools protocol, but it only works for one node. If I loop through an array of nodes, I get a lot of warnings in the console (below) and nothing is returned. What am I doing wrong?
Warnings in the console:
(node:5724) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 11): Error: Protocol error (CSS.getMatchedStylesForNode): Target closed.
My code:
'use strict';
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page._client.send('DOM.enable');
  await page._client.send('CSS.enable');
  const doc = await page._client.send('DOM.getDocument');
  const nodes = await page._client.send('DOM.querySelectorAll', {
    nodeId: doc.root.nodeId,
    selector: '*'
  });
  const styleForSingleNode = await page._client.send('CSS.getMatchedStylesForNode', {nodeId: 3});
  const stylesForNodes = nodes.nodeIds.map(async (id) => {
    return await page._client.send('CSS.getMatchedStylesForNode', {nodeId: id});
  });
  console.log(JSON.stringify(stylesForNodes));
  console.log(JSON.stringify(styleForSingleNode));
  await browser.close();
})();
Puppeteer version: 0.13.0
Platform: Windows 10
Node: 8.9.3
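It works with a for...of loop. The async callbacks passed to map() return promises that are never awaited, so JSON.stringify() only sees pending promises (each serialized as {}) and browser.close() runs while the requests are still in flight, which produces the "Target closed" errors: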
const stylesForNodes = []
for (const id of nodes.nodeIds) {
  stylesForNodes.push(await page._client.send('CSS.getMatchedStylesForNode', {nodeId: id}));
}
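If the requests can run concurrently, awaiting all of the promises from map() also fixes it. A sketch, assuming client is the CDP session from the question (page._client, a private API in that Puppeteer version; page.target().createCDPSession() is the public way to get one) with DOM and CSS already enabled; getStylesForNodes is a hypothetical helper name:

```javascript
// Sketch: fetch matched styles for many nodes concurrently and wait for all of them.
// Assumption: `client` is a CDP session with DOM and CSS already enabled.
async function getStylesForNodes(client, nodeIds) {
  // map() starts one request per node and yields an array of promises;
  // Promise.all waits for every one of them, so nothing is still pending
  // when the browser is closed afterwards.
  return Promise.all(
    nodeIds.map((nodeId) => client.send('CSS.getMatchedStylesForNode', { nodeId })),
  );
}
```

Calling const stylesForNodes = await getStylesForNodes(page._client, nodes.nodeIds); before browser.close() gives the same results as the loop. The for...of version sends one request at a time, while Promise.all sends them all at once, which is usually faster but puts more load on the browser for pages with many nodes.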