I'd like to either remove an HTML element or simply remove the first N characters of a webpage before evaluating/rendering it.
Is there any way to do that?
It depends on multiple scenarios. I will only outline the steps for each combination of the answers to the following questions.
Is the piece of JS called onload (ol) or is the script block immediately evaluated (ie)?
Is it an inline script (is) or is the script loaded separately (src attribute) (ls)?
Does the script block also contain some code that should not be removed (nr) or can it be removed completely (rc)?
1. Script is loaded separately (ls) & code can be removed completely (rc)
Register to the onResourceRequested listener and request.abort() depending on the matched url.
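A minimal sketch of this step, assuming the unwanted script can be recognised by its file name. The pattern and the name unwanted-script.js are hypothetical; the PhantomJS part only runs when the phantom global exists.

```javascript
// Hypothetical pattern: adjust it to the script URL you want to block.
var blockedPattern = /\/unwanted-script\.js(\?|$)/;

// Pure helper so the matching logic can be checked outside PhantomJS.
function shouldAbort(url) {
    return blockedPattern.test(url);
}

if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    // The second argument is the network request object that offers abort().
    page.onResourceRequested = function (requestData, networkRequest) {
        if (shouldAbort(requestData.url)) {
            console.log('Aborting ' + requestData.url);
            networkRequest.abort();
        }
    };
}
```

The listener has to be registered before page.open() is called, otherwise the request may already have been sent.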
2. Script is loaded separately (ls) & contains other code too (nr)
This can only be done when the following code blocks do not depend on the code that should not be removed (which is unlikely). This is most likely necessary for click events that are registered in the DOM.
In this case, cancel the request as in 1., download the script through an XHR, remove the unwanted code parts and add the resulting code block to the DOM. For this to work, you would need to disable web security, because otherwise no resource can be requested if it is not on the same domain: --web-security=false.
3. Script is loaded with the DOM (is) & JS executed through onload (ol) & can be removed completely (rc)
This is probably very error prone. You would begin an Interval with setInterval(function(){}, 5) from a page.onInitialized callback. Inside the interval you would need to check if window.onload (or something else you can get your hands on) is set in the page context. You remove it, if it is indeed the function that you wanted to remove by checking window.onload.toString().match(/something/).
This can be done directly and completely inside the page context (inside page.evaluate).
4. Script is loaded with the DOM (is) & JS executed through onload (ol) & contains other code too (nr)
Begin like in 3., but instead of removing window.onload, you can do
eval("window.onload = " + window.onload.toString().replace(/something/,''))
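As a rough sketch, the same rewrite can be wrapped in a helper. The name rewriteHandler is made up for illustration, and /something/ is the placeholder pattern from above. Note the assumption: the handler is rebuilt from its source text in the global scope, so any variables it captured through a closure are lost.

```javascript
// Hypothetical helper: rebuild a function from its source with a part removed.
function rewriteHandler(handler, pattern) {
    var source = handler.toString().replace(pattern, '');
    // Wrapping in parentheses turns the source text into a function expression.
    return new Function('return (' + source + ');')();
}

// Only touch window.onload where a page context actually exists.
if (typeof window !== 'undefined' && typeof window.onload === 'function') {
    window.onload = rewriteHandler(window.onload, /something/);
}
```

This only works for handlers whose toString() output is a valid function expression (plain functions and arrow functions), not for native or bound functions.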
5. Script is loaded with the DOM (is) & the script block immediately evaluated (ie)
You can load the page as an XHR, replace the text and apply the adjusted content to the page. This will essentially be a filled about:blank page. For this to work, you would need to disable web security, because otherwise no resource can be requested if it is not on the same domain: --web-security=false or --local-to-remote-url-access=true. This would also work for 3. and 4.
There is still one problem though. Pages don't use full URLs most of the time. So when a script or element refers to stuff.php PhantomJS cannot request it. When the page.content is set then the page URL is essentially about:blank and all requests with incomplete URLs point to file:///.... Obviously there are no such files. Those resources must be replaced with their full URL counterparts.
There are three types of such URLs:
//example.com/resource.php (variable protocol)
/resource.php (variable protocol and domain)
resource.php (variable protocol, domain and path to resource)
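A minimal sketch of such a replacement, assuming it is enough to rewrite src and href attributes with a regular expression. The helper names toAbsolute and absolutizeAttributes are made up for illustration, and references containing ../ or starting with ? are not normalised.

```javascript
// Hypothetical helper: resolve the three kinds of incomplete URLs against the page URL.
function toAbsolute(ref, pageUrl) {
    var m = /^([a-z][a-z0-9+.-]*:)\/\/([^\/?#]+)([^?#]*)/i.exec(pageUrl);
    if (!m || ref === '' || ref.charAt(0) === '#') { return ref; }   // nothing to resolve
    if (/^[a-z][a-z0-9+.-]*:/i.test(ref)) { return ref; }            // already has a scheme
    var protocol = m[1], host = m[2], path = m[3] || '/';
    if (ref.indexOf('//') === 0) { return protocol + ref; }          // //example.com/resource.php
    if (ref.charAt(0) === '/') { return protocol + '//' + host + ref; } // /resource.php
    var dir = path.substring(0, path.lastIndexOf('/') + 1) || '/';
    return protocol + '//' + host + dir + ref;                       // resource.php
}

// Hypothetical helper: rewrite every quoted src/href attribute in an HTML string.
function absolutizeAttributes(html, pageUrl) {
    return html.replace(/\b(src|href)=(["'])(.*?)\2/gi, function (all, attr, quote, ref) {
        return attr + '=' + quote + toAbsolute(ref, pageUrl) + quote;
    });
}
```

The result of absolutizeAttributes(content, url) could then be assigned to page.content instead of the raw response text. URLs inside CSS or built by scripts at runtime are not covered by this sketch.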
Complete example:
var page = require('webpage').create(),
    url = 'http://www.example.com';
page.open(url, function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        phantom.exit(1); // exit here too, otherwise PhantomJS keeps running
    } else {
        // fetch the raw page source with a synchronous XHR from inside the page
        var content = page.evaluate(function(url){
            var xhr = new XMLHttpRequest();
            xhr.open("GET", url, false);
            xhr.send();
            return xhr.responseText;
        }, url);
        page.render("test_example.png");
        // replace the text and apply the adjusted content to the page
        page.content = content.replace(/xample/g,"asy");
        page.render("test_easy.png");
        console.log("url "+page.url); // about:blank
        phantom.exit();
    }
});
You might want to look into proper manipulation techniques apart from the simple string replace.
There are a few similar questions, but none of them have really gotten at what I'm asking.
I have a browser action popup. In the popup, I want to display settings if you're on a page where the content script has been injected (i.e., any page that matches the matches key within the content_scripts in the manifest).
If I'm on a page that doesn't match the content_scripts matches pattern (and so wasn't injected), I just want to display a generic message "this plugin activates when you're on so-and-so sites".
What is the cleanest way to do it, without adding any unnecessary permissions?
It seems like one option is sending a message to a content script in the active tab, and seeing if I get a reply, but that seems really.. hacky. I should be able to know just based on a regex if I'm on one of the domains that matches my content script.
I'm looking for something that works in both manifest v2 and v3, btw.
TL;DR;
What's the simplest way to display a "you're on a page that matches your content_script" or "you're not on a page that matches your content_script" in a browser_action popup?
I build chrome extensions full time for an agency and have had projects where I needed to do exactly what you're asking.
The solution can be implemented w/o any permissions whatsoever. I built mine locally with an empty array for permissions. (for mv3)
For popup.html, just create 2 divs and have them default to display none.
<div id="unsupported" style="display: none;">Ooops! This is not a supported site.</div>
<div id="supported" style="display: none;">Wohoo! This is a supported site!!!!!</div>
For your script.js, wait until the popup loads, then query the active tab in the current window and get that tab's ID to send a message directly to it. If the tab is supported with a content script, it will send a true response (see last code snippet). If it wasn't supported, the response will be undefined.
async function setUI() {
    let tabData = await chrome.tabs.query({ active: true, currentWindow: true })
    let tabId = tabData[0].id // tabs.query returns an array, but we filtered to active tab within current window which yields only 1 object in the array
    chrome.tabs.sendMessage(tabId, {
        'message': 'isSupported'
    }, (response) => {
        // reading chrome.runtime.lastError marks the "no receiver" error as handled when no content script is listening
        if (chrome.runtime.lastError) return showUnsupportedHTML()
        console.log(response)
        // response will be true if the message was successfully sent to the tab and undefined if the message was never received (i.e. not supported w/ your content script)
        if (response) return showSupportedHTML()
        // else
        showUnsupportedHTML()
    })
}
function showSupportedHTML() {
    document.querySelector('#supported').style['display'] = ''
}
function showUnsupportedHTML() {
    document.querySelector('#unsupported').style['display'] = ''
}
window.addEventListener('DOMContentLoaded', () => {
    setUI()
})
Lastly, in your content script, add a message listener to receive the message 'isSupported' that comes in from your popup script. If the content script receives that message, have it send a response back with 'true'.
chrome.runtime.onMessage.addListener(function (request, sender, sendResponse) {
    if (request.message == 'isSupported') {
        console.log('run')
        sendResponse(true)
    }
})
Now, this of course only works for manifest v3 because as far as I know you can't use chrome.tabs.query for mv2. However, I recommend this solution as I've implemented pretty much this exact same code in other projects for clients and it's never had any issues.
I could look into a solution for mv2, though using the "activeTab" permission would be the right way to do it, I believe. Now, if you really don't want to go that route then you could implement a rather hacky solution. For example, you could use window 'focus' and window 'blur' events to see when a user has entered or left a tab. Then set a local storage variable every time a user enters / leaves a supported page. The order of operations for blur and focus is always blur => focus. So, when the blur event occurs you set a local storage variable to false. However, if you leave a supported tab for another supported tab then the 'focus' event will trigger immediately afterwards so you can set that same storage variable back to true.
Now, your content script will load after the tab has been focused so you'll need to add a function for when the page loads. You can run something like document.hidden and if that returns true, do nothing because the user already left this tab. If it returns false, then the user is still on the tab and you can set your local storage variable to true.
When the user opens the popup, you'll check that local storage variable and, depending on whether it's true or false, set the UI accordingly.
Let me know if the mv2 solution made sense or sounds too hacky. Happy to look into it more! :)
edit: Here is the code for mv2, I tested it and it does work and without any permissions, other than storage which is not an invasive permission.
Script.js for the mv2 popup:
async function setUI() {
    chrome.storage.local.get(['isSupported'], function (response) {
        console.log(response['isSupported'])
        // isSupported will be true if the content script on the focused tab set it, and false or undefined otherwise (i.e. not supported w/ your content script)
        if (response['isSupported']) return showSupportedHTML()
        // else
        showUnsupportedHTML()
    })
}
function showSupportedHTML() {
    document.querySelector('#supported').style['display'] = ''
}
function showUnsupportedHTML() {
    document.querySelector('#unsupported').style['display'] = ''
}
window.addEventListener('DOMContentLoaded', () => {
    setUI()
})
code for the content script in mv2:
if (!document.hidden) chrome.storage.local.set({'isSupported': true})
window.addEventListener('blur', () => {
    console.log('left site')
    chrome.storage.local.set({'isSupported': false})
})
window.addEventListener('focus', () => {
    console.log('entered site')
    chrome.storage.local.set({'isSupported': true})
})
Let me know if you have any additional questions.
Disclaimer: I have no prior browser extension development experience and am just going by the docs. I might be spouting nonsense or giving an answer that is plainly against your requirements, but that would be out of ignorance and not malicious intent. If you find my answer problematic, comment, or cast a vote and move on.
According to MDN, the activeTab permission allows to read the active tab's Tab.url property. One solution could be to request that permission, and then use that API to get the active tab's URL, and then use the same regex from the manifest.json's matches property to test for a match, and then use that information to modify your extension's browser_action UI.
You should be able to read the matches property from the manifest file via the .runtime.getManifest() API. MDN docs, chrome docs.
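Snippet to get active tab in a background script: tabs.query({active: true}). (link to MDN docs). An extension page that itself runs inside a tab (such as an options page) can instead use tabs.getCurrent and the Tab.active property of the resolved result; content scripts cannot call the tabs API directly.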
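A rough sketch of the pattern test described above. Match patterns are not regular expressions, so they need converting first. The helper names here are made up for illustration, and the conversion only covers the common scheme/host/path forms; exclude_matches and globs are ignored.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRe(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: convert one manifest match pattern into a RegExp.
function matchPatternToRegExp(pattern) {
    if (pattern === '<all_urls>') { return /^(?:https?|file|ftp):\/\//; }
    var parts = /^(\*|https?|file|ftp):\/\/(\*|\*\.[^\/*]+|[^\/*]*)(\/.*)$/.exec(pattern);
    if (!parts) { throw new TypeError('Unsupported match pattern: ' + pattern); }
    var scheme = parts[1] === '*' ? 'https?' : parts[1];
    var host = parts[2];
    var hostRe = host === '*' ? '[^/]*'
        : host.indexOf('*.') === 0 ? '(?:[^/]+\\.)?' + escapeRe(host.slice(2))
        : escapeRe(host);
    var pathRe = parts[3].split('*').map(escapeRe).join('.*');
    return new RegExp('^' + scheme + '://' + hostRe + pathRe + '$');
}

// Hypothetical helper: test a URL against every content_scripts entry of a manifest object.
function urlMatchesContentScripts(url, manifest) {
    return (manifest.content_scripts || []).some(function (entry) {
        return (entry.matches || []).some(function (pattern) {
            return matchPatternToRegExp(pattern).test(url);
        });
    });
}
```

In the popup this might be called as urlMatchesContentScripts(tab.url, chrome.runtime.getManifest()), where tab is the result of the tabs.query call. Reading tab.url is the part that needs the activeTab permission.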
If you don't want to request the activeTab permission, what you're suggesting with the message-passing between the browser_action scripts and the content scripts might be the right way to go, but I don't know for a fact. The tabs.onActivated event would probably be useful with this approach. Note that to send a message from a background script to a content script, you need to use tabs.sendMessage (MDN docs, chrome docs) instead of runtime.sendMessage.
Another possible (maybe?) approach would be to listen for the tab change in the content script and then send the notification message from the content script to the extension's background scripts via the onfocus event (or similar events), and runtime.sendMessage.
If you go with a messaging-related approach, you might want to put a condition in the content script to only do messaging if the content script is in the top frame of the tab (i.e., iframes don't do messaging), since only one frame of the tab really needs to do this kind of messaging when the active tab changes, and content scripts can be applied to all frames in a browsing context.
Of these possible solutions I can think of, I don't know which is best for you, since you want both minimal permission requirements and a simple/clean approach, and each seems to be a tradeoff.
Essentially I want to have a script execute when the contents of a DIV change. Since the scripts are separate (content script in the Chrome extension & webpage script), I need a way simply observe changes in DOM state. I could set up polling but that seems sloppy.
For a long time, DOM3 mutation events were the best available solution, but they have been deprecated for performance reasons. DOM4 Mutation Observers are the replacement for deprecated DOM3 mutation events. They are currently implemented in modern browsers as MutationObserver (or as the vendor-prefixed WebKitMutationObserver in old versions of Chrome):
var MutationObserver = window.MutationObserver || window.WebKitMutationObserver;
var observer = new MutationObserver(function(mutations, observer) {
    // fired when a mutation occurs
    console.log(mutations, observer);
    // ...
});
// define what element should be observed by the observer
// and what types of mutations trigger the callback
observer.observe(document, {
    subtree: true,
    childList: true,
    attributes: true
    //...
});
This example listens for DOM changes on document and its entire subtree, and it will fire on changes to element attributes as well as structural changes. The draft spec has a full list of valid mutation listener properties:
childList
Set to true if mutations to target's children are to be observed.
attributes
Set to true if mutations to target's attributes are to be observed.
characterData
Set to true if mutations to target's data are to be observed.
subtree
Set to true if mutations to not just target, but also target's descendants are to be observed.
attributeOldValue
Set to true if attributes is set to true and target's attribute value before the mutation needs to be recorded.
characterDataOldValue
Set to true if characterData is set to true and target's data before the mutation needs to be recorded.
attributeFilter
Set to a list of attribute local names (without namespace) if not all attribute mutations need to be observed.
(This list is current as of April 2014; you may check the specification for any changes.)
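For the case in the question (reacting when the contents of one DIV change), a narrower setup might look like the sketch below. The helper names watchElement and describeMutations are made up for illustration; the option and record property names are the ones from the list above and from the MutationRecord interface.

```javascript
// Turn mutation records into short, readable strings.
function describeMutations(records) {
    return records.map(function (record) {
        if (record.type === 'childList') {
            return 'childList: +' + record.addedNodes.length + ' -' + record.removedNodes.length;
        }
        if (record.type === 'attributes') {
            return 'attributes: ' + record.attributeName;
        }
        return record.type; // characterData
    });
}

// Observe content changes (child nodes and text) of a single element.
function watchElement(target, onChange) {
    var observer = new MutationObserver(function (records) {
        onChange(describeMutations(records));
    });
    observer.observe(target, { childList: true, characterData: true, subtree: true });
    return observer; // call observer.disconnect() to stop watching
}
```

In a content script this could be started with watchElement(document.querySelector('#someDiv'), console.log), where #someDiv stands for whatever selector identifies the DIV.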
Edit
This answer is now deprecated. See the answer by apsillers.
Since this is for a Chrome extension, you might as well use the standard DOM event - DOMSubtreeModified. See the support for this event across browsers. It has been supported in Chrome since 1.0.
$("#someDiv").bind("DOMSubtreeModified", function() {
    alert("tree changed");
});
See a working example here.
Many sites use AJAX/XHR/fetch to add, show, modify content dynamically and window.history API instead of in-site navigation so current URL is changed programmatically. Such sites are called SPA, short for Single Page Application.
Usual JS methods of detecting page changes
MutationObserver (docs) to literally detect DOM changes.
Info/examples:
How to change the HTML content as it's loading on the page
Performance of MutationObserver to detect nodes in entire DOM.
Lightweight observer to react to a change only if URL also changed:
let lastUrl = location.href;
new MutationObserver(() => {
    const url = location.href;
    if (url !== lastUrl) {
        lastUrl = url;
        onUrlChange();
    }
}).observe(document, {subtree: true, childList: true});
function onUrlChange() {
    console.log('URL changed!', location.href);
}
Event listener for sites that signal content change by sending a DOM event:
pjax:end on document used by many pjax-based sites e.g. GitHub,
see How to run jQuery before and after a pjax load?
message on window used by e.g. Google search in Chrome browser,
see Chrome extension detect Google search refresh
yt-navigate-finish used by Youtube,
see How to detect page navigation on YouTube and modify its appearance seamlessly?
Periodic checking of DOM via setInterval:
Obviously this will work only in cases when you wait for a specific element identified by its id/selector to appear, and it won't let you universally detect new dynamically added content unless you invent some kind of fingerprinting the existing contents.
Cloaking History API:
let _pushState = History.prototype.pushState;
History.prototype.pushState = function (state, title, url) {
    _pushState.call(this, state, title, url);
    console.log('URL changed', url)
};
Listening to hashchange, popstate events:
window.addEventListener('hashchange', e => {
    console.log('URL hash changed', e);
    doSomething();
});
window.addEventListener('popstate', e => {
    console.log('State changed', e);
    doSomething();
});
P.S. All these methods can be used in a WebExtension's content script. It's because the case we're looking at is where the URL was changed via history.pushState or replaceState so the page itself remained the same with the same content script environment.
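The periodic-check approach from the list above could be sketched as follows. The helper waitFor and its parameters are made up for illustration: it checks once immediately, then polls until the lookup succeeds or the attempts run out.

```javascript
// Poll find() until it returns something truthy, then hand the result to onFound.
// Returns a function that cancels the polling.
function waitFor(find, onFound, intervalMs, maxTries) {
    var tries = 0;
    var timer = null;

    function stop() {
        if (timer !== null) {
            clearInterval(timer);
            timer = null;
        }
    }

    function check() {
        tries += 1;
        var result = find();
        if (result) {
            stop();
            onFound(result);
            return true;
        }
        if (tries >= maxTries) { stop(); } // give up silently
        return false;
    }

    if (!check() && tries < maxTries) {
        timer = setInterval(check, intervalMs);
    }
    return stop;
}
```

In a content script, find would typically be something like function () { return document.querySelector('#some-element'); }, with the selector chosen for the element you are waiting for.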
Another approach depending on how you are changing the div.
If you are using jQuery to change a div's contents with its html() method, you can extend that method and call a registration function each time you put html into a div.
(function( $, oldHtmlMethod ){
    // Override the core html method in the jQuery object.
    $.fn.html = function(){
        // Execute the original HTML method using the
        // augmented arguments collection.
        var results = oldHtmlMethod.apply( this, arguments );
        com.invisibility.elements.findAndRegisterElements(this);
        return results;
    };
})( jQuery, jQuery.fn.html );
We just intercept the calls to html(), call a registration function with this, which in the context refers to the target element getting new content, then we pass on the call to the original jquery.html() function. Remember to return the results of the original html() method, because JQuery expects it for method chaining.
For more info on method overriding and extension, check out http://www.bennadel.com/blog/2009-Using-Self-Executing-Function-Arguments-To-Override-Core-jQuery-Methods.htm, which is where I cribbed the closure function. Also check out the plugins tutorial at JQuery's site.
In addition to the "raw" tools provided by MutationObserver API, there exist "convenience" libraries to work with DOM mutations.
Consider: MutationObserver represents each DOM change in terms of subtrees. So if you're, for instance, waiting for a certain element to be inserted, it may be deep inside the children of mutations[i].addedNodes[j].
Another problem is when your own code, in reaction to mutations, changes DOM - you often want to filter it out.
A good convenience library that solves such problems is mutation-summary (disclaimer: I'm not the author, just a satisfied user), which enables you to specify queries of what you're interested in, and get exactly that.
Basic usage example from the docs:
var observer = new MutationSummary({
    callback: updateWidgets,
    queries: [{
        element: '[data-widget]'
    }]
});
function updateWidgets(summaries) {
    var widgetSummary = summaries[0];
    widgetSummary.added.forEach(buildNewWidget);
    widgetSummary.removed.forEach(cleanupExistingWidget);
}
I want to create an "extension" for a Jacada Interaction (to extend functionality), in my case to parse and assign the numerical part of serialNumber (a letter, followed by digits) to a numeric global ("system") variable, say serialNumeric. What I am lacking is the structure and syntax to make this work, including the way to reference interaction variables from within the extension.
Here is my failed attempt, with lines commented out to make it innocuous after failing; I think I removed "return page;" after crashing, whereupon it still crashed:
initExtensions("serialNumeric", function(app){
app.registerExtension("loaded", function(ctx, page) {
// Place your extension code here
//$('[data-refname="snum"]').val('serialNumber');
// snum = Number(substring(serialNumber,1))
});
});
Here is an example of one that works:
/**
* Description: Add swiping gestures to navigate the next/previous pages
*/
initExtensions("swipe", function(app) {
    // Swipe gestures (mobile only)
    app.registerExtension('pageRenderer', function(ctx, page) {
        page.swipe(function(evt) {
            (evt.swipestart.coords[0] - evt.swipestop.coords[0] > 0)
                ? app.nextButton.trigger('click')
                : app.backButton.trigger('click')
        });
        return page;
    });
});
After reading the comment below, I tried the following, unsuccessfully (the modified question variable is not written back to that variable). It rendered poorly in the comment section, so I am putting it here:
initExtensions("serialNumeric", function(app){
app.registerExtension("loaded", function(ctx, page) {
var sernum = new String($('[data-refname="enter"] input'));
var snumeric = new String(sernum.substr(1));
$('[data-refname="enter"] input').val(snumeric);
});
});
I would like to understand when this code will run: it seems logical that it would run when the variable is assigned. Thanks for any insight ~
In your case, you extend the loaded event. You don't have to return the page from the extension like in your working example.
The page argument contains the DOM of the page you have just loaded; the ctx argument contains the data of the page in JSON form. You can inspect the content of both arguments in the browser's inspection tools. I like Chrome. Press F12 on Windows or Cmd+Option+I on Mac.
The selector $('[data-refname="snum"] input') will get you the input field from the question with the name snum that you defined in the designer. You can then place the value in the input field with the value from the serialNumber variable.
$('[data-refname="snum"] input').val(serialNumber);
You can also read values in the same way.
You can't (at this point) access interaction variables in the extension, unless you place these variables inside question fields.
Here is a simple example of how to put your own value programmatically into an input field and cause it to be read into the model, so that upon next it will be sent to the server. You are welcome to try more sophisticated selectors to accommodate your own form.
initExtensions("sample", function(app){
    app.registerExtension("loaded", function(ctx, page) {
        // simple selector
        var i = $('input');
        // set new value
        i.val('some new value');
        // cause trigger so we can read into our model
        i.trigger('change');
    });
});
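Applied to the serial number case from the question, the same read/write pattern might look like the sketch below. It rests on assumptions: the serial number sits in a question field named "enter", the numeric part should land in a field named "snum", and the helper serialToNumber is made up for illustration. Whether the field already holds a value when the loaded event fires depends on the interaction.

```javascript
// Hypothetical helper: "A12345" -> 12345 (one letter followed by digits), otherwise NaN.
function serialToNumber(serial) {
    var match = /^[A-Za-z](\d+)$/.exec(String(serial).trim());
    return match ? Number(match[1]) : NaN;
}

// Only register the extension where the Jacada initExtensions function exists.
if (typeof initExtensions === 'function') {
    initExtensions("serialNumeric", function (app) {
        app.registerExtension("loaded", function (ctx, page) {
            var source = $('[data-refname="enter"] input'); // assumed field holding the serial
            var target = $('[data-refname="snum"] input');  // assumed field receiving the number
            var numeric = serialToNumber(source.val());
            if (!isNaN(numeric)) {
                target.val(numeric);
                target.trigger('change'); // so the value is read into the model
            }
        });
    });
}
```

Note the use of .val() to read the field: wrapping the jQuery object itself in new String(...), as in the attempt from the question, does not yield the field's text.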
I'm trying to dump the whole contents of a certain site using HtmlUnit, but when I try to do this on a certain (rather intricate) site, I get an empty file (not an empty file per se, but it has an empty head tag, an empty body tag and that's it).
The site is https://www.abcdin.cl/abcdin/abcdin.nsf#https://www.abcdin.cl/abcdin/abcdin.nsf/linea?openpage&cat=Audio&cattxt=TV%20y%20Audio&catpos=03&linea=LCD&lineatxt=LCD%20&
And here's my code:
BufferedWriter writer = new BufferedWriter(new FileWriter(fullOutputPath));
HtmlPage page;
final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_8);
webClient.setCssEnabled(false);
webClient.setPopupBlockerEnabled(true);
webClient.setRedirectEnabled(true);
webClient.setThrowExceptionOnScriptError(false);
webClient.setThrowExceptionOnFailingStatusCode(false);
webClient.setUseInsecureSSL(true);
webClient.setJavaScriptEnabled(true);
page = webClient.getPage(url);
dumpString += page.asXml();
writer.write(dumpString);
writer.close();
webClient.closeAllWindows();
Some people say that I need to introduce a pause in my code, since the page takes a while to load in Google Chrome, but I set long pauses and it doesn't work.
Thanks in advance.
Just some ideas...
Retrieving that URL with wget returns a non-trivial HTML file. So does running your code with webClient.setJavaScriptEnabled(false). So it's definitely something to do with the Javascript in the page.
With Javascript enabled, I see from the logs that a bunch of Javascript jobs are being queued up, and I see corresponding errors like this:
EcmaError: lineNumber=[49] column=[0] lineSource=[<no source>] name=[TypeError] sourceName=[https://www.abcdin.cl/js/jquery/jquery-1.4.2.min.js] message=[TypeError: Cannot read property "nodeType" from undefined (https://www.abcdin.cl/js/jquery/jquery-1.4.2.min.js#49)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "nodeType" from undefined (https://www.abcdin.cl/js/jquery/jquery-1.4.2.min.js#49)
at
com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:601)
Maybe those jobs are meant to populate your HTML? So when they fail, the resulting HTML is empty?
The error looks strange, as HtmlUnit usually has no issues with JQuery. I suspect the issue is with the code calling that particular line of the JQuery library.
I tried to ask this once, but I think that my former question was too unclear for you guys to answer, so I'll try again.
I'm making a website using the Zend Framework, and am trying to include the premade messageboard Phorum. So far, I've made it work by not running it through my bootstrap using my .htaccess file. What I'd like to do is run it through my bootstrap so that I can use my previously created Layouts and Classes that I can only run through Zend.
For example, I have a premade sign in system that works through Zend_Auth. I have the person's data saved in Zend_Session. I load the user's profile through a controller. I have a service layer for the model that connects to my database on behalf of the user. There are several other dependencies that, as far as I can tell, I need the bootstrap for.
Phorum is basically just a large set of PHP scripts that are dependent on GET parameters. My original idea had been to use a controller to render the scripts. An example of what that URI would look like is this: My-Site.com/messageboard/list.php?1,3 with messageboard being the messageboardController. While this works for loading list, it can't capture the GET parameters, which Phorum is dependent on. Due to the complex nature of Phorum, it would be nearly impossible for me to be able to go in and make it something like My-Site.com/messageboard/list/1/3 or anything along those lines. The URI has to be the former, as it is built in to Phorum.
I have tried using frames. I got to keep my log in panel up top, and had the body of the page be a frame, but it was unbookmarkable, and the back button made everything outrageously difficult. I also couldn't get the frame to talk to the parent page in Zend well, so frames aren't an option.
Does anyone have a way that I can do this? What I need, in essence, is to take the script (ex. list.php?1,3) and place whatever it would render, after having used the 1,3 parameters, into a div in the "body" div of my layout. As far as I can tell, render doesn't seem to be able to capture the GET parameters. Does anyone know of a way I can do this?
Any ideas would be immeasurably appreciated. Thank you for your help!
This isn't a trivial thing to process, however, it is possible to write a custom route, along with some controller magic to handle this sort of thing and include the proper php file:
First of all - Your route should probably be (in ZF1.9 application.ini conventions)
resources.router.routes.phorum.type = "Zend_Controller_Router_Route_Regex"
resources.router.routes.phorum.route = "messageboard(?:/(.*))?"
resources.router.routes.phorum.defaults.controller = "phorum"
resources.router.routes.phorum.defaults.action = "wrapper"
resources.router.routes.phorum.defaults.module = "default"
resources.router.routes.phorum.defaults.page = "index.php"
resources.router.routes.phorum.map.1 = "page"
Now all requests to messageboard/whatever.php should be routed to PhorumController::wrapperAction() and have 'whatever.php' in $this->getRequest()->getParam('page')
Then it should become a simple matter of redirecting your "wrapper" action to include the proper php file from phorum. I have added some code from a similar controller I have (although mine didn't include php files - it was meant solely for serving a directory of content)
public function wrapperAction() {
    $phorumPath = APPLICATION_PATH . "/../ext/phorum/";
    $file = realpath($phorumPath . $this->getRequest()->getParam('page'));
    if (!$file || !is_file($file)) throw new Exception("File not found");
    // disable default viewRenderer - layout should still render at this point
    $this->_helper->viewRenderer->setNoRender(true);
    // determine extension to determine mime-type
    preg_match("#\.([^.]+)$#", $file, $matches);
    switch (strtolower($matches[1]))
    {
        case "php":
            // patch the request over to phorum
            include($file);
            return; // exit from the rest of the handler, which deals specifically
                    // with other types of files
        case "js":
            $this->getResponse()->setHeader('Content-Type', 'text/javascript');
            ini_set('html_errors', 0);
            break;
        case "css":
            $this->getResponse()->setHeader('Content-Type', 'text/css');
            ini_set('html_errors', 0);
            break;
        case "html":
            $this->getResponse()->setHeader('Content-Type', 'text/html');
            break;
        // you get the idea... add any others like gif/etc that may be needed
        default:
            $this->getResponse()->setHeader('Content-Type', 'text/plain');
            ini_set('html_errors', 0);
            break;
    }
    // Disable Layout
    $this->_helper->layout->disableLayout();
    // Sending 304 cache headers if the file hasn't changed can be a bandwidth saver
    $mtime = filemtime($file);
    if ($modsince = $this->getRequest()->getServer('HTTP_IF_MODIFIED_SINCE'))
    {
        $modsince = new Zend_Date($modsince);
        $modsince = $modsince->getTimestamp();
        if ($mtime <= $modsince) {
            $this->getResponse()->setHttpResponseCode(304);
            return;
        }
    }
    $this->getResponse()->setHeader('Last-Modified', gmdate("D, d M Y H:i:s",$mtime). " GMT");
    readfile($file);
}
Please - Make sure to test this code for people trying to craft requests with .., etc in the page.