I'm writing a Chrome extension that injects scripts into Google's search results page and modifies all the results' anchor elements.
My problem is that the results are rendered asynchronously and are not shown in the page on document load/ready.
I had two initial solutions, neither of which works:
Set a timeout: bad practice, but it works. Nevertheless, it might produce inconsistent results, so I prefer to avoid this solution.
Bind to 'DOMNodeInserted': generally works, but it is more complicated in my case because I insert new nodes myself before the anchors, which triggers a recursion. I can add code to skip anchors that are already 'tagged', but this solution is still bad, since I need to traverse all the anchors each time a node is inserted; from what I checked, this happens more than 140 times on the search results page.
Is there any kind of custom event Google triggers on the search results page? Is there any other DOM event that can work in this case?
You are right in that using "DOMNodeInserted" is not a good approach. If nothing else, it is part of the obsolete Mutation Events API, which has been deprecated (among other reasons) for being notoriously inefficient.
It has been replaced by the MutationObserver API, so this is what you should use instead. You can utilize a MutationObserver to observe "childList" DOM mutations on a root node and its descendants.
(If you choose this approach the mutation-summary library might also come in handy.)
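In its most basic form, the API looks like this (a minimal sketch; the callback simply logs added nodes):
var observer = new MutationObserver(function(mutations) {
    mutations.forEach(function(mutation) {
        console.log(mutation.addedNodes); // NodeList of nodes added by this mutation
    });
});
/* Watch the whole document for added/removed children, at any depth */
observer.observe(document.documentElement, { childList: true, subtree: true });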
After a (really shallow) search, I found out that (at least for me) Google places its results in a div with id search. Below is the code of a sample extension that does the following:
Registers a MutationObserver to detect the insertion of div#search into the DOM.
Registers a MutationObserver to detect "childList" changes in div#search and its descendants.
Whenever an <a> node is added, a function traverses the relevant nodes and modifies the links. (The script ignores <script> elements for obvious reasons.)
This sample extension just encloses the link's text in ~~, but you can easily change it to do whatever you need.
manifest.json:
{
"manifest_version": 2,
"name": "Test Extension",
"version": "0.0",
"content_scripts": [{
"matches": [
...
"*://www.google.gr/*",
"*://www.google.com/*"
],
"js": ["content.js"],
"run_at": "document_end",
"all_frames": false
}]
}
content.js:
console.log("Injected...");
/* MutationObserver configuration data: Listen for "childList"
* mutations in the specified element and its descendants */
var config = {
childList: true,
subtree: true
};
var regex = /<a.*?>[^<]*<\/a>/;
/* Traverse 'rootNode' and its descendants and modify '<a>' tags */
function modifyLinks(rootNode) {
var nodes = [rootNode];
while (nodes.length > 0) {
var node = nodes.shift();
if (node.tagName == "A") {
/* Modify the '<a>' element */
node.innerHTML = "~~" + node.innerHTML + "~~";
} else {
/* If the current node has children, queue them for further
* processing, ignoring any '<script>' tags. */
[].slice.call(node.children).forEach(function(childNode) {
if (childNode.tagName != "SCRIPT") {
nodes.push(childNode);
}
});
}
}
}
/* Observer1: Looks for 'div#search' */
var observer1 = new MutationObserver(function(mutations) {
/* For each MutationRecord in 'mutations'... */
mutations.some(function(mutation) {
/* ...if nodes have been added... */
if (mutation.addedNodes && (mutation.addedNodes.length > 0)) {
/* ...look for 'div#search' */
var node = mutation.target.querySelector("div#search");
if (node) {
/* 'div#search' found; stop observer 1 and start observer 2 */
observer1.disconnect();
observer2.observe(node, config);
if (regex.test(node.innerHTML)) {
/* Modify any '<a>' elements already in the current node */
modifyLinks(node);
}
return true;
}
}
});
});
/* Observer2: Listens for '<a>' elements insertion */
var observer2 = new MutationObserver(function(mutations) {
mutations.forEach(function(mutation) {
if (mutation.addedNodes) {
[].slice.call(mutation.addedNodes).forEach(function(node) {
/* If 'node' or any of its descendants are '<a>'... */
if (regex.test(node.outerHTML)) {
/* ...do something with them */
modifyLinks(node);
}
});
}
});
});
/* Start observing 'body' for 'div#search' */
observer1.observe(document.body, config);
In general, you can use mutation observers to listen for document changes. To avoid recursion, simply disconnect the mutation observer before changing the document, then enable it again.
Conceptually, this is not much different from the DOMNodeInserted event, so you could also remove the event listener, insert your nodes, then rebind the event listener. However, mutation observers are more efficient, so you should use them instead of the DOM mutation events.
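For example, a minimal sketch of that disconnect/modify/reconnect pattern (the target and config here are illustrative):
var config = { childList: true, subtree: true };
var observer = new MutationObserver(function(mutations) {
    /* Stop observing before mutating, so our own insertions don't re-trigger us */
    observer.disconnect();
    mutations.forEach(function(mutation) {
        /* ...insert or modify nodes here... */
    });
    /* Resume observing */
    observer.observe(document.body, config);
});
observer.observe(document.body, config);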
In this specific case (Google's search results), you can also use the hashchange event to detect when Google has rendered new search results. This method is only useful because there's a correlation between the location fragment, the search terms and the search results:
The user enters a search term and presses Enter.
The search results are updated.
The location fragment is changed (https://www.google.com/search?q=old#q=<new term>).
Example:
// On document load
printResult();
// Whenever the search term is changed
window.addEventListener('hashchange', function(event) {
printResult();
});
function printResult() {
// Example: Print first search result
console.log(document.querySelector('h3 a').href);
}
In short: which is the most memory- and cost-efficient way to use Firestore snapshot listeners: unsubscribing them on every screen unmount, or keeping the unsubscribe functions in context and unsubscribing only when the whole site "unmounts"?
Let's say that on the home screen I use a snapshot listener for the collection "events", which has 100 documents. Now I navigate through the site and return to the home screen 2 more times while using it. In this case, which option is better in terms of memory and cost (and are there other things to consider), and are there drawbacks?
To mount and unmount the listener on each mount and unmount of the home screen.
To mount on the home screen and unmount when the whole site "unmounts" (for example using window.addEventListener('beforeunload', handleSiteClose)).
The first is probably familiar to most, but the second could be done with something like this:
- Saving the listener's unsubscribe function in context, with the collection name as key:
const { listenerHolder, setListenerHolder } = DataContext();
useEffect(() => {
const newListeners = anyDeepCopyFunction(listenerHolder);
const collection = 'events';
if (listenerHolder[collection] === undefined) {
//listenerBaseComponent would be function to establish listener and return unsubscribe function
const unSub = listenerBaseComponent();
if (unSub)
newListeners[collection] = unSub;
}
if (Object.entries(newListeners).length !== Object.entries(listenerHolder).length) {
setListenerHolder(newListeners);
}
}, []);
- Unsubscribing all listeners (in a component that wraps all screens and is unmounted only when the whole site is closed):
const { listenerHolder, setListenerHolder } = DataContext();
const handleTabClosing = () => {
Object.entries(listenerHolder).forEach(item => {
const [key, value] = item;
if (typeof value === 'function')
value();
});
setListenerHolder({});
}
useEffect(() => {
window.addEventListener('beforeunload', handleTabClosing)
return () => {
window.removeEventListener('beforeunload', handleTabClosing)
}
})
In both cases the home screen shows the most recent data from the "events" collection, but in my understanding...
- The first approach creates a listener on the "events" collection 3 times, so 3 × 100 read operations are performed.
- The second approach creates a listener on the "events" collection once, so 1 × 100 read operations are performed.
If storing the unsubscribe function in context is possible, and all listener unsubscriptions are handled at once on site unmount or logout, doesn't this make using listeners this way super easy, more maintainable, and more cost-efficient? If I needed data from the "events" collection on any other screen, I would not have to do a get call or create a new listener, because I would always have the latest data from "events" while the site is in use. I would just check whether the collection name is a key in the global state "listenerHolder"; if it is, the most up-to-date data for events is always available.
Since there was no information from others about this use case, I did some testing myself, jumping from the "home screen" to another screen and back multiple times. The home screen has about 150 items and the second screen 65.
The results are from Firebase, Cloud Firestore usage tab:
This is the result of reads from that jumping: 654 (1.52pm-1.53pm) + 597 (1.53pm-1.54pm) = 1251 reads.
Then I tested the same jumping back and forth using the global context listeners: 61 (1.59pm-2.00pm) + 165 (2.00pm-2.01pm) = 226 reads.
So using listeners in global context results in significantly fewer reads. How much fewer depends on how many times new listeners would need to be recreated in a normal use case.
I have not yet tested memory usage thoroughly enough to compare these two cases, but if I do, I will add the results here for others to benefit.
The deprecated @cloudant/cloudant package has been replaced by the @ibm-cloud/cloudant package. In the former I was using the following code snippet:
const feed = dummyDB.follow({ include_docs: true, since: 'now'})
feed.on('change', function (change) {
console.log(change)
})
feed.on('error', function (err) {
console.log(err)
})
feed.filter = function (doc, req) {
if (doc._deleted || doc.clusterId === clusterID) {
return true
}
return false
}
Could you share code that would give me a feed.on-style event listener, similar to the above, with the new npm package @ibm-cloud/cloudant?
There isn't an event emitter for changes in the @ibm-cloud/cloudant package right now. You can emulate the behaviour by either:
polling postChanges (updating the since value after new results) and processing the response result property, which is a ChangesResult. That in turn has a results property that is an array of ChangesResultItem elements, each of which is equivalent to the change argument of the event handler function.
or
calling postChangesAsStream with a feed type of continuous and processing the stream returned in the response result property, each line of which is a JSON object that follows the structure of ChangesResultItem. In this case you'd also probably want to configure a heartbeat and timeouts.
In both cases you'd need to handle errors to reconnect in the event of network glitches etc.
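For example, a minimal sketch of the streaming option (the database name, the clusterID variable, and the client-side filtering mirror the original snippet and are assumptions; the client reads its credentials from the environment):
const { CloudantV1 } = require('@ibm-cloud/cloudant');
const readline = require('readline');

const client = CloudantV1.newInstance({});
const clusterID = process.env.CLUSTER_ID; // assumption: taken from the environment

async function follow() {
  const { result: stream } = await client.postChangesAsStream({
    db: 'dummydb', // assumption: your database name
    feed: 'continuous',
    includeDocs: true,
    since: 'now',
    heartbeat: 5000,
  });
  // Each non-empty line is a JSON object following the ChangesResultItem
  // structure; heartbeats arrive as empty lines.
  readline.createInterface({ input: stream }).on('line', (line) => {
    if (!line.trim()) return;
    const change = JSON.parse(line);
    // Client-side equivalent of the old filter function:
    if (change.deleted || (change.doc && change.doc.clusterId === clusterID)) {
      console.log(change);
    }
  });
  stream.on('error', (err) => console.log(err));
}

follow();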
I'm wrapping an API that emits events in Observables and currently my datasource code looks something like this, with db.getEventEmitter() returning an EventEmitter.
const Datasource = {
getSomeData() {
return Observable.fromEvent(db.getEventEmitter(), 'value');
}
};
However, to actually use this, I need to both memoize the function and have it return a ReplaySubject; otherwise each subsequent call to getSomeData() would reinitialize the entire sequence and create more event emitters, or have no data until the next update, which is undesirable. So my code looks a lot more like this for every function:
let someDataCache = null;
const Datasource = {
getSomeData() {
if (someDataCache) { return someDataCache; }
const subject = new ReplaySubject(1);
Observable.fromEvent(db.getEventEmitter(), 'value').subscribe(subject);
someDataCache = subject;
return subject;
}
};
which ends up being quite a lot of boilerplate for just one single function, and becomes more of an issue when there are more parameters.
Is there a better/more elegant design pattern to accomplish this? Basically, I'd like the following:
Only one event emitter is created.
Callers who call the datasource later get the most recent result.
The event emitters are created when they're needed.
but right now I feel like this pattern is fighting the Observable pattern, resulting in a bunch of boilerplate.
As a follow-up to this question, I ended up commonizing the logic to leverage Observables in this way. publishReplay, as cartant mentioned, does get me most of the way to what I needed. I've documented what I've learned in this post, with the following tl;dr code:
let first = true
Rx.Observable.create(
observer => {
const callback = data => {
first = false
observer.next(data)
}
const event = first ? 'value' : 'child_changed'
db.ref(path).on(event, callback, error => observer.error(error))
return {event, callback}
},
(handler, {event, callback}) => {
db.ref(path).off(event, callback)
},
)
.map(snapshot => snapshot.val())
.publishReplay(1)
.refCount()
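In the simplest case, where no event switching is needed, the same pattern reduces to a sketch like this (assuming RxJS 5 with the full rxjs/Rx import):
const someData$ = Rx.Observable.defer(() => Rx.Observable.fromEvent(db.getEventEmitter(), 'value'))
  .publishReplay(1) // share one subscription and replay the latest value
  .refCount();      // connect on the first subscriber, tear down after the last
defer keeps the event emitter from being created until someone actually subscribes, which covers the "created when they're needed" requirement without any manual cache.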
Say you have a file with:
AddReactImport();
And the plugin:
export default function ({types: t }) {
return {
visitor: {
CallExpression(p) {
if (p.node.callee.name === "AddReactImport") {
// add import if it's not there
}
}
}
};
}
How do you add import React from 'react'; at the top of the file/tree if it's not there already?
I think more important than the answer is how you find out how to do it. Please tell me, because I'm having a hard time finding information on how to develop Babel plugins. My sources right now are: the Plugin Handbook, Babel Types, the AST Spec, this blog post, and the AST explorer. It feels like using an English-German dictionary to try to speak German.
export default function ({types: t }) {
return {
visitor: {
Program(path) {
const identifier = t.identifier('React');
const importDefaultSpecifier = t.importDefaultSpecifier(identifier);
const importDeclaration = t.importDeclaration([importDefaultSpecifier], t.stringLiteral('react'));
path.unshiftContainer('body', importDeclaration);
}
}
};
}
If you want to inject code, just use @babel/template to generate the AST node for it; then inject it as you need to.
Preamble: Babel documentation is not the best
I also agree that, even in 2020, information is sparse. I am getting most of my info by actually working through the Babel source code, looking at all the tools (types, traverse, path, code-frame, etc.), the helpers they use, existing plugins (e.g. istanbul, to learn a bit about basic instrumentation in JS), the webpack babel-loader, and more...
For example: unshiftContainer (and babel-traverse in general) has no official documentation, but you can find its source code here (fascinatingly enough, it accepts either a single node or an array of nodes!)
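For example, both of these calls are valid (the nodes here are hypothetical):
path.unshiftContainer('body', someImportDeclaration);
path.unshiftContainer('body', [firstNode, secondNode]);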
Strategy #1 (updated version)
In this particular case, I would:
Create a @babel/template
prepare that AST once at the start of my plugin
inject it into Program (i.e. the root path) once, only if the particular function call has been found
NOTE: Templates also support variables. Very useful if you want to wrap existing nodes or want to produce slight variations of the same code, depending on context.
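For instance, a hypothetical template with a placeholder (uppercase identifiers are treated as placeholders by default; t here stands for @babel/types):
const buildLog = template(`console.log(MESSAGE);`);
const logNode = buildLog({ MESSAGE: t.stringLiteral("hello") });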
Code (using Strategy #1)
import template from "@babel/template";
// template
const buildImport = template(`
import React from 'react';
`);
// plugin
const plugin = function () {
const importDeclaration = buildImport();
let imported = false;
let root;
return {
visitor: {
Program(path) {
root = path;
/* Reset per file: this plugin-level state would otherwise persist across files */
imported = false;
},
CallExpression(path) {
if (!imported && path.node.callee.name === "AddMyImport") {
// add import if it's not there
imported = true;
root.unshiftContainer('body', importDeclaration);
}
}
}
};
};
Strategy #2 (old version)
An alternative is:
use a utility function to generate an AST from source (parseSource)
prepare that AST once at the start of my plugin
inject it into Program (i.e. the root path) once, only if the particular function call has been found
Code (using Strategy #2)
Same as above, but with your own compiler function (not as efficient as @babel/template):
import { parse } from "@babel/parser";
import { codeFrameColumns } from "@babel/code-frame";
import traverse from "@babel/traverse";
/**
 * Helper: Generate AST from source through `@babel/parser`.
 * Copied from somewhere... I think it was `@babel/traverse`.
 * @param {*} source
 */
export function parseSource(source) {
let ast;
try {
source = `${source}`;
ast = parse(source);
} catch (err) {
const loc = err.loc;
if (loc) {
err.message +=
"\n" +
codeFrameColumns(source, {
start: {
line: loc.line,
column: loc.column + 1,
},
});
}
throw err;
}
const nodes = ast.program.body;
nodes.forEach(n => traverse.removeProperties(n));
return nodes;
}
Possible Pitfalls
When a new node is injected/replaced etc., Babel will run all plugins on it again. This is why your first instrumentation plugin is likely to encounter an infinite loop right off the bat: you want to remember and not re-visit previously visited nodes (I'm using a Set for that; a sketch follows below).
It gets worse when wrapping nodes. Wrapped nodes (e.g. with @babel/template) are actually copies, not the original node. In that case, you want to remember that the node is instrumented and skip it if you come across it again, or, again: infinite loop 💥!
If you don't want to instrument nodes that have been emitted by any plugin (not just yours), that is, you want to operate only on the original source code, you can skip them by checking whether they have a loc property (injected nodes usually do not have one).
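A minimal sketch of that bookkeeping (the WeakSet and the instrumentation site are illustrative):
const visited = new WeakSet();
const plugin = function () {
  return {
    visitor: {
      CallExpression(path) {
        if (visited.has(path.node)) return; // already handled: avoid the infinite loop
        visited.add(path.node);
        // ...instrument the node here...
      }
    }
  };
};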
In your case, you are trying to add an import statement, which won't always work without the right plugins enabled or without sourceType set to module.
I believe there's an even better way now: @babel/helper-module-imports
For you the code would be
import { addDefault } from "@babel/helper-module-imports";
addDefault(path, 'react', { nameHint: "React" });
I have a paged interface. Given a starting point, a request will produce a list of results and a continuation indicator.
I've created an observable that is built by constructing and flat-mapping an observable that reads the page. The result of this observable contains both the data for the page and a value to continue with. I pluck the data and flat-map it to the subscriber, producing a stream of values.
To handle the paging I've created a subject for the next-page values. It's seeded with an initial value; then, each time I receive a response with a valid next page, I push to the pages subject and trigger another read, until there is no more to read.
Is there a more idiomatic way of doing this?
function records(start = 'LATEST', limit = 1000) {
let pages = new rx.Subject();
this.connect(start)
.subscribe(page => pages.onNext(page));
let records = pages
.flatMap(page => {
return this.read(page, limit)
.doOnNext(result => {
let next = result.next;
if (next === undefined) {
pages.onCompleted();
} else {
pages.onNext(next);
}
});
})
.pluck('data')
.flatMap(data => data);
return records;
}
That's a reasonable way to do it. It has a couple of potential flaws in it (that may or may not impact you depending upon your use case):
You provide no way to observe any errors that occur in this.connect(start)
Your observable is effectively hot. If the caller does not immediately subscribe to the observable (perhaps they store it and subscribe later), then they'll miss the completion of this.connect(start) and the observable will appear to never produce anything.
You provide no way to unsubscribe from the initial connect call if the caller changes its mind and unsubscribes early. Not a real big deal, but usually when one constructs an observable, one should try to chain the disposables together so it all cleans up properly if the caller unsubscribes.
Here's a modified version:
It passes errors from this.connect to the observer.
It uses Observable.create to create a cold observable that only starts its business when the caller actually subscribes, so there is no chance of missing the initial page value and stalling the stream.
It combines the this.connect subscription disposable with the overall subscription disposable.
Code:
function records(start = 'LATEST', limit = 1000) {
return Rx.Observable.create(observer => {
let pages = new Rx.Subject();
let connectSub = new Rx.SingleAssignmentDisposable();
let resultsSub = new Rx.SingleAssignmentDisposable();
let sub = new Rx.CompositeDisposable(connectSub, resultsSub);
// Make sure we subscribe to pages before we issue this.connect()
// just in case this.connect() finishes synchronously (possible if it caches values or something?)
let results = pages
.flatMap(page => this.read(page, limit))
.doOnNext(r => r.next !== undefined ? pages.onNext(r.next) : pages.onCompleted())
.flatMap(r => r.data);
resultsSub.setDisposable(results.subscribe(observer));
// now query the first page
connectSub.setDisposable(this.connect(start)
.subscribe(p => pages.onNext(p), e => observer.onError(e)));
return sub;
});
}
Note: I've not used the ES6 syntax before, so hopefully I didn't mess anything up here.