Detecting changes in a web page's content - diff

I'm looking for an approach (not a library or framework, as I can't seem to find one) to detecting changes in a web page's content. I've taken a look at posts similar to Tracking changes to web page content, and white papers (http://shodhganga.inflibnet.ac.in/bitstream/10603/2415/14/14_chapter%205.pdf), but I'm having trouble figuring out a good approach.
I don't believe an md5 of a page's content is useful, as the content of most pages changes slightly depending on when you request it (e.g. if they hardcode the day's date).
Additionally, I'd like to figure out a way to determine what content has actually changed (eg. running a diff on the content that is seen as different is good enough, but I would first need to figure out which content is different).
http://www.changedetection.com/ seems to do a solid job of this.
Any approaches or ideas or links would be appreciated.
Thanks.

The HTTP Last-Modified and ETag headers may be a way to go. However, if the web pages you crawl do not implement them, you are left with text similarity detection.
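For the text-similarity fallback, one possible starting point is to strip the markup, compare the visible text chunks of two snapshots, and report only the chunks that differ. This is a minimal sketch, not a complete solution. The names `visible_lines` and `compare_pages` are made up for this illustration, and only the Python standard library is used.

```python
import difflib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only, with runs of whitespace collapsed.
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def visible_lines(html):
    """Return the page's visible text as a list of normalized chunks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def compare_pages(old_html, new_html):
    """Return a similarity ratio plus the chunks removed and added."""
    old, new = visible_lines(old_html), visible_lines(new_html)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return {"similarity": matcher.ratio(), "removed": removed, "added": added}
```

A caller could then decide what counts as a real change, for example by ignoring a result whose similarity stays above some chosen threshold, or by filtering out chunks that only contain a date.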

Extracting data from background - Facebook

I am inspecting a page on Facebook now. These descriptors indicate that some interesting data is being gathered on the users, but I can't find it.
Is there a way to extract the data (i.e. userborn, males, females, gpslocation) from the response below?
["user","page","group","app","event","friendlist","shortcut"],"browse_functions":{"intersect":{"numParamsUnbounded":true,"minNumParams":1,"maxNumParams":100,"allowsFreeText":false},"fuzzy-intersect":{"numParamsUnbounded":true,"minNumParams":1,"maxNumParams":100,"allowsFreeText":false},"union":{"numParamsUnbounded":true,"minNumParams":1,"maxNumParams":100,"allowsFreeText":false},"fbids":{"numParamsUnbounded":true,"minNumParams":1,"maxNumParams":100,"allowsFreeText":false},"story-fbids":{"numParamsUnbounded":true,"minNumParams":1,"maxNumParams":100,"allowsFreeText":false},"all":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"pages":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"present":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":1,"allowsFreeText":false},"past":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":1,"allowsFreeText":false},"future":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":1,"allowsFreeText":false},"ever":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":1,"allowsFreeText":false},"ever-past":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":1,"allowsFreeText":false},"class":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"date":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"after":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"before":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"duration-past":{"numParamsUnbounded":false,"minNumParams":2,"maxNumParams":2,"allowsFreeText":false},"duration-future":{"numParamsUnbounded":false,"minNumParams":2,"maxNumParams":2,"allowsFreeText":false},"users-age":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeT
ext":false},"users-younger":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"users-older":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"users-born":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"users-interested":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-named":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":true},"users-birth-place":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"females":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":0,"allowsFreeText":false},"males":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":0,"allowsFreeText":false},"members":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"friends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"online-friends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"non-friends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"acquaintances":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"close-friends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"restricted-friends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"followers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-followed":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"creators":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"admins":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"contacts":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText
":false},"groups":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"non-groups":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"groups-privacy":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"groups-named":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":true},"groups-about":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"communities":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"communities-named":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":true},"relatives":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"siblings":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"brothers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"sisters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"parents":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"fathers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"mothers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"children":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"sons":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"daughters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"aunts":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"uncles":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"nieces":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"nephews":{"numParamsUnbounded":false,"m
inNumParams":1,"maxNumParams":1,"allowsFreeText":false},"cousins":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"grandchildren":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"grandsons":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"granddaughters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"grandparents":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"grandmothers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"grandfathers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepsiblings":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepsisters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepbrothers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepparents":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepfathers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepmothers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepchildren":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepdaughters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"stepsons":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"sisters-in-law":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"brothers-in-law":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"fathers-in-law":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"mothers-in-law":{"numParamsUn
bounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"sons-in-law":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"daughters-in-law":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"partners":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"boyfriends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"girlfriends":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-any-relationship":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-dating":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-relationship":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-open-relationship":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"spouses":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"fiances":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-its-complicated":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-civil-union":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"domestic-partners":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"wives":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"husbands":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"students":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":7,"allowsFreeText":false},"employees":{"numParamsUnbounded":false,"minNumParams":0,"maxNumParams":5,"allowsFreeText":false},"major":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"
degree":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"job":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"schools-attended":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":4,"allowsFreeText":false},"school-location":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"high-schools-attended":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":4,"allowsFreeText":false},"colleges-attended":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":4,"allowsFreeText":false},"grad-schools-attended":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":4,"allowsFreeText":false},"employer-location":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"employers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":3,"allowsFreeText":false},"residents-near":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":4,"allowsFreeText":false},"home-residents":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"hometowns":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"residents":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":2,"allowsFreeText":false},"current-cities":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"current-regions":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"current-countries":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-of-nationalities":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"nationalities":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"speakers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"languages":{"numParamsUnbounded":false,"minNumPar
ams":1,"maxNumParams":1,"allowsFreeText":false},"likers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"exact-page-likers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"job-liker-union":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"listeners":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"readers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"watchers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"actors":{"numParamsUnbounded":false,"minNumParams":2,"maxNumParams":2,"allowsFreeText":false},"page_raters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"commenters":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-religious-view":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-political-view":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"admirers":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"religious-views":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"political-views":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"visitors":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"users-checked-in":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-checked-in":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-visited":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"recent-places-visited":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-recommende
d-for":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-reviewed":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"pages-in":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-in":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-near":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":4,"allowsFreeText":false},"places-liked":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":false},"places-named":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":true},"places-near-address":{"numParamsUnbounded":false,"minNumParams":1,"maxNumParams":1,"allowsFreeText":true},"gps-location":
Sorry, but there isn't any information in there. You just copied and pasted a bunch of useless code.
There is no hidden information; those hashes are used only for Facebook's own needs, and there's nothing we can actually do to fully analyse them.
Well, we can obviously parse that data, but what purpose would it serve? You don't even know what, for instance, numParamsUnbounded means. What do you want to do once you have it parsed? You'll just have the same thing.
By the way, it's HTML, not HTLM, and this code isn't even HTML.
@Jake_Mill22 these are parameters used by the Facebook Graph and its search. It's basically a graph-traversal-based search that goes on behind the scenes, and it is also based on set theory. Your search consists of filters on entities (people, places, pages, groups, etc.) by their semantic relationships to one another. For example:
https://www.facebook.com/search/str/rock%20music/pages-named/likers/str/stack%20overflow/pages-named/likers/intersect
This says: show me all people who like pages on Facebook with "rock music" in their names and who also like other pages with "Stack Overflow" in their names.
You can put together some pretty interesting queries. Check here for more info:
http://booleanblackbelt.com/2015/02/important-facebook-graph-search-developments/
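Going only by the example URL above, such a query appears to be a path of `str/<text>/<filter>/...` segments followed by a set operator such as `intersect`. A small sketch that assembles the same URL follows. The helper names `term` and `build_query` are hypothetical, and Facebook may change or remove this URL scheme at any time.

```python
from urllib.parse import quote

BASE = "https://www.facebook.com/search"


def term(text, *filters):
    """One free-text term followed by the filters applied to it."""
    return "/".join(["str", quote(text), *filters])


def build_query(terms, operator):
    """Join several terms and finish with a set operator (e.g. 'intersect')."""
    return "/".join([BASE, *terms, operator])


url = build_query(
    [
        term("rock music", "pages-named", "likers"),
        term("stack overflow", "pages-named", "likers"),
    ],
    "intersect",
)
print(url)
```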

per-paragraph commenting system

I'm very interested in the emerging trend of comments-per-paragraph systems (also called "annotation systems"), such as the ones implemented by medium.com and qz.com, and I'm looking at the idea of developing one of my own.
Question: it seems they are mainly implemented via JavaScript that runs through the text's HTML paragraphs, each uniquely identified by an id attribute (or, in the case of Medium, a name attribute). Does it mean their CMS actually stores each paragraph as a separate entry in the database? That seems overly complex to me, but otherwise, how do they manage the fact that a paragraph can be deleted, edited or moved around in the overall text? How would the unique id be preserved if the author changes the paragraph?
How is that unique id logically structured? (post_id + position_in_post)?
Thank you for your insights...
I can't speak to the medium side, but as one of the developers for Quartz, I can give insight into how qz.com annotations work.
The annotations code is custom PHP code and is independent of the CMS for publishing articles (WordPress VIP). We do indeed store a reference to each paragraph as a row in the database, in order to track any updates to the article content. We call this an annotation thread, and when a user saves an annotation the threadId gets stored along with the annotation.
We do not have a unique id stored on the WordPress side for each paragraph; instead we store the paragraph's relative position in that article (nodeIndex "3" and nodeSelector "p" == the third p-tag in the content body for a given article), and the JavaScript determines where exactly to place the annotation block. We went this route to avoid heavier customizations on the WordPress side, though depending on your CMS it may be easier to address this directly in the CMS code and add unique ids in the HTML before sending it to the client.
Every time an update to an article is published, each paragraph in the updated article is compared against what was previously stored with the annotation threads for that article. If the position and paragraph text do not match up, the code attempts to find the paragraph that is the closest match and updates the row for that thread; new threads are created and old ones deleted where appropriate. All of this is handled server side whenever changes are published to an article.
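To make the re-matching step more concrete, here is a rough sketch of how such a pass could look. It is not Quartz's actual code (which is PHP). The thread dictionary layout, the 0-based `node_index`, the `cutoff` value and the function name `rematch_threads` are all assumptions made for this illustration.

```python
import difflib


def rematch_threads(threads, new_paragraphs, cutoff=0.6):
    """Re-anchor annotation threads after an article update.

    threads: list of dicts with 'thread_id', 'node_index' and 'text'.
    new_paragraphs: paragraph texts of the updated article, in order.
    Returns (updated_threads, orphaned_thread_ids).
    """
    claimed = set()
    updated, pending, orphaned = [], [], []

    # First pass: keep threads whose position and text are both unchanged.
    for thread in threads:
        i = thread["node_index"]
        if i < len(new_paragraphs) and i not in claimed \
                and new_paragraphs[i] == thread["text"]:
            claimed.add(i)
            updated.append(dict(thread))
        else:
            pending.append(thread)

    # Second pass: look for the closest textual match among free paragraphs.
    for thread in pending:
        best_i, best_score = None, 0.0
        for i, text in enumerate(new_paragraphs):
            if i in claimed:
                continue
            score = difflib.SequenceMatcher(None, thread["text"], text).ratio()
            if score > best_score:
                best_i, best_score = i, score
        if best_i is not None and best_score >= cutoff:
            claimed.add(best_i)
            updated.append({**thread, "node_index": best_i,
                            "text": new_paragraphs[best_i]})
        else:
            orphaned.append(thread["thread_id"])

    return updated, orphaned
```

Paragraphs that no thread claims would simply get a new thread the first time someone annotates them. What to do with orphaned threads (delete them, or keep them in a separate bucket) is a design choice.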
A couple of alternative implementations that are also worth looking at are Gawker's Kinja text annotations (currently in use on Jalopnik) and the word-for-word annotations of rapgenius.com.
(disclaimer: I'm a factlink dev.)
I work for a company trying to allow per-paragraph (or per-phrase) commenting on arbitrary sites. Essentially, you've got two choices to identify the anchor of a comment.
1. Remember the structure of the page (e.g. some path from a root to a paragraph), and place comments at the same position next time.
2. Identify the content of the paragraph and place comments near identical or similar content next time.
Both systems have their downsides, but you pretty much need to go with option 2 if you want a robust system. Structural identification is fragile in the face of changing structure. In particular, irrelevant changes such as theming or the precise HTML tags used can significantly impact the "path". When that happens, you really can't fix it unless you inspect the content, i.e. option (2).
Sam describes what comes down to a server-side content-based approach in his answer. Purely client-side content-based matching is what factlink and (IIRC) hypothesis use. Most browsers support non-standard but fast substring search in page content using either window.find or TextRange.findText. Alternatively, you could walk the DOM, which is slower but gives you the flexibility to implement (e.g.) fuzzy matching.
It may seem like client-side matching is overkill or complex, but really, it's simpler: it's a very robust way to decouple your content-management from your commenting. Neither is really simple, so decoupling those concerns can be a win.
I created a fiddle along the same lines to demonstrate the power of jQuery during a training session.
http://fiddle.jshell.net/fotuzlab/Lwhu5/
It might help as a starting point, along with Sam's detailed and useful insights. You get the value of the text field in a jQuery function, from where you can send it across to your CMS using ajax/APIs.
PS: The function is not production ready. It's only meant as a starting point. A little tweaking will make it usable.
I've recently published a post on how to do this with WordPress building on an existing plugin.
Like qz.com, I assign paragraph ids on the client and then provide that info to WordPress to store as comment meta when a new comment is created. I used hashing of the paragraph text to create the id which means that the order of paragraphs is unimportant but does mean that if a paragraph is edited then any associated comments become orphaned.
At first I thought this was an issue but thinking about it, if a reader comments on a paragraph then editing that text subsequently seems a little sneaky.
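As an illustration of the hashing idea (a sketch only, not the code from the plugin mentioned above), a paragraph id can be derived from the whitespace-normalized text. The function name `paragraph_id`, the choice of SHA-1 and the 12-character length are assumptions.

```python
import hashlib


def paragraph_id(text, length=12):
    """Derive a short, position-independent id from a paragraph's text."""
    normalized = " ".join(text.split())  # collapse whitespace differences
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return digest[:length]
```

Moving a paragraph leaves its id untouched, while any edit to the wording produces a new id, which is exactly the orphaning behaviour described above.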
The code is freely available on GitHub if you feel like forking it and enhancing it.
There is one other WordPress plugin called "commentpress", which has existed for a long time.
I use an old version of this plugin for my blog and it works very well.
You can choose to comment per line or per paragraph, and the ergonomics are really well thought out!
A demo here:
http://futureofthebook.org/
and all the code is on github:
https://github.com/IFBook/commentpress-core
After a quick look at the code, it seems they use the second approach, as @Eamon Nerbonne explains in his answer.
They parse each paragraph to make a signature based on the first character of each word. Here is the function to do that.
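The function referred to above is not reproduced in this text. As a rough illustration of the idea only (this is not CommentPress's actual code, and the name `first_char_signature` is made up), a first-character signature could be computed like this:

```python
def first_char_signature(paragraph):
    """Build a signature from the first character of each word."""
    return "".join(word[0] for word in paragraph.split())
```

With such a scheme, small edits inside a word leave the signature unchanged, while adding, removing or reordering words changes it.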
In case someone comes looking here, I've implemented Medium-like functionality as a Django app.
It is open source and can be found as a package on PyPI, and on GitHub.
I used one of my other apps, blogging, to allocate unique paragraph IDs to each content object (currently we're only looking at <p> tags), and it uses some extra internal metadata in the backend while storing it in the DB (MySQL currently, but since what we've done is store the blob as JSON, this method is more natively suited to document-oriented DBs). The frontend is mainly jQuery-driven, with a REST API connecting the backend to the frontend.
I took cues from this post, but then rejected the creation of some kind of digest value from paragraph because content can change. What I wanted was to preserve the annotations as long as the paragraph was not completely over-written. In the complete over-write case, I provided for collection of the annotations in an orphaned bucket.
More in these tutorials
A legacy version of the same, the first revision, is running on those tutorial pages. (You won't be able to post without logging in, but you can always log in using social accounts to check it out :-) )

[MoinMoin Wiki] Show only specific pages in RecentChanges page

How can I restrict the RecentChanges page in MoinMoin so that only specific pages are listed?
Since I have a huge Wiki site and pages are organized in a hierarchical way, I would like to let RecentChanges page show only limited pages according to different hierarchical page paths.
That's not possible with moin 1.x.
There is even a small reason for that (which might not apply in your case, but applies in the general case):
Wikis rely on so-called soft security: people see changes via RecentChanges and will look at them, reverting any bad changes they see (e.g. spam, malicious edits, stuff gone wrong, people playing around in the wrong places, etc.).
If you reduce what they see, soft security gets degraded, as everybody is just looking at their own stuff.

Dynamic web site plus decoupled content delivery from CMS

I have a web site project, a mixture of complex dynamic pages and authored CMS-managed content. I have the tools for the complex dynamic part and would like a CMS that allows me to call it to retrieve content that's been approved, i.e. for web site inclusion.
To be clear, I need the complex dynamic part to be the master and the CMS-managed content to be served up as and when I want it.
I had thought there'd be loads of options around this, it being an obvious (to me) thing to want to do. I'd also thought that CMSs would naturally publish APIs (web service based, ideally) to enable this... but my research so far doesn't seem to show this. Hopefully I'm just missing a trick. Can anyone help?
I've looked, btw, at openText, Alfresco, Jahia, Enfold, Percussion, Interwoven, EPIServer, Ektron to name a few.
Ideally, I'd like an open source CMS solution if there is one, definitely can't afford the big $ that some of the vendors are looking for.
Am I right in assuming you want to use an API or service to retrieve content from the CMS that has been through some approval process?
This is definitely possible with EPiServer, through either the code API or, if more appropriate, a web service, although I think the price might be an issue here.

How to create a deliverable for a front-end engineer?

This is a question about the development workflow of front end engineers. I am starting a project for a rather large site with lots of pages, each page has multiple steps, and it's very difficult to lay out all the content in a spreadsheet.
The content of each page will be delivered in a spreadsheet cell, and some pages have multiple variable sections that are determined by the user's preferences.
I was asked for my opinion about how to structure the deliverable. I am wondering if there is a best practice out there for structuring this kind of deliverable, because a poorly structured deliverable can be almost as mind-numbing as using pen and paper to write code.
Do you have any tools, formats, practices for creating deliverables that are easy to work with?
It sounds like you are just doing the UI design and then giving it to the front-end engineers.
If that is correct, I would suggest that you see if you can do the rough html/css work to get the page to look as you want, and then they can go in and give it the functionality, but that way you have an idea what is possible.
You can do much of the work, then leave comments about trying to center something a bit better, for example.
I am not a big fan of just getting the design on paper or as an image; it would be easier to just get the HTML/CSS.
There are plenty of tools now that make CSS and HTML easy to do. Even if you have the CSS inside the HTML, they can separate the two, and it would be a huge help to the designers.
Just do one page, and give it to them, and then come back in a day or two and get feedback as to what their thoughts are, and how you can improve what you give them.
As you go through this process, after a while both groups will know what to expect and you can get the rest done quickly.
This is more of an agile methodology with the front-end engineers as your customers.
My suggestion would be mockups or wireframes for the pages. Mockups would be examples of the pages in various states while the wireframe is a detailed document of the structure of the page.
HTML and CSS are way too complicated for mockup use. I usually first create a requirement backlog for UI/functionalities as well (just a list of prioritized requirements in Excel).
Especially for the development of a large site, you should also have the process and data flow definitions done (in UML or some other form of description) to help you define the mentioned requirements.
Based on these you will know what kind of steps the whole site's functionality needs (i.e. pages) and what the page hierarchy and structure will be like. This way it's much easier to get a grasp of the whole thing.
After that we create quick wireframes and visualize the end result with quick mockups done as images with Photoshop or similar. These are absolutely vital in my experience, as they help the customer (and other stakeholders) to actually understand what is being done. For this, HTML and CSS are simply too slow to run multiple iterations with.