REST pagination content duplicates

I am building a REST application that returns a collection of items (a topic with a collection of posts), sorted from newest to oldest.
If HATEOAS principles are followed and the content is split into pages, the client gets, for example, the current page id, offset, data limits, and links to the first, current and next page.
Fetching the next page is no problem in itself, but if somebody adds content while the client is reading the current page, the new data is pushed onto the start of the collection and the last item of the current page moves to the next page.
If you simply skip posts that have already been loaded, you get fewer items on the next page. Another option is to count the items pushed onto the start of the list and increment the offset by that amount.
What is the best practice for this?

Instead of offset indexes, using skip tokens that indicate the first value not to include (or the first value to include) is a good technique, provided the value is unique for every item in your result set and is an orderable field based on the current sort. But it's not flawless. Usually this just doesn't matter.
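To make the skip-token idea concrete, here is a minimal sketch in Python. It assumes each post has a unique id that grows with creation time; the names `Post` and `fetch_page` are made up for illustration and are not from any framework.

```python
from typing import NamedTuple, Optional


class Post(NamedTuple):
    id: int      # assumed unique and increasing with creation time
    title: str


def fetch_page(posts: list[Post], limit: int, after_id: Optional[int] = None):
    """Return (items, next_token) for a newest-first listing.

    after_id is the skip token: the id of the last item the client
    already received. Only strictly older items are returned.
    """
    ordered = sorted(posts, key=lambda p: p.id, reverse=True)
    if after_id is not None:
        ordered = [p for p in ordered if p.id < after_id]
    items = ordered[:limit]
    # A full page may have more behind it; a short page means the end was reached.
    next_token = items[-1].id if len(items) == limit else None
    return items, next_token


posts = [Post(i, f"post {i}") for i in range(1, 8)]          # ids 1..7
page1, token = fetch_page(posts, limit=3)                    # newest three
posts.append(Post(8, "post 8"))                              # a new post arrives
page2, token2 = fetch_page(posts, limit=3, after_id=token)   # continues below the token
```

Because the second request is anchored on the last id seen rather than on an offset, the post added in between does not shift items into the next page.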
If it really does matter, you have to put the IDs of everything that's on the first page in the call for the 2nd page, and so on for every later page. HATEOAS helps you do stuff like this, but it can get very messy, and things can still pop up on page 1 under the current sorting while you are requesting page 5. What do you do with that?
Another trick to avoid dupes in a UI is to use the self or canonical link relationships to uniquely identify resources in a page and compare those to existing resources in the UI. Updating the UI with the latest matching resources is usually a simple task. This of course puts some burden on the client.
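A rough sketch of that client-side approach in Python: resources are keyed by their self link, so an item that shows up again on a later page replaces the copy already shown instead of being added twice. The `_links.self.href` layout is an assumption (HAL-style); adapt it to whatever link format your API actually returns.

```python
def merge_page(shown: dict[str, dict], page_items: list[dict]) -> dict[str, dict]:
    """Merge a newly fetched page into what the UI already shows.

    Items are keyed by their self link, so a repeated resource
    overwrites the existing entry rather than creating a duplicate.
    """
    for item in page_items:
        href = item["_links"]["self"]["href"]   # assumed HAL-style link layout
        shown[href] = item
    return shown


shown = {}
merge_page(shown, [
    {"title": "Post 6", "_links": {"self": {"href": "/posts/6"}}},
    {"title": "Post 5", "_links": {"self": {"href": "/posts/5"}}},
])
# The next page repeats post 5 (pushed down by a new post) and adds post 4.
merge_page(shown, [
    {"title": "Post 5 (edited)", "_links": {"self": {"href": "/posts/5"}}},
    {"title": "Post 4", "_links": {"self": {"href": "/posts/4"}}},
])
```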
There is no one-size-fits-all solution to this problem. You have to design for the UX you intend to deliver.


TYPO3: export list of pages with info about last editor

I need to create an export of all pages and get information about the last modification date and who did the last modification.
Last modification date is easy: that's stored in SYS_LASTCHANGED in the pages table. But how can I find the information about who did the change? For changes to pages that's easy as well, as I can check sys_log and sys_history and search the pid in the field recuid in combination with tablename. For tt_content records I can do a select first to get all records for a specific page. And then search recuid and tablename accordingly.
But how about all the other records, e.g. from plugins? Do I really need to iterate over all tables and the pid field to get all possible records?
I think you have a lot of work to do, as the information in SYS_LASTCHANGED is not very trustworthy.
Depending on the rendering process, multiple records are used for building up a page. And more and more information does not come from records but from files, which may or may not be versioned in a git repository (or similar).
You could only rely on records if all(!) information were stored in records (TSconfig, TypoScript, Fluid templates in records, sys_file for files). Even then, there are thousands of ways a page can be generated, and many pieces of information influence the current page rendering.
Just some examples:
The TypoScript is changed. Does it result in a rendering change of the current page?
A record (e.g. tt_content) is deleted, so this record will no longer show on the page. Will you consider all invisible (deleted, hidden, time-restricted) records?
TYPO3 has constructions like: show content from page X, show content records X, Y, Z from other pages.
Translations: is a change in a language other than the current one a change to the current language? (A fallback might be possible.)
If you consider links in menus or text: if another page that is linked from the current page gets disabled, TYPO3 no longer generates a link.

Moodle-progress bar

In Moodle, I can see the default course progress for the courses on the front end. But I want to show progress like 10% completed when chapter 1 is completed, 20% completed when chapter 2 is completed, and so on. I could not find any module for this, nor figure out how to modify the code.
In other words:
1. How can I track course completion progress based on completion of course subsections? The default tracking works at course level only.
2. Is it possible to track progress without ticking the course completion checkbox?
3. Is it possible to track course progress based on which course section URLs have been viewed?
Thanks in advance.
You can sometimes track specific page views and interactions via the mdl_logstore_standard_log table. Different modules/activities in Moodle log different types/amounts of data, but views of typical course topics/sections are usually logged regardless of completion.
For example, imagine a course with id=10 where you visit section/topic 3. The URL usually looks something like this: <yourdomain>/course/view.php?id=10&section=3
In this case, the view should be logged in mdl_logstore_standard_log with an eventname value of \core\event\course_viewed. The course id should be in the courseid column and the section viewed should be in the "other" column, although that data is an array stored with PHP serialization, so it's helpful to use unserialize and array parsing functions to get the "3" quickly if needed.
Again, keep in mind different activities/modules log data differently - for example, an assignment activity is logged differently - but hope this helps you find what you need. Good luck!

Google Search Appliance (GSA) feeds - unpredictable behavior

We have a metadata-and-url feed and a content feed in our project. The indexing behaviour of the documents submitted using either feed is completely unpredictable. For the content feed, the documents get removed from the index after a random interval every time. For the metadata-and-url feed, the additional metadata we add is ignored, again randomly. The documents themselves do remain in the index in the latter case; only our custom metadata gets removed. Basically, it looks like the feeds get "forgotten" by the GSA after some time. What could be the cause of this issue, and how do we go about debugging it?
Points to note:
1) Due to unavoidable reasons, our GSA index is always hovering around the license limit (+/- 1000 documents or so). Could this have an effect? Are feeds purged when nearing license limit? We do have "lock = true" set in the feed records though.
2) These fed documents are not linked to from pages and hence (I believe) would have low page rank. Are feeds automatically purged if not linked to from pages?
3) Our follow patterns include the fed documents.
4) We do not use action=delete with the same documents, so that possibility is ruled out. Also for the content feed we always post all the documents. So they are not removed through feeds.
When you hit the license limit the GSA will start dropping documents from the index so I'd say that's definitely your problem.

Regenerating CouchDB views

Contrived example:
productName: 'Lost Series 67 DVD',
availableFrom: '19/May/2011',
availableTo: '19/Sep/2011'
View storeFront/currentlyAvailableProducts basically checks if current datetime is within availableFrom - availableTo and emits the doc.
I would like to force a view to regenerate at 1am every night, i.e. process/map all docs.
At first I had a simple Python script, scheduled via crontab, that touched each document, causing a new revision and thus a view update. However, since CouchDB is append-only, this wasn't very efficient: loads of unnecessary IO and disk space usage followed by compaction, which is wasteful on all fronts.
The second solution was to push the view definition again via couchapp push. However, this meant the view was unavailable (or partially unavailable) for several minutes, which was also unacceptable.
Are there any other solutions?
Will's answer is great; but just to get the consensus viewpoint represented here:
Keep one view, and query it differently every day
Determine your time-slice size, for example one day.
Next, for each document, you emit once for every time slice (day) that it is available. So if a document is available from 19 May to 21 May (inclusive), your emit keys would be "2011-05-19", "2011-05-20" and "2011-05-21".
Once that is computed for every document, to find docs available on a certain day, just query the view with (e.g. today) ?key="2011-05-18".
You never have to update or re-run your views.
If you must never change your query URL for some reason, you might be able to use a _show function to 302 (temporary) redirect to today's correct query.
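As an illustration of the one-key-per-day idea, here is a small Python sketch that computes the keys for a document's availability window. A real CouchDB map function would be written in JavaScript and call emit once per key; the helper name `day_keys` is hypothetical and only shows the key generation.

```python
from datetime import date, timedelta


def day_keys(available_from: date, available_to: date) -> list[str]:
    """Return one ISO date key per day in the inclusive availability range."""
    days = (available_to - available_from).days
    return [
        (available_from + timedelta(days=n)).isoformat()
        for n in range(days + 1)
    ]


# A document available from 19 May to 21 May 2011 (inclusive).
keys = day_keys(date(2011, 5, 19), date(2011, 5, 21))
```

ISO-formatted keys sort chronologically as plain strings, which also allows range queries over several days.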
So your view is not being updated automatically I take it?
New and changed documents are not being added on the fly?
Oh I see, you're cheating. You're using "out of document" information (i.e. the current date) during view creation.
There's no view renaming, but if you were desperate you could use url rewriting.
Simply create a design document "each day": /db/_design/today05172011
Then use some url rewriting to change: GET /db/_design/today/_view/yourview
to: GET /db/_design/today05172011/_view/yourview
Create the view at 11pm server time (tweak it so that "now" is "tomorrow", or whatever).
Then add some more clean up code to later delete the older views.
This way your view builds each night as you like.
Obviously you'll need to front Couch with some other web server/proxy to pull this off.
It's elegant, and inelegant, at the same time.
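The rewriting step could look roughly like the following Python sketch. In practice this logic would live in whatever proxy sits in front of CouchDB; the function name `rewrite_path` and the month-day-year suffix format are assumptions based on the example design document name above.

```python
from datetime import date


def rewrite_path(path: str, today: date) -> str:
    """Map the stable /db/_design/today/ prefix to today's dated design document."""
    prefix = "/db/_design/today/"
    if path.startswith(prefix):
        # Suffix format (MMDDYYYY) is assumed from the example name today05172011.
        dated = f"/db/_design/today{today.strftime('%m%d%Y')}/"
        return dated + path[len(prefix):]
    return path   # any other path is passed through unchanged


rewritten = rewrite_path("/db/_design/today/_view/yourview", date(2011, 5, 17))
```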

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail, so that can't be the justification. And if there is an audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
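A minimal sketch of this check, using Python's built-in sqlite3 module. The table layout and the helper `update_address` are invented for illustration; the point is only that the WHERE clause includes the timestamp that was read earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT, DateLastUpdated TEXT)"
)
conn.execute(
    "INSERT INTO customer VALUES (1, '1 Old Street', '2011-05-17T10:00:00')"
)


def update_address(conn, customer_id, new_address, seen_stamp, now):
    """Apply the update only if the row still carries the timestamp we read."""
    cur = conn.execute(
        "UPDATE customer SET address = ?, DateLastUpdated = ? "
        "WHERE id = ? AND DateLastUpdated = ?",
        (new_address, now, customer_id, seen_stamp),
    )
    return cur.rowcount == 1   # 0 rows updated means someone else changed it first


# Two processes read the same row, then both try to write.
stamp = conn.execute(
    "SELECT DateLastUpdated FROM customer WHERE id = 1"
).fetchone()[0]
first = update_address(conn, 1, "2 New Road", stamp, "2011-05-17T10:05:00")
second = update_address(conn, 1, "3 Other Lane", stamp, "2011-05-17T10:06:00")
final_address = conn.execute(
    "SELECT address FROM customer WHERE id = 1"
).fetchone()[0]
```

The second writer still holds the old timestamp, so its update matches no row and can be reported as a conflict instead of silently overwriting the first change.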
It depends on the exact circumstances, but a timestamp like that can be very useful for autogenerated data: you can figure out whether something needs to be recalculated because a dependency changed later on (this is how build systems work out which files need to be recompiled).
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - rails for example has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
There are a couple of scenarios.
Let's say you have an address table for your customers.
You have your CRM app, and the customer calls to say that his address changed a month ago. With the LastUpdate column you can see that the row for this customer hasn't been touched in 4 months.
Usually you use triggers to populate a history table so that you can see all the other history. If you see that the creation date and the updated date are the same, there is no point hitting the history table, since you won't find anything.
You calculate indexes (stock market): you can easily see whether an index was recalculated just by looking at this column.
There are 2 DB servers: by comparing the date column you can find out whether all the changes have been replicated or not, and so on.
This is also very useful if you have to send out delta feeds to clients, that is, feeds containing only the records that have been changed or inserted since the date of the last feed.
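A delta feed of that kind can be sketched with Python's sqlite3 module as follows. The `product` table and the helper `changed_since` are assumptions for illustration; timestamps are stored as ISO strings so that plain string comparison matches chronological order.

```python
import sqlite3


def changed_since(conn, last_feed_time):
    """Return rows inserted or updated after the previous feed was sent."""
    rows = conn.execute(
        "SELECT id, name FROM product WHERE DateLastUpdated > ? ORDER BY id",
        (last_feed_time,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, DateLastUpdated TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "unchanged", "2011-05-01T09:00:00"),
        (2, "edited", "2011-05-18T14:30:00"),
        (3, "new", "2011-05-19T08:15:00"),
    ],
)

# The last feed went out at midnight on 17 May; only later changes are sent.
delta = changed_since(conn, "2011-05-17T00:00:00")
```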