Regenerating CouchDB views - nosql

Contrived example:
{
productName: 'Lost Series 67 DVD',
availableFrom: '19/May/2011',
availableTo: '19/Sep/2011'
}
View storeFront/currentlyAvailableProducts basically checks if current datetime is within availableFrom - availableTo and emits the doc.
I would like to force a view to regenerate at 1am every night, i.e. process/map all docs.
At first I had a simple Python script scheduled via crontab that touched each document, causing a new revision and therefore a view update. However, since CouchDB is append-only, this wasn't very efficient: it meant loads of unnecessary IO and disk space usage followed by compaction, which is wasteful of resources on all fronts.
Second solution was to push the view definition again via couchapp push however this meant the view was unavailable (or partially unavailable) for several minutes which was also unacceptable.
Are there any other solutions?

Will's answer is great; but just to get the consensus viewpoint represented here:
Keep one view, and query it differently every day
Determine your time-slice size, for example one day.
Next, for each document, you emit once for every time slice (day) that it is available. So if a document is available from 19 May to 21 May (inclusive), your emit keys would be:
"2011-05-19"
"2011-05-20"
"2011-05-21"
Once that is computed for every document, to find docs available on a certain day, just query the view with (e.g. today) ?key="2011-05-18".
You never have to update or re-run your views.
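As a rough illustration of the keys described above, here is a minimal Python sketch that computes one key per day of availability. It runs outside CouchDB and only shows which keys a map function would need to emit; the names day_keys, parse_day and MONTHS are made up for this example, and the date format is taken from the example document.

```python
from datetime import date, timedelta

# Month abbreviations as used in the example document ('19/May/2011').
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_day(text: str) -> date:
    """Parse a 'DD/Mon/YYYY' string into a date."""
    day, month, year = text.split("/")
    return date(int(year), MONTHS[month], int(day))


def day_keys(available_from: str, available_to: str) -> list[str]:
    """Return one ISO date key per day in the inclusive availability range."""
    start, end = parse_day(available_from), parse_day(available_to)
    return [(start + timedelta(days=n)).isoformat()
            for n in range((end - start).days + 1)]


print(day_keys("19/May/2011", "21/May/2011"))
# ['2011-05-19', '2011-05-20', '2011-05-21']
```

The cost of this design is index size: a document available for four months produces roughly 120 rows in the view, which is usually an acceptable trade for never having to rebuild it.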
If you must never change your query URL for some reason, you might be able to use a _show function to 302 (temporary) redirect to today's correct query.

So your view is not being updated automatically I take it?
New and changed documents are not being added on the fly?
Oh I see, you're cheating. You're using "out of document" information (i.e. the current date) during view creation.
There's no view renaming, but if you were desperate you could use url rewriting.
Simply create a design document "each day": /db/_design/today05172011
Then use some url rewriting to change: GET /db/_design/today/_view/yourview
to: GET /db/_design/today05172011/_view/yourview
Create the view at 11pm server time (tweak it so that "now" is "tomorrow", or whatever).
Then add some more clean up code to later delete the older views.
This way your view builds each night as you like.
Obviously you'll need to front Couch with some other web server/proxy to pull this off.
It's elegant, and inelegant, at the same time.
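The rewriting step could look something like the following Python sketch, assuming the proxy in front of Couch lets you plug in a function that maps the incoming path to the real one. The function names and the /db/ prefix are illustrative only; this is not a CouchDB or proxy API.

```python
from datetime import date


def design_doc_name(day: date) -> str:
    """Name of the per-day design document, e.g. 'today05172011'."""
    return "today" + day.strftime("%m%d%Y")


def rewrite_path(path: str, day: date) -> str:
    """Map the stable public path onto the design document for the given day."""
    stable_prefix = "/db/_design/today/"
    if path.startswith(stable_prefix):
        rest = path[len(stable_prefix):]
        return "/db/_design/" + design_doc_name(day) + "/" + rest
    return path  # anything else passes through untouched


print(rewrite_path("/db/_design/today/_view/yourview", date(2011, 5, 17)))
# /db/_design/today05172011/_view/yourview
```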

Related

Is this the best way to set CoreData entity variables to 0 every day at 24:00?

I currently have a Core Data entity called CalorieProgress, and I would like to reset all of its variables (calorieProgress, fatProgress) to 0 every day.
I am still quite new to SwiftUI, and the only method I thought of as of now, is to add a Date Created variable to this entity called created, and when the user opens the app, to check if that date was yesterday. If so set all values to 0 etc.
Is there a more efficient way to do this?
Thanks
Your design is good and simple, and a reasonable choice if you're getting started.
It can have trouble, however, when people move between time zones. It is even possible for people to move to previous days this way (most dramatically when they cross the date line). There is no single answer to that question. Your app has to decide what it means by "today" when strange clock events happen. (Users also sometimes change their clock, and you want to behave "reasonably" in those cases, even if it means the data is wrong.)
Having built several of these, my suggestion is to just store raw, immutable, data records, and work out things like resetting values when you're running queries. For example, to work out how many calories someone has burned "today" doesn't require that you set any value to zero. You can just perform a query for records that occur after some time and sum their calories (you can even do this with aggregate queries directly on Core Data).
Core Data can be very fast, but if these queries become too slow, you can store daily aggregation records in Core Data. But keeping the original raw data means that those are really just caches and you can throw them away and recompute any time you need to.
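To make the "compute at query time" idea concrete, here is a language-neutral sketch in Python rather than Swift. It assumes each raw record is a (logged_at, calories) pair with naive local datetimes, and that "today" starts at local midnight; both are assumptions for illustration, and in a real app this would be a Core Data fetch with a date predicate.

```python
from datetime import datetime, time


def calories_today(records, now):
    """Sum calories for records logged between local midnight and `now`."""
    start_of_day = datetime.combine(now.date(), time.min)
    return sum(calories for logged_at, calories in records
               if start_of_day <= logged_at <= now)


records = [
    (datetime(2021, 3, 1, 23, 0), 500),   # yesterday, ignored
    (datetime(2021, 3, 2, 8, 0), 300),
    (datetime(2021, 3, 2, 12, 30), 450),
]
print(calories_today(records, datetime(2021, 3, 2, 18, 0)))  # 750
```

Nothing is ever reset here: yesterday's record simply falls outside the window, and changing what "today" means only changes how start_of_day is computed.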
Assuming that a new day starts at midnight (I've worked on apps where days started "when the user wakes up in the morning", which is much more complicated...), you should also be aware of significantTimeChangeNotification, which is posted at midnight (and a few other times). You can't use this to launch your app or do processing in the background, but it's very nice for updating your UI if the user has the app open.

Most Performant way to implement time-dependent status

Central to a project I'm working on is a highlighting-mechanic that can be applied to certain items on the website. The idea is, that this highlighted-status is only active for a certain amount of time.
I'm trying to find the most performant way to achieve this (in querying, setting status, checking status and revoking it).
A first approach would be to simply set a value 'highlighted:true' on the item. This seems to be the most performant way to query for highlighted items. The drawback I see here is that a date for the highlighting action also needs to be stored, and furthermore a job needs to run at an interval to check the highlighted items and potentially revoke their highlighted status. Also, the exact moment when the item stops being highlighted can't be determined exactly, since it depends on the interval of the check function.
A second approach would be to store mainly the date of the highlighting action and run the query against it. It seems that querying for highlighted objects this way is much less performant, since every item ever created is checked, and on top of that the check is not just a boolean but a proper function that compares those different date values to see whether the highlight is still valid. On the upside, no external cleanup function is necessary and every highlighting period ends exactly on time.
Would love to have your input on this. Is there maybe a clever pattern on this?
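For reference, the second approach above can be sketched in a few lines of Python. The field name highlighted_at and the 24-hour duration are assumptions made for this example, and the filtering is done in memory; whether the equivalent query is fast enough in practice depends on the datastore and its indexes, which the question does not specify.

```python
from datetime import datetime, timedelta

HIGHLIGHT_DURATION = timedelta(hours=24)  # assumed length of a highlight


def is_highlighted(highlighted_at, now):
    """True while `now` falls inside the highlight window."""
    if highlighted_at is None:
        return False
    return highlighted_at <= now < highlighted_at + HIGHLIGHT_DURATION


def highlighted_items(items, now):
    """Filter items (dicts with an optional 'highlighted_at') by current status."""
    return [item for item in items
            if is_highlighted(item.get("highlighted_at"), now)]


items = [
    {"id": 1, "highlighted_at": datetime(2020, 1, 1, 9, 0)},
    {"id": 2, "highlighted_at": datetime(2020, 1, 2, 9, 0)},
    {"id": 3},  # never highlighted
]
print([item["id"] for item in highlighted_items(items, datetime(2020, 1, 2, 10, 0))])
# [2]
```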

Delayed Table Name Resolution in View

I have a view over a table. It turns out the table gets moved and an updated version of it created each night. This ensures there is always a table of the expected name present in the database, but I cannot find a way to make my view continue to point to the current version of the table. Whichever table existed when the view was created is the one I end up pointing to even after it moves and goes stale.
ViewA:
select a, b, c from todays_table;
todays_table stays current all day, then at night it gets renamed to todays_table01. ViewA now points to todays_table01 and a new table shows up called todays_table. Again, todays_table is current, but ViewA no longer is.
Is there a way to delay the table name resolution until the view is used? I haven't been able to get EXECUTE IMMEDIATE working for a SELECT statement. I think I could get a dynamic SQL statement working if I used a cursor, but I have never needed these before and I'm not sure if they are the right path. I read about AUTO_REVAL but I believe this would only delay resolution until the first time the view was used and still go stale that night.
I could, of course, stop using the view and just move the complex query into my program but there are many places it is needed so I would like to eliminate all other solutions before falling back to this.
It would be ideal to eliminate the temporary table and just have the master table receive updates throughout the day but this is beyond my comprehension as I know nothing about RPG II and OCL.
Thanks for reading.
Edit
Per @Mr. Llama's suggestion, I experimented with using synonyms and aliases to point to todays_table and then having my view point to the synonym. Unfortunately, in this scenario the view uses the alias to resolve the actual table name on creation, so the view continues to point to todays_table when it is renamed to todays_table01, though the alias continues to reference todays_table.
Edit 2
I'm accepting @mustaccio's answer because it does work and would be a reasonable approach to this problem if I could get the parameters going where they need to. My particular project requires flexibility, so I am actually going to jump on the nightly process bandwagon and add a program to recreate my views after the process messes with their references, as @danny117 suggested.
Thanks to everyone who replied though, I learned a lot about how all of these pieces work together.
I think you might be able to achieve what you want by wrapping your view definition in a SQL table function, something like
CREATE FUNCTION insteadofview (<parameters>)
RETURNS TABLE (<columns>)
...
RETURN
SELECT <the rest of your view definition>
Depending on how you query your view, you will probably need to pass search criteria into the function as parameters, otherwise performance will be suboptimal because the function will have to return all rows from the query before search arguments can be applied.
According to the manual, and as you have noticed, views on a table that is renamed continue to point to the original table object. Routines, however, including table functions, will be invalidated and their plans prepared again when they are next invoked, using the original source table name.
I have no way of testing this though.
Full syntax to create a table function.

What is the proper way to keep track of updates in progress using MongoDB?

I have a collection with a bunch of documents representing various items. Once in a while, I need to update item properties, but the update takes some time. When properties are updated, the item gets a new timestamp for when it was modified. If I run updates one at a time, then there is no problem. However, if I want to run multiple update processes simultaneously, it's possible that one process starts updating the item, but the next process still sees the item as needing an update and starts updating it as well.
One solution is to mark the item as soon as it is retrieved for update (findAndModify), but it seems wasteful to add a whole extra field to every document just to keep track of items currently being updated.
This should be a very common issue. Maybe there are some built-in functions that exist to address it? If not, is there a standard established method to deal with it?
I apologize if this has been addressed before, but I am having a hard time finding this information. I may just be using the wrong terms.
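You could use db.currentOp() to check whether an update operation is executing on the server at that moment. Note, however, that it only lists operations that are currently running, so it will not show an item that another process has fetched and is still working on. For that case, atomically marking the item with findAndModify, as you describe, is the more reliable option.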
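The findAndModify marking mentioned in the question might look like the following Python sketch. It assumes `collection` is a pymongo Collection (find_one_and_update is pymongo's equivalent of findAndModify); the field names needs_update and claimed_at are hypothetical and not part of any MongoDB convention.

```python
from datetime import datetime, timezone


def claim_next_item(collection, now=None):
    """Atomically mark one unclaimed item as in progress and return it.

    Returns the matched document, or None when nothing is left to claim.
    `collection` is assumed to be a pymongo Collection or anything else
    with a compatible find_one_and_update method.
    """
    now = now or datetime.now(timezone.utc)
    return collection.find_one_and_update(
        # Hypothetical fields: a null or missing claimed_at means "unclaimed".
        {"needs_update": True, "claimed_at": None},
        # The claim is written in the same atomic step as the lookup.
        {"$set": {"claimed_at": now}},
    )
```

Because the match and the write happen in one operation, two workers calling this at the same time cannot both receive the same item. Storing a timestamp rather than a boolean also makes it possible to spot claims left behind by a worker that crashed, although how to expire those is a separate design decision.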

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail, so that can't be the justification. And if there is an audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
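A minimal sketch of that check, using Python's built-in sqlite3 module with an in-memory database. The table name records, the name column and the function update_if_unchanged are assumptions made for this example; only the DateLastUpdated comparison reflects the technique described above.

```python
import sqlite3


def update_if_unchanged(conn, record_id, new_name, seen_last_updated, now):
    """Apply the update only if DateLastUpdated still matches what we read."""
    cur = conn.execute(
        "UPDATE records SET name = ?, DateLastUpdated = ? "
        "WHERE id = ? AND DateLastUpdated = ?",
        (new_name, now, record_id, seen_last_updated),
    )
    conn.commit()
    # Zero affected rows means another process changed the row first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, DateLastUpdated TEXT)")
conn.execute("INSERT INTO records VALUES (1, 'old', '2011-05-19T10:00:00')")

# Two processes both read the row at 10:00 and then try to write.
print(update_if_unchanged(conn, 1, "first", "2011-05-19T10:00:00", "2011-05-19T10:05:00"))   # True
print(update_if_unchanged(conn, 1, "second", "2011-05-19T10:00:00", "2011-05-19T10:06:00"))  # False
```

Putting the comparison in the WHERE clause makes the check and the write a single statement, so there is no window between them for a third writer to slip in.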
It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data: you can figure out whether something needs to be recalculated because a dependency has changed later on (this is how build systems calculate which files need to be recompiled).
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - Rails for example has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
There are a couple of scenarios.
Let's say you have an address table for your customers.
You have your CRM app, and the customer calls to say that his address changed a month ago. With the LastUpdate column you can see that this row for this customer hasn't been touched in 4 months.
Usually you use triggers to populate a history table so that you can see all the other history. If you see that the creation date and updated date are the same, there is no point hitting the history table since you won't find anything.
You calculate indexes (stock market): you can easily see that one was recalculated just by looking at this column.
There are 2 DB servers: by comparing the date column you can find out whether all the changes have been replicated or not, etc.
This is also very useful if you have to send feeds out to clients that are delta feeds, that is, only the records that have been changed or inserted since the date of the last feed are sent.
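The delta feed case can be sketched as a simple filter on the timestamp. This Python example assumes rows are dicts with a last_updated datetime and that the time of the previous feed is stored somewhere; both names are illustrative, and in a database this would be a WHERE clause on the column.

```python
from datetime import datetime


def delta_feed(rows, last_feed_at):
    """Return rows inserted or changed after the previous feed was produced."""
    return [row for row in rows if row["last_updated"] > last_feed_at]


rows = [
    {"id": 1, "last_updated": datetime(2011, 5, 1, 9, 0)},
    {"id": 2, "last_updated": datetime(2011, 5, 18, 14, 0)},
    {"id": 3, "last_updated": datetime(2011, 5, 19, 7, 30)},
]
changed = delta_feed(rows, datetime(2011, 5, 18, 0, 0))
print([row["id"] for row in changed])  # [2, 3]
```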