How does everyone manage archiving/deleting documents in MongoDB?

I'm currently learning a lot about the MEAN stack and obviously MongoDB. I want to set my database up so that nothing is ever 'removed', things are only marked as deleted or moved somewhere else, like an archived collection/database. What's the industry standard way of doing this?
The way I see it, I have two options, both of which raise more questions:
Marking documents as deleted with a deleted key.
Would I store this as a timestamp with an accompanying array of timestamps? The array is needed because I also want to add 'restore' functionality, which means a document can be deleted more than once, and I want to track that. It also means that I will have to update a lot of my queries to ignore deleted documents.
Move the documents to another collection or database.
This would require the most work, as I'd need to handle any other functionality that references that document. For example, if I delete a user from a cinema database, do I have to archive their previous bookings as well, or just update my queries to also search the archive?
I couldn't find any useful resources on this but if you guys know of any then please point me in that direction :) thanks.

Thanks hector! his answer:
"Actually, there is not a "standard" way to do this. Each company does it by its own way. For your first option, you don't need to store a timestamp array but just a flag indicating that document is "deleted". Then, in another collection you can store the events. For instance: {event: "deleted", date: "03/08/2017 08:00:00", documentId: "7726"} An event store is the way to go"

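The flag-plus-event-store idea from the answer above can be sketched roughly as follows. This is a minimal in-memory illustration, not MongoDB driver code: the `documents` and `events` arrays stand in for two collections, and the names `setDeleted` and `findActive` are hypothetical helpers made up for this example.

```javascript
// In-memory stand-ins for two collections (hypothetical names).
const documents = [{ _id: "7726", title: "Booking", deleted: false }];
const events = [];

// Flip the flag and append an event instead of removing the document.
function setDeleted(documentId, deleted, now = new Date()) {
  const doc = documents.find((d) => d._id === documentId);
  if (!doc) throw new Error(`No document with _id ${documentId}`);
  doc.deleted = deleted;
  events.push({
    event: deleted ? "deleted" : "restored",
    date: now.toISOString(),
    documentId,
  });
  return doc;
}

// Normal reads skip anything flagged as deleted.
function findActive() {
  return documents.filter((d) => !d.deleted);
}

// Delete, restore, then delete again.
setDeleted("7726", true);
setDeleted("7726", false);
setDeleted("7726", true);

// The full delete/restore history lives in the event log, not on the document.
const history = events.filter((e) => e.documentId === "7726");
console.log(history.map((e) => e.event));
console.log(findActive().length);
```

The document itself only carries a boolean, so queries need one extra condition, while the event log keeps every delete and restore with its date.
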
Related

How can I structure Firestore to store history of events that can easily be accessed?

I have been continuously changing my database structure and just can't find a good solution.
My database stores habits, each document is a different habit. The fields of the habits hold information on the habit, but where I am struggling is how to structure the completion/history.
Unlike a task, a habit is completed over and over again which means I need to store all that information and be able to query it easily for date ranges and such.
First
The first idea I had was a subcollection of documents that each held information such as the date, the completion status, etc.
Although this idea sounds good, the problem was querying. I am using Flutter, and inside a StreamBuilder it proved difficult to access a subcollection.
Second
My second idea was to use a document field in the habit. With this, I could have an array called completed, and the array would store a Map with the date, the status, etc.
With this idea, I made it the furthest, but where I struggle is changing the status of one field in the Map in the array.
Third
My third and most recent idea, which I haven't made much headway with yet, is to use nested Maps in a field of the habit document. A field called history would hold a Map, and inside that Map each key would be a date in string format whose value is another Map storing all the essential information.
This idea was great in a lot of ways, but querying the data by date range when the date is in a string format will inevitably be difficult.
Please let me know which structure is best, or if there is a better way to do it. I am using Flutter, so I need to work within the constraints of the cloud_firestore package. Any help would be appreciated!
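On the date-range concern in the third idea: if the string keys use the ISO 8601 form YYYY-MM-DD, plain string comparison already orders them chronologically. The sketch below shows that on an in-memory map in JavaScript. It only filters client-side and says nothing about what cloud_firestore queries support; `historyByDate` and `entriesInRange` are hypothetical names for this example.

```javascript
// Hypothetical shape of the `history` map from the third idea,
// keyed by ISO 8601 date strings (YYYY-MM-DD).
const historyByDate = {
  "2021-03-09": { completed: true },
  "2021-03-10": { completed: false },
  "2021-04-01": { completed: true },
  "2021-12-25": { completed: true },
};

// Zero-padded ISO dates sort lexicographically in chronological order,
// so an inclusive range check is a plain string comparison.
function entriesInRange(map, start, end) {
  return Object.keys(map)
    .filter((day) => day >= start && day <= end)
    .sort()
    .map((day) => ({ day, ...map[day] }));
}

const march = entriesInRange(historyByDate, "2021-03-01", "2021-03-31");
console.log(march.map((e) => e.day));
```

This only works when every key is zero-padded and in year-month-day order; formats such as 9/3/2021 do not sort this way.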

nosql wishlist models - Struggle between reference and embedded document

I have a question about modeling wishlists using MongoDB and Mongoose. A user needs to be able to have many different wishlists, each containing many wishes, with each wish referencing a single article.
Because a wishlist belongs to only one user, I thought of using an embedded document for it.
The same goes for the wishes embedded in a wishlist.
So I have something like this:
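// WishlistSchema is declared first so that UserSchema can refer to it.
// wishSchema is assumed to be defined elsewhere.
var WishlistSchema = new Schema({
    // ...
    wishes: [wishSchema]
    // ...
})
var UserSchema = new Schema({
    // ...
    wishlists: [WishlistSchema]
    // ...
})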
But my question is what to do with the article: should I use a reference, or should I copy the article's data into an embedded document?
If I use an embedded document, I have an update problem. When an article's price changes, updating every wish that references that article becomes a struggle. But accessing a wish's article is a piece of cake.
If I use a reference, updating is no longer a problem, but I have a problem when I filter wishes by article criteria (price, category, etc.).
I think the second way is probably the best, but I don't know whether it's possible to build a query that filters wishes by their article's fields. I tried a lot of things using population, but nothing works very well when you need to populate depending on a nested object's field (for example, getting the wishes whose article meets certain conditions).
Is this kind of query doable?
Sorry for the long question and for my bad English :/ but any advice would be great!
In my experience with NoSQL databases (mainly Mongo), when designing a collection, do not think in terms of relations. Instead, think about how you will display, page, and retrieve the documents.
I would prefer embedding, and updating multiple documents when something changes, over using a ref, for several reasons.
Reads are fast and easy, and filtering is not a problem (as you've said).
Read operations usually happen a lot more often than updates, and with proper indexing you wouldn't really have to worry about performance.
It takes advantage of NoSQL's schema-less nature, and you'll be less prone to restructuring when requirements change (new sorting, new filters, etc.).
Paging is a lot less of a hassle, and the UI design is not restricted by how paging and limits have to work.
Joining can become expensive. Redundant data might be a hassle to update, but that is better than not being able to display data in a particular way because your schema is normalized and joining is difficult.
I'd say the rule of thumb is to split documents only when you do not need to display them together. It is not impossible to join them back if you do, but it is definitely more troublesome.
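The trade-off described in this answer can be sketched on plain in-memory objects. This is not Mongoose code; the data shape and the helpers `wishesUnder` and `updateArticlePrice` are hypothetical. It only illustrates why filtering is trivial when the article is embedded, and what the update fan-out looks like when a price changes.

```javascript
// Hypothetical data: each wish embeds a copy of the article it refers to.
const users = [
  { name: "ann", wishlists: [{ title: "Birthday", wishes: [
    { article: { _id: "a1", name: "Lamp", price: 40, category: "home" } },
    { article: { _id: "a2", name: "Novel", price: 12, category: "books" } },
  ] }] },
  { name: "bob", wishlists: [{ title: "Flat", wishes: [
    { article: { _id: "a1", name: "Lamp", price: 40, category: "home" } },
  ] }] },
];

// Filtering on article fields needs no join: the data is already in the wish.
function wishesUnder(user, maxPrice) {
  return user.wishlists
    .flatMap((list) => list.wishes)
    .filter((wish) => wish.article.price <= maxPrice);
}

// The cost of embedding: a price change must be copied into every wish.
// Returns the number of embedded copies that were updated.
function updateArticlePrice(allUsers, articleId, newPrice) {
  let touched = 0;
  for (const user of allUsers) {
    for (const list of user.wishlists) {
      for (const wish of list.wishes) {
        if (wish.article._id === articleId) {
          wish.article.price = newPrice;
          touched += 1;
        }
      }
    }
  }
  return touched;
}

const cheapBefore = wishesUnder(users[0], 20).length;
const touched = updateArticlePrice(users, "a1", 15);
const cheapAfter = wishesUnder(users[0], 20).length;
console.log({ cheapBefore, touched, cheapAfter });
```

Reads stay a single pass over one document, while each price change costs one write per embedded copy, which matches the answer's point that reads usually outnumber updates.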

MongoDB: How to organize data

I am a little bit uncertain on how to organize the data when using MongoDB.
I have a user with various data. Say it is a classified-ads service, with a profile and possibly some items for sale. In a relational database this data would be split into a profile table and a for-sale table. As I understand it, in MongoDB this would probably all go into one "document" (except, perhaps, if there is a very large number of items for sale).
But my classified service is a little bit special, as for each item for sale, an administrator (salesman) adds stuff to the item for sale, such as allow the ad to go public, a comment on the item and possibly more. The user should obviously not be able to alter this admin-added info.
What would be the recommended way to deal with this? Can the administrator just change (add to) the user's item document? But I guess the user could then change what the administrator has added, right? So perhaps a better approach would be for the admin to create another document that contains the added data, with the two documents merged before being displayed?
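The second approach raised in the question, a separate admin-owned document merged in at display time, might look like the sketch below. It uses in-memory arrays in place of collections, and the names `userItems`, `adminNotes`, and `itemForDisplay` are hypothetical. Access control (making sure users can only write to their own documents) is assumed to live in the application and is not shown.

```javascript
// Hypothetical stand-ins for two collections with different writers.
const userItems = [{ _id: "item1", ownerId: "u1", title: "Bike", price: 100 }];
const adminNotes = [{ itemId: "item1", approved: true, comment: "Checked photos" }];

// Merge the user-owned item with its admin-owned data for display.
// Admin fields are nested under `admin` so they cannot collide with user fields.
function itemForDisplay(itemId) {
  const item = userItems.find((i) => i._id === itemId);
  if (!item) return null;
  // Items without an admin document are treated as not yet approved.
  const note = adminNotes.find((n) => n.itemId === itemId) || { approved: false };
  const { itemId: _linkField, ...adminFields } = note;
  return { ...item, admin: adminFields };
}

console.log(itemForDisplay("item1"));
```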
Maybe the following may be helpful: http://docs.mongodb.org/manual/applications/data-models/?
Also, http://docs.mongodb.org/manual/data-modeling/

Changing the primary key on a MongoDB collection

I took a shortcut earlier and made the primary key of my Mongo database by concatenating various fields to create a "unique id"
I would now like to change it to actually use an ObjectId. What's the best approach? I have a little over 3M documents and would like this to be as minimally disruptive as possible.
A simple approach would be to bring down the site for a bit and then copy every document from one to the other one which is using ObjectIds but I'd like to keep the application running if I can. I imagine one way would be to write to both for a period of time while the migration happens but that would require me having two similar code bases so I wonder if there's a way to avoid all that.
To provide some additional information:
It's just one collection that's not referenced by any others. I have another MySQL database that contains some values that are used to create the queries that read from this MongoDB collection.
I'm using PyMongo/Mongoengine libraries to interact with MongoDB from Python and I don't know if it's possible to just change the primary key for a collection.
You shouldn't bring your site down for any reason if it does not go down itself. :)
No matter how many millions of records you have, the solution to the problem resides on how you use your ids.
If you cross-reference documents in different collections using these ids, then for every updated object you will also have to update all other objects that reference it.
As first step, your system should be updated to stop creating new objects in the old way. If your system lets you easily do this, then you can update your database very easily. If this change is not easy to make, then your system has some architectural problems and you should first change this. If this is the situation, please update your question so I can update my answer.
Since I don't know anything about your applications and data, what I say will be too general. Let's call the collection you want to update coll_bad_id. Every item in this collection is referenced in other collections like coll_poor_guy and coll_wisdom_searcher. How I would do this is to run over coll_bad_id one item at a time like this:
1. read one item
2. update _id with new style of _id
3. insert item back to collection
-- now we have two copies of the same item one with old-style id, one with new
4. update each item referencing this to use new style id
5. remove the duplicate item with old-style id from collection
One thing to keep in mind is that BSON ObjectIds contain date/time data, which can be very useful. Since you will rebuild all these objects on one day, their ObjectIds will not reflect the correct creation times. For newly added items, they will. You can note the first newly added item as the milestone from which ids carry correct creation times.
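To make the point about embedded timestamps concrete: the first 4 bytes of an ObjectId (the first 8 hex characters) are the creation time in seconds since the Unix epoch. The sketch below decodes that by hand in plain JavaScript; `objectIdToDate` is a hypothetical helper written for this example, and the id used is just a sample value.

```javascript
// Decode the creation time embedded in an ObjectId's hex string.
// The first 8 hex characters are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```

After a migration like the one described here, every regenerated id would decode to the migration date rather than the original creation date.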
UPDATE: Code sample to run on Mongo shell.
This is not the most efficient way to do this, but it is safe to run, since nothing is removed before it has been added back with a new _id. It would be better to do this in small batches by adding a query to the find() call.
var cursor = db.testcoll.find();
cursor.forEach(function(item) {
    var oldId = item._id;                // Save the old _id so the original can be removed below.
    delete item._id;                     // Without an _id, Mongo generates a new ObjectId on insert.
    db.testcoll.insert(item);            // Insert the copy first, so nothing is lost if this step fails.
    db.testcoll.remove({ _id: oldId });  // Then remove the original, matching explicitly on its old _id.
});

MongoDB: What's a good way to get a list of all unique tags?

What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don't have access to mongodb's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it, yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) you can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
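For readers unsure what the command returns for an array field such as tags, the sketch below mimics the result on in-memory documents: each array element counts as its own value, and duplicates are collapsed. This is plain JavaScript rather than a database call, and `distinctValues` and the sample `posts` are hypothetical.

```javascript
// Sample documents with a multikey-style `tags` field.
const posts = [
  { _id: 1, tags: ["mongodb", "erlang"] },
  { _id: 2, tags: ["mongodb", "nosql"] },
  { _id: 3, tags: [] },
  { _id: 4 }, // no tags field at all
];

// Collect the unique values of `key`, treating each array element
// as a separate value and skipping documents that lack the field.
function distinctValues(docs, key) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[key];
    if (value === undefined) continue;
    for (const v of Array.isArray(value) ? value : [value]) {
      seen.add(v);
    }
  }
  return [...seen];
}

const uniqueTags = distinctValues(posts, "tags");
console.log(uniqueTags);
```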
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!