How to mark documents as disabled/deleted in MongoDB

So I have a few choices to temporarily hide (disabled) or permanently hide (deleted) documents from my queries.
I do not plan to "physically" delete data, except when doing maintenance.
What I could do:
Add two properties like: "disabled: true" and "deleted: false"
Add a status field like: "status: 'deleted|disabled|other'"
Turn things around (show only active documents) like: "status: 'active'" or "active: true".
When querying, I could either add these properties to every query, or I could query a MongoDB view that only returns "active" documents.
The purpose of the database is to help users find projects they might like to join. I use Mongoose too, but many queries might be native MongoDB queries.
So what might be the "smartest solution" in terms of performance, scalability and potential effort?

I would go for a status field.
It is more generic: if you need to add an extra value you don't have to create a new field, and in case you want a document to have more than one status (like ACTIVE and ENABLED) you can even make the status field an array.
I do not understand "query a MongoDB view", but for me the solution is quite simple.
Just add the status condition to every query: db.collection.find({ ..., "status": "active" })
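For completeness, here is a minimal sketch of both patterns, assuming a hypothetical projects collection with a string status field (the view requires MongoDB 3.4 or newer):

// Repeat the filter in every query:
db.projects.find({ status: "active", topic: "music" })

// Or define a read-only view once and query it like a collection,
// so callers never have to repeat the status condition:
db.createView("activeProjects", "projects", [
    { $match: { status: "active" } }
])
db.activeProjects.find({ topic: "music" })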

Related

Is it possible to run a "dummy" query to see how many documents _would_ be inserted

I am using MongoDB to track unique views of a resource.
Every time a user views a specific resource for the first time, a new view is logged in the db.
If that same user views the same resource again, the unique compound index on the collection blocks the insert of the duplicate.
For bulk inserts, with { ordered: false }, Mongo allows the new views through and blocks the duplicates. The return value of the insert is an object with an insertedCount property, telling me how many docs made it past the unique index.
In some cases, I want to know how many docs would be inserted before running the query. Then, based on the dummy insertedCount, I would choose to run the query, or not.
Is there a way to test a query and have it do everything except actually inserting the docs?
I could solve this by running some JS server-side to get the answer I need, but I would prefer to let the db do those checks.
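One read-only way to approximate this, sketched here with the Node.js driver: count how many of the candidate keys already exist and subtract. Collection and field names are hypothetical, and this assumes the batch itself contains no internal duplicates and nothing is written between the count and the real insert.

// Predict insertedCount without writing anything (docs must be
// non-empty, since $or rejects an empty array):
const keys = docs.map(d => ({ userId: d.userId, resourceId: d.resourceId }));
const existing = await db.collection("views").countDocuments({ $or: keys });
const wouldBeInserted = docs.length - existing;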

mongodb multiple documents insert or update by unique key

I would like to get a list of items from an external resource periodically and save them into a collection.
There are several possible solutions but they are not optimal, for example:
Delete the entire collection and save the new list of items
Get all items from the collection using "find({})" and use it to filter out existing items and save those that do not exist.
But a better solution would be to set a unique key and do a kind of "update or insert".
Right now, when saving an item whose unique key already exists, I get an error.
Is there a way to do this at all?
Upsert won't do the job, since it updates all matched items with the same value, so it is really only suitable for a single document.
I have a feeling you can achieve what you want simply by using the "normal" insertMany with the ordered option set to false. The documentation states that:
"Note that one document was inserted: The first document of _id: 13 will insert successfully, but the second insert will fail. This will also stop additional documents left in the queue from being inserted. With ordered to false, the insert operation would continue with any remaining documents."
So you will get "duplicate key" exceptions which, however, you can simply ignore in your case.
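A rough sketch of that with the Node.js driver (the collection name is hypothetical, and the exact shape of the error object varies a little between driver versions):

try {
    await db.collection("items").insertMany(items, { ordered: false });
} catch (err) {
    // Error code 11000 marks a duplicate-key violation; swallow those,
    // but rethrow anything that is not purely duplicates.
    const dupesOnly = Array.isArray(err.writeErrors)
        && err.writeErrors.every(e => e.code === 11000);
    if (!dupesOnly) throw err;
}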

how to join a collection and sort it, while limiting results in MongoDB

Let's say I have two collections where each document may look like this:
Collection 1 (target):
{
    _id,
    comments: [
        { _id, message, full_name },
        ...
    ]
}

Collection 2 (user):
{
    _id,
    full_name,
    username
}
I am paging through comments via $slice, let's say I take the first 25 entries.
From these entries I need the corresponding usernames, which I get from the second collection. What I want is to return the comments sorted by their referenced username. The problem is that I can't store the username on the comments, because usernames may change often, and if one did I would need to update every target document the old username appears in.
I can only imagine one way to solve this: read out all of the full_names and query them against the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
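A rough sketch of that alternative schema (collection and field names are hypothetical):

// Comments live in their own collection, with the username
// denormalized onto each comment so the server can sort:
db.comments.insertOne({
    targetId: targetId,
    username: "alice", // denormalized copy, used for sorting
    message: "..."
})

// Second page of 25 comments for one target, sorted by username:
db.comments.find({ targetId: targetId })
    .sort({ username: 1 })
    .skip(25)
    .limit(25)

The trade-off mentioned above remains: renaming a user now means updating that username across all of their comments.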
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D

Filtering a large number of documents in Sphinx

In my app, users can favorite documents. Sphinx is used to allow them to search for matching documents. If a user wants to search their favorites, I first go directly to the database (mySQL) to fetch a list of document IDs and use that to filter the search in sphinx. The pseudocode looks something like this:
function searchFavoritesForUser($userId, $query) {
    $favoriteIds = getFavoriteIdsForUser($userId);
    $sphinx = new Sphinx(...);
    $sphinx->setFilter('document_id', $favoriteIds);
    return $sphinx->search($query);
}
This works fine if the user has a reasonable number of favorites. If the user has a large number of favorites, then loading the favorites can use a potentially large amount of memory, and setting the filter in Sphinx can run up against various limits in searchd.
I realize that I can adjust those config values, but it seems like there must be a better way to design this. Ideally, I would be able to eliminate the step where I have to load all of the favorite document IDs from the database into main memory.
When you build the Sphinx index, you can add an MVA (multi-value attribute) for favorites, populated from (document_id, user_id) pairs, and then search directly in Sphinx with no need to query MySQL first.
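An abridged sketch of what that source definition might look like (table, column, and attribute names are hypothetical, and the usual connection settings are omitted):

source documents
{
    sql_query      = SELECT id, title, body FROM documents

    # MVA: the ids of all users who favorited each document, loaded
    # from a separate query returning (document_id, user_id) pairs.
    sql_attr_multi = uint favorited_by from query; SELECT document_id, user_id FROM favorites
}

The search then becomes a single Sphinx call, e.g. $sphinx->setFilter('favorited_by', array($userId)), with no MySQL round trip to load the favorite ids.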

MongoDB - Combine filter with default filter

I'm hoping to do a very common task which is to never delete items that I store, but instead just mark them with a deleted flag. However, for almost every request I will now have to specify deleted:false. Is there a way to have a "default" filter on which you can add? Such that I can construct a live_items filter and do queries on top of that?
The live_items idea above is just one guess at a potential answer. In general, I'd just like to have deleted=False be the default for every search.
Thanks!
In SQL you would do this with a view. MongoDB only gained read-only views in version 3.4; on older versions there is no direct equivalent.
But when queries which exclude items which are marked as deleted are far more frequent than those which include them, you could remove the deleted items from the main items collection and put them in a separate items_deleted collection. This also has the nice side-effect that the performance of the collection of active items doesn't get impaired by a large number of deleted items. The downside is that indices can't be guaranteed to be unique over both collections.
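A minimal sketch of that move-on-delete idea (collection names are hypothetical; note the two steps are not atomic, so a crash in between can leave the document in both collections):

// Copy into the archive first, then remove from the live collection,
// so a failure in between cannot lose the document.
const doc = await db.collection("items").findOne({ _id: itemId });
if (doc) {
    await db.collection("items_deleted").insertOne(doc);
    await db.collection("items").deleteOne({ _id: itemId });
}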
I went ahead and just made a Python function that merges the default filter into the given query:
def find_items(filt, single=False, live=True):
    # By default, restrict results to items that are not soft-deleted.
    if live:
        filt = dict(filt)  # copy so the caller's dict is left untouched
        filt['deleted'] = False
    if single:
        return db.Item.find_one(filt)
    return db.Item.find(filt)