Sorting MongoDB documents by how many fields are filled out

I'm currently trying to write a query that finds documents by zip code (very easy),
e.g. db.col.find({zip: 60010})
but then sorts them based on how many fields of the Mongo document are filled out (i.e. not null; not so easy).
How can I do this in a fast and efficient way?

You can do it using $objectToArray with $addFields and then $size.
You can refer to this answer.
But it is a costlier option than maintaining a single field that stores the count of filled fields in each document.
You can decide based on whether your application is read- or write-intensive.
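For reference, here is a minimal sketch of that aggregation approach in the mongo shell. The collection name col and the zip filter follow the question; the field name filledFieldCount and the decision to ignore fields explicitly set to null are assumptions.

db.col.aggregate([
  { $match: { zip: 60010 } },
  { $addFields: {
      filledFieldCount: {
        $size: {
          $filter: {
            input: { $objectToArray: "$$ROOT" },
            as: "kv",
            cond: { $ne: ["$$kv.v", null] }  // count only fields that are not null
          }
        }
      }
  } },
  { $sort: { filledFieldCount: -1 } }  // documents with more filled fields first
])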

Related

Projection in MongoDb

I am learning MongoDB and a question came to my mind regarding projection.
When we do a projection for some fields, what does MongoDB do?
Does it read the whole document, drop some fields, and then return the result, or does it skip the excluded fields and read only the fields mentioned in the query?
For example, suppose I have a document with 4 scalar fields and 3 arrays (each of size ~10), and I just want the 4 fields and not the arrays.
Would MongoDB read the whole document and drop the arrays, or would it read just the 4 fields?
If it's the first case, how would the execution time or latency change as the arrays in the document grow?
The document is compressed on storage, so MongoDB needs to read the whole document first, uncompress it, and only then extract the fields specified in the projection.
The trick here is that when you search by certain fields you need to index them, so the search happens in memory and MongoDB avoids reading every document one by one to check the searched field.
If you need fast access to only those fields, it is best to put them all in a compound index and query them via a so-called "covered query"; then both the search and the fetch happen from the index in memory, without touching storage, which is much faster.
Also, in many cases the same documents are requested repeatedly, and MongoDB's caching keeps those documents in memory so they can be accessed faster.
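As an illustration, here is a small sketch of a covered query in the mongo shell; the users collection and the zip and name fields are assumed names, not taken from the question.

// Compound index containing every field the query filters on and returns
db.users.createIndex({ zip: 1, name: 1 })
// Covered query: the filter and projection use only indexed fields and _id is excluded,
// so MongoDB can answer it from the index without reading the documents themselves
db.users.find({ zip: 60010 }, { _id: 0, zip: 1, name: 1 })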

MongoDB/Mongoose whats the best way to count a lot of documents with a filter?

I understand that estimatedDocumentCount() uses metadata to count, which makes it faster when there are a lot of documents. The drawback is that you can't add a filter to it like you can with countDocuments(). What if there are still a lot of documents but you want to use a filter? What's the best way to do that, if there is one?
Well, you got it right.
countDocuments(...) is how you count documents with a filter.
If you're facing issues with speed, I'd advise you to add an index on the fields you're planning to filter with; that way the count becomes an index scan and the result will be almost immediate.
It follows that you are filtering based on the contents of the documents, and that you would not be able to do so with metadata alone.
You could add an index, or insert the data you want to count into a separate collection using normalized data models.
The documentation also suggests that you could create a view and count based on its metadata.
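A minimal mongo shell sketch of the indexed approach; the orders collection and status field are illustrative names, not from the question.

// Index on the field used in the count filter
db.orders.createIndex({ status: 1 })
// Filtered count; with the index above this is an index scan rather than a collection scan
db.orders.countDocuments({ status: "shipped" })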

MongoDB - Many-to-Many-Relationship (Special Case)

Should I store an embedded document multiple times in MongoDB, or should I only store it once and link to it using its ID?
I want to accomplish a many-to-many relationship, and I only have to update these embedded documents once a year.
Which of the two options fits better?
Thanks for your help!
In your case you only have to update the embedded documents once a year, which means reads will vastly outnumber writes.
So, to optimize for read operations, references should be avoided.
The only remaining concern is whether the embedded documents are large and whether they are duplicated frequently. If not, feel free to use embedded documents, because that is where MongoDB's natural strength lies.
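To make the two options concrete, here is an illustrative mongo shell sketch; the students/courses collections and field names are assumptions chosen only as an example of a many-to-many relationship.

// Option 1: embed a copy of the shared document in every parent (duplicated data, fastest reads)
db.students.insertOne({
  name: "Alice",
  courses: [{ code: "M101", title: "Intro to MongoDB" }]
})

// Option 2: store the shared document once and reference it by _id (no duplication, extra lookup on read)
const courseId = db.courses.insertOne({ code: "M101", title: "Intro to MongoDB" }).insertedId
db.students.insertOne({ name: "Alice", courseIds: [courseId] })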

What is the preferred way to add many fields to all documents in a MongoDB collection?

I have a Python application that iteratively goes through every document in a MongoDB (3.0.2) collection (typically between 10K and 1M documents) and adds new fields (probably doubling or tripling the number of fields in the document).
My initial thought was that I would upsert the entire revised documents (using PyMongo); now I'm questioning that:
Given that the revised documents are significantly bigger, should I be inserting only the new fields, or just replacing the document?
Also, is it better to perform a write to the collection on a document-by-document basis or in bulk?
This is actually a great question that can be solved a few different ways, depending on how you are managing your data.
If you are upserting additional fields, does this mean your data is appending additional fields at a later point in time, with the only change being those additional fields? If so, you could set a TTL on your documents so that the old ones drop off over time. Keep in mind that if you do this, you will want an index that sorts your results by descending _id, so that the most recent additions are selected before the older ones.
The benefit of doing it this way is that you are continually writing data, as opposed to seeking and updating data, so it is faster.
In regards to upserts vs. bulk inserts: bulk inserts are always faster than upserts, since upserting requires you to find the original document first.
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
You really need to understand your data fully to determine what is best, but if the only change to the data is additional fields, or changes that only need to be considered from that point forward, then bulk inserting and setting a TTL on your older data is the better method from the standpoint of write operations, as opposed to seek, find, and update. When using this method you will want to use db.document.find_one() as opposed to db.document.find(), so that only your current record is returned.
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
Bulk inserts will be faster than inserting each document sequentially.
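For illustration, here is a mongo shell sketch of the alternative the question asks about: adding only the new fields with a single bulk call. PyMongo's bulk_write with UpdateOne works the same way; the items collection and the scoreA/scoreB fields are made-up names, and the bulkWrite helper assumes a reasonably recent shell (on older deployments the Bulk API or PyMongo's bulk_write serves the same purpose).

// One round trip; $set adds only the new fields instead of rewriting whole documents
db.items.bulkWrite([
  { updateOne: { filter: { _id: 1 }, update: { $set: { scoreA: 0.7, scoreB: 12 } } } },
  { updateOne: { filter: { _id: 2 }, update: { $set: { scoreA: 0.3, scoreB: 5 } } } }
], { ordered: false })  // unordered lets the server apply the writes without waiting on order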

Listing fields in a mongodb object

I am developing a system where items can be shared with other users via an access key. I'm storing the access keys as fields within a shareinfo object (embedded within the item's document), as shown below:
shareinfo:{
........
<nth key>: <permissions object - may be complex and large>
........
}
When an item is accessed I check shareinfo.key and find out whether it's valid or not.
Currently, to list the keys I am loading (in Java) the entire shareinfo object into memory and running keySet() on it to retrieve and return the keys, while the rest of the data is wasted.
Here's the problem: I want to get the list of keys (i.e. object field names) without the accompanying data (because in some cases the permissions object is noticeably large).
I could not find anything in the MongoDB docs for such a query. Is this possible? Or is there an optimized way to load the list of field names into the application without the accompanying field values?
I had the same issue when trying to understand the structure of an existing mongodb database using the mongodb shell. Then I found that I can use the JavaScript function Object.keys() to retrieve an array of the object fields. (Tested with MongoDb 2.4.2)
Object.keys(db.collection.findOne())
MongoDB has a schema-less design, which means that any document could have different fields contained within it from any other document. To get an exhaustive list of all the fields for all the documents, you would need to traverse all the documents in the collection and enumerate each field. For any reasonably sized collection this is going to be an extremely expensive operation.
There are a couple of helpers that make this simpler: Variety and schema.js. They both allow you to limit the documents inspected, and they report on coverage as well. Both amount to doing a map/reduce on the collection, but are at least simpler than having to write one yourself.
If you are just looking for the top-level fields across all documents (and don't care about sub-fields or the detailed statistics provided by Variety),
here is a one-liner:
db.things.aggregate([{$project: {arrayofkeyvalue: {$objectToArray: "$$ROOT"}}}, {$unwind:"$arrayofkeyvalue"}, {$group:{_id:null, allkeys:{$addToSet:"$arrayofkeyvalue.k"}}}])
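Applied to the original question, a similar projection can return just the field names of the embedded shareinfo object without fetching the permission values. This is a hedged sketch: the items collection name and the $objectToArray/$map pipeline are assumptions, and $objectToArray is only available on newer MongoDB versions.

// Returns only the key names under shareinfo for one item, not the permission objects
db.items.aggregate([
  { $match: { _id: itemId } },                      // itemId: the _id of the item being checked
  { $project: {
      shareKeys: { $map: {
        input: { $objectToArray: "$shareinfo" },    // [{k: <key>, v: <permissions>}, ...]
        as: "kv",
        in: "$$kv.k"                                // keep just the key names
      } }
  } }
])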