MongoDB: Do I need to worry about collection collation strength performance issues?

I apologize if my title is not the best, but I had a hard time describing my question. I'm interested to know what the downsides would be of setting the collation of a collection to strength: 2. My understanding is that this allows me to query documents in a case-insensitive way. Coming from a SQL background I'm used to this, and I was surprised it wasn't the default. I know you can also set a collation on specific indexes, but if you create a collection like so:
db.createCollection('myColl', { collation: { locale: 'en', strength: 2 } });
then any string field I query will be matched case-insensitively. For most of my use cases this is great, but does setting this collation on most of my collections add processing load when queries run?
Is the best practice to create collections with the default case-sensitive collation and then add a collation only on specific indexes? My gut tells me the developers of MongoDB have a good reason not to make case-insensitive collation the default, and I don't want to do something that inadvertently puts unneeded load on the server. Then again, another reason for not defaulting to case-insensitive collation could simply be that MongoDB needs to know the locale in order to set it, so maybe that's the reason rather than processing time.
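For example (a sketch; myColl and the name field are just illustrations), I could instead scope the collation to one index and opt in per query:

// Case-insensitive index on a single field
db.myColl.createIndex({ name: 1 }, { collation: { locale: 'en', strength: 2 } });
// The query must specify the same collation for this index to be used:
db.myColl.find({ name: 'alice' }).collation({ locale: 'en', strength: 2 });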
I hope this question makes sense. I appreciate any feedback.

Related

What considerations should I have in Mongo when turning indexed numeric attributes into strings?

We are moving to a new system which forces us to use strings instead of Int32s for an id (not to be confused with the _id). None of our queries are changing, but they are a lot slower: effectively they went from 170 ms to 1.4 minutes. The main query has a lot of lookups; if it weren't proprietary I would post it here, but really it's not the query itself, since the only change is that the attributes we use for lookups went from a number like 4321 to a string like "a12343cgr3h". They are already covered by unique, descending indexes, but maybe there is more I need to consider.

I personally believe numbers are faster, and I have doubts we can make this much faster, but I'm hoping we can speed it up somehow; I just believe the solution is out of my wheelhouse. I'm not sure whether I need a text index or whether there are other changes to make. Most of the queries use a simple find({id: "a12343cgr3h"}), but we also have aggregate queries with lots of lookups and nested arrays that have their own lookups. I can't post the query, otherwise I would. Any thoughts on what I should do in terms of indexes, or anything else I need to consider, when changing an indexed numeric attribute to a string attribute that could be slowing down the query?
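To illustrate the shape of the lookup (the collection name here is hypothetical), it boils down to an exact match on an indexed string field; a text index is for word search, not equality matches:

// Existing unique, descending index on the lookup attribute
db.items.createIndex({ id: -1 }, { unique: true });
// The common query: a plain equality match, served by the ordinary index above
db.items.find({ id: "a12343cgr3h" });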

Best way to make a medical history DB in MongoDB

I'm making a system that stores all the medical and health data for a person in a database. I've chosen MongoDB for the job, but I'm new to MongoDB modeling and I don't have an idea of the best way to do this.
Do I use a document for each patient and insert subdocuments like this:
$patient = array(
    'evolution'     => array(), // subdocument
    'record'        => array(), // subdocument
    'prescriptions' => array(), // subdocument
    'exams'         => array(), // subdocument
    'surgeries'     => array()  // subdocument
);
Or do I create a new document for each of these?
I know about the 16 MB document size limit, but I don't know whether the data will reach it.
The exact layout of your documents is highly dependent on the types of queries you need to make. Unfortunately without a detailed understanding of your use case it would be impossible to provide good advice about what is the best layout.
Depending on your use case it may be valid to have one document per patient with subdocuments, as you indicate. In some cases, though, it may be better to have a separate collection for each of the fields listed. It all depends on how big those documents will be, what types of queries you will need to perform, etc.
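For example, the one-document-per-patient layout from the question might look like this in the shell (a sketch; the name field is illustrative):

db.patients.insertOne({
    name: "Jane Doe",    // illustrative
    evolution: [],       // arrays of subdocuments, as in the question
    record: [],
    prescriptions: [],
    exams: [],
    surgeries: []
});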
Some general advice:
Try to avoid queries that use multiple collections.
If your queries are getting difficult, you may have the wrong layout. Re-evaluate your layout any time you are in this situation.
Documents that constantly grow can create problems, because Mongo has to keep moving them on disk to make room for the growth. If they will grow quickly, reevaluate to see if there is a better layout.
While you can technically store different document layouts in the same collection in Mongo it is not generally considered a good practice. All documents in your collection should ideally follow some sort of schema even if that schema is not rigidly defined.
Field names matter. They take up space in Mongo so short field names are better if you expect to have a lot of data.
The best advice I can offer would be to start with what you think might work and see how it goes. If it gets awkward or difficult to get the information you need then reevaluate.

Looking up submissions by ID with MongoDB/Mongoose

So I'm used to looking up submissions by an auto-increment primary ID with MySQL, but after using Mongoose, the MongoDB ODM wrapper, I'm finding that since Mongo stores data in collections differently, there isn't really any concept of a traditional auto-increment ID.
I'm stuck trying to figure out how to grab a submission now because normally I'd structure my URL like so:
submission/34/category/slug-goes-here.
Since the 34 now becomes an ugly string-based ObjectId with Mongo, I don't necessarily want to display that in my URLs, but I still want a unique URL for looking up my submissions.
I'm thinking of maybe having a setter that, when I insert the submission into my database, generates some kind of 6-character hash, e.g. zhXk40, and looking submissions up by that.
I'm wondering what the performance trade-offs of doing it this way would be. If I put constraints on the slug, looked submissions up by slug, and verified that the category matched, would that be more efficient? Either way I'm going to have to check that the category and slug match, but I'm not sure an ID is even really necessary in this case.
What's the best practice for creating a route + looking up some piece of data from the db based on that route?
The first thing you should know is:
The _id property isn't necessarily that "ugly" ObjectId string.
Actually, the _id just needs to be unique within its collection, so if you want to use auto-incrementing IDs, there is no problem. However...
If you plan to use sharding within your database, then using an auto-incrementing field as the _id is a bad idea.
Why? Read the accepted answer here: Should I implement auto-incrementing in MongoDB?
In my application, since we're not going to shard it, we use an indexed numeric ID just for easier usability for the end user, and internally all references are ObjectIds.
Also, here's a good tutorial on creating an auto-incrementing field in MongoDB: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
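The pattern from that tutorial looks roughly like this (a sketch; the counters collection and seq field follow the tutorial, and the submissions collection is illustrative):

// Atomically increment and fetch the next value from a counters collection
function getNextSequence(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
    });
    return ret.seq;
}

db.submissions.insert({ _id: getNextSequence("submissionid"), title: "Hello" });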
Sheesh! If we were able to truly know best practice here we'd all be better off. The talking heads are still talking and will be for some time.
How I would approach this is to go as pretty as I could. If I had a usable text string to make it semantic that'd be best case.
If I couldn't do that I'd go with the hash thing you suggested.
With both solutions the challenge will be to ensure that it remains unique. That means lookups before you save.
On performance, it's the same as SQL: index what you use to look things up. Mongo does well with compound indexes, so category name plus hash will look up pretty quickly.
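A sketch of that index (collection and field names illustrative):

// Compound index: serves the category + hash lookup and enforces uniqueness of the pair at write time
db.submissions.createIndex({ category: 1, hash: 1 }, { unique: true });
db.submissions.find({ category: "news", hash: "zhXk40" });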

Working with ugly MongoDB _ids on the front end

A kinda subjective question, but I have a few concerns about working with MongoDB _ids on the client side. I would rather use something like s52ruf6wst or xR2ru286zjI for RESTful resources and for working with small collections of items.
1) I'm starting to depend on a proprietary implementation detail of the backend database (the _id field name and its implementation). If I stick with these _ids, it is harder to replace the backend DB later.
2) I've got huge, ugly URLs containing Mongo _ids (even for REST endpoints; I don't like it).
3) For hackers and "curious users", it becomes obvious which backend DB is used.
As far as I can see, most web applications use their own conventions for how IDs, UIDs, and UUIDs should look, and to me that looks more professional than using the straightforward, ugly implementation of the DB vendor.
So the question is: when is it good to use standard Mongo _ids across the back and front ends? And what can be done to improve the situation?
when it is good to use standard mongo _ids
Always. Except when you simply don't like it. But your personal preferences have nothing to do with security. Mongo's object ids are not inherently less safe than any other identifier type (integer, UUID, etc.)
ObjectId is designed to be unique across your cluster, and this is very important because MongoDB is a distributed DB. It also has a nice property: values are roughly monotonically increasing with time, since the leading bytes are a timestamp. This property may or may not be useful to you.
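For example, in the shell (the ObjectId value is the standard example from the MongoDB docs):

// The first 4 bytes of an ObjectId encode its creation time:
ObjectId("507f191e810c19729de860ea").getTimestamp()
// ISODate("2012-10-17T20:46:22Z")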
1) I'm starting to depend on a proprietary implementation detail of the backend database (the _id field name and its implementation). If I stick with these _ids, it is harder to replace the backend DB later.
This is where abstraction layers, frameworks, and ODMs (ORMs) come in. They provide a standardized layer (e.g. Doctrine 2) for querying multiple different types of database. As an example, id translates in many ORMs to _id, ID, or id depending on which database you are using.
As said before, the ObjectId has no inherent security flaws. It isn't even that useful to other users: even though the ObjectId contains a time part, that time part cannot easily be used to guess the next object (unlike an auto-incrementing ID). The only way to do so reliably would be to test every timestamp up to now, combined with every pid value, to detect whether a hidden object exists. So it is not easy at all to crawl ObjectId URLs, and in fact they are not very SEO-friendly for that exact reason. But yes, people could tell which database you are using.
That being said, yes, they are ugly, but they are that long and ugly in order to be, as @Sergio says, unique. Making your own will be just as bad. I suppose you could shrink one a little by base64-encoding the ObjectId's 12 raw bytes instead of using the 24-character hex representation.
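A sketch of that shortening in Node.js (assuming a recent Node with built-in base64url Buffer encoding):

const hex = "507f191e810c19729de860ea";                       // an ObjectId's 24-char hex form
const short = Buffer.from(hex, "hex").toString("base64url");  // "UH8ZHoEMGXKd6GDq" (16 chars)
const back = Buffer.from(short, "base64url").toString("hex"); // round-trips to the original hex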
However I am unsure if you really need that.

Is MongoDB a good fit for this?

The system I'm building is essentially an issue tracking system, but with various issue templates; some issue types will have different formats than others.
I was originally planning on using MySQL with a main issues table and an issues_meta table that contains key => value pairs. However, I'm thinking NoSQL (MongoDB) might be the better option.
Can MongoDB provide me with the ability to generate "standard" reports, like # of issues by type, # of issues by type by month, # of issues assigned per person, etc.? I ask this because I've read a few sources that said Mongo was bad at reporting.
I'm also planning on storing my audit logs in Mongo, since I want a single "table" for all actions (modifications to any table). In Mongo I can easily store each field that was changed, since it is schemaless. Is this a bad idea?
Anything else I should know, and will Mongo work for what I want?
I think MongoDB will be a perfect match for that use case.
MongoDB collections are heterogeneous, meaning you can store documents with different fields in the same bag. So different reporting templates won't be a show stopper. You will be able to model a full issue with a single document.
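For instance, the "# of issues by type" style of report maps directly onto the aggregation framework (a sketch; the type and created field names are assumptions):

// # of issues by type
db.issues.aggregate([
    { $group: { _id: "$type", count: { $sum: 1 } } }
]);
// # of issues by type by month, assuming a created date field
db.issues.aggregate([
    { $group: { _id: { type: "$type", month: { $month: "$created" } }, count: { $sum: 1 } } }
]);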
MongoDB would be a good fit for logging too. You may be interested in capped collections.
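A capped collection for the audit log might be created like so (the name and size are illustrative):

// Fixed-size collection: the oldest entries are discarded automatically once the size limit is reached
db.createCollection("audit_log", { capped: true, size: 100 * 1024 * 1024 });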
Should you need relational associations between documents, you can have that too.
If you are using Ruby, I can recommend Mongoid; it will make things easier. It also has support for versioning of documents.
MongoDB will definitely work (and you can use capped collections to automatically drop old records if you want), but you should ask yourself whether it fits this task well. For the use case you've described, Redis (simple and fast enough) or Riak (if you care a lot about your log data) may be a better option.