Best way to make a medical history DB in MongoDB

I'm building a system that stores all of a person's medical and health data in a database. I've chosen MongoDB for the job, but I'm new to MongoDB data modeling and don't know the best way to do this.
Do I use one document per patient and insert subdocuments like this:
$evolution=array(); //subdocument
$record=array(); //subdocument
$prescriptions=array(); //subdocument
$exams=array(); //subdocument
$surgeries=array(); //subdocument
Or do I create a new document for each of these kinds of data?
I know about the 16-megabyte limit on document size, but I don't know whether the information will reach that limit.

The exact layout of your documents is highly dependent on the types of queries you need to make. Unfortunately without a detailed understanding of your use case it would be impossible to provide good advice about what is the best layout.
Depending on your use case it may be valid to have a document/patient with sub documents as you indicate. In some cases though it may be better to have a separate collection for each of the fields indicated. It all depends on how big those documents will be, what types of queries you will need to perform etc.
Some general advice:
Try to avoid queries that use multiple collections.
If your queries are getting difficult, you may have the wrong layout. Re-evaluate your layout any time you are in this situation.
Documents that constantly grow can create problems because Mongo constantly has to move them around in order to make room for the growth. If they will be growing quickly then reevaluate to see if there is a better layout.
While you can technically store different document layouts in the same collection in Mongo it is not generally considered a good practice. All documents in your collection should ideally follow some sort of schema even if that schema is not rigidly defined.
Field names matter. They take up space in Mongo so short field names are better if you expect to have a lot of data.
The best advice I can offer would be to start with what you think might work and see how it goes. If it gets awkward or difficult to get the information you need then reevaluate.
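To make the two layouts discussed above concrete, here is a minimal sketch using plain Python dicts as stand-ins for MongoDB documents. All field names and values are hypothetical and not taken from the question; the point is only to show how the read path differs between one document per patient and one document per record.

```python
# Hypothetical field names; plain dicts stand in for MongoDB documents.

# Layout A: one document per patient, with the record types embedded as arrays.
patient_embedded = {
    "_id": "patient-1",
    "name": "Jane Doe",
    "prescriptions": [{"drug": "amoxicillin", "date": "2020-01-05"}],
    "exams": [{"type": "blood panel", "date": "2020-01-03"}],
    "surgeries": [],
}

# Layout B: a small patient document plus one document per record,
# each pointing back to the patient via patient_id.
patients = [{"_id": "patient-1", "name": "Jane Doe"}]
exams = [
    {"_id": "exam-1", "patient_id": "patient-1", "type": "blood panel", "date": "2020-01-03"},
    {"_id": "exam-2", "patient_id": "patient-2", "type": "x-ray", "date": "2020-02-11"},
]


def exams_for_embedded(patient):
    """Layout A: a single document read already contains the exams."""
    return patient["exams"]


def exams_for_referenced(patient_id, exam_docs):
    """Layout B: a second lookup filters the exams collection by patient_id."""
    return [e for e in exam_docs if e["patient_id"] == patient_id]
```

Layout A answers "show me everything about this patient" in one read but grows with every visit; Layout B keeps documents small at the cost of an extra lookup per record type.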

Related

creating schema vs adding an additional field?

I want to store featured products like staff picks, featured products of each category in my system that will hold at most 10 documents. My priority is read performance over write performance but I also want to have an efficient storage system and I have three ways to do it in my mind:
Create a boolean field such as is_bestseller, is_staffpick in Products schema and query for it.
I think this is the simplest way to do it but I think it would require an additional query to check if the at most 10 limit has been reached.
Create a FeaturedProducts schema that holds references of product ids.
This is useful in the sense that if I want to add some additional info such as a featured product within the featured products then I could simply add a field in this schema. It would also be easy to check the at most 10 documents limit. I think this makes it more scalable but at the cost of performance?
Create a FeaturedProducts schema that will hold all the needed data.
I think performance wise this would be the best but I'm not sure if this is an efficient way to store data. Basically, I would just duplicate the data of a product and store it. Obviously, if I have to update product details then I have to update it in two places now but the read-to-write ratio heavily favors reading so I am willing to do this even if it's gonna require more logic regarding updating and deleting products. Also it would be easy to set at most 10 documents limit.
I tried to look for some examples regarding featured products but couldn't find anything useful. I am not sure what the best practice is here and which way to go about so any kind of help is appreciated.
The rule of thumb when modeling your data in MongoDB is:
Data that is accessed together should be stored together.
With that in mind, I consider the Extended Reference Pattern a great option for your use case. Here is an example from the MongoDB Blog.
Consider an e-commerce application with a users collection, an orders collection, and others, where users and orders have a 1-N relation. Embedding all of the information about a customer in each order just to avoid the JOIN operation results in a lot of duplicated information.
Instead of duplicating all of the information on the customer, we only copy the fields we access frequently.
This schema will have high read performance, because all the information needed will be stored in a single document, at the cost of some duplicate data. That is not entirely bad, considering that the copy can serve as historical data.
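A minimal sketch of the Extended Reference Pattern described above, again with plain Python dicts standing in for documents. The customer fields and the choice of which ones to copy (name and city) are assumptions made for illustration only.

```python
# Hypothetical customer document with more fields than an order needs.
customers = {
    "cust-1": {
        "_id": "cust-1",
        "name": "Ana Silva",
        "email": "ana@example.com",
        "phone": "000-0000",
        "address": {"city": "Lisbon", "country": "PT"},
    },
}


def build_order(order_id, customer, items):
    """Build an order that keeps a reference to the customer and copies
    only the frequently read fields instead of the whole customer."""
    return {
        "_id": order_id,
        "customer_id": customer["_id"],  # reference for the rare full lookup
        "customer": {                    # extended reference: small copied subset
            "name": customer["name"],
            "city": customer["address"]["city"],
        },
        "items": items,
    }


order = build_order("order-1", customers["cust-1"], [{"sku": "A1", "qty": 2}])
```

Reading the order alone is enough to render a listing, while `customer_id` still allows fetching the full customer when needed. The same idea would apply to a featured-products document that copies only the product fields shown on the page.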
Useful information:
Patterns
Design Anti-pattern
A potential solution is to use an index here so that you can maximize your query performance. You would create an additional boolean flag (as you indicated in your first solution), index that field, and query it with a cursor that limits the number of returned values.
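As a rough in-memory sketch of the boolean-flag approach (not driver code), the snippet below filters on a flag, limits the result, and refuses to flag more than ten products. The helper names `featured` and `mark_featured` and the cap constant are hypothetical; in MongoDB the filter would be an indexed query with a limit.

```python
MAX_FEATURED = 10  # the "at most 10" rule from the question


def featured(products, flag, limit=MAX_FEATURED):
    """Return at most `limit` products that have the flag set."""
    return [p for p in products if p.get(flag)][:limit]


def mark_featured(products, product_id, flag, limit=MAX_FEATURED):
    """Set the flag on one product unless the cap is already reached.
    Returns True if the product is flagged afterwards."""
    target = next(p for p in products if p["_id"] == product_id)
    if target.get(flag):
        return True
    if sum(1 for p in products if p.get(flag)) >= limit:
        return False
    target[flag] = True
    return True


products = [{"_id": i, "name": f"product-{i}"} for i in range(12)]
results = [mark_featured(products, i, "is_staffpick") for i in range(12)]
```

The count inside `mark_featured` is the additional query the question worries about; it only runs on writes, which are rare in a read-heavy workload.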
For more ways to increase your query performance, check out the official MongoDB docs. If you're curious how much more performant your queries become, you can use Mongo's explain() method to inspect the query plan and execution statistics, and compare approaches.
Best of luck!

NoSQL how to lookup id in a collection

NoSQL noob here. I'm building an app using Firestore NoSQL. I'm looping through items where every item has an owner id (creator user id).
I want to display the owner's name on the listing page. In traditional SQL, I have a foreign key, so I can just reference, say, Item.Owner.FirstName.
What's the best practice in NoSQL? Should I save the owner name as a field at the time of saving the item, or do a lookup of each owner id to get the user object while I'm looping through items?
The second option sounds expensive, so I'm assuming the first way is the way to go. Unless there's a better, more accepted way?
Both will work. You either reference the data in the other document in whatever way you see fit, or you duplicate information into the document that you intend to query to build the display. You just have to decide which problem you want to deal with:
If you duplicate data among documents (known as "denormalization"), then you'll have to put effort into keeping them all up to date with each other, if that's what you require. So, writing one document might actually turn into writing multiple documents.
If you normalize your data with no duplication, then each of your queries will require more queries to get the related data from other documents. This could result in a drop in performance and an increase in cost for apps with heavy read loads.
Since we don't know the performance requirements and usage behavior of your app, there is no way to give specific advice. You will have to think carefully about which problem you want to have, perhaps based on complexity, performance, and overall cost.
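The two options can be sketched in memory as follows. This is not Firestore SDK code: the dicts stand in for documents, and the field names (`owner_id`, `owner_name`, `first_name`) are assumptions. The lookup variant fetches each distinct owner once rather than once per item, which bounds the extra reads.

```python
# Hypothetical users collection and items listing.
users = {"u1": {"first_name": "Ada"}, "u2": {"first_name": "Linus"}}
items = [
    {"title": "Lamp", "owner_id": "u1"},
    {"title": "Desk", "owner_id": "u2"},
    {"title": "Chair", "owner_id": "u1"},
]


def list_with_lookups(items, users):
    """Normalized: read each distinct owner once, then join in memory.
    Returns the display rows and the number of extra user reads."""
    owner_ids = {item["owner_id"] for item in items}
    fetched = {oid: users[oid] for oid in owner_ids}  # one read per distinct owner
    rows = [(i["title"], fetched[i["owner_id"]]["first_name"]) for i in items]
    return rows, len(fetched)


def save_item_denormalized(title, owner_id, users):
    """Denormalized: copy the owner's name onto the item at write time."""
    return {
        "title": title,
        "owner_id": owner_id,
        "owner_name": users[owner_id]["first_name"],
    }


rows, extra_reads = list_with_lookups(items, users)
```

With the denormalized item, the listing needs no user reads at all, but a name change means rewriting every item that owner created.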

Mass Update NoSQL Documents: Bad Practice?

I'm storing two collections in a MongoDB database:
==Websites==
id
nickname
url
==Checks==
id
website_id
status
I want to display a list of check statuses with the appropriate website nickname.
For example:
[Google, 200] << (basically a join in SQL-world)
I have thousands of checks and only a few websites.
Which is more efficient?
Store the nickname of the website within the "check" directly. This means if the nickname is ever changed, I'll have to perform a mass update of thousands of documents.
Return a multidimensional array where the site ID is the key and the nickname is the value. This is to be used when iterating through the list of checks.
I've read that #1 isn't too bad (in the NoSQL) world and may, in fact, be preferred? True?
If it's only a few websites I'd go with option 1. It's not as clean and normalized as in the relational/SQL world, but it works and is much less painful than trying to emulate joins with MongoDB. The thing to remember with MongoDB or any other NoSQL database is that you are generally making some kind of trade-off: nothing is free. I personally really value the schema-less, document-oriented data design, and for the applications I use it for I readily make the trade-offs (like no joins and no transactions).
That said, this is a trade-off - so one thing to always be asking yourself in this situation is why am I using MongoDB or some other NoSQL database? Yes, it's trendy and "hot", but I'd make certain that what you are doing makes sense for a NoSQL approach. If you are spending a lot of time working around the lack of joins and foreign keys, no transactions and other things you're used to in the SQL world I'd think seriously about whether this is the best fit for your problem.
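For comparison, both options from the question can be sketched in a few lines of plain Python, with lists of dicts standing in for the two collections. The sample values are made up; the field names follow the layout given in the question.

```python
websites = [
    {"id": 1, "nickname": "Google", "url": "https://www.google.com"},
    {"id": 2, "nickname": "Example", "url": "https://example.com"},
]
checks = [
    {"id": 10, "website_id": 1, "status": 200},
    {"id": 11, "website_id": 2, "status": 500},
    {"id": 12, "website_id": 1, "status": 200},
]


def join_with_map(checks, websites):
    """Option 2: load the few websites once into an id -> nickname map,
    then resolve each check against it while iterating."""
    nickname_by_id = {w["id"]: w["nickname"] for w in websites}
    return [[nickname_by_id[c["website_id"]], c["status"]] for c in checks]


def rename_denormalized(checks, website_id, new_nickname):
    """Option 1: the nickname lives on every check, so a rename rewrites
    every check of that website. Returns how many checks were touched."""
    touched = 0
    for c in checks:
        if c["website_id"] == website_id:
            c["nickname"] = new_nickname
            touched += 1
    return touched
```

With only a few websites the map in option 2 is tiny, and with rare renames the mass update in option 1 is rare too, so either can be reasonable.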
You might consider a 3rd option: Get rid of the Checks collection and embed the checks for each website as an array in each Websites document.
This way you avoid any JOINs and you avoid inconsistencies, because it is impossible for a Check to exist without the Website it belongs to.
This, however, is only recommended when the checks array for each document stays roughly the same size over time and doesn't keep growing. Rapidly growing documents should be avoided in MongoDB, because every time a document doubles its size, it is moved to a different location in the physical file it is stored in, which slows down write operations. Also, MongoDB has a 16MB limit per document. This limit exists mostly to discourage growing documents.
You haven't said what a Check actually is in your application. If it is a list of tasks you perform periodically and only occasionally change, there would be nothing wrong with embedding. But if you collect the historical results of all checks you ever ran, I would recommend putting each result (or result set) in its own document to avoid document growth.
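A back-of-the-envelope estimate shows how quickly an ever-growing embedded array could approach the 16MB limit. The average result size (100 bytes) and the check frequency (once per minute) below are assumptions chosen purely for illustration, and real BSON overhead would make the numbers somewhat worse.

```python
MAX_DOC_BYTES = 16 * 1024 * 1024  # 16MB document size limit


def max_embedded(avg_bytes, base_bytes=0):
    """How many embedded entries of avg_bytes fit next to base_bytes of other fields."""
    return (MAX_DOC_BYTES - base_bytes) // avg_bytes


def days_until_full(avg_bytes, entries_per_day):
    """Days of history that fit in one document at a steady write rate."""
    return max_embedded(avg_bytes) / entries_per_day


capacity = max_embedded(100)         # assumed 100 bytes per check result
days = days_until_full(100, 1440)    # assumed one check per minute
```

Under these assumptions a single website document would hold roughly 168,000 results, or about four months of per-minute checks, which supports keeping historical results in their own documents.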

Is MongoDB a good fit for this?

The system I'm building is essentially an issue tracking system, but with various issue templates. Some issue types will have different formats than others.
I was originally planning on using MySQL with a main issues table and an issues_meta table that contains key => value pairs. However, I'm thinking NoSQL (MongoDB) might be the better option.
Can MongoDB provide me with the ability to generate "standard" reports, like # of issues by type, # of issues by type by month, # of issues assigned per person, etc? I ask this because I've read a few sources that said Mongo was bad at reporting.
I'm also planning on storing my audit logs in Mongo, since I want a single "table" for all actions (Modifications to any table). In Mongo I can store each field that was changed easily, since it is schemaless. Is this a bad idea?
Anything else I should know, and will Mongo work for what I want?
I think MongoDB will be a perfect match for that use case.
MongoDB collections are heterogeneous, meaning you can store documents with different fields in the same bag. So different reporting templates won't be a show stopper. You will be able to model a full issue with a single document.
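The "standard" reports from the question are group-and-count operations, which work the same way even when documents carry different extra fields. The sketch below does the grouping in plain Python over hypothetical issue documents; in MongoDB the equivalent grouping would run server-side, and the field names here are assumptions.

```python
from collections import Counter

# Heterogeneous issue documents: shared core fields plus template-specific extras.
issues = [
    {"type": "bug", "created": "2012-01-14", "assignee": "sam", "severity": "high"},
    {"type": "bug", "created": "2012-02-02", "assignee": "kim"},
    {"type": "feature", "created": "2012-02-20", "assignee": "sam", "votes": 3},
]

# Number of issues by type.
by_type = Counter(i["type"] for i in issues)

# Number of issues by type by month (first 7 characters of an ISO date are YYYY-MM).
by_type_month = Counter((i["type"], i["created"][:7]) for i in issues)

# Number of issues assigned per person.
by_assignee = Counter(i["assignee"] for i in issues)
```

The reports only depend on the shared fields (`type`, `created`, `assignee`), so keeping those consistent across templates is what makes reporting straightforward.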
MongoDB would be a good fit for logging too. You may be interested in capped collections.
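As an in-memory analogy only (this is not the MongoDB API), a bounded `deque` behaves like a capped collection: once it is full, each new entry pushes out the oldest one. The cap of three entries and the log fields are hypothetical; real capped collections are sized in bytes, optionally with a maximum document count.

```python
from collections import deque

# Bounded log: keeps only the newest 3 entries (hypothetical cap).
audit_log = deque(maxlen=3)

for n in range(5):
    # Each entry records whichever fields changed; entries need not share a shape.
    audit_log.append({"action": "update", "table": "issues", "changed": {"field": n}})

kept = [entry["changed"]["field"] for entry in audit_log]
```

If audit entries must be kept indefinitely, a capped collection is the wrong tool, because old records are discarded automatically.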
Should you need relational associations between documents, you can have those too.
If you are using Ruby, I can recommend you Mongoid. It will make it easier. Also, it has support for versioning of documents.
MongoDB will definitely work (and you can use capped collections to automatically drop old records, if you want), but you should ask yourself whether it fits this task well. For the use case you've described, a better option is Redis (simple and fast enough) or Riak (if you care a lot about your log data).

MongoDB - Denormalization / model opinion

I've been getting into Mongo but, coming from an RDBMS background, I'm facing the probably obvious questions with regard to denormalisation and general data modelling.
Say I have a document type with an array of sub docs, and each sub doc has a status code.
In the relational world I would add a foreign key to the record, StatusId, simple.
In mongodb, would you denormalise the key pieces of data from the "status", e.g. code and desc, and hold an ObjectId referencing another collection of proper statuses? I guess the next question is one of design: if the status doc is modified I'd then need to modify the denormalised data?
Another question on the same theme is how you would model a transaction table. Say I have events and people; the events could be quite granular, say time sheets, which over time may lead to many records. Based on what I've seen, this would seem like a good candidate for a child / sub array of docs, which of course could be indexed for speed.
Therefore is it possible to query / find just the sub array or part of it? And given the 16mb limit for doc size, have I just limited the transaction history of the person? Or should the transaction history be a separate collection with an ObjectId referencing the person?
Thanks for any input
Sam
Or should the transaction history be a separate collection with an ObjectId referencing the person?
Probably, I think this S/O question may help you understand why.
if the status doc is modified I'd then need to modify the denormalised data?
Yes, this is a standard trade-off in MongoDB. You will encounter this question a lot. You may need to leverage a queue structure to ensure that data remains consistent across multiple collections.
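One way to read the queue suggestion is sketched below: a change to the source status is recorded in a queue, and a worker later copies the new values into every denormalised copy. This is an in-memory illustration with hypothetical names (`statuses`, `tickets`, `pending`, `drain`), not a specific MongoDB feature.

```python
from collections import deque

# Source of truth for statuses, plus documents holding denormalised copies.
statuses = {"s1": {"code": "OPEN", "desc": "Open"}}
tickets = [
    {"_id": "t1", "entries": [{"status_id": "s1", "code": "OPEN", "desc": "Open"}]},
    {"_id": "t2", "entries": [{"status_id": "s1", "code": "OPEN", "desc": "Open"}]},
]
pending = deque()  # ids of statuses whose copies are stale


def update_status(status_id, **changes):
    """Change the source status and enqueue it for propagation."""
    statuses[status_id].update(changes)
    pending.append(status_id)


def drain(queue, docs):
    """Worker step: refresh every denormalised copy of each queued status.
    Returns the number of sub docs rewritten."""
    touched = 0
    while queue:
        status_id = queue.popleft()
        source = statuses[status_id]
        for doc in docs:
            for entry in doc["entries"]:
                if entry["status_id"] == status_id:
                    entry.update(code=source["code"], desc=source["desc"])
                    touched += 1
    return touched
```

Between `update_status` and `drain` the copies are stale, so this design trades immediate consistency for cheap reads.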
Therefore is it possible to query / find just the sub array or part of it?
This is a tough one specific to MongoDB. With the basic query syntax, you have only limited support for dealing with arrays of objects. The new "Aggregation Framework" is actually much better here, but it's not available in a stable build.
All your "how to model this or that" questions can't really be answered, because good schema design depends on so many factors (access patterns, hardware characteristics, whether a cluster is used, etc).
if the status doc is modified I'd then need to modify the denormalised data?
Usually yes, that's the drawback of denormalisation. But sometimes you don't have to (some social network site stores user name with a photo tag and doesn't update it when user changes his name).
to query / find just the sub array or part of it?
It is not currently possible to fetch only a part of array (unless using map/reduce, of course).
And given the 4mb limit
Where did you get this from? It's 16mb at the moment.
While it's true that schema design does take into account many factors, the need to denormalize data usually comes up somewhere. I tend to take advantage of denormalization in my apps that use MongoDB because I feel it lends itself well to storing denormalized data:
no additional column maintenance
support for hashes and arrays as field types (perfect for storing denormalized fields)
speedy, non-blocking writes make syncing data less expensive
document size growth only marginally affects performance up to limits (for the most part)
There are a few gems that help you manage denormalized data, including setting it up and keeping it in sync. If you're using Mongoid, you can try mongoid_alize. DISCLAIMER: I am the author and maintainer of mongoid_alize.