I'm just getting started with MongoDB and the whole principle of NoSQL and I'm really enjoying the experience.
Has anyone got any suggestions of features that I should look into?
Has anyone come across any in-depth tutorials?
One thing in particular that I'm interested in is how to deal with what is analogous to a change in schema. If a column is added to a table, all records get this new column. In Mongo, it seems that if a document ends up with a new property, only new documents would have this property, and an update would be required to add it to all other documents. Is there a better approach to this, or am I fixating on non-existent problems?
First of all, you need to realise that data-modeling is going to be totally different from a relational database. I would suggest that you follow a few schema design presentations that are available through http://www.10gen.com/presentations (unselect "featured" and select "view all" and search for "schema"). http://www.10gen.com/presentations/webinar/mongodb-schema-design-how-to-think-non-relational is a particularly good one.
There are lots of other "schema design" tutorials online as well, and I have to stress that simply converting your relational schema into MongoDB is not going to get the best out of MongoDB as it's a totally different approach.
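On the schema-change part of the question, here is a minimal sketch of two common ways to cope with documents that predate a new field. It uses plain Python dicts in place of a real collection, and the `newsletter` field and both helper functions are made up for illustration.

```python
# Plain dicts stand in for MongoDB documents; no database is involved.
users = [
    {"_id": 1, "name": "Ann"},                      # written before the field existed
    {"_id": 2, "name": "Bob", "newsletter": True},  # written after the field was added
]

def wants_newsletter(doc):
    # Option 1: leave old documents alone and apply a default when reading.
    return doc.get("newsletter", False)

def backfill(docs, field, default):
    # Option 2: a one-off pass that adds the field wherever it is missing.
    changed = 0
    for doc in docs:
        if field not in doc:
            doc[field] = default
            changed += 1
    return changed

before = [wants_newsletter(u) for u in users]
changed = backfill(users, "newsletter", False)
print(before, changed)
```

Against a real database, the second option would roughly correspond to a single multi-document update that filters with `$exists: false` and applies `$set`, so it does not have to be done document by document from application code.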
I started reading up on MongoDB (which got me very excited), and as I understand it, one of its flaws is the self-explanatory lack of relations, especially for many-to-many relationships that are large or ever-growing on both sides.
From what I have read, the best way to avoid ever-growing arrays inside a document is either to create buckets of documents and reference the buckets (which does not guarantee that overgrowth is prevented entirely), or to have both documents reference a third, many-to-many document.
Since I could not find a definitive answer to this dilemma, or at least one that isn't a few years old, could someone explain whether this is a dead end (in case the project uses a few big, ever-growing many-to-many relationships) and I should switch to an RDBMS?
It depends on your use case.
The main question is: do you actually know why you want to use MongoDB in the first place? Hopefully, the reason is not just that it is trendy. RDBMSs are still relevant and have their own use cases. For some applications an RDBMS is the way to go; for others it isn't.
Now back to your original question about many-to-many relations. As you have already found, there are ways to model those relationships in MongoDB, so they don't disqualify MongoDB as a database on their own. What matters more is, for example, whether you need transactionality or referential integrity checks when you insert or delete records for those many-to-many relationships. If the answer to that is yes, then MongoDB may not be the perfect fit for your case.
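To make the "buckets" idea from the question concrete, here is a rough sketch using plain Python dicts rather than a real collection. The follower relationship, the field names and the tiny `BUCKET_SIZE` are all assumptions chosen for illustration; a real cap would be far larger.

```python
BUCKET_SIZE = 3  # hypothetical cap, kept tiny so the effect is visible

def add_follower(buckets, user_id, follower_id):
    # Reuse the user's newest bucket, or open a new one when it is full.
    own = [b for b in buckets if b["user_id"] == user_id]
    if not own or len(own[-1]["followers"]) >= BUCKET_SIZE:
        bucket = {"user_id": user_id, "seq": len(own), "followers": []}
        buckets.append(bucket)
    else:
        bucket = own[-1]
    bucket["followers"].append(follower_id)

buckets = []
for follower in range(7):
    add_follower(buckets, "u1", follower)

print([len(b["followers"]) for b in buckets])
```

No single document grows past the cap; the cost is that reading all followers of a user means reading several bucket documents instead of one.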
When I first started working with MongoDB, this exact question crossed my mind, and while searching for the answer I read something very interesting (I wish I had the link for you, but unfortunately I don't).
Think of a real-world problem where you have a many-to-many relation that just keeps on growing on both sides. Such cases are very exceptional.
Let's say many students are registered for many courses. A course may have 100 students registered, but a student surely won't register for 100 courses, so in the student collection you can simply keep an array field of registered course IDs.
Let's dig deeper and say there is a bunch of super brilliant students who actually did register for 100 courses. In such a scenario an array field may not be a viable solution. What then? How about a collection that just has student_id and course_id? This exists in the RDBMS world too.
So the available workarounds should be enough to design an optimized solution for probably even the most complex scenarios.
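Here is a small sketch of the two layouts described above, with plain Python lists and dicts standing in for collections. The IDs, names and helper functions are invented for the example.

```python
# Layout 1: each student document carries an array of course IDs.
students = [
    {"_id": "s1", "name": "Ana", "course_ids": ["c1", "c2"]},
    {"_id": "s2", "name": "Ben", "course_ids": ["c2"]},
]

def students_in(course_id):
    # "Who takes this course?" scans the array field of every student.
    return [s["_id"] for s in students if course_id in s["course_ids"]]

# Layout 2: a separate linking collection, one document per registration.
enrollments = [
    {"student_id": "s1", "course_id": "c1"},
    {"student_id": "s1", "course_id": "c2"},
    {"student_id": "s2", "course_id": "c2"},
]

def courses_of(student_id):
    # Neither side owns a growing array; each registration is its own record.
    return [e["course_id"] for e in enrollments if e["student_id"] == student_id]

print(students_in("c2"), courses_of("s1"))
```

The first layout answers "which courses does this student take?" with a single document read; the second keeps every document small no matter how many registrations exist.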
First of all, I have extensive experience with relational DBs but only beginner-level knowledge of document DBs. I'm exploring MongoDB, but my question is about document DBs in general.
As far as I know (I may be wrong), a document DB consists of containers, and containers hold objects with the same or different structures. These object structures are defined in such a way that filters can be applied and information retrieved in the most efficient way. For example, a Book is written by Authors, so a Book object will also contain its list of authors. This way searching can be made faster and performance can be gained.
What is my problem?
I'm creating an application (I haven't started yet, as I'm confused here). Its relational DB is something like this....
The problem is I'm not able to design the Document DB structure for this requirement.
Could somebody please help me design such a database, or give me an idea of what approach one should take when designing one?
This comes down to answering the following questions:
What are your most common access patterns? It is helpful to think of your API methods, or top 5-10 queries to decide how to organize.
What are your transactional needs? Which of these entity types occur together in transactions and queries?
How often do they change? Should you embed or reference?
If you could include these details, we can help with more targeted suggestions.
http://azure.microsoft.com/documentation/articles/documentdb-modeling-data/ is also worth a read if you haven't already.
The main difference between DocumentDB and relational databases/MongoDB is that collections are more like shards/partitions and not tables.
I'm building a system that stores all of a person's medical and health data in a database. I've chosen MongoDB to do the work, but I'm new to MongoDB modeling and I have no idea what the best way to do this is.
Do I use one document for each patient and insert subdocuments like this:
$evolution=array(); //subdocument
$record=array(); //subdocument
$prescriptions=array(); //subdocument
$exams=array(); //subdocument
$surgeries=array(); //subdocument
or do I create a new document for each of these pieces of data?
I know the document size limit is 16 megabytes, but I don't know whether the information will reach that limit.
The exact layout of your documents is highly dependent on the types of queries you need to make. Unfortunately without a detailed understanding of your use case it would be impossible to provide good advice about what is the best layout.
Depending on your use case it may be valid to have a document/patient with sub documents as you indicate. In some cases though it may be better to have a separate collection for each of the fields indicated. It all depends on how big those documents will be, what types of queries you will need to perform etc.
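As a rough illustration of the two options, here is a sketch with plain Python dicts in place of real collections. All names, fields and values are invented sample data, not a recommended medical schema.

```python
# Layout A: one document per patient, with everything embedded.
patient_embedded = {
    "_id": "p1",
    "name": "Jane Doe",
    "prescriptions": [{"drug": "ibuprofen", "date": "2014-01-10"}],
    "exams": [{"type": "x-ray", "date": "2014-02-01"}],
}

# Layout B: a small patient document plus a separate collection per data
# type, each record pointing back to its patient.
patients = [{"_id": "p1", "name": "Jane Doe"}]
exams = [
    {"patient_id": "p1", "type": "x-ray", "date": "2014-02-01"},
    {"patient_id": "p1", "type": "blood", "date": "2014-03-15"},
]

def exams_for(patient_id):
    # With layout B, a patient's exams are fetched by a second lookup.
    return [e for e in exams if e["patient_id"] == patient_id]

print(len(patient_embedded["exams"]), len(exams_for("p1")))
```

Layout A gives you the whole patient in one read, but the document grows with every visit. Layout B keeps documents small and lets you query exams on their own, at the price of an extra lookup.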
Some general advice:
Try to avoid queries that use multiple collections.
If your queries are getting difficult, you may have the wrong layout. Re-evaluate your layout any time you are in this situation.
Documents that constantly grow can create problems because Mongo constantly has to move them around in order to make room for the growth. If they will be growing quickly then reevaluate to see if there is a better layout.
While you can technically store different document layouts in the same collection in Mongo it is not generally considered a good practice. All documents in your collection should ideally follow some sort of schema even if that schema is not rigidly defined.
Field names matter. They take up space in Mongo so short field names are better if you expect to have a lot of data.
The best advice I can offer would be to start with what you think might work and see how it goes. If it gets awkward or difficult to get the information you need then reevaluate.
When it comes to NoSQL, there is a bewildering number of databases to choose from, as is clear from the NoSQL wiki.
In my application I want to replace MySQL with a NoSQL alternative. I have a user table which has one-to-many relations with a large number of other tables. Some of these tables are in turn related to yet other tables. Also, a user is connected to another user if they are friends.
I do not have documents to store, so this eliminates document oriented NoSQL databases.
I want very high performance.
The NoSQL database should work very well with the Play Framework and the Scala language.
It should be open source and free.
So given the above, what NoSQL database should I use?
I think you may be misunderstanding the nature of "document databases". As such, I would recommend MongoDB, which is a document database, but I think you'll like it.
MongoDB stores "documents" which are basically JSON records. The cool part is it understands the internals of the documents it stores. So given a document like this:
{
  "name": "Gregg",
  "fave-lang": "Scala",
  "fave-colors": ["red", "blue"]
}
You can query on "fave-lang" or "fave-colors". You can even index on either of those fields, even the array "fave-colors", which would necessitate a many-to-many in relational land.
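To show what querying inside such a document means, here is a small sketch that imitates the matching rule in plain Python, with no database involved. The second record and the `matches` and `find` helpers are made up for illustration; the rule being imitated is that an equality filter on an array field matches when any element equals the value.

```python
people = [
    {"name": "Gregg", "fave-lang": "Scala", "fave-colors": ["red", "blue"]},
    {"name": "Dana", "fave-lang": "Python", "fave-colors": ["green"]},  # invented record
]

def matches(doc, field, value):
    # Scalar fields must equal the value; array fields match if any element does.
    stored = doc.get(field)
    if isinstance(stored, list):
        return value in stored
    return stored == value

def find(docs, field, value):
    return [d["name"] for d in docs if matches(d, field, value)]

print(find(people, "fave-lang", "Scala"))
print(find(people, "fave-colors", "blue"))
```

In a relational design the colors would typically live in a separate table joined back to the person; here they stay inside the record and are still searchable.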
Play offers a MongoDB plugin which I have not used. You can also use the Casbah driver for MongoDB, which I have used a great deal and is excellent. The Rogue query DSL for MongoDB, written by FourSquare is also worth looking at if you like MongoDB.
MongoDB is extremely fast. In addition you will save yourself the hassle of writing schemas because any record can have any fields you want, and they are still searchable and indexable. Your data model will probably look much like it does now, with a users "collection" (like a table) and other collections with records referencing a user ID as needed. But if you need to add a field to one of your collections, you can do so at any time without worrying about the older records or data migration. There is technically no schema to MongoDB records, but you do end up organizing similar records into collections.
MongoDB is one of the most fun technologies I have happened to come across in the past few years. In that one happy Saturday I decided to check it out and within 15 minutes was productive and felt like I "got it". I routinely give a demo at work where I show people how to get started with MongoDB and Scala in 15 minutes and that includes installing MongoDB. Shameless plug if you're into web services, here's my blog post on getting started with MongoDB and Scalatra using Casbah: http://janxspirit.blogspot.com/2011/01/quick-webb-app-with-scala-mongodb.html
You should at the very least go to http://try.mongodb.org
That's what got me started.
Good luck!
At this point the answer is none, I'm afraid.
You can't just convert your relational model with joins to a key-value store design and expect it to be a 1:1 mapping. From what you said it seems that you do have joins, some of them recursive, i.e. referencing another row from the same table.
You might start by denormalizing your existing relational schema to move it closer to a design you wish to achieve. Then, you could see more easily if what you are trying to do can be done in a practical way, and which technology to choose. You may even choose to continue using MySQL. Just because you can have joins doesn't mean that you have to, which makes it possible to have a non-relational design in a relational DBMS like MySQL.
Also, keep in mind - non-relational databases were designed for scalability - not performance! If you don't have thousands of users and a server farm a traditional relational database may actually work better for you.
Hmm, you want very high traversal performance and you use the word "friends". The first thing that comes to mind is graph databases. They are made specifically for this exact case.
Try Neo4j http://neo4j.org/
It is free and open source, but also has commercial support and commercial licensing, has excellent documentation, and can be accessed from many languages (REST interface).
It is written in Java, so you have native libraries, or you can embed it into your Java/Scala app.
Regarding MongoDB or Cassandra, you can now (Dec. 2016, 5 years late) try longevityframework.org.
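For a feel of the kind of traversal a graph database is built for, here is a sketch of a "friends within N hops" search over a tiny in-memory graph. It is plain Python rather than Neo4j's API, and the names and friendships are invented.

```python
from collections import deque

# Hypothetical friendship graph: each user maps to a set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}

def within(start, max_hops):
    # Breadth-first search that records the hop count to each person reached.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in friends[person]:
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    del hops[start]
    return hops

print(sorted(within("alice", 2).items()))
```

In a relational schema each extra hop is typically another self-join on the friendship table, which is why this sort of query is usually the argument for a graph store.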
Build your domain model using standard Scala idioms such as case classes, companion objects, options, and immutable collections. Tell us about the types in your model, and we provide the persistence.
See "More Longevity Awesomeness with Macro Annotations!" from John Sullivan.
He provides an example on GitHub.
If you've looked at longevity before, you will be amazed at how easy it has become to start persisting your domain objects. And the best part is that everything persistence related is tucked away in the annotations. Your domain classes are completely free of persistence concerns, expressing your domain model perfectly, and ready for use in all portions of your application.
I am very interested in using one of the new NoSQL solutions to implement a search engine for a dating site. However, because there are so many possibilities, I am a little bit confused. My requirements:
1) 10 million people
2) More than 8 indexes (gender, online, city, name, etc.)
3) Scalability
Thanks
You'll want to go for either MongoDB or CouchDB.
CouchDB scales a little better, while MongoDB's syntax is a little more familiar.
It also depends on what framework/language you use to build the dating site.
I personally would choose CouchDB. (You should know JavaScript... a lot.)
Apache Solr is a data store and fulltext search engine that might be useful to you. Solr is rarely mentioned as a NoSQL technology, but it shares many characteristics with document-oriented databases.
Keep in mind that you have to know what type of queries you're going to run before you can choose a NoSQL solution or design your database.
That's in contrast to a relational database, where you can design a general-purpose database based on the data relationships.
With a dataset that large you would probably be well advised to treat search as separate from the data store. As someone suggested, Solr will index your data so that you can search it independently of your database. You have two problems: data store and search.
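To illustrate what a separate search index does with several filterable attributes, here is a toy inverted index in plain Python. It is only a sketch of the idea, not how Solr works internally, and the profiles and field values are invented.

```python
from collections import defaultdict

# Hypothetical profiles, keyed by user ID.
profiles = {
    1: {"gender": "f", "city": "Oslo", "online": True},
    2: {"gender": "m", "city": "Oslo", "online": False},
    3: {"gender": "f", "city": "Bergen", "online": True},
    4: {"gender": "f", "city": "Oslo", "online": False},
}

def build_index(docs):
    # Map every (field, value) pair to the set of IDs that carry it.
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            index[(field, value)].add(doc_id)
    return index

def search(index, **criteria):
    # A multi-attribute search is the intersection of the per-attribute ID sets.
    sets = [index.get((field, value), set()) for field, value in criteria.items()]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index(profiles)
print(search(index, gender="f", city="Oslo"))
```

The index stores only IDs, so the full profiles can stay in whatever data store you pick; the search side just has to be kept in sync with it.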
ElasticSearch http://www.elasticsearch.org/overview/
Can handle age difference, geographic location, tastes and dislikes, etc. Or a leaderboard system that depends on many variables.
You'd want something that has sophisticated search and aggregation support.
Elasticsearch is a good candidate. In addition to its ability to perform fuzzy, proximity searches (which is something you'd likely want), you'd also want to integrate some machine learning pipeline to constantly improve your matching 'accuracy'.