Something about MongoDB

I'm new to MongoDB. Can anyone explain how it can be used efficiently in enterprise applications so as to get good performance (using joins, indexing, etc.)?
Could you also point me to any MongoDB production applications on the web?

For a good introduction to MongoDB, check out The Little MongoDB Book. Here's a list of sites currently using MongoDB in production.

You talk about joins and indexes. It seems your head is still in the RDBMS world. NoSQL and Mongo are not just different relational databases; they're a completely different way of managing and thinking about data. You need to think of your data schema in terms of structured objects rather than rows.
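To make the "objects rather than rows" idea concrete, here is a minimal sketch in plain Python (no MongoDB driver involved; the posts/comments data and function names are hypothetical). Lists of flat dicts stand in for relational tables, and a single nested dict stands in for a MongoDB document:

```python
# Hypothetical data, for illustration only.

# Relational mindset: posts and comments are separate "tables" of flat rows,
# linked by a post_id foreign key and joined at read time.
posts = [{"id": 1, "title": "Hello"}]
comments = [
    {"post_id": 1, "author": "ann", "text": "Nice"},
    {"post_id": 1, "author": "bob", "text": "+1"},
]


def load_post_relational(post_id):
    post = next(p for p in posts if p["id"] == post_id)
    # The "join": gather every comment row whose foreign key matches.
    rows = [c for c in comments if c["post_id"] == post_id]
    return {**post, "comments": [{"author": r["author"], "text": r["text"]} for r in rows]}


# Document mindset: the post is one structured object with its comments
# embedded, so a single lookup by _id returns everything.
post_documents = {
    1: {
        "_id": 1,
        "title": "Hello",
        "comments": [
            {"author": "ann", "text": "Nice"},
            {"author": "bob", "text": "+1"},
        ],
    },
}


def load_post_document(post_id):
    return post_documents[post_id]
```

Both functions return the same comments, but the document version needs no join. The trade-off is that data a relational schema would normalize ends up nested (or duplicated) inside documents.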

Sounds like you need to start from the beginning. MongoDB.org has a lot of the info you're asking for already available. Specifically, read their page on use cases, and the page on production deployments.
A more specific question would receive more comprehensive answers and fewer downvotes.

Related

Using PostgreSQL or PostgreSQL + MongoDB?

I'm currently planning a social-media application - especially the backend.
Basically, I have all the social aspects, for which I want to use SQL (PostgreSQL, I guess), but I also have geolocations organized in lists (so many-to-one), which will probably make up the biggest amount of data. I know that PostgreSQL has modules for GIS capabilities, and my initial thought was to just use PostgreSQL for everything, for the sake of simplicity and because the performance of geolocation searches should be about the same on both systems, if not in favor of PostgreSQL. I can also use the JSON type in PostgreSQL, so it basically has the most obvious advantages of MongoDB covered.
On the other hand, I'm afraid of scalability, as the geolocations are going to be the biggest chunk of data and the tables are probably going to have heaps of rows.
So my thought now is to implement geolocations in MongoDB, with its easy scalability and easy-to-use geolocation search, and to embed e.g. comments/likes for a geolocation directly into the document. That would make geolocation reads/searches much easier, but then again I would have to combine this data with social data from SQL, e.g. fetch all users who commented on a geolocation, get their profile info from PostgreSQL, and 'manually' combine it. Parts of this could be done on the frontend, though, saving me a lot of resources.
I'm not sure how good this idea performs and if I'm really doing myself a favor there.
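The 'manual' combination described above is essentially an application-side join. Here is a minimal Python sketch, assuming the geolocation document embeds only commenter user IDs and that profiles can be fetched from PostgreSQL in one batched query; plain dicts stand in for both databases, and all names and data are hypothetical.

```python
# Hypothetical geolocation document, shaped as it might be stored in MongoDB:
# comments are embedded, but only user IDs are kept, not profile data.
geolocation = {
    "_id": "loc-42",
    "name": "Harbour viewpoint",
    "coordinates": [13.40, 52.52],
    "comments": [
        {"user_id": 7, "text": "Great sunset"},
        {"user_id": 9, "text": "Crowded at noon"},
        {"user_id": 7, "text": "Went back again"},
    ],
}

# Stand-in for a PostgreSQL users table (hypothetical columns).
USERS_TABLE = {
    7: {"id": 7, "name": "Ann", "avatar": "ann.png"},
    9: {"id": 9, "name": "Bob", "avatar": "bob.png"},
    11: {"id": 11, "name": "Cy", "avatar": "cy.png"},
}


def fetch_profiles(user_ids):
    """Pretend round trip to PostgreSQL: one batched lookup for all IDs,
    standing in for a single SELECT ... WHERE id IN (...) query."""
    return {uid: USERS_TABLE[uid] for uid in user_ids if uid in USERS_TABLE}


def comments_with_profiles(location):
    # 1. Collect the distinct user IDs referenced by the embedded comments.
    user_ids = sorted({c["user_id"] for c in location["comments"]})
    # 2. Fetch all matching profiles in one batch (avoids one query per comment).
    profiles = fetch_profiles(user_ids)
    # 3. "Manually" join: attach each profile to its comment.
    return [
        {**comment, "author": profiles.get(comment["user_id"])}
        for comment in location["comments"]
    ]
```

Batching the profile lookup keeps the cost at one extra round trip per geolocation rather than one per comment; whether that is fast enough is exactly the kind of thing only testing with realistic data will tell.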
tldr: Use PostgreSQL.
Long answer:
You are trying to pre-optimize for a problem you don't even know you will have. You don't know how many geolocations you will have or how your users will behave, and you probably don't even have any users yet.
I've used MongoDB before and migrated to PostgreSQL. There are many, many features and benefits to using a 'real' database for storing highly structured data. I suggest googling around for 'PostgreSQL vs X' articles, but the overall consensus that I've found is that PGSQL is extremely mature, reliable, performant and supported.
From my personal experience using Mongo then switching to PGSQL, I will never use Mongo again unless PGSQL (or another full-fledged SQL database) is completely falling over and I've spent months fixing it. Even then I'd take a hard look at other NoSQL databases too. PGSQL has so many amazing features and powerful tools that make it a joy to use.
For the seemingly few things you think you need Mongo for, PGSQL can do just as well or better. It has native JSON types with indexes, geo support, full-text indexing, etc. PGSQL has been around longer and has more support (useful for debugging, performance tuning, etc.).
Regardless of which technologies you are thinking of using, you can't make any sort of informed decision if you don't:
Test with large data sets, and
Know your usage patterns and data volumes
So at this point I'd pick the more mature and powerful tool and set up monitoring for it. Watch the usage and performance of PGSQL, see how it holds up. Research best practices for PGSQL. Get to know it, learn it, dive in deep. When it comes to scaling individual services, each one is somewhat unique and will not fit a simple "Should I use X or Y?" question.
Good luck!

Clarification of uses of different NoSQL databases

I understand this may seem like a redundant question, but I hope it will clear up my doubts and those of other users.
For each of the following NoSQL databases (MongoDB, Redis, CouchDB, Hadoop), can someone explain:
The best use cases (when I should use it)
The pros and cons (limitations included)
Their added value (why it is better, possibly with a mathematical/scientific explanation)
Thanks
MongoDb and CouchDb are not key-value storages, but document-stores.
The best way to clear up your doubts is to read the technical documentation and overviews. =)
In short, MongoDB and CouchDB are fast enough and reliable document stores that persist data to disk. MongoDB works over a custom TCP/IP protocol, while CouchDB uses a REST approach over HTTP.
Redis is another kind of key/value store: it keeps all data in memory, so all reads and writes go directly to memory. This approach has both benefits and drawbacks. It persists changes to disk too.
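Here is a toy Python sketch of the "everything in memory, persisted to disk too" model. It is loosely based on the idea behind Redis's append-only file (log every write, replay the log on restart) and is not Redis's actual file format or API; the class name and log layout are hypothetical.

```python
import json
import os


class TinyMemoryStore:
    """Toy key/value store (hypothetical): all reads and writes hit an
    in-memory dict, and every write is also appended to a log file so the
    data survives a restart."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # On startup, rebuild the in-memory state by replaying the log.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["cmd"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["cmd"] == "del":
            self.data.pop(op["key"], None)

    def _append(self, op):
        # Persist before applying; fsync trades write latency for durability.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def set(self, key, value):
        op = {"cmd": "set", "key": key, "value": value}
        self._append(op)
        self._apply(op)

    def delete(self, key):
        op = {"cmd": "del", "key": key}
        self._append(op)
        self._apply(op)

    def get(self, key):
        # Reads never touch the disk.
        return self.data.get(key)
```

The drawbacks show up directly: the whole data set has to fit in RAM, and syncing on every write makes writes slower in exchange for durability (Redis makes that trade-off configurable through its appendfsync setting).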
Hadoop is not a key/value store; it's an engine for the distributed processing of large data sets. If you are going to write Google, you can use Hadoop.
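Hadoop's processing model is MapReduce. As a rough single-process illustration (plain Python, not Hadoop's Java API; the function names are hypothetical), the classic word count runs its map, shuffle, and reduce phases like this:

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Mapper: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values emitted for one key.
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

What Hadoop adds is running the map and reduce functions in parallel across many machines and handling the shuffle, data placement, and failures for you; this sketch does none of that.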
In my opinion, unless you are going to build something very specific (and I think you won't), go ahead and use MongoDB.

NoSQL Document DB

I need a document-oriented database for a project I am working on. I basically need two things: full ACID support and the ability to have references. Scalability is not a major issue, since there will be at most 300 users in total.
I know MongoDB supports references between documents and CouchDB supports ACID but I have not found one that has both.
I am really trying to avoid implementing either (ACID, References) in the application layer. The obvious fallback is RDBMS with some tree structure implementation which I am also trying to avoid.
Any suggestions?
Thanks!
You require ACID and full references, and CouchDB is not good for that.
You do not require scalability either. I'd guess a well-known database wouldn't hurt either.
For those reasons, a relational database sounds appropriate.
Check out RavenDB: it has ACID transaction support AND it supports the notion of relations between documents via Includes and Live Projections. Denormalization can probably come in handy, too.
Don't use an RDBMS if your business logic says it doesn't like it.
You listed the constraints, and you correctly described what CouchDB/MongoDB give you.
So based on those facts: use your fallback.

MongoDB versus CouchDB... And any other "major players"

What are the major differences between MongoDB and CouchDB, and are there any other major NO-SQL database-servers out there worth mentioning?
I know that CERN uses CouchDB somewhere in their LHC back-end; huge stamp of approval. What are the references for MongoDB and any other major servers?
Update
One of the major selling points of CouchDB, to me, is the REST-based API and seamless JavaScript integration using JSON as a data-wrapper. Is this possible with any of the other NO-SQL databases mentioned?
There are many more differences, but some quick points:
CouchDB has MVCC (Multi-Version Concurrency Control): each time a document is updated, a NEW version of it is created. MongoDB, by contrast, updates in place.
CouchDB supports multi-master, so you can write to any server. MongoDB only has one server active for writes (master-slave). However, I think this may have changed in the latest release (1.6), so MongoDB may now support multiple servers for writes.
To see who's using MongoDB see here (e.g. foursquare, bit.ly, sourceforge....)
To see who's using CouchDB see here.
The most notable other NoSQL database is Cassandra (facebook, twitter)
Then you have HBase, Hypertable, RavenDB, SimpleDB, and more still...
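The MVCC vs. update-in-place point from the list above can be sketched in a few lines of Python. This is a simplified model, not how either database is implemented: real CouchDB revision IDs are strings, and the integer revisions and class names here are hypothetical.

```python
class MVCCStore:
    """Hypothetical MVCC-style store: each update appends a new revision,
    and older revisions stay readable."""

    def __init__(self):
        self.revisions = {}  # doc_id -> list of versions, oldest first

    def put(self, doc_id, doc, expected_rev=None):
        history = self.revisions.setdefault(doc_id, [])
        current_rev = len(history)
        # Optimistic concurrency: an updater names the revision it read;
        # if someone else wrote in between, the update is rejected.
        if expected_rev is not None and expected_rev != current_rev:
            raise ValueError(f"conflict: expected rev {expected_rev}, current is {current_rev}")
        history.append(dict(doc))
        return current_rev + 1  # the new revision number

    def get(self, doc_id, rev=None):
        history = self.revisions[doc_id]
        return history[-1] if rev is None else history[rev - 1]


class InPlaceStore:
    """Hypothetical update-in-place store: each update overwrites the
    stored document and no history is kept."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, doc):
        self.docs[doc_id] = dict(doc)

    def get(self, doc_id):
        return self.docs[doc_id]
```

Like CouchDB, the MVCC store rejects an update that names a stale revision, which is how write conflicts surface; the in-place store simply lets the last write win.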
Welcome to some new ground! @AdaTheDev covered most of the major ones. There's also Project Voldemort, Tokyo Cabinet/Tyrant, and a whole bunch of wrappers around all of these things. For example, people are building MemcacheDB (memcache with a persistence layer).
MongoDB has several hooks to support "REST" APIs (check out "Sleepy Mongoose" and Node.js support). MongoDB and CouchDB have different ways of handling map-reduces (though they are somewhat similar). MongoDB does not have MVCC, but the two systems really have different ways of storing data each with their own set of trade-offs.
MongoDB uses language-specific drivers where CouchDB uses REST (performance trade-off).
For more detailed comparison look here.
MongoDB is probably a little easier for a relational developer to grasp since it uses drivers and has better support for ad hoc queries. CouchDB has very little in common with the old relational ways of doing things.
Both deal with sharding and replication differently.
Having said that, I believe both are conceptually similar enough that it often boils down to personal preference. They are all fun to code with. In fact, we evaluated both for an internal project and went back and forth with our decision.

Is MongoDB reliable?

I am developing a dating website and I am thinking of using a NoSQL database to store the profiles etc. I am currently looking into MongoDB and so far I am very pleased. My only worry is that I have read on various websites that MongoDB is unreliable and not good.
I looked into the NoSQL alternatives and found none that fully meets my specific criteria:
Easy to learn and use.
Fully compatible with PHP out of the box.
Fast and well documented.
What do you think, am I doing the right thing to go with MongoDB or is it a waste of time?
Thankful for all input in the matter!
I researched MongoDB for my social service startup and it is definitely worth considering. MongoDB has a powerful feature set that makes it a realistic and strong alternative to RDBMS solutions.
Amongst them:
Document Database: Most of your data is embedded in a document, so in order to get the data about a person, you don't have to join several tables. Thus, better performance for many use cases.
Strong Query Language: Despite not being an RDBMS, MongoDB has a very strong query language that allows you to get something very specific or very general from a document or documents. The DB is queried using JavaScript, so you can do many more things besides querying (e.g. functions, calculations).
Sharding & Replication: Sharding allows your application to scale horizontally rather than vertically. In other words, more small servers instead of one huge server. And replication gives you fail-over safety in several configurations (e.g. master/slave).
Powerful Indexing: I originally got interested in MongoDB because it allows geo-spatial indexing out of the box but it has many other indexing configurations as well.
Cross-Platform: MongoDB has many drivers.
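To illustrate the "Strong Query Language" point, here is a toy matcher for MongoDB-style query documents in Python. It supports only plain equality, dotted paths into embedded documents, and a handful of operators ($eq, $gt, $gte, $lt, $in, which are real MongoDB operator names); it is not MongoDB's implementation and ignores things like array matching. The find helper and sample data are hypothetical.

```python
def get_path(doc, dotted_key):
    """Follow a dotted path like 'address.city' into nested dicts."""
    value = doc
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # missing field
        value = value[part]
    return value


# A tiny subset of MongoDB-style comparison operators.
OPERATORS = {
    "$eq": lambda actual, expected: actual == expected,
    "$gt": lambda actual, expected: actual is not None and actual > expected,
    "$gte": lambda actual, expected: actual is not None and actual >= expected,
    "$lt": lambda actual, expected: actual is not None and actual < expected,
    "$in": lambda actual, expected: actual in expected,
}


def matches(doc, query):
    for key, condition in query.items():
        actual = get_path(doc, key)
        if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
            # Operator form, e.g. {"age": {"$gt": 30}}: every operator must hold.
            if not all(OPERATORS[op](actual, arg) for op, arg in condition.items()):
                return False
        elif actual != condition:
            # Plain form, e.g. {"name": "Ann"}: exact equality.
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]
```

For example, find(people, {"address.city": {"$in": ["Oslo", "Lima"]}}) reaches into an embedded document without any join, which is the kind of "very specific or very general" query the answer describes.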
As for the documentation, there is no deluge of it yet, but that is because this project only started in 2009; soon there will be plenty more. However, there is enough to get started with your project. In addition, you can check out Kyle Banker's MongoDB in Action, a great resource.
Lastly, I only had experience with RDBMSs prior to MongoDB and didn't know JavaScript or JSON, yet I still found it very simple and elegant.
Consider this related question on MongoDB and CouchDB - Fit for Production?
MongoDB has a showcase of Production Deployments as well. Be sure to analyze the uses of MongoDB rather than the size of the company.
Any software can be reliable or unreliable. MongoDB has replica sets, which give you hardware failover capabilities. You can take backups on a regular basis, which gives you a recovery interval, and you get sharding which can give you some modicum of redundancy, especially when combined with replica sets.
The issue isn't whether or not the technology is reliable, the issue is whether or not you have a well-defined backup and recovery plan that suits your platform of choice.
If MongoDB suits your needs, you're making the right choice. Just make sure to investigate what you can do to increase your reliability.
If it's good enough for Foursquare, it's most likely good enough for you.
I come from an RDBMS background (12 years) and have spent the last 6 months looking at NoSQL options. For your scenario, MongoDB sounds like a good choice. What I am hearing from those who have worked with MongoDB in production for some time is that you should follow these best practices:
Keep key size small
Evaluate (and possibly add) indexes to speed up queries
Pay attention to schema (I know this seems strange for a 'schemaless' database, but I have heard it several times)
Use replica sets
Here's a video of a best-practices talk from the MongoDB LA User Group that I find useful
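Assuming "keep key size small" refers to field names (BSON stores every field name inside every document, so long names are repeated in each one), here is a rough Python sketch of the overhead. It uses compact JSON as an approximation of BSON, which encodes values differently, and the customer fields are hypothetical.

```python
import json


def approx_size(doc):
    # Rough stand-in for BSON size: compact JSON. Like BSON, it stores every
    # field name inside every document.
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def field_name_bytes(doc):
    # Bytes spent on field names alone, including nested sub-documents.
    total = 0
    for key, value in doc.items():
        total += len(key.encode("utf-8"))
        if isinstance(value, dict):
            total += field_name_bytes(value)
    return total


# Hypothetical customer document with verbose vs. abbreviated field names.
verbose = {"customer_first_name": "Ann", "customer_last_name": "Lee", "customer_loyalty_points": 120}
compact = {"fn": "Ann", "ln": "Lee", "lp": 120}

docs = 1_000_000
saved = (approx_size(verbose) - approx_size(compact)) * docs
print(f"~{saved / 1e6:.0f} MB saved across {docs:,} documents (JSON estimate)")
```

The saving per document is small, but it is multiplied by every document in the collection; the cost is less readable field names, so it is worth measuring on your own data before abbreviating everything.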
10gen, the company behind MongoDB, provides an official PHP driver.
As Jeremiah says, they implemented replica sets in the latest version (1.6.0) and have already fixed bugs in it (1.6.1, with the next version, 1.6.2, due in a few weeks).
Moreover, free support from the company and the community is very fast and efficient (by 'free' I mean questions on the Google group: http://groups.google.com/group/mongodb-user?pli=1).
A few more points about reliability:
The community reacts extremely fast if you meet any critical issue.
Think carefully about what you expect from "reliability". Do you need a guarantee that your data is stored safely and never gets corrupted?
In that case, you will have to weigh the cost of buying reliable hardware against that of deploying MongoDB replica sets.
Do you mean having a highly available service ?
MongoDB has some teething problems; I can't deny that. But it is definitely NOT a waste of time, and it may well be a long-term solution.
That depends on what you need reliability for. Mongo is very reliable for reading: it has strong availability and sharding features.
On the other hand, Mongo writes are not reliable. While most go through, there is no guarantee that an update succeeds, and you have to manually query the database to check whether it did.
Thus, Mongo is best used when your workload is mostly reads rather than writes that absolutely must succeed.
MongoDB would be a good choice. We evaluated MongoDB and started using it for our business use cases. MongoDB is giving us better performance than Oracle, and it is also easy to scale horizontally.