Is it possible to encounter the N+1 problem in NoSQL, given the data model is designed properly?

I think it is impossible to encounter such a problem, because if the data model is designed properly, we keep related data together in NoSQL. Is it possible?
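To make the question concrete, here is a minimal Python sketch that counts queries for two layouts of the same data. The toy store, collection names and fields are all hypothetical, not any real driver's API. With comments kept in their own collection, loading N posts plus their comments takes 1 + N queries; with comments embedded in each post it takes one.

```python
# Minimal sketch of the N+1 pattern against a toy, hypothetical document store.

class CountingStore:
    """In-memory 'database' that counts how many queries are issued."""

    def __init__(self, collections):
        self.collections = collections
        self.queries = 0

    def find(self, collection, **criteria):
        # One call = one round trip to the (imaginary) server.
        self.queries += 1
        return [doc for doc in self.collections[collection]
                if all(doc.get(k) == v for k, v in criteria.items())]


# Referenced layout: comments live in their own collection.
referenced = CountingStore({
    "posts": [{"_id": 1, "title": "a"}, {"_id": 2, "title": "b"}, {"_id": 3, "title": "c"}],
    "comments": [{"post_id": 1, "text": "x"}, {"post_id": 3, "text": "y"}],
})
posts = referenced.find("posts")                 # 1 query for the list
for post in posts:                               # + 1 query per post (the "N")
    post["comments"] = referenced.find("comments", post_id=post["_id"])

# Embedded layout: comments are stored inside each post document.
embedded = CountingStore({
    "posts": [
        {"_id": 1, "title": "a", "comments": [{"text": "x"}]},
        {"_id": 2, "title": "b", "comments": []},
        {"_id": 3, "title": "c", "comments": [{"text": "y"}]},
    ],
})
embedded_posts = embedded.find("posts")          # everything in 1 query

print(referenced.queries, embedded.queries)      # 4 1
```

Embedding removes the extra round trips for this access pattern; wherever related data is stored by reference instead (for example because it is shared between documents), the same loop can reappear.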

Related

Difference between OODBMS, ORDBMS, network model and hierarchical model

Can anyone describe the exact difference between these DBMS models? I have gone through many papers, but I'm still confused about them.
The network approach to DBMS is more flexible than the hierarchical approach, but the programmer still has to know the physical representation of the data to be able to access it, and accordingly, applications using a network database have to be changed every time the structure of the database changes.

Neo4j instead of relational database

I am implementing a Sinatra/Rails-based web portal that might eventually have a few many-to-many relationships between tables/models. It's a one-person, part-time, but real-world app.
I discussed my entities with someone and was advised to try Neo4j. Coming from the real 'non-sexy' enterprise world, my inclination is to use a relational DB until it stops scaling or becomes a nightmare because of sharding etc., and only then think about anything else.
HOWEVER,
I am using Postgres for the first time in this project, along with DataMapper, and it's taking me a while to get up to speed.
I am just trying out a few things and building more use cases, so I constantly have to update my schema (prototyping ideas and feedback from beta). I won't have to do this in Neo4j (except for changing my queries).
It seems very easy to set up search using Neo4j. But Postgres can do full-text search as well.
Postgres recently announced support for JSON and JavaScript. I'm wondering if I should just stick with PG and invest more time learning it (it has a good community) instead of Neo4j.
I'm looking for use cases where Neo4j is better, especially in the prototyping/initial phase of a project. I understand that if the website grows I might end up with multiple persistence technologies like S3, relational (PG), Mongo etc.
It would also be good to know how it plays out in the Rails/Ruby ecosystem.
Update 1:
I got a lot of good answers, and it seems the right thing to do is to stick with Postgres for now (especially since I deploy to Heroku).
However, the idea of being schema-less is tempting. Basically, I am thinking of an approach where you don't define a data model until you have, say, 100-150 users and have figured out a good schema (business use cases) for your product, while you are just demoing the concept and getting feedback from limited signups. Then one can decide on a schema and start with a relational database.
It would be nice to know whether there are easy-to-use schema-less persistence options (judged by ease of use and setup for a new user), even if they give up, say, scaling.
Graph databases should be considered if you have a really chaotic data model. They are designed to express highly complex relationships between entities. To do that, they store relationships at the data level, whereas an RDBMS uses a declarative approach. Storing relationships only makes sense if these relationships are very different; otherwise you'll just end up duplicating data over and over, taking a lot of space for nothing.
You typically only see that much variety in relationships when you handle huge amounts of data. This is where graph databases shine: instead of doing tons of joins, they just pick a record and follow its relationships. To support my statement: you'll notice that every use case on Neo4j's website deals with very complex data.
In brief, if what I said above doesn't apply to you, I think you should use another technology. If this is just about scaling, schemalessness or starting a project fast, then look at other NoSQL solutions (more specifically, column- or document-oriented databases). Otherwise you should stick with PostgreSQL. You could also, as you said, consider polyglot persistence.
About your update, you might consider hstore. I think it fits your requirements. It's a PostgreSQL extension for storing key/value pairs in a single column, and it also works on Heroku.
I don't agree that you should only use a graph database when your data model is very complex. I'm sure one could handle a simple data model and simple relationships as well.
If you have no prior experience with Neo4j or Postgres, then most likely both will take quite a bit of time to learn well.
Some things to keep in mind when picking:
It's not just about development against a database technology. You should consider deployment as well. How easy is it to deploy and scale Postgres/Neo4j?
Consider the community and tools around each technology. Is there a data mapper for Neo4j like there is for Postgres?
Consider that the data models are considerably different between the two. If you can already think relationally, then I'd probably stick with Postgres. If you go with Neo4j you're going to be making a lot of mistakes for several months with your data models.
Over time I've learned to keep it simple when I can. Postgres might be the boring choice compared to Neo4j, but boring doesn't keep you up at night. =)
Also I never see anyone mention it, but you should look at Riak (http://basho.com/riak/) too. It's a document database that also provides relationships (links) between objects. Not as mature as a graph database, but it can connect a few entities quickly.
The most appropriate choice depends on what problem you are trying to solve.
If you just have a few many-to-many tables, a relational database can be fine. In general, there is better ORM support for relational databases, as they are much older and have a standardized interface and row-column structure. They have also been improved over a long time, so they are stable and optimized for what they do.
A graph database is better if, e.g., your problem is more about the connections between entities, especially if you need longer-distance connections, like "detect cycles (of unspecified length)" or "what do friends of a friend like?". Things like that get unwieldy when restricted to SQL joins. A problem-specific language like Cypher, in the case of Neo4j, makes that much more concise. On the downside, there are mappers between graph DBs and objects, but not for every framework and language under the sun.
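Cypher itself is not shown here. As a language-neutral illustration, this is a minimal Python sketch over a hypothetical in-memory adjacency list: a breadth-first traversal for friends of a friend and a depth-first search for directed cycles of any length. It only shows the kind of "follow the links" traversal the answer describes, not Neo4j's API or internals.

```python
from collections import deque

# Hypothetical social graph as an adjacency list: person -> set of friends.
# Friendship is stored in both directions, so each link appears twice.
FRIENDS = {
    "ann": {"bob", "cid"},
    "bob": {"ann", "dan"},
    "cid": {"ann", "eve"},
    "dan": {"bob"},
    "eve": {"cid"},
}

def within_distance(graph, start, max_depth):
    """Breadth-first traversal: everyone reachable in 1..max_depth hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                      # do not expand beyond the limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return found

def friends_of_friends(graph, person):
    """People exactly two hops away who are not already direct friends."""
    return within_distance(graph, person, 2) - graph[person]

def has_cycle(graph):
    """Detect a directed cycle of any length with a depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:             # back edge: we looped onto our path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))

print(sorted(friends_of_friends(FRIENDS, "ann")))   # ['dan', 'eve']
```

In SQL, the same "unspecified length" questions need a recursive CTE or one self-join per hop, which is the unwieldiness the answer refers to.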
I recently implemented a system prototype using Neo4j, and it was very useful to be able to talk about the structure and connections of our data and model them one-to-one in the data storage. Also, adding other connections between data points was easy, Neo4j being a schema-less store. We ended up switching to MongoDB due to write-performance troubles, but I don't think we could have finished the prototype with MongoDB in the same amount of time.
Other NoSQL datastores (document, column, key-value) also cover specific use cases. Polyglot persistence is definitely something to look at, so keep your choice of backend reasonably separate from your business logic, to allow you to change your technology later if you learn something new.

Recommendations for a DBMS for an EAV system with mostly insert and select operations on the .NET stack

In the project I have been working on, the data modeling requirements are:
A system consisting of N clients, each having N events. An event is an entity with a required name and the timestamp at which it occurs. Optionally, an event may have N properties (key/value pairs) defining attributes that a client wants to store with that particular instance of the event.
The system will have mostly:
inserts – events are logged but never updated.
selects – reports/actions will be generated/executed based on events and properties of any possible combinations.
The requirements reflect an entity-attribute-value (EAV) data model. After researching for some time, I feel that a relational DBMS like SQL Server might not be a good fit for this (correct me if I'm wrong!).
So I'm leaning toward a NoSQL option like MongoDB/CouchDB/RavenDB etc.
My questions are:
What is the best fit among the available NoSQL solutions, given my system's heavy insert/select needs?
I'm also open to a relational option if these requirements can be translated into a relational schema. I personally doubt this, but after reading PerformanceDBA's answers (like the one referenced here), I got curious. However, I couldn't work out an optimal relational model for my requirements myself, perhaps because the system is rather generic.
thanks!
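For the relational option, here is one possible translation of the stated requirements, sketched with Python's built-in sqlite3 as a stand-in for SQL Server. The table and column names, the index and the helper functions are hypothetical; this is an illustration of an EAV layout, not PerformanceDBA's model or a tuned design. It is insert-only for events, and "any combination of properties" becomes one EXISTS clause per key/value pair.

```python
import sqlite3

# Hypothetical schema: events are narrow rows; optional properties go into
# an EAV table keyed by (event_id, key).
SCHEMA = """
CREATE TABLE client (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE event (
    id          INTEGER PRIMARY KEY,
    client_id   INTEGER NOT NULL REFERENCES client(id),
    name        TEXT NOT NULL,
    occurred_at TEXT NOT NULL          -- ISO-8601 timestamp
);
CREATE TABLE event_property (
    event_id INTEGER NOT NULL REFERENCES event(id),
    key      TEXT NOT NULL,
    value    TEXT,
    PRIMARY KEY (event_id, key)
);
CREATE INDEX event_property_kv ON event_property(key, value);
"""

def log_event(conn, client_id, name, occurred_at, properties=None):
    """Insert-only: one event row plus one row per optional property."""
    cur = conn.execute(
        "INSERT INTO event (client_id, name, occurred_at) VALUES (?, ?, ?)",
        (client_id, name, occurred_at),
    )
    event_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO event_property (event_id, key, value) VALUES (?, ?, ?)",
        [(event_id, k, v) for k, v in (properties or {}).items()],
    )
    return event_id

def events_matching(conn, name, **props):
    """Ids of events with this name whose properties include every pair given."""
    sql = "SELECT e.id FROM event e WHERE e.name = ?"
    params = [name]
    for key, value in props.items():
        sql += (" AND EXISTS (SELECT 1 FROM event_property p"
                " WHERE p.event_id = e.id AND p.key = ? AND p.value = ?)")
        params += [key, value]
    return [row[0] for row in conn.execute(sql + " ORDER BY e.id", params)]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO client (id, name) VALUES (1, 'acme')")
log_event(conn, 1, "purchase", "2000-01-01T10:00:00", {"sku": "A1", "country": "NZ"})
log_event(conn, 1, "purchase", "2000-01-01T11:00:00", {"sku": "A1", "country": "DE"})
log_event(conn, 1, "login", "2000-01-01T12:00:00")
print(events_matching(conn, "purchase", sku="A1", country="NZ"))  # [1]
```

One known cost of this layout is that every value is stored as text, so typed comparisons and aggregations on properties need casts.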
MongoDB really shines when you write unstructured data to it (like your events). It is also able to sustain a pretty heavy write load. However, it's not very good for reporting, at least not reporting in the traditional sense.
So, if your reporting needs are simple, you might get away with some simple map-reduce jobs. Otherwise you can export data to a relational database (nightly job, for example) and report the hell out of it.
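A "simple map-reduce job" of the kind mentioned above has the following shape. MongoDB's own mapReduce command takes JavaScript map and reduce functions; this is only a plain-Python sketch of the same idea over hypothetical event documents, counting events per client and event name.

```python
from collections import defaultdict

# Hypothetical event documents as they might be read from a document store.
events = [
    {"client": "acme", "name": "login"},
    {"client": "acme", "name": "purchase"},
    {"client": "acme", "name": "purchase"},
    {"client": "globex", "name": "login"},
]

def map_event(event):
    # Emit (key, value) pairs: key = (client, event name), value = 1.
    yield (event["client"], event["name"]), 1

def reduce_counts(values):
    # Combine every value emitted for one key.
    return sum(values)

def map_reduce(docs, mapper, reducer):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)          # the "shuffle" step
    return {key: reducer(values) for key, values in grouped.items()}

counts = map_reduce(events, map_event, reduce_counts)
print(counts[("acme", "purchase")])  # 2
```

Anything much richer than this (ad-hoc joins, arbitrary filters across properties) is where exporting to a relational database for reporting starts to pay off.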
Such hybrid solution is pretty common (in my experience).

Recovery or failover strategies when NoSQL data becomes inconsistent

NoSQL databases often emphasize availability over consistency. Sometimes this can cause the data in your NoSQL datastore to become inconsistent.
1) What are strategies to recover from such a situation?
2) What are strategies to prevent such a situation if possible?
3) What are the specific strategies for the popular NoSQL vendors, such as MongoDB, CouchDB, Cassandra, and HBase?
I think that with point #3 you are mixing two different problems:
A. The database becomes unreadable, i.e. its data files are corrupted and data is inaccessible or only partially accessible.
B. Application data stored in the NoSQL database becomes inconsistent (e.g. some key mismatch happened), so the application can't use it correctly and starts to behave strangely.
Problem A is a database maintainability issue, and each database handles it in its own way (e.g., MongoDB). Truth be told, it's not only a NoSQL problem. In general, this kind of situation is an emergency and shouldn't happen if your database engine is solid and runs on good, sufficient hardware.
Problem B is purely application-specific, and I think the main strategy here is to make your application expect that data might be inconsistent at some point and work around that where possible. There can also be a background process that finds inconsistencies in the data. In any case, it depends entirely on your data model.
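The background process mentioned above could look roughly like this minimal Python sketch. The documents, field names and the two checks (dangling references and a stale denormalised counter) are hypothetical examples; the real checks depend entirely on your data model. It collects problems instead of stopping at the first one, so they can be logged or repaired in bulk.

```python
# Hypothetical documents: each order references a customer by id, and each
# customer document keeps a denormalised count of its orders.
customers = {
    "c1": {"name": "Ann", "order_count": 2},
    "c2": {"name": "Bob", "order_count": 1},
}
orders = [
    {"_id": "o1", "customer_id": "c1"},
    {"_id": "o2", "customer_id": "c1"},
    {"_id": "o3", "customer_id": "c9"},   # dangling reference
]

def find_inconsistencies(customers, orders):
    """Return a list of human-readable problems found in the data."""
    problems = []
    actual = {cid: 0 for cid in customers}
    for order in orders:
        cid = order["customer_id"]
        if cid not in customers:
            problems.append(f"order {order['_id']} references missing customer {cid}")
        else:
            actual[cid] += 1
    for cid, doc in customers.items():
        if doc["order_count"] != actual[cid]:
            problems.append(
                f"customer {cid} says {doc['order_count']} orders, found {actual[cid]}")
    return problems

for problem in find_inconsistencies(customers, orders):
    print(problem)
```

Whether a problem is then fixed automatically (e.g. recomputing the counter) or only reported is an application decision.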
EDIT: Updates to data in a NoSQL database are not transactional, but in general they are atomic. So if one tuple is updated by two different processes, you will not get part of the tuple from one process and part from the other; you will get the whole tuple from whichever process the engine considers "last". But if your application updates several "dependent" tuples, then the result with several updating threads is of course not predictable, because there is no transaction around those multiple updates. Unless, of course, all processes put the same data into the database. But if you have too many dependencies between different types of tuples/objects, then I would say your application is using NoSQL the wrong way.
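The point about atomic single-tuple writes versus unprotected multi-tuple updates can be shown with a deterministic simulation. This Python sketch replays one hypothetical interleaving of two processes, A and B; it is not real concurrency and does not model any particular database's write path. Every individual write replaces a whole document, yet the pair ends up mixed.

```python
# Two dependent documents that are supposed to agree with each other.
store = {"account": None, "audit": None}

def write(doc_id, value):
    # Whole-document replace: atomic per document, last writer wins.
    store[doc_id] = dict(value)

# Hypothetical interleaving: A and B each intend to write a matching pair.
schedule = [
    ("A", "account"),
    ("B", "account"),
    ("B", "audit"),
    ("A", "audit"),
]
for process, doc_id in schedule:
    write(doc_id, {"written_by": process, "balance": 100 if process == "A" else 50})

# Each document is internally consistent, but the pair comes from two writers.
print(store["account"]["written_by"], store["audit"]["written_by"])  # B A
```

If both processes wrote identical data, the final pair would match regardless of the order, which is the exception the answer mentions.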
EDIT: There is also an interesting discussion here.

Could document-based databases be used instead of RDBMS throughout the application in the future?

The more I read about and use non-SQL databases, the more I love them.
They feel made for the OOP world, and they're easy to use, like Rails is among frameworks.
I know the disadvantages. The major concern seems to be the lack of transactions and concurrency control. Am I correct?
Are these the only features that make it hard for developers to choose non-SQL databases entirely, even for transactions?
If these features were added, would it be acceptable to use only document-based databases for an application?
Because right now it seems you still have to use an RDBMS for customer billing data, while your content could live in document-based databases like MongoDB/CouchDB/Cassandra.
Could someone shed some light on this?
Yes, of course you can build entire applications on non-relational data models. As a general rule, though, most people don't want to do that. The problem is that hierarchical/graph-based data models (i.e., any model that depends on navigational data structures) significantly increase the complexity and reduce the effectiveness of queries and of data integrity in the database. The relational model was invented 40 years ago precisely to overcome those disadvantages inherent in navigation-based databases.
No.
They do not seem to be appropriate for fixed-schema, high-volume, mostly numerical data. Think data warehousing. Think ad-hoc analytical queries. They could take over all (or some) of the areas where RDBMSs were never a good fit in the first place (areas where people came up with XML databases, object-oriented databases, graph databases, and so on).
This is just like Excel not being able to replace Word (although, admittedly, most Excel files I see these days are more presentation than spreadsheet). Different tools for different tasks.
In short, many NoSQL solutions don't have cascading updates, so if your application's data schema requires them, you will either have to update multiple documents (or columns, or whatever the storage unit is) programmatically, or stick with an SQL-based solution to handle this.
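Updating multiple documents programmatically usually means something like this minimal Python sketch. The blog data and the rename_author helper are hypothetical: each post embeds a copy of the author's name, so nothing in the database links the copies, and the application has to find and rewrite every one of them itself.

```python
# Hypothetical denormalised data: posts embed a copy of the author's name.
authors = {"a1": {"name": "Ann"}, "a2": {"name": "Bob"}}
posts = [
    {"_id": "p1", "author_id": "a1", "author_name": "Ann", "title": "First"},
    {"_id": "p2", "author_id": "a1", "author_name": "Ann", "title": "Second"},
    {"_id": "p3", "author_id": "a2", "author_name": "Bob", "title": "Other"},
]

def rename_author(author_id, new_name):
    """Application-level 'cascade': update the source and every copy.

    These are separate writes with no transaction around them, so a crash
    part-way through would leave some posts with the old name.
    """
    authors[author_id]["name"] = new_name
    updated = 0
    for post in posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            updated += 1
    return updated

print(rename_author("a1", "Anne"))  # 2
```

In a normalised SQL schema the name would be stored once and joined in, so this rewrite loop would not exist in the first place.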
Concurrency is handled differently for different solutions.
I think this blog post does a good job of explaining some of the trade-offs of using a NoSQL solution:
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
It also depends on the development of new, faster hardware.
You can distribute your database over multiple cheap computers (scaling out) if you use Cassandra or MongoDB. There will always be data sets that are too large for one computer, because people collect and keep more data whenever it becomes possible to do so.
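The basic idea behind scaling out is partitioning: a function decides which machine owns each key. This Python sketch uses simple hash-mod-N placement over a hypothetical three-node cluster. It is a simplification, not what either product does: Cassandra places data on a token ring (consistent hashing) and MongoDB partitions by a shard key (ranged or hashed). With mod-N, adding a node moves most keys, which is the problem consistent hashing addresses.

```python
import hashlib

# Hypothetical cluster of cheap machines.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key, nodes=NODES):
    """Pick a node by hashing the key (simple hash-mod-N partitioning)."""
    # md5 is used for a stable, well-spread hash, not for security; Python's
    # built-in hash() is randomised per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

placement = {}
for user_id in (f"user:{i}" for i in range(9)):
    placement.setdefault(node_for(user_id), []).append(user_id)

for node in sorted(placement):
    print(node, placement[node])
```

A lookup by key touches exactly one node, while a join or a foreign-key check between keys that hash to different nodes has to talk to several machines, which is why those operations get slower once data is distributed.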
However most data sets fit on one computer and can be stored in a SQL database. It is also possible to scale out a SQL database but foreign keys and complex transactions become slow when you distribute your data over multiple machines.
You have to make some tough choices when you distribute your data over multiple machines: http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html. You don't have to worry about the CAP theorem if all your data is on one machine.