When should you use JCR and when should you use JPA/RDBMS? - persistence

The Java JCR API defines a persistence mechanism which can be used to replace many of the traditional roles of a RDBMS. For example, JackRabbit - the JCR reference implementation - supports transactions, SQL queries, object-content mapping, etc.
You could pick one or the other, storing binary data in BLOBs/file system pointers in a RDBMS or textual data in JCR node properties. Does anyone have experience replacing tossing their DB in favor of JCR?
You could use both, perhaps storing binary data in the CR and textual data in the RDBMS, but then you have to express relationships between systems, keep them in sync, etc. You also begin facing the difficult question of, "Well it feels right to store this document title with the document's binary data, so I'll make it a node property... but then I have this document record in my DB so perhaps I should keep it there.."
How do you decide?

There is a paper by Bertil Chapuis on exactly that topic:
JCR or RDBMS? (blog post)
And there is also the NoSQL movement (described in this article for example, or here), which is an ongoing discussion that, hmmm, let's say, "highlights" the disadvantages of the relational model.

Related

Key value oriented database vs document oriented database

I have recently started learning NO SQL databases and I came across Key-Value oriented databases and Document oriented databases. Since they have a similar structure, aren't they saved and retrieved the exact same way? And if that is the case then why do we define them as separate types? Otherwise, how they are saved in the file system?
To get started it is better to pin point the least wrong vocabulary. What used to be called nosql is too broad in scope, and often there is no intersection feature-wise between two database that are dubbed nosql except for the fact that they somehow deal with "data". What program does not deal with data?! In the same spirit, I avoid the term Relational Database Management System (RDBMS). It is clear to most speakers and listeners that RDBMS is something among SQL Server, some kind of Oracle database, MySQL, PostgreSQL. It is fuzzy whether that includes SQLite, that is already an indicator, that "relational database" ain't the perfect word to describe the concept behind it. Even more so, what people usually call nosql never forbid relations. Even on top of "key-value" stores, one can build relations. In a Resource Description Framework database, the equivalent of SQL rows are called tuple, triple, quads and more generally and more simply: relations. Another example of relational database are database powered by datalog. So RDBMS and relational database is not a good word to describe the intended concepts, and when used by someone, only speak about the narrow view they have about the various paradigms that exists in the data(base) world.
In my opinion, it is better to speak of "SQL databases" that describe the databases that support a subset or superset of SQL programming language as defined by the ISO standard.
Then, the NoSQL wording makes sense: database that do not provide support for SQL programming language. In particular, that exclude Cassandra and Neo4J, that can be programmed with a language (respectivly CQL and Cypher / GQL) which surface syntax looks like SQL, but does not have the semantic of SQL (neither a superset, nor a subset of SQL). Remains Google BigQuery, which feels a lot like SQL, but I am not familiar enough with it to be able to draw a line.
Key-value store is also fuzzy. memcached, REDIS, foundationdb, wiredtiger, dbm, tokyo cabinet et. al are very different from each other and are used in verrrrrrrrrrry different use-cases.
Sorry, document-oriented database is not precise enough. Historically, they were two main databases, so called document database: ElasticSearch and MongoDB. And those yet-another-time, are very different software, and when used properly, do not solve the same problems.
You might have guessed it already, your question shows a lack of work, and as phrased, and even if I did not want to shave a yak regarding vocabulary related to databases, is too broad.
Since they have a similar structure,
No.
aren't they saved and retrieved the exact same way?
No.
And if that is the case then why do we define them as separate types?
Their programming interface, their deployment strategy and their internal structure, and intended use-cases are much different.
Otherwise, how they are saved in the file system?
That question alone is too broad, you need to ask a specific question at least explain your understanding of how one or more database work, and ask a question about where you want to go / what you want to understand. "How to go from point A-understanding (given), to point B-understanding (question)". In your question point A is absent, and point B is fuzzy or too-broad.
Moar:
First, make sure you have solid understanding of an SQL database, at the very least the SQL language (then dive into indices and at last fine-tuning). Without SQL knowledge, your are worthless on the job market. If you already have a good grasp of SQL, my recommendation is to forgo everything else but FoundationDB.
If you still want "benchmark" databases, first set a situation (real or imaginary) ie. a project that you know well, that requires a database. Try to fit several databases to solve the problems of that project.
Lastly, if you have a precise project in mind, try to answer the following questions, prior to asking another question on database-design:
What guarantees do you need. Question all the properties of ACID: Atomic, Consistent, Isolation, Durability. Look into BASE. You do not necessarily need ACID or BASE, but it is a good basis that is well documented to know where you want / need to go.
What is size of the data?
What is the shape of the data? Are they well defined types? Are they polymorphic types (heterogeneous shapes)?
Workload: Write-once then Read-only, mostly reads, mostly writes, a mix of both. Answer also the question how fast or slow can be writes or reads.
Querying: How queries look like: recursive / deep, columns or rows, or neighboor hood queries (like graphql and SQL without recursive queries do). Again what is the expected time to response.
Do not forgo to at least the review deployement and scaling strategies prior to commit to a particular solution.
On my side, I picked up foundationdb because it is the most versatile in those regards, even if at the moment it requires some code to be a drop-in replacement for all postgresql features.

Is mongodb the right choice for building health related web application?

I'm not sure if stack-overflow is a right platform to ask the title question. I'm in dilemma as to which front-end and back-end stack should i consider for developing a health related web application?
I heartily appreciate any suggestions or recommendations. Thanks.
You will need to have a look at your data, if it is relational, I would personally go for a SQL server such Microsoft SQL Server, MySQL or Postgres. If your data is non-relational you can go for something like Mongo.
Here is an image that explains how relational data and non-relational data work:
I'm not saying that MongoDB is bad, it all depends on your data and how you would like to structure your data. Obviously when you're working with healthcare data such as patient data there are certain laws you need to adhere to, especially in the United States with HIPPA, but I am sure almost every country has one of those.
The implications might be that you need to encrypt any data stored in the database, and that's one of the benefits of a relational database as most of them have either TDE (Transparent Data Encryption) or Encryption at Rest. Which means that your data is secured when in use and when not in use, respectively.
When it comes to the front-end you can look at Javascript frameworks such as Angular, Vue, React and then for your backend you can choose pretty much anything that you know well such as NodeJS or .NET Core or Go, pick your poison, each of them have their advantages and drawbacks so you will need to investigate your options before committing to one or the other.
It depends on your data structures. You can use MongoDB with dynamic schemas that eliminates the need for a predefined data structure. So you can use MongoDB when you have a dynamic dataset which is less relational. In the other hand, MongoDB is natively scalable. So you can store a large amount of data without much trouble.
Use a relational DB system when you have highly relational entities. SQL enables you to have complex transactions between entities with high reliability.
MongoDB/NoSQL
High write load
Unstable Schemas
Can handle big amount of data
High availability
SQL
Data structure fits for tables and row
Strict relationships among entities
Complex queries
Frequent updates in a large number of records

Why would you use NoSQL to build posts/comments over relational? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

NoSQL-agnostic persistence layer

It seems to me that, at the end of the day, most NoSQL databases are at their core key/value stores, which means one should be able to build a layer which could be NoSQL database agnostic.
That layer would only use CRUD operations (put, set, delete), but would expose more advanced features, and you'd be able to switch with minimal effort the underlying DB whether it's Mongo, Redis, Cassandra, etc.
Would building something like this have value to many people, and does it already exist?
Thanks
NuoDB is an elastically scalable SQL/ACID database that uses a Key/Value model for storage. It runs on top of Amazon S3 today (as well as standard file systems) and could support any KV store in principle. For the moment it's access method is SQL, but the system could readily support other data access languages and methods if that is a common requirement.
Barry Morris, NuoDB Inc.
There's kundera and DataNucleus
UnQL means Unstructured Query Language. It's an open query language for JSON, semi-structured and document databases.
It's next to impossible to build such thing.
As a thought experiment, I suggest that you take, for example, Redis, MongoDB and Cassandra, and design an API of such layer.
These NoSQL solutions have drastically different characteristics and they serve different purposes. Trying to build a common API for them is like building a common API for SQL database, spreadsheet document, plain text file and gmail.
While you can certainly come up with something, it will completely pointless.
Different needs call for different tools.
PlayOrm is another solution that is built on cassandra but has a pluggable interface for hbase, mongodb, etc. etc. 20/30 years ago they said the same thing about RDBMS, but more and more the featuresets converged. I suspect you will see alot of that in nosql database's as well as they adopt each other's feature sets.
currently, they have vastly different feature sets but at the core there is a set of operations that is very very similar.
PlayOrm actually builds it's query language which works on any noSQL provider as well, so it's S-SQL scalable SQL can work with cassandra, hadoop, etc. etc.
later,
Dean

Using Multiple Database Types to Model Data in a single application

Does it make sense to break up the data model of an application into different database systems? For example, the application stores all user data and relationships in a graph database (ideal for storing relationships), while storing other data in a document database, such as CouchDB or MongoDB? This would require the user graph database to reference unique ids in the document databases and vice versa.
Is this over complicating the data model and application? Or is this using the best uses of both types of database systems for scaling your application?
It definitely can make sense and depends fully on the requirements of your application. If you can use other database systems for things in which they are really good at.
Take for example full text search. Of course you can do more or less complex full text searches with a relational database like MySql. But there are systems like e.g. Lucene/Solr which are optimized for such things and can search fast in millions of documents. So you could use these systems for their special task (here: make a nifty full text search), then you return the identifiers and maybe load the relational structured data from the RDBMS.
Or CouchDB. I use couchDB in some projects as a caching systems. In combination with a relational database. Of course I need to care about consistency, but it it's definitely worth the effort. It pushed performance in the projects a lot and decreases for example load on the server from 2 to 0.2. :)
Something like this is for instance called cross-store persistence. As you mentioned you would store certain data in your relational database, social relationships in a graphdb, user-generated data (documents) in a document-db and user provided multimedia files (pictures, audio, video) in a blob-store like S3.
It is mainly about looking at the use-cases and making sure that from wherever you need it you might access the "primary" or index key of each store (back and forth). You can encapsulate the actual lookup in your domain or dao layer.
Some frameworks like the Spring Data projects provide some initial kind of cross-store persistence out of the box, mostly integrating JPA with a different NOSQL datastore. For instance Spring Data Graph allows it to store your entities in JPA and add social graphs or other highly interconnected data as a secondary concern and leverage a graphdb for the typical traversal and other graph operations (e.g. ranking, suggestions etc.)
Another term for this is polyglot persistence.
Here are two contrary positions on the question:
Pro:
"Contrary to that, I’m a big fan of polyglot persistence. This simply means using the right storage backend for each of your usecases. For example file storages, SQL, graph databases, data ware houses, in-memory databases, network caches, NoSQL. Today there are mostly two storages used, files and SQL databases. Both are not optimal for every usecase."
http://codemonkeyism.com/nosql-polyglott-persistence/
Con:
"I don’t think I need to say that I’m a proponent of polyglot persistence. And that I believe in Unix tools philosophy. But while adding more components to your system, you should realize that such a system complexity is “exploding” and so will operational costs grow too (nb: do you remember why Twitter started to into using Cassandra?) . Not to mention that the more components your system has the more attention and care must be invested figuring out critical aspects like overall system availability, latency, throughput, and consistency."
http://nosql.mypopescu.com/post/1529816758/why-redis-and-memcached-cassandra-lucene