Java ORMs on NoSQL DB like HBase - nosql

I have recently started getting familiarized with NoSQL (HBase). I am definitely a noob.
I was investigating about ORMs and high level clients which can be used on HBase and came across a few.
Some ORM libraries like Kundera are providing SQL like data query functionality. I am finding this a little counter intuitive.
Can any one help me understand why we would again need SQL like querying if the whole objective was to move away from it?
Also can anyone comment on your experiences with ORMs for HBase? I looked at a few of them from http://wiki.apache.org/hadoop/SupportingProjects and started looking at Kundera.
Another related question - Does data query with Kundera run map reduce jobs internally?

kundera or Spring data might provide user friendly ORM layer over NoSQL databases, but the underlying entity model still has to be NoSQL friendly. This means that NoSQL users should not blindly follow RDBMS modeling strategies but design ORM entities in such a way so that all NoSQL capabilities can be used.
As a thumb rule, the kundera ORM entities should be designed using query-first strategy where first the queries need to defined so as to create primary keys and also ensuring that relationship model is used as minimal as possible. Querying on random columns and full scans should be avoided and so data might have to be replicated across entities for reducing multiple entity look ups. Also, transactions management needs to be planned. FYI, kundera does not support transactions(beyond single row TX supported by Hbase/Cassandra).

Reason for using Kundera:
1) If looking for SQL like support over HBase. As it is build on top of HBase native API, so it simply transforms these SQL queries in to corresponding GET or PUT method calls.
2) Currently it support HBase-0.20.6 only. Kundera-2.0.6 will enable support for HBase 0-90.x versions.
3) Kundera does not do sometihng out of the box to provide map reduce over SQL like queries. However support for such thing will be provided in Kundera-2.0.6 by enabling support for Hive native queries only!
It is totally JPA compliant, so no need to learn something new. It simply hides complexity at developer level with very minimal effort.
SQL like querying is for developement ease, quick developement, less error prone and reusability ofcourse!
-Vivek

Related

Refactoring a NoSql Database Schema

I am considering Nosql for a project we are about to start (see https://stackoverflow.com/a/20588134/1838739). In regard to modifying NoSql i am confused about this statement "While the network/graph topology is faster than the Set topology the graph data model once implemented is almost impossible to change." Is this true for Nosql? How flexible is it to modify if changes are needed in a production implemented db.
Graph database are very open for change since the relationships are part of the data and not part of a schema.
To get started I recommend to read a little bit on Cypher (http://www.neo4j.org/learn/cypher or http://docs.neo4j.org/chunked/stable/cypher-query-lang.html). You'll see quickly how easy graphs can be changed.

play framework anorm for different database

I am new to Scala as well as play framework with Scala 2.0. I like the idea of writing the SQL code myself and have full control rather than depend on ORM tool. But does Anorm SQL work across different database vendors like MySQL and Oracle? Since I am writing an application which should be capable to work with any Relational database, my requirement is to write SQL which should work across databases since my application should work with vendor database.
Some vendor might have Oracle and some might have MySQL. So my code should be DB agnostic.Is this possible in Scala as I know that quires which run on mysql will not run on Oracle.
Thanks in Advance,
Pradeep
Short answer: NO.
Long answer: Anorm is just a library for dispatching your SQL queries to the database through JDBC, retrieving the results and delivering them to you. It does not understand the differences between different databases because it relies on JDBC for connection handling, and on you for writing queries.
You either have to handle different DB engines yourself or have an ORM handle that for you.
PS: Unless you really need to have a DB agnostic application (and fully understand its implications), I'd suggest you simply target 2-3 popular engines and avoid the future complications.

Advantage of ontology over RDBMS in an offline application

Is there any advantage of using ontology based database (linked data) instead of RDBMS in an offline application? Does linked data provide more relations and reasoning capabilities using SPARQL than SQL? Can I not achieve the same using joins in SQL?
Suppose I am storing the details of various mobile phones. This database should answer user centric queries like
1.list of all mobiles with good (quantified) touch interface
2.mobiles similar to Samsung Galaxy s4
Can I not retrieve efficient results using RDBMS itself with joins? If the answer is yes, then would the performance of answering these queries between the two database models be of argument here? Basically what is the edge that I get get by using ontologies in such scenarios?
The main advantage of using ontologies is the formalized semantics. This way a reasoner can automatically infer new statements without writing specific code.
But it's true, that you can model every Linked Data also in RDBMS and the other way around. The same is true for querying with SPARQL or SQL. You can achieve the same results. SPARQL has some advantages if your SQL query requires multiple joins. This can be expressed in a far more meaningful way in SPARQL.
The disadvantage of ontology based databases is nowadays still a lack of performance in comparison to RDBMS.

Migrating an ORM based Spring project (Hibernate / JPA) to noSQL (MongoDB/Cassandra/CauchDB)

I am fairly new to the noSQL world, and although I understand the benefits of performance and "cloud" friendliness, it seems the RDBMS world is much simpler and standard and limited to fewer players
I worked with SQL Server, Oracle, DB2, Sybase, Terradata, MySQL and others, and it seems they have in common much more (in terms of Query language, Indexing, ACID, etc) than the noSQL family.
My question is mainly this
Is it at all a valid concept to move an existing Spring/Java EE+JPA app to a noSQL storage? or it will require a complete re-architecture of the system beyond the medium of storage?
Hoping it's a valid goal, are there any migration paths that were case studied as best practices?
Is there an equivalent to the concept in "noSQL" that is comparable to ORM for RDBMS? e.g. any layer of separating the storage implementation from concept (I know GAE BigTable supports JDO and JPA but only partially, is there a newer JSR for a more noSQL friendly JPA?)
Are there any attempts to standardize "noSQL" the same way RDBMS are (query language,
API)
Is "noSQL" a too wide term? Should I modify the question per implementation (KV/Document)
Try Kundera : https://github.com/impetus-opensource/Kundera. it is an open source JPA2.0 compliant solution. You can also join http://groups.google.com/group/kundera-discuss/subscribe for further discussion.
-Vivek
DataNucleus allows JPA persistence to RDBMS, MongoDB, HBase and various others. That is one way you can tackle the problem, giving you a start point for use of your app with other datastores. From there you could modify class hierarchies to get around some of the problems that these other datastores bring. Use of JPA with other datastores is not part of any JSR and never will be, since JPA is designed around RDBMS solely. JDO on the other hand is already a standard for all datastores, as it was designed to be (also supported by DataNucleus)
EclipseLink 2.4 supports JPA with MongoDB and other NoSQL data sources.
http://java-persistence-performance.blogspot.com/2012/04/eclipselink-jpa-supports-mongodb.html
PlayOrm is another solution with it's Scalable-SQL and is JPA-like but breaks from JPA in that it supports noSQL patterns that JPA can't support.

Jira using enterprise architecture by OfBiz

The 'open for business project' is an enterprise framework.
It so happens Jira uses this, and I was pretty shocked at how much work is involved to pull data for a particular entity (say a issue/bug in Jira's case).
Imagine getting a list of all the issues, it has to first get all the columns (or properties) to display for the table column, then pull in the values for each. For an enterprise solution this sounds like a sub-optimal solution (but I understand how it adds flexibility).
You can read how its used in Jira practically: http://confluence.atlassian.com/display/JIRA/Database+Schema
main site: http://ofbiz.apache.org/docs/entity.html
I'm just confused as to how to list all issues. Meaning, what would the sql queries look like?
Its one thing to pull a single issue, but to get a list you have to do allot of work to get the values. I don't think it can be done with a singl query using joins now can it?
(Disclaimer: I work for Atlassian, but I'm not on the JIRA team)
OFBiz EE is just an abstraction layer for moving between database tables and fancy maps called GenericValues. It has no influence over the database schema itself. Your real issue here seems to be that JIRA's database schema is complicated.
The reason it's complicated is because it has to support a data model where an issue is an arbitrary collection of arbitrary fields, at some point in an arbitrary workflow. The fields themselves can be defined by third-party plugins. It's very hard to produce a friendly-looking RDBMS schema to fit this kind of dynamic data model, and JIRA tries as best it can.
You can get information directly out of the database if you want, the database schema is documented in the link above, or you can go up a layer or twelve of abstraction and talk through one of JIRAs many APIs.
A good place to ask questions about getting data out of JIRA is the forums on http://forums.atlassian.com/
The entity engine used in jira is a database abstraction layer ( with a very rich and easy to use API ) that connects your application with one or more datasources. But the databases are still relational, so you can use SQL if you want to. About the issue info you want to pull I'd say it wouldn't be very easy only with joins. I'd recommend you use the scripting language of the RDBMS ( i.e. PL/SQL, pgPL/SQL ).
SELECT * FROM jiraissue;