For external reasons we are considering switching from MongoDB to Cassandra. Cassandra scales well, writes fast, and reads well, but where we are really stuck is query features. We use MongoDB's query features heavily, and we also rely heavily on its aggregation framework. Could you point me to an alternative technology that could compensate for MongoDB's rich queries and aggregation framework? Could it be Hadoop or Spark?
Apache Spark is the most powerful complement to Cassandra. With Spark you can group, join, sort, filter, and do pretty much anything else you can imagine. Several projects build an abstraction layer in Spark over Cassandra and let you apply these operations.
Two commonly used projects are:
Stratio Deep
Datastax Connector
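To give a concrete flavor, here is a minimal sketch using the DataStax Spark Cassandra Connector's Scala RDD API. It assumes a running Cassandra cluster on localhost and the spark-cassandra-connector jar on the classpath; the keyspace shop, table orders, and its columns are hypothetical. It pushes a filter down to Cassandra and then does a Spark-side group-and-sum, roughly what MongoDB's aggregation pipeline expresses with $match and $group:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._ // adds cassandraTable() to SparkContext

    object CassandraAggregation extends App {
      val conf = new SparkConf()
        .setAppName("cassandra-aggregation")
        .setMaster("local[*]")                               // assumption: local test run
        .set("spark.cassandra.connection.host", "127.0.0.1") // assumption: local cluster
      val sc = new SparkContext(conf)

      // Filter pushed down to Cassandra (in practice "status" must be an
      // indexed or clustering column), then a Spark-side group/aggregate.
      val totalsByCustomer = sc
        .cassandraTable("shop", "orders") // hypothetical keyspace and table
        .select("customer_id", "amount")
        .where("status = ?", "PAID")
        .map(row => (row.getString("customer_id"), row.getDouble("amount")))
        .reduceByKey(_ + _)

      totalsByCustomer.collect().foreach(println)
      sc.stop()
    }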
I have a question about Druid queries.
According to the official documentation, there are two query languages: Druid SQL and native queries.
In my own use I feel more comfortable with Druid SQL, because I don't know native queries well and the code stays simpler.
But I don't know the difference in performance between the two query languages.
Is there a large difference in performance between them?
I saw a Druid forum post from July 2019 recommending Druid SQL for Druid 0.15.0 and later. (As of now, the latest Druid version is 0.23.0.)
Which is better to use, Druid SQL or native queries?
Regarding "Is there a large difference in performance between them?": every Druid SQL query gets translated into a native query. According to the docs, there is a slight overhead in translating the query from SQL to native, but that is the only (minor) performance penalty of using Druid SQL compared to native queries.
To specifically answer your question of "Which is better to use, Druid SQL or native queries?": continue using what you are comfortable with.
If you'd like to learn more about best practices and native queries, the linked doc goes into quite a bit of detail.
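Two practical notes, sketched under assumptions. First, Druid SQL lets you prefix a statement with EXPLAIN PLAN FOR to see the native query it translates into, which is a good way to learn native queries from SQL you already write. Second, here is a minimal Scala sketch of issuing a Druid SQL query over HTTP; the Broker address localhost:8082 and the "wikipedia" datasource are assumptions, and java.net.http requires Java 11+:

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object DruidSqlExample extends App {
      // Hypothetical datasource; any Druid SQL statement works here.
      val sql =
        "SELECT channel, COUNT(*) AS edits FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 5"

      // The Broker translates this SQL into the equivalent native query
      // before executing it -- the only extra cost relative to native.
      val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8082/druid/v2/sql/")) // assumption: local Broker
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(s"""{"query": "$sql"}"""))
        .build()

      val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
      println(response.body()) // a JSON array of result rows
    }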
I am new to both Scala and NoSQL databases. I would like to know whether there are ORM tools that will map my Scala objects to a NoSQL database, as there are for RDBMS solutions.
There is a library called Kundera, based on JPA, that provides ORM for a number of NoSQL databases of different flavours, including HBase, Cassandra, MongoDB, CouchDB and Neo4J. See https://github.com/impetus-opensource/Kundera, which appears to be under active development. Note that the nature of some of the problems that gave rise to NoSQL in the first place, and some of the trade-offs the CAP theorem imposes between availability, consistency and partition tolerance, make ORM challenging in a NoSQL environment. There is some good discussion of these issues here: http://architects.dzone.com/articles/sqlifying-nosql-%E2%80%93-are-orm
Since you mentioned Scala specifically, here is a very interesting article from Foursquare about how they built a type-safe DSL in Scala for querying MongoDB: http://engineering.foursquare.com/2011/01/21/rogue-a-type-safe-scala-dsl-for-querying-mongodb/
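To give a flavor of what such a type-safe query DSL buys you, here is a self-contained toy sketch in Scala. This is not Rogue's actual API; the Field helper and the Venue schema are hypothetical. The point is only that the compiler checks field types while the output is still a MongoDB-style query document:

    // Toy type-safe query DSL, loosely in the spirit of Rogue (NOT Rogue's real API).
    case class Field[T](name: String) {
      // Builds a (key -> value) pair for an equality match, keeping T checked.
      def eqs(value: T): (String, Any) = name -> value
    }

    // Hypothetical schema object describing a "venues" collection.
    object Venue {
      val mayor  = Field[Long]("mayor")
      val closed = Field[Boolean]("closed")
    }

    object RogueStyleDemo extends App {
      val query: Map[String, Any] = Map(
        Venue.mayor eqs 1234L,
        Venue.closed eqs false
      )
      println(query) // Map(mayor -> 1234, closed -> false)

      // Venue.mayor eqs "bob"   // rejected at compile time: mayor is a Long field
    }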
I have been studying NoSQL and Hadoop for data warehousing. However, I have never worked with these technologies before, so I would like to check whether my understanding of them is right.
If my data is stored in MongoDB, can I use Hadoop with Hive to run HiveQL queries directly against MongoDB and store the output of those queries back in MongoDB as views, instead of in HDFS?
Also, if I understand correctly, most NoSQL databases don't support joins and aggregates, but they can be implemented through map-reduce. Since HiveQL queries run as map-reduce jobs, when I do a join in HiveQL would it automatically "join" the MongoDB data in map-reduce for me, so that I need not worry about MongoDB's lack of support for joins and aggregates?
MongoDB has very good support for aggregation-style operations. There are no joins, of course; a MongoDB schema is usually designed so that you typically would not need a join.
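For example, here is a hedged sketch of running an aggregation pipeline from the JVM with the official MongoDB Java driver (mongodb-driver-sync assumed on the classpath; the shop database, orders collection, and field names are hypothetical). The $match and $group stages execute server-side in mongod:

    import com.mongodb.client.MongoClients
    import com.mongodb.client.model.{Accumulators, Aggregates, Filters}
    import org.bson.Document
    import java.util.Arrays

    object MongoAggregationExample extends App {
      val client = MongoClients.create("mongodb://localhost:27017") // assumption: local mongod
      val orders = client.getDatabase("shop").getCollection("orders")

      // $match then $group: total paid amount per customer, computed by the server.
      val pipeline = Arrays.asList(
        Aggregates.`match`(Filters.eq("status", "PAID")),
        Aggregates.group("$customerId", Accumulators.sum("total", "$amount"))
      )

      orders.aggregate(pipeline).forEach((doc: Document) => println(doc.toJson))
      client.close()
    }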
HiveQL operates on 'tables' in HDFS; that is the default behavior. But there is a MongoDB-Hadoop Connector (http://docs.mongodb.org/ecosystem/tools/hadoop/) which will let you query MongoDB data from within Hadoop.
For map-reduce, you can use MongoDB itself, without Hadoop. See: http://docs.mongodb.org/manual/core/map-reduce/
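As a rough sketch of what "querying MongoDB data from within Hadoop" looks like in practice, here is one way to read a collection through the connector's MongoInputFormat, using Spark as a convenient Scala host for a Hadoop InputFormat. The mongo-hadoop-core jar is assumed on the classpath, and the warehouse.sales collection is hypothetical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SparkConf, SparkContext}
    import org.bson.BSONObject
    import com.mongodb.hadoop.MongoInputFormat

    object MongoHadoopRead extends App {
      val sc = new SparkContext(
        new SparkConf().setAppName("mongo-hadoop-read").setMaster("local[*]"))

      val hadoopConf = new Configuration()
      // assumption: local mongod; database "warehouse", collection "sales"
      hadoopConf.set("mongo.input.uri", "mongodb://localhost:27017/warehouse.sales")

      // MongoInputFormat splits the collection and streams documents as BSON.
      val docs = sc.newAPIHadoopRDD(
        hadoopConf,
        classOf[MongoInputFormat],
        classOf[Object],     // the document _id
        classOf[BSONObject]) // the document itself

      println(s"read ${docs.count()} documents")
      sc.stop()
    }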
I am planning to build a data warehouse in MongoDB for the first time. It has been suggested to me that I should use Hadoop for map-reduce in case I need more complex analyses of the datasets.
Having discovered Hive, I liked the idea of running map-reduce through a language similar to SQL. But my doubt is: can I run HiveQL queries directly against MongoDB, without needing to build a Hive DW on top of Hadoop? In every use case I found, it seems to work only on data stored in HDFS.
You could use the MongoDB Connector for Hadoop: http://docs.mongodb.org/ecosystem/tools/hadoop/
MongoDB on its own supports a map-reduce paradigm too: http://docs.mongodb.org/manual/core/map-reduce/
I am wondering whether map-reduce jobs in MongoDB have anything to do with Hadoop. Is map-reduce in MongoDB standalone, with no dependency on any Hadoop installation? If my guess is correct, is the map-reduce syntax the same between the two, or does it just mean that MongoDB supports its own map-reduce (with different syntax)?
MongoDB has its own MapReduce implementation. You write the map/reduce/finalize functions in JavaScript (as opposed to Hadoop, where you typically write them in Java).
It is reportedly also possible to run Hadoop on top of MongoDB, but I haven't tried that yet.
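As an illustration, here is a hedged sketch of invoking MongoDB's built-in map-reduce from Scala through the official Java driver (mongodb-driver-sync assumed on the classpath; the warehouse.sales collection and its fields are hypothetical). Note that the map and reduce functions are JavaScript strings executed inside mongod, with no Hadoop anywhere:

    import com.mongodb.client.MongoClients
    import org.bson.Document

    object MongoMapReduceExample extends App {
      val client = MongoClients.create("mongodb://localhost:27017") // assumption: local mongod
      val sales  = client.getDatabase("warehouse").getCollection("sales")

      // Both functions are JavaScript, evaluated server-side by mongod.
      val mapFn    = "function() { emit(this.customerId, this.amount); }"
      val reduceFn = "function(key, values) { return Array.sum(values); }"

      // Total amount per customer, computed entirely inside MongoDB.
      sales.mapReduce(mapFn, reduceFn).forEach((doc: Document) => println(doc.toJson))

      client.close()
    }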
Map-reduce is not the fastest interface in MongoDB for ad hoc queries; it is designed more for background jobs, creating reports, and the like. I wrote about how to do it on my blog some time ago:
http://no-fucking-idea.com/blog/2012/04/01/using-map-reduce-with-mongodb/
MongoDB borrowed the idea of map-reduce, which, by the way, predates Hadoop: it was used at Google on their infrastructure (and they in turn borrowed the idea from functional programming languages).
The "syntax" is also totally different (note: there are several Hadoop APIs for different languages, so a direct comparison is impossible).