HiveQL in MongoDB - mongodb

I have been studying NoSQL and Hadoop for data warehousing. I have never worked with these technologies before, so I would like to ask whether the following is possible, to check that I have understood them correctly.
If I have my data stored in MongoDB, can I use Hadoop with Hive to run HiveQL queries directly against MongoDB and store the output of those queries as views back in MongoDB, instead of in HDFS?
Also, if I understand correctly, most NoSQL databases don't support joins and aggregates, but it is possible to implement them through map-reduce. If HiveQL queries are map-reduce jobs, then when I do a join in HiveQL, would it automatically "join" the MongoDB data in map-reduce for me, so that I need not worry about the lack of support for joins and aggregates in MongoDB?

MongoDB does have very good support for aggregation functions. There are no joins, of course. A MongoDB schema is usually designed so that you typically do not need a join.
HiveQL operates on 'Tables' in HDFS. That's the default behavior.
However, there is a MongoDB Connector for Hadoop (http://docs.mongodb.org/ecosystem/tools/hadoop/), which lets you query MongoDB data from within Hadoop.
You can also run map-reduce in MongoDB itself, without Hadoop.
See this: http://docs.mongodb.org/manual/core/map-reduce/
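
To make the question's point about joins concrete, here is a minimal sketch in plain Python of how a join can be expressed as a map-reduce job (a so-called reduce-side join). It is only an illustration of the idea, not what Hive or MongoDB actually execute; the `users` and `orders` data and all function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for two MongoDB collections.
users = [
    {"_id": 1, "name": "Ann"},
    {"_id": 2, "name": "Bob"},
]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 12},
    {"user_id": 2, "total": 7},
]


def map_phase(users, orders):
    """Emit (join_key, (source_tag, record)) pairs for both inputs."""
    for user in users:
        yield user["_id"], ("user", user)
    for order in orders:
        yield order["user_id"], ("order", order)


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every user record with every order record (inner join)."""
    for key, values in groups.items():
        user_rows = [rec for tag, rec in values if tag == "user"]
        order_rows = [rec for tag, rec in values if tag == "order"]
        for user in user_rows:
            for order in order_rows:
                yield {"user_id": key, "name": user["name"], "total": order["total"]}


joined = list(reduce_phase(shuffle(map_phase(users, orders))))
```

The map step tags each record with its source and emits it under the join key; the reduce step then sees all records for one key together and can combine them.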

Related

How to use the appropriate index while querying mongo using mongospark connectors via withPipeline feature?

I am trying to load a huge amount of data from MongoDB; the data size is in the millions. It therefore makes sense to pull this data using appropriate indexes and to query MongoDB in parallel. That is why I am using the Mongo Spark connector for batch reads.
How to use the appropriate index while querying mongo using mongospark connectors via withPipeline feature?
Also, I have been exploring "com.mongodb.reactivestreams.client.MongoCollection". If possible, can someone shed some light on this?

What are the right practices of using Apache Solr with MongoDB?

I am using Apache Solr 6.3.0 and MongoDB 3.4 for advanced text search features. I have successfully synced MongoDB with Solr cores using mongo-connector 2.5 and the Solr doc manager.
I want to know the right way to use Solr with MongoDB, and I have some issues that I need help with:
1). Now that my data is available in the MongoDB database and is also indexed and stored in Solr cores, should I query Solr all the time? Or should I query Solr for text search only and perform the rest of the queries on MongoDB?
2). Is there some way I could perform powerful searches directly on the MongoDB database using the indexing done by Solr?
3). I have some collections that contain deeply nested JSON data, and MongoDB supports them well. Solr indexes and stores such data in flattened form, but I want to keep the original nested JSON format in the query response. Is this something I can achieve with Solr?
Other suggestions about good practices for using Solr with MongoDB would be extremely helpful.
1). If it makes sense to just query Solr, do that. If it makes sense to query Solr only for certain data, do that. It depends on your use case, but if any query can be answered with the data in Solr, it's perfectly fine to use Solr for everything. That will probably allow more efficient use of your caches.
2). No, not that I know of.
3). Not really. Solr isn't well suited to nested JSON (even with parent/child documents, you have to handle it manually in every situation, and it requires special casing all over).
In those situations you can use Solr for querying, get the ids back, and then retrieve the actual documents from MongoDB with their JSON structure intact. In that case you can leave most fields as non-stored in Solr.
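
A rough sketch of that last pattern in Python follows. It assumes the ids returned by Solr can be compared directly with the MongoDB `_id` values (in practice a conversion, for example to ObjectId, may be needed), and the helper names and sample data are hypothetical. The actual Solr and MongoDB calls are left out; `fetched` stands in for the result of a `find` with the generated filter.

```python
def build_id_filter(solr_ids):
    """Build a MongoDB filter that matches the documents Solr returned."""
    return {"_id": {"$in": list(solr_ids)}}


def in_solr_order(solr_ids, mongo_docs):
    """Reorder fetched documents to follow Solr's ranking; skip ids missing in MongoDB."""
    by_id = {doc["_id"]: doc for doc in mongo_docs}
    return [by_id[i] for i in solr_ids if i in by_id]


# Hypothetical ids as they might come back from a Solr query, best match first.
solr_ids = ["b", "c", "a"]

# Stand-in for the documents returned by collection.find(build_id_filter(solr_ids)).
# An $in query does not return documents in the order of the id list.
fetched = [
    {"_id": "a", "meta": {"tags": ["x"]}},
    {"_id": "b", "meta": {"tags": ["y", "z"]}},
]

ranked = in_solr_order(solr_ids, fetched)
```

Reordering on the application side matters because the relevance ranking lives in Solr, while the nested structure lives only in the MongoDB documents.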

Mongodb aggregation framework alternative for Cassandra

For external reasons we are thinking of switching from MongoDB to Cassandra. Cassandra scales well, writes fast, and reads well. Where we are really stuck is the query features. We use MongoDB's query features actively, and we also use its aggregation features very actively. So could you please point me to alternative technologies that could compensate for MongoDB's rich queries and aggregation framework? Could it be Hadoop or Spark?
Apache Spark is the most powerful complement to Cassandra. With Spark you can group, join, sort, filter, and do whatever else you can imagine. There are some projects that have built an abstraction layer in Spark over Cassandra and let you apply these operations.
Two commonly used projects are:
Stratio Deep
Datastax Connector

MongoDB data modeling - separate or combine collections?

I have a question about performance in MeteorJS. Before I used MeteorJS, I always wrote my applications in PHP and MySQL. In MySQL I always created a lot of tables with many relations between them.
For example:
Table User
id;login;password;email
Table User_Data
user_id;name;age
My question now is how I should design my MongoDB collections. It's nice that collections are built like JS objects, so I don't have to predesign my tables and can always easily change the columns. But is it better to combine all data into one collection or to split it across several collections?
For example:
Table User
_id;login;password;email;data:{name;age}
Is that better or worse for performance? Or is it the wrong pattern for designing MongoDB collections?
The question is mainly about MongoDB data modeling. What you'll learn applies to MongoDB used with Meteor or with anything else.
http://docs.mongodb.org/manual/data-modeling/ talks about data modeling with MongoDB and is a good introduction.
In your particular case, you can read more about how to avoid JOINs in MongoDB.
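
As an illustration of the combined layout the question proposes, here is a small Python sketch that folds the one-to-one `User_Data` rows into their user as an embedded `data` field. The sample rows and the `embed_user_data` helper are hypothetical; this only shows the shape of the resulting documents, not a recommendation for every schema.

```python
# Hypothetical rows as they would look in the two MySQL tables from the question.
user_rows = [
    {"id": 1, "login": "ann", "password": "<hash>", "email": "ann@example.com"},
    {"id": 2, "login": "bob", "password": "<hash>", "email": "bob@example.com"},
]
user_data_rows = [
    {"user_id": 1, "name": "Ann", "age": 31},
]


def embed_user_data(user_rows, user_data_rows):
    """Fold each one-to-one User_Data row into its user as a nested `data` field."""
    data_by_user = {
        row["user_id"]: {"name": row["name"], "age": row["age"]}
        for row in user_data_rows
    }
    documents = []
    for row in user_rows:
        doc = {
            "_id": row["id"],
            "login": row["login"],
            "password": row["password"],
            "email": row["email"],
        }
        if row["id"] in data_by_user:  # users without extra data simply omit the field
            doc["data"] = data_by_user[row["id"]]
        documents.append(doc)
    return documents


documents = embed_user_data(user_rows, user_data_rows)
```

With this layout a single read of one user document returns the login fields and the profile data together, which is the situation where a relational design would have needed a join.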

MongoDB with Hive DW

I am planning to build a data warehouse in MongoDB for the first time. It has been suggested to me that I should use Hadoop for map-reduce in case I need more complex analyses of the datasets.
Having discovered Hive, I liked the idea of running map-reduce jobs through a language similar to SQL. My question is: can I run HiveQL queries directly against MongoDB without needing to build a Hive data warehouse on top of Hadoop? In all the use cases I have found, it seems to work only on data stored in Hadoop HDFS.
You could use the MongoDB Connector for Hadoop:
http://docs.mongodb.org/ecosystem/tools/hadoop/
MongoDB also supports the map-reduce paradigm on its own:
http://docs.mongodb.org/manual/core/map-reduce/