MongoDB -- Query Performance through a Java class

I am new to the world of NoSQL databases and MongoDB.
I am trying to measure query execution time in MongoDB. I plan to do CRUD operations with sets of 1,000, 10,000 and 100,000 records, and I would also like to run a few other random queries. I want to execute these queries through a Java class. Can anyone kindly suggest the best way to get the response times from the Java class? I understand the Database Profiler is a good way to do this in MongoDB, but I would like to execute the profiler commands through Java. Could anyone suggest how to execute profiler commands from a Java class for each query?
P.S.: Kindly suggest if there is a better way. Any suggestion would be of great help.

Using the Mongo Java client you would invoke find() in the usual way and then append a query modifier like so:
MongoCollection<...> collection = ...
FindIterable<...> results = collection.find(query).modifiers(new Document(QueryOperators.EXPLAIN, true));
This will return Mongo's explain document for that query, i.e. the query plan with execution statistics, rather than the results themselves.
The MongoDB documentation on explain covers the format and content of that document in detail.
If you are interested in profiling, you might also want to hint (i.e. force Mongo to use a specific index). This is done in much the same way, for example:
MongoCollection<...> collection = ...
FindIterable<...> results = collection.find().modifiers(new Document(QueryOperators.HINT, indexName));
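Putting the two together, here is a minimal runnable sketch, assuming the 3.x Java driver (where FindIterable.modifiers() is still available; it was deprecated in 3.6); the database, collection and field names are placeholders:

import com.mongodb.MongoClient;
import com.mongodb.QueryOperators;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ExplainExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoCollection<Document> collection =
                client.getDatabase("test").getCollection("records"); // placeholder names

        Document query = new Document("status", "active"); // any query you want to measure

        // $explain makes the server return the query plan (with timing
        // information) instead of the matching documents
        Document plan = collection.find(query)
                .modifiers(new Document(QueryOperators.EXPLAIN, true))
                .first();
        System.out.println(plan.toJson());

        client.close();
    }
}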

Related

Can MongoDB MapReduce functions behave like CouchDB views?

The nice thing about CouchDB is that the view results are updated incrementally as the database changes. Is there a way to accomplish the same thing with MongoDB? I've been unable to locate any specifics in the documentation. Thanks
No, MongoDB does not support dynamically updated views the way CouchDB does. MapReduce results are written to a collection on disk and/or returned with the query; they are not updated unless you run the job again.
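If you do need to refresh materialised results, you can simply rerun the job yourself. A rough Java sketch using the legacy mapReduce API; the collection names and map/reduce functions are made up for illustration:

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class RefreshMapReduce {
    public static void main(String[] args) {
        MongoClient client = new MongoClient();
        MongoCollection<Document> orders =
                client.getDatabase("test").getCollection("orders"); // placeholder

        String map = "function() { emit(this.customerId, this.total); }";
        String reduce = "function(key, values) { return Array.sum(values); }";

        // Results are materialised into 'order_totals'; they will NOT track
        // later changes to 'orders', so rerun this job whenever they go stale.
        orders.mapReduce(map, reduce)
              .collectionName("order_totals")
              .toCollection();

        client.close();
    }
}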

How can I discover a mongo database's structure

I have a Mongo database that I did not create or architect. Is there a good way to introspect the db or print out its structure, to start to get a handle on what types of data are being stored, how the data types are nested, etc.?
Just query the database by running the following commands in the mongo shell:
use mydb //this switches to the database you want to query
show collections //this command will list all collections in the database
db.collectionName.find().pretty() //this will show the documents in that collection in a readable format; do the same for each collection in the database
You should then be able to examine the document structure.
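If you would rather do the same inspection from Java (in keeping with the question above), here is a minimal sketch; "mydb" is a placeholder database name:

import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class SchemaPeek {
    public static void main(String[] args) {
        MongoClient client = new MongoClient();
        MongoDatabase db = client.getDatabase("mydb"); // placeholder database name

        // Roughly 'show collections' plus a peek at one document from each
        for (String name : db.listCollectionNames()) {
            System.out.println("collection: " + name);
            Document sample = db.getCollection(name).find().first();
            if (sample == null) continue; // empty collection
            for (String key : sample.keySet()) {
                Object value = sample.get(key);
                System.out.println("  " + key + " : "
                        + (value == null ? "null" : value.getClass().getSimpleName()));
            }
        }
        client.close();
    }
}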
There is actually a tool to help you out here called Variety:
http://blog.mongodb.org/post/21923016898/meet-variety-a-schema-analyzer-for-mongodb
You can view the Github repo for it here: https://github.com/variety/variety
I should probably warn you that:
It uses MapReduce to accomplish its tasks.
It uses certain other queries that could bring a production set-up to a near halt in terms of performance.
As such, I recommend you run it on a development server or a hidden member of a replica set.
Depending on the size and depth of your documents it may take a very long time, but it will eventually give you a rough picture of your database's structure.
This will print each top-level field name and the type of its value:
var schematodo = db.collection_name.findOne();
for (var key in schematodo) { print(key, typeof schematodo[key]); } // note: typeof the value, not the key
I would recommend limiting the result set rather than issuing an unrestricted find command.
use mydb
db.collectionName.find().limit(10)
var z = db.collectionName.find().limit(10)
Object.keys(z[0])
Object.keys(z[1])
This will help you begin to understand your database's structure, or lack thereof.
This is an open-source tool that I created together with a friend: https://pypi.python.org/pypi/mongoschema/
It is a Python library with pretty simple usage. You can try it out (and even contribute).
One option is to use Mongoeye. It is an open-source tool similar to Variety.
The difference is that Mongoeye is a stand-alone program (the Mongo shell is not required) and it has more features (histograms, most frequent values, etc.).
https://github.com/mongoeye/mongoeye
A few days ago I found the GUI client MongoDB Compass, which has some nice visualizations; see the product overview. It comes directly from the MongoDB people, and according to their docs:
MongoDB Compass is designed to allow users to easily analyze and understand the contents of their data collections within MongoDB...
You may also have been asking about the validation schema. Here is the answer on how to get it:
How to retrieve MongoDb collection validator rules?
Use MongoDB Compass, which derives the schema from a random sample of 1,000 documents, as explained in its documentation. It could miss something, but sampling is the only rational option if your database is several GBs. The schema visualisation can then be exported as JSON.
You can use MongoDB's tool mongodump. When you run it, a dump folder is created in the directory from which you executed it. In that folder there is a subfolder for each database, and inside each one a .bson data file plus a metadata file for each collection.
This method is the best I know of, as you can also make out the structure of empty collections.

How to store query output in temp db?

I am really new to programming, but I am studying it. I have one problem which I don't know how to solve.
I have a collection of docs in MongoDB and I'm using Elasticsearch to query the fields. The problem is that I want to store the output of a search back in MongoDB, but in a different DB. I know that I have to create a temporary DB which has to be updated with every search result. But how do I do this? Or point me to documentation to read so I can learn it. I will really appreciate your help!
Mongo does not natively support "temp" collections.
A typical thing to do here is to not actually write the entire results output to another DB; that would be utterly pointless, since Elasticsearch does its own caching, so you don't need any layer over the top.
As well, due to IO concerns it is normally a bad idea to write, say, a result set of 10k records to Mongo or another DB.
There is a feature request for what you describe: https://jira.mongodb.org/browse/SERVER-3215 but no planning for it as of yet.
Example
You could have a collection of results.
Within this collection you would have a doc that looks like:
{keywords: ['bok', 'mongodb']}
Each time you search, as you scroll through the result items, you would write a row to this collection, populating the keywords field with keywords from that search result (one row per result item, per result list, per search). It would probably be best to just stream each search result to MongoDB as it comes in. I have never programmed Python (though I wish to learn), so here is an example in shell-style pseudocode (splitIntoKeywords below stands in for whatever keyword extraction you use):
var elastic_results = [ /* documents returned by Elasticsearch */ ];
elastic_results.forEach(function(result) {
    // split the phrases in this result down into a keywords array
    var keywords = splitIntoKeywords(result); // hypothetical helper
    // lazy single inserts are fine; no need for batching or shrinking the
    // amount of data per round trip, just stream the rows in
    db.results_collection.insert({keywords: keywords});
});
So as you go through your results, you basically just mass-insert as fast as possible, creating a sort of "stream" of input to MongoDB. It can handle this quite well.
This should then give you a shardable list of words to run things like MapReduce jobs on and to aggregate statistics about.
Without knowing more about your scenario, this is pretty much my best answer.
This does not use the temp table concept, but instead makes your data permanent, which is fine by the sounds of it, since you wish to use Mongo as a storage engine for further tasks.
Actually, there is a MongoDB river plugin to work with Elasticsearch...
db.your_table.find().forEach(function(doc) { db.another_table.insert(doc); });
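For completeness, a rough Java equivalent of that shell one-liner (database and collection names kept as placeholders):

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class CopyCollection {
    public static void main(String[] args) {
        MongoClient client = new MongoClient();
        MongoDatabase db = client.getDatabase("mydb"); // placeholder database name

        MongoCollection<Document> src = db.getCollection("your_table");
        MongoCollection<Document> dst = db.getCollection("another_table");

        // Stream every document from the source into the destination
        for (Document doc : src.find()) {
            dst.insertOne(doc);
        }
        client.close();
    }
}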

Java: Calculating MongoDB query execution time

I want to log my query times each time a query is made.
I'm using MongoDB with the Play framework. Simply, I'm thinking of subtracting the start time from the end time. For example:
a = currentTime();
makeQuery();
queryTime = currentTime() - a;
Is there any better way?
You probably want to use Mongo's DB profiler. That way you keep the timing logic out of your code (less work to maintain it) and it will give you more options to check Mongo's behaviour.
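To tie this back to Java: the profiler can be switched on, queried, and switched off entirely through runCommand and the system.profile collection. A minimal sketch, assuming the 3.x driver; the database name, query, and profiling level are illustrative:

import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ProfilerExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient();
        MongoDatabase db = client.getDatabase("test"); // placeholder database name

        // Level 2 profiles every operation; level 1 only those slower than slowms
        db.runCommand(new Document("profile", 2));

        // Run the workload you want to measure
        db.getCollection("records").find(new Document("status", "active")).first();

        // Profiled operations land in the capped system.profile collection;
        // 'millis' is the server-side execution time of each operation
        for (Document op : db.getCollection("system.profile")
                             .find(new Document("ns", "test.records"))) {
            System.out.println(op.get("op") + " took " + op.get("millis") + " ms");
        }

        // Turn the profiler back off
        db.runCommand(new Document("profile", 0));
        client.close();
    }
}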

Running MongoDB Queries in Map/Reduce

Is it possible to run MongoDB commands, like a query to grab additional data or an update, from within MongoDB's MapReduce command, either in the Map or the Reduce function?
Is this completely ludicrous to do anyway? Currently I have some documents that refer to separate collections using MongoDB DBRefs.
Thanks for the help!
Is it possible to run MongoDB commands... from within MongoDB's MapReduce command.
In theory, this is possible. In practice there are lots of problems with this.
Problem #1: exponential work. M/R is already pretty intense and poorly logged. Adding queries can easily make M/R run out of control.
Problem #2: context. Imagine that you're running a sharded M/R and you are querying into an unsharded collection. Does the current context even have that connection?
You're basically trying to implement JOIN logic, and MongoDB has no joins. Instead, you may need to build the final data in a couple of phases by running a few loops over a few sets of data, as in the sketch below.
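For example, rather than querying from inside the Map or Reduce function, you can resolve the references in a second pass. A rough Java sketch, with made-up collection and field names:

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseJoin {
    public static void main(String[] args) {
        MongoClient client = new MongoClient();
        MongoDatabase db = client.getDatabase("test"); // placeholder

        MongoCollection<Document> orders = db.getCollection("orders");       // made-up name
        MongoCollection<Document> customers = db.getCollection("customers"); // made-up name

        // Phase 1: collect the foreign keys from the first collection
        List<Object> customerIds = new ArrayList<Object>();
        for (Document order : orders.find()) {
            customerIds.add(order.get("customerId"));
        }

        // Phase 2: fetch all referenced documents in a single $in query
        Document inQuery = new Document("_id", new Document("$in", customerIds));
        for (Document customer : customers.find(inQuery)) {
            System.out.println(customer.toJson());
        }
        client.close();
    }
}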