How to intercept mondoDB Query from Presto Connector - mongodb

I have written a number of Presto queries that pull from mongoDB collections, but others in our project query mongo directly. These folks would like to use my queries to save them the time of having to rewrite them.
Is there a way to obtain/extract the mongoDB query language generated by Presto?
Didn't see anything in the MongoDB connector documentation that would indicate how or if this was possible.
I am aware of SQL-mongo converters out there, but Presto SQL extends normal SQL to enable things like unwrapping arrays etc. that we encounter with non-relational stores and these converts have trouble with these things in my experience.

You can set MongoDB driver log level DEBUG in log.properties:
org.mongodb=DEBUG
However, it will print many unrelated logs (e.g. healthcheck). Filed an issue https://github.com/prestosql/presto/issues/5600

I guess the easiest way is to look into Mongodb while the query is running and get it from there, for example via logging:
db.setProfilingLevel(2)
db.system.profile.find().pretty()
You may also use some GUIs like MongoVue or Robo 3T - I used MongoVue in the past to evaluate running queries.

Related

MongoDB: display entities as a graph-nodes UI for remote mongo host

I have lots of data in my MongoDB accross collection and I'd like to vizualise them, in graph-node way, or in relation represenation manner.
I know, that Mongo by default, isn't 'that good' solution for this case, and I should use something like Neo4j or ElasticSearch with Kibana instead. And also I know, that Mongo Compass has in-build charts.
And according to some articles on Mongo site and $graphLookup aggregation operator Mongo can be used in that way.
So my question is about a tool, which is capable to vizualise (or connect by values) entities between each other.
Probably there is some way to integrate and connect Kibana to MongoDB remote instance (not Atlas) or Graphana allow to solve this problem (but I haven't head much about it) so I'll be welcome to ay of your advice about it.

How to handle databases or collection being created accidentally in mongoDB? [duplicate]

Is there a way to switch off the ability of mongo to sporadically create dbs and collections as soon as it sees one in a query. I run queries on the mongo console all the time and mistype a db or collection name, causing mongo to just create one. There should be a switch to have mongo only explicitly create dbs and collections. I can't find one on the docs.
To be clear, MongoDB does not auto create collections or databases on queries. For collections, they are auto created when you actually save data to them. You can test this yourself, run a query on a previously unknown collection in a database like this:
use unknowndb
db.unknowncollection.find()
show collections
No collection named "unknowncollection" shows up until you insert or save into it.
Databases are a bit more complex. A simple "use unknowndb" will not auto create the database. However, if after you do that you run something like "show collections" it will create the empty database.
I agree, an option to control this behavior would be great. Happy to vote for it if you open a Jira ticket at mongoDB.
No, implicit creation of collections and DBs is a feature of the console and may not be disabled. You might take a look at the security/authorization/role features of 2.6 and see if anything might help (although there's not something that exactly matches your request as far as I know).
I'd suggest looking through the MongoDB issues/bug/requests database system here to and optionally add the feature request if it doesn't already exist.
For people who are using Mongoose, a new database will get created automatically if your Mongoose Schema contains any form of index. This is because Mongo needs to create a database before it can insert said index.

Using Alteryx's Mongo Connection Tool To Connect With A Mongo Labs Database (Collection)

I've been test driving Alteryx for the last week or so and was wondering if anyone has successfully connected to a Mongo Labs data base using the Alteryx Mongo Input and Output tools. I've tried numerous times and can's seem to get it to work.
Yes, I am using that MongoDB extractor on a daily basis to generate TDE files to feed Tableau, since Tableau does not offer dedicated extractors, but I admit it was difficult to get started.
Extraction is now swift from a MongoDB running on AWS, make sure you indicate the proper collection, and be aware of one flaw with the current version of extractor, as of Alteryx 9.5: it is missing the flag to send to the DB that would let you read from a Read Only configured MongoDB. It is on the priority list for V10. In the meantime, you should connect to the DB that can be written to, and to ease your DBA, show that Alteryx workflows can't write to it unless you drag into the workflow the MongoDB Output Tool.
Also you can use the Properties / Criteria window to set filters on the indexed field of your MongoDB. After much research, I found out the precise syntax for filtering the extraction by date:
{_id: {$gt: ObjectId("54a4fe800000000000000000")}}
Since each MongoDB ObjectId contains an embedded timestamp of its creation time.
To get the proper time, you can use this excellent website:
http://steveridout.github.io/mongo-object-time/
And if you need fancier filters, here is some help with the syntax:
http://www.querymongo.com/
I hope that helps...

stop mongodb creating dbs and collections dynamically

Is there a way to switch off the ability of mongo to sporadically create dbs and collections as soon as it sees one in a query. I run queries on the mongo console all the time and mistype a db or collection name, causing mongo to just create one. There should be a switch to have mongo only explicitly create dbs and collections. I can't find one on the docs.
To be clear, MongoDB does not auto create collections or databases on queries. For collections, they are auto created when you actually save data to them. You can test this yourself, run a query on a previously unknown collection in a database like this:
use unknowndb
db.unknowncollection.find()
show collections
No collection named "unknowncollection" shows up until you insert or save into it.
Databases are a bit more complex. A simple "use unknowndb" will not auto create the database. However, if after you do that you run something like "show collections" it will create the empty database.
I agree, an option to control this behavior would be great. Happy to vote for it if you open a Jira ticket at mongoDB.
No, implicit creation of collections and DBs is a feature of the console and may not be disabled. You might take a look at the security/authorization/role features of 2.6 and see if anything might help (although there's not something that exactly matches your request as far as I know).
I'd suggest looking through the MongoDB issues/bug/requests database system here to and optionally add the feature request if it doesn't already exist.
For people who are using Mongoose, a new database will get created automatically if your Mongoose Schema contains any form of index. This is because Mongo needs to create a database before it can insert said index.

How can I discover a mongo database's structure

I have a Mongo database that I did not create or architect, is there a good way to introspect the db or print out what the structure is to start to get a handle on what types of data are being stored, how the data types are nested, etc?
Just query the database by running the following commands in the mongo shell:
use mydb //this switches to the database you want to query
show collections //this command will list all collections in the database
db.collectionName.find().pretty() //this will show all documents in the database in a readable format; do the same for each collection in the database
You should then be able to examine the document structure.
There is actually a tool to help you out here called Variety:
http://blog.mongodb.org/post/21923016898/meet-variety-a-schema-analyzer-for-mongodb
You can view the Github repo for it here: https://github.com/variety/variety
I should probably warn you that:
It uses MR to accomplish its tasks
It uses certain other queries that could bring a production set-up to a near halt in terms of performance.
As such I recommend you run this on a development server or a hidden node of a replica or something.
Depending on the size and depth of your documents it may take a very long time to understand the rough structure of your database through this but it will eventually give one.
This will print name and its type
var schematodo = db.collection_name.findOne()
for (var key in schematodo) { print (key, typeof key) ; }
I would recommend limiting the result set rather than issuing an unrestricted find command.
use mydb
db.collectionName.find().limit(10)
var z = db.collectionName.find().limit(10)
Object.keys(z[0])
Object.keys(z[1])
This will help you being to understand your database structure or lack thereof.
This is an open-source tool that I, along with my friend, have created - https://pypi.python.org/pypi/mongoschema/
It is a Python library with a pretty simple usage. You can try it out (even contribute).
One option is to use the Mongoeye. It is open-source tool similar to the Variety.
The difference is that Mongoeye is a stand-alone program (Mongo Shell is not required) and has more features (histograms, most frequent values, etc.).
https://github.com/mongoeye/mongoeye
Few days ago I found GUI client MongoDB Compass with some nice visualizations. See the product overview. It comes directly from the mongodb people and according to their doc:
MongoDB Compass is designed to allow users to easily analyze and understand the contents of their data collections within MongoDB...
You may've asked about validation schema. Here's the answer how to get it:
How to retrieve MongoDb collection validator rules?
Use Mongo Compass
which does a sample as explained here
Which does a random sample of 1000 documents to get you the schema - it could miss something but it's the only rational option if you database is several GBs.
Visualisation
The schema then can be exported as JSON
Documentation
You can use MongoDB's tool mongodump. On running it, a dump folder is created in the directory from which you executed mongodump. In that folder, there are multiple folders that correspond to the databases in MongDB, and there are subfolders that correspond to the collections, and files that correspond to the documents.
This method is the best I know of, as you can also make out the schema of empty collections.