accessing fieldnames as metadata in mongodb - mongodb

I have a number of different documents in a mongo collection.
The attrs are all numeric values. I don't know apriori what the fieldnames are (I do but they can vary from doc to doc).
I want to write a program that
a) gets all the unique fieldnames in a collection
b) finds the max and min value of each field in the collection
and then reports it in a tabular form with rows "fieldname, maxvalue, minvalue" or in JSON that is equivalent. I am using pymongo but I don't have to, ruby or js or even java driver is fine.
How do I get programmatic access to the list of unique fieldnames in a collection? That's
the major question. I can manage the rest.

Either you main the list of used key inside your application as part of your application logic in some document inside the same collection or a meta-collection yourself or you have to iterate over all documents to figure out the list of keys...there is nothing in MongoDB helping you here since MongoDB is schemaless.

Related

How to order the fields of the documents returned by the find query in MongoDB? [duplicate]

I am using PyMongo to insert data (title, description, phone_number ...) into MongoDB. However, when I use mongo client to view the data, it displays the properties in a strange order. Specifically, phone_number property is displayed first, followed by title and then comes description. Is there some way I can force a particular order?
The above question and answer are quite old. Anyhow, if somebody visits this I feel like I should add:
This answer is completely wrong. Actually in Mongo Documents ARE ordered key-value pairs. However when using pymongo it will use python dicts for documents which indeed are not ordered (as of cpython 3.6 python dicts retain order, however this is considered an implementation detail). But this is a limitation of the pymongo driver.
Be aware, that this limitation actually impacts the usability. If you query the db for a subdocument it will only match if the order of the key-values pairs is correct.
Just try the following code yourself:
from pymongo import MongoClient
db = MongoClient().testdb
col = db.testcol
subdoc = {
'field1': 1,
'field2': 2,
'filed3': 3
}
document = {
'subdoc': subdoc
}
col.insert_one(document)
print(col.find({'subdoc': subdoc}).count())
Each time this code gets executed the 'same' document is added to the collection. Thus, each time we run this code snippet the printed value 'should' increase by one. It does not because find only maches subdocuemnts with the correct ordering but python dicts just insert the subdoc in arbitrary order.
see the following answer how to use ordered dict to overcome this: https://stackoverflow.com/a/30787769/4273834
Original answer (2013):
MongoDB documents are BSON objects, unordered dictionaries of key-value pairs. So, you can't rely on or set a specific fields order. The only thing you can operate is which fields to display and which not to, see docs on find's projection argument.
Also see related questions on SO:
MongoDB field order and document position change after update
Can MongoDB and its drivers preserve the ordering of document elements
Ordering fields from find query with projection
Hope that helps.

fill up mongo data automatically by using script

I am a newbie to mongo, I have a collection in my mongodb, To test a feature in my project I need to update database with some random data.I need a script to do that. by identifying the datatype of the field script should fill up the data automatically.
suppose I have the fields in the collection:
id, name, first_name, last_name, current_date, user_income etc.
Since the my questions are as follows:
1. Can we get all field names of a collection with their data types?
2. Can we generate a random value of that data type in mongo shell?
3. how to set the values dynamically to store random data.
I am frequently putting manually to do this.
1. Can we get all field names of a collection with their data types?
mongodb collections are schema-less, which means each document (row in relation database) can have different fields. When you find a document from a collection, you could get its fields names and data types.
2. Can we generate a random value of that data type in mongo shell?
3. how to set the values dynamically to store random data.
mongo shell use JavaScript, you may write a js script and run it with mongo the_js_file.js. So you could generate a random value in the js script.
It's useful to have a look at the mongo JavaScript API documentation and the mongo shell JavaScript Method Reference.
Other script language such as Python can also do that. mongodb has their APIs too.

Mongodb compare two big data collection

I want to compare two very big collection, the main of the operation is two know what element is change or deleted
My collection 1 and 2 have a same structure and have more 3 million records
example :
record 1 {id:'7865456465465',name:'tototo', info:'tototo'}
So i want to know : what element is change, and what element is not present in collection 2.
What is the best solution to do this?
1) Define what equality of 2 documents means. For me it would be: both documents should contain all fields with exact same values given their ids are unique. Note that mongo does not guarantee field order, and if you update a field it might move to the end of the document which is fine.
2) I would use some framework that can connect to mongo and fetch data at the same time converting it to a map-like data structure or even JSON. For instance I would go with Scala + Lift record (db.coll.findAll()) + Lift JSON. Lift JSON library has Diff function that will give you a diff of 2 JSON docs.
3) Finally I would sort both collections by ids, open db cursors, iterate and compare.
if the schema is flat in your case it is, you can use a free tool to compare the data(dataq.io) in two tables.
Disclaimer : I am the founder of this product.

Mongo query for number of items in a sub collection

This seems like it should be very simple but I can't get it to work. I want to select all documents A where there are one or more B elements in a sub collection.
Like if a Store document had a collection of Employees. I just want to find Stores with 1 or more Employees in it.
I tried something like:
{Store.Employees:{$size:{$ne:0}}}
or
{Store.Employees:{$size:{$gt:0}}}
Just can't get it to work.
This isn't supported. You basically only can get documents in which array size is equal to the value. Range searches you can't do.
What people normally do is that they cache array length in a separate field in the same document. Then they index that field and make very efficient queries.
Of course, this requires a little bit more work from you (not forgetting to keep that length field current).

MongoDB: Nested query with arrays, and it's performance

I have 2 collections on 2 separate DBs. Both store an array field. I plan to query both at once so that:
All collection 1 documents that have elements [A,B] in their array
field and their _ids are present in collection 2's array field with a
specific document _id.
As an example:
docs (collection 1, DB 1):
[{"_id":ObjectId("doc1"), "array1":["A","B"]}, {"_id":ObjectId("doc2"), "array1":["A","C"]}]
user_docs (collection 2, DB 2):
[{"_id":ObjectId("usr1"), "array2": [ObjectId("doc1"),ObjectId("foo")]}, {"_id":ObjectId("usr2"), "array2": [ObjectId("bar"),ObjectId("baz")]}]
I need a query that given A,B and usr1, returns the 'doc1' object (because it has A,B in it's array1 field and usr1 has it in it's array2 field).
I obviously can fetch all docs having A,B in one query and all usr1's docs in another query and find the common elements at application level, but is there any better way of doing it using MongoDB?
Thanks for your help.
Ok im not sure i understand exactly what your trying to do from your description. But i dont understand why you would query data across db's this just seems very heavy handed to me why cant you store both the data sets in the same db. You can always separate later if required? Im not sure this will solve your vague problem but it would be a good place to start.
best of Luck.
You will have to query MongoDB twice, since you have no possibility of a join. You will have to do it on application level. If you can denormalize, do it. Cash the needed data in a embedded doc, so that you can do one query only.
I think #Eamonn is right, that you shouldn't have to do a query across DBs.