How can I return an array of mongodb objects in pymongo (without a cursor)? Can MapReduce do this?

I have a db set up in mongo that I'm accessing with pymongo.
I'd like to be able to pull a small set of fields into a list of dictionaries. So, something like what I get in the mongo shell when I type...
db.find({},{"variable1_of_interest":1, "variable2_of_interest":1}).limit(2).pretty()
I'd like a python statement like:
x = db.find({},{"variable1_of_interest":1, "variable2_of_interest":1})
where x is an array structure of some kind rather than a cursor, so that I don't have to iterate like this:
data = []
x = db.find({},{"variable1_of_interest":1, "variable2_of_interest":1})
for i in x:
    data.append(i)
Is it possible that I could use MapReduce to bring this into a one-liner? Something like
db.find({},{"variable1_of_interest":1, "variable2_of_interest":1}).map_reduce(mapper, reducer, "data")
I intend to output this dataset to R for some analysis, but I'd like to concentrate the IO in Python.

You don't need to call mapReduce; just turn the cursor into a list, like so:
>>> data = list(col.find({},{"a":1,"b":1,"_id":0}).limit(2))
>>> data
[{u'a': 1.0, u'b': 2.0}, {u'a': 2.0, u'b': 3.0}]
where col is your db.collection object.
But be careful with large result sets, because everything is loaded into memory at once.
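If the result set may be large, a middle ground is to consume the cursor in fixed-size batches instead of calling list() on all of it. The sketch below works on any iterable of dicts, including a PyMongo Cursor; the helper name iter_batches and the batch size are illustrative assumptions, and the example uses a plain generator as a stand-in for a real cursor so it runs without a server.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_batches(cursor: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` documents from any iterable, e.g. a pymongo Cursor."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(cursor)
    while True:
        batch = list(islice(it, size))  # pull up to `size` documents
        if not batch:
            return
        yield batch


# Hypothetical stand-in for col.find({}, {"a": 1, "b": 1, "_id": 0}).
fake_cursor = ({"a": float(i), "b": float(i + 1)} for i in range(5))
batches = list(iter_batches(fake_cursor, 2))
```

Each batch could then be written out (for example, appended to a CSV file for R) before the next one is fetched, so only one batch is held in memory at a time.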

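Alternatively, you can call mapReduce from PyMongo and pass the find query as the query argument. Note that the third argument, the name of the output collection, is required ("mr_output" below is just a placeholder):
db.yourcollection.map_reduce(map_function, reduce_function, "mr_output", query={})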
As for the projection, I think you would need to do it in the map or reduce function, since query only specifies the selection criteria, as the MongoDB documentation says.
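Note that Collection.map_reduce was removed in PyMongo 4.0, and MongoDB 5.0 deprecated map-reduce in favor of the aggregation pipeline. As a hedged alternative, the selection and the projection can both be expressed as $match and $project stages and passed to collection.aggregate(). The builder function below is hypothetical (not a PyMongo API) and only constructs the pipeline, so it runs without a server.

```python
from typing import Any, Dict, Iterable, List


def build_pipeline(query: Dict[str, Any], fields: Iterable[str]) -> List[Dict[str, Any]]:
    """Build a $match + $project pipeline (hypothetical helper)."""
    projection: Dict[str, Any] = {name: 1 for name in fields}
    projection.setdefault("_id", 0)  # exclude _id unless it was requested in fields
    return [{"$match": query}, {"$project": projection}]


pipeline = build_pipeline({}, ["variable1_of_interest", "variable2_of_interest"])
# With a real collection: data = list(db.yourcollection.aggregate(pipeline))
```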

Building off of Asya's answer:
If you want a list of just one value from each entry, rather than a list of objects, a list comprehension worked for me.
For example, if each object represents a user, the database stores their email, and you want the emails of all users who are 15 years old:
user_emails = [user['email'] for user in db.people.find( {'age' : 15} )]
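A slightly more defensive version of the same idea, sketched below, assumes that some documents may lack an email field: it asks the server for only that field (the projection passed to find) and skips documents without it instead of raising KeyError. The helper name pluck is hypothetical, and the example uses a plain list in place of db.people.find(...).

```python
from typing import Any, Dict, Iterable, List


def pluck(docs: Iterable[Dict[str, Any]], field: str) -> List[Any]:
    """Return doc[field] for every document that has the field, in cursor order."""
    return [doc[field] for doc in docs if field in doc]


# Real usage (requires a server):
#   user_emails = pluck(db.people.find({"age": 15}, {"email": 1, "_id": 0}), "email")
people = [{"email": "a@example.com"}, {}, {"email": "b@example.com"}]
user_emails = pluck(people, "email")
```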

Related

How to order the fields of the documents returned by the find query in MongoDB? [duplicate]

I am using PyMongo to insert data (title, description, phone_number ...) into MongoDB. However, when I use mongo client to view the data, it displays the properties in a strange order. Specifically, phone_number property is displayed first, followed by title and then comes description. Is there some way I can force a particular order?
This question and its original answer (below) are quite old. Anyway, for anyone who visits this, I feel I should add:
The original answer is completely wrong. MongoDB documents ARE ordered key-value pairs. However, PyMongo uses Python dicts for documents, which were not guaranteed to be ordered (CPython 3.6 dicts retain insertion order as an implementation detail, and Python 3.7 made this a language guarantee). So this is a limitation of the PyMongo driver.
Be aware that this limitation actually affects usability: if you query the database for an exact subdocument, it only matches when the key-value pairs are in the same order.
Just try the following code yourself:
from pymongo import MongoClient
db = MongoClient().testdb
col = db.testcol
subdoc = {
    'field1': 1,
    'field2': 2,
    'field3': 3
}
document = {
    'subdoc': subdoc
}
col.insert_one(document)
# Cursor.count() was removed in PyMongo 4; count_documents() replaces it
print(col.count_documents({'subdoc': subdoc}))
Each time this code runs, the 'same' document is added to the collection, so the printed value 'should' increase by one each time. On Python versions where dict order is not guaranteed (before 3.7), it may not: find only matches subdocuments whose keys are in the same order, but the dict can be encoded with its keys in arbitrary order.
See the following answer for how to use an ordered dict to overcome this: https://stackoverflow.com/a/30787769/4273834
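When you do not need an exact whole-subdocument match, a common way to sidestep the ordering issue is to query each embedded field with dot notation: a filter such as {'subdoc.field1': 1, 'subdoc.field2': 2} compares the fields individually and does not depend on key order. Unlike the exact match, it also matches subdocuments that contain extra fields. The sketch below builds such a filter; dotted_filter is a hypothetical helper, and only the filter construction is exercised here, not the query.

```python
from typing import Any, Dict


def dotted_filter(prefix: str, subdoc: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'k': v} into {'prefix.k': v}; each field is then matched on its own."""
    # Nested dict values are left as-is, so they are still matched as whole
    # (order-sensitive) subdocuments; flatten them too if that matters.
    return {f"{prefix}.{key}": value for key, value in subdoc.items()}


subdoc = {"field1": 1, "field2": 2, "field3": 3}
query = dotted_filter("subdoc", subdoc)
# With a live server: col.count_documents(query)
```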
Original answer (2013):
MongoDB documents are BSON objects, unordered dictionaries of key-value pairs. So, you can't rely on or set a specific field order. The only thing you can control is which fields are displayed and which are not; see the docs on find's projection argument.
Also see related questions on SO:
MongoDB field order and document position change after update
Can MongoDB and its drivers preserve the ordering of document elements
Ordering fields from find query with projection
Hope that helps.
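For the question's actual goal, showing fields in a chosen order, one option on the Python side is to rebuild each document with its keys in the order you want; since Python 3.7, dicts preserve insertion order, so print and json.dumps follow it. This only affects how results are presented by your program, not how the mongo client displays stored documents. The helper reorder and the sample document are illustrative.

```python
import json
from typing import Any, Dict, Sequence


def reorder(doc: Dict[str, Any], order: Sequence[str]) -> Dict[str, Any]:
    """Return a copy of doc with the keys in `order` first, then any remaining keys."""
    out = {key: doc[key] for key in order if key in doc}
    for key, value in doc.items():
        if key not in out:
            out[key] = value
    return out


doc = {"phone_number": "555-0100", "title": "Cafe", "description": "Coffee"}
ordered = reorder(doc, ["title", "description", "phone_number"])
print(json.dumps(ordered))
```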

find_one query returns just the fields instead of an entry

I'm currently trying to use PyMongo's find_one query. When I run the Mongo shell and execute a findOne query, I get a document back. However, when I use PyMongo's find_one, I always seem to get just the field names instead of an actual entry.
@app.route("/borough/manhattan/")
def manhattan():
    restaurantmanhattan = restaurants.find_one({'borough':'Manhattan'})
    json_restaurantmanhattan = []
    for restaurant in restaurantmanhattan:
        json_restaurantmanhattan.append(restaurant)
    json_restaurantmanhattan = json.dumps(json_restaurantmanhattan)
    return json_restaurantmanhattan
Once I navigate to http://0.0.0.0:5000/borough/manhattan/ I get the following:
["cuisine","borough","name","restaurant_id","grades","address","_id"]
I expect to see a document that matches the query, i.e. one with Manhattan listed as the borough.
I'm at a loss as to how I should be writing the query to return that.
Can anyone explain what I'm seeing?
There are many things wrong with your view.
First, as you may already know, find_one returns a single document as a Python dictionary, so your for loop iterates over the dictionary's keys.
You really do not need that for loop.
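import json

@app.route("/borough/manhattan/")
def manhattan():
    # find_one returns one document (a dict) or None; exclude _id, because
    # json.dumps cannot serialize the ObjectId stored there
    restaurant_manhattan = restaurants.find_one({'borough': 'Manhattan'}, {'_id': 0})
    return json.dumps(restaurant_manhattan)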
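Two details are easy to trip over here. Iterating a dict yields its keys, which is exactly the list of field names shown in the question, and the _id value PyMongo returns is a bson ObjectId, which json.dumps cannot serialize by default. If you want to keep _id, one option is json.dumps(..., default=str), which converts unknown types with str(); PyMongo also ships bson.json_util.dumps for MongoDB Extended JSON. FakeObjectId below is a hypothetical stand-in so the sketch runs without PyMongo installed.

```python
import json
from typing import Any, Dict


class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId; str() gives the hex string."""

    def __init__(self, hex_value: str) -> None:
        self.hex_value = hex_value

    def __str__(self) -> str:
        return self.hex_value


doc: Dict[str, Any] = {"_id": FakeObjectId("5f1d7a"), "borough": "Manhattan", "name": "Cafe"}

field_names = [key for key in doc]      # what the question's for loop collected
as_json = json.dumps(doc, default=str)  # default=str turns the _id into a string
```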

How can you measure the space that a set of documents takes up (in bytes) in mongo db?

What I would like to do is figure out how much space in bytes a certain set of documents takes up. E.g. something like:
collection.stuff.stats({owner: someOwner}, {sizeInBytes: 1})
Where the first parameter is a query, and the second is like a projection of the statistics you want calculated.
I read that there's a bsonsize function you can use to measure the size of a single document. I'm wondering if maybe I could use that along with the aggregation methods to calculate the size of a search. But if I was going to do that, I'd want to know how bsonsize works. How does it work? Is it expensive to run?
Are there other options for measuring the size of data in mongo?
One perhaps "quick and dirty" way to find this would be to assign your results to a cursor, then insert those documents into a new collection and call stats() on it. It would look like this in the shell:
var myCursor = db.collection.find({key: value});
while (myCursor.hasNext()) {
    db.resultColl.insertOne(myCursor.next());
}
db.resultColl.stats();
db.resultColl.stats() should then return statistics for just that subset of documents (its size field is their total uncompressed size in bytes).
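On the bsonsize part of the question: the shell helper Object.bsonsize() reports the encoded BSON size of one document, and since MongoDB 4.4 the aggregation operator $bsonSize does the same on the server, so the total for a matching set can be summed without copying it into a new collection. The sketch below only builds the pipeline in Python; the owner field comes from the question, while the helper name and the count/totalBytes output fields are assumptions. It measures uncompressed BSON size, not on-disk storage.

```python
from typing import Any, Dict, List


def size_pipeline(query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build a pipeline summing the BSON size (bytes) of documents matching `query`."""
    return [
        {"$match": query},
        {
            "$group": {
                "_id": None,
                "count": {"$sum": 1},
                "totalBytes": {"$sum": {"$bsonSize": "$$ROOT"}},
            }
        },
    ]


pipeline = size_pipeline({"owner": "someOwner"})
# With a real collection (MongoDB 4.4+):
#   result = list(db.stuff.aggregate(pipeline))
```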

store mongodb result in array?

Is it possible to store the result of a MongoDB statement in an array using jQuery?
I have this:
Polls_Coll.find({},{question:1});
I want to store all the question field values in an array, something like:
var arr[]=Polls_Coll.find({},{question:1});
I know the above is wrong, but I need something like it.
I need it for autocompletion. Right now I'm taking the source from one collection like this:
source:_(Product_Mobiles.find().fetch()).pluck("title")
I want to take data from multiple sources and store it in an array.
Thanks
Using the mongo console, you can do this with .toArray(), like this:
var results = db.collection.find({}).toArray();
However, this depends on the driver you are using. The Node.js driver's cursors also have toArray(); in Meteor, call fetch() on the cursor instead, as in your own example.
If your problem is putting all the results from multiple sources into a single array:
How to merge two arrays in Javascript and de-duplicate items
You could merge the two arrays, if that's what you mean:
var results = collection.find({}).fetch();
var results2 = collection2.find({}).fetch();
results = results.concat(results2);
Then you can pluck the titles:
_(results).pluck("title");
Also, you can't use db. in Meteor; you have to use the name of the collection variable you defined with new Meteor.Collection.
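The merge, pluck, and de-duplicate steps are language-independent. Here is the same idea sketched in Python (the language used elsewhere on this page), assuming each result list holds documents with a title field; the function name and sample data are hypothetical, and in Meteor itself you would keep the JavaScript above.

```python
from typing import Any, Dict, Iterable, List


def merged_titles(*result_lists: Iterable[Dict[str, Any]]) -> List[str]:
    """Concatenate result lists, pluck 'title', and drop duplicates keeping first-seen order."""
    seen = set()
    titles: List[str] = []
    for results in result_lists:
        for doc in results:
            title = doc.get("title")
            if title is not None and title not in seen:
                seen.add(title)
                titles.append(title)
    return titles


mobiles = [{"title": "Phone A"}, {"title": "Phone B"}]
tablets = [{"title": "Tab X"}, {"title": "Phone B"}]
source = merged_titles(mobiles, tablets)
```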

Getting an item count with MongoDB C# driver query builder

Using the C# driver for MongoDB, I can easily construct a query and then call SetSkip() and SetLimit() on it to restrict the result set to a certain size.
However, I'd like to know what the item count of the query would be before applying Skip and Take, without executing the query and loading the entire result set (which could be huge) into memory.
It looks like I can do this with MongoDB directly through the shell by using the count() command. e.g.:
db.item.find( { "FieldToMatch" : "ValueToMatch" } ).count()
Which just returns an integer and that's exactly what I want. But I can't see a way in the documentation of doing this through the C# driver. Is it possible?
(It should be noted that we're already using the query builder extensively, so ideally I'd much rather do this through the query builder than start issuing commands down to the shell through the driver, if that's possible. But if that's the only solution then an example would be helpful, thanks.)
Cheers,
Matt
You can do it like this:
var server = MongoServer.Create("mongodb://localhost:27020");
var database = server.GetDatabase("someDb");
var collection = database.GetCollection<Type>("item");
var cursor = collection.Find(Query.EQ("FieldToMatch", "ValueToMatch"));
var count = cursor.Count();
Some notes:
Keep only one server instance (a singleton)
The latest driver version returns the count as a long rather than an int
The cursor only fetches data once you start iterating
You can configure many things on the cursor, such as skip, take, and which fields to return, before it actually loads data (i.e. before iteration starts)
The cursor's Count() method fetches only the document count, not the documents
I'm using driver 2.3.0, where it's also possible to do it like this:
...
IMongoCollection<entity> Collection = db.GetCollection<entity>(CollectionName);
var yourFilter = Builders<entity>.Filter.Where(o => o.prop == value);
long countResult = Collection.Count(yourFilter);
// In driver 2.7 and later, Count is obsolete; CountDocuments(yourFilter) replaces it
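Knowing the total count up front is typically what makes skip/limit paging work. Below is a small sketch of the arithmetic, in Python for consistency with the rest of this page: given a total (from Count() here, or collection.count_documents(filter) in PyMongo), compute the skip and limit for a 1-based page number, which would then go to find(filter).skip(skip).limit(limit). The function and its parameters are hypothetical.

```python
from typing import Tuple


def page_window(total: int, page: int, page_size: int) -> Tuple[int, int, int]:
    """Return (skip, limit, total_pages) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total_pages = (total + page_size - 1) // page_size  # ceiling division
    skip = (page - 1) * page_size
    # A limit of 0 means the page is past the end. Don't pass 0 to limit(),
    # since MongoDB treats a limit of 0 as "no limit".
    limit = max(0, min(page_size, total - skip))
    return skip, limit, total_pages


print(page_window(95, 5, 20))
```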