MongoDB: Advanced conditional query

I have the following list of documents (the collection has more than 100 documents):
{name : 'Tom', gender : 'male'},
{name : 'Sandra', gender : 'female'},
{name : 'Alex', gender : 'male'}
What I want is to return only 4 records, with 2 of them being male and 2 of them being female.
So far I've tried this:
db.persons.find({'gender' : { $in : ['male','female']}}).limit(4);
which brings back 4 records as expected, but isn't guaranteed to contain exactly 2 male and 2 female. Is there any way I can filter documents to return the specified list without making two separate db calls?
Thanks in advance.

I've been struggling to find a valid solution to your problem, and it appears that it is no easy task.
The only way I could think of to call the database just once is to group the documents by gender and then project the resulting names array, slicing it to limit the array size to 2.
This is not possible in the aggregation pipeline, as you cannot use operators such as $slice there.
Still, I managed to group the database entries by gender and return the values in an array, which can then be manipulated.
After a lot of attempts, I came up with the below solution:
var people = db.people.aggregate([
    {
        $group: {
            _id: '$gender',
            names: { $push: '$name' }
        }
    }
]).toArray();
var new_people = [];
for (var i = 0; i < people.length; i++) {
    // take at most 2 names per gender; a group may hold fewer than 2
    for (var j = 0; j < 2 && j < people[i].names.length; j++) {
        new_people.push({
            gender: people[i]._id,
            name: people[i].names[j]
        });
    }
}
If you want to filter the data you have two options, based on my example:
Filter the data in the aggregation pipeline within the $match stage
Filter the data when looping over the aggregation resulted array
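On newer servers (MongoDB 3.2 and later, as far as I know) $slice is also available as an aggregation operator, so the trimming can happen inside the pipeline. The sketch below builds such a pipeline and shows the client-side trimming as a plain function that tolerates groups with fewer than n names. The helper names buildPipeline and takePerGroup, and the extra name 'Bob', are made up for this example.

```javascript
// Hypothetical helper: pipeline that groups names by gender and keeps n per group.
// Assumes MongoDB 3.2+, where $slice can be used as an aggregation operator.
function buildPipeline(n) {
  return [
    { $group: { _id: '$gender', names: { $push: '$name' } } },
    { $project: { names: { $slice: ['$names', n] } } }
  ];
}

// Client-side equivalent for the grouped array returned by aggregate().toArray().
function takePerGroup(groups, n) {
  var result = [];
  groups.forEach(function (group) {
    // slice(0, n) returns fewer than n names when the group is small
    group.names.slice(0, n).forEach(function (name) {
      result.push({ gender: group._id, name: name });
    });
  });
  return result;
}

// Sample grouped data (illustrative only).
var sample = [
  { _id: 'male', names: ['Tom', 'Alex', 'Bob'] },
  { _id: 'female', names: ['Sandra'] }
];
console.log(takePerGroup(sample, 2));
// In the shell (not run here): db.people.aggregate(buildPipeline(2)).toArray()
```

Note that neither variant controls which 2 documents per gender are returned; add a $sort before the $group if that matters.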

Making two calls is easy and I can see no reason for not making them.
Collect the results of two finds:
var males = db.person.find({"gender": "male"}, {"name":1}).limit(2);
var females = db.person.find({"gender": "female"}, {"name":1}).limit(2);
var all = [];
var collectToAll = function(person) { all.push(person); };
males.forEach(collectToAll)
females.forEach(collectToAll)
Then all is
[
{
"_id" : ObjectId("549289765732b52ca191fdae"),
"name" : "Tom"
},
{
"_id" : ObjectId("549289865732b52ca191fdb0"),
"name" : "Alex"
},
{
"_id" : ObjectId("549289805732b52ca191fdaf"),
"name" : "Sandra"
}
]

Related

MongoDB: Update a field of an item in array with matching another field of that item

I have a data structure like this:
We have some centers. A center has some switches. A switch has some ports.
{
"_id" : ObjectId("561ad881755a021904c00fb5"),
"Name" : "center1",
"Switches" : [
{
"Ports" : [
{
"PortNumber" : 2,
"Status" : "Empty"
},
{
"PortNumber" : 5,
"Status" : "Used"
},
{
"PortNumber" : 7,
"Status" : "Used"
}
]
}
]
}
All I want is to write an update query that changes the Status of the port whose PortNumber is 5 to "Empty".
I can update it when I know the array index of the port (here array index is 1) with this query:
db.collection.update(
// query
{
_id: ObjectId("561ad881755a021904c00fb5")
},
// update
{
$set : { "Switches.0.Ports.1.Status" : "Empty" }
}
);
But I don't know the array index of that Port.
Thanks for help.
You would normally do this using the positional operator $, as described in the answer to this question:
Update field in exact element array in MongoDB
Unfortunately, right now the positional operator only supports matching one array level deep.
There is a JIRA ticket for the sort of behavior that you want: https://jira.mongodb.org/browse/SERVER-831
If you can make Switches into a single object instead (named Switch in the example), you could do something like this:
db.collection.update(
{
_id: ObjectId("561ad881755a021904c00fb5"),
"Switch.Ports.PortNumber": 5
},
{
$set: {
"Switch.Ports.$.Status": "Empty"
}
}
)
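Newer servers (MongoDB 3.6 and later, as far as I know) add the all-positional operator $[] and the filtered positional operator $[<identifier>] together with the arrayFilters option, which can reach into nested arrays without changing the schema. A sketch under that version assumption; the identifier p and the helper applyInMemory are illustrative only, and the helper merely mimics the effect of the update on a plain object.

```javascript
// Arguments for update(); assumes MongoDB 3.6+ (arrayFilters support).
var query = { Name: 'center1' };
var update = { $set: { 'Switches.$[].Ports.$[p].Status': 'Empty' } };
var options = { arrayFilters: [{ 'p.PortNumber': 5 }] };
// In the shell (not run here): db.collection.update(query, update, options)

// Illustrative in-memory equivalent of what that update does to one document.
function applyInMemory(center, portNumber, status) {
  center.Switches.forEach(function (sw) {
    sw.Ports.forEach(function (port) {
      if (port.PortNumber === portNumber) {
        port.Status = status;
      }
    });
  });
  return center;
}

var center = {
  Name: 'center1',
  Switches: [{ Ports: [
    { PortNumber: 2, Status: 'Empty' },
    { PortNumber: 5, Status: 'Used' },
    { PortNumber: 7, Status: 'Used' }
  ] }]
};
console.log(JSON.stringify(applyInMemory(center, 5, 'Empty')));
```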
Since you don't know the array index of the port, I would suggest you create the $set conditions dynamically, i.e. use something that gets the indexes of the matching objects and then modify accordingly. For that, consider using MapReduce.
Currently this does not seem to be possible using the aggregation framework. There is an unresolved open JIRA issue linked to it. However, a workaround is possible with MapReduce. The basic idea with MapReduce is that it uses JavaScript as its query language, but this tends to be considerably slower than the aggregation framework and should not be used for real-time data analysis.
In your MapReduce operation, you need to define a couple of steps: the mapping step, which applies an operation to every document in the collection (the operation can either do nothing or emit some object with keys and projected values), and the reducing step, which takes the list of emitted values and reduces it to a single element.
For the map step, you ideally want to get, for every document in the collection, the index of each element of the Switches and Ports array fields, and another key that contains the $set keys.
Your reduce step would be a function (which does nothing) simply defined as var reduce = function() {};
The final step in your MapReduce operation will then create a separate collection, switches, that contains the emitted Switches array object along with a field holding the $set conditions. This collection can be updated periodically by re-running the MapReduce operation on the original collection.
Altogether, this MapReduce method would look like:
var map = function(){
for(var i = 0; i < this.Switches.length; i++){
for(var j = 0; j < this.Switches[i].Ports.length; j++){
emit(
{
"_id": this._id,
"switch_index": i,
"port_index": j
},
{
"index": j,
"Switches": this.Switches[i],
"Port": this.Switches[i].Ports[j],
"update": {
"PortNumber": "Switches." + i.toString() + ".Ports." + j.toString() + ".PortNumber",
"Status": "Switches." + i.toString() + ".Ports." + j.toString() + ".Status"
}
}
);
}
}
};
var reduce = function(){};
db.centers.mapReduce(
map,
reduce,
{
"out": {
"replace": "switches"
}
}
);
Querying the output collection switches from the MapReduce operation will typically give you a result like this:
db.switches.findOne()
Sample Output:
{
"_id" : {
"_id" : ObjectId("561ad881755a021904c00fb5"),
"switch_index" : 0,
"port_index" : 1
},
"value" : {
"index" : 1,
"Switches" : {
"Ports" : [
{
"PortNumber" : 2,
"Status" : "Empty"
},
{
"PortNumber" : 5,
"Status" : "Used"
},
{
"PortNumber" : 7,
"Status" : "Used"
}
]
},
"Port" : {
"PortNumber" : 5,
"Status" : "Used"
},
"update" : {
"PortNumber" : "Switches.0.Ports.1.PortNumber",
"Status" : "Switches.0.Ports.1.Status"
}
}
}
You can then use the cursor from the db.switches.find() method to iterate over and update your collection accordingly:
var newStatus = "Empty";
var cur = db.switches.find({ "value.Port.PortNumber": 5 });
// Iterate through results and update using the update query object set dynamically by using the array-index syntax.
while (cur.hasNext()) {
var doc = cur.next();
var update = { "$set": {} };
// set the update query object
update["$set"][doc.value.update.Status] = newStatus;
db.centers.update(
{
"_id": doc._id._id,
"Switches.Ports.PortNumber": 5
},
update
);
};
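For a single document, the same dotted key can also be built without MapReduce by reading the document first and locating the indexes in client code. A minimal sketch, assuming the document shape from the question; buildStatusUpdate is a made-up helper name. The read and the write are two separate steps, so the indexes could shift in between if the arrays are modified concurrently.

```javascript
// Hypothetical helper: find the switch/port indexes in an already-fetched
// document and build the $set document with the array-index syntax.
function buildStatusUpdate(center, portNumber, newStatus) {
  var set = {};
  center.Switches.forEach(function (sw, i) {
    sw.Ports.forEach(function (port, j) {
      if (port.PortNumber === portNumber) {
        set['Switches.' + i + '.Ports.' + j + '.Status'] = newStatus;
      }
    });
  });
  // null signals that no port matched, so no update should be sent
  return Object.keys(set).length > 0 ? { $set: set } : null;
}

var doc = {
  Name: 'center1',
  Switches: [{ Ports: [
    { PortNumber: 2, Status: 'Empty' },
    { PortNumber: 5, Status: 'Used' },
    { PortNumber: 7, Status: 'Used' }
  ] }]
};
var statusUpdate = buildStatusUpdate(doc, 5, 'Empty');
console.log(JSON.stringify(statusUpdate));
// In the shell (not run here): db.centers.update({ _id: doc._id }, statusUpdate)
```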

MapReduce trouble with counting

I've got a problem. I have data in MongoDB which looks like this:
{"miejscowosci_str":"OneCity", "wojewodztwo":"FirstRegionName", "ZIP-Code" : "...", ...}
{"miejscowosci_str":"TwoCity", "wojewodztwo":"FirstRegionName", "ZIP-Code" : "...", ...}
{"miejscowosci_str":"ThreeCity", "wojewodztwo":"SecondRegionName", "ZIP-Code" : "...", ...}
{"miejscowosci_str":"FourCity", "wojewodztwo":"SecondRegionName", "ZIP-Code" : "...", ...}
and so on
What I want is to list all regions (wojewodztwo) and compute the average number of zip codes per city in each region. I know how to count all zip codes in a region:
var map = function() {
emit(this.wojewodztwo,1);
};
var reduce = function(key, val) {
var count = 0;
for(i in val) {
count += val[i];
}
return count;
};
db.kodypocztowe.mapReduce(
map,
reduce,
{ out : "result" }
);
But I don't know how to count the number of cities (miejscowosci_str), so that I could divide the number of ZIP codes in a region by the number of cities in the same region.
One city can have multiple zip codes.
Do you have any ideas?
I'm making a couple of assumptions here :
cities can have multiple zip codes
zip codes are unique
you are not trying to get the answer to M101P week 5 questions !
Rather than just counting the cities in one go, why not emit each document's city and zip in the map phase and then reduce these to a list of zips and a list of unique cities in the reduce phase? Then you can use the finalize phase to calculate the averages.
Note: if the data set is large, you might want to consider using the aggregation framework instead. This is shown after the map/reduce example.
db.kodypocztowe.drop();
db.result.drop();
db.kodypocztowe.insert([
{"miejscowosci_str":"OneCity", "wojewodztwo":"FirstRegionName", "ZIP-Code" : "1"},
{"miejscowosci_str":"TwoCity", "wojewodztwo":"FirstRegionName", "ZIP-Code" : "2"},
{"miejscowosci_str":"ThreeCity", "wojewodztwo":"SecondRegionName", "ZIP-Code" : "3"},
{"miejscowosci_str":"FourCity", "wojewodztwo":"SecondRegionName", "ZIP-Code" : "4"},
{"miejscowosci_str":"FourCity", "wojewodztwo":"SecondRegionName", "ZIP-Code" : "5"},
]);
// map the data to { region : { cities : [name], zips : [code] } }
// Note : a city can have multiple zips but zips are assumed to be unique
// The emitted value has the same shape as the reduce output, because
// MongoDB may call reduce again on its own results (re-reduce) and
// skips reduce entirely for keys with a single value.
var map = function() {
    emit(this.wojewodztwo, {cities: [this.miejscowosci_str], zips: [this['ZIP-Code']]});
};
//
// merge the data to :
//
// {region : {cities: [], zips : []}}
//
// note : always add zips
// note : only add cities if they are not already there
//
var reduce = function(key, vals) {
    var res = {zips: [], cities: []};
    vals.forEach(function(val) {
        res.zips = res.zips.concat(val.zips);
        val.cities.forEach(function(city) {
            if (res.cities.indexOf(city) == -1) {
                res.cities.push(city);
            }
        });
    });
    return res;
};
//
// finalize the data to get the average number of zips per city in each region
var finalize = function(key, res) {
res.average = res.zips.length / res.cities.length;
delete res.cities;
delete res.zips;
return res;
}
print("==============");
print(" map/reduce")
print("==============");
db.kodypocztowe.mapReduce(
map,
reduce,
{ out : "result" , finalize:finalize}
);
db.result.find().pretty()
print("==============");
print(" aggregation")
print("==============");
db.kodypocztowe.aggregate( [
// get the number of zips / [region,city]
{ "$group" :
{
_id : {"region" : "$wojewodztwo", city : "$miejscowosci_str"},
zips:{$sum:1}
}
},
// get the number of cities per region and sum the number of zips
{ "$group" :
{
_id : "$_id.region" ,
cities:{$sum:1},
zips:{$sum:"$zips"},
}
},
// project the data into the same format that map/reduce generated
{ "$project" :
{
"value.average":{$divide: ["$zips","$cities"]}
}
}
]);
I hope that helps.
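As a sanity check, the same average (zips in a region divided by distinct cities in that region) can be computed in plain JavaScript on the sample rows. This is only a sketch for verifying the numbers, not a replacement for the server-side approaches; averageZipsPerCity is a made-up helper name.

```javascript
// Sample rows copied from the insert statement above.
var rows = [
  { miejscowosci_str: 'OneCity', wojewodztwo: 'FirstRegionName', 'ZIP-Code': '1' },
  { miejscowosci_str: 'TwoCity', wojewodztwo: 'FirstRegionName', 'ZIP-Code': '2' },
  { miejscowosci_str: 'ThreeCity', wojewodztwo: 'SecondRegionName', 'ZIP-Code': '3' },
  { miejscowosci_str: 'FourCity', wojewodztwo: 'SecondRegionName', 'ZIP-Code': '4' },
  { miejscowosci_str: 'FourCity', wojewodztwo: 'SecondRegionName', 'ZIP-Code': '5' }
];

// Returns { regionName: average number of zips per distinct city }.
function averageZipsPerCity(rows) {
  var regions = {};
  rows.forEach(function (row) {
    var name = row.wojewodztwo;
    if (!regions[name]) {
      regions[name] = { zips: 0, cities: {} };
    }
    regions[name].zips += 1;                           // every row is one zip
    regions[name].cities[row.miejscowosci_str] = true; // set of distinct cities
  });
  var averages = {};
  Object.keys(regions).forEach(function (name) {
    averages[name] = regions[name].zips / Object.keys(regions[name].cities).length;
  });
  return averages;
}

console.log(averageZipsPerCity(rows));
// FirstRegionName: 2 zips / 2 cities = 1, SecondRegionName: 3 zips / 2 cities = 1.5
```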

MongoDB fetch documents with sort by count

I have a document with sub-documents, which looks something like:
{
"name" : "some name1",
"like" : [
{ "date" : ISODate("2012-11-30T19:00:00Z") },
{ "date" : ISODate("2012-12-02T19:00:00Z") },
{ "date" : ISODate("2012-12-01T19:00:00Z") },
{ "date" : ISODate("2012-12-03T19:00:00Z") }
]
}
Is it possible to fetch the "most liked" documents (average value for the last 7 days) and sort by the count?
There are a few different ways to solve this problem. The solution I will focus on uses MongoDB's aggregation framework. First, here is an aggregation pipeline that will solve your problem; it is followed by a breakdown of what is happening in the command.
db.testagg.aggregate([
    { $unwind : '$likes' },
    { $group : { _id : '$_id', numlikes : { $sum : 1 }}},
    { $sort : { 'numlikes' : 1}}
])
This pipeline has 3 main stages:
1) Unwind: this splits up the 'likes' field so that there is 1 'likes' element per document.
2) Group: this regroups the documents using the _id field, incrementing the numlikes field for every document it finds. This causes numlikes to end up equal to the number of elements that were in 'likes' before.
3) Sort: finally, we sort the returned values in ascending order based on numlikes. In a test I ran, the output of this command is:
{"result" : [
{
"_id" : 1,
"numlikes" : 1
},
{
"_id" : 2,
"numlikes" : 2
},
{
"_id" : 3,
"numlikes" : 3
},
{
"_id" : 4,
"numlikes" : 4
}....
This is for data inserted via:
for (var i=0; i < 100; i++) {
db.testagg.insert({_id : i})
for (var j=0; j < i; j++) {
db.testagg.update({_id : i}, {'$push' : {'likes' : j}})
}
}
Note that this does not completely answer your question as it avoids the issue of picking the date range, but it should hopefully get you started and moving in the right direction.
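One way to add the date range is a $match stage between $unwind and $group. The sketch below assumes the schema from the question (an array field like whose entries carry a date), not the integer test data used above, and sorts descending so the most liked documents come first. recentLikesPipeline is a made-up helper name.

```javascript
// Hypothetical helper: count likes from the last `days` days, most liked first.
// Assumes an array field `like` of { date: <Date> } entries, as in the question.
function recentLikesPipeline(now, days) {
  var since = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return [
    { $unwind: '$like' },
    { $match: { 'like.date': { $gte: since } } },   // keep only recent likes
    { $group: { _id: '$_id', numlikes: { $sum: 1 } } },
    { $sort: { numlikes: -1 } }
  ];
}

var pipeline = recentLikesPipeline(new Date('2012-12-04T00:00:00Z'), 7);
console.log(pipeline[1].$match['like.date'].$gte.toISOString());
// In the shell (not run here): db.testagg.aggregate(pipeline)
```

Documents with no likes in the window drop out of the result entirely, since $unwind and $match leave nothing to group.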
Of course, there are other ways to solve this problem. One solution might be to just do all of the sorting and manipulations client-side. This is just one method for getting the information you desire.
EDIT: If you find this somewhat tedious, there is a ticket to add a $size operator to the aggregation framework. I invite you to watch it, and potentially upvote it, to try to speed up the addition of this new operator if you are interested.
https://jira.mongodb.org/browse/SERVER-4899
A better solution would be to keep a count field that records how many likes this document has. While you can use aggregation to do this, the performance will likely not be very good. Having an index on the count field will make read operations fast, and you can use an atomic operation to increment the counter when inserting new likes.
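A minimal sketch of that counter idea: a single update that appends the like and increments the counter in the same operation. The field name likeCount and the helper addLikeUpdate are assumptions made for this example.

```javascript
// Hypothetical helper: one update document that appends a like and bumps a counter.
// `likeCount` is an assumed field name, not part of the original schema.
function addLikeUpdate(date) {
  return {
    $push: { like: { date: date } },
    $inc: { likeCount: 1 }
  };
}

var likeUpdate = addLikeUpdate(new Date('2012-12-03T19:00:00Z'));
console.log(JSON.stringify(likeUpdate));
// In the shell (not run here), with docId standing in for a real _id:
//   db.test.update({ _id: docId }, likeUpdate)
//   db.test.find().sort({ likeCount: -1 })
```

A plain counter covers all-time likes only; counting within a date window would still need the stored dates.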
From MongoDB v3.4 onwards, you can simplify the above aggregation query with $sortByCount:
> db.test.aggregate([
{ $unwind: "$like" },
{ $sortByCount: "$_id" }
]).pretty()
{ "_id" : ObjectId("5864edbfa4d3847e80147698"), "count" : 4 }
Also, as @ACE said, you can now use $size within a projection instead:
db.test.aggregate([
{ $project: { count: { $size : "$like" } } }
]);
{ "_id" : ObjectId("5864edbfa4d3847e80147698"), "count" : 4 }

Ordering for a todo list with MongoDB

I'm attempting to create my own todo list using JavaScript, Python and MongoDB. I'm getting stuck on how to handle the task ordering.
My current idea is to have an order field in each task document, and when the order changes on the client I would grab the task list from the db and reorder each task individually/sequentially. This seems awkward because large todo lists would mean a large number of queries. Is there a way to update a field in multiple documents sequentially?
I'm also looking for advice as to whether this is the best way to do this. I want to be able to maintain the todo list order but maybe I'm going about it the wrong way.
{
"_id" : ObjectId("50a658f2cace55034c68ce95"),
"order" : 1,
"title" : "task1",
"complete" : 0
}
{
"_id" : ObjectId("50a658fecace55034c68ce96"),
"order" : 2,
"title" : "task2",
"complete" : 1
}
{
"_id" : ObjectId("50a65907cace55034c68ce97"),
"order" : 3,
"title" : "task3",
"complete" : 1
}
{
"_id" : ObjectId("50a65911cace55034c68ce98"),
"order" : 4,
"title" : "task4",
"complete" : 0
}
{
"_id" : ObjectId("50a65919cace55034c68ce99"),
"order" : 5,
"title" : "task5",
"complete" : 0
}
Mongo is very fast with queries, so you should not be as concerned with performance as you would be with a full-featured relational database. If you want to be prudent, just create a todo list of 1k items and try it out; it should be pretty much instant.
for (var i = 0; i < orderedListOfIds.length; i++)
{
db.collection.update({ '_id': orderedListOfIds[i] }, { $set: { order:i } })
}
then
db.collection.find( { } ).sort( { order: 1 } )
Yes, mongo allows for updating multiple documents. Just use a modifier operation and multi=True. For example, this increments order by one for all documents with order greater than five:
todos.update({'order':{'$gt':5}}, {'$inc':{'order':1}}, multi=True)
As to the best way, usually it's better to use a "natural" ordering (by name, date, priority etc) rather than create a fake field just for that.
I'm doing something similar. I added a field ind to my list items. Here's how I move a list item to a new location:
moveItem: function (sourceIndex, targetIndex) {
    var id = Items.findOne({ind: sourceIndex})._id;
    var movinUp = targetIndex > sourceIndex;
    // declared with var so these do not leak into the global scope
    var shift = movinUp ? -1 : 1;
    var lowerIndex = Math.min(sourceIndex, targetIndex);
    lowerIndex += movinUp ? 1 : 0;
    var upperIndex = Math.max(sourceIndex, targetIndex);
    upperIndex -= movinUp ? 0 : 1;
    console.log("Shifting items from "+lowerIndex+" to "+upperIndex+" by "+shift+".");
    // shift everything between the two positions, then place the moved item
    Items.update({ind: {$gte: lowerIndex, $lte: upperIndex}}, {$inc: {ind: shift}}, {multi: true});
    Items.update(id, {$set: {ind: targetIndex}});
}
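The index arithmetic is easier to follow on a plain array. The sketch below replays the same logic in memory, with the two updates replaced by loops; the item titles are made up, and only the ind field mirrors the code above.

```javascript
// In-memory version of the moveItem logic: items is an array of { title, ind }.
function moveItem(items, sourceIndex, targetIndex) {
  var moved = items.filter(function (it) { return it.ind === sourceIndex; })[0];
  var movingUp = targetIndex > sourceIndex;
  var shift = movingUp ? -1 : 1;
  var lower = Math.min(sourceIndex, targetIndex) + (movingUp ? 1 : 0);
  var upper = Math.max(sourceIndex, targetIndex) - (movingUp ? 0 : 1);
  items.forEach(function (it) {
    if (it !== moved && it.ind >= lower && it.ind <= upper) {
      it.ind += shift; // same effect as the multi-update with $inc
    }
  });
  moved.ind = targetIndex; // same effect as the final $set
  return items;
}

// Helper for display: titles in ind order.
function titlesInOrder(items) {
  return items.slice()
    .sort(function (x, y) { return x.ind - y.ind; })
    .map(function (it) { return it.title; })
    .join(',');
}

var items = [
  { title: 'a', ind: 0 }, { title: 'b', ind: 1 },
  { title: 'c', ind: 2 }, { title: 'd', ind: 3 }
];
moveItem(items, 0, 2);
console.log(titlesInOrder(items)); // b,c,a,d
```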
If you're using native promises (ES6) in Mongoose (mongoose.Promise = global.Promise), you can do the following to batch:
// handler arguments arrive in the order (req, res, next)
function batchUpdate(req, res, next){
    let ids = req.body.ids
    let items = []
    for(let i = 0; i < ids.length; i++)
        items.push(db.collection.findOneAndUpdate({ _id:ids[i] }, { $set: { order:i } }))
    Promise.all(items)
        .then(() => res.status(200).send())
        .catch(next)
}
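The same batch can usually be sent in one round trip with bulkWrite, which, to my knowledge, both Node.js driver collections and Mongoose models provide. The sketch only builds the operation list; buildOrderOps and the string ids are made up for illustration.

```javascript
// Hypothetical helper: one updateOne operation per id, using its position as order.
function buildOrderOps(ids) {
  return ids.map(function (id, i) {
    return {
      updateOne: {
        filter: { _id: id },
        update: { $set: { order: i } }
      }
    };
  });
}

var ops = buildOrderOps(['id-a', 'id-b', 'id-c']);
console.log(JSON.stringify(ops[1]));
// With a driver collection or a Mongoose model (not run here):
//   collection.bulkWrite(ops).then(() => res.status(200).send()).catch(next)
```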

How can I find elements of a MongoDB collection that are taking up a large amount of space?

If I have a collection with thousands of elements, is there a way I can easily find which elements are taking up the most space (in terms of MB)?
There's no built-in query for this. You have to iterate over the collection, gather the size of each document, and sort afterwards. Here's how it'd work:
var cursor = db.coll.find();
var doc_size = {};
cursor.forEach(function (x) {
var size = Object.bsonsize(x);
doc_size[x._id] = size;
});
At this point you'll have a hashmap with document ids as keys and their sizes as values.
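The sorting step can be sketched as below. Object keys are always strings, so the ids come back in their string form; largestDocs and the sample ids and sizes are made up for illustration.

```javascript
// Hypothetical helper: turn the { id: size } map into the n largest entries.
function largestDocs(docSize, n) {
  return Object.keys(docSize)
    .map(function (id) { return { id: id, size: docSize[id] }; })
    .sort(function (a, b) { return b.size - a.size; }) // largest first
    .slice(0, n);
}

var sizes = { a: 61268, b: 1156115, c: 483614 }; // made-up ids, sizes in bytes
console.log(largestDocs(sizes, 2));
```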
Note that with this approach you will be fetching the entire collection over the wire. An alternative is to use MapReduce and do this server-side (inside mongo):
> function mapper() {emit(this._id, Object.bsonsize(this));}
> function reducer(obj, size_in_b) { return { id : obj, size : size_in_b}; }
>
> var results = db.coll.mapReduce(mapper, reducer, {out : {inline : 1 }}).results
> results.sort(function(r1, r2) { return r2.value - r1.value; })
inline:1 tells mongo not to create a temporary collection for the results; everything will be kept in RAM.
And a sample output from one of my collections:
[
{
"_id" : ObjectId("4ce9339942a812be22560634"),
"value" : 1156115
},
{
"_id" : ObjectId("4ce9340442a812be24560634"),
"value" : 913413
},
{
"_id" : ObjectId("4ce9340642a812be26560634"),
"value" : 866833
},
{
"_id" : ObjectId("4ce9340842a812be28560634"),
"value" : 483614
},
...
{
"_id" : ObjectId("4ce9340742a812be27560634"),
"value" : 61268
}
]
>
Figured this out! I did this in two steps using Object.bsonsize():
db.myCollection.find().forEach(function(myObject) {
    // store each document's id together with its BSON size in bytes
    db.objectSizes.save({object_id: myObject._id, size: Object.bsonsize(myObject)});
});
db.objectSizes.find().sort({size: -1}).limit(5).pretty();