MongoDB MapReduce--is there an Aggregation alternative? - mongodb

I've got a collection with documents using a schema something like this (some members redacted):
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
2,
3,
5
],
"activity" : [
4,
4,
3
],
},
"media" : [
ObjectId("537ea185df872bb71e4df270"),
ObjectId("537ea185df872bb71e4df275"),
ObjectId("537ea185df872bb71e4df272")
]
}
In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like to, however, accomplish this with the Aggregation Pipeline.
Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously so that I end up with, for example, the following three documents based on the previous example:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 2,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df270")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 3
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df275")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 5
"activity" : 3
}
},
"media" : ObjectId("537ea185df872bb71e4df272")
}
]
Is there some way to accomplish this?

The current aggregation framework does not allow you to do this. Being able to unwind multiple arrays that are know to be the same size and creating a document for the ith value of each would be a good feature.
If you want to use the aggregation framework you will need to change your schema a little. For example take the following document schema:
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
{k:1, v:2},
{k:2, v:3},
{k:3, v:5}
],
"activity" : [
{k:1, v:4},
{k:2, v:4},
{k:3, v:3}
],
}},
"media" : [
{k:1, v:ObjectId("537ea185df872bb71e4df270")},
{k:2, v:ObjectId("537ea185df872bb71e4df275")},
{k:3, v:ObjectId("537ea185df872bb71e4df272")}
]
}
By doing this you are essentially adding the index to the object inside the array. After this it's just a matter of unwinding all the arrays and matching on the key.
db.test.aggregate([{$unwind:"$media"},
{$unwind:"$answers.ratings.positivity"},
{$unwind:"$answers.ratings.activity"},
{$project:{"media":1, "answers.ratings.positivity":1,"answers.ratings.activity":1,
include:{$and:[
{$eq:["$media.k", "$answers.ratings.positivity.k"]},
{$eq:["$media.k", "$answers.ratings.activity.k"]}
]}}
},
{$match:{include:true}}])
And the output is:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 1,
"v" : 2
},
"activity" : {
"k" : 1,
"v" : 4
}
}
},
"media" : {
"k" : 1,
"v" : ObjectId("537ea185df872bb71e4df270")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 2,
"v" : 3
},
"activity" : {
"k" : 2,
"v" : 4
}
}
},
"media" : {
"k" : 2,
"v" : ObjectId("537ea185df872bb71e4df275")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 3,
"v" : 5
},
"activity" : {
"k" : 3,
"v" : 3
}
}
},
"media" : {
"k" : 3,
"v" : ObjectId("537ea185df872bb71e4df272")
},
"include" : true
}
]
Doing this creates a lot of extra document overhead and may be slower than your current MapReduce implementation. You would need to run tests to check this. The computations required for this will grow in a cubic way based on the size of those three arrays. This should also be kept in mind.

Related

MongoDB Geospatial query within an array returns always the full doc

Have a collection in MongoDB that looks like this :
{
"_id" : "7613035010550",
"purchases" : [
{
"date" : ISODate("2017-04-15T14:15:00.000Z"),
"coords" : {
"lon" : 43.729604,
"lat" : 1.416017
},
"metar" : {},
"quantity" : 1,
"price" : 2.31
},
{
"date" : ISODate("2017-05-02T16:23:00.000Z"),
"coords" : {
"lon" : 43.722862,
"lat" : 1.415837
},
"metar" : {},
"quantity" : 6,
"price" : 12
},
{
"date" : ISODate("2017-05-02T18:32:00.000Z"),
"coords" : {
"lon" : 46.307353,
"lat" : 3.28937
},
"metar" : {},
"quantity" : 2,
"price" : 5
}
],
"rates" : [
{
"value" : 5
},
{
"value" : 4
},
{
"value" : 5
},
{
"value" : 2
}
]
}
And would like make a query that is abble to return only purchases done within a define radius (i.e 5 km) around a point and only for an id... But i don't know how to handle this kind of query.
Try this query :
db.getCollection('stats').find({"purchases.coords":{$geoWithin:{$centerSphere: [[43.688935, 1.401541], 25 / 6378.1]}}})
But returns the whole document... I would like to be abble to return something like an array of purchases made around the defined radius, i.e only those two in my exemple :
{
"date" : ISODate("2017-04-15T14:15:00.000Z"),
"coords" : {
"lon" : 43.729604,
"lat" : 1.416017
},
"metar" : {},
"quantity" : 1,
"price" : 2.31
},
{
"date" : ISODate("2017-05-02T16:23:00.000Z"),
"coords" : {
"lon" : 43.722862,
"lat" : 1.415837
},
"metar" : {},
"quantity" : 6,
"price" : 12
}
How can i achieve this kind of query... or... how to define my collection to be abble to make this kind of query ?
Thx,
JL
For Your purpose I'd recommend to use aggregation with stages project and unwind.
Can't check it right now but it should looks like this:
db.getCollection('stats').aggregate([
{'$match': {
"purchases.coords": {$geoWithin:{$centerSphere: [[43.688935, 1.401541], 25 / 6378.1]}}
}},
{'$project': {
"_id": 0, // 0 - if you don't need document id
"purchases": 1,
}},
{'$unwind': "$purchases"},
{'$match': {
"purchases.coords": {$geoWithin:{$centerSphere: [[43.688935, 1.401541], 25 / 6378.1]}}
}},
])
I've used 2 identical matches to:
Match all documents matched specified conditions.
Match all unwinded 'purchase' matched specified conditions.
You can use this aggregation without first match but it may be a bit slower.
You can see how it works If you comment all the stages and then uncomment one by one.

MongoDB Aggregation - return default value for documents that don't match query

I'm having trouble figuring out the right aggregation pipe operations to return the results I need.
I have a collection similar to the following :-
{
"_id" : "writer1",
"Name" : "writer1",
"Website" : "website1",
"Reviews" : [
{
"Film" : {
"Name" : "Jurassic Park",
"Genre" : "Action"
},
"Score" : 4
},
{
"Technology" : {
"Name" : "Mad Max",
"Genre" : "Action"
},
"Score" : 5
}
]
}
{
"_id" : "writer2",
"Name" : "writer2",
"Website" : "website1",
"Reviews" : [
{
"Technology" : {
"Name" : "Mad Max",
"Genre" : "Action"
},
"Score" : 5
}
]
}
And this is my aggregation so far : -
db.writers.aggregate([
{ "$unwind" : "$Reviews" },
{ "$match" : { "Reviews.Film.Name" : "Jurassic Park" } },
{ "$group" : { "_id" : "$Website" , "score" : { "$avg" : "$Reviews.Score" },
writers :{ $push: { name:"$Name", score:"$Reviews.Score" } }
}}
])
This returns only writers who have a review of the matching film and also only websites that have at least 1 writer who has reviewed the film,
however, I need to return all websites containing a list of their all writers, with a score of 0 if they haven't written a review for the specified film.
so, I am currently getting : -
{ "_id" : "website1", "score" : 4, "writers" : [ { "name" : "writer1", "score" : 4 } ] }
When I actually need : -
{ "_id" : "website1", "score" : 2, "writers" : [ { "name" : "writer1", "score" : 4 },{ "name" :"writer2", "score" : 0 } ] }
Can anyone point me in the right direction?
Cheers

Get document based on multiple criteria of embedded collection

I have the following document, I need to search for multiple items from the embedded collection"items".
Here's an example of a single SKU
db.sku.findOne()
{
"_id" : NumberLong(1192),
"description" : "Uploaded via CSV",
"items" : [
{
"_id" : NumberLong(2),
"category" : DBRef("category", NumberLong(1)),
"description" : "840 tag visual",
"name" : "840 Visual Mini Round",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(7),
"category" : DBRef("category", NumberLong(2)),
"description" : "Maxi",
"name" : "Maxi",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(11),
"category" : DBRef("category", NumberLong(3)),
"description" : "Button",
"name" : "Button",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(16),
"category" : DBRef("category", NumberLong(4)),
"customizationFields" : [
{
"_class" : "CustomizationField",
"_id" : NumberLong(1),
"displayText" : "Custom Print 1",
"fieldName" : "customPrint1",
"listOrder" : 1,
"maxInputLength" : 12,
"required" : false,
"version" : NumberLong(0)
},
{
"_class" : "CustomizationField",
"_id" : NumberLong(2),
"displayText" : "Custom Print 2",
"fieldName" : "customPrint2",
"listOrder" : 2,
"maxInputLength" : 17,
"required" : false,
"version" : NumberLong(0)
}
],
"description" : "2 custom lines of farm print",
"name" : "Custom 2",
"version" : NumberLong(2)
},
{
"_id" : NumberLong(20),
"category" : DBRef("category", NumberLong(5)),
"description" : "Color Red",
"name" : "Red",
"version" : NumberLong(0)
}
],
"skuCode" : "NF-USDA-XC2/SM-BC-R",
"version" : 0,
"webCowOptions" : "840miniwithcust2"
}
There are repeat items.id throughout the embedded collection. Each Sku is made up of multiple items, all combinations are unique, but one item will be part of many Skus.
I'm struggling with the query structure to get what I'm looking for.
Here are a few things I have tried:
db.sku.find({'items._id':2},{'items._id':7})
That one only returns items with the id of 7
db.sku.find({items:{$all:[{_id:5}]}})
That one doesn't return anything, but it came up when looking for solutions. I found about it in the MongoDB manual
Here's an example of a expected result:
sku:{ "_id" : NumberLong(1013),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(2) } ] },
sku:
{ "_id" : NumberLong(1014),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(2) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(24) } ] },
sku:
{ "_id" : NumberLong(1015),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(2) },
{ "_id" :NumberLong(5) } ] }
Each Sku that comes back has both a item of id:7, and id:2, with any other items they have.
To further clarify, my purpose is to determine how many remaining combinations exist after entering the first couple of items.
Basically a customer will start specifying items, and we'll weed it down to the remaining valid combinations. So Sku.items[0].id=5 can only be combined with items[1].id=7 or items[1].id=10 …. Then items[1].id=7 can only be combined with items[2].id=20 … and so forth
The goal was to simplify my rules for purchase, and drive it all from the Sku codes. I don't know if I dug a deeper hole instead.
Thank you,
On the part of extracting the sku with item IDs 2 and 7, when I recall correctly, you have to use $elemMatch:
db.sku.find({'items' :{ '$all' :[{ '$elemMatch':{ '_id' : 2 }},{'$elemMatch': { '_id' : 7 }}]}} )
which selects all sku where there is each an item with _id 2 and 7.
You can use aggregation pipelines
db.sku.aggregate([
{"$unwind": "$sku.items"},
{"$group": {"_id": "$_id", "items": {"$addToSet":{"_id": "$items._id"}}}},
{"$match": {"items._id": {$all:[2,7]}}}
])

mongodb Embedded document search on parent and child field

I have a nested embedded document CompanyProduct below is structure
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca3"),
"CompanyID" : 90449,
"Name" : Company1,
"CompanyDepartment" : [
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca4")
"DepartmentID" : 287,
"DepartmentName" : "Stores",
"DepartmentInventory" : [
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 1,
"ProductName" : "abc",
"Quantity" : 100
},
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 2,
"ProductName" : "xyz",
"Quantity" : 1
}
],
}
],
}
There can be N no of companies and each company can have N number of departments and each department can have N number of products.
I want to do a search to find out a particular product quantity under a particular company
I tried below query but it does not work. It returns all the products for the specific company, the less than 20 condition doesn't work.
db.CompanyProduct.find({$and : [{"CompanyDepartment.DepartmentInventory.Quantity":{$lt :20}},{"CompanyID":90449}]})
How should the query be?
You are searching from companyProduct's sub documents. So it will return you companyProduct whole document, it is NoSQL database , some how we do not need to normalize the collection , but your case it has to be normalize , like if you want to EDIT/DELETE any sub document and if there are thousand or millions of sub document then what will you do ... You need to make other collection with the name on CompanyDepartment and companyProduct collection should be
productCompany
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca3"),
"CompanyID" : 90449,
"Name" : Company1,
"CompanyDepartment" : ['53d213c5ddbb1912343a8ca4'],
}
and other collection companyDepartment
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca4")
"DepartmentID" : 287,
"DepartmentName" : "Stores",
"DepartmentInventory" : [
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 1,
"ProductName" : "abc",
"Quantity" : 100
},
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 2,
"ProductName" : "xyz",
"Quantity" : 1
}
],
}
after this you got array of companyDeparment' ID and only push and pull query will be used on productCompany
A Solution can be
db.YourCollection.aggregate([
{
$project:{
"CompanyDepartment.DepartmentInventory":1,
"CompanyID" : 1
}
},{
$unwind: "$CompanyDepartment"
},{
$unwind: "$CompanyDepartment.DepartmentInventory"
},{
$match:{$and : [{"CompanyDepartment.DepartmentInventory.Quantity":{$lt :20}},{"CompanyID":90449}]}
}
])
the result is
{
"result" : [
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca3"),
"CompanyID" : 90449,
"CompanyDepartment" : {
"DepartmentInventory" : {
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 2,
"ProductName" : "xyz",
"Quantity" : 1
}
}
}
],
"ok" : 1
}

MongoDB Update $push Cannot apply the positional operator without a corresponding query field containing an array

model:
{
"_id" : "a62107e10f388c90a3eb2d7634357c8b",
"_appid" : [
{
"_id" : "1815aaa7f581c838",
"events" : [
{
"_id" : "_TB_launch",
"boday" : [
{
"VERSIONSCODE" : "17",
"NETWORK" : "cmwap",
"VERSIONSNAME" : "2.4.0",
"IMSI" : "460026319223205",
"PACKAGENAME" : "com.androidbox.astjxmjmmshareMM",
"CHANNELID" : "xmjmm17",
"CHANNELNAME" : "浠..?.M寰.俊?.韩?.?1.x锛.,
"eventid" : "_TB_launch",
"uuid" : "a62107e10f388c90a3eb2d7634357c8b",
"creattime" : "1366300799766",
"ts" : ISODate("2013-04-25T06:28:36.403Z")
}
],
"size" : 1
}
],
"size" : 1
}
],
"size" : 1
}
> db.events.update(
{
"_id":"039e569770cec5ff3811e7410233ed27",
"_appid._id":"e880db04064b03bc534575c7f831a83a",
"_appid.events._id":"_TB_launch"
},
{
"$push":{
"_appid.$.events.$.boday":{"111":"123123"}
}
}
);
Cannot apply the positional operator without a corresponding query field containing an array.
Why?!!
You are trying to reference multiple levels of embedding - you can only have one positional $ operator. You won't be able to do something like this until this feature request has been implemented.
Response Here
The short answer is, "no", but working with nested arrays gets
tricky. Here is an example:
db.foo.save({_id: 1, a1:[{_a1id:1, a2:[{a2id:1, a3:[{a3id:1, a4:"data"}]}]}]})
db.foo.find()
{ "_id" : 1, "a1" : [
{ "_a1id" : 1, "a2" : [
{ "a2id" : 1, "a3" : [
{ "a3id" : 1, "a4" : "data" }
] }
] }
] }
db.foo.update({_id:1}, {$push:{"a1.0.a2.0.a3":{a3id:2, a4:"other data"}}})
db.foo.find()
{ "_id" : 1, "a1" : [
{ "_a1id" : 1, "a2" : [
{ "a2id" : 1, "a3" : [
{ "a3id" : 1, "a4" : "data" },
{ "a3id" : 2, "a4" : "other data" }
] }
] }
] }
If you are unsure where one of your sub-documents lies within an
array, you may use one positional operator, and Mongo will update the
first sub-document which matches. For example:
db.foo.update({_id:1, "a1.a2.a2id":1}, {$push:{"a1.0.a2.$.a3":{a3id:2, a4:"other data"}}})