MongoDB aggregation and projection issue

Hello, helpful people of StackOverflow!
I'm in the process of learning how to work with MongoDB, and am currently stuck on one particular problem.
I'm building a guitar tabs app, working only with an "artist" base document. All other data are subdocuments. Depending on the functionality being accessed (e.g. search, list tabs by artist, view a single tab), I aggregate and project my documents accordingly.
However, I can't get one projection to work as I want.
Given the following data:
{
"artist" : "Jeff Buckley",
"songs" : [
{
"name" : "Grace",
"tabs" : [
{
"version" : 1,
"tab" : "...",
"tuning" : "DADGBe"
},
{
"version" : 2,
"tab" : "...",
"tuning" : "DADGBe"
}
]
},
{
"name" : "Last Goodbye",
"tabs" : [
{
"version" : 1,
"tab" : "...",
"tuning" : "DGDGBD"
},
{
"version" : 2,
"tab" : "...",
"tuning" : "EADGBe"
}
]
}
]
}
I want to aggregate it the following way for a list view:
{
"artist" : "Jeff Buckley",
"tabs" : [
{
"song" : "Grace",
"version" : 1
},
{
"song" : "Grace",
"version" : 2
},
{
"song" : "Last Goodbye",
"version" : 1
},
{
"song" : "Last Goodbye",
"version" : 2
}
]
}
I tried it with the following projection:
db.tabs.aggregate(
[
{
$project : {
artist : 1,
"tabs.song" : "$songs.name",
"tabs.version" : "$songs.tabs.version"
}
}
]
)
But instead I got:
{
"artist" : "Jeff Buckley",
"tabs" : {
"version" : [[2,1],[2,1]],
"song" : ["Grace","Last Goodbye"]
}
}
Can anyone point me in the right direction?
Thanks!

Your aggregation query is not correct: $project only reshapes the fields of each document, it does not flatten nested arrays.
Unwind both arrays first, then group the results back by artist, like this:
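db.tabs.aggregate([
    // one document per song
    {$unwind : "$songs"},
    // one document per tab of each song
    {$unwind : "$songs.tabs"},
    // collect the song/version pairs back into one array per artist
    {$group : {
        _id : "$artist",
        tabs : {$push : {song : "$songs.name", version : "$songs.tabs.version"}}
    }},
    // rename _id back to artist
    {$project : {
        tabs : "$tabs",
        artist : "$_id",
        _id : 0
    }}
]).pretty()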
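To make it easier to see what the two $unwind stages followed by $group/$push are doing, here is a minimal in-memory sketch in plain JavaScript (no database involved). The helper name listTabs is made up for illustration, and the sample document is the one from the question with the tab bodies left out. Note that it handles a single document, whereas the $group stage above would also merge several documents that share the same artist.

```javascript
// Sample document from the question (the "tab" bodies are omitted).
const artistDoc = {
  artist: "Jeff Buckley",
  songs: [
    { name: "Grace", tabs: [{ version: 1, tuning: "DADGBe" }, { version: 2, tuning: "DADGBe" }] },
    { name: "Last Goodbye", tabs: [{ version: 1, tuning: "DGDGBD" }, { version: 2, tuning: "EADGBe" }] },
  ],
};

// Hypothetical helper: flattens songs -> tabs into one list of
// { song, version } pairs, which is what unwinding both arrays and
// pushing the pairs back into a single array amounts to.
function listTabs(doc) {
  const tabs = doc.songs.flatMap((song) =>
    song.tabs.map((tab) => ({ song: song.name, version: tab.version }))
  );
  return { artist: doc.artist, tabs };
}

console.log(JSON.stringify(listTabs(artistDoc), null, 2));
```

The result has the same shape as the list view asked for in the question: one entry per song/version pair under a single "tabs" array.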

Related

weird behaviour mongo aggregation framework

I have a collection called accountHolding that contains records like the following:
{ "_id" : ObjectId("57cfbb09e4b024be2f1bce57"),
"_class" : "com.commercestudio.domain.AccountHolding",
"accountId" : "5732933ae4b0b709443b0d1e",
"companyId" : "57223d6de4b06c4ef00415b5",
"brokerageAccountId" : "5KC05007",
"symbol" : "AGG",
"quantity" : 1.0,
"pricePaid" : 112.55,
"processDate" : ISODate("2016-09-06T00:00:00.000Z"),
"recordDate" : ISODate("2016-09-06T00:00:00.000Z"),
"createdOn" : ISODate("2016-09-07T07:00:25.479Z")
}
{ "_id" : ObjectId("57cfbb09e4b024be2f1bce5b"),
"_class" : "com.commercestudio.domain.AccountHolding",
"accountId" : "5732933ae4b0b709443b0d1e",
"companyId" : "57223d6de4b06c4ef00415b5",
"brokerageAccountId" : "5KC05007",
"symbol" : "LQD",
"quantity" : 4.0,
"pricePaid" : 123.78,
"processDate" : ISODate("2016-09-06T00:00:00.000Z"),
"recordDate" : ISODate("2016-09-06T00:00:00.000Z"),
"createdOn" : ISODate("2016-09-07T07:00:25.498Z")
}
.....
Now I use the aggregation framework to find the data with the latest record date for a particular accountId:
db.accountHolding.aggregate(
[
{
"$match": {
"accountId": "5834caf32ae7bacc527ef2f3",
"symbol": {
"$in": [
"IUSG",
"VEA",
"IEMG",
"SCHX",
"VBR",
"IUSV",
"VOE"
]
}
}
},
{
"$group": {
"_id": "$symbol",
"recordDate": {
"$last": "$recordDate"
},
"quantity": {
"$last": "$quantity"
},
"pricePaid": {
"$last": "$pricePaid"
}
}
}
])
It returns different results in two different environments.
On my development environment it shows:
{
"_id" : "VEA",
"recordDate" : ISODate("2018-03-02T00:00:00.000Z"),
"quantity" : 22.79609,
"pricePaid" : 44.14
}
{ "_id" : "IUSG",
"recordDate" : ISODate("2018-03-02T00:00:00.000Z"),
"quantity" : 8.87831,
"pricePaid" : 55.79
}
and on the production environment it shows:
{
"_id" : "VEA",
"recordDate" : ISODate("2018-02-26T00:00:00Z"),
"quantity" : 22.79609,
"pricePaid" : 45.76
}
{
"_id" : "IUSG",
"recordDate" : ISODate("2018-02-26T00:00:00Z"),
"quantity" : 8.87831,
"pricePaid" : 57.47
}
I can't work out why this weird behaviour occurs, as both environments have the same data.
My database server is deployed on an AWS instance.
Can someone help me find the root cause and a solution?
This is expected behavior.
From the docs for $last:
Returns the value that results from applying an expression to the last
document in a group of documents that share the same group by a field.
Only meaningful when documents are in a defined order.
Add a $sort stage before the $group stage so that the documents arrive in a defined order:
{$sort:{recordDate:1}}
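To illustrate why the order matters, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code: lastPerSymbol is a made-up stand-in for a $group stage that uses $last, and the two sample rows simply reuse the VEA values from the outputs in the question, with dates written as ISO strings.

```javascript
// Hypothetical stand-in for {$group: {_id: "$symbol", ...: {$last: ...}}}:
// later documents overwrite earlier ones, so the "last" one wins.
function lastPerSymbol(docs) {
  const groups = new Map();
  for (const doc of docs) {
    groups.set(doc.symbol, {
      _id: doc.symbol,
      recordDate: doc.recordDate,
      pricePaid: doc.pricePaid,
    });
  }
  return [...groups.values()];
}

// Made-up sample rows (values taken from the two outputs in the question).
const holdings = [
  { symbol: "VEA", recordDate: "2018-03-02", pricePaid: 44.14 },
  { symbol: "VEA", recordDate: "2018-02-26", pricePaid: 45.76 },
];

// Without a sort, the result depends on the order the documents arrive in.
const unsorted = lastPerSymbol(holdings);                // picks 2018-02-26
const reversed = lastPerSymbol([...holdings].reverse()); // picks 2018-03-02

// Sorting by recordDate first (the role of the $sort stage) makes the
// result the same regardless of the original order.
const byDate = (a, b) => a.recordDate.localeCompare(b.recordDate);
const sorted = lastPerSymbol([...holdings].sort(byDate)); // picks 2018-03-02

console.log(unsorted[0].recordDate, reversed[0].recordDate, sorted[0].recordDate);
```

In the original pipeline, the $sort stage would go between $match and $group. Two servers holding the same data can still return documents in a different natural order, which would explain the different results.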

Get document based on multiple criteria of embedded collection

I have the following document, and I need to search for multiple items in the embedded collection "items".
Here's an example of a single SKU
db.sku.findOne()
{
"_id" : NumberLong(1192),
"description" : "Uploaded via CSV",
"items" : [
{
"_id" : NumberLong(2),
"category" : DBRef("category", NumberLong(1)),
"description" : "840 tag visual",
"name" : "840 Visual Mini Round",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(7),
"category" : DBRef("category", NumberLong(2)),
"description" : "Maxi",
"name" : "Maxi",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(11),
"category" : DBRef("category", NumberLong(3)),
"description" : "Button",
"name" : "Button",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(16),
"category" : DBRef("category", NumberLong(4)),
"customizationFields" : [
{
"_class" : "CustomizationField",
"_id" : NumberLong(1),
"displayText" : "Custom Print 1",
"fieldName" : "customPrint1",
"listOrder" : 1,
"maxInputLength" : 12,
"required" : false,
"version" : NumberLong(0)
},
{
"_class" : "CustomizationField",
"_id" : NumberLong(2),
"displayText" : "Custom Print 2",
"fieldName" : "customPrint2",
"listOrder" : 2,
"maxInputLength" : 17,
"required" : false,
"version" : NumberLong(0)
}
],
"description" : "2 custom lines of farm print",
"name" : "Custom 2",
"version" : NumberLong(2)
},
{
"_id" : NumberLong(20),
"category" : DBRef("category", NumberLong(5)),
"description" : "Color Red",
"name" : "Red",
"version" : NumberLong(0)
}
],
"skuCode" : "NF-USDA-XC2/SM-BC-R",
"version" : 0,
"webCowOptions" : "840miniwithcust2"
}
The same item _id values repeat across the embedded collections. Each Sku is made up of multiple items; every combination is unique, but one item will be part of many Skus.
I'm struggling with the query structure to get what I'm looking for.
Here are a few things I have tried:
db.sku.find({'items._id':2},{'items._id':7})
That one only returns items with the id of 7
db.sku.find({items:{$all:[{_id:5}]}})
That one doesn't return anything, but it came up when I was looking for solutions. I found out about it in the MongoDB manual.
Here's an example of an expected result:
sku:{ "_id" : NumberLong(1013),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(2) } ] },
sku:
{ "_id" : NumberLong(1014),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(2) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(24) } ] },
sku:
{ "_id" : NumberLong(1015),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(2) },
{ "_id" :NumberLong(5) } ] }
Each Sku that comes back has both an item with id 7 and an item with id 2, along with any other items it has.
To further clarify, my purpose is to determine how many remaining combinations exist after entering the first couple of items.
Basically a customer will start specifying items, and we'll weed it down to the remaining valid combinations. So Sku.items[0].id=5 can only be combined with items[1].id=7 or items[1].id=10 …. Then items[1].id=7 can only be combined with items[2].id=20 … and so forth
The goal was to simplify my rules for purchase, and drive it all from the Sku codes. I don't know if I dug a deeper hole instead.
Thank you,
To extract the SKUs that contain both item 2 and item 7, you have to use $elemMatch inside $all, if I recall correctly:
db.sku.find({'items' :{ '$all' :[{ '$elemMatch':{ '_id' : 2 }},{'$elemMatch': { '_id' : 7 }}]}} )
This selects every SKU that has an item with _id 2 as well as an item with _id 7.
You can also use an aggregation pipeline:
db.sku.aggregate([
{"$unwind": "$items"},
{"$group": {"_id": "$_id", "items": {"$addToSet": {"_id": "$items._id"}}}},
{"$match": {"items._id": {"$all": [2, 7]}}}
])
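The condition both answers are expressing ("the SKU has an item for every requested _id") can be sketched in plain JavaScript as below. This is an in-memory illustration only: skusWithAllItems is a hypothetical helper, and the SKUs are made up and reduced to their item ids. As far as I know, the same condition can also be written directly as db.sku.find({"items._id": {$all: [2, 7]}}), but treat that as an untested suggestion.

```javascript
// Made-up SKUs, reduced to the fields that matter for the filter.
const skus = [
  { _id: 1013, items: [{ _id: 5 }, { _id: 7 }, { _id: 12 }, { _id: 16 }, { _id: 2 }] },
  { _id: 1192, items: [{ _id: 2 }, { _id: 7 }, { _id: 11 }, { _id: 16 }, { _id: 20 }] },
  { _id: 2000, items: [{ _id: 2 }, { _id: 9 }] }, // has item 2 but not item 7
];

// Hypothetical helper: keep the SKUs whose items include every requested _id.
function skusWithAllItems(allSkus, requiredIds) {
  return allSkus.filter((sku) =>
    requiredIds.every((id) => sku.items.some((item) => item._id === id))
  );
}

// SKUs still possible after the customer has picked items 2 and 7.
const remaining = skusWithAllItems(skus, [2, 7]);
console.log(remaining.map((sku) => sku._id)); // SKUs 1013 and 1192
```

Each additional item the customer picks just extends the list of required ids, so the set of remaining combinations shrinks step by step.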

view of query in mongodb

I have a collection; this is one of its docs:
{
"_id" : 1 ,
"first_name" : "john",
"phone" : [
{
"type" : "mobile",
"number" : "9151112233"
},
{
"type" : "home",
"city_code" : 51,
"number" : "55251544"
},
{
"type" : "mobile",
"number" : "9152425125"
}
]
}
I'm searching for the "phone" entries whose type is "mobile", and I want to show only those.
I need something like this:
{
"number" : "9151112233",
"number" : "9152425125"
}
I wrote this query for that:
db.main.find({ _id : 1 , 'phone.type' : "mobile" },{'phone.number' : true , _id : false}).forEach(printjson)
I want to show only the numbers whose type is "mobile", but this query shows all the numbers, because the matching document contains the other entries too.
How can I fix it?
I'd use the aggregation framework with the $unwind, $match and $project stages. This:
db.main.aggregate({$unwind:"$phone"},{$match:{"phone.type":"mobile"}},{$project:{"phone.number":1,"_id":0}})
produces this output:
{ "phone" : { "number" : "9151112233" } }
{ "phone" : { "number" : "9152425125" } }
which only matches the mobile numbers.
http://docs.mongodb.org/manual/aggregation/
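For comparison, here is a sketch of the same selection done per document, without unwinding. The first part is plain JavaScript working on the sample document from the question (numbersOfType is a made-up helper). The second part is a pipeline built on $filter and $map, which to my knowledge are available from MongoDB 3.2 on; I have not run it against the asker's data, so treat it as an assumption rather than a drop-in replacement.

```javascript
// Sample document from the question.
const person = {
  _id: 1,
  first_name: "john",
  phone: [
    { type: "mobile", number: "9151112233" },
    { type: "home", city_code: 51, number: "55251544" },
    { type: "mobile", number: "9152425125" },
  ],
};

// Hypothetical helper: keep only the entries of one type, then take the number.
function numbersOfType(doc, type) {
  return doc.phone
    .filter((entry) => entry.type === type)
    .map((entry) => entry.number);
}

console.log(numbersOfType(person, "mobile"));

// Assumed server-side equivalent (needs $filter/$map support); it could be
// run as db.main.aggregate(mobileNumbersPipeline).
const mobileNumbersPipeline = [
  { $match: { _id: 1 } },
  {
    $project: {
      _id: 0,
      numbers: {
        $map: {
          input: {
            $filter: {
              input: "$phone",
              as: "p",
              cond: { $eq: ["$$p.type", "mobile"] },
            },
          },
          as: "p",
          in: "$$p.number",
        },
      },
    },
  },
];
```

Unlike the $unwind version, this keeps one result per person, with the mobile numbers collected in a single array.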

MongoDB MapReduce--is there an Aggregation alternative?

I've got a collection with documents using a schema something like this (some members redacted):
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
2,
3,
5
],
"activity" : [
4,
4,
3
],
}
},
"media" : [
ObjectId("537ea185df872bb71e4df270"),
ObjectId("537ea185df872bb71e4df275"),
ObjectId("537ea185df872bb71e4df272")
]
}
In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like to, however, accomplish this with the Aggregation Pipeline.
Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously so that I end up with, for example, the following three documents based on the previous example:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 2,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df270")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 3,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df275")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 5,
"activity" : 3
}
},
"media" : ObjectId("537ea185df872bb71e4df272")
}
]
Is there some way to accomplish this?
The current aggregation framework does not allow you to do this. Being able to unwind multiple arrays that are known to be the same size, creating one document for the ith value of each, would be a good feature.
If you want to use the aggregation framework you will need to change your schema a little. For example take the following document schema:
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
{k:1, v:2},
{k:2, v:3},
{k:3, v:5}
],
"activity" : [
{k:1, v:4},
{k:2, v:4},
{k:3, v:3}
],
}},
"media" : [
{k:1, v:ObjectId("537ea185df872bb71e4df270")},
{k:2, v:ObjectId("537ea185df872bb71e4df275")},
{k:3, v:ObjectId("537ea185df872bb71e4df272")}
]
}
By doing this you are essentially adding the index to the object inside the array. After this it's just a matter of unwinding all the arrays and matching on the key.
db.test.aggregate([{$unwind:"$media"},
{$unwind:"$answers.ratings.positivity"},
{$unwind:"$answers.ratings.activity"},
{$project:{"media":1, "answers.ratings.positivity":1,"answers.ratings.activity":1,
include:{$and:[
{$eq:["$media.k", "$answers.ratings.positivity.k"]},
{$eq:["$media.k", "$answers.ratings.activity.k"]}
]}}
},
{$match:{include:true}}])
And the output is:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 1,
"v" : 2
},
"activity" : {
"k" : 1,
"v" : 4
}
}
},
"media" : {
"k" : 1,
"v" : ObjectId("537ea185df872bb71e4df270")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 2,
"v" : 3
},
"activity" : {
"k" : 2,
"v" : 4
}
}
},
"media" : {
"k" : 2,
"v" : ObjectId("537ea185df872bb71e4df275")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 3,
"v" : 5
},
"activity" : {
"k" : 3,
"v" : 3
}
}
},
"media" : {
"k" : 3,
"v" : ObjectId("537ea185df872bb71e4df272")
},
"include" : true
}
]
Doing this creates a lot of extra document overhead and may be slower than your current MapReduce implementation; you would need to run tests to check this. Each $unwind multiplies the number of documents, so three arrays of length n produce n³ intermediate documents before the $match discards all but n of them. Keep this cubic growth in mind.
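For reference, the "unwind all three arrays together" operation the question asks for is just a zip by array position. The plain JavaScript sketch below shows it on the sample document (ObjectIds replaced by their hex strings, unwindTogether is a made-up helper). The pipeline that follows it is an untested assumption on my part: newer MongoDB versions (3.2 and later, as far as I know) let $unwind record the array position via includeArrayIndex, and $arrayElemAt can then pick the matching ratings without changing the schema.

```javascript
// Sample document from the question, with ObjectIds written as strings.
const response = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: [
    "537ea185df872bb71e4df270",
    "537ea185df872bb71e4df275",
    "537ea185df872bb71e4df272",
  ],
};

// Hypothetical helper: one output document per array position. This is
// linear in the array length, unlike the triple $unwind plus $match.
function unwindTogether(doc) {
  const { positivity, activity } = doc.answers.ratings;
  return doc.media.map((media, i) => ({
    _id: doc._id,
    answers: { ratings: { positivity: positivity[i], activity: activity[i] } },
    media,
  }));
}

console.log(JSON.stringify(unwindTogether(response), null, 2));

// Assumed pipeline for newer servers: unwind only "media", remember its
// position in "idx", and look up the ratings at that same position.
const zipPipeline = [
  { $unwind: { path: "$media", includeArrayIndex: "idx" } },
  {
    $project: {
      media: 1,
      positivity: { $arrayElemAt: ["$answers.ratings.positivity", "$idx"] },
      activity: { $arrayElemAt: ["$answers.ratings.activity", "$idx"] },
    },
  },
];
```

If that pipeline works on your server version, it would be run as db.test.aggregate(zipPipeline) and should avoid both the schema change and the cubic blow-up.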

MongoDB Aggregation Framework: Getting $unwind error when using $group

I have a document structure as follows:
{
"_id" : NumberLong("80000000012"),
[...]
"categories" : [{
"parent" : "MANUFACTURER",
"category" : "Chevrolet"
}, {
"parent" : "MISCELLANEOUS",
"category" : "Miscellaneous"
}],
[...]
}
I am trying to get a distinct list of all 'category' fields for each 'parent' field. I was trying to utilize the aggregation framework to do this with the following query:
db.posts_temp.aggregate(
{$unwind : '$categories'},
{$match : {'categories.parent' : 'MISCELLANEOUS'}},
{$project : {
'_id' : 0,
parent : '$categories.parent',
category : '$categories.category'
}
},
{
$group : {
_id : '$parent',
category : {$addToSet : '$category'}
}
}
);
Running this query returns the following error:
{
"errmsg" : "exception: $unwind: value at end of field path must be an array",
"code" : 15978,
"ok" : 0
}
This seems to be tied to the group portion of the query, because, when I remove it, the query runs correctly, but, obviously, the data is not where I want it to be.
I just tried executing the above aggregation query on my mongo instance. Here are my 3 documents each with a key of categories that has an array of two nested documents.
Here is my data:
{
"_id" : ObjectId("512d5252b748191fefbd4698"),
"categories" : [
{
"parent" : "MANUFACTURER",
"category" : "Chevrolet"
},
{
"parent" : "MISCELLANEOUS",
"category" : "Miscellaneous"
}
]
}
{
"_id" : ObjectId("512d535cb748191fefbd4699"),
"categories" : [
{
"parent" : "MANUFACTURER",
"category" : "Chevrolet"
},
{
"parent" : "MISCELLANEOUS",
"category" : "Pickup"
}
]
}
{
"_id" : ObjectId("512d536eb748191fefbd469a"),
"categories" : [
{
"parent" : "MANUFACTURER",
"category" : "Toyota"
},
{
"parent" : "MISCELLANEOUS",
"category" : "Miscellaneous"
}
]
}
Here is the aggregation query of yours that I ran:
db.posts_temp.aggregate( {$unwind:'$categories'} , {$match: {'categories.parent':'MISCELLANEOUS'}}, {$project:{'_id':0, parent: '$categories.parent', category:'$categories.category'}}, {$group:{_id:'$parent', category:{$addToSet:'$category'}}})
Here is the result:
{
"result" : [
{
"_id" : "MISCELLANEOUS",
"category" : [
"Pickup",
"Miscellaneous"
]
}
],
"ok" : 1
}
Let me know if there are any discrepancies between my data and yours.