MongoDB aggregation on same collection

I need to get a value from another document. Here are my sample documents:
[
{
"product_code": "0172745",
"condition_type": "ZPRE",
"price_group": "33",
"visibility": "brp, network",
"sales_org": "1010",
"distr_channel": "10",
"price_type": "dealer",
"price_market": "na",
"valid_from": "2019-07-01",
"valid_to": "9999-12-31",
"price_price_uom": "5.30",
"price_uom": "PC",
"price_sales_uom": "12.90",
"sales_uom": "PC",
"currency": "CAD",
"last_change_date": "2020-01-17T20:14:15"
},
{
"product_code": "0172745",
"condition_type": "ZPBA",
"price_group": "2R",
"visibility": "brp, network",
"sales_org": "1010",
"distr_channel": "10",
"price_type": "dealer",
"price_market": "na",
"valid_from": "2019-07-01",
"valid_to": "9999-12-31",
"price_price_uom": "5.30",
"price_uom": "PC",
"price_sales_uom": "5.300",
"sales_uom": "PC",
"currency": "CAD",
"last_change_date": "2020-01-17T20:14:15"
}
]
The output should be document 2 plus the price_sales_uom from document 1, aggregated on product_code and condition_type:
{
"product_code": "0172745",
"condition_type": "ZPBA",
"price_group": "2R",
"price_sales_uom": "12.90"
}

Please try the query below; it groups all documents by product_code:
db.collection.aggregate([
  {
    $group: {
      _id: '$product_code',
      product_code: { $first: '$product_code' },
      condition_type: { $last: '$condition_type' },
      price_group: { $last: '$price_group' },
      price_sales_uom: { $max: '$price_sales_uom' }
    }
  },
  { $project: { _id: 0 } }
])
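Note: As the question currently stands, this query takes the maximum value of price_sales_uom over the matched documents (if it has to come from the first document, use $first instead). Because price_sales_uom is stored as a string, $max compares the values as strings, so "5.300" ranks above "12.90"; for the sample data, use $first or convert to a number first (for example with $toDouble, available since MongoDB 4.0). Similarly, condition_type and price_group are taken from the last document in the group; for product_code it doesn't matter whether you use $first or $last, since the values are the same.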
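To check the accumulator behaviour without a server, here is a minimal plain-Python emulation of the $group stage above. It is a sketch, not PyMongo; `group_by_product` and `pick_price` are hypothetical names, and only the fields the accumulators touch are copied from the sample. It shows that $max on the string prices returns "5.300", while taking the first document, or specifically the ZPRE document, returns the desired "12.90".

```python
# Plain-Python emulation of the answer's $group stage (not PyMongo, no server).
# Only the fields the accumulators touch are copied from the sample documents.

docs = [
    {"product_code": "0172745", "condition_type": "ZPRE",
     "price_group": "33", "price_sales_uom": "12.90"},
    {"product_code": "0172745", "condition_type": "ZPBA",
     "price_group": "2R", "price_sales_uom": "5.300"},
]


def group_by_product(docs, pick_price):
    """Group by product_code; pick_price plays the role of the price accumulator."""
    groups = {}
    for d in docs:  # documents are consumed in input order
        groups.setdefault(d["product_code"], []).append(d)
    result = []
    for items in groups.values():
        result.append({
            "product_code": items[0]["product_code"],       # $first
            "condition_type": items[-1]["condition_type"],  # $last
            "price_group": items[-1]["price_group"],        # $last
            "price_sales_uom": pick_price(items),
        })
    return result


# $max on strings compares character by character: "5" > "1", so "5.300" wins.
with_max = group_by_product(docs, lambda xs: max(x["price_sales_uom"] for x in xs))
# $first takes the value from whichever document comes first in the group.
with_first = group_by_product(docs, lambda xs: xs[0]["price_sales_uom"])
# Selecting by condition_type does not depend on document order.
with_zpre = group_by_product(
    docs,
    lambda xs: next(x["price_sales_uom"] for x in xs if x["condition_type"] == "ZPRE"),
)

print(with_max[0]["price_sales_uom"])    # 5.300
print(with_first[0]["price_sales_uom"])  # 12.90
print(with_zpre[0]["price_sales_uom"])   # 12.90
```

In the pipeline itself, the order-independent variant could be written as price_sales_uom: { $max: { $cond: [ { $eq: ['$condition_type', 'ZPRE'] }, '$price_sales_uom', null ] } }, relying on $max ignoring null values; treat this as a suggestion to verify against your data.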

Related

How to retrieve each single array element from mongo pipeline?

Let's assume a sample document in MongoDB looks like this:
[
{
"_id": "1",
"attrib_1": "value_1",
"attrib_2": "value_2",
"months": {
"2": {
"month": "2",
"year": "2008",
"transactions": [
{
"field_1": "val_1",
"field_2": "val_2"
},
{
"field_1": "val_4",
"field_2": "val_5",
"field_3": "val_6"
}
]
},
"3": {
"month": "3",
"year": "2018",
"transactions": [
{
"field_1": "val_7",
"field_3": "val_9"
},
{
"field_1": "val_10",
"field_2": "val_11"
}
]
}
}
}
]
The desired output is something like this (shown only for months 2 and 3):
id | months | year | field_1 | field_2 | field_3
1 | 2 | 2008 | val_1 | val_2 |
1 | 2 | 2008 | val_4 | val_5 | val_6
1 | 3 | 2018 | val_7 | | val_9
1 | 3 | 2018 | val_10 | val_11 |
My attempt:
I tried something like this in PyMongo:
pipeline = [
{"$match": {}},  # placeholder: filter logic goes here
{
"$addFields": {
"latest": {
"$map": {
"input": {
"$objectToArray": "$months",
},
"as": "obj",
"in": {
"all_field_1" : {"$ifNull" : ["$$obj.v.transactions.field_1", [""]]},
"all_field_2": {"$ifNull" : ["$$obj.v.transactions.field_2", [""]]},
"all_field_3": {"$ifNull" : ["$$obj.v.transactions.field_3", [""]]},
"all_months" : {"$ifNull" : ["$$obj.v.month", ""]},
"all_years" : {"$ifNull" : ["$$obj.v.year", ""]},
}
}
}
}
},
{
"$project": {
"_id": 1,
"months": "$latest.all_months",
"year": "$latest.all_years",
"field_1": "$latest.all_field_1",
"field_2": "$latest.all_field_2",
"field_3": "$latest.all_field_3",
}
}
]
# and I executed it as
my_db.collection.aggregate(pipeline, allowDiskUse=True)
The above does return the data, but as lists. Is there an easy way to get one row per element in MongoDB itself?
The above returns the data like this:
id | months | year | field_1 | field_2 | field_3
1 | ["2", "3"] | ["2008", "2018"] | [["val_1", "val_4"], ["val_7", "val_10"]] | [["val_2", "val_5"], ["", "val_11"]] | [["", "val_6"], ["val_9", ""]]
I'd appreciate any input on this, including a better way to do it.
Thanks for your time.
My MongoDB version is 3.4.6 and I am using PyMongo as my driver. You can see the query in action at Mongo Playground.
It might be a bad idea to do all of this processing in an aggregation query; you could do it on the client side instead.
I have created a query, but it is lengthy and may cause performance issues on large data sets:
$objectToArray to convert the months object to an array
$unwind to deconstruct the months array
$unwind to deconstruct the transactions array, storing each element's position in an index field
$group by _id, year, month and index, and take the first object from transactions as fields
$project to shape the response if you want; this is optional (I have added it in the playground link)
my_db.collection.aggregate([
{"$match": {}},  # placeholder: filter logic goes here
{"$project": {"months": {"$objectToArray": "$months"}}},
{"$unwind": "$months"},
{
"$unwind": {
"path": "$months.v.transactions",
"includeArrayIndex": "index"
}
},
{
"$group": {
"_id": {
"_id": "$_id",
"year": "$months.v.year",
"month": "$months.v.month",
"index": "$index"
},
"fields": {"$first": "$months.v.transactions"}
}
}
], allowDiskUse=True)
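The answer suggests the flattening could also be done on the client side. As a minimal sketch of that alternative (assuming `doc` is the sample document as PyMongo would return it, a plain dict; `flatten` and `FIELDS` are hypothetical names), each transaction becomes one flat row, with missing fields filled with "" as the asker's $ifNull did:

```python
# Client-side flattening: one flat row per transaction (a sketch, not a pipeline).
# `doc` mirrors the sample document as PyMongo would return it (a plain dict).

doc = {
    "_id": "1",
    "months": {
        "2": {"month": "2", "year": "2008", "transactions": [
            {"field_1": "val_1", "field_2": "val_2"},
            {"field_1": "val_4", "field_2": "val_5", "field_3": "val_6"},
        ]},
        "3": {"month": "3", "year": "2018", "transactions": [
            {"field_1": "val_7", "field_3": "val_9"},
            {"field_1": "val_10", "field_2": "val_11"},
        ]},
    },
}

FIELDS = ("field_1", "field_2", "field_3")


def flatten(doc):
    """Yield one row per transaction, like $objectToArray followed by two $unwinds."""
    for month in doc.get("months", {}).values():  # one entry per month key
        for tx in month.get("transactions", []):  # one entry per transaction
            row = {"id": doc["_id"], "months": month["month"], "year": month["year"]}
            for field in FIELDS:
                row[field] = tx.get(field, "")  # missing field -> "" (like $ifNull)
            yield row


rows = list(flatten(doc))
for row in rows:
    print(row)
```

With a real cursor this would be something like rows = [r for d in my_db.collection.find(query) for r in flatten(d)], where query stands for your filter.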

Aggregation: group date in nested documents (nested object)

I have a MongoDB database with user information. When new user data is added, I run a duplicate check; for a duplicate entry, I do not create a new document but instead update the existing one by adding a nested node (under tracking) with the timestamp and some other information.
{
"_id": "5e95dee277dcc55e9d18bf1a",
"email": "test#test.com",
"tracking": [
{
"domain": "mydomain",
"subdomain": "",
"ip": "59.214.120.68",
"timestamp": "2020-03-21 20:06:12",
"externalID": "82"
},
{
"domain": "mydomain",
"subdomain": "",
"ip": "99.214.130.33",
"timestamp": "2020-03-26 18:43:01",
"externalID": "483"
},
{
"domain": "mydomain",
"subdomain": "",
"ip": "19.214.131.22",
"timestamp": "2020-03-26 18:48:42",
"externalID": "485"
}
]
}
Now I'm trying to aggregate the documents and group/count them by date. Is there a way to do this when each document has a different number of nodes under tracking?
You can do something like this.
db.collection.aggregate([
{
$unwind: {
path: "$tracking",
preserveNullAndEmptyArrays: true
}
}, {
$group: {
_id: "$tracking.timestamp",
count: {
$sum: 1
}
}
}
])
It would be much better to store datetimes as an actual date type rather than as string representations. But assuming you cannot for now, you can group by the date component without the time (which I believe is what you want) using $substr:
db.foo.aggregate([
{$unwind: "$tracking"}
,{$group: {"_id": {$substr: [ "$tracking.timestamp", 0, 10] } , n: {$sum:1} }}
]);
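To see what the $substr grouping computes, here is a small plain-Python emulation over the sample document (a sketch, not PyMongo; the function names are hypothetical). The second function parses the strings into real datetimes first, which is roughly what storing actual dates would buy you; both give the same per-day counts for this data.

```python
from collections import Counter
from datetime import datetime

# Emulation of {$unwind: "$tracking"} + {$group: {_id: {$substr: [ts, 0, 10]}, n: {$sum: 1}}}.
docs = [{
    "_id": "5e95dee277dcc55e9d18bf1a",
    "tracking": [
        {"timestamp": "2020-03-21 20:06:12"},
        {"timestamp": "2020-03-26 18:43:01"},
        {"timestamp": "2020-03-26 18:48:42"},
    ],
}]


def count_by_day_substr(docs):
    """Count tracking nodes per day using the first 10 characters of the string."""
    counts = Counter()
    for d in docs:
        for t in d.get("tracking", []):       # $unwind: one item per tracking node
            counts[t["timestamp"][0:10]] += 1  # $substr: first 10 characters
    return dict(counts)


def count_by_day_parsed(docs):
    """Same grouping, but parsing the string into a real datetime first."""
    counts = Counter()
    for d in docs:
        for t in d.get("tracking", []):
            ts = datetime.strptime(t["timestamp"], "%Y-%m-%d %H:%M:%S")
            counts[ts.date().isoformat()] += 1
    return dict(counts)


print(count_by_day_substr(docs))  # {'2020-03-21': 1, '2020-03-26': 2}
```

Documents without a tracking array contribute nothing here, which mirrors $unwind without preserveNullAndEmptyArrays.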

How to find a result and apply localization in MongoDB?

Given the following sample data:
db.cars.insertMany([
{
"category": "sedan",
"model": {
"manufacturer": {
"en": "Mercedes",
"ru": "Мерседес"
},
"number": "E320"
}
},
{
"category": "SUV",
"model": {
"manufacturer": {
"en": "Audi",
"ru": "Ауди"
},
"number": "Q7"
}
},
])
I can select a category by its name with the following query:
db.cars.find({'category': 'sedan'})
And if I want to map a given field, I can do the following:
db.cars.aggregate({$project: {'model.manufacturer': '$model.manufacturer.ru'}})
Combining the two, I get:
db.cars.aggregate([{$match: {'category': 'SUV'}}, {$project: {'model.manufacturer': '$model.manufacturer.ru'}}])
My question is: is this the right approach, and if so, how do I keep all the other fields without listing them in the aggregation query (like {'model': 1, ... })?
Use $addFields, which adds new fields to documents. $addFields outputs documents that contain all existing fields from the input documents and newly added fields.
The $addFields stage is equivalent to a $project stage that explicitly specifies all existing fields in the input documents and adds the new fields.
Starting in version 4.2, MongoDB adds a new aggregation pipeline stage $set that is an alias for $addFields.
https://docs.mongodb.com/manual/reference/operator/aggregation/addFields/
db.cars.aggregate([
{
$match: {
"category": "SUV"
}
},
{
$addFields: {
"model.manufacturer": "$model.manufacturer.ru"
}
}
])
or for MongoDB >= v4.2
db.cars.aggregate([
{
$match: {
"category": "SUV"
}
},
{
$set: {
"model.manufacturer": "$model.manufacturer.ru"
}
}
])
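As a rough illustration of why $addFields (or $set) keeps everything else, here is a plain-Python emulation of $match followed by $addFields with a dotted path (a sketch, not PyMongo; `add_fields` is a hypothetical helper that assumes the intermediate values are sub-documents). Only model.manufacturer is replaced; category and model.number survive untouched.

```python
import copy

# Emulation of $match + $addFields (or $set) on the cars sample; not PyMongo.
cars = [
    {"category": "sedan",
     "model": {"manufacturer": {"en": "Mercedes", "ru": "Мерседес"}, "number": "E320"}},
    {"category": "SUV",
     "model": {"manufacturer": {"en": "Audi", "ru": "Ауди"}, "number": "Q7"}},
]


def add_fields(doc, new_fields):
    """Copy doc and set each dotted path; every other field is kept as-is."""
    out = copy.deepcopy(doc)
    for path, compute in new_fields.items():
        *parents, leaf = path.split(".")
        target = out
        for key in parents:  # assumes intermediate values are sub-documents
            target = target.setdefault(key, {})
        target[leaf] = compute(doc)  # expressions read from the input document
    return out


result = [
    add_fields(c, {"model.manufacturer": lambda d: d["model"]["manufacturer"]["ru"]})
    for c in cars
    if c["category"] == "SUV"  # $match
]
print(result)
```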

How to use aggregation to pull information from a nested array on objects in mongoDB

I am trying to get the hang of aggregation in MongoDB. I am new at this and I think I got a bit lost.
I have a collection of users, each user has an array of social networks, and each item in this array is an object.
How do I get values only from the objects of the given social networks?
{
"user":{
"_id": "d10430c8-e59",
"username": "John",
"password": "f7wei93",
"location": "UK",
"gender": "Male",
"age": 26,
"socials": [
{
"type": "instagram",
"maleFollowers": 23000,
"femaleFollowers": 65000,
"posts": 5400,
"avgFollowerAge": 22
},
{
"type": "facebook",
"maleFollowers": 4000,
"femaleFollowers": 6700,
"posts": 330,
"avgFollowerAge": 25
},
{
"type": "snapchat",
"maleFollowers": 873,
"femaleFollowers": 1200,
"posts": 1200,
"avgFollowerAge": 21
}
]
}
}
I want to get the totalFollowersCount (maleFollowers + femaleFollowers) for each social network type given in an array. For example, ["instagram", "snapchat"] should give me the totalFollowersCount for only those two networks.
I started playing around with aggregation but didn't figure it out all the way.
First, we flatten the socials array with $unwind. Then we group by the user _id and the social type and sum maleFollowers + femaleFollowers.
With the $match stage, we keep only the desired social networks.
db.users.aggregate([
{
$unwind: "$user.socials"
},
{
$group: {
_id: {
user: "$user._id",
type: "$user.socials.type"
},
"totalFollowersCount": {
$sum: {
$add: [
"$user.socials.femaleFollowers",
"$user.socials.maleFollowers"
]
}
}
}
},
{
$match: {
"_id.type": {
$in: [
"instagram",
"snapchat"
]
}
}
}
])
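For a quick sanity check of the totals, here is a plain-Python emulation of the pipeline on the sample user (a sketch, not PyMongo; `total_followers` is a hypothetical name). It filters before summing; in the pipeline the equivalent would be a $match on "user.socials.type" placed right after $unwind, which gives the same result while sending fewer documents to $group.

```python
from collections import defaultdict

users = [{"user": {"_id": "d10430c8-e59", "username": "John", "socials": [
    {"type": "instagram", "maleFollowers": 23000, "femaleFollowers": 65000},
    {"type": "facebook", "maleFollowers": 4000, "femaleFollowers": 6700},
    {"type": "snapchat", "maleFollowers": 873, "femaleFollowers": 1200},
]}}]


def total_followers(users, wanted):
    """Sum male + female followers per (user _id, social type) for the wanted types."""
    totals = defaultdict(int)
    for u in users:
        for s in u["user"]["socials"]:           # $unwind: "$user.socials"
            if s["type"] not in wanted:          # filter before grouping
                continue
            key = (u["user"]["_id"], s["type"])  # $group _id
            totals[key] += s["maleFollowers"] + s["femaleFollowers"]  # $sum of $add
    return dict(totals)


print(total_followers(users, ["instagram", "snapchat"]))
```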

select only subdocuments or arrays

{
"_id":{
"oid":"4f33bf69873dbc73a7d21dc3"
},
"country":"IND",
"states":[{
"name":"orissa",
"direction":"east",
"population":41947358,
"districts":[{
"name":"puri",
"headquarter":"puri",
"population":1498604
},
{
"name":"khordha",
"headquarter":"bhubaneswar",
"population":1874405
}
]
},
{
"name":"andhra pradesh",
"direction":"south",
"population":84665533,
"districts":[{
"name":"rangareddi",
"headquarter":"hyderabad",
"population":3506670
},
{
"name":"vishakhapatnam",
"headquarter":"vishakhapatnam",
"population":3789823
}
]
}
]
}
In the above collection (i.e. countries) I have only one document, and I want to fetch the details of a particular state (let's say "country.states.name": "orissa"). But I want my result as shown below instead of the entire document. Is there a way in Mongo...
{
"name": "orissa",
"direction": "east",
"population": 41947358,
"districts": [
{
"name": "puri",
"headquarter": "puri",
"population": 1498604
},
{
"name": "khordha",
"headquarter": "bhubaneswar",
"population": 1874405
}
]
}
Thanks
I tried this:
db.countries.aggregate(
{
"$project": {
"state": "$states",
"_id": 0
}
},
{
"$unwind": "$state"
},
{
"$group": {
"_id": "$state.name",
"state": {
"$first": "$state"
}
}
},
{
"$match": {
"_id": "orissa"
}
}
);
And got:
{
"result" : [
{
"_id" : "orissa",
"state" : {
"name" : "orissa",
"direction" : "east",
"population" : 41947358,
"districts" : [
{
"name" : "puri",
"headquarter" : "puri",
"population" : 1498604
},
{
"name" : "khordha",
"headquarter" : "bhubaneswar",
"population" : 1874405
}
]
}
}
],
"ok" : 1
}
You can't do it right now, but you will be able to with $unwind in the aggregation framework. You can try it now with the experimental 2.1 branch; the stable version will come out in 2.2, probably in a few months.
Any query in MongoDB always returns the root document.
The only way to load a single subdocument together with its parent is via $slice, if you know the ordinal number of the state in the nested array:
// skip ordinalNumber - 1 elements, return 1
db.countries.find({_id: 1}, {states:{$slice: [ordinalNumber -1 , 1]}})
$slice works in the array's stored order (the order in which elements were inserted into the nested array).
Also, if you don't need the other country fields, you can include only _id and states in the result:
db.countries.find({_id: 1}, {states:{$slice: [ordinalNumber -1 , 1]}, _id: 1})
The result document will then look like this:
{
"_id":{
"oid":"4f33bf69873dbc73a7d21dc3"
},
"states":[{
"name":"orissa",
"direction":"east",
"population":41947358,
"districts":[{
"name":"puri",
"headquarter":"puri",
"population":1498604
},
{
"name":"khordha",
"headquarter":"bhubaneswar",
"population":1874405
}
]
}]
}
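To make the skip/limit arithmetic concrete, here is a plain-Python emulation of the {states: {$slice: [ordinalNumber - 1, 1]}, _id: 1} projection (a sketch, not PyMongo; `slice_states` is a hypothetical name, and districts are omitted from the sample for brevity). Only non-negative skips are modelled; MongoDB also accepts a negative skip, which counts from the end of the array.

```python
# Emulation of the projection {states: {$slice: [skip, limit]}, _id: 1} (not PyMongo).
# Districts are omitted from the sample for brevity.
country = {
    "_id": {"oid": "4f33bf69873dbc73a7d21dc3"},
    "country": "IND",
    "states": [
        {"name": "orissa", "direction": "east", "population": 41947358},
        {"name": "andhra pradesh", "direction": "south", "population": 84665533},
    ],
}


def slice_states(doc, ordinal_number, limit=1):
    """Return _id plus `limit` states starting at the 1-based ordinal_number."""
    skip = ordinal_number - 1  # non-negative skips only in this sketch
    return {"_id": doc["_id"], "states": doc["states"][skip:skip + limit]}


print(slice_states(country, 1)["states"][0]["name"])  # orissa
print(slice_states(country, 2)["states"][0]["name"])  # andhra pradesh
```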
db.countries.find({ "states": { "$elemMatch": { "name": "orissa" }}},{"country" : 1, "states.$": 1 })
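The positional "states.$" projection returns only the first array element that matched the query condition, still wrapped in an array, and _id is included by default. A plain-Python emulation of that behaviour on the sample (a sketch, not PyMongo; `positional_project` is a hypothetical name, districts omitted for brevity):

```python
# Emulation of find({"states": {"$elemMatch": {"name": name}}},
#                   {"country": 1, "states.$": 1}) on the sample document.
country = {
    "_id": {"oid": "4f33bf69873dbc73a7d21dc3"},
    "country": "IND",
    "states": [
        {"name": "orissa", "direction": "east", "population": 41947358},
        {"name": "andhra pradesh", "direction": "south", "population": 84665533},
    ],
}


def positional_project(doc, name):
    """Keep _id, country and only the first state matching the query condition."""
    first = next((s for s in doc["states"] if s["name"] == name), None)
    if first is None:
        return None  # the query itself would not match this document
    return {"_id": doc["_id"], "country": doc["country"], "states": [first]}


result = positional_project(country, "orissa")
print(result["states"])
```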
If you don't want to use aggregate, you can do it pretty easily at the application layer using underscore (included by default):
var country = Groops.findOne({"property": value});
var state = _.findWhere(country.states, {"name": statename}); // first state whose name matches
This will give you the entire state record that matches statename. Very convenient.