$facet \ $bucket date manipulation - mongodb

I am using $facet aggregation for an e-commerce style platform.
For this case I have a products collection, and the product schema contains many fields.
I am using the same aggregation process to get all the required facets.
The question is whether I can use the same $facet \ $bucket aggregation to group documents by a manipulation of a specific field.
For example, the product schema has a releaseDate field of type Date.
Currently the aggregation query looks like this:
let facetsBodyExample = {
'facetName': [
{ $unwind: `$${fieldData.key}` },
{ $sortByCount: `$${fieldData.key}` }
],
'facetName2': [
{ $unwind: `$${fieldData.key}` },
{ $sortByCount: `$${fieldData.key}` }
],
...,
...
};
let results = await Product.aggregate({
$facet: facetsBodyExample
});
The documents look like this:
{
_id : 'as5d16as1d65sa65d165as1d',
name : 'product name',
releaseDate: '2015-07-01T00:00:00.000Z',
field1:13,
field2:[1,2,3],
field3:'abx',
...
}
I want to create custom facets (groups) by quarter + year in a format like 'Qn/YYYY' (for example 'Q1/2014'), without defining any boundaries.
Right now I get one group per exact date value; I want to group them into quarter + year groups instead, if possible in the same $facet aggregation.
Query result for the date field I want to customize:
CURRENT:
{
releaseDate: [
{ "_id": "2017-01-01T00:00:00.000Z", "count": 26 },
{ "_id": "2013-04-01T00:00:00.000Z", "count": 25 },
{ "_id": "2013-07-01T00:00:00.000Z", "count": 23 },
...
]
}
DESIRED:
{
releaseDate: [
{ "_id": "Q1/2014", "count": 100 },
{ "_id": "Q2/2014", "count": 200 },
{ "_id": "Q3/2016", "count": 300 },
...
]
}
Thanks !!
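One possible approach, sketched here in Python with PyMongo-style dicts, is to build the 'Qn/YYYY' label as an expression and group on it inside the facet's own sub-pipeline. This is only a sketch under assumptions: releaseDate must be stored as a real BSON Date (the sample document shows it as a string), the server must support $toString and $toInt (MongoDB 4.0 or later), and the helper names quarter_facet and quarter_label are hypothetical.

```python
from datetime import datetime


def quarter_facet(field):
    """Build a $facet sub-pipeline that counts documents per 'Qn/YYYY' label."""
    date = f"${field}"
    # ceil(month / 3) maps months 1-12 onto quarters 1-4.
    quarter = {"$ceil": {"$divide": [{"$month": date}, 3]}}
    label = {
        "$concat": [
            "Q",
            {"$toString": {"$toInt": quarter}},
            "/",
            {"$toString": {"$year": date}},
        ]
    }
    return [
        {"$group": {"_id": label, "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def quarter_label(d):
    """Pure-Python mirror of the label expression, handy for checking expectations."""
    return f"Q{(d.month + 2) // 3}/{d.year}"


# The facet body can then mix this with the existing $sortByCount facets.
facets = {"releaseDate": quarter_facet("releaseDate")}
```

The resulting dict would be passed as the value of the $facet stage, the same way facetsBodyExample is in the question.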

Related

How to retrieve each single array element from mongo pipeline?

Let's assume that this is what a sample document looks like in MongoDB:
[
{
"_id": "1",
"attrib_1": "value_1",
"attrib_2": "value_2",
"months": {
"2": {
"month": "2",
"year": "2008",
"transactions": [
{
"field_1": "val_1",
"field_2": "val_2",
},
{
"field_1": "val_4",
"field_2": "val_5",
"field_3": "val_6"
},
]
},
"3": {
"month": "3",
"year": "2018",
"transactions": [
{
"field_1": "val_7",
"field_3": "val_9"
},
{
"field_1": "val_10",
"field_2": "val_11",
},
]
},
}
}
]
The desired output is something like this, (I am just showing it for months 2 & 3)
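| id | months | year | field_1 | field_2 | field_3 |
|----|--------|------|---------|---------|---------|
| 1  | 2      | 2008 | val_1   | val_2   |         |
| 1  | 2      | 2008 | val_4   | val_5   | val_6   |
| 1  | 3      | 2018 | val_7   |         | val_9   |
| 1  | 3      | 2018 | val_10  | val_11  |         |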
My attempt:
I tried something like this in Py-Mongo,
pipeline = [
{
# some filter logic here to filter data basically first
},
{
"$addFields": {
"latest": {
"$map": {
"input": {
"$objectToArray": "$months",
},
"as": "obj",
"in": {
"all_field_1" : {"$ifNull" : ["$$obj.v.transactions.field_1", [""]]},
"all_field_2": {"$ifNull" : ["$$obj.v.transactions.field_2", [""]]},
"all_field_3": {"$ifNull" : ["$$obj.v.transactions.field_3", [""]]},
"all_months" : {"$ifNull" : ["$$obj.v.month", ""]},
"all_years" : {"$ifNull" : ["$$obj.v.year", ""]},
}
}
}
}
},
{
"$project": {
"_id": 1,
"months": "$latest.all_months",
"year": "$latest.all_years",
"field_1": "$latest.all_field_1",
"field_2": "$latest.all_field_2",
"field_3": "$latest.all_field_3",
}
}
]
# and I executed it as
my_db.collection.aggregate(pipeline, allowDiskUse=True)
The above does bring back the data, but it returns it in lists. Is there a way to easily get one row per transaction in Mongo itself?
The above returns the data in this way:
| id | months | year | field_1 | field_2 | field_3 |
|----|--------|------|---------|---------|---------|
| 1  | ["2", "3"] | ["2008", "2018"] | [["val_1", "val_4"], ["val_7", "val_10"]] | [["val_2", "val_5"], ["", "val_11"]] | [["", "val_6"], ["val_9", ""]] |
I would highly appreciate your input on this, and on any better way to do the same.
Thanks for your time.
My Mongo version is 3.4.6 and I am using PyMongo as my driver. You can see the query in action at mongo-db-playground
It might be a bad idea to do all of this processing in an aggregation query; you could do it on the client side instead.
I have created a query, but it is lengthy and may cause performance issues on large data sets:
$objectToArray converts the months object to an array
$unwind deconstructs the months array
$unwind deconstructs the transactions array and stores each element's position in the field index
$group by _id, year, month and index, and take the first transactions object into fields
$project lets you shape the response; it is optional, and I have added it in the playground link
pipeline = [
    # some filter logic here to filter data basically first
    {"$project": {"months": {"$objectToArray": "$months"}}},
    {"$unwind": "$months"},
    {
        "$unwind": {
            "path": "$months.v.transactions",
            "includeArrayIndex": "index"
        }
    },
    {
        "$group": {
            "_id": {
                "_id": "$_id",
                "year": "$months.v.year",
                "month": "$months.v.month",
                "index": "$index"
            },
            "fields": {"$first": "$months.v.transactions"}
        }
    }
]
my_db.collection.aggregate(pipeline, allowDiskUse=True)
Playground
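For the client-side route mentioned at the start of this answer, a minimal Python sketch could look like the following. It assumes documents shaped like the sample in the question; the function name flatten_transactions and the choice of an empty string for missing fields are illustrative, not part of the original answer.

```python
def flatten_transactions(doc, fields=("field_1", "field_2", "field_3")):
    """Yield one flat row per transaction from a document shaped like the sample."""
    for month in doc.get("months", {}).values():
        for txn in month.get("transactions", []):
            row = {
                "id": doc["_id"],
                "months": month.get("month", ""),
                "year": month.get("year", ""),
            }
            # Missing fields become empty strings, as in the desired output.
            for name in fields:
                row[name] = txn.get(name, "")
            yield row


sample = {
    "_id": "1",
    "months": {
        "2": {"month": "2", "year": "2008", "transactions": [
            {"field_1": "val_1", "field_2": "val_2"},
            {"field_1": "val_4", "field_2": "val_5", "field_3": "val_6"},
        ]},
        "3": {"month": "3", "year": "2018", "transactions": [
            {"field_1": "val_7", "field_3": "val_9"},
            {"field_1": "val_10", "field_2": "val_11"},
        ]},
    },
}
rows = list(flatten_transactions(sample))
```

With a real collection, the same generator could be applied to each document returned by a find() cursor after the filter step.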

Merging / aggregating multiple documents in MongoDB

I have a little problem with my aggregations. I have a large collection of flat documents with the following schema:
{
_id:ObjectId("5dc027d38da295b969eca568"),
emp_no:10001,
salary:60117,
from_date:"1986-06-26",
to_date:"1987-06-26"
}
It's all about annual employee salaries. The data is exported from relational database so there are multiple documents with the same value of "emp_no" but the rest of their attributes vary. I need to aggregate them by values of attribute "emp_no" so as a result I will have something like this:
//one document
{
_id:ObjectId("5dc027d38da295b969eca568"),
emp_no:10001,
salaries: [
{
salary:60117,
from_date:"1986-06-26",
to_date:"1987-06-26"
},
{
salary:62102,
from_date:"1987-06-26",
to_date:"1988-06-25"
},
...
]
}
//another document
{
_id:ObjectId("5dc027d38da295b969eca579"),
emp_no:10002,
salaries: [
{
salary:65828,
from_date:"1996-08-03",
to_date:"1997-08-03"
},
...
]
}
//and so on
Last but not least, there are almost 2.9 million documents, so aggregating by "emp_no" manually would be a bit of a problem. Is there a way I can aggregate them using just Mongo queries? How do I do this kind of thing? Thank you in advance for any help.
The group stage of aggregation pipeline can be used to get this type of aggregates. Specify the attribute you want to group by as the value of _id field in the group stage.
How does the below query work for you?
db.collection.aggregate([
{
"$group": {
"_id": "$emp_no",
"salaries": {
"$push": {
"salary": "$salary",
"from_date": "$from_date",
"to_date": "$to_date"
}
},
"emp_no": {
"$first": "$emp_no"
},
"first_document_id": {
"$first": "$_id"
}
}
},
{
"$project": {
"_id": "$first_document_id",
"salaries": 1,
"emp_no": 1
}
}
])
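Since the question mentions roughly 2.9 million documents, here is a hedged PyMongo-flavoured sketch of the same idea that also writes the merged documents to a new collection. The $sort stage is an added assumption, intended to feed each employee's rows to $push in date order, and both the helper name build_merge_pipeline and the target collection name employee_salaries are hypothetical.

```python
def build_merge_pipeline(out_collection):
    """Return stages that fold per-period salary rows into one document per emp_no."""
    return [
        # Assumption: sort first so rows reach $push in chronological order.
        {"$sort": {"emp_no": 1, "from_date": 1}},
        {
            "$group": {
                "_id": "$emp_no",
                "salaries": {
                    "$push": {
                        "salary": "$salary",
                        "from_date": "$from_date",
                        "to_date": "$to_date",
                    }
                },
                "first_document_id": {"$first": "$_id"},
            }
        },
        # Restore the shape asked for: original _id, emp_no, salaries.
        {"$project": {"_id": "$first_document_id", "emp_no": "$_id", "salaries": 1}},
        # $out must be the last stage; it replaces the target collection.
        {"$out": out_collection},
    ]


pipeline = build_merge_pipeline("employee_salaries")
```

It would be run as db.collection.aggregate(pipeline, allowDiskUse=True); allowDiskUse lets the $sort and $group stages spill to disk if they exceed the per-stage memory limit.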

MongoDB query for objects in an array

I have a collection that consists of objects like this:
{
"name": "Event One",
"user_id": "1",
"rsvp": [
{"id": "1","name":"John Doe","industry":"Software"},
{"id": "2","name":"John Doe II","industry":"Software"},
]
}
I am trying to query all events that a user has RSVP'd to and that he did not create, and then match the RSVPs by the industry they are in. Is it possible to query in MongoDB and have it returned like this:
[
{"id": "1","name":"John Doe"},
{"id": "2","name":"John Doe II"},
]
The query I have is below:
events.find({"$and":[{"rsvp.contact.indusrty":{"$in":["Software","Information Systems"]}},{"user_id":{"$ne":"5d335704802df000076bad97"}}]},{"projection":{"rsvp.contact.name":true},"typeMap":{"root":"array","document":"array"}})
I am new to MongoDB and can't seem to find anything that helps me figure this out.
You need $match to define filtering criteria, $unwind to get single document per rsvp and $replaceRoot to promote rsvp to root model:
db.events.aggregate([
{
$match: {
"$and":[
{
"rsvp.industry":{
"$in":["Software","Information Systems"]
}
},
{ "user_id":{"$ne":"5d335704802df000076bad97"}}
]
}
},
{
$unwind: "$rsvp"
},
{
$replaceRoot: {
newRoot: "$rsvp"
}
},
{
$project: {
industry: 0
}
}
])
Mongo Playground
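One thing to keep in mind with the pipeline above: the $match stage selects whole events, so after $unwind every rsvp entry of a matching event is returned, including entries from other industries. If only the matching entries are wanted, a second $match after $unwind is one option. The following Python sketch builds such a pipeline; the helper name rsvp_pipeline is hypothetical and the extra stage is an addition to the answer, not part of it.

```python
def rsvp_pipeline(industries, creator_id):
    """Build stages that return rsvp entries in the given industries,
    taken from events not created by creator_id."""
    industry_filter = {"rsvp.industry": {"$in": list(industries)}}
    return [
        # Event-level filter: can use an index and cuts down the work for $unwind.
        {"$match": {**industry_filter, "user_id": {"$ne": creator_id}}},
        {"$unwind": "$rsvp"},
        # Entry-level filter: drops rsvp entries from other industries.
        {"$match": industry_filter},
        {"$replaceRoot": {"newRoot": "$rsvp"}},
        {"$project": {"industry": 0}},
    ]


pipeline = rsvp_pipeline(
    ["Software", "Information Systems"], "5d335704802df000076bad97"
)
```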

How to write MongoDB query to filter selectively

I want to write a query where we have 2 types of employees stored in a database. I want a list of all the employees sorted by their start date, but if an employee is on ‘contract’ then I want to include only those employees whose end date is still in future.
Example of such a collection would be:
{
"_id": 123,
"startDate": some date,
"endDate": some date,
"type": "FULL_TIME"
},
{
"_id": 123,
"startDate": some date,
"endDate": some date,
"type": "CONTRACT"
},
You can try the $or find query below.
The first condition matches all active CONTRACT employees and the second matches all FULL_TIME employees; the results are then sorted by startDate in ascending order.
db.collection.find(
{ $or:
[
{ $and:
[
{ type: "CONTRACT" },
{ endDate: { $gt: new ISODate() } }
]
},
{ type:"FULL_TIME" }
]
}
).sort( { startDate: 1 } );
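For readers using PyMongo, the same filter can be sketched as a plain dict. This is an illustration only: the helper names active_employee_filter and is_listed are hypothetical, and the sketch assumes endDate is stored as a date rather than a string. Listing type and endDate in one sub-document is an implicit AND, so the explicit $and is not needed.

```python
from datetime import datetime, timezone


def active_employee_filter(now=None):
    """Match every FULL_TIME employee plus CONTRACT employees whose endDate is in the future."""
    now = now or datetime.now(timezone.utc)
    return {
        "$or": [
            {"type": "CONTRACT", "endDate": {"$gt": now}},
            {"type": "FULL_TIME"},
        ]
    }


def is_listed(employee, now):
    """Pure-Python mirror of the filter, useful for reasoning about edge cases."""
    if employee["type"] == "FULL_TIME":
        return True
    return employee["type"] == "CONTRACT" and employee["endDate"] > now


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
query = active_employee_filter(now)
```

The query would then be used as db.collection.find(query).sort("startDate", 1).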

Aggregation in mongodb for nested documents

I have a document in the following format:
"summary":{
"HUL":{
"hr_0":{
"ts":None,
"Insights":{
"sentiments":{
"pos":37,
"neg":3,
"neu":27
},
"topics":[
"Basketball",
"Football"
],
"geo":{
"locations":{
"Delhi":34,
"Kolkata":56,
"Pune":79,
"Bangalore":92,
"Mumbai":54
},
"mst_act":{
"loc":"Bangalore",
"lat_long":None
}
}
}
},
"hr_1":{....},
"hr_2":{....},
.
.
"hr_23":{....}
I want to run an aggregation in pymongo that sums up the pos, neg and neu sentiments for all hours of the day "hr_0" to "hr_23".
I am having trouble in constructing the pipeline command in order to do this as the fields I am interested in are in nested dictionaries. Would really appreciate your suggestions.
Thanks!
It's going to be pretty difficult to come up with an aggregation pipeline that will give you the desired aggregates, because your document schema has dynamic keys, which you can't use as an identifier expression in the $group pipeline stage.
However, a workaround using the current schema would be to iterate over the find cursor and extract the values you want to add up within the loop. Something like the following:
pos_total = 0
neg_total = 0
neu_total = 0
cursor = db.collection.find()
for doc in cursor:
    for i in range(0, 24):
        # Sentiment counts for hour i of this document
        sentiments = doc["summary"]["HUL"]["hr_" + str(i)]["Insights"]["sentiments"]
        pos_total += sentiments["pos"]
        neg_total += sentiments["neg"]
        neu_total += sentiments["neu"]
print(pos_total)
print(neg_total)
print(neu_total)
If you could do with changing the schema, then the following schema would be ideal for using the aggregation framework:
{
"summary": {
"HUL": [
{
"_id": "hr_0",
"ts": None,
"Insights":{
"sentiments":{
"pos":37,
"neg":3,
"neu":27
},
"topics":[
"Basketball",
"Football"
],
"geo":{
"locations":{
"Delhi":34,
"Kolkata":56,
"Pune":79,
"Bangalore":92,
"Mumbai":54
},
"mst_act":{
"loc":"Bangalore",
"lat_long":None
}
}
}
},
{
"_id": "hr_2",
"ts": None,
"Insights":{
"sentiments":{
"pos":37,
"neg":3,
"neu":27
},
...
}
},
...
{
"_id": "hr_23",
"ts": None,
"Insights":{
"sentiments":{
"pos":37,
"neg":3,
"neu":27
},
...
}
}
]
}
}
The aggregation pipeline that would give you the required totals is:
pipeline = [
{
"$unwind": "$summary.HUL"
},
{
"$group": {
"_id": "$summary.HUL._id",
"pos_total": { "$sum": "$summary.HUL.Insights.sentiments.pos" },
"neg_total": { "$sum": "$summary.HUL.Insights.sentiments.neg" },
"neu_total": { "$sum": "$summary.HUL.Insights.sentiments.neu" },
}
}
]
result = db.collection.aggregate(pipeline)
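If changing the schema is not an option, servers that support $objectToArray (MongoDB 3.4.4 or later) may offer another route: turn the dynamic hr_N keys into an array first and then sum over it. The Python sketch below is an assumption-laden illustration rather than part of the answer above; the helper name sentiment_totals_pipeline is hypothetical, and it groups on None so that a single overall total comes back instead of one total per hour.

```python
def sentiment_totals_pipeline(brand="HUL"):
    """Build stages that sum pos/neg/neu over all hr_N sub-documents."""
    base = "$hours.v.Insights.sentiments."
    return [
        # {"hr_0": {...}, ...} becomes [{"k": "hr_0", "v": {...}}, ...]
        {"$project": {"hours": {"$objectToArray": f"$summary.{brand}"}}},
        {"$unwind": "$hours"},
        {
            "$group": {
                "_id": None,  # one result document holding the overall totals
                "pos_total": {"$sum": base + "pos"},
                "neg_total": {"$sum": base + "neg"},
                "neu_total": {"$sum": base + "neu"},
            }
        },
    ]


pipeline = sentiment_totals_pipeline()
```

Replacing None with "$hours.k" in the $group stage would give one total per hour key instead.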