MongoDB: Upsert with array filter

I have a collection like this:
mc.db.collection.insert_many([
    {"key_array": [1], "another_array": ["a"]},
    {"key_array": [2, 3], "another_array": ["b"]},
    {"key_array": [4], "another_array": ["c", "d"]},
])
And I'm using this kind of update:
mc.db.collection.update_one(
    {"key_array": 5},
    {"$addToSet": {"another_array": "f"}},
    upsert=True
)
It works well for updates, but I have trouble when it upserts:
it creates a document with a non-array key_array field, like this
{
    "_id": ObjectId(...),
    "key_array": 5,
    "another_array": ["f"]
}
while I want to have this one
{
    "_id": ObjectId(...),
    "key_array": [5],
    "another_array": ["f"]
}
Also, I cannot use a {"key_array": [5]}-style query, because it won't match an existing array of length > 1.
So, is there any way to keep this behavior on updates, and still get the correct document structure on inserts?
Any help will be appreciated

This should help.
https://www.mongodb.com/docs/manual/reference/operator/update/setOnInsert/
mc.db.collection.update_one(
    {"key_array": 5},
    {
        "$addToSet": {"another_array": "f"},
        "$setOnInsert": {"key_array": [5], ...}
    },
    upsert=True
)

How about this one?
db.collection.update({
    "key_array": 5
},
{
    "$addToSet": {
        "another_array": "f"
    },
    "$set": {
        "key_array": [5]
    }
},
{
    "upsert": true
})
https://mongoplayground.net/p/4YdhKuzr2I6

OK, I finally solved this issue with two consecutive updates: the first as specified in the question (an upsert with a non-array query field), and a second that converts the field to an array if it is of another type.
from pymongo import UpdateOne

bulk = [
    UpdateOne(
        {"key_array": 6},
        {"$addToSet": {"another_array": "e"}},
        upsert=True
    ),
    UpdateOne(
        {"$and": [
            {"key_array": 6},
            {"key_array": {"$not": {"$type": "array"}}}
        ]},
        [{"$set": {"key_array": ["$key_array"]}}]
    )
]
mc.db.collection.bulk_write(bulk)
But I'm still looking for a more elegant way to do such a thing.
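For what it's worth, a possible single-statement alternative (a sketch, untested here; it assumes MongoDB 4.2+ aggregation-pipeline updates and reuses the mc.db.collection handle from the question): normalize key_array with $cond/$isArray and emulate $addToSet via $setUnion, so one upsert handles both cases.

```python
# Sketch, assuming MongoDB 4.2+ aggregation-pipeline updates.
filter_doc = {"key_array": 5}

update_pipeline = [
    {"$set": {
        # On insert, the filter's equality value is copied in as a scalar (5),
        # so wrap it in an array unless it is already one.
        "key_array": {
            "$cond": [{"$isArray": "$key_array"}, "$key_array", ["$key_array"]]
        },
        # $setUnion emulates $addToSet ($ifNull covers the insert case);
        # note it treats the array as a set, so element order may change.
        "another_array": {
            "$setUnion": [{"$ifNull": ["$another_array", []]}, ["f"]]
        },
    }}
]

# mc.db.collection.update_one(filter_doc, update_pipeline, upsert=True)
```

The trade-off versus $addToSet is the set semantics of $setUnion (deduplication and unspecified ordering of the result), so check that this is acceptable for your data.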

Related

Update multiple fields based on condition in aggregation pipeline MongoDB Atlas trigger

I have the following pipeline that calculates the rank (sort order) according to the score when the update flag is set to true:
const pipeline = [
    {$match: {"score": {$gt: 0}, "update": true}},
    {$setWindowFields: {sortBy: {"score": -1}, output: {"rank": {$denseRank: {}}}}},
    {$merge: {into: "ranking"}}
];
await ranking_col.aggregate(pipeline).toArray();
What I do next is to set the rank to 0 when the update flag is set to false:
ranking_col.updateMany({"update": false}, {$set: {"rank": parseInt(0, 10)}});
One of my documents looks like this:
{
    "_id": "7dqe1kcA7R1YGjdwHsAkV83",
    "score": 294,
    "update": false,
    "rank": 0
}
I want to avoid the extra updateMany call and do the equivalent inside the pipeline. MongoDB support back then told me to use an $addFields stage this way:
const pipeline = [
    {$match: {"score": {$gt: 0}, "update": true}},
    {$setWindowFields: {sortBy: {"score": -1}, output: {"rank": {$denseRank: {}}}}},
    {$addFields: {rank: {$cond: [{$eq: ['$update', false]}, parseInt(0, 10), '$rank']}}},
    {$merge: {into: "ranking"}}
];
This is not working in my Atlas Trigger.
Can you please correct my syntax or tell me a good way to do so?
This aggregation pipeline isn't particularly efficient (a fair amount of work in "$setWindowFields" gets thrown away - more comments about this below), but I think it does what you want. Please check to make sure it's correct as I don't have complete understanding of the collection, its use, etc.
N.B.: This aggregation pipeline is not very efficient because:
It processes every document. There's no leading "$match" to filter documents.
Because of 1., "$setWindowFields" has to "partitionBy": "$update" and sort/rank both the "update": false partition and the "update": true documents with "score" <= 0, even though they are irrelevant.
All that irrelevant work is then thrown away: the "update": false partition's "rank" is simply set to 0, and the "update": true documents with "score" <= 0 are excluded from the "$merge".
In a large collection, your original two-step update may well be more efficient.
db.ranking.aggregate([
    {
        "$setWindowFields": {
            "partitionBy": "$update",
            "sortBy": {"score": -1},
            "output": {
                "rank": {"$denseRank": {}}
            }
        }
    },
    {
        "$set": {
            "rank": {"$cond": ["$update", "$rank", 0]}
        }
    },
    {
        "$match": {
            "$expr": {"$not": [{"$and": ["$update", {"$lte": ["$score", 0]}]}]}
        }
    },
    {"$merge": "ranking"}
])
Try it on mongoplayground.net.
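For intuition, here is a pure-Python sketch of what the partitioned pipeline computes: a dense rank by score within each value of update, with the rank then forced to 0 for update: false. The sample documents are made up for illustration.

```python
# Made-up sample documents, just to illustrate the window-function logic.
docs = [
    {"_id": "a", "score": 294, "update": True},
    {"_id": "b", "score": 294, "update": True},
    {"_id": "c", "score": 100, "update": True},
    {"_id": "d", "score": 50, "update": False},
]

def dense_rank(group):
    # $denseRank: equal scores share a rank, with no gaps afterwards.
    scores = sorted({d["score"] for d in group}, reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(scores)}
    for d in group:
        d["rank"] = rank_of[d["score"]]

# "$setWindowFields" with "partitionBy": "$update"
for flag in (True, False):
    dense_rank([d for d in docs if d["update"] is flag])

# the "$set"/"$cond" stage: rank 0 when update is false
for d in docs:
    if not d["update"]:
        d["rank"] = 0

# a and b tie at rank 1, c gets rank 2, d is forced to 0
```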

MongoDB: how to pass current value to operator?

Here is a sample DB structure where I'm trying to modify the value of every work_starts_at.
{
    "_id": 1,
    "work_week": [
        {'name': 'ponedeljak', 'work_hours': []},
        {'name': 'utorak', 'work_hours': []},
    ]
},
{
    "_id": 2,
    "work_week": [
        {'name': 'monday', 'work_hours': [
            {'work_starts_at': 1, 'work_ends_at': 2}
        ]},
    ]
},
{
    "_id": 3,
    "work_week": [
        {'name': 'понедельник', 'work_hours': [
            {'work_starts_at': 2, 'work_ends_at': 3},
            {'work_starts_at': 6, 'work_ends_at': 7},
        ]},
        {'name': 'вторник', 'work_hours': []},
    ]
}
The best solution I came up with is the following (in Python), but instead of subtracting 5 from the current value of each work_week.work_hours.work_starts_at it traverses, I got null. I suppose the construction $$CURRENT.work_hours.work_starts_at doesn't point to work_starts_at, so I'm actually subtracting 5 from null.
How can I properly address the value of the currently traversed element?
collection.update_many(
    {},
    [
        {
            '$set': {
                'work_week.work_hours.work_starts_at': {
                    '$subtract': ['$$CURRENT.work_hours.work_starts_at', 5]
                }
            }
        }
    ]
)
You can do it this way from the mongo shell (it must be very similar in Python):
db.collection.update({
    "work_week.work_hours.work_starts_at": {
        $exists: true
    }
},
{
    $inc: {
        "work_week.$[].work_hours.$[].work_starts_at": -5
    }
},
{
    multi: true
})
playground
Thanks to @R2D2! I've translated it to Python (pymongo):
collection.update_many(
    {'work_week.work_hours.work_starts_at': {'$exists': True}},
    {'$inc': {
        'work_week.$[].work_hours.$[].work_starts_at': -5
    }}
)
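A possible variant (a sketch, untested against this collection): if some work_hours entries might lack work_starts_at, the filtered positional operator $[h] together with pymongo's array_filters argument restricts the $inc to matching elements only.

```python
# Sketch: $inc only the work_hours elements that have a work_starts_at field.
# "h" is an arbitrary identifier bound by the array_filters entry below.
filter_doc = {"work_week.work_hours.work_starts_at": {"$exists": True}}
update_doc = {"$inc": {"work_week.$[].work_hours.$[h].work_starts_at": -5}}
hours_filter = [{"h.work_starts_at": {"$exists": True}}]

# collection.update_many(filter_doc, update_doc, array_filters=hours_filter)
```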

Scala / MongoDB - removing duplicate

I have seen very similar questions with solutions to this problem, but I am unsure how I would incorporate them into my own query. I'm programming in Scala and using the MongoDB Aggregates "framework".
val getItems = Seq(
    Aggregates.lookup(Store...)...
    Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
    Aggregates.unwind("$item"),
    // filter duplicates here ?
    Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
    Aggregates.unwind("$content"),
    Aggregates.project(Projections.fields(Projections.include("store", "item", "content")))
)
The query returns duplicate objects which is undesirable. I would like to remove these. How could I go about incorporating Aggregates.group and "$addToSet" to do this? Or any other reasonable solution would be great too.
Note: I have to omit some details about the query, so the store lookup aggregate is not there. However, I want to remove the duplicates later in the query so it hopefully shouldn't matter.
Please let me know if I need to provide more information.
Thanks.
EDIT: 31/07/2019 13:47
I have tried the following:
val getItems = Seq(
    Aggregates.lookup(Store...)...
    Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
    Aggregates.unwind("$item"),
    Aggregates.group("$item.itemID",
        Accumulators.first("ID", "$ID"),
        Accumulators.first("itemName", "$itemName"),
        Accumulators.addToSet("item", "$item")),
    Aggregates.unwind("$items"),
    Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
    Aggregates.unwind("$content"),
    Aggregates.project(Projections.fields(Projections.include("store", "items", "content")))
)
But my query now returns zero results instead of the duplicate result.
You can use $first to remove the duplicates.
Suppose I have the following data:
[
    {"_id": 1, "item": "ABC", "sizes": ["S", "M", "L"]},
    {"_id": 2, "item": "EFG", "sizes": []},
    {"_id": 3, "item": "IJK", "sizes": "M"},
    {"_id": 4, "item": "LMN"},
    {"_id": 5, "item": "XYZ", "sizes": null}
]
Now, let's aggregate it using $first and $unwind and see the difference:
First, let's aggregate it using $first:
db.collection.aggregate([
    {$sort: {item: 1}},
    {$group: {_id: "$item", firstSize: {$first: "$sizes"}}}
])
Output
[
    {"_id": "XYZ", "firstSize": null},
    {"_id": "ABC", "firstSize": ["S", "M", "L"]},
    {"_id": "IJK", "firstSize": "M"},
    {"_id": "EFG", "firstSize": []},
    {"_id": "LMN", "firstSize": null}
]
Now, let's aggregate it using $unwind:
db.collection.aggregate([
    {$unwind: "$sizes"}
])
Output
[
    {"_id": 1, "item": "ABC", "sizes": "S"},
    {"_id": 1, "item": "ABC", "sizes": "M"},
    {"_id": 1, "item": "ABC", "sizes": "L"},
    {"_id": 3, "item": "IJK", "sizes": "M"}
]
You can see $first removes the duplicates, whereas $unwind keeps them.
Using $unwind and $first together:
db.collection.aggregate([
    {$unwind: "$sizes"},
    {$group: {_id: "$item", firstSize: {$first: "$sizes"}}}
])
Output
[
{"_id": "IJK", "firstSize": "M"},
{"_id": "ABC","firstSize": "S"}
]
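For intuition, the $sort + $group/$first pipeline above can be mimicked in pure Python over the same sample documents; it shows why $first keeps exactly one "sizes" value per item:

```python
docs = [
    {"_id": 1, "item": "ABC", "sizes": ["S", "M", "L"]},
    {"_id": 2, "item": "EFG", "sizes": []},
    {"_id": 3, "item": "IJK", "sizes": "M"},
    {"_id": 4, "item": "LMN"},
    {"_id": 5, "item": "XYZ", "sizes": None},
]

grouped = {}
for d in sorted(docs, key=lambda d: d["item"]):  # $sort: {item: 1}
    # $group with $first: keep the first "sizes" seen per item
    # (missing fields come through as None, i.e. null).
    grouped.setdefault(d["item"], d.get("sizes"))

# grouped maps each item to a single value, e.g. "ABC" -> ["S", "M", "L"]
```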
group then addToSet is an effective way to deal with your problem!
It looks like this in the mongo shell:
db.sales.aggregate([
    {
        $group: {
            _id: {day: {$dayOfYear: "$date"}, year: {$year: "$date"}},
            itemsSold: {$addToSet: "$item"}
        }
    }
])
In Scala you can do it like:
Aggregates.group("$groupfield", Accumulators.addToSet("fieldName", "$expression"))
If you have multiple fields to group on:
Aggregates.group(new BasicDBObject().append("fieldAname", "$fieldA").append("fieldBname", "$fieldB"), Accumulators.addToSet("fieldName", "$expression"))
Then unwind.

MongoDB return subTotal and total in a single query

I have this data in a collection:
{id:1, types:{'A':4, 'B': 3, 'C':12}}
{id:1, types:{'A':8, 'B': 2, 'C':11}}
{id:2, types:{'A':7, 'B': 6, 'C':14}}
{id:3, types:{'A':1, 'B': 9, 'C':15}}
I want to query for the total of each type for id:1, but I also want to know the totals for each type across all ids, in a single query. I would like the output to look something like this:
{id:1, types:{'A':12, 'B':5, 'C':23, 'sumA':20, 'sumB':20, 'sumC':52}}
I can do this by calling 2 separate queries. One query containing
{$match: {id:1}}
And one that does not have a $match option. But I would like to know if it can be done in a single query.
Edit: types A,B and C are dynamic so I won't know the values beforehand.
Thanks!
You can use the aggregation query below.
A $group aggregation with $sum calculates the total count, and $cond limits the count to the specific id.
db.col.aggregate([
    {"$group": {
        "_id": null,
        "sumA": {"$sum": "$types.A"},
        "sumB": {"$sum": "$types.B"},
        "sumC": {"$sum": "$types.C"},
        "A": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.A", 0]}},
        "B": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.B", 0]}},
        "C": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.C", 0]}}
    }}
])
To handle dynamic type keys, update the documents to the below structure:
{id:1, types:[{"k":'A', v:4}, { "k":'B', "v": 3}, { "k":'C', "v": 12}]}
{id:1, types:[{"k":'A', v:8}, { "k":'B', "v": 2}, { "k":'C', "v": 11}]}
{id:2, types:[{"k":'A', v:7}, { "k":'B', "v": 6}, { "k":'C', "v": 14}]}
{id:3, types:[{"k":'A', v:1}, { "k":'B', "v": 9}, { "k":'C', "v": 15}]}
Aggregation query:
db.col.aggregate([
    {"$unwind": "$types"},
    {"$group": {
        "_id": "$types.k",
        "sum": {"$sum": "$types.v"},
        "type": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.v", 0]}}
    }}
])
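As a sanity check, here is a pure-Python sketch of what the $unwind + $group pipeline computes over the question's sample documents: the per-type total across all ids, plus the id:1 subtotal produced by the $cond.

```python
docs = [
    {"id": 1, "types": [{"k": "A", "v": 4}, {"k": "B", "v": 3}, {"k": "C", "v": 12}]},
    {"id": 1, "types": [{"k": "A", "v": 8}, {"k": "B", "v": 2}, {"k": "C", "v": 11}]},
    {"id": 2, "types": [{"k": "A", "v": 7}, {"k": "B", "v": 6}, {"k": "C", "v": 14}]},
    {"id": 3, "types": [{"k": "A", "v": 1}, {"k": "B", "v": 9}, {"k": "C", "v": 15}]},
]

totals = {}
for d in docs:
    for t in d["types"]:              # $unwind: "$types"
        entry = totals.setdefault(t["k"], {"sum": 0, "type": 0})
        entry["sum"] += t["v"]        # "sum": {"$sum": "$types.v"}
        if d["id"] == 1:              # the $cond guard
            entry["type"] += t["v"]

# e.g. totals["A"] == {"sum": 20, "type": 12}
```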

Record Aggregation in Mongo across an Array

I have an array stored in each document/record in a mongo database and I need to compute a score for each element in this array and aggregate the scores by another field in the array element.
It's hard for me to explain what I am trying to do in English, so here is a Python example of what I am looking to do.
records = [
    {"state": "a", "initvalue": 1, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 4}]},
    {"state": "a", "initvalue": 5, "data": [{"time": 1, "value": 7}, {"time": 2, "value": 9}]},
    {"state": "b", "initvalue": 4, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 1}]},
    {"state": "b", "initvalue": 5, "data": [{"time": 1, "value": 3}, {"time": 2, "value": 2}]}
]
def sign(record):
    return 1 if record["state"] == "a" else -1

def score(record):
    return [{"time": element["time"], "score": sign(record) * (element["value"] - record["initvalue"])}
            for element in record["data"]]

scores = []
for record in records:
    scores += score(record)

sums = {}
for s in scores:
    if s["time"] not in sums:
        sums[s["time"]] = 0
    sums[s["time"]] += s["score"]

print('{:>4} {:>5}'.format('time', 'score'))
for time, value in sums.items():
    print('{:>4} {:>5}'.format(time, value))
This computes a slightly different score function for state a and for state b and then aggregates the scores across each time entry.
Here is the result
time score
1 7
2 13
I am trying to figure out how to do this in Mongo, without pulling the records into Python and reinventing aggregation.
Thanks for the help!
OK, I figured this out. Once I really understood how pipelines work, and the $cond operator, everything came together.
from pymongo import MongoClient

client = MongoClient()
result = client.mydb.foo.aggregate([
    {'$project': {'_id': 0, 'data': 1, 'initvalue': 1, 'state': 1}},
    {'$unwind': '$data'},
    {'$project': {
        'time': '$data.time',
        'score': {'$multiply': [
            {'$cond': [{'$eq': ['$state', 'a']}, 1, -1]},
            {'$subtract': ['$data.value', '$initvalue']}
        ]}
    }},
    {'$group': {
        '_id': '$time',
        'score': {'$sum': '$score'}
    }},
    {'$project': {'_id': 0, 'time': '$_id', 'score': 1}}
])
for record in result:
    print(record)
This yields the desired result:
{'score': 13, 'time': 2}
{'score': 7, 'time': 1}