Upsert with pymongo and a custom _id field

Upsert with pymongo and a custom _id field - mongodb

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that in this case the 'minute' subdocument always only has the last hour: { minute: metric} entry, the minute subdocument does not create new entries for other hours, it's always overwriting the one entry.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?

In both of the examples listed you are simply setting a field ('minute')to a particular value, the only reason it is an addition the first time you update is because the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet this is required for a new field to be added:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}

I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
I'm setting the metrics like this:
data = {"$set": {}}
for metric in csv:
date_utc = metric['date'].astimezone(pytz.utc)
data["$set"]["minute.%d.%d" % (date_utc.hour,
date_utc.minute)] = float(metric['metric'])
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run it updates those fields.

Related

Count of a nested value of all entries in mongodb collection

I have a collection named outbox which has this kind of structure
"_id" :ObjectId("5a94e02bb0445b1cc742d795"),
"track" : {
"added" : {
"date" : ISODate("2020-12-03T08:48:51.000Z")
}
},
"provider_status" : {
"job_number" : "",
"count" : {
"total" : 1,
"sent" : 0,
"delivered" : 0,
"failed" : 0
},
"delivery" : []
}
I have 2 tasks. First I want the sum of all the "total","sent","failed" on all the entries in the collection no matter what their objectId is. ie I want sum of all the "total","sent","delivered" and "failed". Second I want all these only for a given object Id between Start and End date.
I am trying to find total using this query
db.outbox.aggregate(
{ $group: { _id : null, sum : { $sum: "$provider_status.count.total" } } });
But I am getting this error as shown
Since I do not have much experience in mongodb I don't have any idea how to do these two tasks. Need help here.

You are executing this in Robo3t seems like.
You need to enclose this in an array like
db.test.aggregate([ //See here
{
$group: {
_id: null,
sum: {
$sum: "$provider_status.count.total"
}
}
}
])//See here
But it's not the case with playground as they handle them before submitting to the server

MongoDB: How to get the object names in collection?

and think you in advance for the help. I have recently started using mongoDB for some personal project and I'm interested in finding a better way to query my data.
My question is: I have the following collection:
{
"_id" : ObjectId("5dbd77f7a204d21119cfc758"),
"Toyota" : {
"Founder" : "Kiichiro Toyoda",
"Founded" : "28 August 1937",
"Subsidiaries" : [
"Lexus",
"Daihatsu",
"Subaru",
"Hino"
]
}
}
{
"_id" : ObjectId("5dbd78d3a204d21119cfc759"),
"Volkswagen" : {
"Founder" : "German Labour Front",
"Founded" : "28 May 1937",
"Subsidiaries" : [
"Audi",
"Volkswagen",
"Skoda",
"SEAT"
]
}
}
I want to get the object name for example here I want to return
[Toyota, Volkswagen]
I have use this method
var names = {}
db.cars.find().forEach(function(doc){Object.keys(doc).forEach(function(key){names[key]=1})});
names;
which gave me the following result:
{ "_id" : 1, "Toyota" : 1, "Volkswagen" : 1 }
however, is there a better way to get the same result and also to just return the names of the objects. Thank you.

I would suggest you to change the schema design to be something like:
{
_id: ...,
company: {
name: 'Volkswagen',
founder: ...,
subsidiaries: ...,
...<other fields>...
}
You can then use the aggregation framework to achieve a similar result:
> db.test.find()
{ "_id" : 0, "company" : { "name" : "Volkswagen", "founder" : "German Labour Front" } }
{ "_id" : 1, "company" : { "name" : "Toyota", "founder" : "Kiichiro Toyoda" } }
> db.test.aggregate([ {$group: {_id: null, companies: {$push: '$company.name'}}} ])
{ "_id" : null, "companies" : [ "Volkswagen", "Toyota" ] }
For more details, see:
Aggregation framework
$group
Accumulator operators
As a bonus, you can create an index on the company.name field, whereas you cannot create an index on varying field names like in your example.

How to set keys in mongoDB aggregation?

The idea is to go from a collection of documents like this:
{
"_id" : ObjectId("58ff4fa372ac97344d5672c2"),
"direction" : 1,
"post" : ObjectId("58ff4ea572ac97344d5672c1"),
"user" : ObjectId("586b84239ae9590ab66bd3ad")
}
{
"_id" : ObjectId("58ff4c9f2952d7341d4afc0c"),
"direction" : -1,
"post" : ObjectId("58fc15a3fb3bed0fd54bfd95"),
"user" : ObjectId("586b84239ae9590ab66bd3ad")
}
To this:
[
//post: direction
"58ff4ea572ac97344d5672c1": 1,
"58fc15a3fb3bed0fd54bfd95": -1
]
I can't seem to find anything in the MongoDB Aggregation docs that allows you to set the key name using the value of another field.
I'm expecting this code to work, but I can see why it doesn't. It thinks that "$post" refers to a MongoDB expression.
db.votes.aggregate([
{$group: {
_id: null,
entries: {
$addToSet: {
"$post": "$direction"
}
}
}}
])

Mongo DB - how to query for id dependent on oldest date in array of a field

Lets say I have a collection called phone_audit with document entries of the following form - _id which is the phone number, and value containing items that always contains 2 entries (id, and a date).
Please see below:
{
"_id" : {
"phone_number" : "+012345678"
},
"value" : {
"items" : [
{
"_id" : "c14b4ac1db691680a3fb65320fba7261",
"updated_at" : ISODate("2016-03-14T12:35:06.533Z")
},
{
"_id" : "986b58e55f8606270f8a43cd7f32392b",
"updated_at" : ISODate("2016-07-23T11:17:53.552Z")
}
]
}
},
......
I need to get a list of _id values for every entry in that collection representing the older of the two items in each document.
So in the above - result would be [c14b4ac1db691680a3fb65320fba7261,...]
Any pointers at the type of query to execute would be v.helpful even if the exact syntax is not correct.

With aggregate(), you can $unwind value.items, $sort by update_at, then use $first to get the oldest:
[
{
"$unwind": "$value.items"
},
{
"$sort": { "value.items.updated_at": 1 }
},
{
"$group":{
_id: "$_id.phone_number",
oldest:{$first:"$value.items"}
}
},
{
"$project":{
value_id: "$oldest._id"
}
}
]

Mongodb Update/Upsert array exact match

I have a collection :
gStats : {
"_id" : "id1",
"criteria" : ["key1":"value1", "key2":"value2"],
"groups" : [
{"id":"XXXX", "visited":100, "liked":200},
{"id":"YYYY", "visited":30, "liked":400}
]
}
I want to be able to update a document of the stats Array of a given array of criteria (exact match).
I try to do this on 2 steps :
Pull the stat document from the array of a given "id" :
db.gStats.update({
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2096955"},{"value1" : "2015610"}]}
},
{
$pull : {groups : {"id" : "XXXX"}}
}
)
Push the new document
db.gStats.findAndModify({
query : {
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2015610"}, {"key2" : "2096955"}]}
},
update : {
$push : {groups : {"id" : "XXXX", "visited" : 29, "liked" : 144}}
},
upsert : true
})
The Pull query works perfect.
The Push query gives an error :
2014-12-13T15:12:58.571+0100 findAndModifyFailed failed: {
"value" : null,
"errmsg" : "exception: Cannot create base during insert of update. Cause
d by :ConflictingUpdateOperators Cannot update 'criteria' and 'criteria' at the
same time",
"code" : 12,
"ok" : 0
} at src/mongo/shell/collection.js:614

Neither query is working in reality. You cannot use a key name like "criteria" more than once unless under an operator such and $and. You are also specifying different fields (i.e groups) and querying elements that do not exist in your sample document.
So hard to tell what you really want to do here. But the error is essentially caused by the first issue I mentioned, with a little something extra. So really your { "$size": 2 } condition is being ignored and only the second condition is applied.
A valid query form should look like this:
query: {
"$and": [
{ "criteria" : { "$size" : 2 } },
{ "criteria" : { "$all": [{ "key1": "2015610" }, { "key2": "2096955" }] } }
]
}
As each set of conditions is specified within the array provided by $and the document structure of the query is valid and does not have a hash-key name overwriting the other. That's the proper way to write your two conditions, but there is a trick to making this work where the "upsert" is failing due to those conditions not matching a document. We need to overwrite what is happening when it tries to apply the $all arguments on creation:
update: {
"$setOnInsert": {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
},
"$push": { "stats": { "id": "XXXX", "visited": 29, "liked": 144 } }
}
That uses $setOnInsert so that when the "upsert" is applied and a new document created the conditions specified here rather than using the field values set in the query portion of the statement are used instead.
Of course, if what you are really looking for is truly an exact match of the content in the array, then just use that for the query instead:
query: {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
}
Then MongoDB will be happy to apply those values when a new document is created and does not get confused on how to interpret the $all expression.