exclude find query fields while inserting document using upsert:true - mongodb

db.getCollection('placeFollow').update(
  {
    "_id": ObjectId("5af19959204438676c0d5268"),
    "count": { "$lt": 2 }
  },
  {
    "$set": { "data": "check" }
  },
  { "upsert": true }
)
Error: E11000 duplicate key error collection: geoFame.placeFollow
index: _id_ dup key: { : ObjectId('5af19959204438676c0d5268') }
Indexes
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "geoFame.placeFollow"
}
]
I want to save the document if it doesn't exist, but the above query tries to insert the _id given in the find query, which throws a duplicate key error. How can I exclude the find-query fields when the document is inserted?

The core problem is of course supplying the _id, but you need to actually set the "count" field as well. Using $setOnInsert is probably what you want:
db.getCollection('placeFollow').update(
{ "count":{"$lt":2} },
{
"$set":{ "data": "check" },
"$setOnInsert": { "count": 2 }
},
{"upsert":true}
)
So you need to do something like that with whatever "default" or otherwise expected data should be in there. Otherwise MongoDB does not find a document, because the field is not there, and attempts an insert with the same _id value as provided.
Because your earlier query already ran, you first need to remove the document that was created without the "count" field:
db.getCollection('placeFollow').deleteOne(
{ "_id":ObjectId("5af19959204438676c0d5268") }
)
Either that, or set all existing documents to a default value of `count`.
But basically you cannot use both a "primary key" and another condition with "upsert": if the document does not meet the condition, the upsert is attempted, and of course an existing value for _id must be unique.
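To make that failure mode concrete, here is a minimal in-memory sketch in plain JavaScript of the decision the server makes (not the actual server implementation; the function name, array "collection", and error text are illustrative only):

```javascript
// Sketch of upsert semantics: when the filter does not match, the insert
// is built from the filter's equality fields -- including _id.
function upsert(collection, filterId, maxCount, setFields) {
  // Try to match BOTH conditions, as the original filter does.
  // A missing "count" field fails the $lt comparison, just as in MongoDB.
  const match = collection.find((d) => d._id === filterId && d.count < maxCount);
  if (match) {
    Object.assign(match, setFields); // normal $set path
    return { matched: true };
  }
  // No match, so an insert is attempted with the filter's _id...
  if (collection.some((d) => d._id === filterId)) {
    // ...which collides with the document that failed the count check.
    throw new Error("E11000 duplicate key error");
  }
  collection.push({ _id: filterId, ...setFields });
  return { matched: false };
}
```

With a document that has the given `_id` but no `count`, the filter fails, the insert reuses that `_id`, and the duplicate key error is raised, which is exactly the behaviour described above.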

Related

How to set the value of a field if it doesn't exist in a document in mongodb and keep it as it is if it is already present?

So assume I have a document as shown below
{
"_id" : ObjectId("6336d94e0330f5d48e44fb0f"),
"systemId" : "5124301",
"userId" : "9876543210",
"tempId" : "123456da87sdafasdf",
"updatedAt" : ISODate("2022-09-30T12:01:11.714+0000"),
"receivedAt" : ISODate("2022-04-10T23:15:08.145+0000"),
}
Now I already have a tempId assigned to the doc, but sometimes that field might expire and no longer exist inside the document. When I update the document with a different receivedAt (or any other) parameter, I want to assign a tempId only if the document does not already have one; otherwise the existing tempId should be left as it is.
What should the query be to produce the updated docs shown in the two examples?
Case 1: If tempId exists:
{
"_id" : ObjectId("6336d94e0330f5d48e44fb0f"),
"systemId" : "5124301",
"userId" : "1234567890",
"tempId" : "123456da87sdafasdf",
"updatedAt" : ISODate("2022-09-30T12:01:11.714+0000"),
"receivedAt" : ISODate("2022-04-10T23:15:08.145+0000"),
}
Case 2: If there is no tempId, one is generated (say 13qeqrwrqwtqrfsdfweqr) and the document needs to be updated with it:
{
"_id" : ObjectId("6336d94e0330f5d48e44fb0f"),
"systemId" : "5124301",
"userId" : "1234567890",
"tempId" : "13qeqrwrqwtqrfsdfweqr",
"updatedAt" : ISODate("2022-09-30T12:01:11.714+0000"),
"receivedAt" : ISODate("2022-04-10T23:15:08.145+0000"),
}
Query would be something like this
findOneAndUpdate(
  { systemId: "5124301" },
  {
    $set: {
      userId: "1234567890",
      receivedAt: ISODate("2022-04-10T23:15:08.145+0000"),
      tempId: // if it exists, leave it as is; else update it with 13qeqrwrqwtqrfsdfweqr
    }
  }
)
Use an update with an aggregation pipeline. Use the $cond operator to check whether tempId is not equal ($ne) to undefined:
If true, keep the existing value $tempId.
If false, assign the value "13qeqrwrqwtqrfsdfweqr".
findOneAndUpdate({
systemId: "5124301"
},
[
{
$set: {
userId: "1234567890",
receivedAt: ISODate("2022-04-10T23:15:08.145Z"),
tempId: {
$cond: {
if: {
$ne: [
"$tempId",
undefined
]
},
then: "$tempId",
else: "13qeqrwrqwtqrfsdfweqr"
}
}
}
}
])
I agree with @Yong Shun that using an aggregation pipeline to describe the transformation is the correct way to approach this. I offer an alternative syntax below just for general reference, though both will satisfy the request as stated in the question perfectly fine.
Mainly I'm leaving an additional answer because I'm curious about why this workflow around tempId exists. What do they represent, why can they coexist with userIds, and why is the application generating a new one for each of these write operations even if one may already exist? One thing that comes to mind is that you can construct the filter predicates for your update to include references to tempId (perhaps to only have the application generate a new one if needed). But more importantly I suspect that the entire workflow with tempId should be simplified, though that would require more specific knowledge about the application to say for sure.
With respect to the alternative syntax, the tempId portion of the pipeline can be simplified to use the $ifNull operator:
tempId: { $ifNull: [ "$tempId", "13qeqrwrqwtqrfsdfweqr" ] }
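The $ifNull semantics are easy to state in plain JavaScript. This is a sketch of the behaviour only (the helper name is made up, not a driver API): return the replacement when the value is null or missing, otherwise keep the value.

```javascript
// Minimal model of $ifNull: a missing field surfaces as undefined,
// and both null and undefined trigger the replacement expression.
function ifNull(value, replacement) {
  return value === null || value === undefined ? replacement : value;
}

// The two cases from the question:
const withTemp = { tempId: "123456da87sdafasdf" };
const withoutTemp = {}; // tempId expired / absent

ifNull(withTemp.tempId, "13qeqrwrqwtqrfsdfweqr");    // keeps the existing id
ifNull(withoutTemp.tempId, "13qeqrwrqwtqrfsdfweqr"); // fills in the new id
```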

Optimize Mongodb documents versioning

In my application I need to load a lot of data, compare it to existing documents inside a specific collection, and version them.
To do this, for every new document I have to insert, I query for the last version using a specific key (not _id), grouping the data together to find the latest version.
Example of data:
{
  "_id" : ObjectId("5c73a643f9bc1c2fg4ca6ef5"),
  "data" : {
    the data
  },
  "key" : {
    "value1" : "545454344",
    "value2" : "123212321",
    "value3" : "123123211"
  },
  "version" : NumberLong("1")
}
As you can see, key is composed of three values, related to data and my query to find last version is the following:
db.collection.aggregate([
  {
    "$sort" : {
      "version" : NumberInt("-1")
    }
  },
  {
    "$group" : {
      "_id" : "$key",
      "content" : {
        "$push" : "$data"
      },
      "version" : {
        "$push" : "$version"
      },
      "_oid" : {
        "$push" : "$_id"
      }
    }
  },
  {
    "$project" : {
      "data" : {
        "$arrayElemAt" : [ "$content", NumberInt("0") ]
      },
      "version" : {
        "$arrayElemAt" : [ "$version", NumberInt("0") ]
      },
      "_id" : {
        "$arrayElemAt" : [ "$_oid", NumberInt("0") ]
      }
    }
  }
])
To improve performance (from exponential to linear), I built an index that covers key and version:
db.getCollection("collection").createIndex({ "key": 1, "version" : 1})
So my question is: are there other capabilities/strategies to optimize this search?
Notes
in this collection there are some other fields I already use to filter data using $match, omitted for brevity
my prerequisite is to load a lot of data and process it one by one before insert: if there is a better approach to calculating the version, I can also consider changing this
I'm not sure whether a unique index on key could do the same as my query. I mean, if I create a unique index on key and version, I would have uniqueness on that pair and could iterate on it, for example:
no data in the collection: just insert the first version
insert a new document: try to insert version 1, get an error, iterate; this should hit the unique index, right?
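The iterate-on-duplicate idea sketched in the notes above can be modelled in plain JavaScript. This is only an illustration of the control flow (the function name is made up, and a Set stands in for the unique { key, version } index; in real MongoDB you would catch the E11000 error from insertOne and retry):

```javascript
// Hypothetical retry loop over a unique {key, version} index.
function insertNextVersion(collection, index, key, data) {
  let version = 1;
  for (;;) {
    const indexKey = `${key}|${version}`;
    if (!index.has(indexKey)) {   // the unique index admits this pair
      index.add(indexKey);
      collection.push({ key, version, data });
      return version;
    }
    version += 1;                 // duplicate key: try the next version
  }
}
```

Note that starting at version 1 every time makes each insert cost O(existing versions); in practice you would start from the last version you already know, or use a counter as the next answer suggests.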
I had a similar situation and this is how I solved it.
Create a separate collection that will hold each key and its corresponding latest version, say KeyVersionCollection
Make this collection in-memory for faster response
Store the key in the "_id" field
When inserting a document in your versioned collection, say EntityVersionedCollection:
Query the latest version from KeyVersionCollection
Increment the version number by 1, or insert a new document with version 0 in KeyVersionCollection
You can even combine the above 2 operations into 1 (https://docs.mongodb.com/manual/reference/method/db.collection.findAndModify/#db.collection.findAndModify)
Use the new version number to insert the document in EntityVersionedCollection
This will save the time spent on aggregation and sorting. On a side note, I would keep the latest versions in a separate collection - EntityCollection. In that case, for each entity, insert a new version in EntityVersionedCollection and upsert it in EntityCollection.
In corner cases, where the process is interrupted between getting a new version number and using it while inserting the entity, you might see that a version is skipped in EntityVersionedCollection; but that should be OK. Use timestamps to track inserts/updates so that they can be used to correlate/audit in the future.
Hope that helps.
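The counter-collection flow above can be sketched in plain JavaScript. This is a model of the idea only, not driver code: a Map stands in for KeyVersionCollection, an array for EntityVersionedCollection, and the findAndModify-style bump is collapsed into one step (function names are illustrative):

```javascript
// Bump (or create) the per-key counter; this models the atomic
// findAndModify against KeyVersionCollection.
function nextVersion(keyVersions, key) {
  const current = keyVersions.get(key) ?? 0; // absent key starts at 0
  const next = current + 1;
  keyVersions.set(key, next);
  return next;
}

// Insert into the versioned collection using the freshly issued version.
function insertVersioned(keyVersions, versionedColl, key, data) {
  const version = nextVersion(keyVersions, key);
  versionedColl.push({ key, version, data });
  return version;
}
```

This replaces the sort-and-group aggregation with a single counter lookup per insert.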
You can simply pass an array of documents into MongoDB's insert function, and it will insert the entire payload in one call without memory problems.
You're welcome

MongoDB querying to with changing values for key

I'm trying to get back into MongoDB and I've come across something that I can't figure out.
I have this data structure
> db.ratings.find().pretty()
{
"_id" : ObjectId("55881e43424cbb1817137b33"),
"e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
"type" : "like",
"time" : 1434984003156,
"u_id" : ObjectId("55817c072e48b4b60cf366a7")
}
{
"_id" : ObjectId("55893be1e6a796c0198e65d3"),
"e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
"type" : "dislike",
"time" : 1435057121808,
"u_id" : ObjectId("55817c072e48b4b60cf366a7")
}
{
"_id" : ObjectId("55893c21e6a796c0198e65d4"),
"e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
"type" : "null",
"time" : 1435057185089,
"u_id" : ObjectId("55817c072e48b4b60cf366a7")
}
What I want to be able to do is count the documents that have either a like or a dislike, leaving the "null" out of the count, so I should get a count of 2. I tried to go about it like this, setting the query to both fields:
db.ratings.find({e_id: ObjectId("5565e106cd7a763b2732ad7c")}, {type: "like", type: "dislike"})
But this just prints out all three documents. Is there any reason?
If it's glaringly obvious I'm sorry, pulling my hair out at the moment.
Use the following db.collection.count() method which returns the count of documents that would match a find() query:
db.ratings.count({
"e_id": ObjectId("5565e106cd7a763b2732ad7c"),
type: {
"$in": ["like", "dislike"]
}
})
The db.collection.count() method is equivalent to the db.collection.find(query).count() construct. Your query selection criteria above can be interpreted as:
Get me the count of all documents which have the e_id field values as ObjectId("5565e106cd7a763b2732ad7c") AND the type field which has either value "like" or "dislike", as depicted by the $in operator that selects the documents where the value of a field equals any value in the specified array.
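The filter's semantics can be stated in plain JavaScript. This is just a sketch of what the query matches (the function name is made up; e_id values are plain strings here rather than ObjectIds):

```javascript
// A document counts when e_id matches AND type is in the allowed list,
// which is exactly what the $in operator expresses.
function countLikesDislikes(ratings, eId) {
  return ratings.filter(
    (r) => r.e_id === eId && ["like", "dislike"].includes(r.type)
  ).length;
}

const ratings = [
  { e_id: "5565e106cd7a763b2732ad7c", type: "like" },
  { e_id: "5565e106cd7a763b2732ad7c", type: "dislike" },
  { e_id: "5565e106cd7a763b2732ad7c", type: "null" },
];
countLikesDislikes(ratings, "5565e106cd7a763b2732ad7c"); // 2 -- "null" is excluded
```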
db.ratings.find({e_id: ObjectId("5565e106cd7a763b2732ad7c")},
{type: "like", type: "dislike"})

But this just prints out all three documents. Is there any reason? If it's glaringly obvious I'm sorry, pulling my hair out at the moment.
The second argument here is the projection used by the find method. It specifies the fields that should be included -- regardless of their value. Normally you specify a boolean value of 1 or true to include the field; MongoDB simply accepts other truthy values as true.
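Two things compound here, and both can be shown in plain JavaScript (a sketch only; the `project` helper is made up, not driver code): a repeated key in an object literal keeps only the last value, and a projection includes a field for any truthy value.

```javascript
// In a JS/shell object literal, a duplicated key silently collapses:
const projection = { type: "like", type: "dislike" }; // becomes { type: "dislike" }

// Minimal model of an inclusion projection: any truthy value means
// "include this field", regardless of what the value is.
function project(doc, proj) {
  const out = { _id: doc._id }; // _id is included by default
  for (const [field, value] of Object.entries(proj)) {
    if (value) out[field] = doc[field];
  }
  return out;
}
```

So `{type: "like", type: "dislike"}` never filters anything; it just asks for the `type` field back on every matching document, which is why all three documents were returned.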
If you only need to count documents, you should issue a count command:
> db.runCommand({count: 'collection',
query: { "e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
type: { $in: ["like", "dislike"]}}
})
{ "n" : 2, "ok" : 1 }
Please note the Mongo Shell provides the count helper for that:
> db.collection.find({ "e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
type: { $in: ["like", "dislike"]}}).count()
2
That being said, to quote the documentation, using the count command "can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress." To avoid that, you might prefer using the aggregation framework:
> db.collection.aggregate([
{ $match: { "e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
type: { $in: ["like", "dislike"]}}},
{ $group: { _id: null, n: { $sum: 1 }}}
])
{ "_id" : null, "n" : 2 }
This query should solve your problem
db.ratings.find({$or : [{"type": "like"}, {"type": "dislike"}]}).count()

Casbah MongoDB, how to both add and remove values to an array in a single operation, to multiple documents?

After searching, I was unable to figure out how to perform multiple updates to a single field.
I have a document with a "tags" array field. Every document will have random tags before I begin the update. In a single operation, I want to add some tags and remove some tags.
The following update operator returns an error "Invalid modifier specified: $and"
updateOperators: { "$and" : [
{ "$addToSet" : { "tags" : { "$each" : [ "tag_1" , "tag_2"]}}},
{ "$pullAll" : { "tags" : [ "tag_2", "tag_3"]}}]}
collection.update(query, updateOperators, multi=true)
How do I both add and remove values to an array in a single operation, to multiple documents?
You don't need the $and with the update query, but you cannot apply two update operators to the same field in a single update - as you would see if you tried the following in the shell:
db.test.update({}, { "$addToSet" : { "tags" : { "$each" : [ "tag_1" , "tag_2"]}},
"$pullAll" : { "tags" : [ "tag_2", "tag_3"] }}, true, false)
You would get a Cannot update 'tags' and 'tags' at the same time error message. So how to achieve this? Well with this schema you would need to do it in multiple operations, you could use the new bulk operation api as shown below (shell):
var bulk = db.coll.initializeOrderedBulkOp();
bulk.find({ "tags": 1 }).updateOne({ "$addToSet": { "tags": { "$each" : [ "tag_1" , "tag_2"] } } });
bulk.find({ "tags": 1 }).updateOne({ "$pullAll": { "tags": [ "tag_2", "tag_3"] } });
bulk.execute();
Or in Casbah with the dsl helpers:
val bulk = collection.initializeOrderedBulkOperation
bulk.find(MongoDBObject("tags" -> 1)).updateOne($addToSet("tags") $each("tag_1", "tag_2"))
bulk.find(MongoDBObject("tags" -> 1)).updateOne($pullAll("tags" -> ("tag_2", "tag_3")))
bulk.execute()
It's not atomic and there is no guarantee that nothing else will try to modify the document in between, but it is as close as you will currently get.
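The net effect the two bulk operations aim for can be computed up front in plain JavaScript (a sketch; the helper name is illustrative): apply the $addToSet additions, then the $pullAll removals.

```javascript
// Model of "$addToSet with $each, then $pullAll" on one tags array.
// A Set gives the no-duplicates behaviour of $addToSet for free.
function applyTagOps(tags, toAdd, toPull) {
  const set = new Set(tags);
  toAdd.forEach((t) => set.add(t));    // $addToSet: add if absent
  toPull.forEach((t) => set.delete(t)); // $pullAll: remove all listed
  return [...set];
}

applyTagOps(["tag_3"], ["tag_1", "tag_2"], ["tag_2", "tag_3"]); // ["tag_1"]
```

Computing the final array like this is also what the next answer suggests: build the tags you want and $set the whole array in a single atomic update.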
Mongo does atomic updates so you could just construct the tags you want in the array and then replace the entire array.
I would advise against storing these values together in an array, as this is an "unbounded" array of tags. Unbounded arrays cause document movement on disk, which forces indexes to be updated and makes the OS and mongo do extra work.
Instead you should store each tag as a separate document in a different collection and "bucket" them based on the _id of the related document.
Example
{_id : <_id> <key> <value>} - single document
This will allow you to query for all the tags for a single user with db.collection.find({_id : /^<_id>/}) and bucket the results.

MongoDB Update for array of object

I am having following document in mongodb
{
  "_id" : ObjectId("521aff65e4b06121b688f076"),
  "uuid" : "160597270101684",
  "sessionId" : "160597270101684.1",
  "stamps" : {
    "currentVisit" : "1377500985",
    "lastVisit" : "1377500985"
  },
  "visits" : [
    {
      "page" : "google.com",
      "method" : "GET"
    }
  ]
}
Requirement:
If uuid and sessionId are not present I will insert the document as above; otherwise I have to push only the object onto the visits array.
Any help will be greatly appreciated.
MongoDB supports an upsert option on update that updates the matching document if it exists, and inserts a new document if it doesn't exist. In MongoDB 2.4+ you can use the $setOnInsert operator to further tweak this to set certain fields only if the upsert performs an insert.
db.test.update({
uuid: "160597270101684",
sessionId: "160597270101684.1"
}, {
$setOnInsert: {
stamps: {
currentVisit: "1377500985",
lastVisit: "1377500985"
}
},
$push:{
visits: {
page: "google.com",
method: "GET"
}
}
}, { upsert:true })
So in the above example, the $push to visits will always occur but the $setOnInsert to stamps will only occur if the matching document doesn't already exist.
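The two branches of that upsert can be modelled in a few lines of plain JavaScript (an in-memory sketch; the array and function names are illustrative, not driver API):

```javascript
// $setOnInsert fields apply only when the document is created;
// $push applies on both the insert and the update path.
function upsertVisit(docs, filter, onInsert, visit) {
  let doc = docs.find(
    (d) => d.uuid === filter.uuid && d.sessionId === filter.sessionId
  );
  if (!doc) {
    doc = { ...filter, ...onInsert, visits: [] }; // insert: $setOnInsert fires
    docs.push(doc);
  }
  doc.visits.push(visit); // $push fires either way
  return doc;
}
```

Calling this twice with the same uuid/sessionId leaves one document whose original stamps are untouched and whose visits array has grown by one entry per call, mirroring the behaviour described above.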
You can achieve this with the following upsert query:
db.session.update(
  { "uuid" : "160597270101684", "sessionId" : "160597270101684.1" },
  {
    $set : { "stamps" : { "currentVisit" : "1377500985", "lastVisit" : "1377500985" } },
    $push : { visits : { "page" : "yahoo.com", "method" : "POST" } }
  },
  { upsert : true }
)
You can use $addToSet instead of $push if you want to avoid duplicates