I created an auto-incrementing sequence field using a counters collection and a getNextSequence() function (exactly as in the docs).
According to another document, JavaScript functions are stored in a special system collection named system.js.
But there is no such collection in my database (at least db.system.js.find() returns an empty result):
> db.dropDatabase();
{ "dropped" : "mongopa", "ok" : 1 }
> version()
3.2.5
> db.counters.insert({_id: "userid", seq: 0 })
WriteResult({ "nInserted" : 1 })
> db.counters.find()
{ "_id" : "userid", "seq" : 0 }
> function getNextSequence(name) {
... var ret = db.counters.findAndModify(
... {
... query: { _id: name },
... update: { $inc: { seq: 1 } },
... new: true
... }
... );
...
... return ret.seq;
... }
> db.system.js.find()
> show collections
counters
> db.users.insert({"login":"demo","user_id":getNextSequence("userid"),"password":"demo"})
WriteResult({ "nInserted" : 1 })
> db.users.find()
{ "_id" : ObjectId("574ff1c7436a1b4f9c6f47b9"), "login" : "demo", "user_id" : 1, "password" : "demo" }
> db.users.insert({"login":"demo2","user_id":getNextSequence("userid"),"password":"demo2"})
WriteResult({ "nInserted" : 1 })
> db.users.find()
{ "_id" : ObjectId("574ff1c7436a1b4f9c6f47b9"), "login" : "demo", "user_id" : 1, "password" : "demo" }
{ "_id" : ObjectId("574ff1d6436a1b4f9c6f47ba"), "login" : "demo2", "user_id" : 2, "password" : "demo2" }
>
So where is the getNextSequence function actually stored?
When you define the function as,
function getNextSequence(name) {
var ret = db.counters.findAndModify(
{
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
}
);
return ret.seq;
}
It is defined only for that particular shell session and is no longer available once the session ends. Hence, it is not saved anywhere.
To make the function reusable across sessions, you need to explicitly save it in system.js using,
db.system.js.save(
{
_id: "getNextSequence",
value: function(name){var ret = db.counters.findAndModify({
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
});
return ret.seq;}
})
Once you have saved the function, you can cross-check it by,
db.system.js.find()
In each new session you need to call
db.loadServerScripts();
It loads all the scripts saved in the system.js collection into that session.
For details, please check here.
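A side note on the function body itself, offered as a sketch rather than anything taken from the docs: findAndModify returns null when no counter document matches, so ret.seq would throw for a counter that was never inserted. The variant below takes the database handle as a parameter and adds upsert: true so the counter is created on first use. The makeFakeDb helper is a hypothetical in-memory stand-in, included only so the sketch can run outside the mongo shell; it is not part of MongoDB.

```javascript
// Same counter logic, with the database handle passed in explicitly.
function getNextSequence(db, name) {
  var ret = db.counters.findAndModify({
    query: { _id: name },
    update: { $inc: { seq: 1 } },
    new: true,
    upsert: true // assumption: create the counter document on first use
  });
  return ret.seq;
}

// Hypothetical in-memory stand-in for db.counters. It models only the
// options used above ($inc, upsert, new: true) and nothing else.
function makeFakeDb() {
  var docs = {};
  return {
    counters: {
      findAndModify: function (spec) {
        var id = spec.query._id;
        if (!(id in docs)) {
          if (!spec.upsert) return null; // no match and no upsert
          docs[id] = { _id: id };
        }
        var inc = spec.update.$inc;
        Object.keys(inc).forEach(function (field) {
          docs[id][field] = (docs[id][field] || 0) + inc[field];
        });
        return Object.assign({}, docs[id]);
      }
    }
  };
}

var fakeDb = makeFakeDb();
var firstId = getNextSequence(fakeDb, "userid");
var secondId = getNextSequence(fakeDb, "userid");
console.log(firstId, secondId);
```

In a real shell session the first argument would simply be db, and the counter would start at 1 without a prior insert into counters.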
I am new to MongoDB so this is probably a basic question (hopefully). I currently have 10 million records with 410 fields loaded in a mongodb collection like so:
{
"_id" : ObjectId("........"),
"AddressID" : 123455,
"IndividualId" : 1,
"personfirstname" : "FirstName",
"personmiddleinitial" : "M",
"personlastname" : "LastName",
"etc": "....."
}
I need to wrap all of this data into an embedded document like so:
{
"_id" : ObjectId("........"),
"data" : {
"AddressID" : 123455,
"IndividualId" : 1,
"personfirstname" : "FirstName",
"personmiddleinitial" : "M",
"personlastname" : "LastName",
"etc": "....."
}
}
I don't necessarily need to update this data in-place but that would be nice. If I need to export this data somehow specifying the new format and then re-import the new, updated data that is fine. Performing this via the MongoDB shell would be ideal.
As suggested by chridam within comments you can execute the following aggregation pipeline:
db.collectionName.aggregate([
{ $project: { _id: "$_id", data: "$$ROOT" } },
{ $out: "newCollectionName" }
]);
This way you have the _id field both at root level and in the data object. Thus, you can execute a massive update to unset the second one:
db.newCollectionName.updateMany(
{},
{ $unset: { "data._id": "" } }
);
Finally, you can drop the first collection and rename the second to restore the original name on the updated collection:
db.collectionName.drop();
db.newCollectionName.rename("collectionName");
This approach runs entirely inside the database, so none of your 10 million documents has to be fetched by the client.
You can simply do this in the shell with the following:
db.test.find().forEach(function(doc){
doc = { _id: doc._id, data: doc };
delete doc.data._id;
db.test.save(doc);
});
For example, if we insert the following documents:
> db.test.insertMany([
... {
... _id: ObjectId("5a91af8908e17c5997e03b7e"),
... field1: false,
... field2: 0,
... field3: "No"
... },
... {
... _id: ObjectId("5a91afbc08e17c5997e03b7f"),
... field1: true,
... field2: 1,
... field3: "Yes"
... }])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5a91af8908e17c5997e03b7e"),
ObjectId("5a91afbc08e17c5997e03b7f")
]
}
Then run:
db.test.find().forEach(function(doc){
doc = { _id: doc._id, data: doc };
delete doc.data._id;
db.test.save(doc);
});
Our documents now look like this:
> db.test.find().pretty()
{
"_id" : ObjectId("5a91af8908e17c5997e03b7e"),
"data" : {
"field1" : false,
"field2" : 0,
"field3" : "No"
}
}
{
"_id" : ObjectId("5a91afbc08e17c5997e03b7f"),
"data" : {
"field1" : true,
"field2" : 1,
"field3" : "Yes"
}
}
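The per-document transformation inside the forEach loop can also be isolated as a plain function, which makes it easy to check on sample documents before touching the real collection. This is a minimal sketch; wrapInData is a hypothetical helper name, and the _id is shown as a plain string only so the snippet runs outside the shell.

```javascript
// Returns a new document of the form { _id, data: { ...other fields } }
// without mutating the input document.
function wrapInData(doc) {
  var data = {};
  Object.keys(doc).forEach(function (key) {
    if (key !== "_id") data[key] = doc[key];
  });
  return { _id: doc._id, data: data };
}

var original = {
  _id: "5a91af8908e17c5997e03b7e",
  field1: false,
  field2: 0,
  field3: "No"
};
var wrapped = wrapInData(original);
console.log(JSON.stringify(wrapped));
```

Inside the shell, the same function could be called from the forEach callback and its result written back with db.test.save(), as in the loop above.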
I'm trying to implement a 'server-side counter-versioned' item in MongoDB and am trying to do the following (using the Java API):
Document dbDoc = dbCollection.findOneAndUpdate(
new Document("_id", "meta"),
new Document("$inc", new Document("version", 1))
.append("$setOnInsert", new Document("version", 0)),
new FindOneAndUpdateOptions().upsert(true)
.returnDocument(ReturnDocument.AFTER));
The intended logic is simple: if there is no record in the database, start counting from zero (with a fresh new object); otherwise, increment the counter.
The code sample fails with: 'Cannot update 'version' and 'version' at the same time'
My assumption is that in 'upsert' mode MongoDB should only use "$setOnInsert" when no matching item is found, but it works in some other way.
Is it possible to implement such an operation in one atomic MongoDB call?
PS: The MongoDB documentation regarding findOneAndUpdate() and upsert is fuzzy; at least I cannot work out from its description why this error arises.
There is also a similar question here, findAndModify fails with error: "Cannot update 'field1' and 'field1' at the same time", which has an accepted answer but again no clear reasoning.
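You can just remove the $setOnInsert operator: if the document does not exist, the field is set to the value specified in the $inc operator. The error occurs because MongoDB rejects any update in which two operators target the same field, whether or not the upsert ends up inserting a document.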
https://docs.mongodb.com/v3.2/reference/operator/update/inc/#behavior
If the field does not exist, $inc creates the field and sets the field to the specified value.
Example from the mongo shell:
> db.dropDatabase()
{ "ok" : 1 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 1 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 2 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 3 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 4 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 5 }
If you need to set a given version on the initial insert, MongoDB does not provide an operator that does this atomically; however, the following would be safe and is a common workaround:
> db.dropDatabase()
{ "dropped" : "test", "ok" : 1 }
> function updateMeta(){
... function update(){
... return db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {returnNewDocument: true});
... }
...
... var result = update();
...
... if(result === null){
... db.test.insert({_id: "meta", version: -10});
... result = update();
... }
...
... return result;
... }
>
> updateMeta()
{ "_id" : "meta", "version" : -9 }
> updateMeta()
{ "_id" : "meta", "version" : -8 }
> updateMeta()
{ "_id" : "meta", "version" : -7 }
> updateMeta()
{ "_id" : "meta", "version" : -6 }
> updateMeta()
{ "_id" : "meta", "version" : -5 }
>
We have a basic enquiry management tool that we're using to track some website enquiries in our administration suite, and we're using the ObjectId of each document in our enquiries collection to sort the enquiries by the date they were added.
{
"_id" : ObjectId("53a007db144ff47be1000003"),
"comments" : "This is a test enquiry. Please ignore. We'll delete it shortly.",
"customer" : {
"name" : "Test Enquiry",
"email" : "test#test.com",
"telephone" : "07890123456",
"mobile" : "07890123456",
"quote" : false,
"valuation" : false
},
"site" : [],
"test" : true,
"updates" : [
{
"_id" : ObjectId("53a007db144ff47be1000001"),
"status" : "New",
"status_id" : ObjectId("537de7c3a5e6e668ffc2335c"),
"status_index" : 100,
"substatus" : "New Web Enquiry",
"substatus_id" : ObjectId("5396bb9fa5e6e668ffc23388"),
"notes" : "New enquiry received from website.",
},
{
"_id" : ObjectId("53a80c977d299cfe91bacf81"),
"status" : "New",
"status_id" : ObjectId("537de7c3a5e6e668ffc2335c"),
"status_index" : 100,
"substatus" : "Attempted Contact",
"substatus_id" : ObjectId("53a80e06a5e6e668ffc2339e"),
"notes" : "In this test, we pretend that we've not managed to get hold of the customer on the first attempt.",
},
{
"_id" : ObjectId("53a80e539b966b8da5c40c36"),
"status" : "Approved",
"status_id" : ObjectId("52e77a49d85e95f00ebf6c72"),
"status_index" : 200,
"substatus" : "Enquiry Confirmed",
"substatus_id" : ObjectId("53901f1ba5e6e668ffc23372"),
"notes" : "In this test, we pretend that we've got hold of the customer after failing to contact them on the first attempt.",
}
]
}
Within each enquiry is an updates array of objects, each of which also has an ObjectId as its main identity field. We're using an $unwind and $group aggregation to pull the first and latest updates, as well as the count of updates, making sure we only take enquiries where there has been more than one update (as one is automatically inserted when the enquiry is made):
db.enquiries.aggregate([
{
$match: {
"test": true
}
},
{
$unwind: "$updates"
},
{
$group: {
"_id": "$_id",
"latest_update_id": {
$last: "$updates._id"
},
"first_update_id": {
$first: "$updates._id"
},
"update_count": {
$sum: 1
}
}
},
{
$match: {
"update_count": {
$gt: 1
}
}
}
])
This results in the following output:
{
"result" : [
{
"_id" : ObjectId("53a295ad122ea80200000005"),
"latest_update_id" : ObjectId("53a80bdc7d299cfe91bacf7e"),
"first_update_id" : ObjectId("53a295ad122ea80200000003"),
"update_count" : 2
},
{
"_id" : ObjectId("53a007db144ff47be1000003"),
"latest_update_id" : ObjectId("53a80e539b966b8da5c40c36"),
"first_update_id" : ObjectId("53a007db144ff47be1000001"),
"update_count" : 3
}
],
"ok" : 1
}
This is then passed through to our code (node.js, in this case) where we perform a few operations on it and then present some information on our dashboard.
Ideally, I'd like to add another $group pipeline aggregation to the query which would subtract the timestamp of first_update_id from the timestamp of latest_update_id to give us a timespan, which we could then use $avg on.
Can anyone tell me if this is possible? (Thank you!)
As Neil already pointed out, you can't get to the timestamp from the ObjectId in the aggregation framework.
You said that speed is not important, so using MapReduce you can get what you want:
var map = function() {
if (this.updates.length > 1) {
var first = this.updates[0];
var last = this.updates[this.updates.length - 1];
var diff = last._id.getTimestamp() - first._id.getTimestamp();
var val = {
latest_update_id : last._id,
first_update_id : first._id,
update_count : this.updates.length,
diff: diff
}
emit(this._id, val);
}
};
var reduce = function() { };
db.runCommand(
{
mapReduce: "enquiries",
map: map,
reduce: reduce,
out: "mrresults",
query: { test : true}
}
);
These are the results:
{
"_id" : ObjectId("53a007db144ff47be1000003"),
"value" : {
"latest_update_id" : ObjectId("53a80e539b966b8da5c40c36"),
"first_update_id" : ObjectId("53a007db144ff47be1000001"),
"update_count" : 3,
"diff" : 525944000
}
}
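The diff value can be cross-checked outside MongoDB, because the first 4 bytes (8 hex characters) of an ObjectId encode its creation time in seconds since the Unix epoch. The sketch below does the same subtraction in plain JavaScript for the two ids from the result above; objectIdToDate is a hypothetical helper name, not a shell function.

```javascript
// Decode the creation time embedded in the leading 8 hex characters
// of a 24-character ObjectId string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  var seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

var first = objectIdToDate("53a007db144ff47be1000001");
var last = objectIdToDate("53a80e539b966b8da5c40c36");

// Subtracting two Date objects yields the difference in milliseconds.
var diffMs = last - first;
console.log(diffMs); // 525944000
```

This matches the diff field in the map-reduce output, so the value is a timespan in milliseconds (a little over six days).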
Edit:
If you want to get the average diff for all documents you can do it like this:
var map = function() {
if (this.updates.length > 1) {
var first = this.updates[0];
var last = this.updates[this.updates.length - 1];
var diff = last._id.getTimestamp() - first._id.getTimestamp();
// emit the same shape that reduce returns, so re-reducing stays correct
emit("1", { count: 1, sum: diff });
}
};
var reduce = function(key, values) {
var reducedVal = { count: 0, sum: 0 };
for (var idx = 0; idx < values.length; idx++) {
reducedVal.count += values[idx].count;
reducedVal.sum += values[idx].sum;
}
return reducedVal;
};
var finalize = function (key, reducedVal) {
reducedVal.avg = reducedVal.sum/reducedVal.count;
return reducedVal;
};
db.runCommand(
{
mapReduce: "enquiries",
map: map,
reduce: reduce,
finalize : finalize,
out: "mrtest",
query: { test : true}
}
);
And the example output:
> db.mrtest.find().pretty()
{
"_id" : "1",
"value" : {
"count" : 2,
"sum" : 1051888000,
"avg" : 525944000
}
}
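One detail worth checking when averaging this way: MongoDB may call reduce more than once for the same key and feed earlier reduce output back in, so the reduce function has to accept its own return shape. The plain-JavaScript sketch below shows a count/sum reduce that gives the same answer whether it runs in one step or two. The diff values are hypothetical and chosen only for illustration.

```javascript
// Every value, mapped or already reduced, has the shape { count, sum }.
function reduce(key, values) {
  var out = { count: 0, sum: 0 };
  values.forEach(function (v) {
    out.count += v.count;
    out.sum += v.sum;
  });
  return out;
}

function finalize(key, reduced) {
  reduced.avg = reduced.sum / reduced.count;
  return reduced;
}

// Hypothetical diffs in milliseconds, as the map step would emit them.
var mapped = [1000, 2000, 6000].map(function (diff) {
  return { count: 1, sum: diff };
});

// Reduce everything at once...
var oneStep = reduce("1", mapped);
// ...or reduce a partial batch first and re-reduce with the remainder.
var partial = reduce("1", mapped.slice(0, 2));
var twoSteps = reduce("1", [partial, mapped[2]]);

var result = finalize("1", twoSteps);
console.log(oneStep, result);
```

Both paths give a count of 3 and a sum of 9000, so the average computed in finalize does not depend on how the server batches the reduce calls.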
I've answered this question on LinkedIn and I thought it's something useful and interesting to share. The question was:
"Suppose we have documents like {_id: ..., data: ..., timestamp: ...}.
Is there any way to write update criteria which will satisfy following rules:
1 If there is no documents with following _id then insert this document;
2 If there is exists document with following _id then
2.1 If new timestamp greater then stored timestamp then update data;
2.2 Otherwise do nothing"
The solution below should do the trick; you just need to ignore duplicate key errors. The example is given in the mongo shell:
> var lastUpdateTime = ISODate("2013-09-10")
> var newUpdateTime = ISODate("2013-09-12")
>
> lastUpdateTime
ISODate("2013-09-10T00:00:00Z")
> newUpdateTime
ISODate("2013-09-12T00:00:00Z")
>
> var id = new ObjectId()
> id
ObjectId("52310502f3bf4823f81e7fc9")
>
> // collection is empty, first update will do insert:
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : lastUpdateTime } },
... { $set: { ts: lastUpdateTime, data: 123 } },
... { upsert: true, multi: false }
... );
>
> db.testcol.find()
{ "_id" : ObjectId("52310502f3bf4823f81e7fc9"), "data" : 123, "ts" : ISODate("2013-09-10T00:00:00Z") }
>
> // try one more time to check that nothing happens (due to error):
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : lastUpdateTime } },
... { $set: { ts: lastUpdateTime, data: 123 } },
... { upsert: true, multi: false }
... );
E11000 duplicate key error index: test.testcol.$_id_ dup key: { : ObjectId('52310502f3bf4823f81e7fc9') }
>
> var tooOldToUpdate = ISODate("2013-09-09")
>
> // update does not happen because query condition does not match
> // and mongo tries to insert with the same id (and fails with dup again):
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : tooOldToUpdate } },
... { $set: { ts: tooOldToUpdate, data: 999 } },
... { upsert: true, multi: false }
... );
E11000 duplicate key error index: test.testcol.$_id_ dup key: { : ObjectId('52310502f3bf4823f81e7fc9') }
>
> // now query cond actually matches, so update rather than insert happens which works
> // as expected:
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : newUpdateTime } },
... { $set: { ts: newUpdateTime, data: 999 } },
... { upsert: true, multi: false }
... );
>
> // check that everything worked:
> db.testcol.find()
{ "_id" : ObjectId("52310502f3bf4823f81e7fc9"), "data" : 999, "ts" : ISODate("2013-09-12T00:00:00Z") }
>
The only annoying part is those errors, but they are cheap and safe.
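The three outcomes the conditional upsert is meant to produce can be written down as a small in-memory model, which is handy for checking expectations against the shell session above. This is only a sketch of the rules using a Map keyed by _id; it does not talk to MongoDB, and upsertIfNewer is a hypothetical name.

```javascript
// In-memory model of the rules: insert if absent, update if the new
// timestamp is greater than the stored one, otherwise do nothing.
function upsertIfNewer(store, doc) {
  var existing = store.get(doc._id);
  if (existing === undefined) {
    store.set(doc._id, doc);   // rule 1: no document with this _id
    return "inserted";
  }
  if (doc.ts > existing.ts) {
    store.set(doc._id, doc);   // rule 2.1: newer timestamp wins
    return "updated";
  }
  return "ignored";            // rule 2.2: the dup key error case
}

var store = new Map();
var outcomes = [
  upsertIfNewer(store, { _id: "a", ts: new Date("2013-09-10"), data: 123 }),
  upsertIfNewer(store, { _id: "a", ts: new Date("2013-09-10"), data: 123 }),
  upsertIfNewer(store, { _id: "a", ts: new Date("2013-09-09"), data: 999 }),
  upsertIfNewer(store, { _id: "a", ts: new Date("2013-09-12"), data: 999 })
];
console.log(outcomes, store.get("a").data);
```

The "ignored" outcomes correspond to the two E11000 errors in the shell transcript, and the final stored data is 999, as in the last find().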
db.collection.update({
_id: ObjectId("<id>"),
timestamp: { $lt: <newTimestamp> }
},
{timestamp: <newTimestamp>, data: <data>},
{upsert: true})
This operation updates the existing document if it exists and its stored timestamp is less than newTimestamp. If no document has that _id, it inserts a new one; if the document exists but its timestamp is not older, the attempted insert fails with a duplicate key error, which can be ignored.
I've run into some strange differences between the mongodb running on MongoHQ and the version running on my own development machine. Specifically, when calling .toString() on an object id inside a MapReduce map function, the results vary:
On my own machine:
ObjectId('foo').toString() // => 'foo'
On MongoHQ:
ObjectId('foo').toString() // => 'ObjectId(\'foo\')'
Note: the ids I use are actual MongoDB ids, not just 'foo' etc. as in these examples.
I would expect .toString() to behave like on my own machine - not how it's behaving on MongoHQ. How come it's not?
My local OSX version of MongoDB is installed using Homebrew and is version 2.0.1-x86_64
To show what's actually going on, I've built a little test case. Let's assume that we have a users collection with a friends attribute, being an array of user ids:
> db.users.find()
{ _id: ObjectId('a'), friends: [ObjectId('b'), ObjectId('c')] },
{ _id: ObjectId('b'), friends: [] },
{ _id: ObjectId('c'), friends: [] }
As you can see, a is friends with b and c, whereas b and c aren't friends with anybody.
Now let's look at a working test-algorithm:
var map = function() {
this.friends.forEach(function(f) {
emit(f, { friends: 1, user: user, friend: f.toString() });
});
};
var reduce = function(k, vals) {
var result = { friends: 0, user: [], friend: [] };
vals.forEach(function(val) {
result.friends += val.friends;
result.user.push(val.user);
result.friend.push(val.friend);
});
return result;
};
var id = ObjectId('50237c6d5849260996000002');
var query = {
query : { friends: id },
out : { inline: 1 },
scope : { user: id.toString() },
jsMode : true,
verbose : true
};
db.users.mapReduce(map, reduce, query);
Assuming id is set to an id of a user who is a friend of someone in the users collection, then the output returned by the mapReduce method on MongoHQ will look like this:
{
"results" : [
{
"_id" : ObjectId("50237c555849260996000001"),
"value" : {
"friends" : 1,
"user" : "50237c6d5849260996000002",
"friend" : "ObjectId(\"50237c555849260996000001\")"
}
},
{
"_id" : ObjectId("50237c74c271be07f6000002"),
"value" : {
"friends" : 1,
"user" : "50237c6d5849260996000002",
"friend" : "ObjectId(\"50237c74c271be07f6000002\")"
}
}
],
"timeMillis" : 0,
"timing" : {
"mapTime" : 0,
"emitLoop" : 0,
"reduceTime" : 0,
"mode" : "mixed",
"total" : 0
},
"counts" : {
"input" : 1,
"emit" : 2,
"reduce" : 0,
"output" : 2
},
"ok" : 1,
}
As you can see, the friend attribute in each result is not just a string containing the id, but a string containing the actual method call.
Had I run this on my own machine, the results array would have been:
{
"_id" : ObjectId("50237c555849260996000001"),
"value" : {
"friends" : 1,
"user" : "50237c6d5849260996000002",
"friend" : "50237c555849260996000001"
}
},
{
"_id" : ObjectId("50237c74c271be07f6000002"),
"value" : {
"friends" : 1,
"user" : "50237c6d5849260996000002",
"friend" : "50237c74c271be07f6000002"
}
}
MongoHQ is running a different version of MongoDB than you are.
To get the behavior of your homebrew version, try changing your map function:
var map = function() {
this.friends.forEach(function(f) {
// user is already a hex string (passed in via scope); f is an ObjectId
emit(f, { friends: 1, user: user, friend: f.str });
});
};
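If results produced by both versions have to be handled in the same application code, one option is to normalise the string form after the fact. The sketch below accepts either the bare hex string or the 'ObjectId("...")' form and returns the hex string; toHexId is a hypothetical helper name, and the approach is an assumption about what the consuming code needs rather than anything MongoDB provides.

```javascript
// Accepts either "50237c55..." or 'ObjectId("50237c55...")' and returns
// the 24-character hex string. Throws on anything else.
function toHexId(value) {
  var pattern = /^(?:ObjectId\(["']([0-9a-fA-F]{24})["']\)|([0-9a-fA-F]{24}))$/;
  var match = pattern.exec(String(value));
  if (!match) {
    throw new Error("not an ObjectId string: " + value);
  }
  return match[1] || match[2];
}

var fromWrapped = toHexId('ObjectId("50237c555849260996000001")');
var fromPlain = toHexId("50237c555849260996000001");
console.log(fromWrapped, fromPlain);
```

Both calls return the same hex string, so downstream comparisons no longer depend on which toString() behaviour the server had.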