MongoDB update with matching shard key and multiple=true - mongodb

MongoDB requires that all update() operations on a sharded collection that specify the 'multi: false' option include the shard key in the query condition, so that the query can be routed to a single shard. If no shard key is found and 'multi: false' is set, it returns this error (see http://docs.mongodb.org/manual/core/sharded-cluster-query-router/):
update does not contain _id or shard key for pattern
I am switching my code to use a sharded collection. My code uses update() with 'multi: true' by default, and I don't want to change that default just to avoid the potential error above. My question: if I include the shard key in an update() with 'multi: true', will mongos be smart enough to route the query to the specific shard using the shard key, rather than broadcasting it because of 'multi: true'?
EDIT:
Check out this code, which confirms what @wdberkeley said.
Version 2.4:
https://github.com/mongodb/mongo/blob/v2.4/src/mongo/s/strategy_shard.cpp#L941
Version 2.6:
https://github.com/mongodb/mongo/blob/v2.6/src/mongo/s/chunk_manager_targeter.cpp#L250

Yes. If you have the shard key in the query, like
> db.myShardedCollection.update({ "shardKey" : 22, "category" : "frogs" }, { "$set" : { "category" : "amphibians" } }, { "multi" : true })
then mongos can use the shard key to direct the update to just the shard whose key range includes the value 22. Whether it updates 1 document or 1000, all the affected documents have shardKey = 22, so all of them will be found on the shard whose range contains 22. This also works for a range query like
> db.myShardedCollection.update({ "shardKey" : { "$gte" : 22 }, "category" : "frogs" }, { "$set" : { "category" : "amphibians" } }, { "multi" : true })
except when the shard key is hashed, since hashing does not preserve order.
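The difference comes down to what mongos can do with each query shape. A sketch, reusing the collection and field names from above:

// Equality on a hashed shard key: mongos hashes 22, locates the owning
// chunk, and can still target a single shard, even with multi: true.
db.myShardedCollection.update(
    { "shardKey": 22 },
    { "$set": { "category": "amphibians" } },
    { "multi": true }
)

// Range on a hashed shard key: hashing does not preserve order, so the
// matching documents are scattered across chunks and mongos must
// broadcast the update to every shard.
db.myShardedCollection.update(
    { "shardKey": { "$gte": 22 } },
    { "$set": { "category": "amphibians" } },
    { "multi": true }
)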

Related

mongodb mapreduce doesn't return right in a sharded cluster

Very interesting: mapReduce works fine on a single instance, but not on a sharded collection. As you can see below, I have a collection and wrote a simple map-reduce function:
mongos> db.tweets.findOne()
{
    "_id" : ObjectId("5359771dbfe1a02a8cf1c906"),
    "geometry" : {
        "type" : "Point",
        "coordinates" : [
            131.71778292855996,
            0.21856835860911106
        ]
    },
    "type" : "Feature",
    "properties" : {
        "isflu" : 1,
        "cell_id" : 60079,
        "user_id" : 35,
        "time" : ISODate("2014-04-24T15:42:05.048Z")
    }
}
mongos> db.tweets.find({"properties.user_id":35}).count()
44247
mongos> map_flow
function () { var key=this.properties.user_id; var value={ "cell_id":1}; emit(key,value); }
mongos> reduce2
function (key,values){ var ros={flows:[]}; values.forEach(function(v){ros.flows.push(v.cell_id);});return ros;}
mongos> db.tweets.mapReduce(map_flow,reduce2, { out:"flows2", sort:{"properties.user_id":1,"properties.time":1} })
But the results are not what I want:
mongos> db.flows2.find({"_id":35})
{ "_id" : 35, "value" : { "flows" : [ null, null, null ] } }
I get lots of nulls, and interestingly there are always exactly three of them. Does MongoDB mapReduce not work correctly on sharded collections?
The number one rule of MapReduce is:
thou shalt emit values of the same type as thy reduce function returneth
You broke this rule, so your MapReduce only works for small collections where reduce is called at most once for each key (that's the second rule of MapReduce: the reduce function may be called zero, one, or many times).
Your map function emits exactly this value {cell_id:1} for each document.
How does your reduce function use this value? Well, you return a value which is a document with an array, into which you push the cell_id value. This is strange already, because that value was 1, so I'm not sure why you wouldn't just emit 1 (if you wanted to count).
But look what happens when multiple shards push a bunch of 1's into this flows array (whether or not it's what you intended, that's what your code is doing) and reduce is then called on several already-reduced values:
reduce(key, [ {flows:[1,1,1,1]},{flows:[1,1,1,1,1,1,1,1,1]}, etc ] )
Your reduce function now tries to take each member of the values array (each of which is a document with a single field, flows) and push v.cell_id into your flows array. There is no cell_id field there, so of course you end up with null. And three nulls could be because you have three shards.
I would recommend that you articulate to yourself what exactly you are trying to aggregate in this code, and then rewrite your map and your reduce to comply with the rules that mapReduce in MongoDB expects your code to follow.
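For example, if the goal is to collect the list of cell_ids per user, a sketch of a rewrite that follows both rules (emit the same shape that reduce returns, and make reduce safe to run on its own output) might look like this:

var map_flow = function () {
    // Emit a value shaped exactly like what reduce returns.
    emit(this.properties.user_id, { "flows": [ this.properties.cell_id ] });
};

var reduce2 = function (key, values) {
    // Concatenate the partial arrays; because the shape never changes,
    // reduce can safely be called again on already-reduced values.
    var ros = { "flows": [] };
    values.forEach(function (v) {
        ros.flows = ros.flows.concat(v.flows);
    });
    return ros;
};

db.tweets.mapReduce(map_flow, reduce2, { "out": "flows2" });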

event streaming via mongodb. Get last inserted events

I am consuming data from an existing database that stores system events. My service should poll this database on a timer, check whether new events have been created, and then fetch and handle them; something like a simple queue implementation.
The question is: how can I get the new docs each time I check the database? I can't use timestamps, because events come into the database from different sources and there is no ordering among them. So I need to rely on insertion order only.
There are a couple of options.
The first, and easiest if it matches your use case, is to use a capped collection. A capped collection is a collection of a pre-defined size that acts as a sort of ring buffer: once the collection is full, it starts overwriting the oldest documents. To iterate over the collection you simply create a "tailable" cursor. You will need some way of identifying the last document processed (even a simple "done" flag in the document could work, but it would have to exist when the document is inserted). If you truly can't modify the documents in any way, then you could save off the last processed document somewhere, use a coarse timestamp to approximate the start position, and look for that last document before processing more documents.
The only real issue with this solution is that you are limited in the number of documents you can hold in the collection, and it won't grow over time. There are also limits on the write operations you can perform on the documents (they can't grow), but it does not sound like you are modifying them anyway.
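A minimal sketch of this approach in the shell (the "events" collection name and size are placeholders of mine):

// Create the capped collection (size in bytes); it behaves as a ring buffer.
db.createCollection("events", { "capped": true, "size": 1024 * 1024 });

// Tail it: the cursor stays open and yields documents in insertion order.
var cursor = db.events.find()
                      .addOption(DBQuery.Option.tailable)
                      .addOption(DBQuery.Option.awaitData);
while (cursor.hasNext()) {
    var doc = cursor.next();
    // Process doc here, then persist some marker for the last one handled.
    printjson(doc);
}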
The second option, which is more complex, is to use the oplog. For a standalone configuration you will still need to pass the --replSet option to create and use the oplog; you just won't configure the replica set. In a sharded configuration you will need to tail each replica set separately. The oplog contains a document for every insert, update, and delete done on all collections/documents on the server. Each entry contains a timestamp, an operation, and an id (at a minimum). Here are examples of each.
Insert
{ "ts" : { "t" : 1362958492000, "i" : 1 },
"h" : NumberLong("5915409566571821368"), "v" : 2,
"op" : "i",
"ns" : "test.test",
"o" : { "_id" : "513d189c8544eb2b5e000001" } }
Delete
{ ... "op" : "d", ..., "b" : true,
"o" : { "_id" : "513d189c8544eb2b5e000001" } }
Update
{ ... "op" : "u", ...,
"o2" : { "_id" : "513d189c8544eb2b5e000001" },
"o" : { "$set" : { "i" : 1 } } }
The timestamps are generated on the server and are guaranteed to be monotonically increasing, which allows you to quickly find the documents of interest.
This option is the most robust but requires some work on your part.
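As a hedged sketch of the polling side (the database, namespace, and saved timestamp below are placeholders of mine; with --replSet the oplog lives in local.oplog.rs):

var local = db.getSiblingDB("local");
var lastSeen = Timestamp(1362958492, 1);  // persisted from the previous run

// Fetch inserts into the watched namespace that happened after the last run.
local.oplog.rs.find({
    "ts": { "$gt": lastSeen },
    "op": "i",
    "ns": "mydb.events"
}).forEach(function (entry) {
    printjson(entry.o);  // entry.o holds the inserted document
});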
I wrote some demo code to create a "watcher" on a collection that is almost what you want. You can find that code on GitHub. Specifically look at the code in the com.allanbank.mongodb.demo.coordination package.
HTH, Rob
You can actually use timestamps if your _id is of type ObjectId:
prefix = Math.floor((new Date( 2013 , 03 , 11 )).getTime()/1000).toString(16)
db.foo.find( { _id : { $gt : new ObjectId( prefix + "0000000000000000" ) } } )
This way, it doesn't matter where the event came from or when it happened; all that matters is when the document's insertion was recorded (later than the previous timer run). Of course, MongoDB is schema-less, so you can always set a field such as isNew to true on insert and set it to false in conjunction with your query / cursor.
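A minimal sketch of that flag approach (the "events" collection name is a placeholder of mine):

// Fetch unprocessed events, handle each one, then clear its flag.
db.events.find({ "isNew": true }).forEach(function (doc) {
    // ... handle the event here ...
    db.events.update({ "_id": doc._id }, { "$set": { "isNew": false } });
});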

Mongodb setting unique field

TENANT
{ "_ID" : 11, NAME : "ruben", OPERATION :[{OPERATION_ID: 100, NAME : "Check"}] }
How do I set OPERATION_ID as unique, so as to avoid duplicate values and null values, like a primary key?
When you want the OPERATION_IDs to be unique across all tenants, you can do it like this (dotted keys such as "OPERATION.OPERATION_ID" must be quoted in the shell):
db.tenants.ensureIndex( { "OPERATION.OPERATION_ID" : 1 }, { unique: true, sparse: true } );
When you want the OPERATION_IDs to be unique per tenant, so that two tenants can both have OPERATION_ID 100 but no single tenant can have it twice, you have to add the _id of the tenant to the index, so that any given combination of _id and OPERATION_ID is unique:
db.tenants.ensureIndex( { "_id" : 1, "OPERATION.OPERATION_ID" : 1 }, { unique: true, sparse: true } );
Adding a unique index on OPERATION.OPERATION_ID will ensure that no two distinct documents will contain an element in OPERATION with the same OPERATION_ID.
If you want to prevent a single document from having two elements in OPERATION with the same OPERATION_ID, you can't use unique indexes; you will have to use set update operators (such as $set and $addToSet). You could turn OPERATION into a subdocument keyed by OPERATION_ID, like so:
{ "_ID" : 11, NAME : "ruben", OPERATION : {"100" : {NAME : "Check"} }}
Then you can enforce uniqueness by issuing updates with $set; for example:
db.<collection>.update({NAME: "ruben"}, {$set: {"OPERATION.100.NAME": "Uncheck"}})
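If you keep OPERATION as an array instead, a hedged sketch using $addToSet (mentioned above) to avoid duplicate elements within one document:

// $addToSet only appends when no identical element already exists.
db.tenants.update(
    { "_ID": 11 },
    { "$addToSet": { "OPERATION": { "OPERATION_ID": 100, "NAME": "Check" } } }
);
// Note the comparison is on the whole element: { OPERATION_ID: 100,
// NAME: "Uncheck" } would still be added as a distinct element.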
Regarding null values: MongoDB doesn't feature non-null constraints on fields (it doesn't even force a given field to have a single type), so you will have to ensure in your application that null values aren't inserted.

Get position of selected document in collection [mongoDB]

How to get position (index) of selected document in mongo collection?
E.g.
this document: db.myCollection.find({"id":12345})
has index 3 in myCollection
myCollection:
id: 12340, name: 'G'
id: 12343, name: 'V'
id: 12345, name: 'A'
id: 12348, name: 'N'
If your requirement is to find the position of a document irrespective of any order, that is not possible, as MongoDB does not store the documents in a specific order.
However, if you want to know the index based on some field, say _id, you can use this method.
If you are strictly using auto-increments in your _id field, you can count all the documents that have a value less than that _id, say n; then n + 1 is the index of the document based on _id.
n = db.myCollection.find({"id": { "$lt" : 12345}}).count() ;
This would also be valid if documents are deleted from the collection.
As far as I know, there is no single command to do this, and it is impossible in the general case (see Derick's answer). However, using count() on a query over an ordered id field seems to work. Warning: this assumes that there is a reliably ordered field, which is difficult to achieve with concurrent writers. In this example _id is used, which will only work in the single-writer case:
MongoDB shell version: 2.0.1
connecting to: test
> use so_test
switched to db so_test
> db.example.insert({name: 'A'})
> db.example.insert({name: 'B'})
> db.example.insert({name: 'C'})
> db.example.insert({name: 'D'})
> db.example.insert({name: 'E'})
> db.example.insert({name: 'F'})
> db.example.find()
{ "_id" : ObjectId("4fc5f040fb359c680edf1a7b"), "name" : "A" }
{ "_id" : ObjectId("4fc5f046fb359c680edf1a7c"), "name" : "B" }
{ "_id" : ObjectId("4fc5f04afb359c680edf1a7d"), "name" : "C" }
{ "_id" : ObjectId("4fc5f04dfb359c680edf1a7e"), "name" : "D" }
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
{ "_id" : ObjectId("4fc5f053fb359c680edf1a80"), "name" : "F" }
> db.example.find({_id: ObjectId("4fc5f050fb359c680edf1a7f")})
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
> db.example.find({_id: {$lte: ObjectId("4fc5f050fb359c680edf1a7f")}}).count()
5
>
This should also be fairly fast if the queried field is indexed. The example is in mongo shell, but count() should be available in all driver libs as well.
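Wrapped up as a small helper (the function name positionOf is mine, not a built-in), the idea looks like this:

// 1-based position of a document in _id order; assumes _ids are reliably
// ordered (e.g. written by a single client, per the warning above).
function positionOf(coll, id) {
    return coll.find({ "_id": { "$lte": id } }).count();
}

positionOf(db.example, ObjectId("4fc5f050fb359c680edf1a7f"));  // => 5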
This might be very slow, but it is a straightforward method. You can pass a query as usual; I am simply looping over all the documents and checking a condition to match the record. Here I check against the _id field, but you can use any other field, or several fields, instead.
var docIndex = 0;
db.url_list.find({}, { "_id": 1 }).forEach(function (doc) {
    docIndex++;
    // Compare against the ObjectId's hex string; note that forEach cannot
    // be broken out of early, so this always scans the whole result set.
    if ("5801ed58a8242ba30e8b46fa" == doc["_id"].str) {
        print('document position is...' + docIndex);
    }
});
There is no way for MongoDB to return this, as it does not keep documents in order in the database, just as MySQL, for example, does not number its rows.
The ObjectID trick from jhonkola will only work if a single client creates the new elements, as ObjectIDs are generated on the client side, with the first part being a timestamp. There is no guaranteed order if different clients talk to the same server. Still, I would not rely on this.
I also don't quite understand what you are trying to do though, so perhaps mention that in your question? I can then update the answer.
Restructure your collection to include the position of each entry, i.e. {'id': 12340, 'name': 'G', 'position': 1}, then search the collection (myCollection) using the desired position as the query.
The queries I use that return the entire collection all use sort() to get a reproducible order; find().sort().forEach() works with the script above to get the correct index.

How do you get around missing values in a unique index using mongo db?

The mongo documentation states that "When a document is saved to a collection with unique indexes, any missing indexed keys will be inserted with null values. Thus, it won't be possible to insert multiple documents missing the same indexed key."
So is it impossible to create a unique index on an optional field? Should I create a compound index with, say, the userId as well to solve this? In my specific case I have a user collection with an optional embedded oauth object.
e.g.
>db.users.ensureIndex( { "name":1, "oauthConnections.provider" : 1, "oauthConnections.providerId" : 1 } );
My sample user
{ name: "Bob"
,pwd: "myPwd"
,oauthConnections [
{
"provider":"Facebook",
"providerId" : "12345",
"key":"blah"
}
,{
"provider":"Twitter",
"providerId" : "67890",
"key":"foo"
}
]
}
I believe that this is possible: you can have an index that is both sparse and unique. This way, non-existent values never make it into the index, hence they can't be duplicated.
Caveat: This is not possible with compound indexes. I'm not quite sure about your question: you're citing a part of the documentation that concerns compound indexes (there, missing values will be inserted as nulls), but from your question I guess you're not looking for a solution with compound indexes?
Here's a sample:
> db.Test.insert({"myId" : "1234", "string": "foo"});
> show collections
Test
system.indexes
>
> db.Test.find();
{ "_id" : ObjectId("4e56e5260c191958ad9c7cb1"), "myId" : "1234", "string" : "foo" }
>
> db.Test.ensureIndex({"myId" : 1}, {sparse: true, unique: true});
>
> db.Test.insert({"myId" : "1234", "string": "Bla"});
E11000 duplicate key error index: test.Test.$myId_1 dup key: { : "1234" }
>
> db.Test.insert({"string": "Foo"});
> db.Test.insert({"string": "Bar"});
> db.Test.find();
{ "_id" : ObjectId("4e56e5260c191958ad9c7cb1"), "myId" : "1234", "string" : "foo" }
{ "_id" : ObjectId("4e56e5c30c191958ad9c7cb4"), "string" : "Foo" }
{ "_id" : ObjectId("4e56e5c70c191958ad9c7cb5"), "string" : "Bar" }
Also note that compound indexes can't be sparse.
It is not impossible to index an optional field. The docs are talking about a unique index: once you've specified a unique index, you can only insert one document per value for that field, even if that value is null.
If you want a unique index on an optional field but still allow multiple nulls, you could try making the index both unique and sparse, although I have no idea if that's possible. I couldn't find an answer in the documentation.
There's no good way to uniquely index an optional field. You can either fill it with a default (the _id on the user would work), let your access layer enforce uniqueness, or change your "schema" a bit.
We have a separate collection for oauth login tokens, partially for this reason. We never really need to access those in a context where having them as embedded docs is an obvious win. If this is a relatively easy change to make, it's probably your best bet.
----edit----
As the other answers point out, you can achieve this with a sparse index. It's even a documented use. You should probably accept one of those answers instead of mine.
http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes