What is the best way to use a collection as round robin in MongoDB?

I have a collection named items with three documents.
{
  _id: 1,
  item: "Pencil"
}
{
  _id: 1,
  item: "Pen"
}
{
  _id: 1,
  item: "Sharpener"
}
How could I query to get the documents in round-robin fashion?
Consider that I get multiple user requests at the same time:
one request should get Pencil, another should get Pen, and a third should get Sharpener,
and then the cycle starts again from the first one.
If changing the schema is an option, I am ready for that too.

I think I found a way to do this without changing the schema. It is based on skip() and limit(). You can also ask for the internal (natural) ordering of the returned documents, but as the guide says you should not rely on it, and you lose performance because indexes are bypassed:
The $natural parameter returns items according to their natural order
within the database. This ordering is an internal implementation
feature, and you should not rely on any particular structure within
it.
Anyway, this is the query:
db.getCollection('YourCollection').find().skip(counter).limit(1)
Where counter stores the current index for your documents.
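As a minimal sketch of that counter's upkeep in the shell (this assumes a single process owns the counter; with multiple clients the counter would have to live in the database or a shared cache):
// Hypothetical sketch: fetch the current document, then wrap the counter
// around the collection size so the cycle restarts at the first document.
var counter = 0; // current index, kept by the application
var total = db.getCollection('YourCollection').countDocuments({});
var doc = db.getCollection('YourCollection').find().skip(counter).limit(1).next();
counter = (counter + 1) % total;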

A few things to start with:
_id has to be unique across a collection, especially when the collection is part of a replica set.
Round-robin is a very stateful requirement and would not work well with a distributed set of services, for example.
With that said, assuming you really just want to iterate from the database, I would use cursors to accomplish this. For the record, this does a collection scan and is very inefficient.
var myCursor = db.items.find().sort({_id:1});
while (myCursor.hasNext()) {
printjson(myCursor.next());
}
My suggestion is that you should pull all results from the database at once and do your iteration in the application tier.
var myCursor = db.items.find().sort({_id:1});
var documentArray = myCursor.toArray();
documentArray.forEach(doSomething);
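To make that application-side iteration behave as a round robin, a sketch (the requestCount state and the nextItem helper are illustrative assumptions, not part of the original answer):
var documentArray = db.items.find().sort({ _id: 1 }).toArray();
var requestCount = 0; // shared per-process state, assumed for illustration
function nextItem() {
  var doc = documentArray[requestCount % documentArray.length];
  requestCount += 1; // the modulo above wraps back to the first document
  return doc;
}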

If this is about distributing requests, you may consider fetching random documents instead of round-robin via the aggregation $sample stage:
db.collection.aggregate([
  { "$sample": { "size": 1 } }
])
Or there are options to randomize via $rand ...
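For example, one documented $rand pattern keeps each document with a fixed probability rather than cycling through them (MongoDB 4.4+):
// Each document is returned with roughly 50% probability on every query.
db.items.find({ $expr: { $lt: [0.5, { $rand: {} }] } })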

Use findOneAndUpdate after restructuring the data objects
db.counter.findOneAndUpdate( {}, pipeline)
{
"_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
"values" : [
"Pencil",
"Pen",
"Sharpener"
],
"selected" : "Pencil",
"counter" : 1
}
db.counter.findOneAndUpdate( {}, pipeline)
{
"_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
"values" : [
"Pencil",
"Pen",
"Sharpener"
],
"selected" : "Pen",
"counter" : 2
}
where the data object is now:
{
"_id" : ObjectId("6242fe3bc1551d0f3562bcb2"),
"values" : [
"Pencil",
"Pen",
"Sharpener"
],
"selected" : "Pencil",
"counter" : 1
}
and the pipeline is:
[{ $project: {
  values: 1,
  selected: { $arrayElemAt: ['$values', '$counter'] },
  counter: { $mod: [{ $add: ['$counter', 1] }, { $size: '$values' }] }
} }]
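Note that the shell's findOneAndUpdate returns the pre-update document by default, which appears to be what the outputs above show; to read the state after the pointer has moved, pass an option:
db.counter.findOneAndUpdate({}, pipeline, { returnNewDocument: true })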
This has some merits:
Firstly, using findOneAndUpdate means that moving the pointer to the next item in the list and reading the object happen in one atomic step.
Secondly, by using {$size: "$values"}, adding a value to the list doesn't change the logic.
And an object could be used in place of each string.
Problems:
This method would become unwieldy beyond a few tens of entries.
It is hard to prove that this method works as advertised, so there is an accompanying Kotlin project on GitHub. The project uses coroutines, so it calls find/update asynchronously.
The alternative (assuming 50K items and not 3):
Set up a simple counter {counter: 0} and update it as follows:
db.counter.findOneAndUpdate({},
  [{ $project: {
    counter: { $mod: [{ $add: ['$counter', 1] }, 50000] }
  } }]
)
Then use a simple select query to find the right document.
I've updated the github to include this example.
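A sketch of that select step, assuming each of the 50K item documents carries a sequential seq field from 0 to 49999 (the seq field and the items collection name are assumptions, not from the question):
// Atomically advance the shared counter and read its new value ...
var state = db.counter.findOneAndUpdate(
  {},
  [{ $project: { counter: { $mod: [{ $add: ['$counter', 1] }, 50000] } } }],
  { returnNewDocument: true }
);
// ... then fetch the item at that position.
var item = db.items.findOne({ seq: state.counter });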


Converting older Mongo database references to DBRefs

I'm in the process of updating some legacy software that is still running on Mongo 2.4. The first step is to upgrade to the latest 2.6 and then go from there.
Running db.upgradeCheckAllDBs(); gives us DollarPrefixedFieldName: $id is not valid for storage. errors, and indeed we have some older records with legacy $id and $ref fields. We have a number of collections that look something like this:
{
"_id" : "1",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
},
{
"_id" : "2",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "3",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "4",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
}
I want to script this to convert the older {"$id" : "42", "$ref" : "someRef"} objects to DBRef("someRef", "42") objects but leave the existing DBRef objects untouched. Unfortunately, I haven't been able to differentiate between the two types of objects.
Using typeof and $type simply say they are objects.
Both have $id and $ref fields.
In our Groovy console, when you pull back one of the old ones and one of the new ones, getClass() returns DBRef for both.
We have about 80k records with this legacy format out of millions of total records. I'd hate to have to brute force it and modify every record whether it needs it or not.
This script will do what I need it to do but the find() will basically return all the records in the collection.
var cursor = db.someCollection.find({"someRef.$id" : {$exists: true}});
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update({"_id": rec._id}, {$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}});
}
Is there another way that I am missing that can be used to find only the offending records?
Update
As described in the accepted answer, the order of the fields matters, which made all the difference. The script we went with that corrected our data:
var cursor = db.someCollection.find(
  { $where: "function() { return this.someRef != null && Object.keys(this.someRef)[0] == '$id'; }" }
);
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update(
{"_id": rec._id},
{$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}}
);
}
We did have one collection with a larger number of records to correct, where the connection timed out; we just ran the script again and it got through the remaining records.
There's probably a better way to do this. I would be interested in hearing about a better approach. For now, this problem is solved.
DBRef is a client-side thing. http://docs.mongodb.org/manual/reference/database-references/#dbrefs says it pretty clearly:
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
The drivers rely on the fact that the order of fields in BSON is preserved to recognise a DBRef, so you can do the same:
db.someCollection.find({ $expr: {
$let: {
vars: {firstKey: { $arrayElemAt: [ { $objectToArray: "$someRef" }, 0] } },
in: { $eq: [{ $substr: [ "$$firstKey.k", 1, 2 ] } , "id"]}
}
} } )
will return documents where the order of the fields doesn't match the driver's expectation.
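A quick way to see what that check keys on (a sketch, not from the original answer): $objectToArray preserves the stored field order, so projecting the first key shows "$id" first for legacy documents and "$ref" first for driver-written DBRefs.
db.someCollection.aggregate([
  { $project: { firstKey: { $arrayElemAt: [{ $objectToArray: "$someRef" }, 0] } } }
])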

Remove complete document or element from array based on condition

My collection documents are:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"},
{"name":"bananas"} ],
}
{
"_id" : 2,
"fruits" : [ {"name":"bananas"} ],
}
I need to remove the whole document when the fruits array contains only "bananas", or remove just the "bananas" element when there is more than one fruit in the array.
My final collection after running the required query should be:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"}],
}
I am currently using two queries to get this done:
db.collection.remove({'fruits':{$size:1, $elemMatch:{'name': 'bananas'} }}) [this will remove the document when only one fruit is present]
and
db.collection.update({},{$pull:{'fruits':{'name':'bananas'}}},{multi: true}) [this will remove the entry 'bananas' from the array]
Is there any way to combine these into one query?
EDIT: Final take
-- I guess there is no single query to perform the above tasks, since the intents of the two actions are very different.
-- The best that can be done is to club the actions into a bulk_write query, which saves on network I/O (as suggested in the answer by Neil). This, I believe, is more beneficial when you have multiple such actions being fired. bulk_write can also provide a form of sequencing, in the sense that its "ordered" mode makes the actions sequential, breaking and halting execution in case of error.
Hence bulk_write is more beneficial when the actions performed need to be sequential, somewhat like "chaining" in JS. There is also the option to perform unordered bulk_writes.
Also, the actions specified in the bulk write operate on the collection level as individual actions.
You basically want bulk_write() here to do them both. Also use $exists to ensure there's only one element:
from pymongo import UpdateMany, DeleteMany
db.collection.bulk_write(
[
UpdateMany(
{ "fruits.1": { "$exists": True }, "fruits.name": "bananas" },
{ "$pull":{
'fruits': { 'name':'bananas' }
}}
),
DeleteMany(
{ "fruits.1": { "$exists": False }, "fruits.name": "bananas" }
)
],
ordered=False
)
You don't really need $elemMatch for a single condition, and you should be using update_many(), in this case UpdateMany(), instead of { "multi": true }; that option is different in "pymongo" anyway. Then of course there is delete_many() or DeleteMany() for the "bulk" context.
Bulk operations send one request with one response, which is better than sending multiple requests. Also "update" and "delete" are two different things, but a single request can combine them just like this.
The $size operator is valid but $exists can apply to a "range" where $size cannot, so it's generally a bit more flexible.
For example, as an $exists range:
# Array between 2 and 4 elements
db.collection.update_many(
{
"fruits.1": { "$exists": True },
"fruits.4": { "$exists": False },
"fruits.name": "bananas"
},
{ "$pull":{
'fruits': { 'name':'bananas' }
}}
)
And of course, in the context here, you actually want to distinguish arrays that contain other things as well from those with "only" a single "bananas" entry.
The ordered=False here actually refers to two different ways that "bulk write" requests can be handled:
Ordered - Where True (the default), the operations are executed in serial order as they appear in the array of operations sent with the "bulk write". If any error occurs, the batch stops execution at the point of the error and returns an exception.
Unordered - Where False, the operations are executed in "parallel" within reasonable constraints on the server. If any error occurs, an exception is still raised, but this does not stop other operations within the "bulk write" from completing. Any errors are returned with the "array index" from the list provided to the command, identifying which operation caused the error.
This option can be used to tune the desired behavior, in particular error reporting and continuation, and also allows a degree of parallelism where serial execution is not actually required. Since these two statements do not depend on one another and in fact select different documents anyway, ordered=False is probably the better option in terms of efficiency here.
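For readers using the mongo shell rather than pymongo, a sketch of the same two operations via the shell's bulkWrite (a direct translation, not part of the original answer):
db.collection.bulkWrite([
  { updateMany: {
      filter: { "fruits.1": { $exists: true }, "fruits.name": "bananas" },
      update: { $pull: { fruits: { name: "bananas" } } }
  } },
  { deleteMany: {
      filter: { "fruits.1": { $exists: false }, "fruits.name": "bananas" }
  } }
], { ordered: false })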
db.collection.aggregate(
[{
$project: {
data: {
$filter: {
input: "$fruits",
as: "filterData",
cond: { $ne: [ "$$filterData.name", 'bananas' ] }
}
}
}
},
{
$unwind: {
path : "$data",
preserveNullAndEmptyArrays : false
}
},
{
$group: {
_id:"$_id",
data: { $addToSet: "$data" }
}
},
])
I think the above query would give you the desired results. Note that an aggregation only returns the transformed documents; it does not modify the collection unless you add a $merge or $out stage.

How to maintain uniqueness based on a particular field in an array without using a unique index

I have documents like this.
[{
"_id" : ObjectId("aaa"),
"host": "host1",
"artData": [
{
"aid": "56004721",
"accessMin": NumberLong(1481862180
},
{
"aid": "56010082",
"accessMin": NumberLong(1481861880)
},
{
"aid": "55998802",
"accessMin": NumberLong(1481861880)
}
]
},
{
"_id" : ObjectId("bbb"),
"host": "host2",
"artData": [
{
"aid": "55922560",
"accessMin": NumberLong(1481862000)
},
{
"aid": "55922558",
"accessMin": NumberLong(1481861880)
},
{
"aid": "55940094",
"accessMin": NumberLong(1481861760)
}
]
}]
While updating any document, a duplicate "aid" should not be added to the array again.
One option I found is a unique index on the artData.aid field, but building an index is not preferred since the requirement doesn't otherwise need it.
Is there any way to solve this?
Option 1: While designing the schema for that document, use unique: true.
For example:
var newSchema = new Schema({
artData: [
{
aid: { type: String, unique: true },
accessMin: Number
}]
});
module.exports = mongoose.model('newSchema', newSchema );
Option 2: refer to a link on how to avoid duplicates
As per this doc, you may use a multikey index as follows:
{ "artData.aid": 1 }
That being said, since you don't want to use a multikey index, another option for insertion is to:
Query the document to find artData entries that match the aid
Difference that result set with the set you are about to insert
Remove the items that match your query
Insert the remaining items from step 2
Ideally your query from step 1 won't return a set that is too large, making this a surprisingly fast operation. That said, it really depends on the number of duplicates you expect to be inserting. If that number is very high, the query from step 1 could return a large set of items, in which case this solution may not be appropriate, but it's all I've got for you =(.
My suggestion is to really re-evaluate the reason for not using a multikey index.
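For what it's worth, a common index-free guard is to make the push conditional on the aid not already being present (a sketch; the collection name hosts is an assumption, the field names come from the question):
// The filter matches the host document only if no artData element has this
// aid, so the $push is skipped (modifiedCount 0) when the aid already exists.
db.hosts.updateOne(
  { host: "host1", "artData.aid": { $ne: "56004721" } },
  { $push: { artData: { aid: "56004721", accessMin: NumberLong(1481862180) } } }
)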

With MongoDB, can I sort on the parent document, and then on a child array?

I'm reading about aggregation in search of a solution to my issue, but I'm not sure if it's best for my case, or whether I should re-model how I'm storing the data.
Consider the following Document:
{
title: "ParentCategory",
sort: 1,
children: [
{
title: "ChildA",
sort: 1,
},
{
title: "ChildB",
sort: 2
}
]
}
And the following query:
db.Categories.find({},
{
sort: { sort: 1 }
}
);
I want to sort first by the parent categories... that's no problem. But I also want the child categories to obey the sort order.
I've read suggestions to order them in the array that I want them, not using the sort field, and also read about aggregation, but seemed complex for this. Perhaps I should be modeling the data differently. I want to easily be able to change a sort for a particular category or child category later for certain reasons.
Tried using:
sort: { "sort" : 1, "children.sort" : 1 }
That didn't work.
Sorry for the newbie question. New to Mongo... like really new.
It's probably more efficient to keep the arrays sorted in the collection instead of sorting them on every query. Sort the array before inserting the document, and every time you push an element into the array, have mongo re-sort the array for you, like this:
db.categories.update(
{...}, // query
{
"$push": {
"children": {
"$each": [ {
"title" : "ChildC",
"sort" : 3
}, ...
],
"$sort": {"sort" : 1}
}
}
}
)
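If sorting at read time is ever required, newer servers (MongoDB 5.2+) can sort the stored array in an aggregation with $sortArray; a sketch using the fields from the question:
db.Categories.aggregate([
  { $sort: { sort: 1 } },
  { $project: {
      title: 1,
      sort: 1,
      children: { $sortArray: { input: "$children", sortBy: { sort: 1 } } }
  } }
])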

Mongo find query for longest arrays inside object

I currently have objects in mongo set up like this for my application (simplified example, I removed some irrelevant fields for clarity here):
{
"_id" : ObjectId("529159af5b508dd71500000a"),
"c" : "somecontent",
"l" : [
{
"d" : "2013-11-24T01:43:11.367Z",
"u" : "User1"
},
{
"d" : "2013-11-24T01:43:51.206Z",
"u" : "User2"
}
]
}
What I would like to do is run a find query to return the objects which have the highest array length under "l" and sort highest->lowest, limit to 25 results. Some objects may have 1 object in the array, some may have 100. I'd like to find out which ones have the most under "l". I'm new to mongo and got everything else to work up until this point, but I just can't figure out the right parameters to get this specific query. Where I'm getting confused is how to handle counting the length of the array, sorting, etc. I could manually code this by parsing everything in the collection, but I'm sure there has to be a way for mongo to do this far more efficiently. I'm not against learning, if anyone knows any resources for more advanced queries or could help me out I'd really be thankful as this is the last piece! :-)
As a side note, node.js and mongo together is amazing and I wish I started using them in conjunction a long time ago.
Use the aggregation framework. Here's how:
db.collection.aggregate( [
{ $unwind : "$l" },
{ $group : { _id : "$_id", len : { $sum : 1 } } },
{ $sort : { len : -1 } },
{ $limit : 25 }
] )
There is no easy way to do this with your existing schema. The reason is that there is nothing in MongoDB's query language to sort by the length of an array. Yes, you have the $size query operator, but it only matches arrays of a specific length.
So you cannot sort your find query based on the length of the array. The only reasonable way out is to add an additional field to your schema which holds the length of the array (you will have something like "l_length" : 3 in addition to your other fields for every document). The good thing is that maintaining it is easy: you just need to make sure to increment or decrement this value whenever you modify the array.
Once you add this field, you can easily sort by it, and moreover you can take advantage of indexes.
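A sketch of that bookkeeping (the field name l_length is an assumption, as above):
// Keep the counter in sync whenever the array grows ...
db.collection.updateOne(
  { _id: ObjectId("529159af5b508dd71500000a") },
  { $push: { l: { d: "2013-11-25T12:00:00.000Z", u: "User3" } }, $inc: { l_length: 1 } }
)
// ... then the top-25 query becomes a plain (indexable) sort.
db.collection.find().sort({ l_length: -1 }).limit(25)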
There is no direct approach to do this, but you can compute the array size in an aggregation:
$addFields to add a new field total holding the number of elements in the l array (via $size)
$sort by total in descending order
$limit to take the top 25 documents
$project to remove the total field if you don't need it
db.collection.aggregate([
{ $addFields: { total: { $size: "$l" } } },
{ $sort: { total: -1 } },
{ $limit: 25 }
// { $project: { total: 0 } }
])