I have a MongoDB collection containing items that can be identified through multiple identification schemes:
{
"identification" : {
"SCHEME1" : [ "9181983" ],
"SCHEME2" : [ "ABC" , "CDE" ],
"SCHEME4" : ["FDE"]
}
}
{
"identification" : {
"SCHEME2" : [ "LALALAL" ],
"SCHEME5" : [ "CH98790789879" ]
}
}
Not every item has every identification scheme: some have schemes 1, 2 and 4 (like the first example above), while others have different ones. The set of identification schemes is not final and will grow. Every identification can exist only once.
I want to perform two different queries:
Search for an item by scheme and identification, e.g.
db.item.find({"identification.SCHEME2": "CDE"})
Search for all items with a specific identification scheme, e.g.
db.item.find({"identification.SCHEME2": {$exists: true}})
My approach was to create sparse indexes:
db.item.createIndex( { "identification.SCHEME1": 1 }, { sparse: true, unique: true } );
db.item.createIndex( { "identification.SCHEME2": 1 }, { sparse: true, unique: true } );
db.item.createIndex( { "identification.SCHEME3": 1 }, { sparse: true, unique: true } );
and so on ....
This approach worked perfectly until I found out that MongoDB has a limit of 64 indexes per collection.
Does anyone have an idea how I could index the whole field "identification" with one index? Or is my document structure wrong? Any ideas are welcome, thanks.
I encountered the same problem in a reporting db that had dimensions that I wanted to use in the find clause. The solution was to use a fixed field to hold the data as a k/v pair and index on that.
In your case:
{
"identification" : [
{"k":"SCHEME1", "v":[ "9181983" ]},
{"k":"SCHEME2", "v":[ "ABC" , "CDE" ]},
{"k":"SCHEME4", "v":["FDE"]}
]
}
If you now create a compound index over {"identification.k":1, "identification.v":1}, you can search it with the index. Use $elemMatch so that both conditions must hold for the same array element:
db.item.find({"identification": {"$elemMatch": {"k":"SCHEME2", "v":"CDE"}}})
The downside is that you need to update your schema.
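Since the schema has to change, here is a minimal JavaScript sketch of how the old shape could be converted into the k/v shape, together with filter documents for the two queries from the question. The helper names (toKeyValueArray, bySchemeAndId, byScheme) are hypothetical and not part of any MongoDB API; the filters are plain objects that would be passed to db.item.find().

```javascript
// Hypothetical helper: turn { SCHEME1: [...], ... } into [{ k, v }, ...].
function toKeyValueArray(identification) {
  return Object.entries(identification).map(([k, v]) => ({ k: k, v: v }));
}

// Filter for "item with this scheme AND this identification".
// $elemMatch requires both conditions to hold for the same array element.
function bySchemeAndId(scheme, id) {
  return { identification: { $elemMatch: { k: scheme, v: id } } };
}

// Filter for "all items that have this scheme at all".
function byScheme(scheme) {
  return { "identification.k": scheme };
}

const oldDoc = {
  identification: { SCHEME1: ["9181983"], SCHEME2: ["ABC", "CDE"], SCHEME4: ["FDE"] }
};
const newDoc = { identification: toKeyValueArray(oldDoc.identification) };

// In the shell these would be used as, e.g.:
//   db.item.find(bySchemeAndId("SCHEME2", "CDE"))
//   db.item.find(byScheme("SCHEME2"))
```

The existence query no longer needs $exists, because a scheme is present exactly when some array element carries it as k.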
Context
I have a users collection with an array of key-value pairs like so:
{
name: 'David',
customFields: [
{ key: 'position', value: 'manager' },
{ key: 'department', value: 'HR' }
]
},
{
name: 'Allison',
customFields: [
{ key: 'position', value: 'employee' },
{ key: 'department', value: 'IT' }
]
}
The field names in customFields are configurable by the application users, so in order for it to be indexable I store them as an array of key-value pairs. Index { 'customFields.key': 1, 'customFields.value': 1} works quite well. Problem is when I want to retrieve the results in either order (ascending or descending). Say I want to sort all users having the custom field position in either order. The below query gives me ascending order:
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
However, I couldn't figure out how to get the opposite order. I'm pretty sure that, given the way the index is structured, if I could tell MongoDB to traverse the index backwards I would get what I want. The other option is to create another two indexes to cover both directions, but that is more costly.
Is there a way to specify the traversal direction? If not, what are some good alternatives? Much appreciated.
EDIT
I'd like to further clarify my situation. I have tried:
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
.sort({ 'customFields.key': 1, 'customFields.value': 1 })
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
.sort({'customFields.value': 1 })
These two queries only sort the documents after they have been filtered, meaning the sorting is applied on all the custom fields they have, not on the field matching the query (position in this case). So it seems using the sort method won't be of any help.
Not using any sorting conveniently returns the documents in the correct order in the ascending case. As can be seen in the explain result:
"direction" : "forward",
"indexBounds" : {
"customFields.key" : [
"[\"position\", \"position\"]"
],
"customFields.value" : [
"[MinKey, MaxKey]"
]
}
I'm using an exact match for customFields.key, so I don't care about its order. Within each value of customFields.key, the values of customFields.value are arranged in the order specified in the index, so I just take them out as they are and all is good. Now if I could do something like:
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
.direction('backwards')
It would be the equivalent of using the index { 'customFields.key': -1, 'customFields.value': -1 }. Like I said I don't care about customFields.key order, and 'customFields.value': -1 gives me exactly what I want. I've searched mongodb documentation but I couldn't find anything similar to that.
I could retrieve all the documents matching customFields.key and do the sorting myself, but it's expensive as the number of documents could get very big. Ideally all the filtering and sorting is solved by the index.
Here is the compound index you created:
db.users.createIndex( { "customFields.key": 1, "customFields.value": 1 } )
This index allows traversal using both fields in ascending order:
db.users.find().sort( { "customFields.key": 1, "customFields.value": 1 } )
It also supports traversal using both fields in descending order, because MongoDB can walk an index backwards:
db.users.find().sort( { "customFields.key": -1, "customFields.value": -1 } )
If you need mixed behavior, i.e. ascending on one field but descending on the other, then you would need to add a second index:
db.users.createIndex( { "customFields.key": 1, "customFields.value": -1 } )
Note that after adding this second index, all four possible combinations are now supported.
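The rule behind this can be sketched as a small JavaScript check: a sort is served by an index when its fields follow the index's field order and its directions either all match the index or are all inverted. This is a simplified model, not the actual query planner; the function name indexSupportsSort is hypothetical, and it ignores further planner rules such as equality predicates on leading fields and the restrictions that apply to multikey indexes.

```javascript
// Simplified model (assumption): a sort can use an index if the sort keys are a
// prefix of the index keys and the directions all match or are all reversed.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexKeys = Object.entries(indexSpec);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;

  const sameFields = sortKeys.every(([field], i) => indexKeys[i][0] === field);
  if (!sameFields) return false;

  const forward = sortKeys.every(([, dir], i) => indexKeys[i][1] === dir);
  const backward = sortKeys.every(([, dir], i) => indexKeys[i][1] === -dir);
  return forward || backward;
}

const first = { "customFields.key": 1, "customFields.value": 1 };
const second = { "customFields.key": 1, "customFields.value": -1 };

const bothAscending = indexSupportsSort(first, { "customFields.key": 1, "customFields.value": 1 });
const bothDescending = indexSupportsSort(first, { "customFields.key": -1, "customFields.value": -1 });
const mixedOnFirst = indexSupportsSort(first, { "customFields.key": 1, "customFields.value": -1 });
const mixedOnSecond = indexSupportsSort(second, { "customFields.key": -1, "customFields.value": 1 });
```

Under this model the first index serves the two uniform directions and the second index serves the two mixed ones.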
I have a collection named items with three documents.
{
_id: 1,
item: "Pencil"
}
{
_id: 1,
item: "Pen"
}
{
_id: 1,
item: "Sharpener"
}
How can I query to get the documents in round-robin fashion?
Suppose I get multiple user requests at the same time:
one should get Pencil, the next should get Pen, and the next should get Sharpener,
then it starts again from the first one.
If changing the schema is an option, I am open to that as well.
I think I found a way to do this without changing the schema. It is based on skip() and limit(). You can also ask for the documents in their natural (internal) order, but as the guide says you should not rely on this, especially since you lose performance because index use is overridden:
The $natural parameter returns items according to their natural order
within the database. This ordering is an internal implementation
feature, and you should not rely on any particular structure within
it.
Anyway, this is the query:
db.getCollection('YourCollection').find().skip(counter).limit(1)
Where counter stores the current index for your documents.
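A minimal sketch of how such a counter could be kept, assuming a single application process holds it in memory (with several processes the counter would have to live somewhere shared). The array stands in for the collection so the sketch runs without a database; makeRoundRobin is a hypothetical helper.

```javascript
// Returns a function that yields 0, 1, ..., total - 1, 0, 1, ... on successive calls.
function makeRoundRobin(total) {
  let counter = 0;
  return function nextSkip() {
    const skip = counter;
    counter = (counter + 1) % total;
    return skip;
  };
}

// Stand-in for the collection; in the shell the lookup would be
//   db.getCollection('YourCollection').find().skip(skip).limit(1)
const items = ["Pencil", "Pen", "Sharpener"];
const nextSkip = makeRoundRobin(items.length);

const served = [];
for (let request = 0; request < 4; request++) {
  const skip = nextSkip();
  served.push(items.slice(skip, skip + 1)[0]);
}
// served is now ["Pencil", "Pen", "Sharpener", "Pencil"]
```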
A few things to start:
_id has to be unique across a collection, especially when the collection is only a replica set.
This is a very stateful requirement and would not work well with, for example, a distributed set of services.
With that said, assuming you really just want to iterate from the database, I would use cursors to accomplish this. For the record, this will do a collection scan and is very inefficient.
var myCursor = db.items.find().sort({_id:1});
while (myCursor.hasNext()) {
printjson(myCursor.next());
}
My suggestion is that you should pull all results from the database at once and do your iteration in the application tier.
var myCursor = db.items.find().sort({_id:1});
var documentArray = myCursor.toArray();
documentArray.forEach(doSomething);
If this is about distribution you may consider fetching random documents instead of round-robin via aggregation/$sample:
db.collection.aggregate([
{
"$sample": {
"size": 1
}
}
])
Or there are options to randomize via $rand ...
Use findOneAndUpdate after restructuring the data objects:
db.counter.findOneAndUpdate( {}, pipeline)
{
"_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
"values" : [
"Pencil",
"Pen",
"Sharpener"
],
"selected" : "Pencil",
"counter" : 1
}
db.counter.findOneAndUpdate( {}, pipeline)
{
"_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
"values" : [
"Pencil",
"Pen",
"Sharpener"
],
"selected" : "Pen",
"counter" : 2
}
where the data object is now:
{
"_id" : ObjectId("6242fe3bc1551d0f3562bcb2"),
"values" : [
"Pencil",
"Pen",
"Sharpener"
],
"selected" : "Pencil",
"counter" : 1
}
and the pipeline is:
[{$project: {
values: 1,
selected: {
$arrayElemAt: [
'$values',
'$counter'
]
},
counter: {
$mod: [
{
$add: [
'$counter',
1
]
},
{
$size: '$values'
}
]
}
}}]
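To see what this pipeline computes, here is a plain JavaScript sketch that applies the same expressions to an in-memory copy of the document. It only mirrors the arithmetic and does not reproduce the atomicity that findOneAndUpdate provides; applyPipeline is a hypothetical name, and the starting counter of 0 is an assumption.

```javascript
// Mirrors the $project stage: pick values[counter], then advance the counter
// modulo the size of the list.
function applyPipeline(doc) {
  return {
    values: doc.values,
    selected: doc.values[doc.counter],
    counter: (doc.counter + 1) % doc.values.length
  };
}

let doc = { values: ["Pencil", "Pen", "Sharpener"], counter: 0 };
const selections = [];
for (let call = 0; call < 4; call++) {
  doc = applyPipeline(doc);
  selections.push(doc.selected);
}
// selections is now ["Pencil", "Pen", "Sharpener", "Pencil"]
```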
This has some merits:
Firstly, using findOneAndUpdate means that moving the pointer to the
next item in the list and reading the object happen at once, in a single atomic operation.
Secondly, by using {$size: "$values"}, adding a value to the list
doesn't change the logic.
And an object could be used instead of a string.
Problems:
This method would be unwieldy with more than a few tens of entries.
It is hard to prove that this method works as advertised, so there is an accompanying Kotlin project on GitHub. The project uses coroutines, so it calls find/update asynchronously.
The alternative (assuming 50K items and not 3):
Set-up a simple counter {counter: 0} and update as follows:
db.counter.findOneAndUpdate({},
[{$project: {
counter: {
$mod: [
{
$add: [
'$counter',
1
]
},
50000
]
}
}}])
Then use a simple select query to find the right document.
I've updated the GitHub project to include this example.
I have documents like this:
[{
"_id" : ObjectId("aaa"),
"host": "host1",
"artData": [
{
"aid": "56004721",
"accessMin": NumberLong(1481862180)
},
{
"aid": "56010082",
"accessMin": NumberLong(1481861880)
},
{
"aid": "55998802",
"accessMin": NumberLong(1481861880)
}
]
},
{
"_id" : ObjectId("bbb"),
"host": "host2",
"artData": [
{
"aid": "55922560",
"accessMin": NumberLong(1481862000)
},
{
"aid": "55922558",
"accessMin": NumberLong(1481861880)
},
{
"aid": "55940094",
"accessMin": NumberLong(1481861760)
}
]
}]
When updating a document, a duplicate "aid" should not be added to the array again.
One option I found is a unique index on the artData.aid field, but building an index is not preferred, as I don't otherwise need it.
Is there any other way to solve this?
Option 1: When designing the schema for that document, use unique: true,
for example:
var newSchema = new Schema({
artData: [
{
aid: { type: String, unique: true },
accessMin: Number
}]
});
module.exports = mongoose.model('newSchema', newSchema );
Option 2: refer a link to avoid duplicate
As per this doc, you may use a multikey index as follows:
{ "artData.aid": 1 }
That being said, since you don't want to use a multikey index, another option for insertion is to:
1. Query the document to find the artData entries that match the aid
2. Difference the result set with the set you are about to insert
3. Remove the items that match your query
4. Insert the remaining items from step 2
Ideally your query from step 1 won't return a set that is too large, making this a surprisingly fast operation. That said, it really depends on the number of duplicates you expect to be trying to insert. If the number is really high, the query from step 1 could return a large set of items, in which case this solution may not be appropriate, but it's all I've got for you =(.
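The difference, removal and insertion steps above amount to a set difference on aid. A minimal JavaScript sketch, assuming the existing artData entries have already been fetched by the query from step 1; the function name withoutExistingAids is hypothetical and the incoming values are purely illustrative:

```javascript
// Returns only those incoming entries whose aid is not already present,
// also dropping duplicates within the incoming batch itself.
function withoutExistingAids(existing, incoming) {
  const seen = new Set(existing.map((entry) => entry.aid));
  const toInsert = [];
  for (const entry of incoming) {
    if (!seen.has(entry.aid)) {
      seen.add(entry.aid);
      toInsert.push(entry);
    }
  }
  return toInsert;
}

const existing = [
  { aid: "56004721", accessMin: 1481862180 },
  { aid: "56010082", accessMin: 1481861880 }
];
const incoming = [
  { aid: "56010082", accessMin: 1481862300 }, // already stored, skipped
  { aid: "55998802", accessMin: 1481861880 }, // new, kept
  { aid: "55998802", accessMin: 1481861940 }  // repeated within the batch, skipped
];
const toInsert = withoutExistingAids(existing, incoming);
```

Because the read and the write are separate operations, another writer could add the same aid in between, so this check on its own is not atomic.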
My suggestion is to really re-evaluate the reason for not using multikey indexing
I have the following data:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
},
{ "name":"QQQ",
"keyword":"key3",
"city":"xyz"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
}]
and I need to find the records which have name = "BS" OR keyword = "Key2" with the query
db.collection.find({"$or" : [{"name":"BS"}, {"keyword":"Key2"}]});
I need these records in the following sequence:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
}]
but I am getting them in the following sequence:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
}]
Please provide some suggestions; I have been stuck with this problem for 2 days.
Thanks
The order of results returned by MongoDB is not guaranteed unless you explicitly sort your data using the sort function. For smaller datasets you may be "lucky" in the sense that the results are always returned in the same order; however, for bigger datasets, and in particular when you have sharded Mongo clusters, this is very unlikely. As proposed by Yathish, you need to explicitly order your results using the sort function. Based on the suggested output, it seems you want to sort by name in descending order, so I have set the sorting flag to -1 for the field name.
db.collection.find({"$or" : [{"name":"BS"}, {"keyword":"Key2"}]}).sort({"name" : -1});
If you need a more complex sorting algorithm, as specified in your comment, you can convert your results to a JavaScript array and sort it with a custom comparison function. This function will first list documents with a name equal to "BS" and then documents containing the keyword "Key2":
db.data.find({
"$or": [{
"name": "BS"
}, {
"keyword": "Key2"
}]
}).toArray().sort(function(doc1, doc2) {
if (doc1.name == "BS" && doc2.keyword == "Key2") {
return -1
} else if (doc2.name == "BS" && doc1.keyword == "Key2") {
return 1
} else {
return doc1.name < doc2.name ? -1 : (doc1.name > doc2.name ? 1 : 0)
}
});
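An alternative sketch of the same idea that avoids pairwise special cases: give each document a rank (0 if its name is "BS", 1 otherwise) and sort by rank. It relies on Array.prototype.sort being stable, which holds in current JavaScript engines (ES2019 onwards); rankOf is a hypothetical helper, and the array stands in for the result of toArray().

```javascript
// Documents named "BS" come first; everything else (here, the "Key2" matches) follows.
function rankOf(doc) {
  return doc.name === "BS" ? 0 : 1;
}

const results = [
  { name: "BS", keyword: "key1", city: "xyz" },
  { name: "AGS", keyword: "Key2", city: "xyz1" },
  { name: "BS", keyword: "Keyword", city: "city" }
];

// slice() copies the array so the original result order is left untouched.
const ordered = results.slice().sort((a, b) => rankOf(a) - rankOf(b));
const orderedNames = ordered.map((doc) => doc.name + "/" + doc.keyword);
// orderedNames is now ["BS/key1", "BS/Keyword", "AGS/Key2"]
```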
What I'm trying to do sounds logical to me however I'm not sure.
I am trying to improve part of a MongoDB collection by using Multikeys.
For example: I have multiple documents with the following format:
Document:
{
"_id": ObjectId("528a4177dbcfd00000000013"),
"name": "Shopping",
"tags": [
"retail",
"shop",
"shopping",
"store",
"grocery"
]
}
Query:
Up until now, I have been using the following query to match the tags field.
var tags = Array("store", "shopping", "etc");
db.collection.findOne({ 'tags': { $in: tags } }, { name: true });
This has been working well, however I think Multikeys should be used in this instance to improve speed & performance. Please, correct me if I am wrong!
Indexing:
I issued the following command in an attempt to index the tags.
db.collection.ensureIndex( { tags: 1 }, { safe: true }, function(err, doc) {} );
ensureIndex was successful.
Result:
However when using RockMongo's explain feature on the above query, the result is:
{
"indexOnly": false,
"indexBounds": {
"tags": [
[
"etc",
"etc"
],
[
"shopping",
"shopping"
],
[
"store",
"store"
]
]
}
}
Questions:
Why is indexing not working, is there something else I have to do?
Is Multikey indexing in this case beneficial? (I'm assuming yes.)
Is there another form of indexing that would be more beneficial?
Edit:
I've just noticed that in the RockMongo explain data there is a field:
"isMultiKey": true,
could it be that Multikeys are being used and I've completely misunderstood that it IS being indexed?
As you say in your edit, and as the part of explain you did not post would confirm, isMultiKey: true, along with other information on the cursor, shows that the index is being used. The indexBounds are another indicator.
What indexOnly describes is the fact that your query requests another field, name, which is not part of the index. When the query optimizer sees that all elements of the query can be satisfied using only the fields within the index, this is referred to as a covered query, and the indexOnly property is set to true.
So in an ideal situation your query and results use information from the index only, and MongoDB does not have to look up the entry from the index in the collection in order to return more data.