MongoDB: How to create sparse+unique indexes with optional keys?

I have a collection stats.daily that stores multi-dimensional data, but depending on the type, each document requires other specific keys.
These keys (depending on the type) must also be unique.
For example, in this case the url key is required because type is "url":
{
"site": 1,
"type": "url",
"url": "http://google.com/",
"totals": {
"variable1": 12, // incrementing values
"variable2": 32
}
}
Another example:
{
"site": 1,
"type": "domain",
"domain": "google.com",
"totals": {...}
}
So I create the indexes:
db.coll.createIndex({site: 1, type: 1, url: 1}, {unique: true, sparse: true});
db.coll.createIndex({site: 1, type: 1, domain: 1}, {unique: true, sparse: true});
It doesn't work; it throws an exception: E11000 duplicate key error index.
That makes sense for a unique index, but I expected the sparse option to make it work.
What's the best solution to accomplish what I need?
Edit:
In some cases, depending on the type, there may be more than one related key:
{
"site": 1,
"type": "search",
"engine": "google",
"term": "programming",
"totals": {...}
}
And the unique index will include these two new keys.

Related

Will unique indexes ignore fields that don't exist?

I have a MongoDB index:
Reservation.index(
{
source: 1,
accountID: 1, // <-- This is the only required field
confirmationCode_1: 1,
confirmationCode_2: 1,
confirmationCode_3: 1
},
{name: "Unique_reservation_index_1", unique: true}
);
Here are some sample entries I have in the database and I want to make sure that duplicates can't be made:
[
{
source: "A",
accountID: "AAA",
confirmationCode_1: "ABC"
},
{
source: "B",
accountID: "BBB",
confirmationCode_1: "ABC",
confirmationCode_2: "DEF"
},
{
source: "C",
accountID: "CCC",
confirmationCode_3: "GHI"
}
]
Sometimes I have confirmationCode_1 set and not confirmationCode_2; other times I have both confirmationCode_1 and confirmationCode_2 set. Other times I have confirmationCode_3 set.
I want MongoDB to allow me to have the following doc (missing the confirmationCode_2 and confirmationCode_3 fields). Will it let me with the above index?
{
source: "A",
accountID: "123",
confirmationCode_1: "ABC"
}
Will it prevent me from adding two similar docs with confirmationCode_2 not defined or will that be considered the same? For example, if it does allow the above doc, will this be prevented?
{
source: "A",
accountID: "AAA",
confirmationCode_1: "ABC_2"
}
If I don't supply the confirmationCode_2 field, does it set the confirmationCode_2 field to null?
If I change the unique index to include sparse: true, how will it act differently?
Reservation.index(
{
source: 1,
accountID: 1, // <-- This is the only required field
confirmationCode_1: 1,
confirmationCode_2: 1
},
{name: "Unique_reservation_index_1", unique: true, sparse: true}
);
From the MongoDB documentation on unique indexes:
A unique index ensures that the indexed fields do not store duplicate values
An undefined, empty, or null field is allowed as long as no other document has the same tuple of values for the fields in the compound index.
Below is my actual testing result:
You can observe that the document is successfully added under the unique index.
Will unique indexes ignore fields that don't exist?
No. The index stores a null value for a missing field, and MongoDB enforces uniqueness on the combination of the index key values.
//You have this document in your MongoDB collection
{
source: "A",
accountID: "123",
confirmationCode_1: "ABC"
}
//You try to insert the next document; note the missing "accountID" field
//Even though "source" and "confirmationCode_1" match the existing document,
//this operation SUCCEEDS because
//MongoDB enforces uniqueness on the "combination" of the index key values
{
source: "A",
confirmationCode_1: "ABC"
}
//You try to insert the next document
//The operation FAILS to insert the document
//because of the violation of the unique constraint
//on the combination of key values
{
source: "A",
accountID: "123",
confirmationCode_1: "ABC"
}
What if you change unique: true to unique: true, sparse: true ?
An index that is both sparse and unique prevents collection from
having documents with duplicate values for a field but allows multiple
documents that omit the key.
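To make the two behaviours above concrete, here is a small in-memory model in plain JavaScript. It is only a sketch, assuming that a missing field is keyed as null and that a sparse compound index skips documents containing none of the indexed fields. createUniqueIndex and its insert method are hypothetical names for illustration, not part of MongoDB or any driver.

```javascript
// Toy model of a unique compound index. This is NOT MongoDB's implementation,
// just an illustration of how the index key is formed and compared.
function createUniqueIndex(fields, options = {}) {
  const seen = new Set();
  const has = (doc, f) => Object.prototype.hasOwnProperty.call(doc, f);
  return {
    // Returns true if the insert would be accepted, false if it would
    // be rejected with a duplicate key error.
    insert(doc) {
      // Assumption: a sparse compound index skips a document only when it
      // contains none of the indexed fields.
      if (options.sparse && !fields.some((f) => has(doc, f))) {
        return true;
      }
      // A missing field takes part in the key as null.
      const key = JSON.stringify(fields.map((f) => (has(doc, f) ? doc[f] : null)));
      if (seen.has(key)) {
        return false;
      }
      seen.add(key);
      return true;
    },
  };
}
```

With the five-field reservation index, inserting { source: "A", confirmationCode_1: "ABC" } after { source: "A", accountID: "123", confirmationCode_1: "ABC" } is accepted in this model because the key tuples differ in accountID, while a second copy of either document is rejected.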

How to search for child objects inside parent objects in MongoDB?

I'm trying to search any value that match with a "name" param, inside any object with any level in a MongoDB collection.
My BSON looks like this:
{
"name": "a",
"sub": {
"name": "b",
"sub": {
"name": "c",
"sub": [{
"name": "d"
},{
"name": "e",
"sub": {
"name": "f"
}
}]
}
}
}
I've created an index with db.collection.createIndex({"name": "text"}); and it seems to work, because it has created more than one.
{
"numIndexesBefore" : 1,
"numIndexesAfter" : 6,
"note" : "all indexes already exist",
"ok" : 1
}
But, when I use this db.collection.find({$text: {$search : "b"}}); to search, it does not work. It just searches at the first level.
I cannot do a search with precision, because the dimensions of the objects/arrays is dynamic and can grow or shrink at any time.
I appreciate your answers.
MongoDB cannot build an index on arbitrarily-nested objects. The index only occurs for the depth specified. In your case, the $text search will only check the top-level name field, but not the name field for any of the nested sub-documents. This is an inherent limitation for indexing.
To my knowledge, MongoDB has no support for handling these kinds of deeply-nested data structures. You really need to break your data out into separate documents in order to handle it correctly. For example, you could break it out into the following:
[
{
"_id": 0,
"name": "a",
"root_id": null,
"parent_id": null
},
{
"_id": 1,
"name": "b",
"root_id": 0,
"parent_id": 0
},
{
"_id": 2,
"name": "c",
"root_id": 0,
"parent_id": 1
},
{
"_id": 3,
"name": "d",
"root_id": 0,
"parent_id": 2
},
{
"_id": 4,
"name": "e",
"root_id": 0,
"parent_id": 2
},
{
"_id": 5,
"name": "f",
"root_id": 0,
"parent_id": 4
}
]
In the above structure, our original query db.collection.find({$text: {$search : "b"}}); will now return the following document:
{
"_id": 1,
"name": "b",
"root_id": 0,
"parent_id": 0
}
From here we can retrieve all related documents by retrieving the root_id value and finding all documents with an _id or root_id matching this value:
db.collection.find({
$or: [
{_id: 0},
{root_id: 0}
]
});
Finding all root-level documents is a simple matter of matching on root_id: null.
The drawback, of course, is that now you need to assemble these documents manually after retrieval by matching a document's parent_id with another document's _id because the hierarchical information has been abstracted away. Using a $graphLookup could help alleviate this somewhat by matching each subdocument with a list of ancestors, but you would still need to determine the nesting order manually.
Regardless of how you choose to structure your documents moving forward, this sort of restructure is going to be needed if you're going to query on arbitrarily-nested content. I would encourage you to consider different possibilities and determine which is most suited for your specific application needs.
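As a rough illustration of the restructuring and of the manual reassembly step described above, here is a sketch in plain JavaScript that runs on the client, outside MongoDB. The helper names flatten and buildTree are hypothetical, and the sketch assumes every node has a name and an optional sub that is either a single object or an array of objects, as in the question.

```javascript
// Flatten one nested document into flat documents with _id, root_id and
// parent_id, numbering the nodes in pre-order (parent before children).
function flatten(root) {
  const out = [];
  let nextId = 0;
  function visit(node, parentId, rootId) {
    const id = nextId++;
    out.push({ _id: id, name: node.name, root_id: rootId, parent_id: parentId });
    // Children of the root point at the root's own _id.
    const childRootId = rootId === null ? id : rootId;
    let children = [];
    if (node.sub !== undefined) {
      children = Array.isArray(node.sub) ? node.sub : [node.sub];
    }
    children.forEach((child) => visit(child, id, childRootId));
  }
  visit(root, null, null);
  return out;
}

// Rebuild a tree from the flat documents by matching parent_id to _id.
// Note: every node gets a "sub" array here, even when it has no children
// or only one child, so the shape is normalized relative to the original.
function buildTree(docs) {
  const byId = new Map(docs.map((d) => [d._id, { name: d.name, sub: [] }]));
  let root = null;
  docs.forEach((d) => {
    const node = byId.get(d._id);
    if (d.parent_id === null) {
      root = node;
    } else {
      byId.get(d.parent_id).sub.push(node);
    }
  });
  return root;
}
```

Applied to the nested document from the question, flatten yields the six flat documents listed above, and buildTree can be fed the result of the $or query on _id and root_id to get the hierarchy back.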

Mongodb partial index on one of the indexed field

I want to create partial index on one of the indexed field
but I am failing miserably
db.Comment.createIndex(
{ "siteId": 1,
{ { "parent": 1} ,{partialFilterExpression:{parent:{$exists: true}}}},
"updatedDate": 1,
"label": 1 }
);
how to do that?
the field "parent" is the one I want to index partially
In roboMongo I get the error
Error: Line 3: Unexpected token {
You pass the partialFilterExpression object as a second parameter to createIndex. See the documentation.
db.Comment.createIndex(
{ "siteId": 1, "parent": 1, "updatedDate": 1, "label": 1 },
{ partialFilterExpression: { parent: { $exists: true } } }
);
So don't think of it as partially indexing a field; your partial filter expression defines which documents to include in your index.

MongoDB Optional Unique Index

I have a MongoDB schema for users that looks something like this:
{
userId: "some-string",
anonymousId: "some-other-string",
project: {"$oid": "56d06bb6d9f75035956fa7ba"}
}
Users must have either a userId or an anonymousId. As users belong to a project, the model also has a reference called project, which links to the project collection.
Any userId or anonymousId value has to be unique per project, so I created two compound indexes as follows:
db.users.createIndex({ "userId": 1, "project": 1 }, { unique: true })
db.users.createIndex({ "anonymousId": 1, "project": 1 }, { unique: true })
However, since only one of userId and anonymousId has to be provided rather than both, MongoDB throws a duplicate key error for null values (for example if there is a second user with a provided anonymousId but no userId).
I therefore tried to add a sparse: true flag to the compound indexes, but this obviously only works if both fields are empty. I also tried adding the sparse flag only to the fields and not the compound indexes, but this doesn't work either.
To give an example, let's say I have the following three users in the collection:
{ userId: "user1", anonymousId: null, project: {"$oid": "56d06bb6d9f75035956fa7ba"}}
{ userId: "user2", anonymousId: "anonym", project: {"$oid": "56d06bb6d9f75035956fa7ba"}}
{ userId: "user3", anonymousId: "random", project: {"$oid": "56d06bb6d9f75035956fa7ba"}}
The following should be possible:
I want to be able to insert another user {userId: "user4", anonymousId: null} for the same project (without getting a duplicate key error)
However if I try to insert another user with {userId: "user3"} or another user with {anonymousId: "random"} there should be a duplicate key error
How else can I achieve this?
If you are using MongoDB 3.2 or later, you can use a unique partial index instead of a sparse index.
Partial indexes are actually recommended over sparse indexes.
Example
db.users.createIndex({ "userId": 1, "project": 1 },
{ unique: true, partialFilterExpression:{
userId: { $exists: true, $gt : { $type : 10 } } } })
db.users.createIndex({ "anonymousId": 1, "project": 1 },
{ unique: true, partialFilterExpression:{
anonymousId: { $exists: true, $gt : { $type : 10 } } } })
In the above example, a document is only included in the unique index when userId is present and doesn't contain a null value. The same holds true for anonymousId.
Please see https://docs.mongodb.org/manual/core/index-unique/#unique-partial-indexes
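The intended effect of the two partial unique indexes can be checked against the scenario in the question with a small in-memory model in plain JavaScript. This is a sketch only, assuming the filter behaves as "field is present and not null". partialUniqueIndex and makeCollection are hypothetical helpers, not MongoDB APIs, and the project reference is simplified to a string.

```javascript
// Toy model of a unique partial index: a document takes part in the
// uniqueness check only when filterField is present and not null.
function partialUniqueIndex(keyFields, filterField) {
  const entries = new Set();
  const covers = (doc) => doc[filterField] !== undefined && doc[filterField] !== null;
  const keyOf = (doc) =>
    JSON.stringify(keyFields.map((f) => (doc[f] === undefined ? null : doc[f])));
  return {
    conflicts: (doc) => covers(doc) && entries.has(keyOf(doc)),
    add: (doc) => {
      if (covers(doc)) entries.add(keyOf(doc));
    },
  };
}

// A collection accepts a document only if no index reports a conflict.
function makeCollection(indexes) {
  return {
    // Returns true when accepted, false when it would be a duplicate key error.
    insert(doc) {
      if (indexes.some((ix) => ix.conflicts(doc))) return false;
      indexes.forEach((ix) => ix.add(doc));
      return true;
    },
  };
}
```

In this model, a collection built with partialUniqueIndex(["userId", "project"], "userId") and partialUniqueIndex(["anonymousId", "project"], "anonymousId") accepts several users whose anonymousId is null in the same project, but rejects a second "user3" or a second "random" there.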
index a,c - cannot be sparse as is unique.....
index b,c - cannot be sparse as is unique.....
what about index a,b,c ?
db.benjiman.insert( { userId: "some-string", anonymousId:
"some-other-string", project: {"_oid": "56d06bb6d9f75035956fa7ba"}
})
db.benjiman.insert( { userId: "some-string2", project: {"_oid":
"56d06bb6d9f75035956fa7ba"} })
db.benjiman.insert( { anonymousId: "some-other-string2", project:
{"_oid": "56d06bb6d9f75035956fa7ba"} })
db.benjiman.createIndex({ "userId": 1, "anonymousId": 1, "project": 1 }, { unique: true })

MongoDB - Get Names of All Keys Matching Criteria in a Collection

As the title says, I need to retrieve the names of all the keys in my MongoDB collection, BUT I need them split up based on a key/value pair that each document has. Here's my clunky analogy: If you imagine the original collection is a zoo, I need a new collection that contains all the keys Zebras have, all the keys Lions have, and all the keys Giraffes have. The different animal types share many of the same keys, but those keys are meant to be specific to each type of animal (because the user needs to be able to (for example) search for Zebras taller than 3ft and giraffes shorter than 10ft).
Here's a bit of example code that I ran which worked well - it grabbed all the unique keys in my entire collection and threw them into their own collection:
db.runCommand({
"mapreduce" : "MyZoo",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "MyZoo" + "_keys"
})
I'd like a version of this command that would look through the MyZoo collection for animals with "type":"zebra", find all the unique keys, and place them in a new collection (MyZoo_keys) - then do the same thing for "type":"lion" & "type":"giraffe", giving each "type" its own array of keys.
Here's the collection I'm starting with:
{
"name": "Zebra1",
"height": "300",
"weight": "900",
"type": "zebra",
"zebraSpecific1": "somevalue"
},
{
"name": "Lion1",
"height": "325",
"weight": "1200",
"type": "lion",
},
{
"name": "Zebra2",
"height": "500",
"weight": "2100",
"type": "zebra",
"zebraSpecific2": "somevalue"
},
{
"name": "Giraffe",
"height": "4800",
"weight": "2400",
"type": "giraffe",
"giraffeSpecific1": "somevalue",
"giraffeSpecific2": "someothervalue"
}
And here's what I'd like the MyZoo_keys collection to look like:
{
"zebra": [
{
"name": null,
"height": null,
"weight": null,
"type": null,
"zebraSpecific1": null,
"zebraSpecific2": null
}
],
"lion": [
{
"name": null,
"height": null,
"weight": null,
"type": null
}
],
"giraffe": [
{
"name": null,
"height": null,
"weight": null,
"type": null,
"giraffeSpecific1": null,
"giraffeSpecific2": null
}
]
}
That's probably imperfect JSON, but you get the idea...
Thanks!
You can modify your code to dump the results in a more readable and organized format.
The map function:
Emit the type of animal as key, and an array of keys for
each animal(document). Leave out the _id field.
Code:
var map = function(){
var keys = [];
Object.keys(this).forEach(function(k){
if(k != "_id"){
keys.push(k);
}
})
emit(this.type,{"keys":keys});
}
The reduce function:
For each type of animal, consolidate and return the unique keys.
Use an object (uniqueKeys) to check for duplicates; this improves the running
time at the cost of some memory. The lookup is O(1).
Code:
var reduce = function(key,values){
var uniqueKeys = {};
var result = [];
values.forEach(function(value){
value.keys.forEach(function(k){
if(!uniqueKeys[k]){
uniqueKeys[k] = 1;
result.push(k);
}
})
})
return {"keys":result};
}
Invoking Map-Reduce:
db.collection.mapReduce(map,reduce,{out:"t1"});
Aggregating the result:
db.t1.aggregate([
{$project:{"_id":0,"animal":"$_id","keys":"$value.keys"}}
])
Sample output:
{
"animal" : "lion",
"keys" : [
"name",
"height",
"weight",
"type"
]
}
{
"animal" : "zebra",
"keys" : [
"name",
"height",
"weight",
"type",
"zebraSpecific1",
"zebraSpecific2"
]
}
{
"animal" : "giraffe",
"keys" : [
"name",
"height",
"weight",
"type",
"giraffeSpecific1",
"giraffeSpecific2"
]
}
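If you want to try the map and reduce logic without a server, the sketch below runs equivalent functions over an in-memory array in plain JavaScript. The runLocally harness and the module-level emit variable are assumptions made for illustration (on the server, MongoDB supplies emit itself), and a Set replaces the lookup object for the duplicate check.

```javascript
// Assigned by runLocally below; on the server MongoDB provides emit().
let emit;

// Emit the animal type as the key and the document's field names as the value.
const map = function () {
  const keys = Object.keys(this).filter((k) => k !== "_id");
  emit(this.type, { keys: keys });
};

// Merge the key lists for one animal type, keeping first-seen order.
const reduce = function (key, values) {
  const seen = new Set();
  const result = [];
  values.forEach((value) => {
    value.keys.forEach((k) => {
      if (!seen.has(k)) {
        seen.add(k);
        result.push(k);
      }
    });
  });
  return { keys: result };
};

// Hypothetical local stand-in for the map-reduce driver: group emitted
// values by key, then reduce each group.
function runLocally(docs) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc));
  const out = [];
  groups.forEach((values, key) => {
    // As on the server, reduce is skipped for a key with a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ animal: key, keys: value.keys });
  });
  return out;
}
```

Calling runLocally with the four sample animals gives one entry per type, with both zebraSpecific1 and zebraSpecific2 in the zebra entry.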