How to create a nested index in MongoDB?

A. How do I index nested and all of its values?
B. How do I index valuetwo?
{
  id: 00000,
  attrs: {
    nested: {
      value: value1,
      valuetwo: value2
    }
  }
}
I've looked here: http://www.mongodb.org/display/DOCS/Indexes, but to my knowledge the docs aren't clear about indexing anything that isn't a top-level field.

You'd create them just as if you were creating an index on a top level field:
db.collection.createIndex({"attrs.nested.value": 1})
You do need to explicitly create indexes on each field.
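For instance, a quick way to check that a query actually uses the new index (collection and field names taken from the question) is to look at its plan; the winning plan should show an index scan rather than a collection scan once the index exists:
db.collection.find({ "attrs.nested.value": "value1" }).explain()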

MongoDB automatically creates a multikey index if any indexed field is an array; you do not need to explicitly specify the multikey type.
This will work for both of the scenarios below:
db.coll.createIndex( { "addr.pin": 1 } )
Scenario 1: nested objects
{
  userid: "1234",
  addr: {
    pin: "455522"
  }
},
{
  userid: "1234",
  addr: {
    pin: "777777"
  }
}
Scenario 2: nested arrays
{
  userid: "1234",
  addr: [
    { pin: "455522" },
    { pin: "777777" }
  ]
}
https://docs.mongodb.com/manual/core/index-multikey/
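To illustrate (collection and field names from the scenarios above), the very same query shape works against both document shapes and can use that one index:
db.coll.find({ "addr.pin": "455522" })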

A. To index all the properties in "nested", you will have to index them separately:
db.collection.createIndex({"attrs.nested.value": 1});
db.collection.createIndex({"attrs.nested.valuetwo": 1});
This can be done in one command with:
db.collection.createIndexes([{"attrs.nested.value": 1}, {"attrs.nested.valuetwo": 1}]);
B. To index just "valuetwo":
db.collection.createIndex({"attrs.nested.valuetwo": 1})
Use createIndex rather than ensureIndex, as ensureIndex has been deprecated since version 3.0.
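As a quick sanity check, the shell helper below lists the indexes that actually exist on the collection:
db.collection.getIndexes()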

It looks like there may have been a new feature to address this since these answers were written.
MongoDB has wildcard indexes, which allow you to create an index on a field that contains varying keys.
From the documentation:
If the field is a nested document or array, the wildcard index recurses into the document/array and indexes all scalar fields in the document/array.
So the index in the question would be constructed like this:
db.collection.createIndex({ "attrs.nested.$**": 1 })
You can read more about this in the wildcard index documentation.
The previous answers nearly gave me a heart attack.
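For completeness, a minimal sketch of queries this single wildcard index should be able to cover (field names from the question; wildcard indexes require MongoDB 4.2+):
db.collection.find({ "attrs.nested.value": "value1" })
db.collection.find({ "attrs.nested.valuetwo": "value2" })
Single-field equality queries on scalar fields under attrs.nested can use the wildcard index, so both A and B are covered by one index.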

Related

MongoDB new ObjectId over array giving me same Id for all

I am trying to update items in an array with unique ObjectIds (meaning, add an ObjectId to the array objects that are missing one).
If I have an array of shirt objects in my collection and I try this:
db.people.update({
  $and: [
    { _id: ObjectId('5eeb44c6a042791d28a8641f') },
    {
      $or: [
        { 'shirts._id': { $eq: null } },
        { 'shirts._id': { $exists: false } }
      ]
    }
  ]
}, {
  $set: { 'shirts.$[]._id': new ObjectId() }
}, {
  "multi": true
});
It generates IDENTICAL ObjectIds for each array element. I could put a unique index on this; however, the use case probably won't see more than 2-3 items in the array, with edge cases hitting 5-6, which seems like an abuse of an index.
How can I update multiple records or multiple array objects with a unique ObjectId?
When you use $set like this, you're telling MongoDB to set that one value on every matching element. If the array elements are defined as subdocument schemas (e.g. in Mongoose), new ObjectIds are issued for each of them automatically.
Alternatively, you can use forEach and iterate over each matching element creating a new ObjectId.
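A minimal shell sketch of that forEach approach (collection and field names from the question; it assumes shirts is always an array):
db.people.find({
  _id: ObjectId('5eeb44c6a042791d28a8641f'),
  $or: [
    { 'shirts._id': { $eq: null } },
    { 'shirts._id': { $exists: false } }
  ]
}).forEach(function (doc) {
  doc.shirts.forEach(function (shirt) {
    if (!shirt._id) {
      shirt._id = new ObjectId(); // a distinct ObjectId per element
    }
  });
  db.people.updateOne({ _id: doc._id }, { $set: { shirts: doc.shirts } });
});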

Delete documents where value is not an array

It seems that MongoDB allows the same syntax for deleting documents where the field holds a single value and where that value is contained in an array:
This
db.SomeCollection.deleteMany({UserId: 12345});
can affect both {UserId: 12345} and {UserId: [12345, 67895, 87897]}.
I understand that this schema isn't ideal; we are using a singular UserId prop to denote both a single id int, and an array of ints. However, I've inherited a database that makes use of mongodb's dynamic nature, if you catch my drift.
What is the safest way to execute a deleteMany query, specifying I only want to delete documents where the value is a single integer?
You can use the $type operator to check the field type, but it's not so obvious in this case. You can't simply check whether UserId is of type double, because MongoDB reports an array of doubles as double as well. So you need to invert the logic and check that the documents you're trying to remove do not hold an array (BSON type 4):
db.SomeCollection.deleteMany({ $and: [ { UserId: 12345 }, { UserId: { $not: { $type: 4 } } } ] })
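If you want a dry run first, the same filter can be used with a read to see how many documents would be affected (a sketch, same collection name as above):
db.SomeCollection.find({ $and: [ { UserId: 12345 }, { UserId: { $not: { $type: 4 } } } ] }).count()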

Create unique indexes for a document's objects stored in an array

How does one create unique indexes for a document's objects stored in an array?
{
  _id: 'documentId',
  books: [
    {
      unique_id: 1,
      title: 'Asd',
    },
    {
      unique_id: 2,
      title: 'Wsad',
    }
    ...
  ]
}
One thing I can think of is autoincrementing. Or is there any mongo way to do so?
If you remove the _id field from your doc, Mongo will automatically add one for you, which is guaranteed to be unique, contains the timestamp of creation, and has lots of other useful properties.
See here: https://docs.mongodb.com/v3.2/reference/method/ObjectId/
Looking at the example object again, are you referring to the ids in the books array?
If so, you can assign them with ObjectIds as well, just like in the document root's _id field:
doc.books.forEach(x => { x.unique_id = new ObjectId() } );
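If doc was fetched in the shell, that assignment only changes the in-memory copy, so it needs to be written back afterwards; a minimal sketch (using the _id from the example document):
var doc = db.collection.findOne({ _id: 'documentId' });
doc.books.forEach(x => { x.unique_id = new ObjectId(); });
db.collection.updateOne({ _id: doc._id }, { $set: { books: doc.books } });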

MongoDB "filtered" index: is it possible?

Is it possible to index some documents of the collection "only if" one of the fields to be indexed has a particular value?
Let me explain with an example:
The collection "posts" has millions of documents, ALL defined as follows:
{
    "network": "network_1",
    "blogname": "blogname_1",
    "post_id": 1234,
    "post_slug": "abcdefg"
}
Let's assume that the posts are split evenly between network_1 and network_2.
My application OFTEN selects the type of query based on the value of "network" (although sometimes I need the data from both networks):
For example:
www.test.it/network_1/blog_1/**postid**/1234/
-> db.posts.find({network: "network_1", blogname: "blog_1", post_id: 1234})
www.test.it/network_2/blog_4/**slug**/aaaa/
-> db.posts.find({network: "network_2", blogname: "blog_4", post_slug: "aaaa"})
I could create two separate indexes (network / blogname / post_id and network / blogname / post_slug), but that would waste a lot of RAM, since 50% of the data in each index would never be used.
Is there a way to create an index "filtered"?
Example:
(Note the WHERE parameter)
db.posts.ensureIndex({network: 1, blogname: 1, post_id: 1}, {where: {network: "network_1"}})
db.posts.ensureIndex({network: 1, blogname: 1, post_slug: 1}, {where: {network: "network_2"}})
Indeed, it's possible in MongoDB 3.2+. It's called partialFilterExpression: an index option that lets you set a condition determining which documents are included in the index.
Example
db.users.createIndex(
  { "userId": 1, "project": 1 },
  { unique: true, partialFilterExpression: { userId: { $exists: true, $gt: { $type: 10 } } } }
)
Please see Partial Index documentation
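Applied to the posts collection from the question, a sketch of the two partial indexes might look like this (MongoDB 3.2+; partialFilterExpression accepts equality conditions like these):
db.posts.createIndex(
  { network: 1, blogname: 1, post_id: 1 },
  { partialFilterExpression: { network: "network_1" } }
)
db.posts.createIndex(
  { network: 1, blogname: 1, post_slug: 1 },
  { partialFilterExpression: { network: "network_2" } }
)
Queries that include the matching network value, like the ones in the question, can then use these indexes.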
As of MongoDB v3.2, partial indexes are supported. Documentation: https://docs.mongodb.org/manual/core/index-partial/
It's possible, but it requires a workaround which creates redundancy in your documents, requires you to rewrite your find-queries and limits find-queries to exact matches.
MongoDB supports sparse indexes which only index the documents where the given field exists. You can use this feature to only index a part of the collection by adding this field only to those documents you want to index.
The bad news is that sparse indexes can only include a single field. But the good news is, that this field can also contain an object with multiple fields, so you can still store all the data you want to search for in this field.
To do this, add a new field to the included documents which includes an object with the fields you search for:
{
  "network": "network_1",
  "blogname": "blogname_1",
  "post_id": 1234,
  "post_slug": "abcdefg",
  "network_1_index_key": {
    "blogname": "blogname_1",
    "post_id": 1234
  }
}
Your ensureIndex command would index the field network_1_index_key:
db.posts.ensureIndex( { network_1_index_key: 1 }, { sparse: true } )
A find-query that is supposed to use this index must now query for the exact object stored in the network_1_index_key field:
db.posts.find({
  network_1_index_key: {
    blogname: "blogname_1",
    post_id: 1234
  }
})
Doing this would likely only make sense when the documents you want to index are a very small part of the collection. When it's about half, I would just create a regular index and live with it, because the larger document size could offset the gains from the reduced index size.
You can also try creating a single compound index on all of the fields (network / blogname / post_id / post_slug).

How to remove duplicates based on a key in Mongodb?

I have a collection in MongoDB with around 3 million records. My sample record would look like:
{ "_id" = ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
"source_references" : [
"_id" : ObjectId("5045xxxxxxxxxxxxxx"),
"name" : "xxx",
"key" : 123
]
}
I am having a lot of duplicate records in the collection with the same source_references.key. (By duplicate I mean source_references.key, not the _id.)
I want to remove duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove it if a duplicate exists.
Is there a way to remove the duplicates from the Mongo shell directly?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach is required in most cases. For example, you could use aggregation, as suggested on: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
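For example, combining the options mentioned above (again, this only applies to MongoDB 2.6 or older, where dropDups still exists):
db.things.ensureIndex({'source_references.key': 1}, {unique: true, dropDups: true, sparse: true})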
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
This is the easiest query I used, on MongoDB 3.2:
db.myCollection.find({}, {myCustomKey: 1}).sort({_id: 1}).forEach(function(doc) {
  db.myCollection.remove({_id: {$gt: doc._id}, myCustomKey: doc.myCustomKey});
})
Index your myCustomKey field before running this to increase speed.
While Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious while doing that. There are two other options:
Let MongoDB do it for you using map-reduce.
Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys, and delete if that search returns more than one document.
db.collection.distinct("key").forEach((num) => {
  var i = 0;
  db.collection.find({key: num}).forEach((doc) => {
    if (i) db.collection.remove({key: num}, { justOne: true });
    i++;
  });
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create an array to store all duplicate record ids */
var duplicates = [];

/** Start the aggregation pipeline */
db.collection.aggregate([
  {
    $match: { /** Add any filter here. Add an index for the filter keys */
      filterKey: {
        $exists: false
      }
    }
  },
  {
    $sort: { /** Sort it in such a way that you want to retain the first element */
      createdAt: -1
    }
  },
  {
    $group: {
      _id: {
        key1: "$key1", key2: "$key2" /** These are the keys which define a duplicate. Here, documents with the same value for key1 and key2 are considered duplicates */
      },
      dups: {
        $push: {
          _id: "$_id"
        }
      },
      count: {
        $sum: 1
      }
    }
  },
  {
    $match: {
      count: {
        "$gt": 1
      }
    }
  }
],
{
  allowDiskUse: true
}).forEach(function(doc) {
  doc.dups.shift();
  doc.dups.forEach(function(dupId) {
    duplicates.push(dupId._id);
  })
})

/** Delete the duplicates in chunks */
var i, j, temparray, chunk = 100000;
for (i = 0, j = duplicates.length; i < j; i += chunk) {
  temparray = duplicates.slice(i, i + chunk);
  db.collection.bulkWrite([{deleteMany: {"filter": {"_id": {"$in": temparray}}}}])
}
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
  var i = 0;
  db.collection.find({ "field": fieldValue }).forEach(doc => {
    if (i) {
      db.collection.remove({ _id: doc._id });
    }
    i++;
    x += 1;
    if (x % 100 === 0) {
      print(x); // Print progress every 100 processed docs.
    }
  });
});
The improvements are basically using the document _id for removal, which should be faster, and printing the progress of the operation; you can change the print interval to whatever you like.
Also, indexing the field before the operation helps.
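For instance (using the placeholder field name from the snippet above):
db.collection.createIndex({ "field": 1 })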
pip install mongo_remove_duplicate_indexes
Alternatively, create a script in any language and iterate over your collection:
Create a new collection and create a new unique index on it. Remember, this index has to be on the same field you wish to remove duplicates from in your original collection.
For example, say you have a collection gaming, and in this collection a field genre that contains duplicates you wish to remove. Just create a new collection:
db.createCollection("cname")
Create the new unique index:
db.cname.createIndex({'genre': 1}, {unique: true})
Now, when you insert documents with a similar genre, only the first will be accepted; the others will be rejected with a duplicate key error.
Then just insert the documents you read from the original collection into the new collection and handle the exception (for example pymongo.errors.DuplicateKeyError).
Check out the source code of the mongo_remove_duplicate_indexes package for a better understanding.
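The same idea as a mongo shell sketch (collection names gaming and cname taken from the example above; insertOne throws on a duplicate key, which is simply skipped):
db.gaming.find().forEach(function (doc) {
  try {
    db.cname.insertOne(doc); // rejected with a duplicate key error if this genre already exists
  } catch (e) {
    // duplicate genre: skip this document
  }
});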
If you have enough memory, you can do something like this in Scala:
cole.find().toList
  .groupBy(_.customField)
  .filter(_._2.size > 1)
  .flatMap(_._2.tail)
  .map(_.id)
  .foreach(x => cole.remove("id" $eq x))