I have a need to build five indexes in a large MongoDB collection. I'm familiar with the ensureIndex operation, but I do not know of a way to create all five of the indexes with a single command. Is this batch index creation possible in MongoDB?
This is pretty simple within the shell: the collection object has a createIndexes helper, and you just pass in an array of the key patterns you wish to create indexes on.
db.test.createIndexes([
{ "a" : 1 },
{ "b" : 1 },
{ "c" : 1 },
{ "d" : 1 },
{ "e" : 1 }
]);
This then gives us the following:
> db.test.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.test"
},
{
"v" : 2,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "test.test"
},
{
"v" : 2,
"key" : {
"b" : 1
},
"name" : "b_1",
"ns" : "test.test"
},
{
"v" : 2,
"key" : {
"c" : 1
},
"name" : "c_1",
"ns" : "test.test"
},
{
"v" : 2,
"key" : {
"d" : 1
},
"name" : "d_1",
"ns" : "test.test"
},
{
"v" : 2,
"key" : {
"e" : 1
},
"name" : "e_1",
"ns" : "test.test"
}
]
>
You are wrong, MongoDB has had the createIndexes command since 2.6 (released in April 2014)
https://docs.mongodb.com/v3.0/reference/command/createIndexes/
The documentation says that it requires only one pass through the collection, so it should be approximately 5 times faster than building the five indexes separately.
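For reference, here is a minimal sketch of how the command form could be assembled for the five indexes in the question. The helper name buildCreateIndexesCommand is hypothetical (it is not part of MongoDB); the command itself requires a name for every index, so the sketch imitates the default "field_direction" naming visible in the getIndexes() output above.

```javascript
// Hypothetical helper (not part of MongoDB): builds the document that
// db.runCommand() expects for the createIndexes command.
function buildCreateIndexesCommand(collectionName, keyPatterns) {
  var indexes = keyPatterns.map(function (key) {
    // The command needs an explicit name for each index; imitate the
    // default "<field>_<direction>" naming, joined with underscores.
    var name = Object.keys(key).map(function (field) {
      return field + "_" + key[field];
    }).join("_");
    return { key: key, name: name };
  });
  return { createIndexes: collectionName, indexes: indexes };
}

var command = buildCreateIndexesCommand("test", [
  { a: 1 }, { b: 1 }, { c: 1 }, { d: 1 }, { e: 1 }
]);
// In the mongo shell, the command would then be sent with:
//   db.runCommand(command)
```

Keeping the command as plain data makes it easy to inspect before sending it to the server.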
As of now, there is no solution for that.
Building in the background prevents the database from locking and allows other operations to proceed. But to run those operations you will have to open a new mongo shell or run them asynchronously in the language of your choice (such as JavaScript).
But if you need strong consistency and don't need background index building, you will probably have to wait for a native MongoDB solution.
I think this is not possible using a single command, but you can write your own script that does the same thing. If your collection is large, I suggest building the indexes separately with background: true to reduce the chance of locking problems.
db.collection.ensureIndex( { a: 1 }, { background: true } )
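The "own script" idea above could look roughly like the following sketch. The function name buildIndexesOneByOne is hypothetical, and it uses createIndex, the newer name for ensureIndex; `coll` stands for any object that exposes createIndex(keys, options), such as db.collection in the mongo shell.

```javascript
// Hypothetical helper: builds each index on its own, in the background.
// `coll` is any object exposing createIndex(keys, options), for example
// db.collection in the mongo shell.
function buildIndexesOneByOne(coll, keyPatterns) {
  // One createIndex call per key pattern; collect whatever each call returns.
  return keyPatterns.map(function (keys) {
    return coll.createIndex(keys, { background: true });
  });
}

// In the mongo shell, assuming the five fields from the question:
//   buildIndexesOneByOne(db.collection,
//     [{ a: 1 }, { b: 1 }, { c: 1 }, { d: 1 }, { e: 1 }]);
```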
Related
I'm using MongoDB version 3.6.5. I would like to query a collection and then sort the results by date. I work on what I think is a pretty large dataset, currently 195064301 documents in this collection, and it's growing.
Running the filter or the sort as separate queries works perfectly:
db.getCollection('logs').find({session: ObjectId("5af3baa173faa851f8b0090c")})
db.getCollection('logs').find({}).sort({date: 1})
The result is returned in less than 1 sec, but if I try to do it in a single query like so:
db.getCollection('logs').find({session: ObjectId("5af3baa173faa851f8b0090c")}).sort({date: 1})
it now takes about 5 minutes to return the data. I thought it was an index problem, but as far as I can tell, the indexes seem fine:
> db.logs.getIndexes();
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "client1.logs"
},
{
"v" : 2,
"key" : {
"session" : 1
},
"name" : "session_1",
"ns" : "client1.logs",
"background" : true
},
{
"v" : 2,
"key" : {
"date" : 1
},
"name" : "date_1",
"ns" : "client1.logs",
"background" : true
},
{
"v" : 2,
"key" : {
"user" : 1
},
"name" : "user_1",
"ns" : "client1.logs",
"background" : true
}
]
I'm still new to Mongo. I ran those requests directly in the console, and I also tried the reIndex() method, but nothing really helped.
So I'm hoping there is a solution on this.
Thanks.
I have a MongoDB development cluster where I create indexes over time as part of development improvements. I want to maintain the same indexes on the Testing/Production MongoDB cluster as well.
So how do I get all the indexes of the existing collections and create the same indexes on a new database?
From the mongo shell, switch to the database you want to collect indexes from.
Step 1: Switch to the existing DB and run the script below
> use my_existing_db
The script below loops through all the collections and constructs a runCommand call for each collection.
var database = 'my_new_db' // SHOULD ALWAYS MATCH DESTINATION DB NAME
db.getCollectionNames().forEach(function(collection){
    var command = {}
    var indexes = []
    var idxs = db.getCollection(collection).getIndexes()
    // Skip collections that only have the default _id index
    if(idxs.length>1){
        idxs.forEach(function(idoc){
            if(idoc.name!='_id_'){
                // Point the namespace at the destination database
                // (guarded, since some server versions omit "ns")
                if(idoc.ns){
                    idoc.ns = database+"."+idoc.ns.substr(idoc.ns.indexOf('.') + 1 )
                }
                indexes.push(idoc)
            }
        })
        command['createIndexes'] = collection
        command['indexes'] = indexes
        // Print a ready-to-run db.runCommand(...) call
        print('db.runCommand(')
        printjson(command)
        print(')')
    }
})
The script outputs a runCommand call for each collection.
Step 2: Switch to the new DB and execute the runCommand calls. Done, cheers!
> use my_new_db
The runCommand calls will look something like this. You can run all the commands in one shot.
db.runCommand(
{
"createIndexes" : "foo",
"indexes" : [
{
"v" : 2,
"key" : {
"xy_point" : "2d"
},
"name" : "xy_point_2d",
"ns" : "my_new_db.foo",
"min" : -99999,
"max" : 99999
},
{
"v" : 2,
"key" : {
"last_seen" : 1
},
"name" : "last_seen_1",
"ns" : "my_new_db.foo",
"expireAfterSeconds" : 86400
},
{
"v" : 2,
"key" : {
"point" : "2dsphere"
},
"name" : "point_2dsphere",
"ns" : "my_new_db.foo",
"background" : false,
"2dsphereIndexVersion" : 3
}
]
}
)
db.runCommand(
{
"createIndexes" : "bar",
"indexes" : [
{
"v" : 2,
"unique" : true,
"key" : {
"date" : 1,
"name" : 1,
"age" : 1,
"gender" : 1
},
"name" : "date_1_name_1_age_1_gender_1",
"ns" : "my_new_db.bar"
}
]
}
)
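The namespace rewrite from Step 1 can also be isolated as a small pure function, which makes it easier to check before touching a real database. This is only a sketch: retargetIndexes is a hypothetical name, and unlike the script above it copies each index document instead of modifying it in place.

```javascript
// Hypothetical pure helper: given the array returned by getIndexes() and the
// destination database name, return index specs for the createIndexes command.
function retargetIndexes(indexDocs, targetDb) {
  return indexDocs
    // The _id index always exists, so it is never recreated.
    .filter(function (idoc) { return idoc.name !== "_id_"; })
    .map(function (idoc) {
      // Shallow copy so the caller's documents are left untouched.
      var copy = {};
      Object.keys(idoc).forEach(function (k) { copy[k] = idoc[k]; });
      // Replace the database part of "db.collection" when "ns" is present.
      if (typeof copy.ns === "string") {
        copy.ns = targetDb + "." + copy.ns.substr(copy.ns.indexOf(".") + 1);
      }
      return copy;
    });
}

// In the mongo shell, for a collection named foo:
//   db.getSiblingDB("my_new_db").runCommand({
//     createIndexes: "foo",
//     indexes: retargetIndexes(db.foo.getIndexes(), "my_new_db")
//   })
```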
I'm trying out the TTL feature in a mongo shell but I cannot seem to make it work. I have triple-checked everything against the documentation.
Here is what I did:
MongoDB shell version: 2.6.7
connecting to: test
> db.streamers.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.streamers"
},
{
"v" : 1,
"key" : {
"room" : 1
},
"name" : "room_1",
"ns" : "test.streamers",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"lastAlive" : 1
},
"name" : "lastAlive_1",
"ns" : "test.streamers",
"expiresAfterSeconds" : 60,
"background" : true,
"safe" : null
}
]
> db.streamers.insert({ _id: "hello", room: "yop", lastAlive: new Date() })
WriteResult({ "nInserted" : 1 })
[waiting for a while here...]
> db.streamers.find({ _id: "hello" })
{ "_id" : "hello", "room" : "yop", "lastAlive" : ISODate("2015-02-18T13:03:02.836Z") }
> new Date()
ISODate("2015-02-18T13:50:50.403Z")
So clearly, the document is not being removed even after waiting for more than an hour. db.currentOp() returns an empty array as well.
This is a dev environment so mongodb is in a standalone configuration with default settings. A getParameter on ttlMonitorEnabled returns true so that's fine too.
What is wrong here?
There was a typo in your index creation.
The setting expiresAfterSeconds should be expireAfterSeconds.
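One way to apply the fix is to drop the misspelled index and create it again with the correct option name. The sketch below assumes the index name lastAlive_1 and the 60-second expiry shown in the question; recreateTtlIndex is a hypothetical helper name, and `coll` stands for db.streamers in the mongo shell.

```javascript
// Hypothetical helper: replaces the index that carries the misspelled option
// with a proper TTL index. `coll` is any object exposing dropIndex(name) and
// createIndex(keys, options), for example db.streamers in the mongo shell.
function recreateTtlIndex(coll) {
  // Remove the index that was created with "expiresAfterSeconds".
  coll.dropIndex("lastAlive_1");
  // Create it again with the correctly spelled "expireAfterSeconds".
  return coll.createIndex(
    { lastAlive: 1 },
    { expireAfterSeconds: 60, background: true }
  );
}

// In the mongo shell:
//   recreateTtlIndex(db.streamers)
```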
I'm using a distinct query with a filter defined, and the query is quite slow on a database with 73K items. The query looks like this:
db.runCommand({ "distinct": "companies", "key": "approver",
"query": { "doc_readers": { "$in": [ "ROLE_USER", "ROLE_MODUL_CUST", "ROLE_MODUL_PROJECTS" ] } } })
The query stats are here:
"stats" : {
"n" : 73394,
"nscanned" : 146788,
"nscannedObjects" : 73394,
"timems" : 292,
"cursor" : "BtreeCursor doc_readers_1"
},
It shows that it checks every item to get the distinct list of approvers. Is there a way to create a better index to speed things up? I have 3 similar queries on one web page, so together they take 1 sec to get the data.
Update 1: I have the following indexes.
Only the first one is being used, as the stats show:
{
"v" : 1,
"key" : {
"doc_readers" : 1
},
"name" : "doc_readers_1",
"ns" : "netnotes.companies",
"sparse" : false
},
{
"v" : 1,
"key" : {
"doc_readers" : 1,
"approver" : 1
},
"name" : "doc_readers_1_schvalovatel_1",
"ns" : "netnotes.companies"
},
{
"v" : 1,
"key" : {
"approver" : 1,
"doc_readers" : 1
},
"name" : "schvalovatel_1_doc_readers_1",
"ns" : "netnotes.companies"
},
Honestly, I don't understand how this is even possible:
> db.ts.find({"bcoded_metadata" : { "$exists" : true} } ).count()
199049
> db.ts.find({"bcoded_metadata" : { "$exists" : false} } ).count()
0
> db.ts.count()
2507873
I think the sum of the first and second counts must equal the third.
I need to select from the collection all elements for which "bcoded_metadata" does not exist, but the query returns nothing.
When I iterate over this collection in a simple Python script and manually check whether "bcoded_metadata" exists, everything works as expected.
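from pymongo import MongoClient  # Connection was removed in PyMongo 3

client = MongoClient('127.0.0.1', 27017)
db = client.data
c = 0
# Count the documents that lack the field, checking each one client-side
for item in db.ts.find():
    if "bcoded_metadata" not in item:
        c += 1
print(c)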
python test.py
2308824
This is the correct answer.
Where is the source of the problem?
Indexes:
> db.ts.getIndexes();
[
{
"name" : "_id_",
"ns" : "data.ts",
"key" : {
"_id" : 1
},
"v" : 0
},
{
"_id" : ObjectId("4f3c299b4c4a5ccfddbe4069"),
"ns" : "data.ts",
"key" : {
"last_seen" : 1
},
"name" : "last_seen_1",
"v" : 0
},
{
"_id" : ObjectId("4f3c2cef4c4a5ccfddbe406a"),
"ns" : "data.ts",
"key" : {
"attempts" : -1
},
"name" : "attempts_-1",
"v" : 0
},
{
"_id" : ObjectId("4f4279ed6aca13be31acbe6d"),
"ns" : "data.ts",
"key" : {
"bcoded_metadata" : 1
},
"name" : "bcoded_metadata_1",
"sparse" : true,
"v" : 0
}
]
This is because you use a sparse index for bcoded_metadata. A sparse index does not contain entries for documents that lack the indexed field. Because the query is answered from that index, the documents without the bcoded_metadata field are never seen, and hence "count" returns 0.
If you run just the find, db.ts.find({"bcoded_metadata" : { "$exists" : false } }), you won't get any results either. You can either use a non-sparse index, or do a full count with db.ts.count() and subtract the count of db.ts.find({"bcoded_metadata" : { "$exists" : true } }).
There is a JIRA ticket that explains it a bit more, and you can track it in case MongoDB adds an error or warning message for this: https://jira.mongodb.org/browse/SERVER-3918
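The subtraction workaround can be sketched as follows. The function name countMissingField is hypothetical; `coll` stands for db.ts in the mongo shell. With the figures from the question, 2507873 - 199049 = 2308824, which matches the number the Python script printed.

```javascript
// Hypothetical helper: counts documents that lack `field` without running an
// { $exists: false } query. `coll` is any object exposing count() and
// find(query).count(), for example db.ts in the mongo shell.
function countMissingField(coll, field) {
  var query = {};
  query[field] = { $exists: true };
  // Total documents minus the documents that do have the field.
  return coll.count() - coll.find(query).count();
}

// In the mongo shell:
//   countMissingField(db.ts, "bcoded_metadata")
```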