I have a single standalone mongo installation on a Linux machine.
The database contains a collection with 181 million documents. This collection is by far the largest collection in the database (approx. 90% of it).
The size of the collection is currently 3.5 TB.
I'm running MongoDB version 4.0.10 (WiredTiger).
The collection has 2 indexes:
One on id.
One on 2 fields, which is used when deleting documents (see those fields in the snippet below).
When benchmarking bulk deletion on this collection, we used the following snippet:
db.getCollection('Image').deleteMany(
    { $and: [
        { "CameraId" : 1 },
        { "SequenceNumber" : { $lt: 153000000 } }
    ] }
)
To see the state of the deletion operation, I ran a simple test that deletes 1000 documents while watching the operation with currentOp().
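A minimal sketch of the kind of currentOp() filter that isolates such an operation (the database name mydb is a placeholder, not my real database name):

// Sketch: list active operations on the Image collection.
// Replace "mydb" with the actual database name; the delete typically shows up with op "remove".
db.currentOp({
    "active" : true,
    "ns" : "mydb.Image"
})

It shows the following: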
"command" : {
"q" : {
"$and" : [
{
"CameraId" : 1.0
},
{
"SequenceNumber" : {
"$lt" : 153040000.0
}
}
]
},
"limit" : 0
},
"planSummary" : "IXSCAN { CameraId: 1, SequenceNumber: 1 }",
"numYields" : 876,
"locks" : {
"Global" : "w",
"Database" : "w",
"Collection" : "w"
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(877),
"w" : NumberLong(877)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(877)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(877)
}
}
}
It seems to be using the correct index, but the number and type of locks worry me. As I interpret this, it acquires 1 global lock for each deleted document from a single collection.
When using this approach it has taken over a week to delete 40 million documents. This cannot be the expected performance.
I realise that other designs exist, such as bundling documents into larger chunks and storing them using GridFS, but the current design is what it is, and I want to make sure that what I see is expected before changing my design, restructuring the data, or even considering clustering, etc.
Any suggestions on how to increase performance of bulk deletions, or is this expected?
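For reference, one variation I am considering benchmarking is splitting the delete into smaller SequenceNumber ranges, so each deleteMany only touches a bounded slice of the index (a sketch only; the batch size is an arbitrary placeholder to be tuned):

// Sketch: delete in bounded SequenceNumber ranges instead of one huge deleteMany.
var upperBound = 153000000;
var step = 1000000;   // placeholder batch size
for (var lower = 0; lower < upperBound; lower += step) {
    var upper = Math.min(lower + step, upperBound);
    var res = db.getCollection('Image').deleteMany({
        "CameraId" : 1,
        "SequenceNumber" : { $gte: lower, $lt: upper }
    });
    print("Deleted " + res.deletedCount + " documents in [" + lower + ", " + upper + ")");
}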
I am trying to use the transporter plugin to create a pipeline that syncs a MongoDB database to Elasticsearch. I am using a Linux virtual machine (Ubuntu) for this.
I have created a MongoDB database my_application with a users collection containing the following data:
db.users.find().pretty();
{
    "_id" : ObjectId("6008153cf979ac0f18681765"),
    "firstName" : "Sammy",
    "lastName" : "Shark"
}
{
    "_id" : ObjectId("60081544f979ac0f18681766"),
    "firstName" : "Gilly",
    "lastName" : "Glowfish"
}
I configured Elasticsearch and the transporter pipeline, and exported MongoDB_URI and Elastic_URI.
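My pipeline.js follows the transporter documentation and looks roughly like this (a sketch; the URIs, namespace filter, and environment variable names are assumptions about my setup):

// Sketch of a transporter pipeline: read from MongoDB, write to Elasticsearch.
var source = mongodb({
    "uri": "${MONGODB_URI}/my_application"          // MongoDB connection string from an environment variable
})

var sink = elasticsearch({
    "uri": "${ELASTICSEARCH_URI}/my_application"    // Elasticsearch URL from an environment variable
})

// Copy every namespace ("/.*/") from the source into the sink.
t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")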
I then ran my transporter pipeline.js to obtain this:
INFO[0005] metrics source records: 2 path=source ts=1611154492641006368
INFO[0005] metrics source/sink records: 2 path="source/sink" ts=1611154492641013556
I then try to query Elasticsearch but get this error:
curl $ELASTICSEARCH_URI/_search?pretty=true
{
    "error" : {
        "root_cause" : [
            {
                "type" : "cluster_block_exception",
                "reason" : "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
            }
        ],
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
    },
    "status" : 503
}
Here is my elasticsearch.yml:
# Use a descriptive name for the node:
node.name: node-1
path.data: /var/lib/elasticsearch
# Path to log files:
path.logs: /var/log/elasticsearch
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
# Set a custom port for HTTP:
http.port: 9200
# Bootstrap the cluster using an initial set of master-eligible nodes:
cluster.initial_master_nodes: ["node-1", "node-2"]
Here is my elasticsearch node:
{
    "name" : "node-1",
    "cluster_name" : "elasticsearch",
    "cluster_uuid" : "_na_",
    "version" : {
        "number" : "7.7.1",
        "build_flavor" : "default",
        "build_type" : "deb",
        "build_hash" : "ad56dce891c901a492bb1ee393f12dfff473a423",
        "build_date" : "2020-05-28T16:30:01.040088Z",
        "build_snapshot" : false,
        "lucene_version" : "8.5.1",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
    },
    "tagline" : "You Know, for Search"
}
I have tried deleting indices and restarting the server, but the error repeats. I would like to know the solution to this. I am using Elasticsearch 7.10.
I have a Java app which writes to a replica set. I am using MongoDB server version 3.0.7. The MongoDB Java driver is 3.0.4.
It was working just fine, but now I am getting the following error on ALL writes:
com.mongodb.MongoWriteException: quota exceeded
    at com.mongodb.MongoCollectionImpl.executeSingleWriteRequest(MongoCollectionImpl.java:487)
    at com.mongodb.MongoCollectionImpl.update(MongoCollectionImpl.java:474)
    at com.mongodb.MongoCollectionImpl.updateOne(MongoCollectionImpl.java:325)
I have looked at the MongoDB config documentation, and I have not set any quota limits in mongod.conf. I am not using smallFiles either.
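One way I could double-check what the running mongod actually picked up is to ask it for its parsed configuration (a sketch; run in the shell against the same instance):

// Show the command-line options and parsed config file the running mongod started with.
db.adminCommand({ getCmdLineOpts: 1 })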
But I think I am running into file-size limits. The file sizes for my DB are as follows:
total 6240548
-rw------- 1 mongod mongod 67108864 Dec 8 16:55 2015.0
-rw------- 1 mongod mongod 134217728 Dec 8 12:15 2015.1
-rw------- 1 mongod mongod 268435456 Dec 8 12:15 2015.2
-rw------- 1 mongod mongod 536870912 Dec 8 12:15 2015.3
-rw------- 1 mongod mongod 1073741824 Dec 8 12:15 2015.4
-rw------- 1 mongod mongod 2146435072 Dec 8 16:06 2015.5
-rw------- 1 mongod mongod 2146435072 Dec 8 16:06 2015.6
-rw------- 1 mongod mongod 16777216 Dec 8 16:55 2015.ns
drwxr-xr-x 2 mongod mongod 4096 Dec 7 09:14 _tmp
The /etc/mongod.conf is as follows:
storage:
  dbPath: /data/mongoDB
  indexBuildRetry: true
  repairPath: /data/mongoDB/repair
  journal:
    enabled: true
  directoryPerDB: true
  syncPeriodSecs: 60
  engine: mmapv1
  mmapv1:
    preallocDataFiles: false
    nsSize: 16
    quota:
      enforced: false
      maxFilesPerDB: 8
    smallFiles: false
    journal:
      debugFlags: 0
      commitIntervalMs: 100
What could be going wrong?
PS: /etc/mongod.conf is being used.
mongod 10864 1 0 Nov16 ? 03:10:34 /usr/bin/mongod -f /etc/mongod.conf
Update 1: (1) Tried the same update after changing the collection name to a new collection. That worked! But it doesn't explain the issue yet. (2) Changed the Java driver to 3.0.3; that didn't help.
Update 2 (12/9): Adding collection stats, since it seems to be something to do with the collection itself and the Java driver. Let me know if something seems awry, please.
{
"ns" : "2015.events",
"count" : 827054,
"size" : 3814018,
"avgObjSize" : 4722,
"numExtents" : 22,
"extents" : [
{
"len" : 8192,
"loc: " : {
"file" : 0,
"offset" : 20480
}
},
{
"len" : 32768,
"loc: " : {
"file" : 0,
"offset" : 2134016
}
},
{
"len" : 131072,
"loc: " : {
"file" : 0,
"offset" : 2166784
}
},
{
"len" : 524288,
"loc: " : {
"file" : 0,
"offset" : 2297856
}
},
{
"len" : 2097152,
"loc: " : {
"file" : 0,
"offset" : 2822144
}
},
{
"len" : 8388608,
"loc: " : {
"file" : 0,
"offset" : 4919296
}
},
{
"len" : 11325440,
"loc: " : {
"file" : 0,
"offset" : 14356480
}
},
{
"len" : 15290368,
"loc: " : {
"file" : 0,
"offset" : 28827648
}
},
{
"len" : 20643840,
"loc: " : {
"file" : 0,
"offset" : 44118016
}
},
{
"len" : 27869184,
"loc: " : {
"file" : 1,
"offset" : 8192
}
},
{
"len" : 37625856,
"loc: " : {
"file" : 1,
"offset" : 36265984
}
},
{
"len" : 50798592,
"loc: " : {
"file" : 1,
"offset" : 78086144
}
},
{
"len" : 68579328,
"loc: " : {
"file" : 2,
"offset" : 8396800
}
},
{
"len" : 92585984,
"loc: " : {
"file" : 2,
"offset" : 93753344
}
},
{
"len" : 124993536,
"loc: " : {
"file" : 3,
"offset" : 8192
}
},
{
"len" : 168742912,
"loc: " : {
"file" : 3,
"offset" : 125001728
}
},
{
"len" : 227803136,
"loc: " : {
"file" : 4,
"offset" : 8192
}
},
{
"len" : 307535872,
"loc: " : {
"file" : 4,
"offset" : 239136768
}
},
{
"len" : 415174656,
"loc: " : {
"file" : 4,
"offset" : 595939328
}
},
{
"len" : 560488448,
"loc: " : {
"file" : 5,
"offset" : 8192
}
},
{
"len" : 756662272,
"loc: " : {
"file" : 5,
"offset" : 607756288
}
},
{
"len" : 1021497344,
"loc: " : {
"file" : 6,
"offset" : 8192
}
}
],
"storageSize" : 3826952,
"lastExtentSize" : 997556,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 8,
"indexDetails" : {
},
"totalIndexSize" : 348270,
"indexSizes" : {
"_id_" : 47123,
"remoteRequest.uri_1" : 76969,
"transactionId_1" : 59611,
"startTime_1" : 39027,
"endTime_1" : 35235,
"remoteRequest.queryParams.q_1" : 20328,
"remoteRequest.queryParams.fq_1" : 38285,
"elapsedTimeInNanos_1" : 31689
},
"ok" : 1
}
Your mongod.conf has a quota enabled for each database. Based on that mongod.conf file, you will be unable to create more than 8 database files per database, which limits you to a maximum of about 6.4 GB of storage. You mention that you are able to get around this issue by using a new collection, so I am interested in what your data directory looks like now. I would not expect you to be able to bypass this hard limit; however, due to internal data structures, it may be possible to "bypass" it for a short time.
You can verify how much actual data is being stored by running the dbStats command:
use 2015
db.stats(1024*1024)
This output will tell you how much data you actually have in the db vs. the amount of allocated storage. These numbers will not match; this is expected, as documents include empty space for padding.
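To make the comparison concrete, here is a sketch of the fields to look at (values would be in MB because of the 1024*1024 scale; the field names are the standard dbStats output):

// Sketch: compare logical data size with allocated storage and on-disk file size.
var s = db.stats(1024*1024);
print("dataSize:    " + s.dataSize);    // size of the documents themselves
print("storageSize: " + s.storageSize); // space allocated inside the data files
print("fileSize:    " + s.fileSize);    // total size of the data files on disk (MMAPv1)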
My next question would be: is there a reason you are artificially limiting the amount of storage space your mongod can allocate? Perhaps a capped collection would better suit your needs? If you could expand on your use case, I can perhaps give you a better answer.
I use MongoDB for an internal ADMIN-type application used by my team.
Mongo is installed on one box, with no replica sets.
The ADMIN application inserts 70K to 100K documents per day, and we maintain 4 months of data. The DB has ~100 million documents at any given time.
When the application was deployed, everything was fine for a few days. As the data accumulated toward the 4-month limit, I started seeing severe performance issues with MongoDB.
I installed MongoDB 3.0.4 as-is on a Linux box and did not fine-tune any optimization settings.
Are there any optimization settings I need to adjust?
The ADMIN application has schedulers which run every half hour to insert new data and purge outdated data. For the collection below, with indexes defined on createdDate, env, messageId and sourceSystem, I see a few queries taking 30 minutes to respond.
Sample query: a count of documents with a given env and sourceSystem, within a given range of dates (a shell sketch of this query is shown after the sample document below). The ADMIN app uses Grails, and the above query is created using GORM. It worked fine in the beginning, but over time performance degraded. I tried restarting the application as well; it didn't help. I believe using MongoDB as-is (like a dev mode) might be causing the performance issue. Any suggestions on what to tweak in the settings (perhaps CPU/memory limits etc.)?
{
    "_id" : ObjectId("5575e388e4b001976b5e570f"),
    "createdDate" : ISODate("2015-06-07T05:00:34.040Z"),
    "env" : "prod",
    "messageId" : "f684b34d-a480-42a0-a7b8-69d6d18f39e5",
    "payload" : "JSON or XML DATA",
    "sourceSystem" : "sourceModule"
}
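The count described above roughly corresponds to this shell query (a sketch; the values are placeholders taken from the sample document):

// Sketch: count documents for one env/sourceSystem within a date range.
db.Message.count({
    "env" : "prod",
    "sourceSystem" : "sourceModule",
    "createdDate" : { $gte: ISODate("2015-06-01T00:00:00Z"), $lt: ISODate("2015-06-08T00:00:00Z") }
})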
Update:
Indices:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "messageId" : 1
        },
        "name" : "messageId_1",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "createdDate" : 1
        },
        "name" : "createdDate_1",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "sourceSystem" : 1
        },
        "name" : "sourceSystem_1",
        "ns" : "admin.Message"
    },
    {
        "v" : 1,
        "key" : {
            "env" : 1
        },
        "name" : "env_1",
        "ns" : "admin.Message"
    }
]
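Since these are all single-field indexes, one thing I can check is which index the slow count actually picks; here is a sketch using explain (available from MongoDB 3.0, with the same placeholder values as above):

// Sketch: inspect the winning plan and per-stage work for the date-range query.
db.Message.find({
    "env" : "prod",
    "sourceSystem" : "sourceModule",
    "createdDate" : { $gte: ISODate("2015-06-01T00:00:00Z"), $lt: ISODate("2015-06-08T00:00:00Z") }
}).explain("executionStats")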
Every 6-12 hours, my MongoDB CPU is pegged (100% CPU usage).
I've enabled profiling. The last time returned this:
PRIMARY> db.system.profile.find().sort({$natural:-1});
{ "ts" : ISODate("2012-11-08T05:31:09.042Z"), "client" : "10.188.14.195", "user" : "", "err" : "profile line too large (max is 100KB)" }
Not very helpful, unfortunately.
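For reference, re-enabling the profiler with an explicit slow-operation threshold and sorting by duration might produce more usable entries (a sketch; the 100 ms threshold is an arbitrary placeholder):

// Sketch: profile only operations slower than 100 ms, then look at the slowest ones.
db.setProfilingLevel(1, 100)
db.system.profile.find().sort({ millis: -1 }).limit(5).pretty()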
I tried running db.currentOp() while it was pegged and got this:
{
    "opid" : 18256845,
    "active" : true,
    "lockType" : "write",
    "waitingForLock" : false,
    "secs_running" : 803653,
    "op" : "none",
    "ns" : "streamified.credentials",
    "query" : {
    },
    "client" : "",
    "desc" : "rsSync",
    "threadId" : "0x7f3b865f7700",
    "numYields" : 1
},
This indicates that the operation had been alive for over 800,000 seconds (FAR before the CPU was pegged). The operation remained even after the CPU returned to normal, as well.
What is the best way to determine exactly which query (or, at the very least, which collection) is causing the CPU to become pegged?