I've configured my MongoDB 2.0.2 instance (update: also tried this with a v2.2.0 instance) to log all operations to the system.profile collection (i.e., db.setProfilingLevel(2)) and am trying to see exactly what data is being inserted by an application when it calls save() for a new doc.
I can see the 'insert' operations in the system.profile collection, but it doesn't include the data that's being inserted. Why is that?
In contrast, update operations recorded in system.profile have an 'updateobj' property which shows the data.
Here's an example from a 2.2.0 instance. As you can see, the profile log includes an entry for the update with 'updateObj' data. The insert, however, doesn't have any info about what was inserted.
> use test;
switched to db test
> db.getProfilingStatus();
{ "was" : 2, "slowms" : 100 }
> show collections;
cartoons
system.indexes
system.profile
> db.foobar.insert({ "blah": true });
> db.foobar.update({ "blah": true }, { $set: { blerg: 1 } });
> db.system.profile.find({ ns:"test.foobar" });
{
"ts": ISODate("2012-09-25T20:37:40.287Z"),
"op": "insert",
"ns": "test.foobar",
"keyUpdates": 0,
"numYield": 0,
"lockStats": {
"timeLockedMicros": {
"r": NumberLong(0),
"w": NumberLong(2028)
},
"timeAcquiringMicros": {
"r": NumberLong(0),
"w": NumberLong(10)
}
},
"millis": 2,
"client": "127.0.0.1",
"user": ""
}{
"ts": ISODate("2012-09-25T20:38:11.454Z"),
"op": "update",
"ns": "test.foobar",
"query": {
"blah": true
},
"updateobj": {
"$set": {
"blerg": 1
}
},
"nscanned": 1,
"moved": true,
"nmoved": 1,
"nupdated": 1,
"keyUpdates": 0,
"numYield": 0,
"lockStats": {
"timeLockedMicros": {
"r": NumberLong(0),
"w": NumberLong(1797)
},
"timeAcquiringMicros": {
"r": NumberLong(0),
"w": NumberLong(9)
}
},
"millis": 1,
"client": "127.0.0.1",
"user": ""
}
Apologies for misleading you originally, it turns out that this is intentional (my original response was related to this being a bug with logging slow ops). The idea behind not doing this is that you would just double the write load automatically by turning this on, since you are effectively just writing the same information (actually a little more) twice.
Since the idea with profiling is usually to troubleshoot a performance issue, this has not been implemented as the default. However, it has been requested as an option:
https://jira.mongodb.org/browse/SERVER-3848
As you can see, it is not yet scheduled for a version, but votes and comments outlining why this would be useful do help when deciding what gets implemented.
Related
I have a simple sample database which I use to develop a simple micro framework for our application. To verify versioning works as expected, I need to update all documents by adding a new field. I used this thread as guide, so what I'm doing should work.
{
"_id": {
"$oid": "63d95015f94d9a88ecbc1a00"
},
"Version": 1,
"bookTitle": "Fancy Book 0",
"author": "Author 0"
}
{
"_id": {
"$oid": "63d95015f94d9a88ecbc1a01"
},
"Version": 1,
"bookTitle": "Fancy Book 1",
"author": "Author 1"
}
... 8 more
Now when I run this in MongoSH:
db.Books.updateMany({}, {$set: {'NewField': true}})
I get this output:
{
acknowledged: true,
insertedId: null,
matchedCount: 0,
modifiedCount: 0,
upsertedCount: 0
}
So what's the issue here? It does execute and it seemingly is correct, but no single document updates. The official documentation states that {} is a selector for all documents, so why does it not match even one of them?
I will need your help with understanding an performance problem.
We have a system where we are storing set of documents (1k-4k docs) in batches. Documents have this structure: {_id: ObjectId(), RepositoryId: UUID(), data...}
where repository id is same for all instance in the set. We also set an unique indexes for: {_id: 1, RepositoryId: 1}, {RepositoryId: 1, ...}.
In the usecase is: delete all documents with same RepositoryId:
db.collection.deleteMany(
{ RepositoryId: UUID("SomeGUID") },
{ writeConcern: {w: "majority", j: true} }
)
And then re-upsert batches (300 items per batch) with same RepositoryId as we delete before:
db.collection.insertMany(
[ { RepositoryId: UUID(), data... }, ... ],
{
writeConcern: {w: 1, j: false},
ordered: false
}
)
The issue is that upsert of first few (3-5) batches take much more time then reset (first batch: 10s, 8th bach 0.1s). There is also entry in log file:
{
"t": {
"$date": "2023-01-19T15:49:02.258+01:00"
},
"s": "I",
"c": "COMMAND",
"id": 51803,
"ctx": "conn64",
"msg": "Slow query",
"attr": {
"type": "command",
"ns": "####.$cmd",
"command": {
"update": "########",
"ordered": false,
"writeConcern": {
"w": 1,
"fsync": false,
"j": false
},
"txnNumber": 16,
"$db": "#####",
"lsid": {
"id": {
"$uuid": "6ffb319a-6003-4221-9925-710e9e2aa315"
}
},
"$clusterTime": {
"clusterTime": {
"$timestamp": {
"t": 1674139729,
"i": 5
}
},
"numYields": 0,
"reslen": 11550,
"locks": {
"ParallelBatchWriterMode": {
"acquireCount": {
"r": 600
}
},
"ReplicationStateTransition": {
"acquireCount": {
"w": 601
}
},
"Global": {
"acquireCount": {
"w": 600
}
},
"Database": {
"acquireCount": {
"w": 600
}
},
"Collection": {
"acquireCount": {
"w": 600
}
},
"Mutex": {
"acquireCount": {
"r": 600
}
}
},
"flowControl": {
"acquireCount": 300,
"timeAcquiringMicros": 379
},
"readConcern": {
"level": "local",
"provenance": "implicitDefault"
},
"writeConcern": {
"w": 1,
"j": false,
"wtimeout": 0,
"provenance": "clientSupplied"
},
"storage": {
},
"remote": "127.0.0.1:52800",
"protocol": "op_msg",
"durationMillis": 13043
}
}
}
}
Is there some background process that is running after delete that affects upsert pefrormance of first batches? It was not a problem until we switched from standalone to single instance replica set, due to transaction support in another part of app. This case does not require transaction but we can not host two instances of mongo with different setup. The DB is exclusive for this operation, no other operation runs on DB (running in isolated test environment). How we can fix it?
The issue is reproducible, seems when there is time gap in test run (few minutes), the problem is not there for first run but then following runs are problematic.
Runing on machine with Ryzen 7 PRO 4750U, 32 GB Ram and Samsung 970 EVO M2 SSD. MongoDB version 5.0.5
In that log entry timeAcquiringMicros indicates that this operation waited while attempt to acquire a lock.
flowControl is a throttling mechanism that delays writes on the primary node when the secondary nodes are lagging, with the intent of letting them catch up before the get so far behind that consistency is lost.
Waiting on the flowControl lock would suggest that there was a backlog of operations that were still be replicated to the secondaries, and they were a bit behind, so the new writes were being slowed.
See Replication Lag and Flow Control for more detail
I have some write performance struggle with MongoDB 5.0.8 in an PSA (Primary-Secondary-Arbiter) deployment when one data bearing member goes down.
I am aware of the "Mitigate Performance Issues with PSA Replica Set" page and the procedure to temporarily work around this issue.
However, in my opinion, the manual intervention described here should not be necessary during operation. So what can I do to ensure that the system continues to run efficiently even if a node fails? In other words, as in MongoDB 4.x with the option "enableMajorityReadConcern=false".
As I understand the problem has something to do with the defaultRWConcern. When configuring a PSA Replica Set in MongoDB you are forced to set the DefaultRWConcern. Otherwise the following message will appear when rs.addArb is called:
MongoServerError: Reconfig attempted to install a config that would
change the implicit default write concern. Use the setDefaultRWConcern
command to set a cluster-wide write concern and try the reconfig
again.
So I did
db.adminCommand({
"setDefaultRWConcern": 1,
"defaultWriteConcern": {
"w": 1
},
"defaultReadConcern": {
"level": "local"
}
})
I would expect that this configuration causes no lag when reading/writing to a PSA System with only one data bearing node available.
But I observe "slow query" messages in the mongod log like this one:
{
"t": {
"$date": "2022-05-13T10:21:41.297+02:00"
},
"s": "I",
"c": "COMMAND",
"id": 51803,
"ctx": "conn149",
"msg": "Slow query",
"attr": {
"type": "command",
"ns": "<db>.<col>",
"command": {
"insert": "<col>",
"ordered": true,
"txnNumber": 4889253,
"$db": "<db>",
"$clusterTime": {
"clusterTime": {
"$timestamp": {
"t": 1652430100,
"i": 86
}
},
"signature": {
"hash": {
"$binary": {
"base64": "bEs41U6TJk/EDoSQwfzzerjx2E0=",
"subType": "0"
}
},
"keyId": 7096095617276968965
}
},
"lsid": {
"id": {
"$uuid": "25659dc5-a50a-4f9d-a197-73b3c9e6e556"
}
}
},
"ninserted": 1,
"keysInserted": 3,
"numYields": 0,
"reslen": 230,
"locks": {
"ParallelBatchWriterMode": {
"acquireCount": {
"r": 2
}
},
"ReplicationStateTransition": {
"acquireCount": {
"w": 3
}
},
"Global": {
"acquireCount": {
"w": 2
}
},
"Database": {
"acquireCount": {
"w": 2
}
},
"Collection": {
"acquireCount": {
"w": 2
}
},
"Mutex": {
"acquireCount": {
"r": 2
}
}
},
"flowControl": {
"acquireCount": 1,
"acquireWaitCount": 1,
"timeAcquiringMicros": 982988
},
"readConcern": {
"level": "local",
"provenance": "implicitDefault"
},
"writeConcern": {
"w": 1,
"wtimeout": 0,
"provenance": "customDefault"
},
"storage": {},
"remote": "10.10.7.12:34258",
"protocol": "op_msg",
"durationMillis": 983
}
The collection involved here is under proper load with about 1000 reads and 1000 writes per second from different (concurrent) clients.
MongoDB 4.x with "enableMajorityReadConcern=false" performed "normal" here and I have not noticed any loss of performance in my application. MongoDB 5.x doesn't manage that and in my application data is piling up that I can't get written away in a performant way.
So my question is, if I can get the MongoDB 4.x behaviour back. A write guarantee from the single data bearing node which is available in the failure scenario would be OK for me. But in a failure scenario, having to manually reconfigure the faulty node should actually be avoided.
Thanks for any advice!
At the end we changed the setup to a PSS layout.
This was also recommended in the MongoDB Community Forum.
I have the query:
db.changes.find(
{
$or: [
{ _id: ObjectId("60b1e8dc9d0359001bb80441") },
{ _oid: ObjectId("60b1e8dc9d0359001bb80441") },
],
},
{
_id: 1,
}
);
which returns almost instantly.
But the moment I add a sort, the query doesn't return. The query just runs. The longest I could tolerate the query running was over 30 Min, so I'm not entirely sure if it does eventually return.
db.changes
.find(
{
$or: [
{ _id: ObjectId("60b1e8dc9d0359001bb80441") },
{ _oid: ObjectId("60b1e8dc9d0359001bb80441") },
],
},
{
_id: 1,
}
)
.sort({ _id: -1 });
I have the following indexes:
[
{
"_oid" : 1
},
{
"_id" : 1
}
]
and this is what db.currentOp() returns:
{
"host": "xxxx:27017",
"desc": "conn387",
"connectionId": 387,
"client": "xxxx:55802",
"appName": "MongoDB Shell",
"clientMetadata": {
"application": {
"name": "MongoDB Shell"
},
"driver": {
"name": "MongoDB Internal Client",
"version": "4.0.5-18-g7e327a9017"
},
"os": {
"type": "Linux",
"name": "Ubuntu",
"architecture": "x86_64",
"version": "20.04"
}
},
"active": true,
"currentOpTime": "2021-09-24T15:26:54.286+0200",
"opid": 71111,
"secs_running": NumberLong(23),
"microsecs_running": NumberLong(23860504),
"op": "query",
"ns": "myDB.changes",
"command": {
"find": "changes",
"filter": {
"$or": [
{
"_id": ObjectId("60b1e8dc9d0359001bb80441")
},
{
"_oid": ObjectId("60b1e8dc9d0359001bb80441")
}
]
},
"sort": {
"_id": -1.0
},
"projection": {
"_id": 1.0
},
"lsid": {
"id": UUID("38c4c09b-d740-4e44-a5a5-b17e0e04f776")
},
"$readPreference": {
"mode": "secondaryPreferred"
},
"$db": "myDB"
},
"numYields": 1346,
"locks": {
"Global": "r",
"Database": "r",
"Collection": "r"
},
"waitingForLock": false,
"lockStats": {
"Global": {
"acquireCount": {
"r": NumberLong(2694)
}
},
"Database": {
"acquireCount": {
"r": NumberLong(1347)
}
},
"Collection": {
"acquireCount": {
"r": NumberLong(1347)
}
}
}
}
This wasn't always a problem, it's only recently started. I've also rebuilt the indexes, and nothing seems to work. I've tried using .explain(), and that also doesn't return.
Any suggestions would be welcome. For my situation, it's going to be much easier to make changes to the DB than it is to change the query.
This is happening due to the way Mongo chooses what's called a "winning plan", I recommend you read more on this in my other answer which explains this behavior. However it is interesting to see if the Mongo team will consider this specific behavior a feature or a bug.
Basically the $or operator has some special qualities, as specified:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
It seems that the addition of the sort is disrupting the usage this quality, meaning you're running a collection scan all of a sudden.
What I recommend you do is use the aggregation pipeline instead of the query language, I personally find it has more stable behavior and it might work there. If not maybe just do the sorting in code ..
The server can use a separate index for each branch of the $or, but in order to avoid doing an in-memory sort the indexes used would have to find the documents in the sort order so a merge-sort can be used instead.
For this query, an index on {_id:1} would find documents matching the first branch, and return them in the proper order. For the second branch, and index on {oid:1, _id:1} would do the same.
If you have both of those indexes, the server should be able to find the matching documents quickly, and return them without needing to perform an explicit sort.
I got a big array with data in the following format:
{
"application": "myapp",
"buildSystem": {
"counter": 2361.1,
"hostname": "host.com",
"jobName": "job_name",
"label": "2361",
"systemType": "sys"
},
"creationTime": 1517420374748,
"id": "123",
"stack": "OTHER",
"testStatus": "PASSED",
"testSuites": [
{
"errors": 0,
"failures": 0,
"hostname": "some_host",
"properties": [
{
"name": "some_name",
"value": "UnicodeLittle"
},
<MANY MORE PROPERTIES>,
{
"name": "sun",
"value": ""
}
],
"skipped": 0,
"systemError": "",
"systemOut": "",
"testCases": [
{
"classname": "IdTest",
"name": "has correct representation",
"status": "PASSED",
"time": "0.001"
},
<MANY MORE TEST CASES>,
{
"classname": "IdTest",
"name": "normalized values",
"status": "PASSED",
"time": "0.001"
}
],
"tests": 8,
"time": 0.005,
"timestamp": "2018-01-31T17:35:15",
"title": "IdTest"
}
<MANY MORE TEST SUITES >,
]}
Where I can distinct three main structures with big data: TestSuites, Properties, and TestCases. My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete. I would like to select only the "time" value from TestSuites, but it kind of conflicts with the "time" of TestCases in my query:
db.my_tests.find(
{
application: application,
creationTime:{
$gte: start_date.valueOf(),
$lte: end_date.valueOf()
}
},
{
application: 1,
creationTime: 1,
buildSystem: 1,
"testSuites.time": 1,
_id:1
}
)
Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time: 1 without success. Please notice that TestSuites is an array of one element with a dictionary.
I already checked this similar post without success:
Mongodb update the specific element from subarray
Following code prints duration of each TestSuite:
query = db.my_collection.aggregate(
[
{$match: {
application: application,
creationTime:{
$gte: start_date.valueOf(),
$lte: end_date.valueOf()
}
}
},
{ $project :
{ duration: { $sum: "$testSuites.time"}}
}
]
).forEach(function(doc)
{
print(doc._id)
print(doc.duration)
}
)
Is it possible to project only the "time" properties from TestSuites
without loading the whole schema? I already tried testSuites: 1,
testSuites.$.time
Answering to your problem of prejecting only the time property of the testSuites document you can simply try projecting it with "testSuites.time" : 1 (you need to add the quotes for the dot notation property references).
My task is to sum all times from each TestSuite so that I can get the
total duration of the test. Since the properties and TestCases are
huge, the query cannot complete
As for your task, i suggest you try out the mongodb's aggregation framework for your calculations documents tranformations. The aggregations framework option {allowDiskUse : true} will also help you if you are proccessing "large" documents.