I'm trying to test the performance of MongoDB before actually putting it to use. I'm trying to see how many documents I can update per second. I'm using the C# MongoDB driver v1.9 (Mono on Ubuntu) and MongoDB v2.4.6.
I believe one of the MongoDB parameters with the greatest effect on write performance is the write concern. As stated in the documentation, the most relaxed value for the write concern is -1, then 0, and finally 1 is the slowest.
After searching I found that I can set write concerns in C# embedded in connection string like this:
var client = new MongoClient("mongodb://localhost/?w=-1");
Here are the results of me playing around with different values for w:
The fastest results are achieved when I set w to 1!
Setting w=0 is 28 times slower than w=1!
w=-1 leads to an exception with the error message "W value must be greater than or equal to zero"!
Does anyone have an explanation for these results? Am I doing anything wrong?
[UPDATE]
I think it is necessary to describe the test procedure; maybe there's something hidden within it. So here it goes:
I've got a database with 100M documents in a single collection. Each document is created like this using mongo shell:
{ "counter" : 0, "last_update" : new Date() }
Here's the output of db.collection.stats();:
{
    "ns" : "test.collection",
    "count" : 100000100,
    "size" : 6400006560,
    "avgObjSize" : 64.0000015999984,
    "storageSize" : 8683839472,
    "numExtents" : 27,
    "nindexes" : 2,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 5769582448,
    "indexSizes" : {
        "_id_" : 3251652432,
        "last_update_1" : 2517930016
    },
    "ok" : 1
}
Using Mono 3.2.1 on Ubuntu 12.04, I've written a C# project which connects to MongoDB and tries to update the documents like this:
FindAndModifyArgs args = new FindAndModifyArgs();
args.SortBy = SortBy.Ascending("last_update");
args.Update = Update<Entity>.Set(e => e.last_update, DateTime.Now);
args.Fields = Fields.Include(new string[] { "counter", "_id" });
var m = collection.FindAndModify(args);
Entity ent = m.GetModifiedDocumentAs<Entity>();
var query = Query<Entity>.EQ(e => e.Id, ent.Id);
var update = Update<Entity>.Set(e => e.counter, ent.counter+1);
collection.Update(query, update);
To summarize what this piece of code does: it selects the document with the oldest last_update, sets its last_update to the current date, and also increments its counter (the update happens in two steps).
I ran this code 10k times for each of four different write concerns: w=-1, w=0, w=1 and w=1&j=true. While w=-1 throws an exception and gives no results, here are the results for the rest of them:
Since the figure is a little hard to read, here are the same results in numbers:
w=-1 w=0 w=1 w=1&j=true
Average N/A 244.0948611492 7064.5143923477 238.1846428156
STDEV N/A 1.7787457992 511.892765742 21.0230097306
And the question is: does anyone have an explanation for why w=0 is so much slower than w=1, and why w=-1 is not supported?
[UPDATE]
I've also tested RequestStart in my code like this:
using (server.RequestStart(database)) {
    FindAndModifyArgs args = new FindAndModifyArgs();
    args.SortBy = SortBy.Ascending("last_update");
    args.Update = Update<Entity>.Set(e => e.last_update, DateTime.Now);
    args.Fields = Fields.Include(new string[] { "counter", "_id" });
    var m = collection.FindAndModify(args);
    Entity ent = m.GetModifiedDocumentAs<Entity>();
    var query = Query<Entity>.EQ(e => e.Id, ent.Id);
    var update = Update<Entity>.Set(e => e.counter, ent.counter + 1);
    collection.Update(query, update);
}
It had no significant effect on any of the results whatsoever.
To address the w=-1 error, you may want to use the WriteConcern.Unacknowledged property directly.
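For example, a minimal sketch of using that property with the 1.x C# driver (the database and collection names below are placeholders, not from your test): you can either make it the default for the client or pass it per operation.

// Sketch: requesting unacknowledged writes with the 1.x C# driver.
// "test" / "collection" are placeholder names.
using MongoDB.Driver;
using MongoDB.Driver.Builders;

class UnacknowledgedExample
{
    static void Main()
    {
        // Option 1: make unacknowledged writes the default for this client.
        var settings = new MongoClientSettings
        {
            Server = new MongoServerAddress("localhost"),
            WriteConcern = WriteConcern.Unacknowledged
        };
        var client = new MongoClient(settings);
        var collection = client.GetServer().GetDatabase("test").GetCollection("collection");

        // Option 2: pass the write concern for a single operation only.
        collection.Update(
            Query.EQ("counter", 0),
            Update.Inc("counter", 1),
            WriteConcern.Unacknowledged);
    }
}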
Your test code might be running into consistency problems. Look into RequestStart/RequestDone to put the query onto the same connection. Per the documentation:
An example of when this might be necessary is when a series of Inserts are called in rapid succession with a WriteConcern of w=0, and you want to query that data in a consistent manner immediately thereafter (with a WriteConcern of w=0, the writes can queue up at the server and might not be immediately visible to other connections).
Aphyr did a good blog post on how the various settings of MongoDB's write concern affect its approximation of consistency. It may help you troubleshoot this issue.
Finally, you can find your driver version in build.ps1
This question has been updated with a simpler explanation.
For my data the following Mongo CLI query takes 208ms. This query retrieves ALL the data for the 18 requested objects.
db.videos.find({avg_rating: {$gt: 1}, poster_large_thumb: {$exists:true}, release_date: {$lte: ISODate("2000-12-31")}}).sort({release_date:-1, avg_rating:-1, title:1}).skip(30).limit(18).pretty().explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 76112,
    "nscannedObjects" : 76112,
    "n" : 48,
    "scanAndOrder" : true,
    "millis" : 208,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
    }
}
However, with Mongoid, when I do the query it creates a criteria object that has no actual data. I pass this criteria to a method that iterates it to create a JSON data structure. Since the data is needed, each object must be retrieved from the DB. Here is the criteria-returning query:
@videos = Video.order_by(release_date: -1, avg_rating: -1, title: 1).where(:avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.lte => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW)
When I iterate @videos, each object takes over 240 ms to retrieve from the DB, according to some debug output.
Getting one object:
2013-12-18 00:43:52 UTC
2013-12-18 00:43:52 UTC
0.24489331245422363
Assuming that getting all the data in the Video.order_by(...) query would take 208 ms total, how can I force it to do the retrieval in one query instead of getting each object individually?
Something here is causing the entire retrieval to take a couple of orders of magnitude more than the Mongo CLI.
Responses:
skip().limit() queries tend to get slower and slower on the MongoDB side, because skip has to walk through the documents; for more info see https://stackoverflow.com/a/7228190/534150
The multiple identical queries look to me like an N+1 type of issue. That probably means that somewhere in your view you have a loop that calls a lazily loaded property, so it sends the query over and over again. Those problems are usually tricky to find, but to track them down you need the end-to-end trace, and you are probably the only one who can do that, since you have access to the source code.
The Mongoid side looks correct to me.
Thanks for the ideas, I think @Arthur gave the important hint. No one could have answered, because it looks like the problem was in another bit of code: in how I access the criteria.
Given the following query, which produces a criteria:
@videos = Video.order_by(release_date: 1, avg_rating: 1, title: -1).where(:release_date.ne => 0, :avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.gt => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW).only(:_id, :poster_large_thumb, :title)
In a couple of nested blocks I'm grabbing values with a line of code like this:
video_full = @videos[((row_index)*POSTERS_PER_ROW)+column_index]
This random-access notation seems to be the problem. It seems to execute the full Moped query for every individual object, so POSTERS_PER_ROW*num_rows times.
If I grab all the videos before the loops with this bit of code:
@videos.each do |video|
  videos_full.push video
end
Then grab the values from an array instead of the criteria like this:
video_full = videos_full[((row_index)*POSTERS_PER_ROW)+column_index]
I get only one Moped query of 248 ms; all objects are retrieved with that one query. This is a huge speed-up: the query time goes from num_rows*POSTERS_PER_ROW*248 ms to just 248 ms.
This is a big lesson learned for me so if someone can point to the docs that describe this effect and the rules to follow I'd sure appreciate it.
I have a collection in MongoDB on which I perform an increment. The field was initially defined as an Integer, but I find that after the increment it was converted to a double.
But then I update the document and see that it changes to a Long.
Is there any way to block these changes in Mongo?
Thanks in advance
Since MongoDB doesn't have a fixed schema per collection, there's no way to prevent such changes on the database side. Make sure that you use the same data type for the field everywhere, including its update operations. The C# driver is pretty smart about this.
Be careful when working with the shell; it can be irritating. By default, the mongo shell will treat every number as a double, e.g.:
> db.Inc.find().pretty();
{ "_id" : 1, "Number" : 1000023272226647000 }
// this number is waaayyy larger than the largest 32 bit int, but there's no
// NumberLong here. So it must be double.
> db.Inc.update({}, {$inc: {"Number" : 1 }});
> db.Inc.find().pretty();
{ "_id" : 1, "Number" : 1000023272226647000 }
// Yikes, the $inc doesn't work anymore because of precision loss
Let's use NumberLong:
> db.Inc.insert({"Number" : NumberLong("1000023272226647000")});
> db.Inc.update({}, {$inc: {"Number" : 1}});
> db.Inc.find();
{ "Number" : 1000023272226647000, "_id" : 1 }
// Yikes! type conversion changed to double again! Also note
// that the _id field moved to the end
Let's use NumberLong also in $inc:
> db.Inc.insert({"Number" : NumberLong("1000023272226647000")});
> db.Inc.update({}, {$inc: {"Number" : NumberLong("1")}});
> db.Inc.find();
{ "_id" : 1, "Number" : NumberLong("1000023272226647001") }
// This actually worked
In C#, both of the following updates work; Number remains a long:
class Counter { public long Number {get;set;} public ObjectId Id {get;set;} }
var collection = db.GetCollection("Counter");
collection.Insert(new Counter { Number = 1234 });
collection.Update(Query.Null, Update<Counter>.Inc(p => p.Number, 1)); // works
collection.Update(Query.Null, Update.Inc("Number", 1)); // works too
MongoDB is schema-less. Schemalessness provides for easier changes in your data structure, but at the cost of the database not enforcing things like type constraints. You need to be disciplined in your application code to ensure that things are persisted in the way you want them to be.
If you need to ensure that the data is always of type Integer then it's recommended to have your application access MongoDB through a data access layer within the application. The data access layer can enforce type constraints (as well as any other constraints you want to put on your objects).
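For example, here is a hypothetical sketch of such a layer with the 1.x C# driver (the class, collection, and field names are made up): callers can only go through a long-typed increment, so no code path can ever write a double or a 32-bit int to the field.

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

public class CounterStore
{
    private readonly MongoCollection<BsonDocument> _collection;

    public CounterStore(MongoDatabase db)
    {
        _collection = db.GetCollection("Counter");
    }

    // The only write path for "Number": Update.Inc(string, long) keeps the
    // BSON type pinned to int64.
    public void IncrementBy(ObjectId id, long amount)
    {
        _collection.Update(Query.EQ("_id", id), Update.Inc("Number", amount));
    }
}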
Short answer: There is no way to enforce this in MongoDB.
I have an events collection of 2,502,011 elements and would like to perform an update on all of them. Unfortunately I'm facing a lot of MongoDB page faults due to the write lock.
Question: How can I avoid those faults in order to be sure that all my events are correctly updated ?
Here is the information about my events collection:
> db.events.stats()
{
    "count" : 2502011,
    "size" : 2097762368,
    "avgObjSize" : 838.4305136947839,
    "storageSize" : 3219062784,
    "numExtents" : 21,
    "nindexes" : 6,
    "lastExtentSize" : 840650752,
    "paddingFactor" : 1.0000000000874294,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 1265898256,
    "indexSizes" : {
        "_id_" : 120350720,
        "destructured_created_at_1" : 387804032,
        "destructured_updated_at_1" : 419657728,
        "data.assigned_author_id_1" : 76053152,
        "emiting_class_1_data.assigned_author_id_1_data.user_id_1_data.id_1_event_type_1" : 185071936,
        "created_at_1" : 76960688
    }
}
Here is what an event looks like:
> db.events.findOne()
{
    "_id" : ObjectId("4fd5d4586107d93b47000065"),
    "created_at" : ISODate("2012-06-11T11:19:52Z"),
    "data" : {
        "project_id" : ObjectId("4fc3d2abc7cd1e0003000061"),
        "document_ids" : [
            "4fc3d2b45903ef000300007d",
            "4fc3d2b45903ef000300007e"
        ],
        "file_type" : "excel",
        "id" : ObjectId("4fd5d4586107d93b47000064")
    },
    "emiting_class" : "DocumentExport",
    "event_type" : "created",
    "updated_at" : ISODate("2013-07-31T08:52:48Z")
}
I would like to update each event to add 2 new fields based on the existing created_at and updated_at. Please correct me if I'm wrong, but it seems you can't use the mongo update command when you need to access the current element's data along the way.
This is my update loop:
db.events.find().forEach(
    function (e) {
        created_at = new Date(e.created_at);
        updated_at = new Date(e.updated_at);
        e.destructured_created_at = [e.created_at]; // omitted the actual values
        e.destructured_updated_at = [e.updated_at]; // omitted the actual values
        db.events.save(e);
    }
)
When running the above command, I get a huge amount of page faults due to the write lock on the database.
I think you are confused here: it is not the write lock causing that, it is MongoDB querying for the documents to update. The lock does not exist during a page fault (in fact it only exists when actually updating, or rather saving, a document on disk); it yields to other operations.
The lock is more of a mutex in MongoDB.
Page faults on this amount of data are perfectly normal; since you obviously do not query this data often, I am not sure what you are expecting to see. I am also unsure what you mean by your question:
Question: How can I avoid those faults in order to be sure that all my events are correctly updated ?
OK, the problem you may be seeing is that you are getting page thrashing on that machine, which in turn destroys your IO bandwidth and floods your working set with data that is not needed. Do you really need to add this field to ALL documents eagerly? Can it not be added on demand by the application when that data is used again?
Another option is to do this in batches.
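A rough sketch of a batched version of your migration, written against the 1.x C# driver only because the other examples in this thread use it (the same pattern works from the shell). The database name, batch size, and pause are assumptions, and the destructured values are placeholders just like in your loop:

using System;
using System.Threading;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

class BatchedMigration
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost");
        var events = client.GetServer().GetDatabase("test").GetCollection("events");

        const int batchSize = 1000;          // assumption: tune to your IO budget
        BsonValue lastId = BsonMinKey.Value; // resume point between batches

        while (true)
        {
            // Walk the collection in _id order so each batch is cheap to locate.
            var batch = events.Find(Query.GT("_id", lastId))
                              .SetSortOrder(SortBy.Ascending("_id"))
                              .SetLimit(batchSize)
                              .SetFields("created_at", "updated_at");

            int seen = 0;
            foreach (var doc in batch)
            {
                lastId = doc["_id"];
                var update = Update
                    .Set("destructured_created_at", new BsonArray { doc["created_at"] })  // placeholder values
                    .Set("destructured_updated_at", new BsonArray { doc["updated_at"] }); // placeholder values
                events.Update(Query.EQ("_id", lastId), update);
                seen++;
            }

            if (seen < batchSize) break; // no full batch left: we're done
            Thread.Sleep(500);           // pause between batches to let other work through
        }
    }
}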
One feature you could make use of here is priority queues that dictate that such an update is a background task which shouldn't affect the current workings of your mongod too much. I hear such a feature is due (can't find the JIRA :/).
Please correct me if I'm wrong, but it seems you can't use the mongo update command when you need to access the current element's data along the way.
You are correct.
Hi, my MongoDB collection looks like this:
{ "created" : ISODate("2013-01-01T00:00:00Z"), "total_page_impression_count" : 500, "total_page_story_count" : 7 }
{ "created" : ISODate("2013-01-02T00:00:00Z"), "total_page_impression_count" : 511, "total_page_story_count" : 7 }
{ "created" : ISODate("2013-01-03T00:00:00Z"), "total_page_impression_count" : 513, "total_page_story_count" : 7 }
and I want to persist the value of (total_page_impression_count + total_page_story_count) in another field of the same collection. Does anyone know a way to do that? From what I've found, it seems this isn't possible the way it is in SQL. I'll be grateful if anyone can help. Thank you in advance.
You can iterate over all documents and write the aggregate field manually, using your language driver or the JavaScript shell. Yes, there is no "update users set full_name = first_name + last_name" kind of command in MongoDB.
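If it helps, here is a minimal sketch of that manual approach with the 1.x C# driver (the database and collection names and the new field name total_combined_count are made up, and the counts are assumed to be stored as 32-bit ints as in the sample documents):

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

class Denormalize
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost");
        var collection = client.GetServer().GetDatabase("test").GetCollection("stats");

        foreach (var doc in collection.FindAll())
        {
            // Compute the sum in application code, then persist it as a new field.
            var total = doc["total_page_impression_count"].AsInt32
                      + doc["total_page_story_count"].AsInt32;
            collection.Update(
                Query.EQ("_id", doc["_id"]),
                Update.Set("total_combined_count", total));
        }
    }
}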
I think application logic is just that: application logic. Saving aggregation results to add another level of denormalization, for a questionable performance gain, is first of all the developer's choice. So does 10gen really need to implement such functionality at all?
MongoDB is not an RDBMS.
MongoDB has no "join" functionality.
db.yourtable.insert({ "created" : ISODate("2013-01-01T00:00:00Z"), "total_page_impression_count" : 500, "total_page_story_count" : 7 })
You need to insert twice:
db.yourtable.insert({ "created" : ISODate("2013-01-01T00:00:00Z"), "total_page_impression_count" : 500, "total_page_story_count" : 7 });
db.othertable.insert({"total_page_story_count":507});
I have the following document for a user in a MongoDB collection:
{
    "_id" : 1,
    "facebook_id" : XX,
    "name" : "John Doe",
    "points_snapshot" : [{
        "unix_timestamp" : 1312300552,
        "points" : 115
    }, {
        "unix_timestamp" : 1312330380,
        "points" : 110
    }, {
        "unix_timestamp" : 1312331610,
        "points" : 115
    }]
}
Is it possible to write one query by which I can get the user with id 1 along with only the snapshots after a particular timestamp, e.g. 1312330300?
Basically, limit the snapshots to X items matching some criteria?
So far, I have tried in C#:
Query.And(
Query.EQ("facebook_id", XX),
Query.GTE("points_snapshot.unix_timestamp", sinceDateTimeStamp)))
.SetFields("daily_points_snapshot", "facebook_id")
which I soon realised will not work for what I want.
Any help will be appreciated!
Thanks!
EDIT: My Solution to this
If anyone is looking to get a quick fix to this, this is what I ended up doing:
var points = MyDatabase[COLLECTION_NAME]
    .Find(Query.EQ("facebook_id", XX))
    .SetFields("points_snapshot", "facebook_id")
    .FirstOrDefault();

if (points != null) {
    var pointsArray = points.AsBsonDocument["points_snapshot"].AsBsonArray
        .Where(x => x.AsBsonDocument["unix_timestamp"].AsInt32 > sinceDateTimeStamp)
        .ToList();
    return pointsArray;
}
Is it possible to write one query by which I can get the user with id 1 along with only the snapshots after a particular timestamp, e.g. 1312330300?
Not possible.
The problem here is that MongoDB queries return all matching Documents. Your Document contains an array of objects, but MongoDB does not treat these as "Documents".
Now, you can limit the fields that are returned, but in your case you actually want to limit the "portions of an array" that are returned.
This is a long-outstanding issue; the JIRA ticket is here. The ticket is about 18 months old and it continues to get pushed back. So if this type of query is critical to your architecture, you may have to re-design the architecture.