Mongoid to Mongo query translation - mongodb

This question is updated with a simpler explanation.
For my data the following Mongo CLI query takes 208ms. This query retrieves ALL the data for the 18 requested objects.
db.videos.find({avg_rating: {$gt: 1}, poster_large_thumb: {$exists:true}, release_date: {$lte: ISODate("2000-12-31")}}).sort({release_date:-1, avg_rating:-1, title:1}).skip(30).limit(18).pretty().explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 76112,
    "nscannedObjects" : 76112,
    "n" : 48,
    "scanAndOrder" : true,
    "millis" : 208,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
    }
}
However, with Mongoid the same query produces a criteria object that holds no actual data. I pass this criteria to a method that iterates it to build a JSON data structure, so since the data is needed, each object must be retrieved from the DB. Here is the criteria-returning query:
@videos = Video.order_by(release_date: -1, avg_rating: -1, title: 1).where(:avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.lte => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW)
When I iterate @videos, each object takes over 240ms to retrieve from the DB, according to some debug output.
Getting one object:
2013-12-18 00:43:52 UTC
2013-12-18 00:43:52 UTC
0.24489331245422363
Assuming that if I fetched all the data in the Video.order_by(...) query it would take 208ms total, how can I force Mongoid to do the retrieval in one query instead of fetching each object individually?
Something here is causing the entire retrieval to take a couple of orders of magnitude longer than the Mongo CLI.

Responses:
skip().limit() queries tend to get slower and slower on the MongoDB side, because skip() has to walk through all of the skipped documents; for more detail see https://stackoverflow.com/a/7228190/534150
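The usual workaround is range-based (keyset) pagination: instead of skipping N documents, remember the sort key of the last item on the previous page and query for values past it. A plain-JavaScript sketch of the idea over an in-memory array (the documents and field values here are made up for illustration):

```javascript
// Keyset pagination sketch: page through docs sorted by release_date
// descending, without skip(). Data is illustrative only.
const docs = [
  { title: "a", release_date: 5 },
  { title: "b", release_date: 4 },
  { title: "c", release_date: 3 },
  { title: "d", release_date: 2 },
  { title: "e", release_date: 1 }
];

function nextPage(sortedDocs, lastSeenDate, pageSize) {
  // Equivalent to: find({release_date: {$lt: lastSeenDate}})
  //                  .sort({release_date: -1}).limit(pageSize)
  const filtered = lastSeenDate === null
    ? sortedDocs
    : sortedDocs.filter(d => d.release_date < lastSeenDate);
  return filtered.slice(0, pageSize);
}

const page1 = nextPage(docs, null, 2);                              // a, b
const page2 = nextPage(docs, page1[page1.length - 1].release_date, 2); // c, d
```

With an index on the sort key, each page is an index range scan rather than a walk through all skipped documents.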
The multiple identical queries look to me like an N+1 type of issue. That probably means that somewhere in your view you have a loop that reads a lazily loaded property, so the query is sent over and over again. Such problems are usually tricky to find; to track them down you need the end-to-end trace, and you are probably the only one who can do that, since you have access to the source code.
The Mongoid side looks correct to me.

Thanks for the ideas. I think @Arthur gave the important hint: no one could have answered this, because the problem turned out to be in another bit of code, namely in how I access the criteria.
Given the following query, which produces a criteria:
@videos = Video.order_by(release_date: 1, avg_rating: 1, title: -1).where(:release_date.ne => 0,:avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.gt => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW).only(:_id, :poster_large_thumb, :title)
In a couple of nested blocks I was grabbing values with a line of code like this:
video_full = @videos[((row_index)*POSTERS_PER_ROW)+column_index]
This random-access notation seems to be the problem: it executes the full Moped query for every individual object, i.e. POSTERS_PER_ROW*num_rows times.
If I grab all the videos before the loops with this bit of code:
videos_full = []
@videos.each do |video|
  videos_full.push video
end
Then grab the values from an array instead of the criteria like this:
video_full = videos_full[((row_index)*POSTERS_PER_ROW)+column_index]
I get only one Moped query of 248ms, and all objects are retrieved with that one query. This is a huge speed-up: the query time goes from num_rows*POSTERS_PER_ROW*248ms to just 248ms.
This is a big lesson learned for me so if someone can point to the docs that describe this effect and the rules to follow I'd sure appreciate it.

Related

Storing a query in Mongo

This is the case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and also change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these are found here:
within product:
"delivery" : {
"maximum_delivery_days" : 30,
"average_delivery_days" : 10,
"source" : 1,
"filling_rate" : 85,
"stock" : 0
}
but also other parameters exist.
An example of such query to decide whether or not to include a product could be:
"$or" : [
{
"delivery.stock" : 1
},
{
"$or" : [
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 60
}
},
{
"delivery.filling_rate" : {
"$gt" : 90
}
}
]
},
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 40
}
},
{
"delivery.filling_rate" : {
"$gt" : 80
}
}
]
},
{
"$and" : [
{
"delivery.delivery_days" : {
"$lt" : 25
}
},
{
"delivery.filling_rate" : {
"$gt" : 70
}
}
]
}
]
}
]
Now to make this configurable, I need to be able to handle boolean logic, parameters and values.
So, since such a query is itself JSON, I got the idea to store it in Mongo and have my Java app retrieve it.
Next thing is using it in the filter (e.g. find, or whatever) and work on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
"name": "query1",
"query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
"name" : "query1",
"query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
I CAN, however, store it using Robomongo, but not always. Obviously I am doing something wrong, but I have NO IDEA what it is.
If it fails and I create a brand-new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", changes are not going through. Never. Not even sometimes.
I can however create a new object and discard the previous one. So, the workaround is there.
db.queries.update(
{"name": "query1"},
{"$set": {
... update goes here ...
}
}
)
doing this results in:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 52,
"errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
}
})
seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important info you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you'll end up with attribute names that contain mongo operator keywords (such as $or, $ne, $gt). The mongo documentation actually references this exact scenario - emphasis added
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust 3rd party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document as to not interfere with reserved operator keywords. You can use the available JSON.stringify(my_obj); to encode your partial query into a string and then parse/decode it when you choose to retrieve it later on: JSON.parse(escaped_query_string_from_db)
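For example, the round-trip might look like this (runnable in the shell or any JavaScript environment; the collection and field names are just illustrations):

```javascript
// Store the query as a plain string so no stored field name starts
// with "$", then decode it again before use.
const partialQuery = {
  $or: [
    { "delivery.stock": 1 },
    { "delivery.filling_rate": { $gt: 90 } }
  ]
};

// Encode: the whole query becomes a single string value, so
// e.g. db.queries.insert(stored) would succeed -- no "$"-prefixed keys.
const stored = { name: "query1", query: JSON.stringify(partialQuery) };

// Decode when retrieving it later.
const restored = JSON.parse(stored.query);
```

The decoded object is structurally identical to the original, so it can be passed straight to find().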
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names, and these rules do not allow for special characters.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is because Robomongo is transforming your query into a string and properly escaping the special characters as it sends it to MongoDB.
This also explains why your attempt to update them never works. You tried to create a document, but instead created something that is a string object, so your update conditions are probably not retrieving any docs.
I see two problems with your approach.
In following query
db.queries.insert({
"name" : "query1",
"query": { the thing printed above starting with "$or"... }
})
valid JSON expects key/value pairs. Here, in "query", you are storing an object without a key. You have two options: either store the query as text or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
so your final document should appear as
db.queries.insert({
"name" : "query1",
"query": 'the thing printed above starting with "$or"... '
})
Now try, it should work.
Obviously my attempt to store a query in Mongo the way I did was foolish, as became clear from the answers by both @bigdatakid and @lix. So what I finally did was this: I altered the naming of the fields to comply with the Mongo requirements.
E.g. instead of $or I used _$or, and instead of using a . inside a name I used a #. Both of these I replace in my Java code.
This way I can still easily try and test the queries outside of my program. In my Java program I just change the names back and use the query, using just two lines of code. It simply works now. Thanks guys for the suggestions you made.
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);
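For anyone doing the same from JavaScript rather than Java, a sketch of the equivalent renaming round-trip (using the same _$ and # conventions as above; the stored document is illustrative):

```javascript
// Restore a query stored under the renaming convention:
// "_$or" was stored in place of "$or", and "#" in place of ".".
function toMongoQuery(storedDoc) {
  const text = JSON.stringify(storedDoc)
    .replace(/_\$/g, "$")   // "_$or"            -> "$or"
    .replace(/#/g, ".");    // "delivery#stock"  -> "delivery.stock"
  return JSON.parse(text);
}

const stored = { _$or: [{ "delivery#stock": 1 }] };
const query = toMongoQuery(stored);  // { $or: [ { "delivery.stock": 1 } ] }
```

Note the same caveat as in the Java version: the replacement is textual, so the # and _$ sequences must not occur in legitimate field names or values.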

Strange performance test results with different write concerns in MongoDB + C#

I'm trying to test the performance of MongoDB before actually putting it to use. I'm trying to see how many documents can I update per second. I'm using C# (Mono + Ubuntu) MongoDB Driver v1.9 and MongoDB v2.4.6.
I believe one of the most effective MongoDB parameters on write performance is Write Concern. As it is stated in documentation, the most relaxed value for write concern would be -1, then 0 and finally 1 is the slowest one.
After searching I found that I can set write concerns in C# embedded in connection string like this:
var client = new MongoClient("mongodb://localhost/?w=-1");
Here are the results of me playing around with different values for w:
Fastest results are achieved when I set w to 1!
Setting w=0 is 28 times slower than w=1!
w=-1 leads to an exception being thrown with the error message "W value must be greater than or equal to zero"!
Does anyone have any explanations on these results? Am I doing anything wrong?
[UPDATE]
I think it is necessary to set the test procedure maybe there's something hidden within it. So here it goes:
I've got a database with 100M documents in a single collection. Each document is created like this using mongo shell:
{ "counter" : 0, "last_update" : new Date() }
Here's the output of db.collection.stats();:
{
"ns" : "test.collection",
"count" : 100000100,
"size" : 6400006560,
"avgObjSize" : 64.0000015999984,
"storageSize" : 8683839472,
"numExtents" : 27,
"nindexes" : 2,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 5769582448,
"indexSizes" : {
"_id_" : 3251652432,
"last_update_1" : 2517930016
},
"ok" : 1
}
Using Mono 3.2.1 in Ubuntu 12.04 I've written a C# project which connects to MongoDB and tries to update the documents like this:
FindAndModifyArgs args = new FindAndModifyArgs();
args.SortBy = SortBy.Ascending("last_update");
args.Update = Update<Entity>.Set(e => e.last_update, DateTime.Now);
args.Fields = Fields.Include(new string[] { "counter", "_id" });
var m = collection.FindAndModify(args);
Entity ent = m.GetModifiedDocumentAs<Entity>();
var query = Query<Entity>.EQ(e => e.Id, ent.Id);
var update = Update<Entity>.Set(e => e.counter, ent.counter+1);
collection.Update(query, update);
To summarize what this piece of code does: it selects the document with the oldest last_update, sets last_update to the current date, and also increments its counter (the update happens in two steps).
I ran this code 10k times for each of four different write concerns: w=-1, w=0, w=1 and w=1&j=true. While w=-1 throws an exception and gives no results, here are the results for the rest of them:
Since the figure is a little hard to read, here are the same results in numbers:
            w=-1    w=0             w=1              w=1&j=true
Average     N/A     244.0948611492  7064.5143923477  238.1846428156
STDEV       N/A     1.7787457992    511.892765742    21.0230097306
And the question is: Does anyone have any explanations why w=0 is much slower than w=1 and why w=-1 is not supported?
[UPDATE]
I've also tested RequestStart in my code like this:
using (server.RequestStart(database)) {
FindAndModifyArgs args = new FindAndModifyArgs();
args.SortBy = SortBy.Ascending("last_update");
args.Update = Update<Entity>.Set(e => e.last_update, DateTime.Now);
args.Fields = Fields.Include(new string[] { "counter", "_id" });
var m = collection.FindAndModify(args);
Entity ent = m.GetModifiedDocumentAs<Entity>();
var query = Query<Entity>.EQ(e => e.Id, ent.Id);
var update = Update<Entity>.Set(e => e.counter, ent.counter+1);
collection.Update(query, update);
}
It had no significant effect on any of the results whatsoever.
To address the w=-1 error, you may want to use the WriteConcern.Unacknowledged property directly.
Your test code might be running into consistency problems. Look into RequestStart/RequestDone to put the query onto the same connection. Per the documentation:
An example of when this might be necessary is when a series of Inserts are called in rapid succession with a WriteConcern of w=0, and you want to query that data in a consistent manner immediately thereafter (with a WriteConcern of w=0, the writes can queue up at the server and might not be immediately visible to other connections).
Aphyr did a good blog post on how various settings of MongoDB's write concern affect its approximation of consistency. It may help you troubleshoot this issue.
Finally, you can find your driver version in build.ps1

Querying sub array with $where

I have a collection with following document:
{
"_id" : ObjectId("51f1fd2b8188d3117c6da352"),
"cust_id" : "abc1234",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 27,
"items" : [{
"sku" : "mmm",
"qty" : 5,
"price" : 2.5
}, {
"sku" : "nnn",
"qty" : 5,
"price" : 2.5
}]
}
I want to use "$where" in the fields of "items", so something like this:
{$where:"this.items.sku==mmm"}
How can I do it? It works when the field is not of array type.
You don't need a $where operator to do this; just use a query object of:
{ "items.sku": mmm }
As for why your $where isn't working, the value of that operator is executed as JavaScript, so that's not going to check each element of the items array, it's just going to treat items as a normal object and compare its sku property (which is undefined) to mmm.
You are comparing this.items.sku to a variable mmm, which isn't initialized and thus has the value undefined. What you want to do is iterate the array and compare each entry to the string 'mmm'. This example does that using the array method some, which returns true when the passed function returns true for at least one of the entries:
{$where:"return this.items.some(function(entry){return entry.sku =='mmm'})"}
But really, don't do this. In a comment to the answer by JohnnyHK you said "my service is just a interface between user and mongodb, totally unaware what the field client want's to store". You aren't really explaining your use-case, but I am sure you can solve this better.
The $where operator invokes the JavaScript engine even though this trivial expression could be done with a normal query, which means unnecessary performance overhead.
Every single document in the collection is passed to the function, so even when you have an index, it cannot be used.
When the JavaScript function is generated from something provided by the client, you must be careful to sanitize and escape it properly, or your application becomes vulnerable to code injection.
I've been reading through your comments in addition to the question. It sounds like your users can generically add some attributes, which you are storing in an array within a document. Your client needs to be able to query an arbitrary pair from the document in a generic manner. The pattern to achieve this is typically as follows:
{
    ...
    attributes: [
        { k: "some user defined key", v: "the value" },
        { k: ..., v: ... },
        ...
    ]
}
Note that in your case, items plays the role of attributes. Now to get the document, your query will be something like:
db.collection.find({attributes:{$elemMatch:{k:"sku",v:"mmm"}}});
(with an index on attributes.k and attributes.v)
This allows your service to provide a way to query the data while letting the client specify what the k/v pairs are. The one caveat with this design is to always be aware that documents have a 16MB limit (unless you have a use case that makes GridFS appropriate). Functions like $slice may help with controlling this.
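To illustrate what $elemMatch is doing here, a small in-memory equivalent in plain JavaScript (a sketch of the matching semantics, not driver code; the sample documents are made up):

```javascript
// In-memory sketch of {attributes: {$elemMatch: {k: ..., v: ...}}}:
// a document matches only when ONE array element has BOTH fields.
function elemMatch(docs, key, value) {
  return docs.filter(doc =>
    doc.attributes.some(a => a.k === key && a.v === value));
}

const docs = [
  { _id: 1, attributes: [{ k: "sku", v: "mmm" }, { k: "qty", v: 5 }] },
  { _id: 2, attributes: [{ k: "sku", v: "nnn" }] },
  // Has k: "sku" and v: "mmm" in *different* elements, so
  // $elemMatch semantics reject it.
  { _id: 3, attributes: [{ k: "sku", v: "xxx" }, { k: "color", v: "mmm" }] }
];

const found = elemMatch(docs, "sku", "mmm");  // only _id 1
```

This is also why plain `{ "attributes.k": "sku", "attributes.v": "mmm" }` is not equivalent: it would match _id 3 as well.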

How to 'explain' a runCommand?

Is there a way to run explain with runCommand? I have the following query:
db.runCommand({geoNear:"Locations", near:[50,50], spherical:true})
How can I run an explain on it? I want to get the execution time.
As far as I know, explain is a method on cursors. You, however, can enable the integrated mongodb profiler:
db.setProfilingLevel(2); // log all operations
db.setProfilingLevel(1, 50); // log all operations longer than 50msecs
This will log details like nscanned, nreturned to a capped collection called system.profile, but does not provide as much detail as an explain() call does.
In this case, however, I think it might be possible to change the runCommand to a $near-query instead? That would give you full access to explain.
I guess we can't do explain for runCommand. Some of the commands give you stats automatically (e.g. the distinct command: db.runCommand({distinct : 'test', key : 'a'})).
But you can take the help of the query profiler.
db.setProfilingLevel(2)
Once you run that query, switch off the profiler, and check the system.profile collection for this query.
According to explain docs you can simply:
db.runCommand({
explain: {geoNear:"Locations", near:[50,50], spherical:true}
})
Both answers (from mnemosyn and Abhishek Kumar) are correct.
However, if you just want to see the execution time (like me), this is provided in the results along with some extra stats:
...
"stats" : {
"time" : 2689,
"btreelocs" : 0,
"nscanned" : 1855690,
"objectsLoaded" : 979,
"avgDistance" : 0.006218027001875209,
"maxDistance" : 0.006218342348749806
},
"ok" : 1
Well, I should have looked closer before posting the question :).
Maybe not a runCommand but I use something like this:
expStats = function() {
var exp = db.collection.find({"stats.0.age" : 10},{"stats.age" : 1}).explain("allPlansExecution");
print ("totalDocsExamined = "+exp.executionStats.totalDocsExamined);
print ("nReturned = "+exp.executionStats.nReturned);
return print ("execTime = "+exp.executionStats.executionTimeMillis);
}

get number of updated documents mongo

Is there a function in mongo to return the number of documents that were updated in an update statement?
I have tried to use count(), but apparently the update statement only returns true or false, so I think I'm getting the count of a string.
Thanks
Use the getLastError command to get information about the result of your operation.
I don't know the ruby driver, but most drivers automatically do this in 'safe mode'. In safe mode, each write will examine the result of getLastError to make sure the write was successful. The update operation should return an object that looks like the JSON object further down, and it includes the number of updated documents (n). You can fine-tune the safe mode settings, but be warned that the default mode is "fire and forget", so safe mode is a good idea for many use cases.
In the shell,
> db.customers.update({}, {$set : {"Test" : "13232"}}, true, true);
> db.runCommand( "getlasterror" )
{
"updatedExisting" : true,
"n" : 3,
"connectionId" : 18,
"err" : null,
"ok" : 1
}
Here, I updated n = 3 documents. Note that by default, update operations in mongodb only apply to the first matched document. In the shell, the fourth parameter is used to indicate we want to update multiple documents.
For future reference: as of recent versions of MongoDB, you can save the result of the update command and look at the property modifiedCount. Note that if the update operation results in no change to a document, such as setting the value of a field to its current value, modifiedCount may be less than the number of documents that were matched by the query.
For example (in JS):
var updateResult = await collection.updateOne( .... )
if(updateResult.modifiedCount < 1) throw new Error("No docs updated");
There are also two fields result.n and result.nModified in the response object that tell you the number of documents matched, and the number updated, respectively. Here's a portion of the response object that mongoDB returns
result: {
n: 1,
nModified: 1,
...
},
...
modifiedCount: 1
More info here: https://docs.mongodb.com/manual/reference/command/update/
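The matched-vs-modified distinction can be illustrated with a tiny in-memory model of a multi-document $set (a hypothetical sketch, not driver code): a write that sets a field to the value it already has matches the document but does not modify it.

```javascript
// Miniature model of a multi-document $set, counting the equivalents
// of result.n (matched) and result.nModified (modified).
function setField(docs, predicate, field, value) {
  let n = 0, nModified = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    n++;                          // matched by the query
    if (doc[field] !== value) {   // only a real change counts as modified
      doc[field] = value;
      nModified++;
    }
  }
  return { n, nModified };
}

const docs = [
  { _id: 1, status: "A" },
  { _id: 2, status: "B" },
  { _id: 3, status: "B" }
];

// Set status to "B" everywhere: 3 documents match, but only _id 1 changes.
const result = setField(docs, () => true, "status", "B");  // { n: 3, nModified: 1 }
```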