g.V().hasLabel('Person').count() gives me the number of Person vertices in my database, and g.V().hasLabel('Person').range(10, 15) gives me a range of Person vertices.
But is there a way to combine these two into a single Gremlin query?
This is just a simplified version of my query; my actual query is quite complex, and repeating that many traversals just to find the count seems inefficient!
I just realized I could use Groovy with Gremlin to achieve what I want. Not sure how elegant this is!
def lst = g.V().hasLabel('Person').toList()
def result = [count: 0, data: []]
result.count = lst.size()
result.data = lst[2..3] // slice out the page you want
result
This is working great even in my complex case.
I wouldn't recommend doing a full count() in every query. Rather, count the total once and cache it in your application.
That said, here's how you could do it anyway:
g.V().hasLabel('Person').aggregate('x').cap('x').
project('page3', 'total').by(range(local, 10, 15)).by(count(local))
UPDATE:
In older versions you can try this:
g.V().hasLabel('Person').aggregate('x').cap('x').as('page3', 'total').
select('page3', 'total').by(range(local, 10, 15)).by(count(local))
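Either way you get the page and the total back in a single map, so only one traversal is needed. In the Gremlin console the result looks roughly like this (values made up for illustration):
==>[page3:[v[10],v[11],v[12],v[13],v[14]],total:1234]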
I am running tests against my MongoDB and for some reason find has the same performance as count.
Stats:
orders collection size: ~20M,
orders with product_id 6: ~5K
product_id is indexed for improved performance.
Query: db.orders.find({product_id: 6}) vs db.orders.find({product_id: 6}).count()
Result: the orders for the product vs. the number 5K - each after ~0.08 ms.
Why isn't count dramatically faster? It could just find the positions of the first and last elements in the product_id index.
As the Mongo documentation for count states, calling count is the same as calling find, except that instead of returning the docs, it just counts them. To perform this count, it iterates over the cursor. It can't simply read the index and determine the number of documents from the first and last value of some ID, especially since you can have an index on some field other than the ID (and Mongo IDs are not auto-incrementing). So find and count are basically the same operation; count just goes over the documents, sums their number, and returns that to you instead of returning the documents themselves.
Also, if you want a faster result, you could use estimatedDocumentCount (docs), which goes straight to the collection's metadata. The trade-off is that you lose the ability to ask "how many documents can I expect from this particular query?". If you need a fast count of the docs matching a query, you can use countDocuments (docs), which is a wrapper around an aggregation query. From my knowledge of Mongo, that is the fastest way to count query results without calling count, and I'd expect it to be the preferred way, performance-wise, from now on (it was introduced in version 4.0.3).
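For reference, here's how those two calls look in the mongo shell, using the orders collection and product_id from the question:
// Fast: reads the collection metadata, but can only count the entire collection.
db.orders.estimatedDocumentCount()
// Accurate for a filter: runs an aggregation under the hood and can use the product_id index.
db.orders.countDocuments({ product_id: 6 })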
I have the following query - it takes about 20-40 seconds to complete (similar queries without RegEx on the same collection take milliseconds at most):
var filter = Builders<BsonDocument>.Filter.Regex("DescriptionLC", new BsonRegularExpression(descriptionStringToFindFromCallHere, "i"));
var mongoStuff = GetMongoCollection<BsonDocument>(MongoConstants.StuffCollection);
var stuff = await mongoStuff
    .Find(filter)
    .Limit(50)
    .Project(x => Mapper.Map<BsonDocument, StuffViewModel>(x))
    .ToListAsync();
I saw an answer here that seems to imply that this query would be faster using the following format (copied verbatim):
var names = namesCollection.AsQueryable().Where(name =>
name.FirstName.ToLower().Contains("hamster"));
However, the project is using MongoDB .NET Driver 2.0, which doesn't support LINQ. So my question comes down to:
a). Would using LINQ be noticeably faster, or about the same? I can update to 1, but I would rather not.
b). Is there anything I can do to speed this up? I am already searching against a lower-case-only field.
------------END OF ORIGINAL POST------------
Edit: Reducing the number of "stuff" returned by changing .Limit(50) to, say, .Limit(5) reduces the call time linearly: 40 seconds drops to 4 with the latter. I have experimented with different numbers and it seems to be a direct correlation. That's strange to me, but I don't really understand how this works.
Edit 2: It seems that the only solution might be to use a "starts with" regular expression instead of a "contains" one. Apparently the latter doesn't use indexes efficiently, according to the docs ("Index Use" section).
Edit 3: In the end, I did three things (the field was already indexed):
1). Reduced the number of results returned - this helped dramatically; there is a linear correlation between the number of items returned and the amount of time the call takes.
2). Changed the search to lower-case only - this helped only slightly.
3). Changed the regular expression to search "starts with" rather than "contains" - again, this barely helped. The changes for that were:
//Take the stringToSearch and turn it into a "starts with" RegEx,
//escaping it first so regex metacharacters in the input are treated literally
var startingWithSearchRegEx = "^" + Regex.Escape(stringToSearch);
Then pass that into the new BsonRegularExpression instead of just the search string.
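As a quick way to verify the index behaviour, you can compare the two regex styles with explain() in the shell - a sketch, assuming a stuff collection, the index on DescriptionLC, and a case-sensitive pattern:
// Anchored prefix regex: the optimizer can turn this into an index range scan.
db.stuff.find({ DescriptionLC: /^hamster/ }).explain()
// Unanchored regex: every index key (or document) has to be examined.
db.stuff.find({ DescriptionLC: /hamster/ }).explain()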
Still looking for any feedback!!
Regex across hundreds of thousands of documents is not recommended, as it essentially does a document scan, so no index is used at all.
This is the main reason why your query is so slow. It has nothing to do with the .NET driver.
If you have a lot of text, or you search for text patterns often, I'd suggest creating a text index on the field of interest and doing a full-text search. Please see the docs for $text.
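A minimal sketch of that in the mongo shell, assuming a stuff collection with the DescriptionLC field from the question:
// Build the text index once.
db.stuff.createIndex({ DescriptionLC: "text" })
// $text searches use the text index instead of scanning every document.
db.stuff.find({ $text: { $search: "hamster" } }).limit(50)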
I am implementing a simple 'get max value' map reduce in MongoDB (c# driver).
For my tests I have 10 items in a collection with int _id = 1 to 10.
My map and reduce are as follows:
var map = "function() {emit('_id', this.Id);}";
var reduce = "function(key, values) {var max = 1; for (id in values) {if (id > max) {max = id;}} return max;}";
When I run it, however, I get the result 9. Strange!
I think that the map is outputting a string, and thus the compare is not working as desired.
Any help would be great
The reduce function won't run if values contains only one item. If all the ids are unique and your key in the map is only that id, the reduce phase won't run at all, by design (for performance). If you need to change the format of your reduce output, you should use the finalize method. Or just take a look at the aggregation framework, which provides quite useful tools for playing with data.
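(Incidentally, the 9 almost certainly comes from for (id in values) iterating the array's indices, 0 through 9, rather than its values.) For a simple maximum, the aggregation framework makes this a one-liner - a sketch in the mongo shell, where items is a placeholder collection name holding the ten documents with _id 1 to 10 from the question:
// Group everything into one bucket and take $max over _id; returns { _id: null, maxId: 10 }.
db.items.aggregate([{ $group: { _id: null, maxId: { $max: "$_id" } } }])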
Check the Jira: jira.mongodb.org/browse/SERVER-5818
If you are just trying to get familiar with map-reduce, I would suggest trying different scenarios where using map-reduce really makes sense.
Cheers
model checkin:
checkin
  _id
  interest_id
  author_id
I've got a collection of checkins (retrieved by a simple "find" query).
I'd like to count the number of checkins for each interest.
What makes the task a bit more difficult: two checkins from the same person for the same interest should be counted as one checkin.
AFAIK, group operations in Mongo are performed by map/reduce queries. Should we use one here? The only idea I've got with such an approach is to aggregate the array of users for each interest and then return that array's length.
EDIT: I ended up not using map/reduce at all, although Emily's answer worked fine and quickly.
I have to select only checkins from the last 60 minutes, and there shouldn't be too many results. So I just fetch all of them into the Ruby driver and do all the calculations on the Ruby side. It's a bit slower, but much more scalable and easier to understand.
best,
Roman
Map reduce would probably be the way to go for this and you could get the desired results with two map reduces.
In the first, you could remove duplicate author_id and interest_id pairs.
key would be author_id and interest_id
values would be checkin_id
The second map reduce would then count, for each interest, the number of deduplicated checkins (an aggregation-framework equivalent is sketched below):
key would be interest_id
value would be the deduplicated checkin count
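For comparison, the same dedupe-then-count logic can be expressed with the aggregation framework - a sketch in the mongo shell, assuming the collection is named checkins:
db.checkins.aggregate([
  // Stage 1: collapse duplicate (interest, author) pairs into a single document each.
  { $group: { _id: { interest: "$interest_id", author: "$author_id" } } },
  // Stage 2: count the distinct pairs per interest.
  { $group: { _id: "$_id.interest", checkins: { $sum: 1 } } }
])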
For example given the BlogPost/Comments schema here:
http://mongoosejs.com/
How would I find all posts with more than five comments? I have tried something along the lines of
where('comments').size.gte(5)
But I'm getting tripped up with the syntax
MongoDB doesn't support range queries with the $size operator (Link). The docs recommend creating a separate field containing the size of the list, which you increment yourself:
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements.
Note that for some queries, it may be feasible to just list all the sizes you want included or excluded using $or/$nor conditions.
In your example, the following query will give all documents with more than 5 comments (using standard mongodb syntax, not mongoose):
db.col.find({"comments": {"$exists": true}, "$nor": [{"comments": {"$size": 5}}, {"comments": {"$size": 4}}, {"comments": {"$size": 3}}, {"comments": {"$size": 2}}, {"comments": {"$size": 1}}, {"comments": {"$size": 0}}]})
Obviously, this is very repetitive, so it only makes sense for small boundaries, if at all. Keeping a separate count variable, as recommended in the mongodb docs, is usually the better solution.
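A minimal sketch of that counter-field approach in the mongo shell (the commentCount field and posts collection are placeholder names):
// Increment the counter every time a comment is pushed.
db.posts.update({ _id: postId }, { $push: { comments: newComment }, $inc: { commentCount: 1 } })
// Range queries on the counter are then straightforward (and indexable).
db.posts.find({ commentCount: { $gt: 5 } })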
It's slow, but you could also use the $where clause:
db.Blog.find({$where:"this.comments.length > 5"}).exec(...);