Mongo aggregate and count over n fields - mongodb

I'm having trouble understanding MongoDB's Aggregation framework. Basically my JSON looks like this:
[
{
"id": 1,
"param1": true,
"param2": false,
"param3": false
},
{
"id": 2,
"param1": true,
"param2": false,
"param3": true
},
{
"id": 3,
"param1": false,
"param2": true,
"param3": false
}
]
I want to count how many documents have, for example, param1 == true, param2 == true and so on.
In this case the expected result should be:
count_param1: 2
count_param2: 1
count_param3: 1
The trick here is that param can be param1 .. paramN, so I either need to do a distinct and specify exactly which fields I'm interested in, or find a way to "group on" all fields starting with "param".
What is the recommended approach?
Further explanation:
The SQL equivalent would be to do:
SELECT COUNT(param1) AS param1
FROM [Table]
GROUP BY param1
For each column (but in one query).

I would not use aggregation, as there is a built-in helper count() for this:
> db.collection.count({ "param1" : true })
You can create a simple function that takes the parameter name as argument and gives back the count:
> param_count = function(param_name) {
      // build { <param_name>: true } and count the matching documents
      var count_obj = {};
      count_obj[param_name] = true;
      return db.collection.count(count_obj);
  }
While it is technically possible to get the counts for all the params in one aggregation pipeline, it's infeasible for 1 million+ rows and it will be better to do one aggregation pipeline per param name. I'm not well-versed in SQL, but I am guessing when you give the SQL equivalent and say you'd do them all in "one query" you mean you'd send one batch of SQL but it would essentially be a concatenation of different queries to group and count, so it's not much different from the solution I have given.
The count can use an index on paramN if one exists.
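For comparison, the single-pipeline variant mentioned above can be sketched as one $group stage that sums a $cond per field. This is only a sketch: the helper name buildParamCountStage is hypothetical, the list of field names is supplied by the caller, and the stage is built as a plain object so it can be inspected without a running server.

```javascript
// Hypothetical helper: builds one $group stage that counts, per field,
// how many documents have that field set to true.
function buildParamCountStage(paramNames) {
  const group = { _id: null }; // _id: null puts every document in one group
  for (const name of paramNames) {
    // $cond yields 1 when the field equals true, 0 otherwise; $sum adds them up
    group["count_" + name] = {
      $sum: { $cond: [{ $eq: ["$" + name, true] }, 1, 0] }
    };
  }
  return { $group: group };
}

const stage = buildParamCountStage(["param1", "param2", "param3"]);
console.log(JSON.stringify(stage, null, 2));
// In the shell this would be used as: db.collection.aggregate([stage])
```

Whether this beats one count per field on a large collection depends on the indexes available, as noted above; the pipeline reads every document once, while each count can use its own index.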

This has been solved.
Check out my related question and chridam's excellent answer.
A perfect solution for my needs.

Related

Mongodb create index for boolean and integer fields

user collection
[{
deleted: false,
otp: 3435,
number: '+919737624720',
email: 'Test#gmail.com',
name: 'Test child name',
coin: 2
},
{
deleted: false,
otp: 5659,
number: '+917406732496',
email: 'anand.satyan#gmail.com',
name: 'Nivaan',
coin: 0
}
]
I am using the command below to create the index. It seems to work for the string fields,
but I am not sure it is correct for the number and boolean fields.
db.users.createIndex({name:"text", email: "text", coin: 1, deleted: 1})
I am using this command to filter data:
db.users.find({$text:{$search:"anand.satya"}}).pretty()
db.users.find({$text:{$search:"test"}}).pretty()
db.users.find({$text:{$search:2}}).pretty()
db.users.find({$text:{$search:false}}).pretty()
The string fields work, but the numeric and boolean fields do not.
Please advise how I should create an index for them.
The title and comments in this question are misleading. Part of the question is more focused on how to query with fields that contain boolean and integer fields while another part of the question is focused on overall indexing strategies.
Regarding indexing, the index that was shown in the question is perfectly capable of satisfying some queries that include predicates on coin and deleted. We can see that when looking at the explain output for a query of .find({$text:{$search:"test"}, coin:123, deleted: false}):
> db.users.find({$text:{$search:"test"}, coin:123, deleted: false}).explain().queryPlanner.winningPlan.inputStage
{
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
filter: {
'$and': [ { coin: { '$eq': 123 } }, { deleted: { '$eq': false } } ]
},
keyPattern: { _fts: 'text', _ftsx: 1, coin: 1, deleted: 1 },
indexName: 'name_text_email_text_coin_1_deleted_1',
isMultiKey: false,
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'backward',
indexBounds: {}
}
}
Observe here that the index scan stage (IXSCAN) is responsible for providing the filter for the coin and deleted predicates (as opposed to the database having to do that after FETCHing the full document).
Separately, you mentioned in the question that these two particular queries aren't working:
db.users.find({$text:{$search:2}}).pretty()
db.users.find({$text:{$search:false}}).pretty()
And by 'not working' you are referring to the fact that no results are being returned. This is also related to the following discussion in the comments which seemed to have a misleading takeaway:
You'll have to convert your coin and deleted fields to string, if you want it to be picked up by $search – Charchit Kapoor
So. There is no way for searching boolean or integger field. ? – Kiran S youtube channel
Nope, not that I know of. – Charchit Kapoor
You can absolutely use boolean and integer values in your query predicate to filter data. This playground demonstrates that.
What @Charchit Kapoor is saying can't be done is using the $text operator to match and return results whose field values are not strings. Said another way, the $text operator is specifically used to perform a text search.
If what you are trying to achieve are direct equality matches for the field values, both strings and otherwise, then you can delete the text index as there is no need for using the $text operator in your query. A simplified query might be:
db.users.find({ name: "test"})
Demonstrated in this playground.
A few additional things come to mind:
Regarding indexing overall, databases will generally consider using an index if the first key is used in the query. You can read more about this for MongoDB specifically on this page. The takeaway is that you will want to create the appropriate set of indexes to align with your most commonly executed queries. If you have a query that just filters on coin, for example, then you may wish to create an index that has coin as its first key.
If you want to check if the exact string value is present in multiple fields, then you may want to do so using the $or operator (and have appropriate indexes for the database to use).
If you do indeed need more advanced text searching capabilities, then it would be appropriate to either continue using the $text operator or consider Atlas Search if the cluster is running in Atlas. Doing so does not prevent you from also having indexes that would support your other queries, such as on { coin: 2 }. It's simply that the syntax for performing such a query needs to be updated.
There is a lot going on here, but the big takeaway is that you can absolutely filter data based on any data type. Doing so simply requires using the appropriate syntax, and doing so efficiently requires an appropriate indexing strategy alongside the queries.
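As a rough illustration of "the appropriate syntax", the sketch below keeps $text for string searches and uses plain equality predicates for the numeric and boolean fields. The helper name buildUserFilter and its options object are assumptions made for this example; only the resulting filter shape comes from the queries shown above.

```javascript
// Hypothetical helper: combines an optional text search with exact-match
// predicates on non-string fields.
function buildUserFilter(options) {
  const filter = {};
  if (typeof options.search === "string" && options.search.length > 0) {
    // $text only matches string content covered by the text index
    filter.$text = { $search: options.search };
  }
  if (typeof options.coin === "number") {
    filter.coin = options.coin; // plain equality on a numeric field
  }
  if (typeof options.deleted === "boolean") {
    filter.deleted = options.deleted; // plain equality on a boolean field
  }
  return filter;
}

console.log(buildUserFilter({ search: "test", coin: 2, deleted: false }));
console.log(buildUserFilter({ coin: 2 }));
// Either result could be passed to db.users.find(...)
```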

How to get sorted result start from a point in MongoDB?

For example, I got some data in MongoDB
db: people
{_id:1, name:"Tom", age:26}
{_id:2, name:"Jim", age:22}
{_id:3, name:"Mac", age:22}
{_id:4, name:"Zoe", age:22}
{_id:5, name:"Ray", age:18}
....
If I want the result sorted by "age", that's easy: just create an index on "age" and use sort. Then I get a long list back, something like this:
{_id:5, name:"Ray", age:18}
{_id:2, name:"Jim", age:22}
{_id:3, name:"Mac", age:22}
{_id:4, name:"Zoe", age:22}
{_id:1, name:"Tom", age:26}
...
What if I only want this list also sorted by "age" and start from "Mac"? like below:
{_id:3, name:"Mac", age:22}
{_id:4, name:"Zoe", age:22}
{_id:1, name:"Tom", age:26}
...
I can't use $gte because this may include "Jim". Ages can be the same.
What's the right way to query this? Thanks.
I think this is more of a "terminology" problem, in that what you call a "start point" others call something different. I see two approaches here: one I believe to be the "wrong" approach, and one I think is the "right" approach to what you want to do. Both would give the desired result on this sample. There is also an "obvious" approach, if that is simple enough for your needs.
For the "wrong" approach I would basically say to use $gte in both cases, for the "name" and "age". This basically gives you a "starting point" at "Mac":
db.collection.find(
{ "name": { "$gte": "Mac" }, "age": { "$gte": 22 } }
).sort({ "age": 1 })
But of course this would not work if you had "Alan" of age "27" since the name is less than the starting name value. Works on your sample though of course.
What I believe to be the "right" answer to what you are asking is that you are talking about "paging" data in a more efficient way than using .skip(). In this case what you want to do is "exclude" results in a similar way.
So this means essentially keeping the last "page" of documents seen, or possibly more depending on how much the "range" value changes, and excluding by the unique _id values. Best demonstrated as:
// State carried between pages
var seenIds = [];
var lastAge;
// First iteration
var cursor = db.collection.find({}).sort({ "age": 1 }).limit(2);
cursor.forEach(function(result) {
    seenIds.push(result._id);
    lastAge = result.age;
    // do other things
});
// Next iteration
cursor = db.collection.find(
    { "_id": { "$nin": seenIds }, "age": { "$gte": lastAge } }
).sort({ "age": 1 }).limit(2);
Since in the first instance you had already "seen" the first two results, you submit the _id values as a $nin operation to exclude them and also ask for anything "greater than or equal to" the "last seen" age value.
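To see how the $nin / $gte combination walks forward through the data, here is a small in-memory sketch over the sample documents from the question. It does not talk to MongoDB: nextPage is a hypothetical stand-in for the find/sort/limit call, and the _id tie-breaker in the sort is an assumption added so the order is deterministic.

```javascript
// In-memory stand-in for the collection (sample data from the question).
const people = [
  { _id: 1, name: "Tom", age: 26 },
  { _id: 2, name: "Jim", age: 22 },
  { _id: 3, name: "Mac", age: 22 },
  { _id: 4, name: "Zoe", age: 22 },
  { _id: 5, name: "Ray", age: 18 }
];

// Hypothetical helper mimicking
// find({ _id: { $nin: seenIds }, age: { $gte: lastAge } }).sort({ age: 1 }).limit(pageSize)
function nextPage(docs, seenIds, lastAge, pageSize) {
  return docs
    .filter(d => !seenIds.includes(d._id) && d.age >= lastAge)
    .sort((a, b) => a.age - b.age || a._id - b._id) // _id breaks ties (assumption)
    .slice(0, pageSize);
}

const seenIds = [];
let lastAge = -Infinity; // no lower bound before the first page
const pages = [];
for (;;) {
  const page = nextPage(people, seenIds, lastAge, 2);
  if (page.length === 0) break;
  for (const doc of page) {
    seenIds.push(doc._id); // remember what was already returned
    lastAge = doc.age;     // remember where the range left off
  }
  pages.push(page.map(d => d.name));
}
console.log(pages); // [ [ 'Ray', 'Jim' ], [ 'Mac', 'Zoe' ], [ 'Tom' ] ]
```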
That is an efficient way to "page" data in a forwards direction, and may indeed be what you are asking, but of course it requires that you know "which data" comes "before Mac" in order to do things properly. So that leaves the final "obvious" approach:
The most simple way to just start at "Mac" is to basically query the results and just "discard" anything before the results ever got to this desired value:
var startSeen = false;
db.collection.find(
{ "age": {"$gte": 22}}
).sort({ "age": 1 }).forEach(function(result) {
if ( !startSeen )
startSeen = ( result.name == 'Mac' );
if ( startSeen ) {
// Mac has been seen. Do something with your data
}
})
At the end of the day, there is of course no way to "start from where 'Mac' appears in a sorted list" in any arbitrary way. You are either going to:
Lexically remove any other results occurring before
Store results and page through them to "cut points" for last seen values
Just live with iterating the cursor and discarding results until the "first" desired match is found.
I did a test and found the solution.
db.collection.find({
$or: [{name: {$gte: 'Mac'}, age: 22}, {age: {$gt: 22}}] // $gte keeps 'Mac' itself in the result
})
.sort({age:1, name:1})
really did the magic.
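The idea behind this query is a range condition on the compound sort key (age, name): same age with a name at or after the start, or any later age. A minimal sketch follows, assuming the starting document should be included and that names are unique within an age (otherwise a further tie-breaker such as _id would be needed). The names startFromFilter and startFrom are hypothetical; the second function only re-implements the filter in plain JavaScript to check the logic against the sample data.

```javascript
// Hypothetical helper: builds the filter for "start at (age, name)" in
// { age: 1, name: 1 } order, including the starting document itself.
function startFromFilter(age, name) {
  return {
    $or: [
      { age: age, name: { $gte: name } }, // same age, name at or after the start
      { age: { $gt: age } }               // any later age
    ]
  };
}

// Sample data from the question, used for an in-memory check.
const samplePeople = [
  { _id: 1, name: "Tom", age: 26 },
  { _id: 2, name: "Jim", age: 22 },
  { _id: 3, name: "Mac", age: 22 },
  { _id: 4, name: "Zoe", age: 22 },
  { _id: 5, name: "Ray", age: 18 }
];

// Same condition and sort order, evaluated in plain JavaScript.
function startFrom(docs, age, name) {
  return docs
    .filter(d => (d.age === age && d.name >= name) || d.age > age)
    .sort((a, b) => a.age - b.age || (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

console.log(startFrom(samplePeople, 22, "Mac").map(d => d.name)); // [ 'Mac', 'Zoe', 'Tom' ]
// The filter itself would be used as:
// db.collection.find(startFromFilter(22, "Mac")).sort({ age: 1, name: 1 })
```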

Why was the explain query output giving me BasicCursor even though the collection had indexes on it?

I have a collection named stocks, on which I created a compound index as shown below:
db.stocks.ensureIndex({"symbol":1,"date":1,"type": 1, "isValid": 1,"rootsymbol":1,"price":1},{"unique" : false})
I have set the profiling level to find all the slow queries.
One of the queries took 38 millis; when I ran explain on it, this was the result:
Sorry i have updated my question
db.stocks.find({ query: { symbol: "AAPLE", date: "2014-01-18", type: "O", isValid: true }, orderby: { price: "1" } }).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 705402,
"nscannedObjects" : 705402,
"n" : 0,
"millis" : 3456,
"indexBounds" : {
}
}
My question is: why is it showing a BasicCursor even though the collection has indexes on it?
I'm pretty sure that the issue here is your use of the find() function. You are specifying a query parameter and inside it, placing your search criteria. I don't think that you need to actually put query in there. Simply insert your search criteria. Something like this:
db.stocks.find({
symbol: "AAPLE",
date: "2014-01-18",
type: "O",
isValid: true
}).sort( { "price": 1} ).explain();
Note also my changes to the sorting. You can read more about sorting a cursor here.
Since the problem isn't actually described I will go on to describe it.
You are calling top level query operators with functional operators. So for example you call query operators here:
{ query: { symbol: "AAPLE", date: "2014-01-18", type: "O", isValid: true }, orderby: { price: "1" } }
In the form of query and orderby but then you call a functional operator:
explain();
This is a known bug with MongoDB that these two do not play well together and so produce the output you get.
Of course when the query comes in and is parsed by MongoDB it is recorded in the profile with query operators query and orderby and maxscan etc.
This is more of a problem when calling the command.
Reference: MongoDB $query operator ignores index? I couldn't find the actual JIRA for this but this is related.
Edit: I think this vaguely represents it: https://jira.mongodb.org/browse/SERVER-6767
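Beyond the syntax, check how the query lines up with the compound index (ie, an index that contains more than one field). MongoDB can use a compound index when the fields in your query form a prefix of the index fields. In this case, your index includes these fields: symbol, date, type, isValid, rootsymbol, and price. The filter fields (symbol, date, type, isValid) are a prefix, so the index can serve the filter, but the sort on price cannot be satisfied from the index, because rootsymbol sits between isValid and price and the query does not constrain it. To let the index serve the sort as well: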
Remove rootsymbol from the index, or
Add rootsymbol to your query, or
If you can't do either of the above, create another index without rootsymbol
Reference
Regarding the syntax, there is in fact a query syntax in which an index cannot be used: the $where clause requires evaluating inline JavaScript, so indexes cannot be used. For example:
db.collection.find( { $where: "field1.value > field2.value" } )
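One way to reason about the prefix rule for the stocks index above is to check which leading index keys the equality filter covers, and whether the sort key comes right after them. The sketch below is a deliberately simplified model, not the real query planner: it assumes every filter predicate is an equality match and that the sort is on a single ascending field. Both function names are hypothetical.

```javascript
// Simplified model: returns the leading index keys covered by equality predicates.
function equalityPrefix(indexKeys, equalityFields) {
  const prefix = [];
  for (const key of indexKeys) {
    if (!equalityFields.includes(key)) break; // first gap ends the usable prefix
    prefix.push(key);
  }
  return prefix;
}

// Simplified model: the index can return documents already sorted only if
// the sort key is the next index key after the equality prefix.
function indexCanSort(indexKeys, equalityFields, sortField) {
  const prefix = equalityPrefix(indexKeys, equalityFields);
  return indexKeys[prefix.length] === sortField;
}

const indexKeys = ["symbol", "date", "type", "isValid", "rootsymbol", "price"];
const filterFields = ["symbol", "date", "type", "isValid"];

console.log(equalityPrefix(indexKeys, filterFields));
// [ 'symbol', 'date', 'type', 'isValid' ]
console.log(indexCanSort(indexKeys, filterFields, "price"));
// false: rootsymbol sits between the prefix and price
console.log(indexCanSort(indexKeys, filterFields.concat("rootsymbol"), "price"));
// true: once rootsymbol is constrained, price comes next
```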

Add new field to all documents in a nested array

I have a database of person documents. Each has a field named photos, which is an array of photo documents. I would like to add a new 'reviewed' flag to each of the photo documents and initialize it to false.
This is the query I am trying to use:
db.person.update({ "_id" : { $exists : true } }, {$set : {photos.reviewed : false} }, false, true)
However I get the following error:
SyntaxError: missing : after property id (shell):1
Is this possible, and if so, what am I doing wrong in my update?
Here is a full example of the 'person' document:
{
"_class" : "com.foo.Person",
"_id" : "2894",
"name" : "Pixel Spacebag",
"photos" : [
{
"_id" : null,
"thumbUrl" : "http://site.com/a_s.jpg",
"fullUrl" : "http://site.com/a.jpg"
},
{
"_id" : null,
"thumbUrl" : "http://site.com/b_s.jpg",
"fullUrl" : "http://site.com/b.jpg"
}]
}
Bonus karma for anyone who can tell me a cleaner way to update "all documents" without using the query { "_id" : { $exists : true } }
For those who are still looking for the answer: this is possible in MongoDB 3.6 and later with the all positional operator $[] (see the docs):
db.getCollection('person').update(
{},
{ $set: { "photos.$[].reviewed" : false } },
{ multi: true}
)
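To make the effect of "photos.$[].reviewed" concrete, here is an in-memory illustration of what the update does to a single document. It is plain JavaScript and does not call MongoDB; the function name setReviewedOnAllPhotos is hypothetical, and it assumes photos is always an array.

```javascript
// In-memory illustration of { $set: { "photos.$[].reviewed": value } }
// applied to one document. Assumes person.photos is an array.
function setReviewedOnAllPhotos(person, value) {
  return {
    ...person,
    // every element of the array gets the new field, existing fields are kept
    photos: person.photos.map(photo => ({ ...photo, reviewed: value }))
  };
}

const person = {
  _id: "2894",
  name: "Pixel Spacebag",
  photos: [
    { _id: null, thumbUrl: "http://site.com/a_s.jpg", fullUrl: "http://site.com/a.jpg" },
    { _id: null, thumbUrl: "http://site.com/b_s.jpg", fullUrl: "http://site.com/b.jpg" }
  ]
};

const updated = setReviewedOnAllPhotos(person, false);
console.log(updated.photos.map(p => p.reviewed)); // [ false, false ]
```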
Is this possible, and if so, what am I doing wrong in my update?
No. In general MongoDB is only good at doing updates on top-level objects.
The exception here is the $ positional operator. From the docs: Use this to find an array member and then manipulate it.
However, in your case you want to modify all members in an array. So that is not what you need.
Bonus karma for anyone who can tell me a cleaner way to update "all documents"
Try db.coll.update(query, update, false, true), this will issue a "multi" update. That last true is what makes it a multi.
Is this possible,
You have two options here:
Write a for loop to perform the update. It will basically be a nested for loop, one to loop through the data, the other to loop through the sub-array. If you have a lot of data, you will want to write this in your driver of choice (and possibly multi-thread it).
Write your code to handle reviewed as nullable. Write the data such that if it comes across a photo with reviewed undefined then it must be false. Then you can set the field appropriately and commit it back to the DB.
Method #2 is something you should get used to. As your data grows and you add fields, it becomes difficult to "back-port" all of the old data. This is similar to the problem of issuing a schema change in SQL when you have 1B items in the DB.
Instead just make your code resistant against the null and learn to treat it as a default.
Again though, this is still not the solution you seek.
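Method #2 can be sketched as a pair of small helpers in application code: reads treat a missing (or null) reviewed field as false, and writes set the field explicitly. The helper names isReviewed and markReviewed are hypothetical and only illustrate the idea of defaulting on read.

```javascript
// Hypothetical read-side helper: a photo without a "reviewed" field
// (or with null) is treated as not reviewed.
function isReviewed(photo) {
  return photo.reviewed === true;
}

// Hypothetical write-side helper: returns a copy with the flag set, so the
// field appears on old documents the next time they are saved.
function markReviewed(photo) {
  return { ...photo, reviewed: true };
}

const oldPhoto = { _id: null, thumbUrl: "http://site.com/a_s.jpg" };
console.log(isReviewed(oldPhoto));               // false, field is missing
console.log(isReviewed(markReviewed(oldPhoto))); // true
```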
You can do this
(null, {$set : {"photos.reviewed" : false} }, false, true)
The first parameter is null : no specification = any item in the collection.
"photos.reviewed" should be declared as string to update subfield.
You can do like this:
db.person.update({}, { $set: { "name.surname": null } }, false, true);
Old topic now, but this just worked fine with Mongo 3.0.6:
db.users.update({ _id: ObjectId("55e8969119cee85d216211fb") },
{ $set: {"settings.pieces": "merida"} })
In my case user entity looks like
{ _id: 32, name: "foo", ..., settings: { ..., pieces: "merida", ...} }

MongoDB MapReduce : use positional operator $ in map function

I have a collection with entries that look like that :
{"userid": 1, "contents": [ { "tag": "whatever", "value": 100 }, {"tag": "whatever2", "value": 110 } ] }
I'm performing a MapReduce on this collection with queries such as {"contents.tag": "whatever"}.
What I'd like to do in my map function is emit the field "value" corresponding to the entry in the array "contents" that matched the query, without having to iterate through the whole array. Under normal circumstances, I could do that using the $ positional operator with something like contents.$.value. But in the MapReduce case, it's not working.
To summarize, here is the code I have right now:
map=function(){
emit(this.userid, WHAT DO I WRITE HERE TO EMIT THE VALUE I WANT ?);
}
reduce=function(key,values){
return values[0]; //this reduce function does not make sense, just for the example
}
res=db.runCommand(
{
"mapreduce": "collection",
"query": {'contents.tag':'whatever'},
"map": map,
"reduce": reduce,
"out": "test_mr"
}
);
Any idea ?
Thanks !
This will not work without iterating over the whole array. In MongoDB a query is intended to match an entire document.
When dealing with Map / Reduce, the query is simply trimming the number of documents that are passed into the map function. However, the map function has no knowledge of the query that was run. The two are disconnected.
The source code around the M/R is here.
There is an upcoming aggregation feature that will more closely match this desire. But there's no timeline on this feature.
No way. I've had the same problem. The iterate is necessary.
You could do this:
map=function() {
for(var i in this.contents) {
if(this.contents[i].tag == "whatever") {
emit(this.userid, this.contents[i].value);
}
}
}
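To try this kind of map function without a server, a tiny local harness can stand in for the map phase. This is only a sketch: the emit function here is a local substitute that collects pairs into an array, the documents are bound to this by hand, and the second sample document is made up for illustration.

```javascript
// Local stand-in for MongoDB's emit(): collects [key, value] pairs.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map function in the same style as above: iterate the array and emit
// only the entries whose tag matches.
function map() {
  for (var i = 0; i < this.contents.length; i++) {
    if (this.contents[i].tag === "whatever") {
      emit(this.userid, this.contents[i].value);
    }
  }
}

const docs = [
  { userid: 1, contents: [{ tag: "whatever", value: 100 }, { tag: "whatever2", value: 110 }] },
  { userid: 2, contents: [{ tag: "other", value: 5 }] } // made-up document with no matching tag
];

// The server calls map with each document as "this"; do the same here.
for (const doc of docs) {
  map.call(doc);
}
console.log(emitted); // [ [ 1, 100 ] ]
```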