Count unique keys - mongo aggregation pipeline - mongodb

I have a mongo collection of searches. Each search has a criteria object, which can have any combination of criteria. So something like:
{
"_id": 1,
"criteria": {
"state": ["NY", "IL"]
...
},
...
}
I'm building a mongo aggregation pipeline, and I'm wondering how to project only the keys so that I can count them.
So far the first step of my pipeline is:
db.userSearch.aggregate([
{ "$project": { "criteria":1 } },
...
])
This returns all of the criteria objects correctly. Now I need to project the keys somehow. Does anyone have any ideas?
Edit:
desired output: {"state":20, "balance":5, "geolocation":10, ...}

In case anyone was wondering, I used mapReduce as follows.
map = function() {
Object.keys(this.criteria).forEach(function(k) {
emit(k, 1)
})
}
reduce = function(k, vals) {
return Array.sum(vals)
}
db.userSearch.mapReduce(map, reduce, 'out')
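The map and reduce steps above can be sketched in plain JavaScript without a server. This is only an illustration: the sample documents and the helper names mapKeys and countKeys are made up for the example and are not part of the original answer.

```javascript
// Hypothetical sample documents standing in for db.userSearch.
const searches = [
  { _id: 1, criteria: { state: ["NY", "IL"], balance: 100 } },
  { _id: 2, criteria: { state: ["CA"] } },
  { _id: 3, criteria: { geolocation: "nearby", balance: 50 } },
];

// Same idea as the map step: emit (key, 1) for every criteria key.
function mapKeys(doc) {
  return Object.keys(doc.criteria).map((k) => [k, 1]);
}

// Same idea as the reduce step: sum the emitted values per key.
function countKeys(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const [key, value] of mapKeys(doc)) {
      counts[key] = (counts[key] || 0) + value;
    }
  }
  return counts;
}

const keyCounts = countKeys(searches);
console.log(keyCounts); // { state: 2, balance: 2, geolocation: 1 }
```

On MongoDB 3.4.4 or later, the $objectToArray operator followed by $unwind and $group should allow the same count inside an aggregation pipeline, though that approach is not part of the answer above.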

Related

Mongoose: search for ObjectID by Array

I want to filter my collection by aggregation for one of many ObjectIDs.
Because of some DocumentDB restrictions I cannot build a single pipeline with uncorrelated subqueries, so my workaround is to do it in two queries.
For example, I have an aggregation that returns all team IDs matching some conditions, as an array of objects holding the IDs:
[{_id: ObjectID("abcdef")}, {_id: ObjectID("ghijkl")}, {_id: ObjectID("vwxyz")}, ...]
I now want to have a second aggregation filter another collection using the ObjectIDs.
This would work in MongoDB Compass:
{
"team": {
"$in": [ObjectId("60aabcb05c7462f42b3d7zyx"), ObjectId("60aabc7b05c7462f42b3dxyz")]
},
....
}
My issue is that I cannot find the correct syntax for JS to generate such a pipeline.
Whatever I try, JS always converts my array of ObjectIDs to something like this:
{
"team": {
"$in": [{
"_id": "60aabcb05c7462f42b3d7zyx"
},{
"_id": "60aabc7b05c7462f42b3dxyz"
}]
},
I fixed it like this. I am not 100% sure why this syntax works, because it is still just an array of objects formatted like before, but I guess mongoose does something here that is opaque to me.
let teams = await TeamMgmt.getTeamsAggregatedByFilter( teamFilter )
// make an array of ObjectIds so we can filter for them.
const idArray = teams.map( function ( team ) {
return new mongoose.Types.ObjectId( team._id.toString() )
} );
const shiftFilter = [
{
'$match': {
'team': {
"$in": idArray
},
....
}
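A likely reason the fix works is that $in compares the field against the bare values in its array, while the first aggregation returns wrapper documents of the form { _id: ... }. The sketch below shows only that reshaping step. It uses plain strings as stand-ins for ObjectId instances, and toIdArray is a hypothetical helper name.

```javascript
// Stand-in for the result of the first aggregation; real results would hold
// ObjectId instances rather than the plain strings used here.
const teams = [{ _id: "abcdef" }, { _id: "ghijkl" }, { _id: "vwxyz" }];

// Hypothetical helper: pull the bare _id values out of the wrapper documents.
function toIdArray(docs) {
  return docs.map((doc) => doc._id);
}

// $in then receives ["abcdef", ...] instead of [{ _id: "abcdef" }, ...].
const shiftMatch = { $match: { team: { $in: toIdArray(teams) } } };
console.log(JSON.stringify(shiftMatch));
```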

mongodb query: nested elemMatch

This is my current document:
{
_id: 'd283015f-91e9-4404-9202-093c28d6a931',
referencedGeneralPractitioner: [
{
resourceType: 'practitioner',
cachedIdentifier: [
{
system: { value: 'urn:oid:1.3.6.1.4.1.19126.3' },
value: { value: '14277399B' }
}
]
}
]
}
Here there are two nested arrays of objects: referencedGeneralPractitioner[{cachedIdentifier[{}]}].
Currently, I'm getting results using this query:
{
"referencedGeneralPractitioner":{
"$elemMatch":{
"cachedIdentifier.value.value":"14277399B",
"cachedIdentifier.system.value":"urn:oid:1.3.6.1.4.1.19126.3"
}
}
}
It returns my desired document, but I'm not sure whether the above query is really what I'm looking for.
I mean, I'm only applying $elemMatch on the referencedGeneralPractitioner array field.
Is that really enough?
Should I add a nested $elemMatch on cachedIdentifier?
Any ideas?
It looks like you need to query it like this:
db.collection.find({
"referencedGeneralPractitioner.cachedIdentifier": {
"$elemMatch": {
"value.value": "14277399B",
"system.value": "urn:oid:1.3.6.1.4.1.19126.3"
}
}
})
playground
This finds the full documents in which both values match in the same element, for any element of the nested array. If you need to extract the specific element, you will need $filter.
If you also need to search on a field of the first array level, modify the query as follows:
{
"referencedGeneralPractitioner": {
"$elemMatch": {
resourceType: 'practitioner',
"cachedIdentifier": {
"$elemMatch": {
"value.value": 1,
"system.value":2
}
}
}
}
}
This will give you all full documents where there is at the same time resourceType: "practitioner" and { value.value: 1 and system.value: 2 }.
It is also important to stress that the following will not work correctly:
{
"referencedGeneralPractitioner":{
"$elemMatch":{
"cachedIdentifier.value.value":"14277399B",
"cachedIdentifier.system.value":"urn:oid:1.3.6.1.4.1.19126.3"
}
}
}
It will produce false positives, because each condition can be satisfied by a different element of the nested array, like:
wrong playground
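The difference between the two queries can be imitated in plain JavaScript. This is a sketch of the matching semantics, not of how the server implements them. The sample document is hypothetical and deliberately puts the wanted value and the wanted system in different cachedIdentifier elements. matchesDotted and matchesNested are illustrative names.

```javascript
// Hypothetical document: the value and the system sit in DIFFERENT identifiers.
const doc = {
  referencedGeneralPractitioner: [
    {
      resourceType: "practitioner",
      cachedIdentifier: [
        { system: { value: "urn:oid:1.3.6.1.4.1.19126.3" }, value: { value: "OTHER" } },
        { system: { value: "urn:oid:other" }, value: { value: "14277399B" } },
      ],
    },
  ],
};

const wantedValue = "14277399B";
const wantedSystem = "urn:oid:1.3.6.1.4.1.19126.3";

// Dotted paths inside one $elemMatch: each condition may be satisfied by a
// different cachedIdentifier element of the same practitioner.
function matchesDotted(d) {
  return d.referencedGeneralPractitioner.some(
    (gp) =>
      gp.cachedIdentifier.some((id) => id.value.value === wantedValue) &&
      gp.cachedIdentifier.some((id) => id.system.value === wantedSystem)
  );
}

// Nested $elemMatch: both conditions must hold for the SAME element.
function matchesNested(d) {
  return d.referencedGeneralPractitioner.some((gp) =>
    gp.cachedIdentifier.some(
      (id) => id.value.value === wantedValue && id.system.value === wantedSystem
    )
  );
}

console.log(matchesDotted(doc)); // true (a false positive)
console.log(matchesNested(doc)); // false
```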

Need MongoDB Query

I'm sorry, but I'm a little confused by a query; kindly help. Suppose we have one document that contains:
{
"_id":100,
"name":"Demarcus Audette",
"scores":[
{
"score":47.42608580155614,
"type":"exam"
},
{
"score":44.83416623719906,
"type":"quiz"
},
{
"score":39.01726616178844,
"type":"homework"
},
{
"score":89.01726616178844,
"type":"homework"
}
]
}
I want to write a query that returns only the scores entries of type homework, which means the output should look like this:
{
"_id":100,
"name":"Demarcus Audette",
"scores":[
{
"score":39.01726616178844,
"type":"homework"
},
{
"score":89.01726616178844,
"type":"homework"
}
]
}
Kindly suggest. Thanks in Advance
Use the $elemMatch operator.
db.collection.find({ "scores": { $elemMatch: { "type": "homework" } } } );
EDIT
What you are asking is not possible. You will need the above query and filter out the rest in whatever language you are programming. You can also use an aggregate function using $unwind and $match.
db.collection.aggregate([
{$unwind: "$scores"},
{$match: {"scores.type": "homework"}}
]);
$unwind flattens your array and $match is your actual query, which returns the matching documents. Please note that $unwind creates a separate document for each element in your array, so you will get two results when you filter on 'homework' in your example.
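To make the "two results" remark concrete, here is a small in-memory imitation of $unwind followed by $match, using the document from the question. The unwind function is a simplified stand-in written for this example, not a MongoDB API.

```javascript
const student = {
  _id: 100,
  name: "Demarcus Audette",
  scores: [
    { score: 47.42608580155614, type: "exam" },
    { score: 44.83416623719906, type: "quiz" },
    { score: 39.01726616178844, type: "homework" },
    { score: 89.01726616178844, type: "homework" },
  ],
};

// Imitates $unwind: one output document per element of the array field.
function unwind(doc, field) {
  return doc[field].map((element) => ({ ...doc, [field]: element }));
}

// Imitates $match on "scores.type": keep only the unwound homework documents.
const homework = unwind(student, "scores").filter(
  (d) => d.scores.type === "homework"
);

console.log(homework.length); // 2
console.log(homework.map((d) => d.scores.score)); // [ 39.01726616178844, 89.01726616178844 ]
```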

Sorting by relevance with MongoDB

I have a collection of documents in the following form:
{ _id: ObjectId(...)
, title: "foo"
, tags: ["bar", "baz", "qux"]
}
The query should find all documents with any of these tags. I currently use this query:
{ "tags": { "$in": ["bar", "hello"] } }
And it works; all documents tagged "bar" or "hello" are returned.
However, I want to sort by relevance, i.e. the more matching tags the earlier the document should occur in the result. For example, a document tagged ["bar", "hello", "baz"] should be higher in the results than a document tagged ["bar", "baz", "boo"] for the query ["bar", "hello"]. How can I achieve this?
Both MapReduce and doing it client-side are going to be too slow; you should use the aggregation framework (new in MongoDB 2.2).
It might look something like this:
db.collection.aggregate([
{ $match : { "tags": { "$in": ["bar", "hello"] } } },
{ $unwind : "$tags" },
{ $match : { "tags": { "$in": ["bar", "hello"] } } },
{ $group : { _id: "$title", numRelTags: { $sum:1 } } },
{ $sort : { numRelTags : -1 } }
// optionally
, { $limit : 10 }
])
Note that the first and third pipeline stages look identical; this is intentional and necessary. Here is what the steps do:
1. Pass on only documents which have the tag "bar" or "hello" in them.
2. Unwind the tags array (meaning split into one document per tags element).
3. Pass on only tags that are exactly "bar" or "hello" (i.e. discard the rest of the tags).
4. Group by title (it could also be by "$_id" or any other combination of original document fields), adding up how many tags (of "bar" and "hello") it had.
5. Sort in descending order by number of relevant tags.
6. (Optionally) limit the returned set to the top 10.
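The effect of these steps can be checked on a few in-memory documents. The following is a rough sketch that mirrors the pipeline stages with array methods. The sample documents and the function name rankByRelevance are invented for the illustration.

```javascript
const queryTags = ["bar", "hello"];

// Hypothetical sample documents.
const docs = [
  { title: "one", tags: ["bar", "baz", "boo"] },
  { title: "two", tags: ["bar", "hello", "baz"] },
  { title: "three", tags: ["qux"] },
];

function rankByRelevance(documents, wanted) {
  return documents
    // first $match: drop documents without any wanted tag
    .filter((d) => d.tags.some((t) => wanted.includes(t)))
    // $unwind + second $match + $group: count the wanted tags per title
    .map((d) => ({
      _id: d.title,
      numRelTags: d.tags.filter((t) => wanted.includes(t)).length,
    }))
    // $sort: most relevant first
    .sort((a, b) => b.numRelTags - a.numRelTags);
}

console.log(rankByRelevance(docs, queryTags));
// [ { _id: 'two', numRelTags: 2 }, { _id: 'one', numRelTags: 1 } ]
```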
You could potentially use MapReduce for something like that. You'd process each document in the Map step, figuring out how many tags match the query, and assign a score. Then you could sort based on that score.
http://www.mongodb.org/display/DOCS/MapReduce
Something that complex should be done after querying, either server-side through db.eval (if your client supports this) or client-side. Here's an example of what you're looking for.
It will retrieve all posts with the tags you specified, then sort them by the number of matches.
Remove the db.eval( part and translate it to the language your client uses to get the client-side effect.
db.eval(function () {
  var tags = ["a","b","c"];
  // Count how many of a post's tags appear in the queried tags.
  function countMatches(post) {
    var matches = 0;
    post.tags.forEach(function (tag) {
      if (tags.indexOf(tag) !== -1) {
        matches++;
      }
    });
    return matches;
  }
  return db.posts.find({tags:{$in:tags}}).toArray().sort(function(a,b){
    // Descending order: posts with more matching tags come first.
    return countMatches(b) - countMatches(a);
  });
});

How to count document elements inside a mongo collection with php?

I have the following structure of a mongo document:
{
"_id": ObjectId("4fba2558a0787e53320027eb"),
"replies": {
"0": {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
},
"1": {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
},
"2" ....
}
}
How do I count all the replies from all the documents in the collection?
Thank you!
In the following answer, I'm working with a simple data set with five replies across the collection:
> db.foo.find()
{ "_id" : ObjectId("4fba6b0c7c32e336fc6fd7d2"), "replies" : [ 1, 2, 3 ] }
{ "_id" : ObjectId("4fba6b157c32e336fc6fd7d3"), "replies" : [ 1, 2 ] }
Since we're not simply counting documents, db.collection.count() won't help us here. We'll need to resort to MapReduce to scan each document and aggregate the reply array lengths. Consider the following:
db.foo.mapReduce(
function() { emit('totalReplies', { count: this.replies.length }); },
function(key, values) {
var result = { count: 0 };
values.forEach(function(value) {
result.count += value.count;
});
return result;
},
{ out: { inline: 1 }}
);
The map function (first argument) runs across the entire collection and emits the number of replies in each document under a constant key. Mongo will then consider all emitted values and run the reduce function (second argument) a number of times to consolidate (literally reduce) the result. Hopefully the code here is straightforward. If you're new to map/reduce, one caveat is that the reduce method must be capable of processing its own output. This is explained in detail in the MapReduce docs linked above.
Note: if your collection is quite large, you may have to use another output mode (e.g. collection output); however, inline works well for small data sets.
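The caveat that reduce must be able to process its own output can be seen in a small simulation outside the database. The sketch below reuses the reduce function from the answer on the five-reply sample data and feeds a partial result back in. The variable names are only illustrative.

```javascript
// Same sample data as in the answer: five replies across two documents.
const foo = [{ replies: [1, 2, 3] }, { replies: [1, 2] }];

// map: emit one { count } value per document under a constant key.
const emitted = foo.map((doc) => ({ count: doc.replies.length }));

// reduce: returns the same shape it consumes, so it can be re-applied.
function reduce(key, values) {
  const result = { count: 0 };
  values.forEach((value) => {
    result.count += value.count;
  });
  return result;
}

// One pass over all emitted values...
const direct = reduce("totalReplies", emitted);
// ...or a partial reduce whose output is fed back in as an input value.
const partial = reduce("totalReplies", emitted.slice(0, 1));
const rereduced = reduce("totalReplies", [partial, ...emitted.slice(1)]);

console.log(direct.count, rereduced.count); // 5 5
```

Because the return value has the same { count } shape as the emitted values, the total does not depend on how the values are batched.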
Lastly, if you're using MongoDB 2.1+, we can take advantage of the Aggregation Framework to avoid writing JS functions and make this even easier:
db.foo.aggregate(
{ $project: { replies: 1 }},
{ $unwind: "$replies" },
{ $group: {
_id: "result",
totalReplies: { $sum: 1 }
}}
);
Three things are happening here. First, we tell Mongo that we're interested in the replies field. Secondly, we want to unwind the array so that we can iterate over all elements across the fields in our projection. Lastly, we'll tally up results under a "result" bucket (any constant will do), adding 1 to the totalReplies result for each iteration. Executing this query will yield the following result:
{
"result" : [{
"_id" : "result",
"totalReplies" : 5
}],
"ok" : 1
}
Although I wrote the above answers with respect to the Mongo client, you should have no trouble translating them to PHP. You'll need to use MongoDB::command() to run either MapReduce or aggregation queries, as the PHP driver currently has no helper methods for either. There's currently a MapReduce example in the PHP docs, and you can reference this Google group post for executing an aggregation query through the same method.
I haven't checked your code; it might work as well. I did the following and it just works:
$replies = $db->command(
array(
"distinct" => "foo",
"key" => "replies"
)
);
$all = count($replies['values']);
I did it again using the group command of the PHP Mongo driver. It's similar to a MapReduce command.
$keys = array("replies.type" => 1); //keys for group by
$initial = array("count" => 0); //initial value of the counter
$reduce = "function (obj, prev) { prev.count += obj.replies.length; }";
$condition = array('replies' => array('$exists' => true), 'replies.type' => 'follow');
$g = $db->foo->group($keys, $initial, $reduce, $condition);
echo $g['count'];
Thanks jmikola for giving links to Mongo.
JSON should be
{
"_id": ObjectId("4fba2558a0787e53320027eb"),
"replies":[
{
0: {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
},
1: {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
},
2: {....}
}
]
}