How to select custom data in mongo - mongodb

Is there a way to include custom data in a MongoDB query response?
What I mean is a MongoDB equivalent of something like this in MySQL:
SELECT
value,
'7' AS min_value
FROM
my_table
WHERE
value >= 7
...where the 7 would probably be a variable in the language the mongo query is called from.

Try the $literal operator if using the aggregation framework with a $match pipeline step as your query filter. For example, create a sample collection in mongo shell that has 10 test documents with the value field as an increasing integer (0 to 9):
for(x=0;x<10;x++){ db.my_table.insert({value: x }) }
Running the following aggregation pipeline:
var base = 7;
db.my_table.aggregate([
{
"$match": {
"value": { "$gte": base }
}
},
{
"$project": {
"value": 1,
"min_value": { "$literal": base }
}
}
])
would produce the result:
/* 0 */
{
"result" : [
{
"_id" : ObjectId("561e2bcc3d8f561c1548d39b"),
"value" : 7,
"min_value" : 7
},
{
"_id" : ObjectId("561e2bcc3d8f561c1548d39c"),
"value" : 8,
"min_value" : 7
},
{
"_id" : ObjectId("561e2bcc3d8f561c1548d39d"),
"value" : 9,
"min_value" : 7
}
],
"ok" : 1
}

The only things in MongoDB query actions that actually "modify" the results returned, beyond the original document or "field selection", are the .aggregate() method and the JavaScript manipulation alternative in mapReduce.
Otherwise documents are returned "as is", or at least with just the selected fields or array entry specified.
So if you want something else returned from the server, then you need to use one of those methods:
var seven = 7;
db.collection.aggregate([
{ "$match": {
"value": { "$gte": seven }
}},
{ "$project": {
"value": 1,
"min_value": { "$literal": seven }
}}
])
This is where the $literal operator comes into play. In versions from 2.2 (when the aggregation framework was introduced) up to but not including 2.6, you can use $const instead:
var seven = 7;
db.collection.aggregate([
{ "$match": {
"value": { "$gte": seven }
}},
{ "$project": {
"value": 1,
"min_value": { "$const": seven }
}}
])
Or just use mapReduce and its JavaScript translation:
var seven = 7;
db.collection.mapReduce(
function() {
emit(this._id,{ "value": this.value, "min_value": seven });
},
function() {}, // no reduce at all since all _id unique
{
"out": { "inline": 1 },
"query": { "value": { "$gte": seven } },
"scope": { "seven": seven }
}
);
Those are basically your options.
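Since the question mentions that the 7 would come from a variable in the calling language, here is a minimal sketch of wrapping the pipeline in a function. The name buildMinValuePipeline is hypothetical, and the sketch assumes MongoDB 2.6 or newer so that $literal is available. $literal also matters when the constant is a string starting with "$", which would otherwise be read as a field path.

```javascript
// Hypothetical helper (not part of MongoDB): builds the pipeline for a
// caller-supplied threshold.
function buildMinValuePipeline(minValue) {
  return [
    // Keep only documents whose "value" is at least minValue
    { $match: { value: { $gte: minValue } } },
    // $literal returns minValue untouched instead of parsing it as an expression
    { $project: { value: 1, min_value: { $literal: minValue } } }
  ];
}

// Print the generated stages; in the mongo shell the array would be passed
// straight to db.my_table.aggregate(...)
console.log(JSON.stringify(buildMinValuePipeline(7)));
```

The same array can be handed to any driver's aggregate call, so the threshold stays an ordinary variable in the calling code.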

Related

Most efficient way to change a string field value to its substring

I have a collection filled with documents that look like this:
{
data: 11,
version: "0.0.32"
}
and some have a test suffix to version:
{
data: 55,
version: "0.0.42-test"
}
The version field has different values but it always conforms to the pattern: 0.0.XXX. I would like to update all the documents to look like this:
{
data: 11,
version: 32
}
and the suffixed version (for test documents - version should be negative):
{
data: 55,
version: -42
}
The collection with these documents is used by our critical system, that needs to be turned off while updating the data - so I want the update/change to be as fast as possible. There are about 66_000_000 documents in this collection, and it's about 100GB in size.
Which type of mongodb operation would be the most efficient one?
The most efficient way to do this is in MongoDB 3.4 (the upcoming release as of this writing), using the $split operator to split our string as shown here, then assigning the last element of the resulting array to a variable using the $let and $arrayElemAt operators.
Next, we use the $switch operator to perform logical condition processing, like a case statement, against that variable.
The condition here is $gt, which returns true if the value contains "test". In that case the "then" expression splits the string on "-" and returns "-" concatenated with the first element of the newly computed array. If the condition evaluates to false, we just return the variable.
Of course, in our case statement we use $indexOfCP, which returns -1 if there are no occurrences of "test".
let cursor = db.collection.aggregate(
[
{ "$project": {
"data": 1,
"version": {
"$let": {
"vars": {
"v": {
"$arrayElemAt": [
{ "$split": [ "$version", "." ] },
-1
]
}
},
"in": {
"$switch": {
"branches": [
{
"case": {
"$gt": [
{ "$indexOfCP": [ "$$v", "test" ] },
-1
]
},
"then": {
"$concat": [
"-",
"",
{ "$arrayElemAt": [
{ "$split": [ "$$v", "-" ] },
0
]}
]
}
}
],
"default": "$$v"
}
}
}
}
}}
]
)
The aggregation query produces something like this:
{ "_id" : ObjectId("57a98773cbbd42a2156260d8"), "data" : 11, "version" : "32" }
{ "_id" : ObjectId("57a98773cbbd42a2156260d9"), "data" : 55, "version" : "-42" }
As you can see, the "version" field values are strings. If the data type for that field does not matter, you can simply use the $out aggregation pipeline stage to write the result into a new collection or replace your collection.
{ "$out": "collection" }
If you need to convert your data to a floating point number, then the only way to do this at the time of writing, simply because MongoDB does not provide a way to do type conversion out of the box except for integer to string, is to iterate the aggregation Cursor object and convert your value using parseFloat or Number, then update your documents using the $set operator and the bulkWrite() method for maximum efficiency. (MongoDB 4.0 later added the $convert and $toInt operators for this.)
let requests = [];
cursor.forEach(doc => {
requests.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": {
"data": doc.data,
"version": parseFloat(doc.version)
}
}
}
});
if ( requests.length === 1000 ) {
// Execute per 1000 ops and re-init
db.collection.bulkWrite(requests);
requests = [];
}}
);
// Clean up queues
if(requests.length > 0) {
db.collection.bulkWrite(requests);
}
While the aggregation query will work perfectly in MongoDB 3.4 or newer, our best bet from MongoDB 3.2 backwards is mapReduce with the bulkWrite() method.
var results = db.collection.mapReduce(
function() {
var v = this.version.split(".")[2];
emit(this._id, v.indexOf("-") > -1 ? "-"+v.replace(/\D+/g, '') : v)
},
function(key, value) {},
{ "out": { "inline": 1 } }
)["results"];
results looks like this:
[
{
"_id" : ObjectId("57a98773cbbd42a2156260d8"),
"value" : "32"
},
{
"_id" : ObjectId("57a98773cbbd42a2156260d9"),
"value" : "-42"
}
]
From here you use the previous .forEach loop to update your documents, reading the converted string from doc.value instead of doc.version.
From MongoDB 2.6 to 3.0 you will need to use the now deprecated Bulk() API and its associated methods, as shown in my answer here.
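Whichever server version you end up on, the string-to-number step itself can be done in one pass on the client, skipping the intermediate "32" / "-42" strings. This is only a sketch: versionToNumber is a hypothetical helper name, and it assumes every version matches the 0.0.XXX or 0.0.XXX-test pattern described in the question.

```javascript
// Hypothetical helper, not part of MongoDB:
// "0.0.32" -> 32, "0.0.42-test" -> -42
function versionToNumber(version) {
  // Last dot-separated segment, e.g. "32" or "42-test"
  const last = version.split(".").pop();
  // parseInt stops at the first non-digit, so "42-test" parses as 42
  const digits = parseInt(last, 10);
  if (Number.isNaN(digits)) {
    throw new Error("Unexpected version format: " + version);
  }
  // Test versions are stored as negative numbers
  return last.indexOf("test") > -1 ? -digits : digits;
}

console.log(versionToNumber("0.0.32"));      // 32
console.log(versionToNumber("0.0.42-test")); // -42
```

Inside the bulk loop the "$set" would then use versionToNumber on the original string rather than parseFloat on the pre-processed one.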

MongoDB advanced aggregation

I'm a total newbie to MongoDB. I work on a private project for my golf club to analyze rounds.
I use meteorJS for the application and tried some aggregation on the command line, but I'm not sure if I even have the right approach to the task.
A sample document:
{
"_id" : "2KasYR3ytsaX8YuoT",
"course" : {
"id" : "rHmYJBhRtSt38m68s",
"name" : "CourseXYZ"
},
"player" : {
"id" : "tdaYaSvXJueDq4oTN",
"firstname" : "Just",
"lastname" : "aPlayer"
},
"event" : "Training Day",
"tees" : [
{
"tee" : 1,
"par" : 4,
"fairway" : "straight",
"greenInRegulation" : true,
"putts" : 3,
"strokes" : 5
},
{
"tee" : 2,
"par" : 5,
"fairway" : "right",
"greenInRegulation" : true,
"putts" : 2,
"strokes" : 5
},
{
"tee" : 3,
"par" : 5,
"fairway" : "right",
"greenInRegulation" : false,
"shotType": "bunker",
"putts" : 2,
"strokes" : 5
}
]
}
My attempt so far:
db.analysis.aggregate([
{$unwind: "$tees"},
{$group: {
_id:"$player.id",
strokes: {$sum: "$tees.strokes"},
par: {$sum: "$tees.par"},
putts: {$sum: "$tees.putts"},
teesPlayed: {$sum:1}
}}
])
And what I want for a result
{
"_id" : "tdaYaSvXJueDq4oTN",
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3
// here comes what I want to add:
"fairway.straight": 1 // where tees.fairway equals "straight"
"fairway.right": 2 // where tees.fairway equals "right" (etc.)
"shotType.bunker": 1 // where shotType equals "bunker" etc.
}
There are a few ways of approaching this depending on your overall needs and which MongoDB server version you have available as a target for your project.
Whilst "meteor" installations and default project setups do not "bundle" a MongoDB 3.2 instance, there is no reason why your project cannot use such an instance as an external target. If it's a new project to get off the ground, then I would highly recommend working against the latest version available, and possibly even against the latest development releases, depending on your own targeted release cycle. Work with what is most fresh, and your application will be too.
For that reason, we start with the latest at the top of the list.
MongoDB 3.2 way - Fast
The big feature in MongoDB 3.2 that makes it really stand out here in terms of performance is a change in how $sum operates. Previously just as an accumulator operator for $group this would work on singular numeric values to produce a total.
The big improvement is hidden within the $project stage usage which is added, where $sum can be directly applied to an array of values. i.e { "$sum": [1,2,3] } results in 6. So now you can "nest" the operations with anything that produces an array of values from a source. Most notably here is $map:
db.analysis.aggregate([
{ "$group": {
"_id": "$player.id",
"strokes": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.strokes"
}
}
}
},
"par": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.par"
}
}
}
},
"putts": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.putts"
}
}
}
},
"teesPlayed": { "$sum": { "$size": "$tees" } },
"shotsRight": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.fairway", "right" ] }
}
}
}
},
"shotsStraight": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.fairway", "straight" ] }
}
}
}
},
"bunkerShot": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.shotType", "bunker" ] }
}
}
}
}
}}
])
So here each field is split out either by doing the double $sum trick on the single field values from the array items, or, in contrast, by processing the arrays with $filter to restrict them to matching items and then taking the length of the matches with $size, for the result fields that want "counts" instead.
Though this looks long-winded in pipeline construction, it will yield the fastest results. And though you need to specify all of the keys in the result along with the associated logic, there is nothing stopping "generation" of the data structure necessary for the pipeline as the result of other queries on the data set.
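To make the "generation" idea concrete, here is a rough sketch that builds the same $group stage from plain lists. The function name buildGroupStage and the shape of the countSpecs entries are made up for illustration. In practice the value lists might come from something like db.analysis.distinct("tees.fairway"), which is an assumption about how you would collect them.

```javascript
// Hypothetical generator for the MongoDB 3.2 style $group stage shown above.
function buildGroupStage(sumFields, countSpecs) {
  const group = { _id: "$player.id" };

  // Numeric fields: sum the array values per document, then across documents
  sumFields.forEach(function (field) {
    group[field] = {
      $sum: { $sum: { $map: { input: "$tees", as: "tee", in: "$$tee." + field } } }
    };
  });

  // Counters: size of the array once filtered down to matching elements
  countSpecs.forEach(function (spec) {
    group[spec.name] = {
      $sum: {
        $size: {
          $filter: {
            input: "$tees",
            as: "tee",
            cond: { $eq: ["$$tee." + spec.field, spec.value] }
          }
        }
      }
    };
  });

  return { $group: group };
}

const stage = buildGroupStage(
  ["strokes", "par", "putts"],
  [
    { name: "shotsRight", field: "fairway", value: "right" },
    { name: "shotsStraight", field: "fairway", value: "straight" },
    { name: "bunkerShot", field: "shotType", value: "bunker" }
  ]
);
console.log(JSON.stringify(stage, null, 2));
```

The "teesPlayed" counter is left out of the sketch for brevity; it would be added to the group object by hand as in the listing above.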
The other Aggregate Way - A bit slower
Of course not every project can practically use the latest version of things. So before a MongoDB 3.2 release that introduced some of the operators used above, the only real practical way to work with array data and conditionally work with different elements and sums was to process first with $unwind.
So essentially we start with the query you began to construct, but then add in the handling for the different fields:
db.analysis.aggregate([
{ "$unwind": "$tees" },
{ "$group": {
"_id": "$player.id",
"strokes": { "$sum": "$tees.strokes" },
"par": { "$sum": "$tees.par" },
"putts": { "$sum": "$tees.putts" },
"teesPlayed": { "$sum": 1 },
"shotsRight": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.fairway", "right" ] },
1,
0
]
}
},
"shotsStraight": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.fairway", "straight" ] },
1,
0
]
}
},
"bunkerShot": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.shotType", "bunker" ] },
1,
0
]
}
}
}}
])
So you should note that there is still "some" similarity to the first listing: where the $filter statements all have some logic within their "cond" argument, that logic is transposed to the $cond operator here.
As a "ternary" operator (if/then/else), its job is to evaluate a logical condition (if) and either return the next argument where that condition is true (then) or otherwise return the last argument where it is false (else). In this case that is either 1 or 0, depending on whether the tested condition matched. This gives the "counts" to $sum as required.
In either statement, the produced results come out like this:
{
"_id" : "tdaYaSvXJueDq4oTN",
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3,
"shotsRight" : 2,
"shotsStraight" : 1,
"bunkerShot" : 1
}
Since this is an aggregate statement with $group, then one rule is that the "keys" ( apart from needing to be specified in the constructed statement ) must be in the "top-level" of the structure. So no "nested" structures are allowed within a $group, hence the whole names for each key.
If you really must transform, then you can by adding a $project stage following the $group in each example:
{ "$project": {
"strokes": 1,
"par": 1,
"putts": 1,
"teesPlayed": 1,
"fairway": {
"straight": "$shotsStraight",
"right": "$shotsRight"
},
"shotType": {
"bunker": "$bunkerShot"
}
}}
So a bit of "re-shaping" can be done, but of course all the names and structure must be specified, though again you could in theory just generate this all in code. It is just a data structure after all.
The bottom line here is that $unwind adds cost, and quite a lot of cost. It is basically going to add a copy of each document to the pipeline for processing "per" every array element contained in each document. So there is not only the cost of processing all of those produced things, but also the cost of "producing" them in the first place.
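To get a feel for that multiplication, here is a plain JavaScript imitation of what $unwind does to the document count. It is an illustration only, not how the server implements it, and the sample rounds are invented.

```javascript
// Plain JavaScript imitation of $unwind, for illustration only
function unwind(docs, field) {
  const out = [];
  docs.forEach(function (doc) {
    (doc[field] || []).forEach(function (element) {
      // Every array element yields a full copy of the parent document
      out.push(Object.assign({}, doc, { [field]: element }));
    });
  });
  return out;
}

// Invented sample data: two rounds with three and two tees
const rounds = [
  { _id: "a", player: { id: "p1" }, tees: [{ strokes: 5 }, { strokes: 5 }, { strokes: 5 }] },
  { _id: "b", player: { id: "p1" }, tees: [{ strokes: 4 }, { strokes: 6 }] }
];

console.log(unwind(rounds, "tees").length); // 5 documents from 2 inputs
```

With 18 tees per round, every stored document turns into 18 pipeline documents before $group even starts.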
MapReduce - Slower still, but more flexible on the keys
And finally, as a last approach, there is mapReduce:
db.analysis.mapReduce(
function() {
var data = { "strokes": 0 ,"par": 0, "putts": 0, "teesPlayed": 0, "fairway": {} };
this.tees.forEach(function(tee) {
// Increment common values
data.strokes += tee.strokes;
data.par += tee.par;
data.putts += tee.putts;
data.teesPlayed++;
// Do dynamic keys
if (!data.fairway.hasOwnProperty(tee.fairway))
data.fairway[tee.fairway] = 0;
data.fairway[tee.fairway]++;
if (tee.hasOwnProperty('shotType')) {
if (!data.hasOwnProperty('shotType'))
data.shotType = {};
if (!data.shotType.hasOwnProperty(tee.shotType))
data.shotType[tee.shotType] = 0;
data.shotType[tee.shotType]++
}
});
emit(this.player.id,data);
},
function(key,values) {
var data = { "strokes": 0 ,"par": 0, "putts": 0, "teesPlayed": 0, "fairway": {} };
values.forEach(function(value) {
// Common keys
data.strokes += value.strokes;
data.par += value.par;
data.putts += value.putts;
data.teesPlayed += value.teesPlayed;
Object.keys(value.fairway).forEach(function(fairway) {
if (!data.fairway.hasOwnProperty(fairway))
data.fairway[fairway] = 0;
data.fairway[fairway] += value.fairway[fairway];
});
if (value.hasOwnProperty('shotType')) {
if (!data.hasOwnProperty('shotType'))
data.shotType = {};
Object.keys(value.shotType).forEach(function(shotType) {
if (!data.shotType.hasOwnProperty(shotType))
data.shotType[shotType] = 0;
data.shotType[shotType] += value.shotType[shotType];
});
}
});
return data;
},
{ "out": { "inline": 1 } }
)
And the output from this can be produced immediately with the nested structure, but of course in the usual mapReduce output form of "key/value", where "key" is the grouping _id and "value" contains all the output:
{
"_id" : "tdaYaSvXJueDq4oTN",
"value" : {
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3,
"fairway" : {
"straight" : 1,
"right" : 2
},
"shotType" : {
"bunker" : 1
}
}
}
The "out" options for mapReduce are either the "inline" as shown here where you can fit all the result in memory ( and within the 16MB BSON limit ), or alternately to another collection from which you can read later. There is a similar $out for .aggregate(), but this is generally negated by aggregation output being available as a "cursor", unless of course you really want it in a collection instead.
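If the "key/value" wrapping gets in the way on the client, the inline results can be flattened after the fact. A small sketch, assuming the inline "results" array has already been read into a variable; flattenMapReduce is a hypothetical name.

```javascript
// Hypothetical helper: lift the fields of "value" up next to "_id"
function flattenMapReduce(results) {
  return results.map(function (r) {
    return Object.assign({ _id: r._id }, r.value);
  });
}

// Sample shaped like the mapReduce output shown above
const inlineResults = [
  {
    _id: "tdaYaSvXJueDq4oTN",
    value: {
      strokes: 15, par: 14, putts: 7, teesPlayed: 3,
      fairway: { straight: 1, right: 2 },
      shotType: { bunker: 1 }
    }
  }
];

console.log(JSON.stringify(flattenMapReduce(inlineResults)[0]));
```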
Concluding
So it all depends on how you really want to approach this. If speed is of the utmost importance then .aggregate() is generally going to yield the fastest results. On the other hand, if you want to work "dynamically" with the produced "keys", then mapReduce allows the logic to be generally self-contained, without the need for another inspection pass to generate the required aggregation pipeline statement.
I am not clear how to do that through aggregation; however, there is a workaround along these lines:
> db.collection.find({}).forEach(function(doc) {
var ret = {};
ret._id = doc._id;
doc.tees.forEach(function(obj) {
for (var k in obj) {
var type = typeof obj[k];
if (type === 'number') {
if (ret.hasOwnProperty(k)) {
ret[k] += obj[k];
} else {
ret[k] = obj[k];
}
} else if (type === 'string') {
if (ret.hasOwnProperty(k+'.'+obj[k])) {
ret[k+'.'+obj[k]] += 1;
} else {
ret[k+'.'+obj[k]] = 1;
}
}
}
});
printjson(ret);
});

MongoDB lists - get every Nth item

I have a Mongodb schema that looks roughly like:
[
{
"name" : "name1",
"instances" : [
{
"value" : 1,
"date" : ISODate("2015-03-04T00:00:00.000Z")
},
{
"value" : 2,
"date" : ISODate("2015-04-01T00:00:00.000Z")
},
{
"value" : 2.5,
"date" : ISODate("2015-03-05T00:00:00.000Z")
},
...
]
},
{
"name" : "name2",
"instances" : [
...
]
}
]
where the number of instances for each element can be quite big.
I sometimes want to get only a sample of the data, that is, get every 3rd instance, or every 10th instance... you get the picture.
I can achieve this goal by getting all instances and filtering them in my server code, but I was wondering if there's a way to do it by using some aggregation query.
Any ideas?
Updated
Assuming the data structure was flat as @SylvainLeroux suggested below, that is:
[
{"name": "name1", "value": 1, "date": ISODate("2015-03-04T00:00:00.000Z")},
{"name": "name2", "value": 5, "date": ISODate("2015-04-04T00:00:00.000Z")},
{"name": "name1", "value": 2, "date": ISODate("2015-04-01T00:00:00.000Z")},
{"name": "name1", "value": 2.5, "date": ISODate("2015-03-05T00:00:00.000Z")},
...
]
will the task of getting every Nth item (of specific name) be easier?
Your question asks to "get every nth instance", which is a clear enough request.
Query operations like .find() can really only return the document "as is" with the exception of general field "selection" in projection and operators such as the positional $ match operator or $elemMatch that allow a singular matched array element.
Of course there is $slice, but that just allows a "range selection" on the array, so again does not apply.
The "only" things that can modify a result on the server are .aggregate() and .mapReduce(). The former does not "play very well" with "slicing" arrays in any way, at least not by "n" elements. However since the "function()" arguments of mapReduce are JavaScript based logic, then you have a little more room to play with.
For analytical processes, and for analytical purposes "only", just filter the array contents via mapReduce using .filter():
db.collection.mapReduce(
function() {
var id = this._id;
delete this._id;
// filter the content of "instances" to every 3rd item only
this.instances = this.instances.filter(function(el,idx) {
return ((idx+1) % 3) == 0;
});
emit(id,this);
},
function() {},
{ "out": { "inline": 1 } } // or output to collection as required
)
It's really just a "JavaScript runner" at this point, but if this is just for analysis/testing then there is nothing generally wrong with the concept. Of course the output is not "exactly" how your document is structured, but it's as near a facsimile as mapReduce can get.
The other suggestion I see here requires creating a new collection with all the items "denormalized" and inserting the "index" from the array as part of the unique _id key. That may produce something you can query directly, but for the "every nth item" you would still have to do:
db.resultCollection.find({
"_id.index": { "$in": [2,5,8,11,14] } // and so on ....
})
So you would have to work out and provide the index value of "every nth item" in order to get "every nth item". That doesn't really seem to solve the problem that was asked.
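For completeness, this is roughly what "working out" that index list would look like on the client. The helper name everyNthIndexes is made up, and you would need to know the longest array length up front, which is part of why this approach is awkward.

```javascript
// Hypothetical helper: zero-based indexes of every nth item in an array
// of the given length (the 3rd, 6th, 9th ... item for n = 3)
function everyNthIndexes(n, length) {
  if (!Number.isInteger(n) || n < 1) {
    throw new Error("n must be a positive integer");
  }
  const indexes = [];
  for (let i = n - 1; i < length; i += n) {
    indexes.push(i);
  }
  return indexes;
}

console.log(everyNthIndexes(3, 15)); // [ 2, 5, 8, 11, 14 ]
```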
If the output form seemed more desirable for your "testing" purposes, then a better subsequent query on those results would be using the aggregation pipeline, with $redact:
db.newCollection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$mod": [ { "$add": [ "$_id.index", 1] }, 3 ] },
0 ]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
That at least uses a "logical condition" much the same as what was applied with .filter() before to just select the "nth index" items without listing all possible index values as a query argument.
No $unwind is needed here. You can use $push with $arrayElemAt to project the array value at requested index inside $group aggregation.
Something like
db.colname.aggregate(
[
{"$group":{
"_id":null,
"valuesatNthindex":{"$push":{"$arrayElemAt":["$instances",N]}
}}
},
{"$project":{"valuesatNthindex":1}}
])
You might like this approach using the $lookup aggregation. And probably the most convenient and fastest way without any aggregation trick.
Create a collection Names with the following schema
[
{ "_id": 1, "name": "name1" },
{ "_id": 2, "name": "name2" }
]
and then Instances collection having the parent id as "nameId"
[
{ "nameId": 1, "value" : 1, "date" : ISODate("2015-03-04T00:00:00.000Z") },
{ "nameId": 1, "value" : 2, "date" : ISODate("2015-04-01T00:00:00.000Z") },
{ "nameId": 1, "value" : 3, "date" : ISODate("2015-03-05T00:00:00.000Z") },
{ "nameId": 2, "value" : 7, "date" : ISODate("2015-03-04T00:00:00.000Z") },
{ "nameId": 2, "value" : 8, "date" : ISODate("2015-04-01T00:00:00.000Z") },
{ "nameId": 2, "value" : 4, "date" : ISODate("2015-03-05T00:00:00.000Z") }
]
Now, with the $lookup aggregation syntax from MongoDB 3.6, you can use $sample inside the $lookup pipeline to get N randomly chosen elements (a random sample rather than strictly every Nth element).
db.Names.aggregate([
{ "$lookup": {
"from": Instances.collection.name,
"let": { "nameId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$nameId", "$$nameId"] }}},
{ "$sample": { "size": N }}
],
"as": "instances"
}}
])
You can test it here
Unfortunately, with the aggregation framework it's not possible, as this would require an option with $unwind to emit the array index/position, which aggregation currently can't handle. There is an open JIRA ticket for this here: SERVER-4588.
However, a workaround would be to use MapReduce but this comes at a huge performance cost since the actual calculations of getting the array index are performed using the embedded JavaScript engine (which is slow), and there still is a single global JavaScript lock, which only allows a single JavaScript thread to run at a single time.
With mapReduce, you could try something like this:
Mapping function:
var map = function(){
for(var i=0; i < this.instances.length; i++){
emit(
{ "_id": this._id, "index": i },
{ "index": i, "value": this.instances[i] }
);
}
};
Reduce function:
var reduce = function(){}
You can then run the following mapReduce function on your collection:
db.collection.mapReduce( map, reduce, { out : "resultCollection" } );
And then you can query the result collection to get a list/array of every Nth item of the instances array by using the map() cursor method:
var thirdInstances = db.resultCollection.find({"_id.index": N})
.map(function(doc){return doc.value.value})
You can use below aggregation:
db.col.aggregate([
{
$project: {
instances: {
$map: {
input: { $range: [ 0, { $size: "$instances" }, N ] },
as: "index",
in: { $arrayElemAt: [ "$instances", "$$index" ] }
}
}
}
}
])
$range generates a list of indexes. The third parameter is a non-zero step. For N = 2 it will be [0,2,4,6...], for N = 3 it will be [0,3,6,9...] and so on. Then you can use $map to get the corresponding items from the instances array.
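A plain JavaScript imitation of the $range / $map combination may help to see which elements are picked. This is a sketch for positive steps only; the function names and the sample array are invented. Note that starting at 0 keeps the 1st, 4th, 7th ... elements, so to keep the 3rd, 6th, 9th ... instead, the first $range argument would presumably be N - 1.

```javascript
// Imitates $range for a positive step: start inclusive, end exclusive
function range(start, end, step) {
  const out = [];
  for (let i = start; i < end; i += step) {
    out.push(i);
  }
  return out;
}

// Imitates $map + $arrayElemAt over the generated indexes
function everyNth(instances, n, start) {
  return range(start, instances.length, n).map(function (index) {
    return instances[index];
  });
}

const sampleInstances = ["a", "b", "c", "d", "e", "f", "g"];
console.log(everyNth(sampleInstances, 3, 0)); // [ 'a', 'd', 'g' ]
console.log(everyNth(sampleInstances, 3, 2)); // [ 'c', 'f' ]
```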
Or with just a find block:
db.Collection.find({}).then(function(data) {
var ret = [];
for (var i = 0, len = data.length; i < len; i++) {
if (i % 3 === 0 ) {
ret.push(data[i]);
}
}
return ret;
});
This returns a promise; the function passed to then() receives the documents and keeps every third one.

Sum of Substrings in mongodb

We have field(s) in mongodb which hold numbers in string form, with values such as "$123,00,89.00" or "1234$" etc.
Is it possible to customize $sum accumulators in mongodb so that certain processing can be done on each field value while the sum is performed, such as substring or regex processing?
The .mapReduce() method is what you need here. You cannot "cast" values in the aggregation framework from one "type" to another ( with the exception of "to string" or from Date to numeric ).
The JavaScript processing means that you can convert a string into a value for "summing". Something like this (with a bit more work on a "safe" regex for the required "currency" values):
db.collection.mapReduce(
function() {
emit(null, this.amount.replace(/\$|,|\./g,"") / 100 );
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Or with .group(), which also uses JavaScript processing but is a bit more restrictive in its requirements:
db.collection.group({
"key": null,
"reduce": function( curr,result ) {
result.total += curr.amount.replace(/\$|,|\./g,"") /100;
},
"initial": { "total": 0 }
});
So JavaScript processing is your only option, as these sorts of operations are not supported in the aggregation framework.
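On the "safe" regex point: the expression used above strips the dot and divides by 100, which suits values that always carry two decimals but would turn "1234$" into 12.34. A slightly more defensive sketch is below. It assumes the dot is the decimal separator and commas are only grouping characters, and it does not handle negative amounts. parseAmount is a hypothetical name.

```javascript
// Hypothetical helper: "$123,00,89.00" -> 1230089, "1234$" -> 1234
function parseAmount(raw) {
  // Drop everything except digits and the decimal point
  const cleaned = String(raw).replace(/[^0-9.]/g, "");
  const amount = parseFloat(cleaned);
  if (Number.isNaN(amount)) {
    throw new Error("Not a numeric amount: " + raw);
  }
  return amount;
}

console.log(parseAmount("$123,00,89.00")); // 1230089
console.log(parseAmount("1234$"));         // 1234
```

The body of the function is plain JavaScript, so it could be inlined into the map or reduce functions in place of the replace(...) / 100 expression.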
A number can be a string:
db.junk.aggregate([{ "$project": { "a": { "$substr": [ 1,0,1 ] } } }])
{ "_id" : ObjectId("55a458c567446a4351c804e5"), "a" : "1" }
And a Date can become a number:
db.junk.aggregate([{ "$project": { "a": { "$subtract": [ new Date(), new Date(0) ] } } }])
{ "_id" : ObjectId("55a458c567446a4351c804e5"), "a" : NumberLong("1436835669446") }
But there are no other operators to "cast" a "string" to "numeric", or even anything to do a regex replace as shown above.
If you want to use .aggregate() then you need to fix your data into a format that will support it, thus "numeric":
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({ "amount": /\$|,|\./ }).forEach(function(doc) {
doc.amount = doc.amount.replace(/\$|,|\./g,"") /100;
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "amount": doc.amount }
});
count++;
// execute once in 1000 operations
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
});
// clean up queued operations
if ( count % 1000 != 0 )
bulk.execute();
Then you can use .aggregate() on your "numeric" data:
db.collection.aggregate([
{ "$group": { "_id": null, "total": { "$sum": "$amount" } } }
])

Mongodb: find documents with array field that contains more than one SAME specified value

There are three documents in collection test:
// document 1
{
"id": 1,
"score": [3,2,5,4,5]
}
// document 2
{
"id": 2,
"score": [5,5]
}
// document 3
{
"id": 3,
"score": [5,3,3]
}
I want to fetch documents whose score field contains [5,5].
query:
db.test.find( {"score": {"$all": [5,5]}} )
will return document 1, 2 and 3, but I only want to fetch document 1 and 2.
How can I do this?
After reading your problem, I personally think MongoDB does not yet support this kind of query. If anyone knows how to do this with a mongo query, they will definitely post an answer here.
But I think this is possible using the mongo forEach method, so the code below will match your criteria:
db.collectionName.find().forEach(function(myDoc) {
var scoreCounts = {};
var arr = myDoc.score;
for (var i = 0; i < arr.length; i++) {
var num = arr[i];
scoreCounts[num] = scoreCounts[num] ? scoreCounts[num] + 1 : 1;
}
if (scoreCounts[5] >= 2) { //scoreCounts[5] this find occurrence of 5
printjsononeline(myDoc);
}
});
Changed in version 2.6.
The $all is equivalent to an $and operation of the specified values; i.e. the following statement:
{ tags: { $all: [ "ssl" , "security" ] } }
is equivalent to:
{ $and: [ { tags: "ssl" }, { tags: "security" } ] }
I think you need to pass in a nested array -
So try
db.test.find( {"score": {"$all": [[5,5]]}} )
Source
Changed in version 2.6.
When passed an array of a nested array (e.g. [ [ "A" ] ] ), $all can now match documents where the field contains the nested array as an element (e.g. field: [ [ "A" ], ... ]), or the field equals the nested array (e.g. field: [ "A" ]).
http://docs.mongodb.org/manual/reference/operator/query/all/
You can do it with an aggregation. The first step can use an index on { "score" : 1 } but the rest is hard work.
db.test.aggregate([
{ "$match" : { "score" : 5 } },
{ "$unwind" : "$score" },
{ "$match" : { "score" : 5 } },
{ "$group" : { "_id" : "$_id", "sz" : { "$sum" : 1 } } }, // use $first here to include other fields in the results
{ "$match" : { "sz" : { "$gte" : 2 } } }
])
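On newer servers the counting can probably be pushed into a single filter. The sketch below assumes MongoDB 3.6 or later (for $expr) and that every document has "score" as an array, since $size fails on a missing field. Both function names are made up. The local countOccurrences check only mirrors the same logic so the idea can be tried without a database.

```javascript
// Hypothetical builder for a find() filter: field contains value at least minCount times
function buildRepeatedValueFilter(field, value, minCount) {
  return {
    $expr: {
      $gte: [
        {
          $size: {
            $filter: { input: "$" + field, as: "s", cond: { $eq: ["$$s", value] } }
          }
        },
        minCount
      ]
    }
  };
}

// Same logic in plain JavaScript, for checking against sample data
function countOccurrences(arr, value) {
  return arr.filter(function (x) { return x === value; }).length;
}

const testDocs = [
  { id: 1, score: [3, 2, 5, 4, 5] },
  { id: 2, score: [5, 5] },
  { id: 3, score: [5, 3, 3] }
];

const matchingIds = testDocs
  .filter(function (doc) { return countOccurrences(doc.score, 5) >= 2; })
  .map(function (doc) { return doc.id; });

console.log(matchingIds); // [ 1, 2 ]
console.log(JSON.stringify(buildRepeatedValueFilter("score", 5, 2)));
```

In the shell the filter would be used as db.test.find(buildRepeatedValueFilter("score", 5, 2)).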