We have a problem wherein certain strings appear as 123, 00123, 000123. We need to group by this field and we would like all the above to be considered as one group. I know the length of these values cannot be greater than 6.
The approach I was considering is to left-pad all of these fields in the projection with 0s to a length of 6. One way would be to concat six 0s first and then do a substr, but there is no length operator available for me to calculate the indexes for the substr method (JIRA).
Is there something more direct? Couldn't find anything here : https://docs.mongodb.org/manual/meta/aggregation-quick-reference/#aggregation-expressions or has anyone solved this some way?
I would convert them to int. E.g.:
For collection:
db.leftpad.insert([
{key:"123"},
{key:"0123"},
{key:"234"},
{key:"000123"}
])
counting:
db.leftpad.mapReduce(function(){
emit(this.key * 1, 1);
}, function(key, count) {
return Array.sum(count);
}, {out: { inline: 1 }}
).results
returns an array:
[
{_id : 123, value : 3},
{_id : 234, value : 1}
]
If you can, it may be worth doing the conversion once and storing the result:
db.leftpad.find({key:{$exists:true}, intKey:{$exists:false}}).forEach(function(d){
db.leftpad.update({_id:d._id}, {$set:{intKey: d.key * 1}});
})
And then group by intKey.
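If the zero-padded string is wanted as the group key, as the question originally asked, the padding can also be written without knowing the length up front. What follows is only a minimal sketch, assuming MongoDB 3.4 or later (for $substrCP and $strLenCP) and the leftpad collection from above. The padKey function is a hypothetical plain-JavaScript model of the same expression, included so the logic can be checked outside the shell.

```javascript
// Aggregation sketch: prepend six zeros, then keep the last six characters.
// Assumes MongoDB 3.4+ and string keys of at most 6 characters.
const pipeline = [
  { $group: {
      _id: { $substrCP: [ { $concat: ["000000", "$key"] }, { $strLenCP: "$key" }, 6 ] },
      count: { $sum: 1 }
  } }
];
// In the shell: db.leftpad.aggregate(pipeline)

// Hypothetical local model of the _id expression above.
function padKey(key) {
  const padded = "000000" + key;
  // Same start index and length as the $substrCP call.
  return padded.slice(key.length, key.length + 6);
}

console.log(["123", "0123", "000123", "234"].map(padKey));
```

Starting the substring at the key's own length is what removes the need for a separate index calculation: the concatenated string is always 6 characters longer than the key.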
I think I have a pretty complex one here - not sure if I can do this or not.
I have data that has an address and a data field. The data field is a hex value. I would like to run an aggregation that groups the data by address and then by the length of the hex data. All of the data will come in as 16 characters long, but the length of that data should be calculated in bytes.
I think I have to take the data, strip the trailing 00's (using regex 00+$), and divide that number by 2 to get the length. After that, I would have to then group by address and final byte length.
An example dataset would be:
{addr:829, data:'4100004822000000'}
{addr:829, data:'4100004813000000'}
{addr:829, data:'4100004804000000'}
{addr:506, data:'0000108000000005'}
{addr:506, data:'0000108000000032'}
{addr:229, data:'0065005500000000'}
And my desired output would be:
{addr:829, length:5}
{addr:506, length:8}
{addr:229, length:4}
Is this even possible in an aggregation query without having to use external code?
This is not too complicated if your "data" is in fact strings as you show in your sample data. Assuming data exists and is set to something (you can add error checking as needed) you can get the result you want like this:
db.coll.aggregate([
{$addFields:{lastNonZero:{$add:[2,{$reduce:{
initialValue:-2,
input:{$range:[0,{$strLenCP:"$data"},2]},
in:{$cond:{
if: {$eq:["00",{$substr:["$data","$$this",2]}]},
then: "$$value",
else: "$$this"
}}
}}]}}},
{$group:{_id:{
addr:"$addr",
length:{$divide:["$lastNonZero",2]}
}}}
])
I used two stages but of course they could be combined into a single $group if you wish. Here in $reduce I step through data 2 characters at a time, checking if they are equal to "00". Every time they are not I update the value to where I am in the sequence. Since that returns the position of the last non-"00" characters, we add 2 to it to find where the string of zeros that goes to the end starts and then later in $group we divide that by 2 to get the true length.
On your sample data, this returns:
{ "_id" : { "addr" : 229, "length" : 4 } }
{ "_id" : { "addr" : 506, "length" : 8 } }
{ "_id" : { "addr" : 829, "length" : 5 } }
You can add a $project stage to transform the field names into ones you want returned.
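To check the $reduce logic against the sample data without a database, it can be mirrored in plain JavaScript. This is a sketch only; byteLength is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical local model of the $reduce expression in the pipeline above.
function byteLength(data) {
  let lastNonZero = -2;                          // initialValue in the pipeline
  for (let i = 0; i < data.length; i += 2) {     // $range: [0, length, 2]
    if (data.slice(i, i + 2) !== "00") {
      lastNonZero = i;                           // remember the last non-"00" pair
    }
  }
  return (lastNonZero + 2) / 2;                  // $add 2, then $divide by 2
}

console.log(byteLength("4100004822000000")); // 5
console.log(byteLength("0000108000000005")); // 8
console.log(byteLength("0065005500000000")); // 4
```

Starting from -2 means a value made up entirely of "00" pairs yields a length of 0.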
Let's assume I have a collection with 100000 entries.
What is the approach to fetch only 50 documents at a time rather than all 100000? Loading the whole dataset on every call makes no sense.
My Dataset is kind of this type:
{
"_id" : ObjectId("5a2e282417d0b91708fa83b5"),
"post" : "Hello world",
"createdate" : ISODate("2017-12-11T06:39:32.035Z"),
"__v" : 0
}
Which options do I have to add to my query?
//What filter I have to add.?
db.collection.find({}).sort({'createdate': 1}).exec(function(err, data){
console.log(data);
});
db.collection.find({}).sort({'createdate': 1}).skip(0).limit(50).exec(function(err, data){
console.log(data);
});
There are two more ways to paginate.
One is the mongoose-paginate npm module: https://www.npmjs.com/package/mongoose-paginate
The second is the aggregation pipeline with the $skip and $limit stages,
e.g.:
//records 1 to 50
db.col.aggregate([{$match:{}},{$sort:{_id:-1}},{$skip:0},{$limit:50}]);
//records 51 to 100
db.col.aggregate([{$match:{}},{$sort:{_id:-1}},{$skip:50},{$limit:50}]);
First, we have to sort the data, then skip, and only then limit (a $limit placed before a larger $skip leaves nothing to return).
db.collection.aggregate([{"$sort": {f2: -1}}, { $skip : 5 }, {$limit : 2}]);
using limit with find,
db.collection.find().limit(3)
Using limit with aggregate,
db.collection.aggregate({$limit : 2})
Usually aggregate is used when the order of the stages matters, for example when limit and sort are needed together:
// sorting is applied only to the documents that come out of the $limit stage.
db.collection.aggregate([{$limit : 50},{"$sort": {_id: -1}}]);
// cursor methods - sorting is applied to the entire result set even though .sort() is chained last.
db.collection.find().limit(50).sort({_id:-1});
The same with skip added to get an offset (note that $skip has to come before $limit in the pipeline):
db.collection.aggregate([{ $skip : 50 },{$limit : 50},{"$sort": {_id: -1}}]);
db.collection.find().skip(50).limit(50).sort({_id:-1});
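Since the stage order is easy to get wrong, the page arithmetic can be kept in one place. The helper below is a hypothetical sketch (pageStages is not a MongoDB or Mongoose function); it only builds the stage array and assumes 1-based page numbers.

```javascript
// Hypothetical helper: build $sort/$skip/$limit stages for a 1-based page number.
function pageStages(page, pageSize, sortSpec) {
  return [
    { $sort: sortSpec },                  // sort the whole set first
    { $skip: (page - 1) * pageSize },     // then drop the earlier pages
    { $limit: pageSize }                  // then keep one page
  ];
}

// Page 2 with 50 documents per page, oldest first.
const stages = pageStages(2, 50, { createdate: 1 });
console.log(JSON.stringify(stages));
// In the shell: db.collection.aggregate(stages)
```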
I have a collection foo:
{ "_id" : ObjectId("5837199bcabfd020514c0bae"), "x" : 1 }
{ "_id" : ObjectId("583719a1cabfd020514c0baf"), "x" : 3 }
{ "_id" : ObjectId("583719a6cabfd020514c0bb0") }
I use this query:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:1}}})
Then I get a result:
{ "_id" : 1, "avg" : 2, "sum" : 3 }
What does {$sum:1} mean in this query?
From the official docs:
When used in the $group stage, $sum has the following syntax and returns the collective sum of all the numeric values that result from applying a specified expression to each document in a group of documents that share the same group by key:
{ $sum: <expression> }
Since in your example the expression is 1, it will aggregate a value of one for each document in the group, thus yielding the total number of documents per group.
Basically it adds up the value of the expression for each document. In this case there are 3 documents, so the result is 1+1+1 = 3. For more details please check the MongoDB documentation: https://docs.mongodb.com/v3.2/reference/operator/aggregation/sum/
For example, if the query was:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:"$x"}}})
then the sum value would be 1+3=4
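The difference between the two accumulators can be mirrored in plain JavaScript. This is a hypothetical local model (groupAll is not a MongoDB function), written on the assumption that $avg and $sum ignore missing or non-numeric values, as the result in the question suggests.

```javascript
// Hypothetical model of:
// {$group: {_id: 1, avg: {$avg: "$x"}, count: {$sum: 1}, total: {$sum: "$x"}}}
function groupAll(docs) {
  let count = 0;      // {$sum: 1}: adds 1 for every document in the group
  let total = 0;      // {$sum: "$x"}: adds the value of x where it is numeric
  let numeric = 0;    // how many documents contributed to the average
  for (const doc of docs) {
    count += 1;
    if (typeof doc.x === "number") {
      total += doc.x;
      numeric += 1;
    }
  }
  return { _id: 1, avg: numeric ? total / numeric : null, count, total };
}

console.log(groupAll([{ x: 1 }, { x: 3 }, {}]));
```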
I'm not sure which MongoDB version was current 6 years ago or whether it had all these goodies, but {$sum:1} reads to me as a workaround for {$count:{}}.
The $count accumulator is available in $group from MongoDB 5.0, and the documentation describes it as functionally equivalent to {$sum:1}, so I would not count on a performance difference. The real gain is readability: think of why you're even asking, because {$sum:1} is a less-than-obvious idiom.
My option would be:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$count:{}}}})
I just tried this on Mongo 5.0.14 and it runs fine.
The good old "Just because you can, doesn't mean you should." is still a thing, no?
I've translated the following SQL statement to map reduce:
select
p_brand, p_type, p_size,
count(ps_suppkey) as supplier_cnt
from
partsupp, part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#45'
and p_type not like 'MEDIUM POLISHED %'
and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand, p_type, p_size
order by
supplier_cnt desc, p_brand, p_type, p_size;
Map reduce function:
db.runCommand({
mapreduce: "partsupp",
query: {
"ps_partkey.p_size": { $in: [49, 14, 23, 45, 19, 3, 36, 9] },
"ps_partkey.p_brand": { $ne: "Brand#45" }
},
map: function() {
var pattern1 = /^MEDIUM POLISHED .*/;
var pattern2 = /.*Customer.*Complaints.*/;
var suppkey = this.ps_suppkey.s_suppkey;
if( this.ps_suppkey.s_comment.match(pattern1) == null ){
if(this.ps_suppkey.s_comment.match(pattern2) != null){
emit({p_brand: this.ps_partkey.p_brand, p_type: this.ps_partkey.p_type, p_size: this.ps_partkey.p_size}, suppkey);
}
}
},
reduce: function(key, values) {
return values.length;
},
out: 'query016'
});
The output (it seems to me) shows that reduce was never called:
{
"result" : "query016",
"timeMillis" : 46862,
"counts" : {
"input" : 122272,
"emit" : 54,
"reduce" : 0,
"output" : 54
},
"ok" : 1
}
What's wrong?
The map function outputs key and value pairs.
The reduce function's purpose is to combine multiple values for the same key. This means that if a particular key is only emitted once, it has only one value and there is nothing to reduce.
This is one of the reasons that the value in your emit statement must have exactly the same format as the value the reduce function returns.
Map outputs:
emit(key1, valueX);
emit(key1, valueY);
emit(key2, valueZ);
Reduce combines valueX and valueY to return new valueXY for key1 and the final result will be:
key1, valueXY
key2, valueZ
Notice that reduce was never called on key2. Reduce function may be called zero, once or multiple times for each key value, so you have to be careful to construct both the map and reduce functions to allow for that possibility.
Your map function doesn't emit a correct value - you want to be counting so you have to output a count. Your reduce function must loop over the already accumulated counts and add them up and return the combined count. You may want to look at some examples provided in the MongoDB documentation.
You can probably do this much simpler using the Aggregation Framework - I don't see the need for MapReduce here unless you are expecting to output a huge amount of results.
I suspect that you called emit(value,key) instead of emit(key,value).
As others have already stated, the mapped value and the reduced value must have the same structure. If you just want to make a count, map a value=1 and in the reduce function just return Array.sum(values).
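The behaviour described in these answers can be reproduced with a small in-memory model. This is a hypothetical sketch, not MongoDB's implementation: runMapReduce and the brand field are made up for illustration, and emit is passed to map as a parameter here, whereas in MongoDB it is a global.

```javascript
// Hypothetical in-memory model of mapReduce, to show when reduce is invoked.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!emitted.has(k)) emitted.set(k, { key, values: [] });
    emitted.get(k).values.push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  let reduceCalls = 0;
  const results = [];
  for (const { key, values } of emitted.values()) {
    let value = values[0];
    if (values.length > 1) {          // a single value is passed through as-is
      reduceCalls += 1;
      value = reduce(key, values);
    }
    results.push({ _id: key, value });
  }
  return { results, reduceCalls };
}

// Emit a count of 1 and add the counts up, so map and reduce values share one shape.
const map = function (emit) { emit(this.brand, 1); };
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

const out = runMapReduce([{ brand: "A" }, { brand: "A" }, { brand: "B" }], map, reduce);
console.log(out);
```

In this model reduce runs once, for key "A"; key "B" keeps its single emitted value. If reduce returned values.length instead, re-reducing partial results would give wrong counts, which is why the counts are summed.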
The database is near 5GB. I have documents like:
{
_id: ..
user: "a"
hobbies: [{
_id: ..
name: football
},
{
_id: ..
name: beer
}
...
]
}
I want to return users who have more than 0 "hobbies".
I've tried
db.collection.find({"hobbies" : { > : 0}}).limit(10)
and it takes all the RAM and returns no result.
How do I run this query?
And how do I return only: id, name, count?
How do I do it with the official C# driver?
TIA
P.S.
Nearby I found this advice:
"Add a new field to handle category size. It's a usual practice in the mongo world."
Is this true?
In this specific case, you can use list indexing to solve your problem:
db.collection.find({"hobbies.0" : {$exists : true}}).limit(10)
This just makes sure a 0th element exists. You can do the same to make sure the list is shorter than n, or between x and y in length, by checking the existence of elements at the ends of the range.
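The range check mentioned above can be built mechanically. The helper below is a hypothetical sketch (lengthRangeFilter is not a MongoDB function); it only constructs the filter document from an inclusive minimum and maximum length.

```javascript
// Hypothetical helper: filter for arrays with minLen <= length <= maxLen.
function lengthRangeFilter(field, minLen, maxLen) {
  const filter = {};
  if (minLen > 0) {
    // An element at index minLen - 1 exists only if the array has at least minLen items.
    filter[`${field}.${minLen - 1}`] = { $exists: true };
  }
  // No element at index maxLen means the array has at most maxLen items.
  filter[`${field}.${maxLen}`] = { $exists: false };
  return filter;
}

console.log(lengthRangeFilter("hobbies", 1, 3));
// In the shell: db.collection.find(lengthRangeFilter("hobbies", 1, 3)).limit(10)
```

With minLen set to 0 only the upper bound is checked, so documents without the field at all would also match.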
Have you tried using hobbies.length? I haven't tested this, but I believe this is the way to query on the length of an array in MongoDB:
db.collection.find({$where: '(this.hobbies.length > 0)'})
You can (sort of) check for a range of array lengths with the $size operator using a logical $not:
db.collection.find({array: {$not: {$size: 0}}})
That's somewhat true.
According to the manual
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
$size
The $size operator matches any array with the specified number of
elements. The following example would match the object {a:["foo"]},
since that array has just one element:
db.things.find( { a : { $size: 1 } } );
You cannot use $size to find a range of sizes (for example: arrays
with more than 1 element). If you need to query for a range, create an
extra size field that you increment when you add elements
So you can check for array size 0, but not for things like 'larger than 0'
Earlier questions explain how to handle the array count issue. Although in your case if ZERO really is the only value you want to test for, you could set the array to null when it's empty and set the option to not serialize it, then you can test for the existence of that field. Remember to test for null and to create the array when you want to add a hobby to a user.
For #2, provided you added the count field it's easy to select the fields you want back from the database and include the count field.
If you only need to tell users with zero hobbies apart from the rest, and the hobbies key is not set for someone with zero hobbies, use the $exists operator.
Add an index on "hobbies" to improve performance:
db.collection.find( { hobbies : { $exists : true } } );
However, if a person with zero hobbies has an empty array, and a person with 1 hobby has an array with 1 element, then use this generic solution:
Maintain a field called "hcount" (hobby count), and always set it equal to the size of the hobbies array in any update.
Create an index on the field "hcount".
Then, you can do a query like :
db.collection.find( { hcount : 0 } ) // people with 0 hobbies
db.collection.find( { hcount : 5 } ) // people with 5 hobbies
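One way to keep such a counter in step with the array is to change both in the same update. The following is a minimal sketch using the $push and $inc update operators; addHobbyUpdate and userId are hypothetical names, and hcount is the field suggested above.

```javascript
// Hypothetical helper: one update document that appends a hobby and bumps the counter.
function addHobbyUpdate(hobby) {
  return {
    $push: { hobbies: hobby },   // append to the array
    $inc: { hcount: 1 }          // keep the stored count in step
  };
}

// Filter and projection for "users with more than 0 hobbies", returning user and count.
const filter = { hcount: { $gt: 0 } };
const projection = { user: 1, hcount: 1 };

console.log(addHobbyUpdate({ name: "chess" }));
// In the shell:
//   db.collection.update({ _id: userId }, addHobbyUpdate({ name: "chess" }))
//   db.collection.find(filter, projection).limit(10)
```

Because both operators are applied in a single update to one document, the array and the counter should not drift apart as long as every write goes through an update of this shape.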
3 - From @JohnP's answer, "$size" is also a good operator for this purpose.
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size