I would like to perform a complex merge:
e.g.
[
  {
    "one.two.three": 4,
    "number.two": "B"
  },
  {
    "one.two.three": 7,
    "number.two": "A"
  },
  {
    "one.two.three": 10,
    "number.two": "B"
  }
]
where the result is:
{
  "one.two.three": 10,
  "number.two": "A"
}
because those are the maximum values. I could have any number N of arbitrary key/value pairs, so I can't just sort on a specific field.
Query
Having field names that contain $ or . is problematic and hard to work with; I think you need to change your field names to not contain those characters (see this).
Group and find the max and the min; I guess for the letters you wanted the min, because you wanted the letter A.
Playmongo
aggregate(
  [{"$group":
     {"_id": null,
      "max-one": {"$max": "$one"},
      "min-number": {"$min": "$number"}}}])
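Since the keys are arbitrary, the merge can also be done in application code after fetching the documents. A minimal sketch in plain JavaScript, assuming numeric values should keep their maximum and string values their alphabetical minimum (which is what the expected result implies):

```javascript
// Merge an array of flat objects key-by-key: numeric values keep the
// maximum, string values keep the alphabetical minimum ("A" beats "B").
// This runs outside the database, so dots in key names are no problem.
function mergeExtremes(docs) {
  const out = {};
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!(key in out)) {
        out[key] = value;
      } else if (typeof value === "number") {
        out[key] = Math.max(out[key], value);
      } else {
        out[key] = out[key] <= value ? out[key] : value;
      }
    }
  }
  return out;
}

const merged = mergeExtremes([
  { "one.two.three": 4, "number.two": "B" },
  { "one.two.three": 7, "number.two": "A" },
  { "one.two.three": 10, "number.two": "B" },
]);
// merged is { "one.two.three": 10, "number.two": "A" }
```

This sidesteps the field-name restriction entirely, at the cost of pulling the documents to the client.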
Related
I have a collection in MongoDB that looks something like:
{
"foo": "something",
"tag": 0,
},
{
"foo": "bar",
"tag": 1,
},
{
"foo": "hello",
"tag": 0,
},
{
"foo": "world",
"tag": 3,
}
If we consider this example, there are entries in the collection with tag of value 0, 1 or 3 and these aren't unique values, tag value can be repeated. My goal is to find that 2 is missing. Is there a way to do this with a query?
Query1
In the upcoming MongoDB 5.2 we will have sort on arrays, which could make this query easier without a set operation, but this will be OK also:
group and find the min, the max, and the set of all values
take the range from min to max
the missing values are setDifference(range, tags)
and from them you take only the smallest => 2
Test code here
aggregate(
  [{"$group":
     {"_id": null,
      "min": {"$min": "$tag"},
      "max": {"$max": "$tag"},
      "tags": {"$addToSet": "$tag"}}},
   {"$project":
     {"_id": 0,
      "missing":
        {"$min":
          {"$setDifference":
            [{"$range": ["$min", "$max"]}, "$tags"]}}}}])
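The same logic is easy to check in plain JavaScript: build the range from the minimum to the maximum tag, subtract the set of seen tags, and take the smallest leftover. Note the range starts at the minimum tag, so it also works when the smallest tag is not 0.

```javascript
// Find the smallest missing value the same way the pipeline does:
// range(min, max) minus the set of observed tags, then take the minimum.
function smallestMissing(tags) {
  const seen = new Set(tags);
  const min = Math.min(...tags);
  const max = Math.max(...tags);
  const missing = [];
  for (let v = min; v < max; v++) {
    if (!seen.has(v)) missing.push(v); // v is in the range but never observed
  }
  return missing.length ? Math.min(...missing) : null;
}
// smallestMissing([0, 1, 0, 3]) === 2
```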
Query2
In MongoDB 5 (the current version) we can also use $setWindowFields:
sort by tag, add the dense rank (same values = same rank) and the min
then compute the difference tag - min
then keep only the documents where this difference < rank
find the max of those (the largest tag that is still in sequence)
and add 1 to get the missing value
*Test it before using it to be sure; I tested it 3-4 times and it seemed OK.
For a big collection with many different tags, I think this is better (the $addToSet above can cause memory problems).
Test code here
aggregate(
  [{"$setWindowFields":
     {"sortBy": {"tag": 1},
      "output": {"rank": {"$denseRank": {}}, "min": {"$first": "$tag"}}}},
   {"$set": {"difference": {"$subtract": ["$tag", "$min"]}}},
   {"$match": {"$expr": {"$lt": ["$difference", "$rank"]}}},
   {"$group": {"_id": null, "last": {"$max": "$tag"}}},
   {"$project": {"_id": 0, "missing": {"$add": ["$last", 1]}}}])
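To see why the window logic works, the same steps can be traced in plain JavaScript (a sketch of the idea, not a replacement for the pipeline):

```javascript
// Trace of the window-function approach: sort, dense-rank, keep tags
// whose distance from the minimum is smaller than their rank (i.e. the
// contiguous prefix), then the missing value is one past the largest kept tag.
function missingByRank(tags) {
  const sorted = [...tags].sort((a, b) => a - b);
  const min = sorted[0];
  let rank = 0;
  let prev = null;
  let last = null;
  for (const tag of sorted) {
    if (tag !== prev) rank += 1; // dense rank: equal values share a rank
    prev = tag;
    if (tag - min < rank) last = tag; // still no gap before this tag
  }
  return last + 1;
}
// missingByRank([0, 1, 0, 3]) === 2
```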
Given a set of prices from a provider I want to work out the closest matching item set that will match a total value.
i.e. the total value to match is $5, and I have a price list from MongoDB as follows:
val items = List(0.05, 0.06, 1.0, 2.0)
How do I work out the set of items that will give $5? The items can be duplicated in order to match the price.
My idea is to use a Stream to evaluate the price list, create permutations, and match with an accuracy of $0.01, but I do not know how to go about doing that.
The other idea is to write an aggregation from MongoDB to provide the permutations and then sort by total price and pick the closest one
UPDATE:
So I am now sampling products from the database using the following:
db.getCollection("shop-items").aggregate([
{
$bucketAuto: {
groupBy: "$prices.avg",
buckets: 50,
output: {
"items" : {
$push: "$$ROOT"
},
"count": { $sum: 1 },
}
}
},
{$addFields: {"startField": {$floor :{$multiply: [ { $rand: {} }, "$count"]}}}},
{$project : { _id: 0, items: {$slice: ["$items", "$startField", 20]}}},
{$unwind: "$items"}
])
This results in about 650 items which will still result in 650! permutations which I would need to calculate
This is an algorithm question. Using a DB to generate permutations doesn't sound efficient. What you are looking for is a Knapsack solution with memoization. Basically, you need to solve ar(n) = minimum number items to get total sum n (with buffer of $0.01, if needed). The algorithm sketch will be:
Memoized array ar[0..500], where ar[y] = minimum number of items to reach $(y/100), working in cents.
All elements are initialized to infinity (a very large value), except ar[0] = 0.
val items = [5, 6, 100, 200]
Then relax each reachable total in increasing order:
for (y = 0; y <= 500; y++) {
  if (ar[y] is populated) {
    items.forEach(item => ar[y + item] = min(ar[y + item], ar[y] + 1));
  }
}
This results in an upper-bound complexity of O(n * m), where n is the number of possible totals (here 500) and m is the number of distinct item prices, probably a constant.
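The sketch above translates into a small bottom-up implementation. A minimal version in plain JavaScript, working in cents and ignoring the $0.01 buffer (which could be handled by also checking target ± 1):

```javascript
// minItems(items, target): minimum number of items (repetition allowed)
// whose prices sum exactly to target; Infinity if no combination matches.
function minItems(items, target) {
  const minCount = new Array(target + 1).fill(Infinity);
  minCount[0] = 0; // zero items reach a total of zero
  for (let y = 0; y <= target; y++) {
    if (minCount[y] === Infinity) continue; // total y is unreachable
    for (const item of items) {
      if (y + item <= target) {
        minCount[y + item] = Math.min(minCount[y + item], minCount[y] + 1);
      }
    }
  }
  return minCount[target];
}
// minItems([5, 6, 100, 200], 500) === 3  (200 + 200 + 100)
```

Unlike enumerating permutations, this touches each (total, item) pair once, so the 650-item case stays tractable.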
I think I have a pretty complex one here - not sure if I can do this or not.
I have data that has an address field and a data field. The data field is a hex value. I would like to run an aggregation that groups the data by address and then by the length of the hex data. All of the data comes in as 16 characters long, but the length should be calculated in bytes.
I think I have to take the data, strip the trailing 00s (using the regex 00+$), and divide the remaining character count by 2 to get the length in bytes. After that, I would group by address and final byte length.
An example dataset would be:
{addr:829, data:'4100004822000000'}
{addr:829, data:'4100004813000000'}
{addr:829, data:'4100004804000000'}
{addr:506, data:'0000108000000005'}
{addr:506, data:'0000108000000032'}
{addr:229, data:'0065005500000000'}
And my desired output would be:
{addr:829, length:5}
{addr:506, length:8}
{addr:229, length:4}
Is this even possible in an aggregation query w/o having to use external code to do?
This is not too complicated if your "data" is in fact strings as you show in your sample data. Assuming data exists and is set to something (you can add error checking as needed) you can get the result you want like this:
db.coll.aggregate([
{$addFields:{lastNonZero:{$add:[2,{$reduce:{
initialValue:-2,
input:{$range:[0,{$strLenCP:"$data"},2]},
in:{$cond:{
if: {$eq:["00",{$substr:["$data","$$this",2]}]},
then: "$$value",
else: "$$this"
}}
}}]}}},
{$group:{_id:{
addr:"$addr",
length:{$divide:["$lastNonZero",2]}
}}}
])
I used two stages but of course they could be combined into a single $group if you wish. Here in $reduce I step through data 2 characters at a time, checking if they are equal to "00". Every time they are not I update the value to where I am in the sequence. Since that returns the position of the last non-"00" characters, we add 2 to it to find where the string of zeros that goes to the end starts and then later in $group we divide that by 2 to get the true length.
On your sample data, this returns:
{ "_id" : { "addr" : 229, "length" : 4 } }
{ "_id" : { "addr" : 506, "length" : 8 } }
{ "_id" : { "addr" : 829, "length" : 5 } }
You can add a $project stage to transform the field names into ones you want returned.
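For comparison, the asker's original regex idea (strip trailing 00 pairs, halve the remaining character count) is also easy to verify outside the database. A small JavaScript sketch; note the regex matches whole "00" pairs, so it stays byte-aligned on even-length strings:

```javascript
// Byte length of a hex string after stripping trailing zero bytes.
function byteLength(data) {
  return data.replace(/(00)+$/, "").length / 2;
}

// Group the sample documents by addr; in the sample every document with
// the same addr happens to have the same byte length, so a plain map works.
function groupLengths(docs) {
  const out = {};
  for (const { addr, data } of docs) {
    out[addr] = byteLength(data);
  }
  return out;
}
// byteLength("4100004822000000") === 5
```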
I am storing time-series data across multiple fixed-sized, pre-allocated documents. When one fills up, another is created. Each document has two pre-calculated values:
prevEnd (stores the value in last index of previous document's values)
nextStart (stores the value in the next document's first index)
I want to rely on these pre-aggregated values to find a range of documents when searching by a time range. The following example uses integers in place of timestamps or dates for clarity.
Question: How can I select the two documents below knowing only the time range of interest (111-114)?
{
"prevEnd": 107,
"nextStart": 110,
"time" : [
NumberLong(107),
NumberLong(108),
NumberLong(109)
]
},
//-----------------Select Start
{
"prevEnd": 109,
"nextStart": 113,
"time" : [
NumberLong(110),
NumberLong(111),
NumberLong(112)
]
},
{
"prevEnd": 112,
"nextStart": 116,
"time" : [
NumberLong(113),
NumberLong(114),
NumberLong(115)
]
},
//-----------------Select End
{
"prevEnd": 115,
"nextStart": 99999999999999999999999999999999,
"time" : [
NumberLong(116),
NumberLong(117),
NumberLong(118)
]
}
The following find() call will work:
db.collection.find({"time": {"$elemMatch": {$gt: 111, $lt: 114}}})
because it uses the $elemMatch operator to match the documents which contain a time field with at least one element that matches both the upper and lower limits.
But since your question explicitly refers to prevEnd and nextStart, I suspect you are looking for a solution which filters on those attributes. For example:
db.collection.find({$or: [{"prevEnd": {$gt: 111}}, {"nextStart": {$gt: 111}}], "prevEnd": {$lt: 114}})
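As a sanity check, the same predicate can be evaluated in plain JavaScript against the four sample documents (using 1e32 as a stand-in for the large sentinel nextStart):

```javascript
// The four sample documents, reduced to the two pre-aggregated fields.
const docs = [
  { prevEnd: 107, nextStart: 110 },
  { prevEnd: 109, nextStart: 113 },
  { prevEnd: 112, nextStart: 116 },
  { prevEnd: 115, nextStart: 1e32 },
];

// Same condition as the find(): (prevEnd > 111 OR nextStart > 111) AND prevEnd < 114
const selected = docs.filter(
  (d) => (d.prevEnd > 111 || d.nextStart > 111) && d.prevEnd < 114
);
// selected contains only the second and third documents
```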
We have a problem wherein certain strings appear as 123, 00123, 000123. We need to group by this field and we would like all the above to be considered as one group. I know the length of these values cannot be greater than 6.
The approach I was thinking was to left pad all of these fields in projection with 0s to a length of 6. One way would be to concat 6 0s first and then do a substr - but there is no length available for me to calculate the indexes for the substr method. -JIRA
Is there something more direct? Couldn't find anything here : https://docs.mongodb.org/manual/meta/aggregation-quick-reference/#aggregation-expressions or has anyone solved this some way?
I would convert them to int. E.g.:
For collection:
db.leftpad.insert([
{key:"123"},
{key:"0123"},
{key:"234"},
{key:"000123"}
])
counting:
db.leftpad.mapReduce(function(){
emit(this.key * 1, 1);
}, function(key, count) {
return Array.sum(count);
}, {out: { inline: 1 }}
).results
returns an array:
[
{_id : 123, value : 3},
{_id : 234, value : 1}
]
If you can, it may be worth doing the conversion once:
db.leftpad.find({key:{$exists:true}, intKey:{$exists:false}}).forEach(function(d){
db.leftpad.update({_id:d._id}, {$set:{intKey: d.key * 1}});
})
And then group by intKey.
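The same counting can be sketched in plain JavaScript, which also makes the expected output easy to verify:

```javascript
// Coerce each key to a number so "123", "0123" and "000123" all collapse
// into the same group, then count occurrences per numeric key.
function countByIntKey(docs) {
  const counts = {};
  for (const { key } of docs) {
    const intKey = Number(key); // "000123" -> 123
    counts[intKey] = (counts[intKey] || 0) + 1;
  }
  return counts;
}
// countByIntKey([{key:"123"}, {key:"0123"}, {key:"234"}, {key:"000123"}])
// returns { "123": 3, "234": 1 }
```

On newer MongoDB versions (4.0+), an aggregation with $group on {$toInt: "$key"} would achieve the same grouping without mapReduce.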