Doing an "ORDER BY ... LIMIT ..." style query on a hash in KRL - krl

Say I have a hash with a list of delivery drivers (for the classic flower shop scenario). Each driver has a rating and an event signal URL (ESL). I want to raise an event only to the top three drivers in that list, sorted by ranking.
With a relational database, I'd run a query like this:
SELECT esl FROM driver ORDER BY ranking LIMIT 3;
Is there a way to do this in KRL? There are two requirements:
A way to sort the hash
A way to limit the number of times a foreach iterates
The second could be solved like this:
rule reset_counter {
  select when rfq delivery_ready
  noop();
  always {
    clear ent:loop_counter;
    raise explicit event loop_drivers;
  }
}

rule loop_on_drivers {
  select when explicit loop_drivers
  foreach app:drivers setting (driver)
  pre {
    esl = driver.pick("$.esl");
  }
  if (ent:loop_counter < 3) then {
    // Signal the driver's ESL
  }
  always {
    ent:loop_counter += 1 from 0;
  }
}
But that's kind of kludgy. Is there a more KRL-ish way to do it? And how should I solve the ordering problem?
EDIT: Here's the format of the app:drivers array, to make the question easier to answer:
[
  {
    "id": "1",
    "rating": "5",
    "esl": "http://example.com/esl"
  },
  {
    "id": "2",
    "rating": "3",
    "esl": "http://example.com/esl2"
  }
]

Without knowing the form of the hash, it's impossible to give you a specific answer, but you can use the sort() operator to sort the array and then use the pick() operator (or the hash operators) to take just the leading elements.
Something like
driver_data.sort(function(){...}).pick("$..something[:2]")
where "something" is the name of the relevant field in the hash. The [:2] slice keeps the first two elements; for the top three you would use [:3].

Related

Push Item to Array and Delete in the Same Request

I have a document that stores sensor data where the sensor readings are objects stored in an array. Example:
{
  "readings": [
    {
      "timestamp": 1499475320,
      "temperature": 121
    },
    {
      "timestamp": 1499475326,
      "temperature": 93
    },
    {
      "timestamp": 1499475340,
      "temperature": 142
    }
  ]
}
I know how to push/add an item to the "readings" array. But what I need is when I add an item to the array, I also want to "clean" the array by removing items that have "timestamp" value older than a cutoff time.
Is this possible in MongoDB?
The way I see it, you basically have two options here, with varying approaches.
Restrict Arrays to Capped Size
The first option here is "not exactly" what you are asking for, but it is the option with the least implementation and execution overhead. The variance from your question is that rather than "removing past a certain age", we simply place a "limit/cap" on the total number of entries in the array.
This is actually done using the $slice modifier to $push:
Model.update(
  { "_id": docId },
  { "$push": {
    "readings": {
      "$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
      "$slice": -10
    }
  }}
)
In this case the -10 argument restricts the array to only have the "last ten" entries from the end of the array since we are "appending" with $push. If you wanted instead the "latest" as the first entry then you would modify with $position and instead provide the "positive" value to $slice, which means "first ten" in contrast.
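For illustration, the "prepend" variant just described might look like this, with the same hypothetical docId as above:
Model.update(
  { "_id": docId },
  { "$push": {
    "readings": {
      "$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
      "$position": 0,  // insert at the front, so the newest entry comes first
      "$slice": 10     // keep only the "first ten" entries
    }
  }}
)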
So it's not the same thing you asked for, but it is practical since the arrays do not have "unlimited growth" and you can simply "cap" them as each update is made and the "oldest" item will be removed once at the maximum length. This means the overall document never actually grows beyond a set size, and this is a very good thing for MongoDB.
Issue Bulk Operations
The next case, which actually does exactly what you ask, uses "Bulk Operations" to issue "two" update operations in a "single" request to the server. The reason why it is "two" is because there is a rule that you cannot have different update operators "assigned to the same path" in a single update operation.
Therefore what you want actually involves a $push AND a $pull operation, and on the "same array path" we need to issue those as "separate" operations. This is where the Bulk API can help:
Model.collection.bulkWrite([
  { "updateOne": {
    "filter": { "_id": docId },
    "update": {
      "$pull": {
        "readings": { "timestamp": { "$lt": cutOff } }
      }
    }
  }},
  { "updateOne": {
    "filter": { "_id": docId },
    "update": {
      "$push": {
        "readings": { "timestamp": 1499478496679, "temperature": 100 }
      }
    }
  }}
])
This uses the .bulkWrite() method from the underlying driver which you access from the model via .collection as shown. This will actually return a BulkWriteOpResult within the callback or Promise which contains information about the actual operations performed within the "batch". In this case it will be the "matched" and "modified" numbers which will be appropriate to the operations that were actually performed.
Hence if the $pull did not actually "remove" anything, because the timestamp values were all newer than the given constraint, then the modified count would only reflect the $push operation. But most of the time this need not concern you; you would simply accept that the operations completed without error and did what you asked.
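As a small illustration, the result can be inspected like this (Promise form; matchedCount and modifiedCount are standard properties of the driver's bulk write result):
Model.collection.bulkWrite([ /* operations as above */ ]).then(function(result) {
  // two operations were sent, so up to two updates may report as modified
  console.log(result.matchedCount, result.modifiedCount);
});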
Conclusion
So the general case of "both" is that it's really all done in one request and one response. The differences come in that "under the hood" the second approach which matches your request actually does do "two" operations per request and therefore takes microseconds longer.
There is actually no reason why you could not "combine" the logic of "both", removing past your "cutoff" as well as keeping a "cap" on the overall array size. But the general idea here is that the first implementation, though not exactly the same thing as asked, will actually do a "good enough" job of "housekeeping" with little to no additional overhead on the request, or indeed on the implementation of the actual code.
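A sketch of that combination, under the same assumptions (docId, cutOff) as the bulk example above:
Model.collection.bulkWrite([
  { "updateOne": {
    "filter": { "_id": docId },
    "update": {
      // first remove anything older than the cutoff
      "$pull": {
        "readings": { "timestamp": { "$lt": cutOff } }
      }
    }
  }},
  { "updateOne": {
    "filter": { "_id": docId },
    "update": {
      // then append, also capping the array at the last ten entries
      "$push": {
        "readings": {
          "$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
          "$slice": -10
        }
      }
    }
  }}
])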
Also, whilst you can always "read the data" -> "modify" -> "save", that is not a really great pattern. For best performance, as well as "consistency" without conflict, you should be using the atomic operations to modify, in just the same way as is outlined here.

MongoDB: Find document given field values in an object with an unknown key

I'm making a database on theses/arguments. They are related to other arguments, which I've placed in an object with a dynamic key, which is completely random.
{
  _id : "aeokejXMwGKvWzF5L",
  text : "test",
  relations : {
    cF6iKAkDJg5eQGsgb : {
      type : "interpretation",
      originId : "uFEjssN2RgcrgiTjh",
      ratings: [...]
    }
  }
}
Can I find this document if I only know what the value of type is? That is I want to do something like this:
db.theses.find({ relations['anything']: { type: "interpretation" } })
This could've been done easily with the positional operator, if relations had been an array. But then I cannot make changes to the objects in ratings, as mongo doesn't support those updates. I'm asking here to see if I can keep from having to change the database structure.
Though you seem to have approached this structure due to a problem with updates when using nested arrays, you have really only caused another problem by doing something else which is not really supported: there is no "wildcard" concept for searching unspecified keys using the standard query operators that are optimal.
The only way you can really search for such data is by using JavaScript code on the server to traverse the keys using $where. This is clearly not a really good idea as it requires brute force evaluation rather than using useful things like an index, but it can be approached as follows:
db.theses.find(function() {
  var relations = this.relations;
  return Object.keys(relations).some(function(rel) {
    return relations[rel].type == "interpretation";
  });
})
While this will return those objects from the collection that contain the required nested value, it must inspect each object in the collection in order to do the evaluation. This is why such evaluation should really only be used when paired with a "hard" query condition that can use an index on a concrete value from the documents.
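For example, pairing the JavaScript evaluation with an indexable condition (using the "text" field from the sample document) narrows the scan first:
db.theses.find({
  "text": "test",  // concrete, indexable condition evaluated first
  "$where": function() {
    var relations = this.relations;
    return Object.keys(relations).some(function(rel) {
      return relations[rel].type == "interpretation";
    });
  }
})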
Still, the better solution is to consider remodelling the data to take advantage of indexes in search. Where it is necessary to update the "ratings" information, then basically "flatten" the structure to consider each "rating" element as the only array data instead:
{
  "_id": "aeokejXMwGKvWzF5L",
  "text": "test",
  "relationsRatings": [
    {
      "relationId": "cF6iKAkDJg5eQGsgb",
      "type": "interpretation",
      "originId": "uFEjssN2RgcrgiTjh",
      "ratingId": 1,
      "ratingScore": 5
    },
    {
      "relationId": "cF6iKAkDJg5eQGsgb",
      "type": "interpretation",
      "originId": "uFEjssN2RgcrgiTjh",
      "ratingId": 2,
      "ratingScore": 6
    }
  ]
}
Now searching is of course quite simple:
db.theses.find({ "relationsRatings.type": "interpretation" })
And of course the positional $ operator can now be used with the flatter structure:
db.theses.update(
  { "relationsRatings.ratingId": 1 },
  { "$set": { "relationsRatings.$.ratingScore": 7 } }
)
Of course this means duplication of the "related" data for each "ratings" value, but this is generally the cost of being able to update by matched position, since only a single level of array nesting is supported.
So you can force the logic to match with the way you have it structured, but it is not a great idea to do so and will lead to performance problems. If, however, your main need here is to update the "ratings" information rather than just append to the inner list, then a flatter structure will be of greater benefit and of course a lot faster to search.

Delete a MongoDB subdocument by value

I have a collection containing documents that look like this:
{
  "user": "foo",
  "topics": {
    "Topic AB": {
      "score": 20,
      "frequency": 3,
      "last_seen": 40
    },
    "Topic BD": {
      "score": 10,
      "frequency": 2,
      "last_seen": 38
    },
    "Topic TF": {
      "score": 19,
      "frequency": 6,
      "last_seen": 20
    }
  }
}
I want to remove subdocuments whose last_seen value is less than 30.
I don't want to use arrays here since I'm using $inc to update the subdocuments in conjunction with upsert (which doesn't support the $ notation).
The real question here is how can I delete a key depending on its value. Using $unset simply drops a subdocument regardless of what it contains.
I'm afraid I don't think this is possible with your current design. Knowing the name of the key whose last_seen value you wish to test, for example Topic TF, you can do
db.topics.update(
  { "topics.Topic TF.last_seen": { "$lt": 30 } },
  { "$unset": { "topics.Topic TF": 1 } }
)
However, with an embedded document structure, if you don't know the name of the key that you want to query against then you can't run the query. If the Topic XX keys are only known by what's in the document, you'd have to pull the whole document to find out what keys to test, and at that point you ought to just manipulate the document client-side and then update by _id.
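A sketch of that client-side approach, assuming the sample document above and a cutoff of 30 (the user filter is just for illustration):
var doc = db.topics.findOne({ "user": "foo" });
Object.keys(doc.topics).forEach(function(k) {
  if (doc.topics[k].last_seen < 30) {
    delete doc.topics[k];  // drop stale subdocuments locally
  }
});
// write the cleaned map back, matching on _id
db.topics.update({ "_id": doc._id }, { "$set": { "topics": doc.topics } });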
The best option is to use arrays. The $ positional operator works with upserts, it just has a serious gotcha that, in the case of an insert, the $ will be interpreted as part of the field name instead of as an operator, so I understand your conclusion that it doesn't seem feasible. I'm not quite sure how you are using upsert such that arrays seem like they won't work, though. Could you give more detail there and I'll try to help come up with a reasonable workaround to use arrays and $ with your use case?
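To make the array suggestion concrete, if the topics were remodelled as an array (the "name" field here is hypothetical), the cleanup the question asks for becomes a single $pull:
{
  "user": "foo",
  "topics": [
    { "name": "Topic AB", "score": 20, "frequency": 3, "last_seen": 40 },
    { "name": "Topic TF", "score": 19, "frequency": 6, "last_seen": 20 }
  ]
}

db.topics.update(
  { "user": "foo" },
  { "$pull": { "topics": { "last_seen": { "$lt": 30 } } } }
)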

Using IF/ELSE in map reduce

I am trying to make a simple map/reduce function on one of my MongoDB database collections.
I get data but it looks wrong. I am unsure about the Map part. Can I use IF/ELSE in this way?
UPDATE
I want to get the number of authors that own the files. In other words, how many of the authors own the uploaded files and thus how many authors have no files.
The objects in the collection looks like this:
{
  "_id": {
    "$id": "4fa8efe33a34a40e52800083d"
  },
  "file": {
    "author": "john",
    "type": "mobile",
    "status": "ready"
  }
}
The map / reduce looks like this:
$map = new MongoCode("function() {
    if (this.file.type != 'mobile' && this.file.status == 'ready') {
        if (!this.file.author) {
            return;
        }
        emit(this.file.author, 1);
    }
}");
$reduce = new MongoCode("function(key, values) {
    var count = 0;
    for (index in values) {
        count += values[index];
    }
    return count;
}");
$this->cimongo->command(array(
    "mapreduce" => "files",
    "map" => $map,
    "reduce" => $reduce,
    "out" => "statistics.photographer_count"
));
The map part looks ok to me. I would slightly change the reduce part.
values.forEach(function(v) {
    count += v;
});
You should not use a for...in loop to iterate over an array; it was not meant for that. It is for enumerating an object's properties.
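A quick illustration of the hazard (contrived; anything added to Array.prototype shows up in for...in):
Array.prototype.extra = "oops";
var values = [1, 2, 3];
for (var index in values) {
    print(index);  // prints 0, 1, 2 ... and "extra"
}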
Why do you think your data is wrong? What's your source data? What do you get? What do you expect to get?
I just tried your map and reduce in mongo shell and got correct (reasonable looking) results.
The other way you can do what you are doing is to get rid of the inner "if" condition in the map and instead call your mapReduce with an appropriate query clause, for example:
db.files.mapReduce(map, reduce, {out: 'outcollection', query: {"file.author": {$exists: true}}})
or, if you happen to have indexes to make the query efficient, just get rid of all the ifs and run mapReduce with the clause query:{"file.author":{$exists:true},"file.type":"mobile","file.status":"ready"}. Change the conditions to match the actual cases you want to sum up over.
In 2.2 (upcoming version available today as rc0) you can use the aggregation framework for this type of query rather than writing map/reduce functions, hopefully that will simplify things somewhat.
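A sketch of the aggregation equivalent (the $match conditions mirror the query clause above; adjust them to your actual cases):
db.files.aggregate([
  { $match: { "file.author": { $exists: true }, "file.type": "mobile", "file.status": "ready" } },
  { $group: { _id: "$file.author", count: { $sum: 1 } } }  // files owned per author
])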

Paginate sub document MongoDB

I'm trying to make a paginate mechanism for our product documents stored in MongoDB. What makes this tricky, is that each document can have several colors, and I need to paginate by these instead of the document itself. E.g. the example below has two colors, and should then count as 2 in my paginate results.
How would anyone go about doing this in the easiest / most effective way?
Thanks in advance!
{
  "_id": ObjectId("4fdbaf608b446b0477000142"),
  "created_at": new Date("14-10-2011 12:02:55"),
  "modified_at": new Date("15-6-2012 23:55:43"),
  "sku": "A1051g",
  "name": {
    "en": "Earrings - Celebrity"
  },
  "variants": [
    {
      "color": {
        "en": "Blue"
      }
    },
    {
      "color": {
        "en": "Yellow"
      }
    }
  ]
}
I like Sammaye's solution but another approach could just be pulling back more results than you need.
So for example, if you need 100 variants per page and each product has at least 1 variant, query with a limit of 100 to try and get 100 products, and therefore, at least 100 variants.
Chances are, you will have more than 100 variants (each product having more than one), so build a list of products as you iterate over the cursor, keeping track of the number of variants.
When you have 100 variants, take note of how many products you have in the list, out of the 100 you retrieved, and use that as the skip for your next query.
This will eventually get expensive for large skips, as you will have to seek over the number of documents you skip, but it could be a good solution for now.
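A rough shell sketch of that idea (the collection and field names come from the sample document; perPage and the carried-over skip are hypothetical application state):
var perPage = 100;   // variants per page
var skip = 0;        // products consumed by previous pages
var variants = [];
var productsUsed = 0;

db.products.find().skip(skip).limit(perPage).forEach(function(doc) {
  if (variants.length >= perPage) return;  // page already full; skip remaining docs
  doc.variants.forEach(function(v) {
    if (variants.length < perPage) {
      variants.push({ sku: doc.sku, color: v.color });
    }
  });
  productsUsed += 1;  // count products used for this page
});

skip += productsUsed;  // becomes the skip for the next page's query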