I have a stream of events that arrive in sorted order. Each event belongs to a group identified by a key property, its ID. The events have differing values for their non-ID properties. I want to ignore all but the final event before the ID changes. For example:
{ID:1, Prop: "V1"}, {ID:1, Prop:"V2"}, {ID:1, Prop:"V3"}, {ID:2, Prop:"V1"}, {ID:2, Prop:"V2"}, {ID:2, Prop:"V3"}, {ID:2, Prop:"V25"}, {ID:3, Prop:"V1"}, {ID:3, Prop:"V8"}
I would want to emit only:
{ID:1, Prop:"V3"},{ID:2, Prop:"V25"},{ID:3, Prop:"V8"}
I had used GroupBy on the ID and then TakeLast(1); however, as far as I understand, TakeLast waits until the stream completes, so for a long stream it would hold on to a lot more memory. I know I have seen the last event of a given ID as soon as a different ID appears in the stream, so I want to emit the value as soon as I see a change in the key property. I guess it's kind of like DistinctUntilChanged, but giving me the last value of each run rather than the first.
I would expect the first result ({ID:1, Prop:"V3"}) to be emitted as soon as the first element with ID:2 appears in the stream.
I thought about buffering or something similar, but I'm still getting my head around reactive programming. Any ideas?
source.Buffer(2, 1)                                   // sliding buffer of the two latest elements
      .Where(i => i.Count < 2 || i[0].ID != i[1].ID)  // ID changed, or the trailing one-element buffer at completion
      .Select(i => i[0])                              // the earlier element is the last event of its group
Keep a running buffer of the two latest elements of the observable.
If the ID changes between them, or the buffer contains only one element (the final element of the stream), emit the first item of the buffer.
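To make the emission rule concrete, here is the same "emit the previous element whenever the key changes, plus the final element" logic written as a plain Python generator. This is only an illustration of what the Buffer(2, 1) pipeline does, not a replacement for the Rx version; the event shape is assumed to be a dict with an "ID" key as in the example above.

def last_per_key(events):
    """Yield the last event of each run of consecutive equal IDs."""
    previous = None
    for event in events:
        if previous is not None and previous["ID"] != event["ID"]:
            # the ID just changed, so the previous event closed its group
            yield previous
        previous = event
    if previous is not None:
        # the stream ended, so the last event closes the final group
        yield previous

events = [
    {"ID": 1, "Prop": "V1"}, {"ID": 1, "Prop": "V2"}, {"ID": 1, "Prop": "V3"},
    {"ID": 2, "Prop": "V1"}, {"ID": 2, "Prop": "V25"},
    {"ID": 3, "Prop": "V1"}, {"ID": 3, "Prop": "V8"},
]
print(list(last_per_key(events)))
# [{'ID': 1, 'Prop': 'V3'}, {'ID': 2, 'Prop': 'V25'}, {'ID': 3, 'Prop': 'V8'}]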
I have a collection with the following schema:
{
    content: string,
    score: Decimal
}
The goal is to go through the whole collection with pagination, ordered by score, but the score can change every minute. So it is possible that a score on the first page becomes equal to or less than a score on the second page, and the same object (already seen on the first page) is returned again on the second page. Is it possible to iterate over the whole collection without duplicates?
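One common workaround, offered here only as a sketch (it is not from the question): page on an immutable field such as _id instead of the live score, so every document is visited exactly once even if scores change between requests; the trade-off is that the pages are then in _id order rather than current-score order. A pymongo sketch, where the database and collection names are assumptions:

from pymongo import MongoClient

items = MongoClient()["test"]["items"]  # hypothetical database/collection names

def iterate_once(page_size=100):
    """Walk the whole collection in _id order, visiting each document exactly once."""
    last_id = None
    while True:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        page = list(items.find(query).sort("_id", 1).limit(page_size))
        if not page:
            break
        for doc in page:
            yield doc
        last_id = page[-1]["_id"]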
My documents all have sequential numbers, saved as strings and used as IDs (padded with 0s). When creating a new record, I first do a request for Comment.count(). Using the number returned from that, I generate the ID string. I then create an object and save it as a new document.
// result[1] holds the value returned by Comment.count()
var commentNumber = (result[1] + 1).toString().padStart(4, '0');

var newComment = this({
    html: processedHtml,
    number: commentNumber
});

newComment.save(function (err, result) {
    if (err) return callback(err);
    return callback(null, result);
});
The problem is that if two comments are submitted at the same time, they get the same ID (this also happens if I make two requests on submission instead of one: both end up with the same ID).
How can I prevent this?
One simple option would be to create a unique index on number so that one of the requests fails.
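For example, with pymongo (the collection name comments is an assumption):

from pymongo import MongoClient

comments = MongoClient()["test"]["comments"]  # hypothetical database/collection names
# with this index in place, inserting a duplicate number raises DuplicateKeyError
comments.create_index("number", unique=True)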
Another would be to store the current comment count elsewhere. If you wanted to use mongo, you could keep a doc with a commentCount field in a different collection, do a findOneAndUpdate with $inc, and use the returned value. This still leaves a milder race condition: a user might momentarily see only comments 1 and 3 if comment 2 takes longer to create than comment 3.
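A minimal sketch of that counter pattern with pymongo; the counters collection and field names are illustrative, not taken from the question:

from pymongo import MongoClient
from pymongo.collection import ReturnDocument

counters = MongoClient()["test"]["counters"]  # separate collection holding the counter document

def next_comment_number():
    # atomically increment the counter and read back the new value;
    # upsert=True creates the counter document on first use
    doc = counters.find_one_and_update(
        {"_id": "comments"},
        {"$inc": {"commentCount": 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )
    return str(doc["commentCount"]).zfill(4)  # zero-padded, mirroring padStart(4, '0') above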
I think the approach of storing the comment number on the document is fundamentally flawed: it creates weird race conditions, strange error handling, and complex deletes. If possible, it's better to calculate the number of comments on the way out.
As far as ordering goes, MongoDB ObjectId _ids encode a creation timestamp in their leading bytes, so you can sort on _id to get documents in insertion order.
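For example (pymongo, database and collection names assumed), this yields comments in insertion order without any number field at all:

from pymongo import MongoClient

comments = MongoClient()["test"]["comments"]  # hypothetical database/collection names
for doc in comments.find().sort("_id", 1):    # ObjectIds sort roughly by creation time
    print(doc)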
EDIT (15/04/2016): As per the comments received on this question, I think there is no direct way to do this in MongoDB. In that case, is it possible to do it with MongooseJS?
I am using MongoDB. For one collection I need a 'lastModified' field, which should show the last time any value of the document was actually modified. That means that if I run an update query and all the other values stay the same, then 'lastModified' shouldn't get updated.
Example:
Suppose I have a document in the collection:
{ _id: 1, status: "a", lastModified: ISODate("2016-04-02T01:11:18.965Z") }
when I update with
{$set:{"status":"a", "lastModified":"Current Time Here"}}
then the "lastModified" should not change
when I update with
{$set:{"status":"b", "lastModified":"Current Time Here"}}
then the "lastModified" should change
How to achieve this?
In my case I will call the update operation multiple times, and I don't want 'lastModified' to change on every call. Instead it should change only when 'status' is actually modified.
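One common way to get that behavior, sketched here with pymongo rather than taken from the thread: put the new status into the query filter so the update only matches when the value would actually change, and let $currentDate write the timestamp. The field names mirror the example above; the database and collection names are assumptions.

from pymongo import MongoClient

coll = MongoClient()["test"]["coll"]  # hypothetical database/collection names

def set_status(doc_id, new_status):
    # matches only when status differs, so a no-op update never touches lastModified
    coll.update_one(
        {"_id": doc_id, "status": {"$ne": new_status}},
        {"$set": {"status": new_status}, "$currentDate": {"lastModified": True}},
    )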
I'm using MongoEngine in Python to work with my data model.
I have a data model which essentially looks like this as represented in BSON:
{
    'id': ...,
    'revisions': [
        {
            'id': ...,
            'revision': 1,
            'derivatives': [
                {
                    'id': ...,
                    'name': 'Derivative 1'
                }
            ]
        }
    ]
}
We'll call the outermost document the owner, all subdocuments in owner.revisions will be called revision, and all subdocuments in revision.derivatives will be called derivative.
I'm looking to $addToSet to the derivatives set inside a specific revision inside of a specific owner. If I had to write this in Python, it'd look like this:
def add_to_set(owner_id, revision_id, new_derivative):
    for owner in owner_collection:
        if owner.id == owner_id:
            # found the right owner
            for revision in owner.revisions:
                if revision.id == revision_id:
                    # we've found the right revision in the right owner
                    # now append and get out
                    revision.derivatives.append(new_derivative)
                    return
How can I run this kind of query, selecting the right revision inside of the right owner and atomically appending to the inner derivatives collection on that revision?
Having a hard time figuring out how to get started with an update query like this.
The problem is with using save based on an older version of the object - it's not thread-safe.
You want to use an update operator, which atomically manipulates an object based on a matched condition, rather than unilaterally overwriting an object (by _id) with an entirely new object.
Example of unsafe "save":
Thread 1: reads the object {_id:1, a:1}, increments a, saves {_id:1, a:2}
Thread 2: reads the object {_id:1, a:1}, increments a, saves {_id:1, a:2}
One of the increments is lost: the final value of a is 2, not 3.
Contrast this with a "safe" update:
Thread 1: updates the object matching {_id:1} using the operator {$inc: {a: 1}}
Thread 2: updates the object matching {_id:1} using the operator {$inc: {a: 1}}
No matter what order the two threads execute in, in the second case a is incremented by 1 twice on the server, so starting from a:1 it ends up as 3 (no update is lost).
I'm guessing the same thing is happening with your object; the analogous update operation you want is the $push operator (or $addToSet, for set semantics).
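Concretely, a pymongo sketch using the positional $ operator; the field names come from the sample document above, the collection name is an assumption, and $addToSet is used since the question asked for set semantics ($push would simply append):

from pymongo import MongoClient

owners = MongoClient()["test"]["owners"]  # hypothetical database/collection names

def add_derivative(owner_id, revision_id, new_derivative):
    # match the owner and the specific revision, then let the positional $ operator
    # target that revision's derivatives array in one atomic update
    owners.update_one(
        {"id": owner_id, "revisions.id": revision_id},
        {"$addToSet": {"revisions.$.derivatives": new_derivative}},
    )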
I have the following schema in the database:
{
    id: 12345,
    friends: [123, 345, 678, 908]
},
{
    id: 908,
    friends: [123, 345]
}
Is there a way to get an array of all unique friends IDs from the entire collection?
To get the distinct friends values, you do not need to write a map/reduce job.
Just run:
> db.collection.distinct("friends")
[ 123, 345, 678, 908 ]
I'm not too familiar with MongoDB's MapReduce implementation, but I imagine you could have your mappers write out the values passed to them as keys, and simply use null values.
This way you ensure that the reducers receive a given key (a friend ID) only once, and you can write that key out once without iterating over the values. As the values are null anyway, there is no point in iterating (not to mention that if you did iterate, you would write the key out multiple times; you want it written once to ensure it is distinct).
However, bear in mind that your keys will be spread across the reducers' output files: for example, reducer 1 might output 123 and reducer 2 might output 345, so you may have to consolidate the output files' contents afterwards in order to construct your array.
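For reference, here is what that idea looks like with MongoDB's (since-deprecated) mapReduce command driven from pymongo: the map function emits each friend ID as a key with a null value, the reduce function just returns null, and the distinct friend IDs come back as the keys of the result. The collection name users and the inline output are assumptions.

from bson.code import Code
from pymongo import MongoClient

db = MongoClient()["test"]

mapper = Code("function () { this.friends.forEach(function (f) { emit(f, null); }); }")
reducer = Code("function (key, values) { return null; }")

result = db.command(
    "mapReduce",
    "users",              # hypothetical collection name
    map=mapper,
    reduce=reducer,
    out={"inline": 1},    # return results directly instead of writing to a collection
)
distinct_friends = [doc["_id"] for doc in result["results"]]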