MongoDB change stream, update operation: how to solve the "get previous value" problem

I know that this feature is not implemented by MongoDB. I am trying to figure out the best way to achieve it.
Using a caching service? That approach would work, but there is one problem: when the query you watch is too big, like a whole collection, you will never have the "before" value for the first change, because you only start caching once the first change appears on the watch.
Service started watching.
Received object id 1, no cached previous value, so cache the value.
Received object id 1 again, a cached previous value exists, so the comparison can be done; cache the new value.
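In Python with pymongo, the flow I have in mind would look roughly like this sketch (handle_diff and the database/collection names are just placeholders):

from pymongo import MongoClient

def handle_diff(before, after):
    # Placeholder comparison: collect the fields whose value changed.
    return {k: (before.get(k), after.get(k))
            for k in after if before.get(k) != after.get(k)}

client = MongoClient()
coll = client.mydb.mycoll
previous = {}  # in-memory cache: _id -> last full document seen

with coll.watch(full_document='updateLookup') as stream:
    for change in stream:
        doc_id = change['documentKey']['_id']
        current = change.get('fullDocument')
        before = previous.get(doc_id)
        if before is not None:
            # Only from the second change onwards do we have a "before" value.
            print(handle_diff(before, current))
        previous[doc_id] = current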
I see another problem here: if I have two watchers that could potentially receive information about the same object, this will cause sync problems, as one process may update the cache while the second one reads data that is already wrong. I mean the second process could be in a situation where the cached previous value is already the same as the one in the MongoDB change stream event.
I was also thinking about MongoDB replicas, but I am not sure the problem can be solved with them.
Best,
Igor

Related

Does Firebase Realtime Database always read the complete node if you reference it?

On a hypothetical node structure like:
NodeA:
-Subnode1: 000000001
-Subnode2: "thisIsAVeeeeeeeeeeeryLoooongString"
I would like to update NodeA every X minutes, just writing it, not reading it. Subnode1 would be a timestamp which I set with Server.TimeStamp, and Subnode2 would be a changing string.
I would like to know whether, just by referencing 'NodeA', Firebase will read the contents of the whole node, and if it does, whether there is a way to avoid it, since Subnode2 can be quite heavy and I would like to control when I read it.
Clarifications:
I'm not reading the node using any querying function. My question arises because I wonder whether the referenced nodes (using dbReference = fbbase.GetReference(path)) are read automatically when the app starts.
I know I could use a different reference for each node, but then I would incur separate upload costs since it would mean two different connections (yes, uploads also have costs depending on the frequency).
I'm using Firebase SDK for Unity.
Thanks in advance.
If you query NodeA, it will pull down the entire contents of that node, including all of its children.
If you want just a specific child, query it instead. You can certainly build a path to Subnode1 if you want.
There is no way to exclude a certain child from a query, while getting all others. If you don't want all children, you must query each desired child individually.
Firebase RTDB charges for storage volume and data downloads. If you are simply updating the record in the node, you should not incur costs other than minor network costs.
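To illustrate the path idea in code, here is a sketch with the Firebase Admin SDK for Python (not the Unity SDK the question uses, and the credentials path and database URL are made up), but the same path logic applies in Unity:

import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate('service-account.json')        # placeholder
firebase_admin.initialize_app(cred, {
    'databaseURL': 'https://example-project.firebaseio.com'    # placeholder
})

# Writing through the parent path does not read the node first.
db.reference('NodeA').update({
    'Subnode1': 1660000000,                                    # your timestamp
    'Subnode2': 'thisIsAVeeeeeeeeeeeryLoooongString',
})

# Reading only the small child: build the path down to it so the heavy
# Subnode2 is never downloaded.
timestamp = db.reference('NodeA/Subnode1').get()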
A reference does not incur any fees; that being said, reads and writes do.
The reason is that a reference is a hypothetical location for a document or a query and does not necessarily exist until its contents have been populated by an update or snapshot.
When you read or write to a node, your data plus overhead is charged based on the current cost model per KB.

How to prevent lost updates on the views in a distributed CQRS/ES system?

I have a CQRS/ES application where some of the views are populated by events from multiple aggregate roots.
I have a CashRegisterActivated event on the CashRegister aggregate root and a SaleCompleted event on the Sale aggregate root. Both events are used to populate the CashRegisterView. The CashRegisterActivated event creates the CashRegisterView or sets it active in case it already exists. The SaleCompleted event sets the last sale sequence number and updates the cash in the drawer.
When two of these events arrive within milliseconds, the first update is overwritten by the last one. So that's a lost update.
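The projection handlers are essentially read-modify-write, something like this simplified illustration (Python pseudocode rather than my actual Java code; all names are invented):

# Two handlers read the same row, each applies its own event, and the
# second save overwrites the first one's changes.
def on_cash_register_activated(view_store, event):
    view = view_store.load(event.cash_register_id) or {
        'active': False, 'last_sale_seq': None, 'cash_in_drawer': 0}
    view['active'] = True
    view_store.save(event.cash_register_id, view)

def on_sale_completed(view_store, event):
    view = view_store.load(event.cash_register_id)
    view['last_sale_seq'] = event.sequence_number
    view['cash_in_drawer'] += event.amount
    view_store.save(event.cash_register_id, view)  # may clobber the other write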
I already have a few possible solutions in mind, but they all have their drawbacks:
Marshal all event processing for a view or for one record of a view on the same thread. This works fine on a single node, but once you scale out, things start to get complex. You need to ensure all events for a view are delivered to the same node. And you need to migrate to another node when it goes down. This requires some smart load balancer which is aware of the events and the views.
Lock the record before updating to make sure no other threads or nodes modify it in the meantime. This will probably work fine, but it means giving up on a lock-free system. Threads will sit there, waiting for a lock to be freed. Locking also means increased latency when I scale out the data store (if I'm not mistaken).
For the record: I'm using Java with Apache Camel, RabbitMQ to deliver the events and MariaDB for the view data store.
I have a CQRS/ES application where some of the views in the read model are populated by events from multiple aggregate roots.
That may be a mistake.
Driving a process off an isolated event is fine, but composing a view normally requires a history rather than a single event.
A more likely implementation would be to use the arrival of the events to mark the current view stale, and to use a single writer to update the view from the history of events produced by the aggregate(s) concerned.
And that requires a smart messaging solution. I thought "Smart endpoints and dumb pipes" would be a good practice for CQRS/ES systems.
It is. The endpoints just need to be smart enough to understand when they need histories, or when events are sufficient.
A view, after all, is just a snapshot. You take inputs (X.history, Y.history), produce a snapshot, write the snapshot into your view store (possibly with meta data describing the positions in the histories that were used), and you are done.
The events are just used to indicate to the writer that a previous snapshot is stale. You don't use the event to extend the history, you use the event to tell the writer that a history has changed.
You don't lose updates with multiple events, because the event itself, with all of its state, is captured in the history. It's the history that is used to build the event-sourced view.
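A minimal sketch of that "mark stale, rebuild from history" idea (all names and interfaces here are invented for illustration, not part of the question's actual stack):

class CashRegisterViewWriter:
    def __init__(self, event_store, view_store):
        self.event_store = event_store
        self.view_store = view_store
        self.stale = set()  # ids of views that need rebuilding

    def on_event(self, event):
        # Any event touching this register only flags the view as stale.
        self.stale.add(event.cash_register_id)

    def rebuild_stale_views(self):
        # Single writer: runs in one place, so rebuilds cannot race each other.
        for register_id in list(self.stale):
            history = self.event_store.load_history(register_id)
            self.view_store.save(register_id, self.project(history))
            self.stale.discard(register_id)

    def project(self, history):
        view = {'active': False, 'last_sale_seq': None, 'cash_in_drawer': 0}
        for event in history:
            if event.type == 'CashRegisterActivated':
                view['active'] = True
            elif event.type == 'SaleCompleted':
                view['last_sale_seq'] = event.sequence_number
                view['cash_in_drawer'] += event.amount
        return view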
Konrad Garus wrote
... handling events coming from a single source is easier, but more importantly because a DB-backed event store trivially guarantees ordering and has no issues with lost or duplicate messages.
A solution could be to detect when this situation happens and do a retry.
To do this:
Add to each table the aggregate version number, which is kept up to date.
On each update statement, add "aggr_version = n-1" to the where clause (where n is the version of the event being processed).
When the result of the update statement is that no records were modified, it probably means that the event was processed out of order and a retry strategy can be performed (see the sketch below).
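A sketch of the guarded update in Python against MariaDB (assuming a DB-API driver such as PyMySQL; the table, column, and event field names are invented):

import time
import pymysql

conn = pymysql.connect(host='localhost', user='app', password='secret',  # placeholders
                       database='views')

def apply_sale_completed(event, max_retries=3):
    for _ in range(max_retries):
        with conn.cursor() as cur:
            cur.execute(
                """UPDATE cash_register_view
                       SET last_sale_seq = %s,
                           cash_in_drawer = cash_in_drawer + %s,
                           aggr_version = %s
                     WHERE cash_register_id = %s
                       AND aggr_version = %s""",
                (event.sequence_number, event.amount, event.version,
                 event.cash_register_id, event.version - 1))
            conn.commit()
            if cur.rowcount == 1:
                return True  # the view was at version n-1, event applied in order
        # 0 rows touched: the view is not yet at n-1, so wait and retry.
        time.sleep(0.05)
    return False  # give up or hand the event to a retry queue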
The problem is that this adds complexity and is hard to test. The performance bottleneck is very likely in the database, so a single process with a failover setup will probably be the easiest solution.
Although I see you ask how to handle these things at scale, I've seen people recommend using a single-threaded approach until it actually becomes a problem, and then addressing it.
I would have a process manager per view model, draw the events you need from the store, and write them single-threaded.
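For example, a per-view-model process manager could be as simple as a queue drained by one thread (a sketch only; the projection function is passed in and everything else is invented):

import queue
import threading

class ViewModelProcessManager:
    def __init__(self, project):
        self.project = project      # projection function for this view model
        self.events = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, event):
        self.events.put(event)      # can be called from any consumer thread

    def _run(self):
        while True:
            event = self.events.get()  # only this thread ever writes the view
            self.project(event)
            self.events.task_done()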
I combined the answers of VoiceOfUnreason and StefRave into something I think might work. Populating a view from multiple aggregate roots feels wrong indeed. We have out of order detection with a retry queue. So an event on an aggregate root will only be processed when the last completely processed event is version n-1.
So when I create new aggregate roots for the views that would be populated by multiple aggregate roots (say aggregate views), all updates for the view will be synchronised without row locking or thread synchronisation. We have conflict detection with a retry mechanism on the aggregate roots as well, that will take care of concurrency on the command side. So if I just construct these aggregate roots from the events I'm currently using to populate the aggregate views, I will have solved the lost update problem.
Thoughts on this solution?

How cursor.observe works and how to avoid multiple instances running?

Observe
I was trying to figure out how cursor.observe runs inside Meteor, but found nothing about it.
Docs says
Establishes a live query that notifies callbacks on any change to the query result.
I would like to understand better what live query means.
Where will my observer function be executed? By Meteor or by Mongo?
Multiple runs
When we have more than one user subscribing with an observer, one instance runs for each client, leading to performance and race-condition issues.
How can I implement my observe so that it behaves like a singleton? Just one instance running for all clients.
Edit: There was a third question here, but now it is a separated question: How to avoid race conditions on cursor.observe?
Server side, as of right now, observe works as follows:
Construct the set of documents that match the query.
Regularly poll the database with the query and take a diff of the changes, emitting the relevant events to the callbacks.
When matching data is changed/inserted into mongo by meteor itself, emit the relevant events, short circuiting step #2 above.
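The poll-and-diff step can be pictured roughly like this (a sketch in Python, not Meteor's actual implementation; run_query stands in for re-running the cursor's query):

def poll_and_diff(run_query, previous, callbacks):
    # run_query() returns {_id: document} for the current query results.
    current = run_query()
    for _id, doc in current.items():
        if _id not in previous:
            callbacks['added'](_id, doc)
        elif doc != previous[_id]:
            callbacks['changed'](_id, doc, previous[_id])
    for _id in previous:
        if _id not in current:
            callbacks['removed'](_id)
    return current  # becomes "previous" for the next polling cycle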
There are plans (possibly in the next release) to ensure that calls to subscribe with the same arguments are automatically shared, basically taking care of the singleton part for you.
Certainly you could achieve something like this yourself, but I believe it's a high priority for the meteor team, so it's probably not worth the effort at this point.

SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read)

I've just discovered an assumption in my use of SimpleDB. I suspect it's safe but would like other opinions since the docs don't seem to cover it.
So say Process 1 stores an item with x attributes. When Process 2 tries to access that item (without a consistent read) and finds it, is it guaranteed to have all the attributes stored by Process 1?
I'm excluding the possibility that another process could have changed the data.
I also know that Process 2 has no guarantee of seeing the item unless consistent read is used, I'm just talking about the point when it does eventually see it.
I guess the question is: once I can get an item and am not changing it anywhere else, can I assume it has an ad hoc fixed schema and access all my expected attributes without checking that they actually exist?
I don't want to be in a situation where I need to keep requesting items until they have all the attributes I need to use them.
Thanks.
Although Amazon makes no such guarantee in the documentation, the current implementation of their eventual consistency ensures that you'll see all the attributes stored by Process 1 or none of them.
See this thread over at the AWS forums and more specifically this answer by an Amazon employee confirming the behavior (emphasis mine).
I don't think we make that guarantee in the documentation, but the current implementation treats each Put request as a bundle. It won't split the request up and apply the operations piecemeal. You'll get either step-1 responses or step-2 responses until eventual consistency shakes out and leaves you with step-2 responses.
While this is undocumented behavior, I suspect quite a few SimpleDB users are relying on it by now, and as such Amazon won't be likely to change it anytime soon, but that's just my guess.
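In other words, with the old boto (v2) SimpleDB bindings it plays out like this sketch (domain, item, and attribute names are invented):

import boto

sdb = boto.connect_sdb()  # uses your configured AWS credentials
domain = sdb.get_domain('my-domain')

# Process 1: one Put request writes several attributes as a single bundle.
domain.put_attributes('item-1', {'a': '1', 'b': '2', 'c': '3'}, replace=True)

# Process 2: an eventually consistent read may return the pre-Put state or the
# post-Put state, but (per the answer above) not a mix of the two.
attrs = domain.get_attributes('item-1', consistent_read=False)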

How can I clear Class::DBI's internal cache?

I'm currently working on a large implementation of Class::DBI for an existing database structure, and am running into a problem with clearing the cache from Class::DBI. This is a mod_perl implementation, so an instance of a class can be quite old between times that it is accessed.
From the man pages I found two options:
Music::DBI->clear_object_index();
And:
Music::Artist->purge_object_index_every(2000);
Now, when I add clear_object_index() to the DESTROY method, it seems to run, but doesn't actually empty the cache. I am able to manually change the database, re-run the request, and it is still the old version.
purge_object_index_every says that it clears the index every n requests. Setting this to "1" or "0" seems to clear the index... sometimes. I'd expect one of those two to work, but for some reason it doesn't do it every time. More like 1 in 5 times.
Any suggestions for clearing this out?
The "common problems" page on the Class::DBI wiki has a section on this subject. The simplest solution is to disable the live object index entirely using:
$Class::DBI::Weaken_Is_Available = 0;
$obj->dbi_commit(); may be what you are looking for if you have uncompleted transactions. However, this is not very likely to be the case, as Class::DBI tends to complete any lingering transactions automatically on destruction.
When you do this:
Music::Artist->purge_object_index_every(2000);
You are telling it to examine the object cache every 2000 object loads and remove any dead references to conserve memory use. I don't think that is what you want at all.
Furthermore,
Music::DBI->clear_object_index();
Removes all objects from the live object index. I don't know how this would help at all; it's not flushing them to disk, really.
It sounds like what you are trying to do should work just fine the way you have it, but there may be a problem with your SQL or elsewhere that is preventing the INSERT or UPDATE from working. Are you doing error checking for each database query as the perldoc suggests? Perhaps you can begin there or in your database error logs, watching the queries to see why they aren't being completed or if they ever arrive.
Hope this helps!
I've used remove_from_object_index successfully in the past, so that when a page is called that modifies the database, it always explicitly resets that object in the cache as part of the confirmation page.
I should note that Class::DBI is deprecated and you should port your code to DBIx::Class instead.