MongoDB/Spring: Subscribing to collection changes

I'm working with a Spring Boot app. I'm trying to implement callback-based event notification for collection modifications in a MongoDB database. I'm running out of ideas, as I have tried the following:
Classic Polling - Redundant, as the existing implementation is a REST endpoint that's polled by the UI, where it queries data.
Tailable Cursors - Requires the collection to be capped, a limitation that is unlikely to be acceptable for a database with a very high storage forecast.
Change Streams - I got a runtime exception stating that the storage engine doesn't support 'Majority Read Concern'.
collection.watch(asList(Aggregates.match(Filters.in("operationType", asList("insert", "update"))))).forEach(printBlock);
I'm not authorized to view the engine configuration, but I'm assuming that if the DBA can't change the storage engine to WiredTiger, then I can't use change streams. Is this correct? Are there other solutions? How about Spring's reactive MongoDB API? I was under the impression that the API still depends on tailable cursors or change streams.

Related

How to send mongodb collection updates in Spark streaming context?

How can I send the data from a MongoDB collection to a Spark streaming context whenever the collection gets updated? I have seen socketTextStream used to write data to MongoDB, but is there any way to read the stream when the collection gets updated? Is there any way to avoid implementing a custom receiver for MongoDB? Or, if anyone has already implemented one, it would be nice if they could share it.
Thanking you for any input on this.
One way to achieve this is the CDC (change data capture) pattern: on one side, the reactive Mongo library hooks onto a collection and receives events such as update, insert, and delete from MongoDB; on the other side, a Spark streaming application listens for these events.
The transport for the events can be any publish/subscribe system (e.g. Kafka).

How can reactive programming react to database changes?

I am new to reactive programming and therefore have some questions.
I am developing a small software.
I would like to take the opportunity to get to know reactive programming better.
So I looked at Spring's project-reactor.
I also use R2DBC to reactively access the database.
I would like to know if there is any way that database responds to changes.
Or rather: If a user saves an entry in the database, then servers (for example, RestController) should be notified.
How could I go about doing that?
I have already implemented the corresponding controllers, configuration, entities, etc.
Sorry for spelling mistakes.
Complement: The updates to the frontend are then made by Server Sent Events.
Basically, what Nick Tsitlakidis mentioned. Let me add a couple of things here.
The typical database query pattern is to query for a number of records. Databases respond with their results and indicate that the query is complete once the server has sent all records to your application. If new records arrive while the query is active or after it is complete, you do not see these changes immediately: while the query is active, isolation hides them, and once the query is complete, you no longer have a reference to the query.
The feature you're asking about is event-driven consumption of data. Databases call this feature continuous queries. Some stores (such as MongoDB with tailable cursors or Postgres with logical decoding) come with features that keep a cursor/query open so that your client can receive continuous updates.
Kafka and JMS follow the same idea of sending messages that are typically consumed by listeners or even through a reactive stream.
So it all boils down to the technology that you're using.
My understanding is that reactor can't solve this problem for you on its own. If you want your application to respond (react) on some database change, then you need to identify who's making this change and implement some kind of integration there.
For example, if Service1 updates the database and Service2 needs to respond, then Service1 can either call Service2 directly, or Service1 can emit an event that Service2 listens for.
The first approach is simpler and easier to implement, but it has the disadvantage that it couples the two services. The second is trickier to implement, but the services are decoupled.
Reactor can help you in both cases:
For events, Reactor can give you a way to listen to the events, for example using the reactor-rabbitmq or reactor-kafka module.
For service-to-service calls, reactor can help you if you use Spring Webflux.
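To make the event-based option concrete, here is a minimal in-process sketch. It is an illustration only: it uses the JDK's java.util.concurrent.Flow API and SubmissionPublisher as a stand-in for a real broker (RabbitMQ, Kafka) or for Reactor itself, and the names EventDecouplingSketch, EntrySaved, Service1, Service2 and run are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class EventDecouplingSketch {

    // Hypothetical event emitted whenever Service1 writes to the database.
    static final class EntrySaved {
        final long id;
        final String payload;
        EntrySaved(long id, String payload) { this.id = id; this.payload = payload; }
        @Override public String toString() { return "EntrySaved(" + id + ", " + payload + ")"; }
    }

    // Service1 performs the write and publishes an event; it does not know Service2.
    static final class Service1 {
        private final SubmissionPublisher<EntrySaved> events;
        Service1(SubmissionPublisher<EntrySaved> events) { this.events = events; }

        void save(long id, String payload) {
            // ... the real database write would happen here (assumption) ...
            events.submit(new EntrySaved(id, payload));
        }
    }

    // Service2 only subscribes to the event stream and records what it sees.
    static final class Service2 implements Flow.Subscriber<EntrySaved> {
        final List<EntrySaved> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
        @Override public void onNext(EntrySaved e) { received.add(e); }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    // Wires both services to a shared publisher, saves the payloads,
    // and returns the events Service2 received.
    static List<EntrySaved> run(String... payloads) {
        Service2 service2 = new Service2();
        try (SubmissionPublisher<EntrySaved> bus = new SubmissionPublisher<>()) {
            bus.subscribe(service2);
            Service1 service1 = new Service1(bus);
            long id = 1;
            for (String p : payloads) {
                service1.save(id++, p);
            }
        } // close() signals onComplete after the submitted items are delivered
        try {
            service2.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return service2.received;
    }

    public static void main(String[] args) {
        System.out.println(run("first", "second"));
    }
}
```

In a real deployment the publisher would be replaced by the broker and the two services would run in separate processes; the point of the sketch is only that Service1 never references Service2.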
Perhaps you can tell us more about your case so we can provide a more specific solution?

design question: best way to aggregate data from several microservices and show in UI

We have a scenario where we need to aggregate data from several services and show it in the UI. Currently, when an agent logs in, we need to show the cases assigned to that agent. Case information needs to be aggregated from several microservices. Around 1K cases are assigned to an agent at a time, and all of them need to be shown so that the agent can sort them by certain case data.
What would be the best approach to show the data in this scenario? Should we make API calls to several services for each case, then aggregate and show the result? Or are there better approaches?
No. You certainly shouldn't call multiple APIs to aggregate data at runtime. Even if you call the APIs in parallel, the latency will be huge.
You need to pre-aggregate the case details and cache them in a distributed caching system (e.g. Redis or memcached) using a streaming platform (e.g. Kafka). Also, store the pre-aggregated case details in a persistent database. Basically, it's a kind of materialized view.
Caching will enable you to serve the case details quickly, without any noticeable latency for the user. Streaming will help you keep the cache and the DB aggregations updated in near real time. Storing the materialized view in a database will save you from keeping everything in memory. You can use an LRU cache, so that only the recently used data stays in the cache. If you need to show any case data that is not in the cache, you read it from the database and store it in the cache for future requests.
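As a rough illustration of the LRU read-through idea, here is a single-process sketch built on java.util.LinkedHashMap. It is not a distributed cache like Redis; the class name LruReadThroughCache and the loader function (standing in for a query against the persisted materialized view) are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class LruReadThroughCache<K, V> {
    private final int capacity;
    private final Function<K, V> loader; // stands in for the database read (assumption)
    private final Map<K, V> entries;
    private int loads = 0;

    LruReadThroughCache(int capacity, Function<K, V> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LruReadThroughCache.this.capacity;
            }
        };
    }

    // Serves from the cache; on a miss, loads from the backing store and caches the result.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    // Update path, e.g. called by the stream consumer when a case changes.
    synchronized void put(K key, V value) { entries.put(key, value); }

    // containsKey does not count as an access, so it does not change the LRU order.
    synchronized boolean isCached(K key) { return entries.containsKey(key); }

    synchronized int loadCount() { return loads; }

    public static void main(String[] args) {
        LruReadThroughCache<Integer, String> cache =
                new LruReadThroughCache<>(2, id -> "case-" + id);
        cache.get(1);
        cache.get(2);
        cache.get(1); // case 1 is now the most recently used
        cache.get(3); // exceeds capacity, so case 2 is evicted
        System.out.println("case 2 cached: " + cache.isCached(2));
        System.out.println("database loads: " + cache.loadCount());
    }
}
```

The put method is where a stream consumer would push fresh aggregates, so that reads rarely fall through to the database.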
I recommend you read these two Martin Kleppmann articles here and here

Pushing data from database to UI in realtime

I have a database (MySQL) to which data is being written. I need to push new and changed records to the UI. A few constraints here: I do not have control over the code that writes to this database, and I cannot modify it to write to a queue.
So far, I am reading the DB periodically for changes and new additions (using a last-update timestamp) and pushing that data to a MongoDB instance (as I do not want to hit the main MySQL server for every request). Then I push these changes to the frontend using cramp (a Ruby framework) and server-sent events. To maintain a per-user queue, I have Redis in the mix.
I realize that this is a convoluted way of doing realtime push. I was wondering if there is a neater solution to this mess.
If you want to push data realtime from the server, then make use of technologies that provide real time access. I would recommend you to make use of Websockets.
The only issue is that WebSockets are not supported by all browsers. To take care of this, you can use one of the frameworks built over WebSockets that fall back to protocols the browser does support, such as long polling or streaming. These are the frameworks I would suggest:
Atmosphere framework - https://github.com/Atmosphere/atmosphere
Play framework! - http://www.playframework.org/

CQRS & ElasticSearch - using ElasticSearch to build read model

Does anyone use ElasticSearch for building read model in CQRS approach? I have some questions related to such solution:
Where do you store your domain events? In JDBC database? In ElasticSearch?
Do you build indexes with event handlers that process domain events, or using the ElasticSearch River functionality?
How do you handle a complete rebuild of the view model, for example when the view is corrupted? Do you process all events to rebuild the view?
Where the authoritative repository for your domain event is located is an implementation detail. For example, you could store serialized versions on S3 or CouchDB or any other number of storage implementations. The easiest if you're just getting started is a relational database.
Typically people use event handlers that understand the business intent behind each message and can then properly denormalize the message into the read model structure appropriate for the needs of your views.
If the view model is ever corrupted or perhaps you have a bug in a view model handler, there are a few simple steps to follow after fixing the bug:
Temporarily enqueue the flow of events arriving from the domain--these are the typical messages that are being published as your domain is doing work. We still want these messages, but not just yet. This could be done by turning off any message bus or not connecting to your queuing infrastructure if you use one.
Read all events from event storage. As each event is received (this can be done through a simple DB query), run each message through the appropriate message handler. Make sure that you keep track of the last 10,000 (or so) identifiers for all messages processed.
Now reconnect to your queues and start processing normally. If the identifier for the message has been seen, drop the message. Otherwise, process it normally.
The reason for tracking identifiers is to avoid a race condition where you're getting all events from the event store but the same message is coming across through the message queue.
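The replay-then-deduplicate steps above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions: the event store and the message queue are plain lists, and the names ReadModelRebuildSketch, Event, RecentIds and rebuild are hypothetical, not part of any framework.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

class ReadModelRebuildSketch {

    // Hypothetical event shape: a unique identifier plus a payload.
    static final class Event {
        final String id;
        final String payload;
        Event(String id, String payload) { this.id = id; this.payload = payload; }
    }

    // Remembers only the most recent `limit` identifiers, dropping the oldest first.
    static final class RecentIds {
        private final int limit;
        private final LinkedHashSet<String> ids = new LinkedHashSet<>();
        RecentIds(int limit) { this.limit = limit; }

        // Returns true if the id is new (and records it), false if it was already seen.
        boolean markSeen(String id) {
            if (!ids.add(id)) {
                return false;
            }
            if (ids.size() > limit) {
                Iterator<String> oldest = ids.iterator();
                oldest.next();
                oldest.remove();
            }
            return true;
        }
    }

    // Stand-in for a real handler that denormalizes an event into the view.
    private static void apply(List<String> readModel, Event e) {
        readModel.add(e.payload);
    }

    // Replays the event store, then drains the messages that queued up in the meantime.
    static List<String> rebuild(List<Event> eventStore, List<Event> queuedMessages, int limit) {
        List<String> readModel = new ArrayList<>();
        RecentIds seen = new RecentIds(limit);
        // Read all events from storage and run each through the handler.
        for (Event e : eventStore) {
            seen.markSeen(e.id);
            apply(readModel, e);
        }
        // Reconnect to the queue: drop anything already replayed, process the rest.
        for (Event e : queuedMessages) {
            if (seen.markSeen(e.id)) {
                apply(readModel, e);
            }
        }
        return readModel;
    }

    public static void main(String[] args) {
        List<Event> store = List.of(new Event("e1", "a"), new Event("e2", "b"), new Event("e3", "c"));
        List<Event> queue = List.of(new Event("e3", "c"), new Event("e4", "d"));
        System.out.println(rebuild(store, queue, 10_000));
    }
}
```

The limit only needs to cover the overlap between the end of the replay and the start of the queue, which is why a bounded window of recent identifiers is enough here.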
Another technique that's closely related, but involves keeping track of all message identifiers, can be found here: http://blog.jonathanoliver.com/2011/03/removing-2pc-two-phase-commit.html