How to query Axon aggregates

Is there a way to see the current state of the aggregates stored in axon?
Our application uses an Oracle-backed Axon event store.
I tried querying the domainevententry and snapshotevententry tables, but they are empty.

Is there a way to see the current state of the aggregates stored in axon?
In short, yes, although it is not recommended, at least if you are planning to employ CQRS. CQRS, or Command Query Responsibility Segregation, dictates that the Command Model and the Query Model are kept separate.
The aggregate support that Axon delivers provides an easy means to construct a Command Model. As the name suggests, it's intended for commands. On the flip side, you have Query Models, which are designed for queries. AxonIQ has this to say on CQRS; maybe that clarifies some things.
I tried querying the domainevententry and snapshotevententry tables, but they are empty.
That's interesting on its own account! When you publish events in Axon, either through the AggregateLifecycle#apply(Object...) or the EventGateway#publish(Object...) method, the published event should end up in your domain_event_entry table. If that's not the case, then either your JPA/JDBC configuration is off, or some other exception is occurring in your application.
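For reference, a minimal sketch of an event-sourced aggregate publishing an event through AggregateLifecycle#apply could look like the following; the GiftCard aggregate and its command/event payloads are made up for the example, and with a correctly configured JPA/JDBC event store the applied event should show up as a row in domain_event_entry:

import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import static org.axonframework.modelling.command.AggregateLifecycle.apply;

public class GiftCard {

    // Hypothetical command and event payloads, kept minimal for the sketch
    public record IssueCardCommand(String cardId, int amount) {}
    public record CardIssuedEvent(String cardId, int amount) {}

    @AggregateIdentifier
    private String cardId;

    protected GiftCard() {
        // Required by Axon to reconstruct the aggregate from its events
    }

    @CommandHandler
    public GiftCard(IssueCardCommand command) {
        // apply(...) publishes the event; it is stored in the event store and
        // then handled by the @EventSourcingHandler below
        apply(new CardIssuedEvent(command.cardId(), command.amount()));
    }

    @EventSourcingHandler
    public void on(CardIssuedEvent event) {
        this.cardId = event.cardId();
    }
}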
Would you be able to update your issue with samples of your configuration and/or stack traces that you are seeing?
Replaying production issues locally
What I've done in the past to replay behavior occurring in a production environment is load the Aggregate's event stream from that environment into a local dev/test event store. To be able to query this, you only need the aggregate identifier. As the aggregate identifier is indexed, retrieving all events for a specific aggregate (in other words, the aggregate stream) is straightforward.
By doing so, I could run the application locally to flow through the aggregate step-by-step. This gave the benefit of knowing exactly which event caused what state change, leading to the problematic scenario.
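As an illustration, and assuming an Axon 4 EventStore instance is available from your configuration, reading the full stream for one aggregate could look roughly like this (the class name is made up):

import org.axonframework.eventsourcing.eventstore.EventStore;

// Reads the aggregate stream for a single aggregate identifier so it can be
// inspected or copied into a local dev/test event store.
public class AggregateStreamDumper {

    private final EventStore eventStore;

    public AggregateStreamDumper(EventStore eventStore) {
        this.eventStore = eventStore;
    }

    public void dump(String aggregateIdentifier) {
        eventStore.readEvents(aggregateIdentifier)
                  .asStream()
                  .forEach(event -> System.out.println(
                          event.getSequenceNumber() + " -> " + event.getPayloadType().getSimpleName()));
    }
}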
However, why your events are not present in your domainevententry table is unclear to me. If you're still facing issues with that, I again recommend updating the question with more specifics on your project.

Related

What is the difference between MongoDB Realm Triggers and MongoDB Atlas Triggers?

So both of them are part of MongoDB features that I think have common nature. In my case, every time a document is created or updated, it will trigger a function that will update the document field with Date.now() timestamp.
It can be achieved using a trigger, but there are 2 ways to do it, and I am not sure which one is suitable to choose. What is the difference between MongoDB Realm Trigger and MongoDB Atlas Trigger? Advantages over each other?
Thank You
They are inherently similar. The best way to think of it is as two different GUIs that use the same(ish) backend code.
Apart from authentication triggers, which only exist in Realm, the other two types work in similar ways.
They are both "triggered" by the same event (type), whether it be a cron expression or a database event, and they both execute a Realm-based function (either pre-saved in Realm or saved on the trigger in Atlas). So the only actual difference comes from the configuration options, for example:
An Atlas trigger can connect to multiple clusters, while Realm must choose a single one.
Realm has a project option available.
Realm accepts a function name (as it's already saved), while Atlas requires the actual code to be saved on the trigger. (If for some reason you want the same code executing for different triggers, Realm is easier to maintain, as updating 4 different triggers due to a code change is not fun.)
You can compare the configuration options yourself here for Realm and here for a basic Atlas trigger.
I personally haven't noticed a difference between the two (nor did I look that deeply into it). Apart from inside knowledge from an engineer at Mongo who could spill the beans on whether there's an actual performance difference, or whether both triggers use the same code base, I feel there is not much more to say on the subject.

Relational DB in microservices

I have a monolithic application that currently uses a PostgreSQL DB and the schemas are set up as you would expect for most relational databases with various table data being linked back to the user via FKs on the user_id.
I'm trying to learn more about microservices and am trying to migrate my Python API to a microservice architecture. I have a reasonable understanding of how I'm going to break up the larger app into smaller parts; however, I'm not entirely clear on how I'm supposed to deal with the data side of things.
I understand that one single large DB is against general design principles of microservices but I'm not clear on what the alternative would be.
My biggest concern is cascading across individual databases that would hold microservice data. In a simple rdb, I can just cascade on delete and the DB will handle the work across the various tables. In the case of microservices, how would that work? Would I need to have a separate service that handles deleting user data across the other service DBs?
I don't really understand how I would migrate a traditional application with a relational DB to a microservice architecture.
EDIT:
To clarify - a specific architectural/design problem I'm facing is as follows:
I have split up my application into a few microservices. The ones that are in my mind still relational are:
Geolocation - A service that checks geometry data, records it in PostGIS, and returns certain information. Its primary purpose is to record the location of a particular user for later reference.
Image - A simple upload service to upload images and store metadata in the db.
Load-Image - A simple service that returns a random set of images based on parameters such as location, and user profile data such as Age, Gender, etc
Profile - A service that simply manages user data such as Age, Gender, etc
Normally, these services would each have a table in a larger db rather than their own individual dbs. Filtering images by, say, location and age is a very simple JOIN and filter.
How would something like that work in a microservice architecture? If the data is held in different dbs entirely, how would I set up the logic to filter the data? I could duplicate data that doesn't change often, like profile info, and add it to a MongoDB document that would contain image data, including user_id and profile data; however, location data can change regularly, and constant updates don't sound practical.
What would be the best approach? Or should I stick with a shared RDBMS for just those few services?
It comes down to the duplication of data, why we want it, and how we manage it.
Early in our careers we were taught about the duplication of data to make it redundant, for example in database replication or backups. We were also taught that data can be modelled in a relational manner, with constraints enforcing the integrity of the model. In fact, the integrity of the model is sacrosanct. Without integrity, how can you have consistency? The answer is that you can't. Kinda.
When you work with distributed systems and service orientation, you do so because you want to minimise interactions thereby reducing coupling between components. However, there is a cost to this. The more distributed your architecture, the less coupling it has, and the more duplication of data will be necessary. This is taken to an extreme with microservices, where effectively the same data may be present in many different places, in varying degrees of consistency.
Instead of being bad, however, in this context data duplication is an essential feature of your system. It is an enabler of an architectural style with many great benefits. Put another way, without duplication of data, you get less distribution, you get more coupling, which makes your system more expensive to build, own, and change.
So, now that we understand duplication of data and why we want it, let's move on to how we manage having lots of duplication. Let's try an example:
In a relational database, let's say we have a table called Customers, which contains a customer ID, and customer details, and another table called Orders which contains the order ID, customer ID, and the order details. Let's say we also have an ordering application, which needs to delete all the customer's orders if the customer is deleted for GDPR.
Because we are migrating our system to microservices, we decide to create a service called Customers.
So we create a service with the following operation:
DELETE /customers/{customerId} - deletes a customer
We create another service called Orders with the following operations:
GET /orders/customers/{customerId} - gets all the orders for a customer
DELETE /orders/{orderId} - deletes an order
We build a UX screen for deleting a customer. The UX first calls the orders service to get all the orders for the customer. Then it iterates over the list of orders, calling the orders service to delete each order. Then it calls the customers service to delete the customer.
This example is very simplistic, but as you can see, there is no option but to orchestrate the "Delete Customer" operation from the caller, which in this case is the user interface. Of course, what would be a single atomic transaction in a database does not translate to multiple HTTP/s calls, so it is possible that some of the calls may not succeed, leaving the system as a whole in an inconsistent state. In this instance the inconsistency would need to be resolved via some recovery mechanism.
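To make the orchestration concrete, here is a rough caller-side sketch using Java's built-in HttpClient; the base URLs and the JSON parsing are placeholders, and the error handling and recovery mechanism mentioned above are deliberately left out:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Sketch of the caller-side "Delete Customer" orchestration described above.
public class DeleteCustomerOrchestrator {

    private static final String ORDERS = "http://orders-service";       // placeholder base URLs
    private static final String CUSTOMERS = "http://customers-service";

    private final HttpClient http = HttpClient.newHttpClient();

    public void deleteCustomer(String customerId) throws Exception {
        // 1. GET /orders/customers/{customerId} - fetch the customer's orders
        HttpResponse<String> response = http.send(
                HttpRequest.newBuilder(URI.create(ORDERS + "/orders/customers/" + customerId)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        List<String> orderIds = parseOrderIds(response.body());

        // 2. DELETE /orders/{orderId} - delete each order individually
        for (String orderId : orderIds) {
            http.send(HttpRequest.newBuilder(URI.create(ORDERS + "/orders/" + orderId)).DELETE().build(),
                    HttpResponse.BodyHandlers.discarding());
        }

        // 3. DELETE /customers/{customerId} - finally delete the customer itself
        http.send(HttpRequest.newBuilder(URI.create(CUSTOMERS + "/customers/" + customerId)).DELETE().build(),
                HttpResponse.BodyHandlers.discarding());
    }

    private List<String> parseOrderIds(String json) {
        // Placeholder: real code would parse the orders payload here
        return List.of();
    }
}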
In a microservice architecture, we have two options: database per service or a shared database. There are advantages and disadvantages to both patterns. Database per service is the best practice, but when the monolithic application has lots of functions, procedures, or database-specific features at the database level, we can use the shared database approach. I know this is not the best practice; if you have the time and bandwidth, you should go for database per service.
As your concern is cascading across individual databases, you need to remove cascading from the database, implement global transaction handling in your application, and execute all cascade-related queries within that transaction.
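In the shared-database variant, that could look roughly like the following JDBC sketch, where ON DELETE CASCADE has been removed from the schema and both deletes run inside one application-managed transaction (the connection details and table names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: the application, not the database, owns the "cascade".
public class CustomerDeleter {

    public void deleteCustomer(long customerId) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app", "app", "secret")) {
            conn.setAutoCommit(false);
            try (PreparedStatement deleteOrders = conn.prepareStatement("DELETE FROM orders WHERE customer_id = ?");
                 PreparedStatement deleteCustomer = conn.prepareStatement("DELETE FROM customers WHERE id = ?")) {

                deleteOrders.setLong(1, customerId);
                deleteOrders.executeUpdate();

                deleteCustomer.setLong(1, customerId);
                deleteCustomer.executeUpdate();

                conn.commit();           // both deletes succeed or neither does
            } catch (Exception e) {
                conn.rollback();         // keep the shared database consistent
                throw e;
            }
        }
    }
}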

I'm accessing a MongoDB database using the repository pattern. Where should I check for data integrity?

I'm kind of new to MongoDB and NoSQL data design in general.
I'm building a MongoDB database that will have some denormalized data. For example, my "User" documents contain a reference (just the id) to zero or more "Article" documents, and my "Article" documents contain references to zero or more users.
Since I'm using the repository pattern, no part of my Data Access Layer knows about both Articles AND Users. Where in my code should I check to make sure that all my documents are consistent with each other? Should I simply let the DAL's users' code do the checks?
Would it be a good idea to have a Data Integrity Script run once in a while to check if everything is consistent?
Here is Microsoft's write-up on the Repository Pattern. From that document:
Use a repository to separate the logic that retrieves the data and maps it to the entity model from the business logic that acts on the model.
You have a couple of questions:
Where in my code should I check to make sure that all my documents are consistent with each other?
Based on the statement above, I think it's clear that this logic belongs in the Repository. The relation between these objects only exists at the layer of "business logic", the database cannot enforce these types of rules.
Should I simply let the DAL's users' code do the checks?
How could they? As the writer of the repository, you are the DAL user. For MongoDB, the DAL is basically the driver.
You could possibly write a wrapper around the driver that would wrap the multiple writes in some form of transaction. But you would have to write this yourself; MongoDB has no notion of transactions.
Would it be a good idea to have a Data Integrity Script run once in a while to check if everything is consistent?
At the end of the day, whoever writes the repository is going to be responsible for the integrity of the data. Such a script might be useful, but it would definitely suck a lot of CPU cycles.
My suggestion for N:M mappings is to start building some basic blocks for handling the multiple writes that are required to keep these two in sync. One idea is to queue the changes and let a background job make the updates. This way you don't have to worry about multiple writes and roll-backs causing bad data.
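As a rough illustration of that idea, the sketch below queues the "second half" of each N:M write and lets a background worker apply it; the task type and the MongoDB update it stands for are hypothetical, and a real setup would normally use a durable queue rather than an in-memory one:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Queue the pending reference updates and apply them from a background job.
public class ReferenceSyncWorker implements Runnable {

    // A change that still has to be mirrored onto the "other side" of the N:M mapping
    public record LinkReferenceTask(String userId, String articleId) {}

    private final BlockingQueue<LinkReferenceTask> pending = new LinkedBlockingQueue<>();

    public void enqueue(LinkReferenceTask task) {
        pending.add(task); // called by the repository right after the first write succeeds
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                LinkReferenceTask task = pending.take();
                // Hypothetical second write: add the user id to the article document,
                // e.g. articles.updateOne({_id: articleId}, {$addToSet: {userIds: userId}})
                applySecondWrite(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void applySecondWrite(LinkReferenceTask task) {
        // Placeholder for the actual MongoDB driver call
    }
}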

CQRS: Synchronizing the Write and Read databases

Can anyone please give me some direction in regards to the various ways to synchronize the Write and Read databases? What are the different technologies out there, and how do you evaluate each in terms of reliability, performance, cost to implement, etc.?
Typically in CQRS, the write DB is used to store transitional data for long running processes (sagas). If you are synchronizing the read and write DB (I'm assuming you mean both ways), you might be doing something wrong.
For a long-running process where a service expects multiple messages, it needs a way to temporarily store data before all the messages arrive. An example of this is customer registration, where an approval from a manager, which takes a week to process, is required. The service needs a way to temporarily store the customer information before the approval arrives. This is where the write DB is used to store this piece of temporary data. Note that before the customer is approved, nothing is written to the read DB yet.
When the approval finally arrives, the service will take the customer information from the write DB, complete the registration process and write it to the read DB. At this time, the temporary customer information in the write DB has done its job and can be removed from the write DB. Notice that there isn't any two-way sync'ing involved.
For a simpler process, such as changing a customer's first name, the change can be written to the read DB right away. Writing to the write DB is not required because there is no temporary data in this case.
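A bare-bones sketch of that registration flow, with hypothetical WriteStore/ReadStore interfaces standing in for the actual persistence and no framework assumed:

public class CustomerRegistrationService {

    interface WriteStore {                    // temporary data for the in-flight process
        void savePending(String customerId, String customerDetails);
        String loadPending(String customerId);
        void removePending(String customerId);
    }

    interface ReadStore {                     // the query-side model
        void saveCustomer(String customerId, String customerDetails);
    }

    private final WriteStore writeStore;
    private final ReadStore readStore;

    public CustomerRegistrationService(WriteStore writeStore, ReadStore readStore) {
        this.writeStore = writeStore;
        this.readStore = readStore;
    }

    // Message 1: the registration request arrives, approval still pending
    public void onRegistrationRequested(String customerId, String customerDetails) {
        writeStore.savePending(customerId, customerDetails);   // nothing in the read DB yet
    }

    // Message 2: the manager's approval arrives (possibly a week later)
    public void onApprovalReceived(String customerId) {
        String details = writeStore.loadPending(customerId);
        readStore.saveCustomer(customerId, details);            // now the read DB is updated
        writeStore.removePending(customerId);                   // the temporary data has done its job
    }
}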
The query model need not be immediately consistent; it needs to be eventually consistent. The query model is also the view model, i.e. tables are already joined as per the requirements of the user interface. So you can even use an in-memory cache, or something like Redis.
The command side consists of command objects which contain all the relevant information to update the database. These objects may fill up a messaging queue. The command objects are processed by a command processor, which transactionally updates the query cache and the write database. The write database can be an RDBMS, but as is apparent, it should be write-optimized, like MongoDB.
You can update the read database via a messaging system too.
Some good messaging systems for this purpose are RabbitMQ and 0MQ.
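For example, with RabbitMQ's Java client the write side could publish a change event after committing, and a separate consumer would then update the read database; the exchange name and payload below are made up for the example:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

// Publish a change event after the write-database commit; a separate
// consumer picks it up and updates the read database.
public class ReadModelPublisher {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Fanout exchange so several read models can subscribe independently
            channel.exchangeDeclare("customer-events", "fanout");

            String event = "{\"type\":\"FirstNameChanged\",\"customerId\":\"42\",\"firstName\":\"Jane\"}";
            channel.basicPublish("customer-events", "", null, event.getBytes(StandardCharsets.UTF_8));
        }
    }
}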
If you, like me, see the read store as the db that the query service uses (and it's denormalized), and the write db as the database where the domain events are stored, then if you need to sync them to a particular moment, what you can do is just replay the events that you have stored. In case you want to be as up to date as possible, you need not restrict by version.
If you are using CQRS, then probably you will have a repository that looks somewhat like this
public interface IRepository<T> where T : AggregateRoot, new()
{
    void Save(AggregateRoot aggregate, int expectedVersion);
    T GetById(Guid id);
    T GetById(Guid id, int version);
}
Hope this helps
Cheers

Database last updated?

I'm working with SQL 2000 and I need to determine which of these databases are actually being used.
Is there a SQL script I can use to tell me the last time a database was updated? Read? Etc.?
I Googled it, but came up empty.
Edit: the following targets the issue of finding, post facto, the last access date. With regards to figuring out who is using which databases, this can definitively be monitored with the right filters in the SQL Profiler. Beware, however, that profiler traces can get quite big (and hence slow/hard to analyze) when the filters are not adequate.
Changes to the database schema, i.e. the addition of tables, columns, triggers, and other such objects, typically leave "dated" tracks in the system tables/views (I can provide more detail about that if need be).
However, unless the data itself includes timestamps of sorts, there are typically very few sure-fire ways of knowing when data was changed, unless the recovery model involves keeping all such changes in the log. In that case you need some tools to "decompile" the log data...
With regards to detecting "read" activity... A tough one. There may be some computer-forensic like tricks, but again, no easy solution I'm afraid (beyond the ability to see in server activity the very last query for all still active connections; obviously a very transient thing ;-) )
I typically run the profiler if I suspect the database is actually used. If there is no activity, then simply set it to read-only or offline.
You can use a transaction log reader to check when data in a database was last modified.
With SQL 2000, I do not know of a way to know when the data was read.
What you can do is to put a trigger on the login to the database and track when the login is successful and track associated variables to find out who / what application is using the DB.
If your database is fully logged, create a new transaction log backup and check its size. The log backup will have a fixed, small length when no changes were made to the database since the previous transaction log backup was taken, and it will be larger when there were changes.
This is not a very exact method, but it can be easily checked, and might work for you.