PostgreSQL Array Contains vs JOIN (performance)

I have a model in which a person can receive a gift for attending one event, or multiple gifts for attending multiple events. In both cases, the gift or set of gifts given to a person is considered one transaction. I'm using PostgreSQL to implement this model.
For example,
if you attend a certain event, you receive a gift (a single transaction of one gift to one person).
As another example, if you attend a set of events, you receive a set of gifts for those events (in a single transaction of gifts to one person).
So, in the majority of cases, only one gift to one person will be transacted, but there will be a few cases like the second example.
To handle these cases, I have two options:
the first is to use a PostgreSQL array and query with array contains,
and the second is to create a new transaction_events table and join against it to query by event.
I want to know which option is more performant and which one the community recommends, taking into account that most transactions will contain only one event and that I cannot change the transactions model.

The second option will perform better, and it has the added benefit that you can have foreign key constraints to enforce data integrity.
In most cases, it is a good idea to avoid composite types like arrays or JSON in the database.
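A minimal sketch of the second option, assuming a hypothetical schema: the `events`, `transactions` and `transaction_events` columns shown here are illustrative and not taken from the question. SQLite (from the Python standard library) stands in for PostgreSQL so the example is self-contained; the join and the foreign key behaviour are the point, not the engine.

```python
import sqlite3

# SQLite stands in for PostgreSQL; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, person TEXT NOT NULL);
    CREATE TABLE transaction_events (
        transaction_id INTEGER NOT NULL REFERENCES transactions(id),
        event_id       INTEGER NOT NULL REFERENCES events(id),
        PRIMARY KEY (transaction_id, event_id)
    );
    -- Index that serves the "find transactions by event" lookup.
    CREATE INDEX transaction_events_by_event ON transaction_events (event_id);
""")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "opening"), (2, "closing")])
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [(10, "alice"), (11, "bob")])
conn.executemany("INSERT INTO transaction_events VALUES (?, ?)", [(10, 1), (11, 1), (11, 2)])
conn.commit()


def transactions_for_event(event_id):
    """Query transactions by event through the join table."""
    rows = conn.execute(
        """
        SELECT t.id, t.person
        FROM transactions AS t
        JOIN transaction_events AS te ON te.transaction_id = t.id
        WHERE te.event_id = ?
        ORDER BY t.id
        """,
        (event_id,),
    )
    return rows.fetchall()


def link_is_rejected(transaction_id, event_id):
    """Return True if the foreign keys refuse a link to a missing row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO transaction_events VALUES (?, ?)",
                (transaction_id, event_id),
            )
    except sqlite3.IntegrityError:
        return True
    return False
```

Here `transactions_for_event(1)` returns both transactions, while `link_is_rejected(10, 99)` shows the integrity benefit: a link to a non-existent event is refused by the database itself.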

Related

Should I use one projection per entity or category?

I am coding a new application using a CQRS+ES architecture with Event Store DB. In my app, I have the following streams:
user-1
user-2
user-3
...
Each stream contains all events regarding a given user.
I am now creating a projection called user-account, which consists of basic data about my user's account (like first name, email, and others).
What is the optimal way to design that projection?
Should I have a single projection for each user, creating projections called:
user-account-1
user-account-2
user-account-3
...
Or a single projection for all user accounts, as a key-value record (that may store millions of keys in the future)?

You can go with one stream per user. Projections are like dimensions. A user can exist in different "dimensions" (CDC naming) and have a different shape in each.
Read https://www.eventstore.com/blog/the-cost-of-creating-a-stream

First, subscribing to individual streams (aggregate or entity streams) won't ever work. You will end up with thousands of subscriptions that sit there doing nothing (how often do the user details change?).
The category stream is one way to go: you project all the events for all the users. Not only do you need just one subscription for all your users, but you also get more interesting possibilities like "users pending activation" or "blocked users" projections.
I prefer subscribing to $all and applying server-side filtering if necessary. It might have a bit of overhead as you receive more events than you need, but you get so much more power by combining events from different aggregates.
I wrote a little about it in the Eventuous documentation.
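As a rough illustration of the single-projection idea, here is an in-memory sketch. It does not use the Event Store DB client or Eventuous; the event shapes and the type names `UserRegistered`, `EmailChanged` and `UserBlocked` are assumptions made up for the example. The same handler receives events from every user stream, so one subscription can maintain both the account records and a "blocked users" view.

```python
from dataclasses import dataclass, field


@dataclass
class UserAccountProjection:
    """One read model keyed by user id, fed by events from every user stream."""

    accounts: dict = field(default_factory=dict)
    blocked: set = field(default_factory=set)

    def handle(self, event):
        kind = event["type"]            # hypothetical event type names
        user_id = event["user_id"]
        data = event.get("data", {})
        if kind == "UserRegistered":
            self.accounts[user_id] = {
                "first_name": data["first_name"],
                "email": data["email"],
            }
        elif kind == "EmailChanged":
            self.accounts[user_id]["email"] = data["email"]
        elif kind == "UserBlocked":
            self.blocked.add(user_id)
        # Other event types are ignored here; a server-side filter could
        # avoid delivering them in the first place.


def project(events):
    """Replay a sequence of events into a fresh projection."""
    projection = UserAccountProjection()
    for event in events:
        projection.handle(event)
    return projection


events = [
    {"type": "UserRegistered", "user_id": "user-1",
     "data": {"first_name": "Ann", "email": "ann@example.com"}},
    {"type": "UserRegistered", "user_id": "user-2",
     "data": {"first_name": "Bob", "email": "bob@example.com"}},
    {"type": "EmailChanged", "user_id": "user-1",
     "data": {"email": "ann@example.org"}},
    {"type": "UserBlocked", "user_id": "user-2"},
    {"type": "OrderPlaced", "user_id": "user-1"},
]
projection = project(events)
```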

Maintain reference between aggregates

I'm trying to wrap my head around how to maintain id references between two aggregates, e.g. when an event happens on either side that affects the relationship, the other side is updated as well in an eventually consistent manner.
I have two aggregates, one for "Team" and one for "Event", in the context of a festival with the following code:
@Aggregate
public class Event {
    @AggregateIdentifier
    private EventId eventId;
    private Set<TeamId> teams; // List of associated teams
    // ... protected constructor, getters/setters and command handlers ...
}
@Aggregate
public class Team {
    @AggregateIdentifier
    private TeamId teamId;
    private EventId eventId; // Owning event
    // ... protected constructor, getters/setters and command handlers ...
}
A Team must always be associated to an event (through the eventId). An event contains a list of associated teams (through the team id set).
When a team is created (CreateTeamCommand) on the Team aggregate, I would like the TeamId set on the Event aggregate to be updated with the team id of the newly created team.
If the command "DeleteEventCommand" on the Event aggregate is executed, all teams associated to the event should also be deleted.
If a team is moved from one event to another event (MoveTeamToEventCommand) on the Team aggregate, the eventId on the Team aggregate should be updated but the TeamId should be removed from the old Event aggregate and be added to the new Event aggregate.
My current idea was to create a saga where I would run SagaLifecycle.associateWith for both the eventId on the Event aggregate and the teamId on the Team aggregate with a @StartSaga on the "CreateTeamCommand" (essentially the first time the relationship starts) and then have an event handler for every event that affects the relationship. My main issues with this solution are:
1: It would mean I would have a unique saga for each possible combination of team and event. Could this cause trouble performance-wise if it were scaled to e.g. 1 million events with each event having 50 teams? (This is unrealistic for this scenario but relevant for a general solution to maintain relationships between aggregates.)
2: It would require custom commands and event handlers dedicated to updating the team list of the Event aggregate, as the resulting events should not be processed in the saga, to avoid an infinite loop of updating references.
Thank you for reading this small story and I hope someone can either confirm that I'm on the right track or point me in the direction of a proper solution.

An event contains a list of associated teams (through the team id set).
If by "an event" you mean the Event aggregate, I don't believe your Event aggregate needs team ids. If you think it does, it would be great to understand your reasoning.
What I think you need instead is for your read side to know about this. Your read model for a single "Event" can listen to the events that result from CreateTeamCommand and MoveTeamToEventCommand, as well as all other "Event"-related events, and build up the projection accordingly. Remember, don't design your aggregates with querying concerns in mind.
If the command "DeleteEventCommand" on the Event aggregate is executed, all teams associated to the event should also be deleted.
A few things here:
Again, your read side can listen to the resulting event and update the projections accordingly.
You can also perform validation in the relevant command handlers of the Team aggregate to check whether the Event exists before performing the operation. This won't be exactly in sync, but it will cover most cases (see the "How can I verify that a customer ID really exists when I place an order?" section here).
If you really want to delete the associated Team aggregates off the back of a DeleteEventCommand, you need to handle this inside a saga, as there is no way to do it atomically without leaking data storage specifics into your domain model. You therefore need retries and idempotency, which a saga can give you. It's not exactly what you are suggesting, but a related fact is that a single command can't act on a set of aggregates; see the "How can I update a set of aggregates with a single command?" section here.
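The retry and idempotency requirement can be sketched without any framework. The following is not Axon's saga API; `TeamCleanup`, the `DeleteTeam` command shape and the `send_command` callable are all hypothetical names used to show the idea: one command per Team aggregate, with already-handled teams skipped when the same trigger is delivered again.

```python
class TeamCleanup:
    """Reacts to an event deletion by deleting each team with its own command."""

    def __init__(self, teams_by_event, send_command):
        self.teams_by_event = teams_by_event  # read-side lookup: event id -> team ids
        self.send_command = send_command      # callable that may raise on failure
        self.deleted = set()                  # team ids already handled (idempotency)

    def on_event_deleted(self, event_id):
        """Safe to call repeatedly; returns the team ids that are still pending."""
        pending = []
        for team_id in sorted(self.teams_by_event.get(event_id, ())):
            if team_id in self.deleted:
                continue  # a redelivered message must not delete twice
            try:
                self.send_command({"type": "DeleteTeam", "team_id": team_id})
                self.deleted.add(team_id)
            except RuntimeError:
                pending.append(team_id)  # left for the next retry
        return pending


# Usage: team-2 fails once with a simulated temporary error, then succeeds.
sent = []
failures = {"team-2": 1}


def flaky_send(command):
    team_id = command["team_id"]
    if failures.get(team_id, 0) > 0:
        failures[team_id] -= 1
        raise RuntimeError("temporary failure")
    sent.append(team_id)


cleanup = TeamCleanup({"event-1": {"team-1", "team-2"}}, flaky_send)
first_attempt = cleanup.on_event_deleted("event-1")   # team-2 is still pending
second_attempt = cleanup.on_event_deleted("event-1")  # only team-2 is retried
```

Each team is deleted through its own command, so no single command touches more than one aggregate, and calling the handler again after a partial failure does not resend the commands that already succeeded.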

Fetching large documents from mongodb

I have a collection in MongoDB that stores customer activities like product_view, added_to_cart, etc., with the productId. I need this data to display products to my customer on their next visit.
Right now I am thinking of storing all the data for a customer in a single document, with customer_id as the key and the corresponding activities in arrays, e.g. product_view activities in a product_view array. This will be fast to fetch, as all the data for a customer will be under one key, but my concern is that the document size will keep growing this way. Moreover, I may only need to check, say, the last 50-100 activities of a customer, and even for that I need to fetch the entire document.
What is the best way to store this data? Requests for data will be very frequent. How can I manage response time?

Your question answers itself. Keep customer activity in a separate collection with a reference to customerId. Any time a customer visits, you know the customerId, so you can apply filter/aggregate operations to get whatever you want.
This way you can do a paginated fetch of customer activities.
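To make the paginated fetch concrete, here is an in-memory sketch in which a plain list of dicts stands in for the separate activities collection. The field names (`customer_id`, `type`, `product_id`, `ts`) and the helper `recent_activities` are assumptions for illustration. With a real MongoDB driver the same shape would be a filter on the customer id combined with a descending sort and a limit, ideally backed by an index on the customer id and timestamp.

```python
from operator import itemgetter

# In-memory stand-in for a separate "activities" collection: one document per activity.
activities = [
    {"customer_id": "c1", "type": "product_view", "product_id": "p1", "ts": 1},
    {"customer_id": "c1", "type": "added_to_cart", "product_id": "p1", "ts": 2},
    {"customer_id": "c2", "type": "product_view", "product_id": "p9", "ts": 3},
    {"customer_id": "c1", "type": "product_view", "product_id": "p2", "ts": 4},
    {"customer_id": "c1", "type": "product_view", "product_id": "p3", "ts": 5},
]


def recent_activities(docs, customer_id, limit, before_ts=None):
    """Return up to `limit` newest activities of one customer, older than `before_ts`."""
    matching = [
        doc for doc in docs
        if doc["customer_id"] == customer_id
        and (before_ts is None or doc["ts"] < before_ts)
    ]
    matching.sort(key=itemgetter("ts"), reverse=True)
    return matching[:limit]


# First page: the two newest activities; the next page continues from the last ts seen.
page_1 = recent_activities(activities, "c1", limit=2)
page_2 = recent_activities(activities, "c1", limit=2, before_ts=page_1[-1]["ts"])
```

Paging by the last seen timestamp rather than by offset keeps each request bounded, however many activities a customer accumulates.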

How to get a list of aggregates using JOliver's CommonDomain and EventStore?

The repository in CommonDomain only exposes GetById(). So what should I do if my handler needs a list of Customers, for example?

At face value, if you needed to perform operations on multiple aggregates, you would just provide the IDs of each aggregate in your command (which the client would obtain from the query side), and then get each aggregate from the repository.
However, looking at one of your comments in response to another answer, I see that what you are actually referring to is set-based validation.
This very question has raised quite a lot of debate about how to do this, and Greg Young has written a blog post on it.
The classic question is "how do I check that the username hasn't already been used when processing my CreateUserCommand?". I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created, the UserCreatedEvent will be raised and handled by the query side. If the username is already taken, the insert query will fail here (because of either a check or a unique constraint in the DB), and a compensating command would be issued, which would delete the newly created aggregate and perhaps email the user telling them the username is already taken.
The main point is that you assume the client has done the check. I know this approach is difficult to grasp at first, but it's the nature of eventual consistency.
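The flow described above can be sketched as follows. This is not CommonDomain or EventStore code; the `users` table, the `DeleteUser` command shape and the handler name `on_user_created` are assumptions, and SQLite stands in for whatever database backs the query side. The unique constraint does the set-based check, and a failed insert turns into a compensating command.

```python
import sqlite3

# Query-side store; the primary key on username acts as the unique constraint.
read_db = sqlite3.connect(":memory:")
read_db.execute("CREATE TABLE users (username TEXT PRIMARY KEY, user_id TEXT NOT NULL)")

# Stand-in for a command bus: compensating commands are just collected here.
compensating_commands = []


def on_user_created(event):
    """Query-side handler: insert the row, or issue a compensating command."""
    try:
        with read_db:  # commits on success, rolls back on error
            read_db.execute(
                "INSERT INTO users VALUES (?, ?)",
                (event["username"], event["user_id"]),
            )
    except sqlite3.IntegrityError:
        # The username was taken between the client's check and the insert.
        compensating_commands.append(
            {"type": "DeleteUser", "user_id": event["user_id"], "reason": "username taken"}
        )


on_user_created({"user_id": "u1", "username": "sam"})
on_user_created({"user_id": "u2", "username": "sam"})  # duplicate username
```

After the two events, the read model still holds only `u1` for the name, and one compensating command has been queued for `u2`.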
Also, you might want to read this other question, which is similar and contains some wise words from Udi Dahan.

In the classic event sourcing model, queries like "get all customers" would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
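A small sketch of such a query model, assuming hypothetical event shapes (`CustomerCreated` and `CustomerNameChanged`, each carrying `customer_id` and `last_name`). It keeps only last-name to customer-id pairs, so the caller would still go to the repository with the returned ids.

```python
from collections import defaultdict


class LastNameIndex:
    """Query model holding last-name -> customer-id pairs."""

    def __init__(self):
        self.ids_by_last_name = defaultdict(set)
        self.last_name_by_id = {}

    def handle(self, event):
        # Hypothetical event type names; other events are ignored.
        if event["type"] not in ("CustomerCreated", "CustomerNameChanged"):
            return
        customer_id, new_name = event["customer_id"], event["last_name"]
        old_name = self.last_name_by_id.get(customer_id)
        if old_name is not None:
            self.ids_by_last_name[old_name].discard(customer_id)  # drop the stale pair
        self.ids_by_last_name[new_name].add(customer_id)
        self.last_name_by_id[customer_id] = new_name

    def find(self, last_name):
        """Return the ids of customers whose current last name matches."""
        return sorted(self.ids_by_last_name.get(last_name, ()))


index = LastNameIndex()
for event in [
    {"type": "CustomerCreated", "customer_id": "c1", "last_name": "Smith"},
    {"type": "CustomerCreated", "customer_id": "c2", "last_name": "Smith"},
    {"type": "CustomerNameChanged", "customer_id": "c2", "last_name": "Jones"},
]:
    index.handle(event)
```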

You don't need a list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to the user, just build an appropriate view.

Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command, using a view in your read model. This view will be populated with data from the events that your AR emits.

Help with first Core Data project

This is the first project I've encountered where I can't get by on NSUserDefaults peppered with some NSCoding. I've been asked to write some POS software.
Essentially, the App needs to store a bunch of products, prices and sales accounts. The user should be able to add items and accounts, and track the balance of accounts over time. The balance of an account should be able to be carried over from one "Session" (time period) to the next.
I'm comfortable with the concepts, but I'd like to be confident that I'm modeling this right. Here's how I've modeled my data. I'd like to know if I've done this properly or if there are any glaring errors/omissions.
I've created an "Account" Entity, which has the following properties:
First Name
Last Name
Account ID
Group
There is a relationship to the transaction entity.
I've created an entity for each Session. Again, a session is just like a fiscal month. The session will have a custom name and an ID.
Session ID
Session Name
There is a relationship to all of the accounts that are applied to that session.
There are, of course, products, which have a name and ID. There is also a relationship to the "price" object, so I can change the prices without affecting balances.
Please see this screenshot from Xcode 4 which explains my model in its entirety:
Edit:
Looking at this, it seems that I'm missing some important info, such as dates of transactions etc. That said, am I on the right track?

In my experience, point-of-sale systems store all the data needed to recreate a receipt in three tables: a header (think date of sale, the single tracking entity), a set of records for all items being sold (linking back to the sale header), and a set of records for all the methods of payment (again linked back to the sale header).
This will give you the opportunity to rebuild the individual transactions in the future. This is a simplistic model, but it should suffice for what you're asking. Nominally you would also keep track of discounts applied per line item, per invoice, per group, and so on.
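The header, line item and payment layout might look like this as plain data classes. This is a language-neutral sketch rather than a Core Data model; the class and field names are hypothetical, and storing amounts as integer cents is an assumption of the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LineItem:
    """One item sold; the price is copied so later price changes don't alter the receipt."""
    product_id: str
    quantity: int
    unit_price_cents: int
    discount_cents: int = 0  # per-line-item discount

    def total_cents(self):
        return self.quantity * self.unit_price_cents - self.discount_cents


@dataclass
class Payment:
    """One method of payment applied to a sale."""
    method: str
    amount_cents: int


@dataclass
class Sale:
    """Header record that line items and payments link back to."""
    sale_id: str
    sold_on: date
    items: list = field(default_factory=list)
    payments: list = field(default_factory=list)

    def total_cents(self):
        return sum(item.total_cents() for item in self.items)

    def balance_cents(self):
        """Amount still owed after all recorded payments."""
        return self.total_cents() - sum(p.amount_cents for p in self.payments)


sale = Sale(
    sale_id="s-1",
    sold_on=date(2011, 5, 1),
    items=[
        LineItem("coffee", quantity=2, unit_price_cents=250),
        LineItem("muffin", quantity=1, unit_price_cents=300, discount_cents=50),
    ],
    payments=[Payment("cash", 500), Payment("card", 250)],
)
```

Because each line item carries the price at the time of sale, the receipt can be rebuilt later even if the product's current price has changed.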
What's the relationship between sessions and transactions?

You probably don't need to have an entity for price, as it will likely just be a float. I'd recommend adding a price attribute to your product entity instead.
I don't know if transactions will need names or not, I suppose if you want to have notes then they should.
Also transactions should probably have a to-many relationship with products.
Will this be used on a single device or will there be many users? If each user (account) is responsible for its own data then it may make more sense to have transactions/session rather than transactions/user.
