Why is a GUID used as the type for the Id fields in the EventSources and Events tables in an EventStore database? - cqrs

Why is uniqueidentifier (which equates to a GUID in .NET) used as the type for the Id fields in the EventSources and Events tables?
Would it not be faster to use an integer type (like bigint in SQL Server) that functioned as an identity, so that the database could assign the Id as the inserts are performed?
I am a complete newb when it comes to Event Sourcing and CQRS, so I apologize if this has been asked and answered and my searching hasn't been correct enough to find the answer.

Note: Answers 2 and 4 assume that you are following a few basic principles of Domain-Driven Design.
1. IDs should be unique across different aggregate types and even across different bounded contexts.
2. Every aggregate must always be in a valid state. Having a unique ID is part of that. This means you couldn't do anything with the aggregate before the initial event has been stored and the database has generated and returned the respective ID.
3. A client that sends the initial command often needs some kind of reference to relate the created aggregate to. How would the client know which ID has been assigned? Remember, commands are void: they don't return anything beyond ack/nack, not even an ID.
4. A domain concern (identification of an entity) would heavily rely on technical implementation details. (This should actually be #1.)
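To make the point about void commands concrete, here is a minimal sketch in Python. The names CreateEventSource and InMemoryEventStore are made up for illustration and do not belong to any real EventStore API; the only point is that the client can generate the GUID itself and already knows the Id before the command is handled.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CreateEventSource:
    """Hypothetical command: the client supplies the aggregate id up front."""
    aggregate_id: uuid.UUID
    name: str


@dataclass
class InMemoryEventStore:
    """Stand-in for the EventSources/Events tables, not a real EventStore API."""
    events: list = field(default_factory=list)

    def handle(self, command: CreateEventSource) -> None:
        # Commands are void: nothing is returned, so the id has to
        # travel with the command instead of coming back from the database.
        self.events.append((command.aggregate_id, "Created", {"name": command.name}))


# The client generates the id itself, without a database round trip.
new_id = uuid.uuid4()
store = InMemoryEventStore()
result = store.handle(CreateEventSource(aggregate_id=new_id, name="example"))
```

With a database-assigned identity column, new_id would not exist until the insert had completed and the value had been returned somehow.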

Ensure consistence for foreignkeys/ownerships in microservices

I have two bounded contexts which lead into two micro services
PersonalManagement
DocumentStorage
I keep the entity model simple here.
PersonalManagement:
Entity/Table Person:
#id - int
tenantId - int
name - string
...
DocumentStorage
Entity/Table Document:
#id - int
tenantId - int
personId - int
dateIssued - string
...
Note that before the application is started, a company (tenant) is chosen to define the company context.
I want to store a new document by using REST/JSON.
This is a POST to /tenants/1/persons/5/documents
with the body
{
"dateIssued" : "2018-06-11"
}
On the backend side - I validate the input body.
One validation might be "if the person specified exists and really belongs to given tenant".
Since this info is stored in the PersonalManagement-MicroService, I need to provide an operation like this:
"Does exists (personId=5,tenantId=1)"
in PersonalManagement to ensure consistency, since the caller might be evil.
Or in general:
What is best practise to check "ownership" of entities cross database in micro services
Another option would be that whenever a new person is created, the (tenantId, personId) pair is additionally(!) stored in DocumentStorage, but I want to avoid this redundancy.
I'm not going to extend this answer into whether your bounded contexts and service endpoints are well defined since your question seems to be simplifying the issue to keep a well defined scope, but regarding your specific question:
What is best practise to check "ownership" of entities cross database in micro services
Microservice architectures strive for a "share nothing" principle, and that usually extends from the code base to the database. So you're right to assume you're checking this constraint "cross-DB" in your scenario.
You have a few options on this particular case, each with their set of drawbacks:
1) Your proposed "Does exists (personId=5,tenantId=1)" call from the DocumentContext to the PersonContext is not wrong in itself, but it creates a direct dependency between these two microservices, so you must ask yourself whether it is acceptable to reject new documents while the PersonManagement microservice is offline.
In specific situations, such dependencies might be acceptable, but the more of them you have, the less your system will behave like a microservice architecture and the more like a "distributed monolith", which in itself is pretty much an anti-pattern.
2) The other main option is to recognize that the DocumentContext is very much interested in some information/behavior relating to People, so it should be fine to model the Person entity inside its boundaries.
That means you can have the DocumentContext subscribe to changes in the PersonContext, so that it knows which People currently exist and what their characteristics are, and can thus keep a local copy of that information.
That way, your validation stays entirely inside the DocumentContext, whose operation will not be hindered by possible issues with the PersonContext, and you will find that your modelling of the document-related entities becomes much cleaner than before.
But in the end, you will also discover that a "share nothing" principle usually will cost you in what seems to be redundancy, but it's actually independence of contexts.
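As a rough illustration of option 2, here is a sketch in Python. The class and handler names (DocumentContext, on_person_created, on_person_deleted) are assumptions made for this example; a real setup would receive these events through whatever message broker the services use.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentContext:
    """Hypothetical DocumentStorage service holding its own copy of person data."""
    known_persons: set = field(default_factory=set)  # (tenant_id, person_id) pairs
    documents: list = field(default_factory=list)

    def on_person_created(self, tenant_id: int, person_id: int) -> None:
        # Called when the PersonContext publishes that a person was created.
        self.known_persons.add((tenant_id, person_id))

    def on_person_deleted(self, tenant_id: int, person_id: int) -> None:
        # Called when the PersonContext publishes that a person was removed.
        self.known_persons.discard((tenant_id, person_id))

    def store_document(self, tenant_id: int, person_id: int, date_issued: str) -> bool:
        # Ownership check against the local copy; no call to the other service.
        if (tenant_id, person_id) not in self.known_persons:
            return False
        self.documents.append(
            {"tenantId": tenant_id, "personId": person_id, "dateIssued": date_issued}
        )
        return True


ctx = DocumentContext()
ctx.on_person_created(tenant_id=1, person_id=5)
accepted = ctx.store_document(1, 5, "2018-06-11")  # POST /tenants/1/persons/5/documents
rejected = ctx.store_document(2, 5, "2018-06-11")  # person 5 does not belong to tenant 2
```

The local copy is only eventually consistent with the PersonContext, which is the price of not depending on it at request time.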
Just for the tenancy check, this can be done using a JWT (a token which can store tenancy information and other metadata).
Let me provide another example of the same scenario which can't be solved with JWT.
Assume a Customer wants to create an Order and our system wants to check whether the customer exists while creating the order.
As the Order and Customer services are separate, and we want minimal dependencies between them, there are multiple solutions to the above problem:
Create the Order in a "validating" state and, on the OrderCreated event, check the customer's validity and update the order's state to "Valid".
Check for the customer before creating the order (which is not the right way, as it creates a dependency; unless it is very critical, do not do it).
Let the order be created; whoever performs the final check of the order before delivery verifies the customer and removes the order if the customer is not valid.
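A minimal sketch of the "validating state" option in Python, assuming it is the order that moves from "validating" to "valid". All names here (OrderState, on_order_created, the customer_exists callback) are hypothetical; the callback stands in for whatever asynchronous check is made against the Customer service.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class OrderState(Enum):
    VALIDATING = "validating"
    VALID = "valid"
    REJECTED = "rejected"


@dataclass
class Order:
    order_id: str
    customer_id: str
    state: OrderState = OrderState.VALIDATING  # every new order starts here


def on_order_created(order: Order, customer_exists: Callable[[str], bool]) -> None:
    """Handler for a hypothetical OrderCreated event.

    It runs after the order has already been accepted, so creating an order
    never has to wait for the Customer service.
    """
    if customer_exists(order.customer_id):
        order.state = OrderState.VALID
    else:
        order.state = OrderState.REJECTED


known_customers = {"c-1"}
good = Order("o-1", "c-1")
bad = Order("o-2", "c-404")
on_order_created(good, known_customers.__contains__)
on_order_created(bad, known_customers.__contains__)
```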

To relate one record to another in MongoDB, is it ok to use a slug?

Let's say we have two models like this:
User:
- _id
- name
- email
Company:
- _id
- name
- slug
Now let's say I need to connect a user to the company. A user can have one company assigned. To do this, I can add a new field called companyID in the user model. But I'm not sending the _id field to the front end. All the requests that come to the API will have the slug only. There are two ways I can do this:
1) Add slug to relate the company: If I do this, I can take the slug sent from a request and directly query for the company.
2) Add the _id of the company: If I do this, I need to first use the slug to query for the company and then use the _id returned to query for the required data.
May I please know which way is the best? Is there any extra benefit when using the _id of a record for the relationship?
Agree with the 2nd approach. There are several issues to consider when deciding on which field to use as a join key (this is true of all DBs, not just Mongo):
The field must be unique. I'm not sure exactly what the 'slug' field in your schema represents, but if there is any chance this could be duplicated, then don't use it.
The field must not change. Strictly speaking, you can change a key field but the only way to safely do so is to simultaneously change it in all the child tables atomically. This is a difficult thing to do reliably because a) you have to know which tables are using the field (maybe some other developer added another table that you're not aware of) b) If you do it one at a time, you'll introduce race conditions c) If any of the updates fail, you'll have inconsistent data and corrupted parent-child links. Some SQL DBs have a cascading-update feature to solve this problem, but Mongo does not. It's a hard enough problem that you really, really don't want to change a key field if you don't have to.
The field must be indexed. Strictly speaking this isn't true, but if you're going to join on it, then you will be running a lot of queries on it, so you'll need to index it.
For these reasons, it's almost always recommended to use a key field that serves solely as a key field, with no actual information stored in it. Plenty of people have been burned using things like Social Security Numbers, driver's licenses, etc. as key fields, either because there can be duplicates (e.g. SSNs can be duplicated if people are using fake numbers, or if they don't have one), or the numbers can change (e.g. driver's licenses).
Plus, by doing so, you can format the key field to optimize for speed of unique generation and indexing. For example, if you use SSNs, you need to check the SSN against the rest of the DB to ensure it's unique. That takes time if you have millions of records. Similarly for slugs, which are text fields that need to be checked against an index. OTOH, MongoDB's default keys are ObjectIds, which are generated much like UUIDs, so creating a new one doesn't require checking the rest of the DB first (the algorithm guarantees a high statistical likelihood of uniqueness).
The bottom line is that there are very good reasons not to use a "real" field as your key if you can help it. Fortunately for you, MongoDB already gives you a great key field which satisfies all the above criteria, the _id field. Therefore, you should use it. Even if slug is not a "real" field and you generate it the exact same way as an _id field, why bother? Why does a record have to have 2 unique identifiers?
The second issue in your situation is that you don't expose the company's _id field to the user. Intuitively, it seems like that should be a valuable piece of information that shouldn't be given out willy-nilly. But the truth is, it has no informational value by itself, because, as stated above, a key should have no actual information. The place to implement security is in the query, ensuring that the user doing the query has permission to access the record / specific fields that she's asking for. Hiding the key is a classic security-by-obscurity that doesn't actually improve security.
The only time to hide your primary key is if you're using a poorly thought-out key that does contain useful information. For example, an invoice Id that increments by 1 for each invoice can be used by someone to figure out how many orders you get in a day. Auto-increment Ids can also be easily guessed (if my invoice is #5, can I snoop on invoice #6?). Fortunately, Mongo's ObjectIds are much harder to guess than a plain counter, so there's very little information leaking out (an ObjectId does embed its creation timestamp, though, and if you're worried about that, you need far more in-depth security considerations than this post :-).
Look at it another way: if a slug reliably points to a specific company and user, then how is it more secure than just using the _id?
That said, there are some instances where exposing a secondary key (like slugs) is helpful, none of which have to do with security. For example, if in the future you need to migrate DB platforms and need to re-generate keys because the new platform can't use your old ones; or if users will be manually typing in identifiers, then it's helpful to give them something easier to remember like slugs. But even in those situations, you can use the slug as a handy identifier for users to use, but in your DB, you should still use the company ID to do the actual join (like in your option #2). Check out this discussion about the pros/cons of exposing _ids to users:
https://softwareengineering.stackexchange.com/questions/218306/why-not-expose-a-primary-key
So my recommendation would be to go ahead and give the user the company Id (along with the slug if you want a human-readable format e.g. for URLs, although mongo _ids can be used in a URL). They can send it back to you to get the user, and you can (after appropriate permission checks) do the join and send back the user data. If you don't want to expose the company Id, then I'd recommend your option #2, which is essentially the same thing except you're adding an additional query to first get the company Id. IMHO, that's a waste of cycles for no real improvement in security, but if there are other considerations, then it's still acceptable. And both of those options are better than using the slug as a primary key.
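Here is a small sketch of option #2 in Python, with plain dicts standing in for the two collections (no MongoDB driver involved; the ids and data are made up). It shows the two-step lookup, and why joining on _id keeps working when the slug changes.

```python
# In-memory stand-ins for the collections; "c1"/"u1" are made-up ids.
companies = {"c1": {"_id": "c1", "name": "Acme", "slug": "acme"}}
users = [{"_id": "u1", "name": "Ann", "email": "ann@example.com", "companyID": "c1"}]


def users_for_company_slug(slug):
    """Option #2: resolve the slug to the company _id, then join on the _id."""
    company = next((c for c in companies.values() if c["slug"] == slug), None)
    if company is None:
        return []
    return [u for u in users if u["companyID"] == company["_id"]]


before = users_for_company_slug("acme")

# Renaming the slug touches exactly one company record; no user record changes.
companies["c1"]["slug"] = "acme-inc"
after = users_for_company_slug("acme-inc")
```

If the users had stored the slug instead, the rename would have required updating every user document as well.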
The second approach is the best, that is, adding the _id of the company.
Using _id is the best practice for querying any kind of information; even complex queries can be solved using _id, as it is a unique ObjectId created by MongoDB. In Mongoose, population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, a plain object, multiple plain objects, or all objects returned from a query.

Relational database schema for event sourcing

I am trying to store domain events in a Postgres database. I am unsure about many things, and I don't want to redesign this structure later, so I am seeking guidance from people who have experience with event sourcing. I currently have the following table:
domain events
version - or event id; an integer sequence that helps to maintain order during replays
type - event type, probably the class name with namespace
aggregate - aggregate id, probably a random string for each aggregate
timestamp - when the event occurred
promoter - the promoter of the event, probably a user id
details - JSON-encoded data about the properties
What I am not sure about:
Should I store the promoter of the domain event?
It could help to find a compromised account after a security breach, but
I don't know what to store for events raised by, for example, a cron job.
In what format should I store the event type?
Should I add a table with event types, or are the class names enough?
Should I add event groups?
I got confused by the definition of bounded contexts. As far as I know, every aggregate can have multiple bounded contexts, so I can use different aspects of a single aggregate in multiple modules. That sounds nice, since for example accounts can be related to many things, including authentication, authorization, user profile, user posts, user contracts, and so on...
What I am unsure about is whether a domain event can have multiple bounded contexts or just a single one, so should I store event contexts as well? (For cases where I want to replay events related to a single context.)
How should I implement so many properties in a single aggregate class? Should I use some kind of composition?
1. Should I store the promoter of the domain event?
I think it's more flexible to store the promoter as part of the event payload instead of the metadata. Security concerns should be handled outside the domain. Not every event is raised by a user, although you could create a fake user for those (e.g. a SysAdmin user for a cron job).
For example:
ManualPaymentMadeEvent { // store this object as details in your schema
    amount,
    by_user // developers can decide case by case whether to store the promoter
}
2. In what format should I store the event type?
Should I add a table with event types, or are the class names enough?
Should I add event groups?
I think class names are enough. Adding another table complicates reading events (it requires a join), and I think it only adds value when a class is renamed (you update one row in the event type table). But renaming without the extra table is not much trouble either, using
update domain_events set
type = 'new class name'
where type = 'original class name';
I'm not sure that I understand event groups, could you add more explanation?
3. What I am unsure about is whether a domain event can have multiple bounded contexts or just a single one, so should I store event contexts as well?
Sometimes events are used to integrate multiple contexts, but each event is raised in only one context. For example, a ManualPaymentMadeEvent is raised in the ordering context, and an event listener in the shipping context also consumes it, treating it as the trigger to start shipping.
I prefer one database user (in Oracle terms) per context: shipping.domain_events for the shipping context and ordering.domain_events for the ordering context.
Here is the schema from axon-framework, which might help:
create table DomainEventEntry (
aggregateIdentifier varchar2(255) not null,
sequenceNumber number(19,0) not null,
type varchar2(255) not null, --aggregate class name
eventIdentifier varchar2(255) not null,
metaData blob,
payload blob not null, -- details
payloadRevision varchar2(255),
payloadType varchar2(255) not null, --event class name
timeStamp varchar2(255) not null
);
alter table DomainEventEntry
add constraint PK_DomainEventEntry primary key (aggregateIdentifier, sequenceNumber, type);
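To show what a key on (aggregate id, sequence number) buys you, here is a sketch in Python using the standard sqlite3 module with an in-memory database. The table and column names are simplified assumptions for this example, not the Axon schema above, and SQLite stands in for Postgres or Oracle. The primary key makes a second writer fail if it tries to append at a sequence number that is already taken.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table domain_events (
        aggregate_id    text    not null,
        sequence_number integer not null,
        event_type      text    not null,  -- event class name
        occurred_at     text    not null,
        details         text    not null,  -- JSON-encoded payload
        primary key (aggregate_id, sequence_number)
    )
    """
)


def append_event(aggregate_id, expected_version, event_type, details):
    """Append one event; return False if that sequence number is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "insert into domain_events values (?, ?, ?, datetime('now'), ?)",
                (aggregate_id, expected_version + 1, event_type, json.dumps(details)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def load_events(aggregate_id):
    """Return (event_type, details) pairs in replay order."""
    rows = conn.execute(
        "select event_type, details from domain_events "
        "where aggregate_id = ? order by sequence_number",
        (aggregate_id,),
    )
    return [(event_type, json.loads(details)) for event_type, details in rows]


first = append_event("order-1", 0, "ManualPaymentMadeEvent", {"amount": 10})
conflict = append_event("order-1", 0, "ManualPaymentMadeEvent", {"amount": 99})
history = load_events("order-1")
```

Here the second append is rejected because both writers assumed the aggregate was still at version 0.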

Should I use ObjectID or uid(implemented by myself) to identify user?

I am new to MongoDB and databases.
Should I implement a function to generate my own uid, or use the built-in ObjectId?
Which is better?
You should leave ObjectID generation to the clients/drivers. This makes sure that generated IDs are unique among many things, such as time, server and process. Using the standard ObjectID also means that methods implemented by drivers (such as getTimestamp()) work.
However, if you are thinking of using your own type of ID for the _id field (i.e., not the standard ObjectID type), then that is a viable choice too. For example, if you want to store information about a Twitter user, then using the user's Twitter ID as the _id value makes perfect sense. Personally, I try to rely on the ObjectID type as little as I have to, as collections will often already have a field that uniquely identifies each document.
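As a side note on why driver methods such as getTimestamp() work with the standard ObjectID: the first 4 bytes of an ObjectID are its creation time in seconds since the Unix epoch. The sketch below decodes that by hand in Python, without any MongoDB driver; object_id_timestamp is a made-up helper name and the id is just a sample value.

```python
from datetime import datetime, timezone


def object_id_timestamp(hex_id: str) -> datetime:
    """Decode the creation time from a 24-character hex ObjectID string."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectID is 12 bytes, i.e. 24 hex characters")
    seconds = int(hex_id[:8], 16)  # first 4 bytes: big-endian seconds since epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = object_id_timestamp("507f1f77bcf86cd799439011")
```

A home-grown uid would only offer this if you built the timestamp into it yourself.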
This depends on three things:
Its source
Where and how you are using the user ID
Personal opinion
My personal opinion is that the ObjectId is good enough. However, getting back to the first and second points:
If this ID comes from, or is to be used in, another database such as an SQL database, you might find an incrementing ID a good idea, although SQL and other technologies can store the ObjectId in its hexadecimal form.
If this ID is something that will be used much like an account number (think of your account number for car insurance when you phone them up), you might find an ObjectId too difficult for your users to remember or recite; a more human-friendly ID might be more suitable here.
So it really depends on how this ID is being used.

RESTful design: when to use sub-resources? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
When designing resource hierarchies, when should one use sub-resources?
I used to believe that when a resource could not exist without another, it should be represented as its sub-resource. I recently ran across this counter-example:
An employee is uniquely identifiable across all companies.
An employee's access control and life-cycle depend on the company.
I modeled this as: /companies/{companyName}/employee/{employeeId}
Notice, I don't need to look up the company in order to locate the employee, so should I? If I do, I'm paying a price to look up information I don't need. If I don't, this URL mistakenly returns HTTP 200:
/companies/{nonExistingName}/employee/{existingId}
How should I represent the fact that a resource belongs to another?
How should I represent the fact that a resource cannot be identified without another?
What relationships are sub-resources meant and not meant to model?
A year later, I ended with the following compromise (for database rows that contain a unique identifier):
Assign all resources a canonical URI at the root (e.g. /companies/{id} and /employees/{id}).
If a resource cannot exist without another, it should be represented as its sub-resource; however, treat the operation as a search engine query. Meaning, instead of carrying out the operation immediately, simply return HTTP 307 ("Temporary redirect") pointing at the canonical URI. This will cause clients to repeat the operation against the canonical URI.
Your specification document should only expose root resources that match your conceptual model (not dependent on implementation details). Implementation details might change (your rows might no longer be uniquely identifiable) but your conceptual model will remain intact. In the above example, you'd tell clients about /companies but not /employees.
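A sketch of the redirect idea in Python, framework-free. The EMPLOYEES dict and the function name are assumptions for illustration: the handler for the sub-resource URI does one ownership check and then answers with HTTP 307 pointing at the canonical URI instead of serving the resource itself.

```python
# Hypothetical data store: employee id -> row, with the owning company on the row.
EMPLOYEES = {"42": {"company": "acme", "name": "Fred"}}


def get_employee_via_company(company_id: str, employee_id: str):
    """Handle GET /companies/{company_id}/employees/{employee_id}.

    Returns a (status, headers) pair. Only the employee row is read; the
    company itself is never looked up.
    """
    employee = EMPLOYEES.get(employee_id)
    if employee is None or employee["company"] != company_id:
        return 404, {}
    # Temporary redirect: the client repeats the same request on the canonical URI.
    return 307, {"Location": f"/employees/{employee_id}"}


found = get_employee_via_company("acme", "42")
wrong_company = get_employee_via_company("nonExistingName", "42")
```

Because the check uses the company id stored on the employee row, a non-existing company name yields 404 rather than a mistaken 200.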
This approach has the following benefits:
It eliminates the need to do unnecessary database look-ups.
It reduces the number of sanity-checks to one per request. At most, I have to check whether an employee belongs to a company, but I no longer have to do two validation checks for /companies/{companyId}/employees/{employeeId}/computers/{computerId}.
It has a mixed impact on database scalability. On the one hand, you are reducing lock contention by locking fewer tables, for a shorter period of time. On the other hand, you are increasing the possibility of deadlocks because each root resource must use a different locking order. I have no idea whether this is a net gain or loss, but I take comfort in the fact that database deadlocks cannot be prevented anyway and the resulting locking rules are simpler to understand and implement. When in doubt, opt for simplicity.
Our conceptual model remains intact. By ensuring that the specification document only exposes our conceptual model, we are free to drop URIs containing implementation details in the future without breaking existing clients. Remember, nothing prevents you from exposing implementation details in intermediate URIs so long as your specification declares their structure as undefined.
This is problematic because it's no longer obvious that a user belongs to a particular company.
Sometimes this may highlight a problem with your domain model. Why does a user belong to a company? If I change companies, am I a whole new person? What if I work for two companies? Am I two different people?
If the answer is yes, then why not take some company-unique identifier to access a user?
e.g. username:
company/foo/user/bar
(where bar is my username that is unique within that specific company namespace)
If the answer is no, then why am I not a user (person) by myself, and the company/users collection merely points to me: <link rel="user" uri="/user/1" /> (note: employee seems to be more appropriate)
Now outside of your specific example, I think that resource-subresource relationships are more appropriate when it comes to use rather than ownership (and that's why you're struggling with the redundancy of identifying a company for a user that implicitly identifies a company).
What I mean by this is that users is actually a sub-resource of a company resource, because the use is to define the relationship between a company and its employees - another way of saying that is: you have to define a company before you can start hiring employees. Likewise, a user (person) has to be defined (born) before you can recruit them.
Your rule for deciding whether a resource should be modeled as a sub-resource is valid. Your problem does not arise from a wrong conceptual model; rather, you have let your database model leak into your REST model.
From a conceptual view, an employee that can only exist within a company relationship is modeled as a composition. The employee can thus only be identified via the company. Now databases come into play, and all employee rows get a unique identifier.
My advice is: don't let the database model leak into your conceptual model, because you would be exposing infrastructure concerns in your API. For example, what happens when you decide to switch to a document-oriented database like MongoDB, where you could model your employees as part of the company document and they would no longer have this artificial unique id? Would you want to change your API?
To answer your extra questions
How should I represent the fact that a resource belongs to another?
Composition via sub resources, other associations via URL links.
How should I represent the fact that a resource cannot be identified without another?
Use both id values in your resource URL and make sure not to let your database leak into your API by checking if the "combination" exists.
What relationships are sub-resources meant and not meant to model?
Sub-resources are well suited for compositions but, more generally speaking, for modeling that a resource cannot exist without the parent resource and always belongs to exactly one parent resource. Your rule that when a resource cannot exist without another, it should be represented as its sub-resource, is good guidance for this decision.
If a subresource is uniquely identifiable without its owning entity, it is not a subresource and should have its own namespace (i.e. /users/{user} rather than /companies/{*}/users/{user}).
Most importantly: never ever ever everer use your entity's database primary key as the resource identifier. That's the most common mistake where implementation details leak to the outside world. You should always have a natural business key (like username or company-number, rather than user-id or company-id). The uniqueness of such a key can be enforced by a unique constraint, if you wish, but the primary key of an entity should never ever everer leave the persistence layer of your application, or at least it should never be an argument to any service method. If you go by this rule, you shouldn't have any trouble distinguishing between compositions (/companies/{company}/users/{user}) and associations (/users/{user}), because if your subresource doesn't have a natural business key that identifies it in a global context, you can be very certain it really is a dependent subresource (or you must first create a business key to make it globally identifiable).
This is one way you can resolve this situation:
/companies/{companyName}/employee/{employeeId} -> returns data about an employee, should also include the person's data
/person/{peopleId} -> returns data about the person
Talking about employee makes no sense without also talking about the company, but talking about the person does make sense even without a company and even if he's hired by multiple companies. A person's existence is independent of whether he's hired by any companies, but an employment's existence does depend on the company.
The issue seems to arise when there is no specific company, yet an employee technically belongs to some company or organization (otherwise they could be called bums or politicians). Being an employee implies a company/organization relationship somewhere, but not a specific one. Also, employees can work for more than one company/organization. When the specific company context is required, your original works: /companies/{companyName}/users/{id}
Let's say you want to know the EmployerContribution for your IRA/RSP/pension; you'd use:
/companies/enron/users/fred/EmployerContribution
You get the specific amount contributed by enron (or $0).
What if you want the EmployerContributions from any or all companies Fred works (or worked) for? You don't need a concrete company for it to make sense: /companies/any/employee/fred/EmployerContribution
Where "any" is obviously an abstraction or placeholder for when the employee's company doesn't matter but being an employee does. You need to intercept the "company" handler to prevent a DB lookup (although I'm not sure why company wouldn't be cached; how many can there be?).
You can even change the abstraction to represent something like all companies for which Fred was employed over the last 10 years.
/companies/last10years/employee/fred/EmployerContribution