Regarding Microservices Fuzzy Boundaries - rest

I am reading Microservices Patterns by Chris. In his book, he gave some example, which I could not able to understand section 5.2.1. The problem with fuzzy boundaries
Here is the link to read online. Can you someone please look into section 5.2.1 and help me understand what exactly the issue with fuzzy boundaries?
I didn't get clearly especially below statement:
In this scenario, Sam reduces the order total by $X and Mary reduces it by $Y. As a result, the Order is no longer valid, even though the application verified that the order still satisfied the order minimum after each consumer’s update
In above statement, can someone please explain me, why Order is no longer valid?

In above statement, can someone please explain me, why Order is no longer valid?
The business problem that Chris Richardson is using in this example assumes that (a) the system should ensure that orders are always valid, and (b) that valid orders exceed some minimum amount.
Minimum amount is determined by a sum of the order_items associated with a specific order.
The "fuzzy boundary" issue comes about because the code in question allows Sam and Mary to manipulate order_items directly; in other words, writing changes to order items does not lock the other items of the order.
If Sam and Mary were forced to acquire a lock on the entire order before validating their changes, then you wouldn't have a problem; the second person would see the changes made by the first.
Alternatively, locking at the level of the order_item would be fine if you weren't trying to ensure that the set of order items satisfy some property. Take away the constraint on the total order cost, and Sam and Mary only need to get locks on their specific item.

Related

CQRS projections, joining data from different aggregates via probe commands

In CQRS when we need to create a custom-tailored projections for our read-models, we usually prefer a "denormalized" projections (assume we are talking about projecting onto a DB). It is not uncommon to have the information need by the application/UI come from different aggregates (possibly from different BCs).
Imagine we need a projected table to contain customer's information together with her full address and that Customer and Address are different aggregates in our system (possibly in different BCs). Meaning that, addresses are generated and maintained independently of customers. Or, in other words, when a new customer is created, there is no guarantee that there will be an AddressCreatedEvent subsequently produced by the system, this event may have already been processed prior to the creation of the customer. All we have at the time of CreateCustomerCommand is an UUID of an existing address.
We have several solutions here.
Enrich CreateCustomerCommand and the subsequent CustomerCreatedEvent to contain full address of the customer (looking up this information on the fly from the UI or the controller). This way the projection handler will just update the table directly upon receiving CustomerCreatedEvent.
Use the addrUuid provided in CustomerCreatedEvent to perform an ad-hoc query in the projection handler to get the missing part of the address information before updating the table.
These are commonly discussed solution to this problem. However, as noted by many others, there are problems with each approach. Enriching events can be difficult to justify as well described by Enrico Massone in this question, for example. Querying other views/projections (kind of JOINs) will work but introduces coupling (see the same link).
I would like describe another method here, which, as I believe, nicely addresses these concerns. I apologize beforehand for not giving a proper credit if this is a known technique. Sincerely, I have not seen it described elsewhere (at least not as explicitly).
"A picture speaks a thousand words", as they say:
The idea is that :
We keep CreateCustomerCommand and CustomerCreatedEvent simple with only addrUuid attribute (no enriching).
In API controller we send two commands to the command handler (aggregates): the first one, as usual, - CreateCustomerCommand to create customer and project customer information together with addrUuid to the table leaving other columns (full address, etc.) empty for time being. (Warning: See the update, we may have concurrency issue here and need to issue the probe command from a Saga.)
Right after this, and after we have obtained custUuid of the newly created customer, we issue a special ProbeAddrressCommand to Address aggregate triggering an AddressProbedEvent which will encapsulate the full state of the address together with the special attribute probeInitiatorUuid which is, of course our custUuid from the previous command.
The projection handler will then act upon AddressProbedEvent by simply filling in the missing pieces of the information in the table looking up the required row by matching the provided probeInitiatorUuid (i.e. custUuid) and addrUuid.
So we have two phases: create Customer and probe for the related Address. They are depicted in the diagram with (1) and (2) correspondingly.
Obviously, we can send as many such "probe" commands (in parallel) as needed by our projection: ProbeBillingCommand, ProbePreferencesCommand, etc. effectively populating or "filling in" the denormalized projection with missing data from each handled "probe" event.
The advantages of this method is that we keep the commands/events in the first phase simple (only UUIDs to other aggregates) all the while avoiding synchronous coupling (joining) of the projections. The whole approach has a nice EDA feeling about it.
My question is then: is this a known technique? Seems like I have not seen this... And what can go wrong with this approach?
I would be more then happy to update this question with any references to other sources which describe this method.
UPDATE 1:
There is one significant flaw with this approach that I can see already: command ProbeAddrressCommand cannot be issued before the projection handler had a chance to process CustomerCreatedEvent. But this is impossible to know from the API gateway (or controller).
The solution would probably involve a Saga, say CustomerAddressJoinProjectionSaga with will start upon receiving CustomerCreatedEvent and which will only then issue ProbeAddrressCommand. The Saga will end upon registering AddressProbedEvent. Or, if many other aggregates are involved in probing, when all such events have been received.
So here is the updated diagram.
UPDATE 2:
As noted by Levi Ramsey (see answer below) my example is rather convoluted with respect to the choice of aggregates. Indeed, Customer and Address are often conceptualized as belonging together (same Aggregate Root). So it is a better illustration of the problem to think of something like Student and Course instead, assuming for the sake of simplicity that there is a straightforward relation between the two: a student is taking a course. This way it is more obvious that Student and Course are independent aggregates (students and courses can be created and maintained at different times and different places in the system).
But the question still remains: how can we obtain a projection containing the full information about a student (full name, etc.) and the courses she is registered for (title, credits, the instructor's full name, prerequisites, etc.) all in the same table, if the UI requires it ?
A couple of thoughts:
I question why address needs to be a separate aggregate much less in a different bounded context, in view of the requirement that customers have an address. If in some other bounded context customer addresses are meaningful (e.g. you want to know "which addresses have more customers" etc.), then that context can subscribe to the events from the customer service.
As an alternative, if there's a particularly strong reason to model addresses separately from customers, why not have the read side prospectively listen for events from the address aggregate and store the latest address for a given address UUID in case there's a customer who ends up with that address. The reliability per unit effort of that approach is likely to be somewhat greater, I would expect.

SQL Database Design - Flag or New Table?

Some of the Users in my database will also be Practitioners.
This could be represented by either:
an is_practitioner flag in the User table
a separate Practitioner table with a user_id column
It isn't clear to me which approach is better.
Advantages of flag:
fewer tables
only one id per user (hence no possibility of confusion, and also no confusion in which id to use in other tables)
flexibility (I don't have to decide whether fields are Practitioner-only or not)
possible speed advantage for finding User-level information for a practitioner (e.g. e-mail address)
Advantages of new table:
no nulls in the User table
clearer as to what information pertains to practitioners only
speed advantage for finding practitioners
In my case specifically, at the moment, practitioner-related information is generally one-to-many (such as the locations they can work in, or the shifts they can work, etc). I would not be at all surprised if it turns I need to store simple attributes for practitioners (i.e., one-to-one).
Questions
Are there any other considerations?
Is either approach superior?
You might want to consider the fact that, someone who is a practitioner today, is something else tomorrow. (And, by that I don't mean, not being a practitioner). Say, a consultant, an author or whatever are the variants in your subject domain, and you might want to keep track of his latest status in the Users table. So it might make sense to have a ProfType field, (Type of Professional practice) or equivalent. This way, you have all the advantages of having a flag, you could keep it as a string field and leave it as a blank string, or fill it with other Prof.Type codes as your requirements grow.
You mention, having a new table, has the advantage for finding practitioners. No, you are better off with a WHERE clause on the users table for that.
Your last paragraph(one-to-many), however, may tilt the whole choice in favour of a separate table. You might also want to consider, likely number of records, likely growth, criticality of complicated queries etc.
I tried to draw two scenarios, with some notes inside the image. It's really only a draft just to help you to "see" the various entities. May be you already done something like it: in this case do not consider my answer please. As Whirl stated in his last paragraph, you should consider other things too.
Personally I would go for a separate table - as long as you can already identify some extra data that make sense only for a Practitioner (e.g.: full professional title, University, Hospital or any other Entity the Practitioner is associated with).
So in case in the future you discover more data that make sense only for the Practitioner and/or identify another distinct "subtype" of User (e.g. Intern) you can just add fields to the Practitioner subtable, or a new Table for the Intern.
It might be advantageous to use a User Type field as suggested by #Whirl Mind above.
I think that this is just one example of having to identify different type of Objects in your DB, and for that I refer to one of my previous answers here: Designing SQL database to represent OO class hierarchy

Optaplanner finds unnecessary conflict for Custom dataset for curriculum example

What I was trying to do is , to test if optaplanner is suitable for our requirements etc.
Thus, I created our own dataset of courses, ~280 courses etc.
I "believe" XML I prepared is valid for sample, since it loads and optaplanner can start solving it.
However, right during CH phase, it finds some (-220) hard constraint violations, specifically for the rule "conflictingLecturesDifferentCourseInSamePeriod".
And for how long it tries, those violations still remain.
Then when I check violations, they are actually not real violations.
It is two different course, in same hours, but in different rooms, and teachers are not same. So there should be no violation for this scenario.
Also actually when I scan schedule by eye, I dont see any conflict.
So, I am lost right now....
Here is a link for XML dataset.
Actually I found the problem, well it is not a problem in first place :)
Maybe rule name is little bit misleading.
Anyway, problem is actually in too crowded curriculums. Like we had 30-40 courses, which makes 80-100 lectures. And for a 45 hours week, it is impossible to fit everything.
And I assume the rule "conflictingLecturesDifferentCourseInSamePeriod", checks "different" courses of same curriculum.
So, when I reduce course counts by splitting curriculumns into 4 for each, violations reduced to 0 .
Believe this will be a valuable info to whom couldnt understand mentioned rule's purpose.
Thanks.

Find First and First Difference in Progress 4GL

I'm not clear about below queries and curious to know what is the different between them even though both retrieves same results. (Database used sports2000).
FOR EACH Customer WHERE State = "NH",
FIRST Order OF Customer:
DISPLAY Customer.Cust-Num NAME Order-Num Order-Date.
END.
FOR EACH Customer WHERE State = "NH":
FIND FIRST Order OF Customer NO-ERROR.
IF AVAILABLE Order THEN
DISPLAY Customer.Cust-Num NAME Order-Num Order-Date.
END.
Please explain me
Regards
Suga
As AquaAlex says your first snippet is a join (the "," part of the syntax makes it a join) and has all of the pros and cons he mentions. There is, however, a significant additional "con" -- the join is being made with FIRST and FOR ... FIRST should never be used.
FOR LAST - Query, giving wrong result
It will eventually bite you in the butt.
FIND FIRST is not much better.
The fundamental problem with both statements is that they imply that there is an order which your desired record is the FIRST instance of. But no part of the statement specifies that order. So in the event that there is more than one record that satisfies the query you have no idea which record you will actually get. That might be ok if the only reason that you are doing this is to probe to see if there is one or more records and you have no intention of actually using the record buffer. But if that is the case then CAN-FIND() would be a better statement to be using.
There is a myth that FIND FIRST is supposedly faster. If you believe this, or know someone who does, I urge you to test it. It is not true. It is true that in the case where FIND returns a large set of records adding FIRST is faster -- but that is not apples to apples. That is throwing away the bushel after randomly grabbing an apple. And if you code like that your apple now has magical properties which will lead to impossible to cure bugs.
OF is also problematic. OF implies a WHERE clause based on the compiler guessing that fields with the same name in both tables and which are part of a unique index can be used to join the tables. That may seem reasonable, and perhaps it is, but it obscures the code and makes the maintenance programmer's job much more difficult. It makes a good demo but should never be used in real life.
Your first statement is a join statement, which means less network traffic. And you will only receive records where both the customer and order record exist so do not need to do any further checks. (MORE EFFICIENT)
The second statement will retrieve each customer and then for each customer found it will do a find on order. Because there may not be an order you need to do an additional statement (If Available) as well. This is a less efficient way to retrieve the records and will result in much more unwanted network traffic and more statements being executed.

Am i violating any NF RULE on my database design?

Good day!
I am a newbie on creating database... I need to create a db for my recruitment web application.
My database schema is as follows:
NOTE: I included the applicant_id on other tables... e.g. exam, interview, exam type.
Am i violating any normalization rule? If i do, what do you recommend to improve my design? Thank you
Overall looks good. A few minor points to consider:
Interviewer is also a person. You will need to use program logic to prevent different / misspellings.
The longest real life email address I've seen was 62 characters.
In exam you use the reserved word date for a column name
(subjective) I would rename applicant_date to applied_at
I don't see a postal / zip code for the applicant
All result columns are VARCHAR(4). If they use the same values, can they be normalized?
Birthdate is better to store then age. You don't want to schedule someone for an interview on their birthdate (or if you're cruel by nature, you do want that :) ). Age can be derived from it and will also be correct at all times.
EDIT:
Given that result is PASS or FAIL, simply declare the field a boolean and name it 'passed'. A lot faster.
One area where I could see a potential problem is the Interviewer being integrated in interview. Also I would like to point at the source channel in applicant, which could potentially get blobbed (depending of what you're going to store in there).
You don't seem to be violating any normalization rules upon first glance. It's not clear from your schema design, however, that the applicant_id is a referencing the applicant table. Make sure you declare it as a foreign key that references the applicant table when actually implementing the scehma.
Not to make any assumptions on your data, but can the result of a screening be stored in 4 characters?
Age and gender are generally illegal questions to ask in interviews so you may not want to record such things. You might want a separate interviewer table. You also might want a separate table that stores qualifications so you can search for people you have interviewed with C# knowledge when the next opening comes up. I'd probably do something like a Qualifications table that is the lookup for quals you want to add to the applicant qualfications table. Then you'd need the qualification id, applicantId, years, skill level in the Applicant Qualification table.
I notice results is a varchar 4 field, I assume you are planning to put Pass/fail in it. I would consider having a numeric score as well. The guy who got 80% of the questions right passed but the guy who got 100% of them right might be the better candidate. In fact for interviews I might have interview questions and results tables. Then you can record the score and any comments about each question which can help later in evaluation of a lot of candidates. We did this manually in paper spreadsheets once when we were interviewing several hundred people (we had over a hundred openings at the time and this was way before personal computers) and found it most helpful to be able to compare answers to questions. It's hard to remember 200 people you interviewed and who said what. It might help later when you have a new opening to find the people who were strong onthe questions most pertinent to the new job who might not have been given a job at the time of the interview(5 excellent candidates, 1 job for instance).
I might also consider a field to mark if the candidate is unaccepatble for ever hiring for some reason. Such as he committed a felony or he lied on the resume and you caught him or he was just totally clueless in the interview. This can make it easy to prevent this person from being considered repeatedly.
I think that your DB structure has a lot of limitation for future usage. For example you can even have a description of the exam because this stable store the score and exam date. It may by that this kind of information are already stored in another system and you have to design only the result container. But even then the exam, screen and interview are just a form of test, that why the information about should be stored in one table and distinguished by some type id. If you decide to this approach you have to create another table to store the information about result
So the definition of that should look more like this:
TEST
TEST_ID
TEST_TYPE_ID ref TEST_TYPE - Table that define the test type
TEST_REQUIRED_SCORE - The value of the score that need to be reach to pass the exam.
... - Many others properties of TEST like duration, expire date, active inactive etc.
APPLICANT_RESULTS
APPLICANT_ID ref APPLICANT
TEST_ID = ref TEST
TESTS_DATE - The day of exam
TEST_START - The time when the test has started
TEST_FINISH - The time when the test has ended
APPLICANT_RESULT - The applicant result of taken test.
This kind of structure is more flexible and give the easy way to specify the requirements between the test in table like this
TEST_REQUIREMENTS - Table that specify the test hierarchy and limitation
TEST_ID ref TEST
REQUIRED_TEST ref TEST
ORDER - the order of exams
Another scenario is that in the future your employer will want to switch to an e-exam system. In that case only think what you will need are:
Create table that will store the question definition (one question can be used in exam, screen or interview)
Crate table that will store the question answers.
Create table that will store the information about the test question.
Create table for storing the answer for each question given from applicant.
A trigger that will update the over all score of test.