In REST - revertable DELETE, a nice introduction on how to model state changes in REST was given. Basically, if you have a resource with a status field, you just PUT a new version of that resource with an updated status field.
In this topic, I would like to extend this model. Say you have a resource which can be in two states: 1 and 2. In contrast with the simple model as described in the cited post, there are three transitions to traverse from state 1 to state 2, instead of just one.
My question is: how would you model state transitions in REST?
I myself can only come up with an RPC-like POST, which probably isn't very RESTian:
POST http://server/api/x
target_state=2&transition=3
This changes resource x from state 1 to state 2 by using transition 3.
This isn't really limited to REST; it's more a basic question about state machines. The state machine should encapsulate all of the state, so that if you find yourself creating multiple transitions from one state to another, and the difference is significant, then that difference must also be captured in the state.
For example, say I have no home. I can move from the "homeless" state to the "home" state in three ways: I can rent one, I can buy one, I can steal one. Three transitions from "homeless" to "home"? Not in the world of machines. Either the differences are significant, or they're not. If they're not, then to the machine there's no point in making a distinction at all. Merely PUT the new status of "home". If they are, then we need to expand our concept of state to encompass the differences. For example:
          homeless
           A A A
          /  |  \
         V   |   V
possessor <--|--- renter
         A   |   /
          \  |  /
           V V V
           owner
I can move from homeless to possessor by stealing a house. If I squat on it long enough, I might become the owner. Or I could be homeless and rent a house, or even rent-to-own. Or I could buy one outright. All three paths get me to the "owner" state, but they use intermediate states to represent the significantly different transitions.
If you then want to represent "homeless" vs. "in a home" (possessor|renter|owner), there's no problem exposing that as a read-only, calculated resource in its own right. Just don't let anyone PUT/POST to it, since the transition is ambiguous. There's also no problem keeping the history of state transitions as an additional set of resources, if that state is significant.
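As a rough illustration of the idea above, here is a minimal sketch in Python. The transition table is only my reading of the diagram, and the names put_status and in_a_home are made up for the example; they are not part of any framework.

```python
# Hypothetical transition table: each significant way of getting a home is
# its own state, so a (current, requested) pair is never ambiguous.
# The exact arrows are an assumption based on the diagram above.
ALLOWED = {
    "homeless": {"possessor", "renter", "owner"},
    "possessor": {"homeless", "owner"},
    "renter": {"homeless", "possessor", "owner"},
    "owner": {"homeless", "possessor"},
}


def put_status(current: str, requested: str) -> str:
    """Handle a PUT of a new status and return the status to store."""
    if requested == current:
        return current  # repeating the same PUT changes nothing
    if requested not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {requested!r}")
    return requested


def in_a_home(status: str) -> bool:
    """Read-only, calculated view; clients never PUT/POST to this."""
    return status in {"possessor", "renter", "owner"}
```

The client only ever PUTs the target status; the server decides whether that single step is legal from the current state.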
Related
This is a relatively subjective question, but I want to get other people's opinions nonetheless.
I am designing a REST API that will be accessed by internal systems (a couple of client apps at most).
In general the API needs to update parameters of different car brands. Each car brand has around 20 properties, some of which are shared between all car brands, and some of which are specific to each brand.
I am wondering what is a better approach to the design for the endpoints of this API.
Should I use a single endpoint that takes in a string (a JSON document with all the properties of the car brand, along with the ID of the car brand)?
Or should I provide a separate endpoint per car brand, with a body that has exactly the properties necessary for that car brand?
So in the first approach I have a single endpoint with a string parameter that I expect to be JSON with all the necessary values:
PUT /api/v1/carBrands/
Whereas in the second approach I have an endpoint per type of car brand, and each endpoint has a typed DTO object representing all the values it needs:
PUT /api/v1/carBrand/1
PUT /api/v1/carBrand/2
.
.
.
PUT /api/v1/carBrand/n
The first approach seems to save a lot of repetitive code; after all, the only difference is the set of parameters. However, since it accepts an arbitrary string, there is no way for the end user to know what he should pass. He will need someone to tell him and/or read it in the documentation.
The second approach is a lot more readable, and anyone can fill in the data, since they know what it is. But it involves mostly replicating the same code around 20 times.
It's really hard for me to pick an option, since both approaches have their drawbacks. How should I judge what's the better option?
I am wondering what is a better approach to the design for the endpoints of this API.
Based on your examples, it looks as though you are asking about resource design, and in particular whether you should use one large resource, or a family of smaller ones.
REST doesn't answer that question... not directly, anyway. What REST does do is identify that caching granularity is at the resource level. If there are two pieces of information, and you want the invalidation of one to also invalidate the other, then those pieces of information should be part of the same resource, which is to say they should be accessed using the same URI.
If that's not what you want, then you should probably be leaning toward using separated resources.
I wouldn't necessarily expect that making edits to Ford should force the invalidation of my local copy of Ferrari, so that suggests that I may want to treat them as two different resources, rather than two sub-resources.
Compare
/api/v1/carBrands#Ford
/api/v1/carBrands#Ferrari
with
/api/v1/carBrands/Ford
/api/v1/carBrands/Ferrari
In the former case, I've got one resource in my cache (/api/v1/carBrands); any changes I make to it invalidate the entire resource. In the latter case, I've got two resources cached; changing one ignores the other.
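To make the difference concrete, here is a small sketch of a cache keyed the way an HTTP cache sees things: by the URI without its fragment. ResourceCache is a made-up toy class, not a real HTTP cache implementation; only urldefrag comes from the standard library.

```python
from urllib.parse import urldefrag


class ResourceCache:
    """Toy cache: one entry per resource, identified by URI minus fragment."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def key(uri: str) -> str:
        # The fragment (#Ford) identifies a part of the representation,
        # so it is dropped when identifying the cached resource.
        return urldefrag(uri).url

    def store(self, uri: str, representation) -> None:
        self._entries[self.key(uri)] = representation

    def invalidate(self, uri: str) -> None:
        self._entries.pop(self.key(uri), None)

    def __contains__(self, uri: str) -> bool:
        return self.key(uri) in self._entries


fragments = ResourceCache()
fragments.store("/api/v1/carBrands#Ford", "all brands")
fragments.invalidate("/api/v1/carBrands#Ford")  # Ferrari is gone too

paths = ResourceCache()
paths.store("/api/v1/carBrands/Ford", "ford")
paths.store("/api/v1/carBrands/Ferrari", "ferrari")
paths.invalidate("/api/v1/carBrands/Ford")  # Ferrari stays cached
```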
It's not wrong to use one or the other; both are fine, and have plenty of history. They make different trade-offs; one or the other may be a better fit for the problem you are trying to solve today.
I was recently recommended a talk by Jim Webber.
And there was a very interesting point in there.
Jim says that people tend to think there is a 1-1 correspondence between rows in your database, domain objects and resources in a REST service. This makes it hard when you want to transact work across arbitrary groups.
Now he goes on to point out that if you have, say, 3 users and want to update them, you do them sequentially, and that is very poor because you have to track each of them and handle issues if 1 out of the 3 (or however many transactions you have) fails.
He mentioned the way you should handle this is to make a resource, for all of the 3 users. Resources are cheap and infinite (you can make as many as you want) so use them. So create that resource and in a single operation put their status update.
This is an extremely interesting point to me, as there have been times where I have wanted to perform an operation on multiple things that I considered to be singular.
So here is an example:
Say I have a list of users. Say 100. Users would be their own thing/resource. I want to pick x amount of users out of that list (say 10 randomly) and apply 50 points to them.
I want to apply these points to users that have no unique connection in the domain; they are just a random group of users, an arbitrary group.
How would I create a rest endpoint/resource as Jim Webber is implying to handle this operation?
Now, in my admittedly old frame of mind, I would go about it by making a specific resource like users/points/bulk/ (or something) and passing in a list of user IDs and the points I would apply to them. I would never have had the mindset of treating them as a resource; I would have just had a hacky command REST endpoint to perform it.
This point Jim has pointed out is really something I never considered and is such a change of mindset, that it would really make things cleaner.
Could someone explain this to me and give an example of how it would look?
Thanks
He mentioned the way you should handle this is to make a resource, for all of the 3 users. Resources are cheap and infinite (you can make as many as you want) so use them. So create that resource and in a single operation put their status update.
...
How would I create a rest endpoint/resource as Jim Webber is implying to handle this operation?
The basic rule of thumb here is: how would you do it on the Web? As REST is just a generalization of the interaction model that allowed the Web to grow to its present size, the same concepts that proved successful on the Web can (and should) be used in a REST architecture.
What is a group of resources, actually? If you think about most sports that are played in teams, such as football, almost all players can be divided into certain groups, e.g. players of Team A and players of Team B, or all defensive players and all attacking players. Each of the players is its own resource, but each of the available groups is its own resource as well, since we can give it a name too. We can then talk about the group instead of the individual players, which allows us to refer to all of them within a single, short statement instead of referencing each player individually. A statement such as "Team A beat the crap out of Team B" will most likely imply that each of the players on Team A played better than their counterparts on the opposing team.
It is now only a matter of providing clients with the toolset to group resources together. In a typical HTML page you could, for example, have a table representation of all the active football players of this season across all teams, with a checkbox to select certain players and some control element, such as a submit button, that allows you to create a group for the selected players. The backing HTML form contains not only the actual data set you can select specific players from and a submit button, but also a target URI the request has to be sent to as well as the request method to use. HTML by default uses application/x-www-form-urlencoded as the representation format to send the data to the server, which knows how to process the data based on the invoked endpoint, the HTTP operation used and the media type received.
As a new resource will be created as a consequence of the previous grouping request, the server will respond with a 201 Created response code and a Location HTTP header whose value is a URI pointing to where the newly created grouping is accessible. A client may now get redirected to that URI automatically, or it can use the returned URI to invoke further operations on that resource. As the domain model does not need to (and probably should not) match a resource or affordance model, each of the individual player resources as well as the team resource may use the same database entries to present the data to the client. On updating one resource (either an individual player or the team as a whole), other resources may be affected by this operation as well.
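A minimal server-side sketch of that flow could look like the following. Everything here is an assumption for illustration: the /groups URI, the "player" form field, the in-memory dictionaries and the handler names are invented, and a real service would sit behind an actual HTTP framework.

```python
import itertools
from urllib.parse import parse_qs

_groups = {}                  # group id -> list of player ids
_ids = itertools.count(1)     # hypothetical id generator


def create_group(form_body: str):
    """Handle a form POST (application/x-www-form-urlencoded) to /groups."""
    players = parse_qs(form_body).get("player", [])
    if not players:
        return 400, {}
    group_id = next(_ids)
    _groups[group_id] = players
    # 201 Created plus a Location header pointing at the new group resource
    return 201, {"Location": f"/groups/{group_id}"}


def post_points_to_group(group_id: int, points: int, scores: dict) -> int:
    """One request against the group resource updates every member."""
    members = _groups[group_id]
    updated = {m: scores.get(m, 0) + points for m in members}  # compute first
    scores.update(updated)                                     # then apply together
    return 204


status, headers = create_group("player=7&player=12")
scores = {"7": 10}
post_points_to_group(int(headers["Location"].rsplit("/", 1)[1]), 50, scores)
```

The individual user resources and the group resource are backed by the same data, so updating the group shows up on each user as a side effect.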
If you take a look at the definition of PUT in the HTTP specification, you can read something like this:
A PUT request applied to the target resource can have side effects on other resources.
Due to this side effect, it is possible for an update performed via PUT to achieve something similar to a partial update:
Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in RFC5789).
For example, if you update Player 1 of Team A via PUT, it creates, as a side effect, a partial update of the state of Team A, as the team resource just uses the same data the data model provides for that particular player.
In order to achieve the same functionality in a REST architecture, the same concepts should be used, as mentioned before: provide the client with structured data from which it can select a subset, and let it perform operations on that subset, such as creating a new resource for the selected elements. In contrast to the Web, where HTML is dominant, the supported media types may vary drastically in a REST architecture. Here, content-type negotiation is a very important part, as it allows the server to choose the most suitable representation format that is supported by the client. Instead of proprietary representation formats, standardized formats should be used to increase the likelihood that clients not under your control are able to interact with your system. While there is an ongoing effort to introduce media types that support clients with forms similar to the ones used in HTML, there is no widely accepted de facto standard form representation yet, except for HTML. There are a couple of mostly JSON-based approaches in the works, such as hal-forms, halo+json, Ion or Hydra, though, as mentioned, nothing that is really used widely in production.
As your actual intention is to update a bunch of resources atomically, you could use PATCH here as well, without the need to create new resources, as PATCH is defined to perform all of the instructions atomically: either all succeed or none at all. In the spec, PATCH is defined similarly to how patching is understood in software engineering, by having a sequence of instructions that should be applied to a resource to transform it to a desired output. application/json-patch+json is a representation format that is quite close to that definition, whereas application/merge-patch+json has a totally different take on it by defining default rules to apply, depending on whether the request contained a modified or nullified field value. As the latter representation format is only able to work on a single resource, the former could be used for a batch update. By targeting the collection resource directly, JSON Pointers can be used to address the respective fields of the sub-resources in that collection directly.
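To give a feel for such a batch update, here is a deliberately tiny sketch that applies a JSON Patch document against a collection, all or nothing. It only handles the "test" and "replace" operations and is not a complete RFC 6902 implementation; the collection layout and the apply_patch name are assumptions for the example. Note that JSON Patch has no "add 50" operation, so the client sends the new values it has calculated.

```python
import copy


def _resolve(doc, pointer: str):
    """Return (parent, key) for a JSON Pointer such as '/users/7/points'."""
    tokens = [t.replace("~1", "/").replace("~0", "~")
              for t in pointer.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    return parent, (int(last) if isinstance(parent, list) else last)


def apply_patch(doc, operations):
    """Apply every operation to a copy; any failure leaves doc untouched."""
    working = copy.deepcopy(doc)
    for op in operations:
        parent, key = _resolve(working, op["path"])
        if op["op"] == "test":
            if parent[key] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        elif op["op"] == "replace":
            _ = parent[key]            # the target must already exist
            parent[key] = op["value"]
        else:
            raise ValueError(f"unsupported op {op['op']!r} in this sketch")
    return working


collection = {"users": {"7": {"points": 10}, "12": {"points": 0}}}
patch = [
    {"op": "test", "path": "/users/7/points", "value": 10},
    {"op": "replace", "path": "/users/7/points", "value": 60},
    {"op": "replace", "path": "/users/12/points", "value": 50},
]
patched = apply_patch(collection, patch)
```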
To avoid data loss via PATCH operations, due to intermediate updates between fetching the most recent state, calculating the necessary steps to apply and sending the request to the API, an optimistic locking approach should be used, which is achievable via conditional requests, e.g. by sending the ETag of the fetched state in an If-Match header.
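A sketch of what the server-side check could look like, assuming the ETag is simply a hash of the stored representation. The etag_of and conditional_update helpers are hypothetical names; real servers are free to generate ETags however they like.

```python
import hashlib
import json


def etag_of(representation) -> str:
    """Derive a validator from the current state (one possible scheme)."""
    body = json.dumps(representation, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_update(store: dict, uri: str, if_match: str, new_state) -> int:
    """Apply the update only if the client saw the latest state."""
    if if_match != etag_of(store[uri]):
        return 412  # Precondition Failed: someone else changed it meanwhile
    store[uri] = new_state
    return 204


store = {"/users": {"7": 10}}
seen = etag_of(store["/users"])                              # fetched with GET
first = conditional_update(store, "/users", seen, {"7": 60})
second = conditional_update(store, "/users", seen, {"7": 110})  # stale ETag
```

The second request is rejected because it was calculated against a state that no longer exists, so the client has to fetch again and recompute.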
While patching provides you with the capability to apply the changes atomically, I feel that grouping resources together, if they naturally form a group, such as in the player - team example, is more common and also makes better use of the interaction model proposed by REST, IMO.
I've been using Spec Explorer for about a month now on a big project, and it's been going well besides one thing.
Sometimes new states are being generated instead of looping, for example:
- Create object, new state
- Do something with object, new state
- Do something that changes nothing (trying to create same object, does not change any state variables) here I get a new state instead of looping
Most of the time it loops, like it should, but sometimes it does not, and there is absolutely no difference in the state comparison view except for the two top lines, which only cover the description of how the state came to be.
Has anyone had similar problems, or does anyone know what's going on?
There are several possible reasons.
But in most cases the problem is: scenarios introduce control states.
Here is the deepest explanation you can get on "How are identical states identified?":
"Ideally, we would identify two states when they
(a) have the same state contents, and
(b) have the same future behavior.
The reason why (a) is not enough is that enabled actions don’t depend on the state contents only, but also potentially on scenarios applied in a Cord script. Scenarios introduce control states.
The problem here is that checking (b) is not feasible in practice, as it would imply looking ahead all paths stemming from a state.
So we rely on a heuristic, consisting on identifying states that not only have the same contents, but are also produced in the same step of a scenario.
So two states are equivalent if they contain the same data AND can perform the same actions.
For example, in a scenario such as A; A; B*, we have three states, all with the same (empty) contents.
When we compose this scenario in parallel with a model program, states corresponding to these three states will not be merged, regardless of their contents.
As a consequence, when you are comparing two states to understand why they are not merged, you should not just look at the values of their variables (data state), but also at the state description, which provides the control state.
States which have been generated by different machines using the Spec Explorer heuristic cannot safely be considered a single state.
As said, this is just a safe heuristic. So there’s no guarantee that two conceptually-equivalent states will always be merged;
but two states that are not conceptually equivalent should never be merged."
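To illustrate the heuristic described in the quote, here is a toy sketch; it is not Spec Explorer's code, and state_key and explore are invented names. The assumption is simply that a state is identified by its data contents together with the scenario step that produced it.

```python
def state_key(data: dict, scenario_step: int):
    """Identity = data contents AND control state (values must be hashable)."""
    return (frozenset(data.items()), scenario_step)


def explore(visits):
    """Number each distinct state in the order it is first seen."""
    seen = {}
    for data, step in visits:
        seen.setdefault(state_key(data, step), len(seen))
    return seen


same_step = explore([({"created": True}, 2), ({"created": True}, 2)])
different_step = explore([({"created": True}, 2), ({"created": True}, 3)])
```

With identical data, the first exploration loops back to one state while the second yields two, which matches the symptom in the question: the variables are equal but the state description differs.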
Are births and deaths modeled as events for a person in a genealogy profile or as attributes of the person? What are the pros and cons of each approach?
If you consider that every event has artifacts to go with it, they really should be events, so you can have all of the documents, etc. associated with them.
On the other hand, can you imagine a person record that doesn't have birth/death dates as attributes? You wouldn't want to have to do a join with the events that give you birth/death just so you could sort by those dates.
So there are the pros and cons, but there's also the idea that you can have both. If you are willing to live with a database that isn't completely normalized, you can have them as events and, for each person with birth/death events, copy those values into the attributes.
Keep in mind, of course, that you could have multiple birth/death events for a person, records that might be in conflict, in which case only the one that the user has indicated is meant to be the person's birth/death attribute would be copied.
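A small sketch of the "have both" approach, with invented class and field names: events carry the sources, and the preferred birth/death event is copied into plain attributes on the person.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    kind: str                     # "birth" or "death" in this sketch
    date: str                     # ISO date string, e.g. "1901-05-12"
    sources: list = field(default_factory=list)
    preferred: bool = False       # set by the user when records conflict


@dataclass
class Person:
    name: str
    events: list = field(default_factory=list)
    birth_date: Optional[str] = None   # denormalized copies for easy sorting
    death_date: Optional[str] = None

    def add_event(self, event: Event) -> None:
        self.events.append(event)
        self._refresh()

    def _refresh(self) -> None:
        for kind, attr in (("birth", "birth_date"), ("death", "death_date")):
            chosen = next((e for e in self.events
                           if e.kind == kind and e.preferred), None)
            setattr(self, attr, chosen.date if chosen else None)


ann = Person("Ann")
ann.add_event(Event("birth", "1901-05-02", ["parish register"]))
ann.add_event(Event("birth", "1901-05-12", ["census"], preferred=True))
```

Both conflicting records stay attached to the person, but only the one marked as preferred ends up in birth_date.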
"Events" in genealogy (and in genealogy software) are generally considered something that takes place at a given time and place. They can be events for an individual, e.g. Birth, death, baptism, naturalization, emigration, etc., or for a family (husband/wife), e.g. Marriage, Engagement, Divorce.
"Attributes" (or "facts") are generally considered to be something that is true, e.g. Scholastic achievement, Tribal Origin, Occupation, Religious Affiliation, Title.
These are how GEDCOM defines them and how they try to get programmers to program them.
Personally, my concept of an "event" is a transition, a change of state, e.g. going from before someone was born to once they are alive. It need not be a short period of time, but may take a long time, e.g. World War II was an event. And events can contain other events (e.g. the specific battles in World War II).
One more example is hair color, which is considered an attribute. But someone can be born with blond hair, have it fall out and be replaced with brown hair, and then as they get older it turns grey before falling out again. Hair colors are attributes that are true over a certain time, and are "fuzzy" as the event happens that changes them from one to another.
My concept of an "attribute" is that it has a time period attached to it. The attribute is the state, which can be changed by events, e.g. "Occupation" changes with the "getting fired" event and "Unemployed" takes over until the "getting hired" event occurs.
So attributes are between events, and events separate different attributes.
What I am basically saying is that in my genealogy program, I really don't make a distinction between events and attributes. I treat them the same. Either may include a date or time period and events usually include a place and attributes usually don't.
Because of their similarities, I don't see any need to model them separately.
I have an entity in my domain that represents a city electrical network. Currently my model is an entity with a List that contains breakers, transformers, and lines.
The network changes every time a breaker is opened/closed, the user can change connections, etc.
In all examples of CQRS the EventStore is queried with Version and aggregateId.
Do you think I have to implement events only for the "network" aggregate or also for every "Connectable" item?
In this case, when I have to replay all events to get the "actual" status (based on a date), I can have nearly 10000-20000 events to process.
Should an event modify one property, or do I need an event that modifies an object (containing all properties of the object)?
There's always an exception to the rule, but I think you need to have an event for every command handled in your domain. You can get around the problem of processing so many events by making use of snapshots.
http://thinkbeforecoding.com/post/2010/02/25/Event-Sourcing-and-CQRS-Snapshots
I assume you mean that currently your "connectable items" are part of the "network" aggregate and you are asking if they should be their own aggregates? That really depends on the nature of your system and problem, and is more of a DDD issue than simply a CQRS one. However, if the nature of your changes is typically to operate on the items independently of one another, then they should probably be aggregate roots themselves. Regardless, in order to answer that question we would need to know much more about the system you are modeling.
As for the challenge of replaying thousands of events, you certainly do not have to replay all your events for each command. Sure snapshotting is an option, but even better is caching the aggregate root objects in memory after they are first loaded to ensure that you do not have to source from events with each command (unless the system crashes, in which case you can rely on snapshots for quicker recovery though you may not need them with caching since you only pay the penalty of loading once).
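As a rough sketch of the snapshot idea (the event names, the Network class and the load function are all invented for illustration, and the event store is just a Python list): an aggregate is rebuilt from the latest snapshot plus only the events recorded after it.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    version: int = 0                               # number of events applied
    breakers: dict = field(default_factory=dict)   # breaker id -> state

    def apply(self, event) -> None:
        kind, breaker_id = event
        self.breakers[breaker_id] = "open" if kind == "BreakerOpened" else "closed"
        self.version += 1


def load(events, snapshot=None) -> Network:
    """Rebuild the aggregate, skipping events already covered by the snapshot."""
    if snapshot is None:
        network = Network()
    else:
        network = Network(snapshot.version, dict(snapshot.breakers))
    for event in events[network.version:]:
        network.apply(event)
    return network


events = [
    ("BreakerClosed", "b1"), ("BreakerOpened", "b2"), ("BreakerOpened", "b1"),
    ("BreakerClosed", "b2"), ("BreakerClosed", "b3"),
]
snapshot = load(events[:3])            # taken after the third event
current = load(events, snapshot)       # replays only the last two events
```

Keeping the loaded aggregate in memory between commands, as suggested above, would avoid even this shortened replay for most requests.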
Now if you are distributing this system across multiple hosts or threads there are some other issues to consider but I think that discussion is best left for another question or the forums.
Finally you asked (I think) can an event modify more than one property of the state of an object? Yes if that is what makes sense based on what that event represents. The idea of an event is simply that it represents a state change in the aggregate, however these events should also represent concepts that make sense to the business.
I hope that helps.