Web services and partial entity updates - rest

We are implementing a CRUD interface to manage entities with SOAP messages. What are good practices for allowing partial updates of an entity, meaning the client can update just some attributes of the entity without having to post the whole entity? Is there a more general approach to this than a distinct method for each attribute update?

HTTP PATCH can be used for partial updates: you send only the fields of the object you want to change. There's an interesting discussion about partial updates here.
I'd say it is more important to make sure the partial update is idempotent, i.e. the same update fields in the request always result in the same end state of the resource. So if you have internal logic that derives the state of one resource attribute from the value of another attribute that is being updated, that is something to look into. For example, if the resource has rules that apply when some parts are updated and other parts are not specified (default values for some attributes?), the outcome may differ depending on the current state of the resource.
If the resource is just a collection of unrelated attributes, then partial updates make sense. But if there are dependencies among attributes, and some get updated while others don't, you still have to guarantee a consistent, repeatable end state. For example, does it make sense to update an address but not the phone number? What happens to the phone number if it's a landline and the address gets updated? Is it set to null? And vice versa. So when doing partial updates it might be worth 'partitioning' the allowed partials based on the part of the domain being updated.
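The 'partitioning' idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `FIELD_GROUPS` table and the function `apply_partial_update` are hypothetical and do not come from any framework. Attributes that depend on each other form a group that must be sent together, so the same patch always produces the same end state.

```python
from copy import deepcopy

# Hypothetical groups of attributes that depend on each other.
# A patch must contain either all fields of a group or none of them.
FIELD_GROUPS = [
    {"street", "city", "landline"},  # moving house may change the landline
    {"email"},
    {"display_name"},
]


def apply_partial_update(resource: dict, patch: dict) -> dict:
    """Return a copy of `resource` with `patch` applied, or raise ValueError."""
    known = set().union(*FIELD_GROUPS)
    unknown = set(patch) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for group in FIELD_GROUPS:
        touched = group & set(patch)
        if touched and touched != group:
            missing = sorted(group - touched)
            raise ValueError(
                f"fields {missing} must be sent together with {sorted(touched)}"
            )
    updated = deepcopy(resource)  # never mutate the caller's object
    updated.update(patch)
    return updated
```

Because the result depends only on the patch and on fields the patch does not touch, applying the same patch twice gives the same resource as applying it once.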

Related

Which HTTP method to use with REST API if we have parent-child entities and parent has already been created?

I have looked at this PUT vs POST question and others on stackoverflow and after going through the answers I found out:
Use POST if server identifies the address of the resource
Use PUT if the client knows the address of the resource.
Now the above works fine if I have a single independent entity. For example, if I have a Student entity and I am admitting a new student to a school, I might create a REST endpoint such as /api/schools/schools-name/student with the POST HTTP method. Once the student has been admitted and I have to make changes to this student, I can use PATCH/PUT.
But in my case I have dependent entities, that is, a parent and a child. First I create the parent entity using POST. The child entity is created only after the parent entity has been created. The reason they can't be created together (creating the child right after the parent) is a business requirement.
Important points to note are that parent and child entity are linked by an id column only. So currently my url for creating child entity is /api/entities/parent-entity-id. Also there is no request body while creating the child entity as all the required info for creating is stored in parent entity.
My question is: should this method be POST, as we are creating the child entity, or PUT, as I am updating the children of the parent entity which has already been created?
As mentioned in the question there is no request body for creating a child entity. This api is just to trigger the child entity creation. Parent entity already has all the info.
If you are sending an unsafe request to the server, and it doesn't match the semantics of any of the other HTTP methods, then you should use POST.
In particular, if the message-body is not a candidate representation for the resource identified by the target-uri, then PUT is out of bounds.
The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload
First and foremost, REST is an architectural style to use if you need to decouple clients from servers, so that the server side can evolve without risking that clients break. REST isn't a toolbox from which you pick the most suitable things and leave out the remainder. It is more of an all-or-nothing thing: either you apply all of the steps and constraints REST proposes, or you won't benefit from it! For simple backend-to-frontend communication it is probably too much effort, as you are usually in control of both ends; if you are in control of only one end, however, you might gain the most benefit from such a design.
REST relies heavily on standardized protocols and media types. The interaction model is very similar to the browsable Web, the big cousin of REST. Therefore, the same concepts that apply to the Web also apply to REST. The core idea in both should always be that the server teaches the client how to do things, while clients only take what they are given without trying to deduce further knowledge from previous interactions, analysis of URIs or the like. On the Web, for example, HTML forms allow clients to enter input that is sent to the server upon clicking a submit button. Both the target URI and the method to use are included in that Web form, so a client doesn't have to care about them. Through the affordance of the button element, a client also has the implicit knowledge that a button can be clicked and that certain actions may be triggered as a consequence. The same concepts used on the Web should now be used between applications to interact with each other. Here, either HTML forms can be reused or specialized media types need to be developed (e.g. hal-forms). Through content-type negotiation, client and server can agree on a representation format both support and therefore avoid interoperability issues.
One common issue many "REST developers" seem to have is thinking of REST endpoints as returning data of certain types, e.g. the data of a company employee or the items of a certain hierarchy. Fielding claims that, instead of introducing typed resources meaningful to a client, a REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types.
A further thing to mention here, which I already touched on in my comment, is that URIs don't carry a parent-child relationship by default. A URI as a whole, including any path, matrix and/or query parameters, is just a link to a resource and can be considered a key used by caches to return a response body previously stored for that key (= URI). Clients therefore shouldn't attempt to deduce semantic knowledge from URIs themselves but should just use the link relations returned for such URIs. This allows servers to replace URIs down the road while clients can still invoke them based on the relation name the URI was returned for.
As URIs themselves don't convey any semantic information, they can't really express a parent-child relationship on their own. We humans tend to interpret a URI such as /api/company/abc/employee/123 as expressing that the employee with the number 123 works for company abc. That might be true, but it does not have to be: as explained before, URIs lack the semantics to express such things. It is only through the utilization of a bunch of such URIs that such a semantic tree can be created.
But In my case I have dependent entities that is parent and child. First I create parent entity using the POST. Now the child entity is created only after parent entity has been created.
If you take a closer look at the HTTP methods you will see that POST requests are processed according to the resource's own specific semantics, meaning that you can literally perform anything you have to here. It is the de facto Swiss Army knife in your toolset and should be used if the other methods don't fit your use case.
PUT, for example, is specified to replace the target's current representation with the one provided in the payload of the request. However, a server is allowed to validate whether the PUT representation is consistent with any constraints the server has for the target resource, and may therefore reject requests to update a certain resource due to conflicts with those constraints. A server handling PUT is further allowed to reconfigure the target's media type to match a more suitable representation, to transform the received payload into one that matches the target resource, or to reject the payload altogether.
Neither HTTP methods nor URIs can create such a semantic relationship between parent and child resources. However, this is what link relations are there for! Links are edges between two entities that give a name to the context of the relation between those two entities. Such link relations should be standardized, follow common conventions or represent extension types as defined in RFC 5988 (Web Linking) to promote their reuse. Unfortunately, IANA does not directly specify a parent and a child link relation, although up may be used to refer from a child to a parent in a tree. Through the extension mechanism, however, such relations are relatively easy to obtain, e.g. http://api.acme.com/rel/parent and http://api.acme.com/rel/child or something similar.
The next bit to discuss in the quoted segment of the initial post is the happens-before semantics of creating the parent in contrast to the child resource. HTTP has neither transaction semantics nor guarantees about the ordering of requests other than those outlined in the pipelining section, which only applies to safe methods anyway. HTTP therefore makes no promises about the processing of requests: a request might not reach the server at all, or the response might get lost for whatever reason. Only if the client receives a 201 Created response including a Location header pointing to the created resource does it know for sure that the resource was created, and only then should it go on to create a child resource.
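A minimal Python sketch of that client-side check follows. It assumes the client already has the status code and response headers from whatever HTTP library it uses; the function name `created_location` is hypothetical. The point is only that the child request is sent if, and only if, the parent's creation was confirmed.

```python
from typing import Mapping, Optional


def created_location(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URI of the newly created resource, or None if unconfirmed."""
    if status != 201:
        return None
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None
```

A client would call this on the response to the parent's POST and only issue the POST that creates the child when the result is not None.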
To a generic HTTP server, the request creating the parent and the subsequent request creating the child resource are two distinct requests which it will attempt to fulfill independently. This is the stateless nature of HTTP. As mentioned before, though, validation of the resource's own constraints might prevent the creation of children.
Important points to note are that parent and child entity are linked by an id column only. So currently my url for creating child entity is /api/entities/parent-entity-id. Also there is no request body while creating the child entity as all the required info for creating is stored in parent entity.
REST doesn't actually care about your domain model. What you have here is a classic example of /persons resources, where three persons are identifiable via separate, distinct URIs such as /persons/alice, /persons/bob and /persons/joe. We don't know anything about the actual data returned by any of these endpoints and, as mentioned above, you can't deduce from the URIs alone who is a parent of whom (or that any of the URIs actually represents a person to start with). Through link relations such a context structure can now be given, stating that Bob and Alice are parents of Joe and Joe is a child of both Bob and Alice.
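To make the persons example concrete, here is a small Python sketch of representations that carry named links. The `_links` layout is loosely modelled on HAL and is an assumption, as are the helper names `person` and `follow`; the two relation URIs are the extension relation names suggested earlier in this answer.

```python
# Extension relation names as suggested in the text (illustrative URIs).
REL_PARENT = "http://api.acme.com/rel/parent"
REL_CHILD = "http://api.acme.com/rel/child"


def person(name, parents=(), children=()):
    """Build a representation whose `_links` object names each relation."""
    links = {"self": [{"href": f"/persons/{name}"}]}
    if parents:
        links[REL_PARENT] = [{"href": f"/persons/{p}"} for p in parents]
    if children:
        links[REL_CHILD] = [{"href": f"/persons/{c}"} for c in children]
    return {"_links": links}


def follow(representation, rel):
    """Return the hrefs for a relation name; the client never parses URIs."""
    return [link["href"] for link in representation["_links"].get(rel, [])]


joe = person("joe", parents=["alice", "bob"])
alice = person("alice", children=["joe"])
```

A client that wants Joe's parents asks for `follow(joe, REL_PARENT)` and treats the returned URIs as opaque, so the server stays free to change them later.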
Note how in the example above the actual content of the resources was not important to the client. We still don't know if any of the resources contains any information at all. All we know is that there are three resources available that are linked to each other in some way. So if the intent of your system is just to represent such relationships, then go ahead. Use links between those resources to allow clients to look up these relationships if interested. If a client is interested in the details of a resource, it will send a request for a certain set of media types to the server anyway. Discoverability and exploration are two common things you will want to guarantee in a REST ecosystem.
My question is that should this method be POST as we are creating the child entity or PUT as I am updating the children of the parent entity which already has been created?
As POST is an all-purpose tool that has to be used if the other methods don't fit, using POST is certainly not wrong. If you take a closer look at the other methods you will see that they serve different purposes, e.g. PUT has the semantics of replacing the current content with the one given in the request payload. It therefore expresses a different use case than the one you actually want, IMO. As such you should stick to POST for generating your children as well.
What you should do within your POST logic, as hopefully was clear enough throughout this answer, is to introduce meaningful link-relations that give the relations between the "entities" some context you can name. Such an operation can further have side effects which allows you to update the parent resource as well and introduce some further links that point from the parent back to the child.
This post is probably already way longer than it needs to be, though I want to make sure that you understand the intent behind REST and when to use it. Unless you really need a system that requires properties such as freedom for evolution, failure robustness and support for the operation of the application/system for decades to come, either exposing your own RPC service or maybe exposing your data model directly is probably easier to obtain.
Also there is no request body while creating the child entity as all the required info for creating is stored in parent entity.
So, this has nothing to do with resource state and therefore nothing to do with REST.
You're not PUTting a new state of a resource, so you should stay away from using PUT.
You are creating a new instance, so you should use the POST method on the endpoint of the previously created parent instance.
Example:
POST /parent/<parent_id>/children/
BODY:
{"json with children data...."}

Can EF 6 Data Annotations be different for POST than PUT or GET?

We are building a RESTful web service where there are sometimes different required fields for a POST than for a PUT. For example, a field like CustomerSinceDate is allowed to be set on an insert, but not on an update. Is there a way to set that up with Data Annotations?
EntityFramework does not (and should not) know anything about your web service. It deals only with what rules exist in the persistence layer.
What you are looking for is validation.
So in your REST service, you should check whether the entity is being updated and CustomerSinceDate has been changed. If so, you should throw an exception with an appropriate message for the consumer.
Here is an article on writing your own DataAnnotations, if you prefer using those:
http://msdn.microsoft.com/en-us/data/jj819164#attributes
Otherwise, take a look at this article on how to write your own custom validation: http://msdn.microsoft.com/en-us/data/gg193959.aspx
(in particular, the section on IValidatableObject).
Your rule could be formulated as (pseudo code)
// if the object exists in the db AND CustomerSinceDate has changed => validation error
DataAnnotations will get you a long way, but can be tedious to write if you are writing business logic that will never be reused anywhere else.

What is the point of the Update function in the Repository EF pattern?

I am using the repository pattern within EF using an Update function I found online
public class Repository<T> : IRepository<T> where T : class
{
    public virtual void Update(T entity)
    {
        var entry = this.context.Entry(entity);
        this.dbset.Attach(entity);
        entry.State = System.Data.Entity.EntityState.Modified;
    }
}
I then use it within a DeviceService like so:
public void UpdateDevice(Device device)
{
    this.serviceCollection.Update(device);
    this.uow.Save();
}
I have realised that what this actually does is update ALL of the device's information rather than just the property that changed. This means that in a multi-threaded environment changes can be lost.
After testing I realised I could just change the Device and then call uow.Save(), which both saved the data and didn't overwrite any existing changes.
So my question really is - What is the point in the Update() function? It appears in almost every Repository pattern I find online yet it seems destructive.
I wouldn't call this generic Update method generally "destructive" but I agree that it has limited use cases that are rarely discussed in those repository implementations. If the method is useful or not depends on the scenario where you want to apply it.
In an "attached scenario" (Windows Forms application for instance) where you load entities from the database, change some properties while they are still attached to the EF context and then save the changes the method is useless because the context will track all changes anyway and know at the end which columns have to be updated or not. You don't need an Update method at all in this scenario (hint: DbSet<T> (which is a generic repository) does not have an Update method for this reason). And in a concurrency situation it is destructive, yes.
However, it is not clear that a "change tracked update" isn't sometimes destructive either. If two users change the same property to different values the change tracked update for both users would save the new column value and the last one wins. If this is OK or not depends on the application and how secure it wants changes to be done. If the application disallows to ever edit an object that is not the last version in the database before the change is saved it cannot allow that the last save wins. It would have to stop, force the user to reload the latest version and take a look at the last values before he enters his changes. To handle this situation concurrency tokens are necessary that would detect that someone else changed the record in the meantime. But those concurrency checks work the same way with change tracked updates or when setting the entity state to Modified. The destructive potential of both methods is stopped by concurrency exceptions. However, setting the state to Modified still produces unnecessary overhead in that it writes unchanged column values to the database.
In a "detached scenario" (Web application for example) the change tracked update is not available. If you don't want to set the whole entity to Modified you have to load the latest version from the database (in a new context), copy the properties that came from the UI and save the changes again. However, this doesn't prevent changes that another user has made in the meantime from being overwritten, even if they are changes to different properties. Imagine two users load the same customer entity into a web form at the same time. User 1 edits the customer name and saves. User 2 edits the customer's bank account number and saves a few seconds later. If the entity gets loaded into the new context to perform the update for User 2, EF would just see that the customer name in the database (which already includes the change of User 1) is different from the customer name that User 2 sent back (which is still the old customer name). If you copy the customer name value, the property will be marked as Modified and the old name will be written to the database, overwriting the change of User 1. This update would be just as destructive as setting the whole entity state to Modified. To avoid this problem you would either have to implement some custom change tracking on the client side that recognizes whether User 2 changed the customer name and, if not, doesn't copy the value to the loaded entity, or you would have to work with concurrency tokens again.
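The two-user scenario and the effect of a concurrency token can be illustrated independently of EF. The following Python sketch uses a hypothetical in-memory `CustomerStore` with a `version` column standing in for the token; it is not EF's API, only the general optimistic-concurrency idea: an update succeeds only if the row still has the version the user originally loaded.

```python
class ConcurrencyError(Exception):
    """Raised when the row changed after the caller loaded it."""


class CustomerStore:
    """Hypothetical in-memory table with a version column as concurrency token."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, values):
        self._rows[key] = {"version": 1, **values}

    def load(self, key):
        return dict(self._rows[key])  # detached copy, like a web form

    def update(self, key, changes, expected_version):
        if "version" in changes:
            raise ValueError("the version is managed by the store")
        row = self._rows[key]
        if row["version"] != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {row['version']}"
            )
        row.update(changes)
        row["version"] += 1


store = CustomerStore()
store.insert(1, {"name": "Old Name", "account": "111"})
user1 = store.load(1)  # both users load version 1
user2 = store.load(1)
store.update(1, {"name": "New Name"}, user1["version"])  # User 1 saves first
try:
    store.update(1, {"account": "222"}, user2["version"])  # stale version
except ConcurrencyError:
    pass  # User 2 must reload and redo the edit; User 1's change survives
```

Whether User 2 is forced to reload or the application merges non-conflicting changes automatically is a design decision; the token only guarantees that a stale write is detected.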
You didn't mention the biggest limitation of this Update method in your question - namely that it doesn't update any related entities. For example, if your Device entity had a related Parts collection and you would edit this collection in a detached UI (add/remove/modify items) setting the state of the parent Device to Modified won't save any of those changes to the database. It will only affect the scalar (and complex) properties of the parent Device itself. At the time when I used repos of this kind I named the update method FlatUpdate to indicate that limitation better in the method name. I've never seen a generic "DeepUpdate". Dealing with complex object graphs is always a non-generic thing that has to be written individually per entity type and depending on the situation. (Fortunately a library like GraphDiff can limit the amount of code that has to be written for such graph updates.)
To cut a long story short:
For attached scenarios the Update method is redundant as EF's automatic change tracking does all the necessary work to write correct UPDATE statements to the database, including changes in related object graphs.
For detached scenarios it is a comfortable way to perform updates of simple entities without relationships.
Updating object graphs with parent and child entities in a detached scenario can't be done with such a simplified Update method and requires significantly more (non-generic) work.
Safe concurrency control needs more sophisticated tools, like enabling the optimistic concurrency checks that EF provides and handling the resulting concurrency exceptions in a user-friendly way.
After Slauma's very profound and practical answer I'd like to zoom in on some basic principles.
In this MSDN article there is one important sentence
A repository separates the business logic from the interactions with the underlying data source or Web service.
Simple question. What has the business logic to do with Update?
Fowler defines a repository pattern as
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
So as far as the business logic is concerned a repository is just a collection. Collection semantics are about adding and removing objects, or checking whether an object exists. The main operations are Add, Remove, and Contains. Check out the ICollection<T> interface: no Update method there.
It's not the business logic's concern whether objects should be marked as 'modified'. It just modifies objects and relies on other layers to detect and persist changes. Exposing an Update method:
makes the business layer responsible for tracking and reporting its changes. Soon all kinds of if constructs will creep in to check whether values have changed or not.
breaks persistence ignorance, because the mere fact that storing updates is something different from storing new objects is a data layer detail.
prevents the data access layer from doing its job properly. Indeed, the implementation you show is destructive. While the data access layer may be perfectly capable of perceiving and persisting granular changes, this method marks a whole object as modified and forces a sweeping UPDATE.

New entity ID in domain event

I'm building an application with a domain model using CQRS and domain events concepts (but no event sourcing, just plain old SQL). There was no problem with events of SomethingChanged kind. Then I got stuck in implementing SomethingCreated events.
When I create some entity which is mapped to a table with identity primary key then I don't know the Id until the entity is persisted. Entity is persistence ignorant so when publishing an event from inside the entity, Id is just not known - it's magically set after calling context.SaveChanges() only. So how/where/when can I put the Id in the event data?
I was thinking of:
Including a reference to the entity in the event. That would work inside the domain but not necessarily in a distributed environment with multiple autonomous systems communicating by events/messages.
Overriding SaveChanges() to somehow update events enqueued for publishing. But events are meant to be immutable, so this seems very dirty.
Getting rid of identity fields and using GUIDs generated in the entity constructor. This might be the easiest but could hit performance and make other things harder, like debugging or querying (where id = 'B85E62C3-DC56-40C0-852A-49F759AC68FB', no MIN, MAX etc.). That's what I see in many sample applications.
Hybrid approach - leave the identity alone and use it mainly for foreign keys and faster joins, but use a GUID as the unique identifier by which I pull the entities from the repository in the application.
Personally I like GUIDs for unique identifiers, especially in multi-user, distributed environments where numeric ids cause problems. As such, I never use database generated identity columns/properties and this problem goes away.
Short of that, since you are following CQRS, you undoubtedly have a CreateSomethingCommand and a corresponding CreateSomethingCommandHandler that actually carries out the steps required to create the new instance and persist the new object using the repository (via context.SaveChanges). I would raise the SomethingCreated event here rather than in the domain object itself.
For one, this solves your problem because the command handler can wait for the database operation to complete, pull out the identity value, update the object then pass the identity in the event. But, more importantly, it also addresses the tricky question of exactly when is the object 'created'?
Raising a domain event in the constructor is bad practice as constructors should be lean and simply perform initialization. Plus, in your model, the object isn't really created until it has an ID assigned. This means there are additional initialization steps required after the constructor has executed. If you have more than one step, do you enforce the order of execution (another anti-pattern) or put a check in each to recognize when they are all done (ooh, smelly)? Hopefully you can see how this can quickly spiral out of hand.
So, my recommendation is to raise the event from the command handler. (NOTE: Even if you switch to GUID identifiers, I'd follow this approach because you should never raise events from constructors.)
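The recommended flow can be sketched in Python as follows. Everything here is an illustrative assumption: `InMemoryRepository` stands in for the real repository and its `add_and_save` method for the call that ends in context.SaveChanges, and `publish` is whatever function hands events to your bus. The handler builds the immutable event only after persistence has assigned the identity.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Optional


@dataclass
class Something:
    name: str
    id: Optional[int] = None  # unknown until the entity is persisted


@dataclass(frozen=True)  # events are immutable once created
class SomethingCreated:
    id: int
    name: str


class InMemoryRepository:
    """Stand-in for a repository backed by an identity column."""

    def __init__(self):
        self._ids = count(1)
        self.saved = {}

    def add_and_save(self, entity: Something) -> None:
        entity.id = next(self._ids)  # the "database" assigns the identity
        self.saved[entity.id] = entity


class CreateSomethingCommandHandler:
    def __init__(self, repository: InMemoryRepository,
                 publish: Callable[[SomethingCreated], None]):
        self._repository = repository
        self._publish = publish

    def handle(self, name: str) -> SomethingCreated:
        entity = Something(name)
        self._repository.add_and_save(entity)  # id is set after this call
        event = SomethingCreated(id=entity.id, name=entity.name)
        self._publish(event)
        return event
```

Because the event is created after the save, it never has to be mutated afterwards, and the entity's constructor stays free of event-raising logic.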

Multiple entity replacement in a RESTful interface

I have a service with some entities that I would like to expose in a RESTful way. Due to some of the requirements I have some trouble finding a way I find good.
These are the 'normal' operations I intend to support:
GET /rest/entity[?filter=<query>] # Return (matching) entities. The filter is optional and just a convenience for us CLI curl-users :)
GET /rest/entity/<id> # Return specific entity
POST /rest/entity # Creates one or more new entities
PUT /rest/entity/<id> # Updates specific entity
PUT /rest/entity # Updates many entities (json-dict or multipart. Haven't decided yet)
DELETE /rest/entity/<id> # Deletes specific entity
DELETE /rest/entity # Deletes all entities (dangerous but very useful to us :)
Now, the additional requirements:
We need to be able to replace the entire set of entities with a completely new set of entities (merging can occur internally as an optimization).
I thought of using POST /rest/entity for that, but that would remove the ability to create single entities unless I move that functionality. I've seen /rest/entity/new-style paths in other places, but it always seemed a bit odd to reuse the id path segment for that as there might or might not be a collision in IDs (not in my case, but mixing namespaces like that gives me an itch :)
Are there any common practices for this type of operation? I've also considered /rest/import/entity as a separate path for similar non-restful operations for other entity types we might have, but I don't like moving it outside of the entity home path.
We need to be able to perform most operations in a "dry-run"-mode for validation purposes.
Query strings are usually considered anathema, but I'm already a sinner for the filter one. For the validation mode, would adding a ?validate or ?dryrun flag be ok? Has anyone done anything similar? What are the drawbacks? This is meant as an aid for user-facing interfaces to implement validation easily.
We don't expect to have to use any caching mechanism as this is a tiny configuration service rarely touched, so optimization for caching is not strictly necessary
We need to be able to replace the entire set of entities with a completely new set of entities
That's what this does, no?
PUT /rest/entity
PUT has replace semantics. Maybe you could use the PATCH verb to support doing partial updates.
Personally, I would change the resource name to "EntityList" or "EntityCollection", but that's just because it is clearer for me.
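As a rough Python sketch of how PUT-style replacement of the whole collection could share its code path with the dry-run mode asked about in the question: the function name `replace_collection`, the `dry_run` parameter and the two validation rules are assumptions for illustration only. Validation always runs; the store is only touched when the request is valid and not a dry run.

```python
def replace_collection(store: dict, new_entities: list, dry_run: bool = False) -> dict:
    """Validate `new_entities` and, unless dry_run is set, replace `store` with them."""
    errors = []
    seen = set()
    for index, entity in enumerate(new_entities):
        entity_id = entity.get("id") if isinstance(entity, dict) else None
        if entity_id is None:
            errors.append(f"entity {index}: missing id")
        elif entity_id in seen:
            errors.append(f"entity {index}: duplicate id {entity_id!r}")
        else:
            seen.add(entity_id)
    applied = not errors and not dry_run
    if applied:
        # Replace semantics: anything not in the new set is gone afterwards.
        store.clear()
        store.update({entity["id"]: entity for entity in new_entities})
    return {"valid": not errors, "errors": errors, "applied": applied}
```

A handler for PUT /rest/entity could pass `dry_run=True` when the ?dryrun flag is present and return the same report either way, so a user interface gets identical validation messages in both modes.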