Defining gRPC RPCs - service

I'm looking for some suggestions here. The use case is a networking device (like a router) whose networking operations are performed over gRPC.
Let's say there are "n" model objects, like router, interfaces, and routing configuration objects such as OSPF. Every networking operation will ultimately be a CRUD operation on one or many of these model objects.
Now, when defining this as a gRPC service, there seem to be two options:
1. Define generic RPCs, like "SET" and "GET", whose parameter is a list of objects and operations, e.g. SET((router, update), (interface, update), ...).
2. Define very specific RPCs, like "setInterfaceProperty_x" or "createOSPFInstance". There could be many, many such RPCs.
With #2, we are building the application intelligence into the RPCs themselves. Every new feature might need new RPCs from this service.
With #1, the RPCs are just the means; the intelligence resides in the application that uses them in context. The RPC list stays very small and doesn't change over time.
What is the preferred approach? Generic RPCs (kept very few) or tens (or more) of operation-driven RPCs? I see some open-source projects like P4Runtime take approach #1.
Thanks for your time. I can provide more information if required.

You should use option #2. This puts your interface contract in the proto, rather than in your application. You leave yourself many open doors by picking option #2 that would be cumbersome or unsupportable otherwise:
If the API definition of an object doesn't match the internal representation, you need to define a mapping between the two. Suppose you update your internal code to no longer need InterfaceProperty, and it was instead moved to a new field called BetterInterfaceProperties. Option #1 would force you to keep the old field exposed, while option #2 would allow you to reinterpret the call and do the right thing.
Fine-grained access controls are easier with specific methods. All users may be able to set publicProperty, but only admins can set dangerousProperty. By grouping all the fields into a single call (as in #1), your caller has to reinterpret error messages, while with option #2 it's much clearer why authorization failed.
Smaller return values. Having a method like getSpecificProperty will do much less work than getFullObject. As your data model gets more complex, you will have to include more and more data in return messages. Even if the caller only cares about one thing, they have to wait for all of them. Consider a database application: it might have to do several unnecessary queries to fill in fields the client will never read.
There are reasons to use #1, but they aren't that valuable until you identify which properties go together and are logically a single RPC (such as a Get).
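To make the contrast concrete, here is a minimal plain-Java sketch of the two shapes. The names and message types are hypothetical, and this deliberately isn't real generated gRPC stub code; in an actual .proto you would express these as service methods and messages.

    import java.util.List;
    import java.util.Map;

    public class RpcShapes {

        // Option #1: one generic entry point; the payload carries the object type and operation.
        record ObjectOperation(String objectType, String operation, Map<String, String> fields) {}

        interface GenericConfigService {
            List<String> set(List<ObjectOperation> operations); // e.g. (interface, update), (ospf, create)
        }

        // Option #2: the contract is spelled out per operation, so validation, access control,
        // and evolution can be handled per RPC.
        record SetInterfaceMtuRequest(String interfaceName, int mtu) {}
        record CreateOspfInstanceRequest(String routerId, int processId) {}

        interface SpecificConfigService {
            void setInterfaceMtu(SetInterfaceMtuRequest request);
            void createOspfInstance(CreateOspfInstanceRequest request);
        }
    }

With the second shape, adding a feature means adding an RPC; with the first, the contract lives in whatever conventions the caller and server agree on for the operation list.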

When to use multiple KieBases vs multiple KieSessions?

I know that one can utilize multiple KieBases and multiple KieSessions, but I don't understand under what scenarios one would use one approach vs the other (I am having some trouble in general understanding the definitions and relationships between KieContainer, KieBase, KieModule, and KieSession). Can someone clarify this?
You use multiple KieBases when you have multiple sets of rules doing different things.
KieSessions are the actual session for rule execution -- that is, they hold your data and some metadata and are what actually executes the rules.
Let's say I have an application for a school. One part of my application monitors students' attendance. The other part of my application tracks their grades. I have a set of rules which decides if students are truant and we need to talk to their parents. I have a completely unrelated set of rules which determines whether a student is having trouble academically and needs to be put on probation/a performance plan.
These rules have nothing to do with one another. They have completely separate concerns, different rule inputs, and are triggered in different parts of the application. The part of the application that is tracking attendance doesn't need to trigger the rules that monitor student performance.
For this application, I would have two different KieBases: one for attendance, and one for academics. When I need to fire the rules, I fire one or the other -- there is no use case for firing both at the same time.
The KieSession is the runtime for when we fire those rules. We add to it the data we need to trigger the rules, and it also tracks some other metadata that's really not relevant to this discussion. When firing the academics rules, I would add the student's grades, their classes, and maybe some information about the student (e.g. their grade level, whether they're an "honors" student, etc.). For the attendance rules, we would need the student information plus historical tardiness/absence records. Those distinct pieces of data get added to the respective sessions.
When we decide to fire rules, we first get the appropriate KieBase -- academics or attendance. Then we get a session for that rule set, populate the data, and fire it. We technically "execute" the session, not the rules (and definitely not the rule base). The rule base is just the collection of rules; the session is how we actually execute them.
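A minimal Java sketch of that flow, assuming a kmodule.xml that declares an "academicsSession" bound to the academics KieBase (the session name and fact types here are made up):

    import java.util.List;

    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    public class AcademicsRuleRunner {

        // Hypothetical fact types for the academics rules.
        public record Student(String id, int gradeLevel, boolean honors) {}
        public record Grade(String course, double value) {}

        public void evaluate(Student student, List<Grade> grades) {
            KieContainer container = KieServices.Factory.get().getKieClasspathContainer();
            // Session declared in kmodule.xml and bound to the academics KieBase.
            KieSession session = container.newKieSession("academicsSession");
            try {
                session.insert(student);
                grades.forEach(session::insert);
                session.fireAllRules();   // we execute the session, not the rule base
            } finally {
                session.dispose();        // create, use, dispose -- fine for low-volume use
            }
        }
    }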
There are two kinds of sessions -- stateful and stateless. As their names imply, they differ with how data is stored and tracked. In most cases, people use stateful sessions because they want their rules to do iterative work on the inputs. You can read more about the specific differences in the documentation.
For low-volume applications, there's generally little need to reuse your KieSessions. Create, use, and dispose of them as needed. There is, however, some inherent overhead in this process, so there comes a point at which reuse does become something you should consider. The documentation discusses the solution provided out of the box for Drools, which is session pooling.
(When trying to wrap your head around this, I like to use an analogy of databases. A session is like a JDBC connection: for small applications you can create them, use them, then close them as you need them. But as you scale you'll quickly find that you need to look into connection pooling to minimize this overhead. In this particular analogy, the rule base would be the database that the rules are executing against -- not the tables!)

Arbitrary Groups In REST

I was recently recommended a talk by Jim Webber.
And there was a very interesting point in there.
Jim says that people tend to assume a 1-1 correspondence between rows in their database, domain objects, and resources in a REST service. This makes it hard when you want to transact work across arbitrary groups.
He goes on to point out that if you have, say, 3 users and want to update them, you do it sequentially, which is very poor because you have to track each of them and handle failures if 1 out of the 3 (or however many operations you issue) goes wrong.
He mentioned that the way you should handle this is to make a resource for all 3 of the users. Resources are cheap and infinite (you can make as many as you want), so use them. So create that resource and, in a single operation, put their status update.
This is an extremely interesting point to me, as there have been times when I have wanted to perform an operation on multiple things that I considered to be singular.
So here is an example:
Say I have a list of users, say 100. Users would be their own thing/resource. I want to pick some number of users out of that list (say 10, randomly) and apply 50 points to them.
I want to apply these points to users that have no unique connection in the domain; they are just a random group of users, an arbitrary group.
How would I create a REST endpoint/resource, as Jim Webber is implying, to handle this operation?
Now, in my admittedly old frame of mind, I would go about it by making a specific endpoint like users/points/bulk/ (or something) and passing in a list of user IDs and the points I would apply to them. I would never have had the mindset of treating them as a resource; I would have just had a hacky command REST endpoint to perform it.
This point Jim makes is really something I never considered, and it is such a change of mindset that it would really make things cleaner.
Could someone explain this to me and give an example of how it would look?
Thanks
He mentioned the way you should handle this is to make a resource, for all of the 3 users. Resources are cheap and infinite (you can make as many as you want) so use them. So create that resource and in a single operation put their status update.
...
How would I create a rest endpoint/resource as Jim Webber is implying to handle this operation?
The basic rule of thumb here is: how would you do it on the Web? As REST is just a generalization of the interaction model that allowed the Web to grow to its current size, the same concepts that proved successful on the Web can (and should) be used in a REST architecture.
What is a group of resources, actually? If you think about most sports that are played in teams, such as football, almost all players can be divided into certain groups: players of Team A and players of Team B, or all defensive players, or all attacking players. Each player is its own resource, but each of the available groups is its own resource as well, since we can give it a name too. We can then talk about the group instead of the individual players, which allows us to include all of them in a single, short statement instead of referencing each player individually. A statement such as "Team A beat the crap out of Team B" will most likely imply that each of the players on Team A played better than their counterparts on the opposing team.
It is now only a matter of providing clients with the toolset to group resources together. In a typical HTML page you could, for example, have a table of all the active football players of this season across all teams, with a checkbox to select certain players and some control element, such as a submit button, that allows you to create a group from the selected players. The backing HTML form contains not only the actual data set you can select specific players from and a submit button, but also a target URI the request has to be sent to, as well as the request method to use. HTML by default uses application/x-www-form-urlencoded as the representation format to send the data to the server, which knows how to process the data based on the invoked endpoint, the HTTP operation used, and the media type received.
Since a new resource will be created as a consequence of the grouping request, the server will respond with a 201 Created status code and a Location HTTP header whose value is a URI pointing to where the newly created grouping is accessible. A client may now be redirected to that URI automatically, or it can use the returned URI to invoke further operations on that resource. As the domain model does not (and probably should not) need to match the resource or affordance model, the individual player resources as well as the team resource may use the same database entries to present the data to the client. On updating one resource (either an individual player or the team as a whole), other resources may be affected by this operation as well.
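As a rough illustration of that flow with Java's built-in HTTP client (the URIs, payloads, and server-side contract are hypothetical assumptions, not something the talk or the spec prescribes):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class GroupResourceClient {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // Create a group resource for the selected users (hypothetical URI and payload).
            HttpRequest createGroup = HttpRequest.newBuilder(URI.create("https://api.example.com/user-groups"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{\"members\": [\"/users/10\", \"/users/42\"]}"))
                    .build();
            HttpResponse<Void> created = client.send(createGroup, HttpResponse.BodyHandlers.discarding());
            String groupUri = created.headers().firstValue("Location").orElseThrow(); // 201 Created + Location

            // One operation against the new resource instead of one call per user.
            HttpRequest awardPoints = HttpRequest.newBuilder(URI.create(groupUri))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString("{\"pointsToAdd\": 50}"))
                    .build();
            System.out.println(client.send(awardPoints, HttpResponse.BodyHandlers.ofString()).statusCode());
        }
    }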
If you take a look at the definition of PUT in the HTTP specification, you can read something like this:
A PUT request applied to the target resource can have side effects on other resources.
Due to this side effect, it is possible for an update performed via PUT to achieve something similar to a partial update:
Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in RFC5789).
I.e., if you update Player 1 of Team A via PUT, it creates as a side effect a partial update of the state of Team A, since both resources use the same data the data model provides for that particular player.
To achieve the same functionality in a REST architecture, as mentioned before, the same concept should be used: provide the client with structured data it can select a subset from, and let it perform operations on that subset, such as creating a new resource for the selected elements. In contrast to the Web, where HTML is dominant, the supported media types may vary drastically in a REST architecture. Here, content-type negotiation is very important, as it allows the server to choose the most suitable representation format supported by the client. Instead of proprietary representation formats, standardized formats should be used to increase the likelihood that clients not under your control can interact with your system. While there is an ongoing effort to introduce media types that give clients feedback via forms similar to the ones used in HTML, there is no de facto standard form representation, apart from HTML, that is widely accepted yet. There are a couple of mostly JSON-based approaches in the works, such as hal-forms, halo+json, Ion, or Hydra, though, as mentioned, nothing that is really used widely in production.
As your actual intention is to update a bunch of resources atomically, you could use PATCH here as well, without the need to create new resources, as PATCH is defined to apply all of its instructions atomically: either all succeed or none at all. In the spec, PATCH is defined similarly to how patching is understood in software engineering: a sequence of instructions that should be applied to a resource to transform it into the desired output. application/json-patch+json is a representation format that is quite close to that definition, whereas application/merge-patch+json has a totally different take, defining default rules to apply depending on whether the request contains a modified or nullified field value. As the latter format can only operate on a single resource, the former could be used for a batch update: by targeting the collection resource directly, JSON Pointers can address the respective fields of the sub-resources in that collection.
To avoid data loss through PATCH operations caused by intermediate updates (between fetching the most recent state, calculating the necessary steps to apply, and sending the request to the API), an optimistic locking approach should be used, achievable via conditional requests such as If-Match with an ETag.
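A hedged sketch of such a conditional batch PATCH using Java's built-in HTTP client; the collection URI, JSON Pointer paths, and ETag value are invented for illustration:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class BatchPointsPatch {
        public static void main(String[] args) throws Exception {
            // Instructions applied atomically to sub-resources of the collection via JSON Pointers.
            String patch = """
                    [
                      { "op": "replace", "path": "/users/10/points", "value": 150 },
                      { "op": "replace", "path": "/users/42/points", "value": 75 }
                    ]
                    """;
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://api.example.com/users"))
                    .header("Content-Type", "application/json-patch+json")
                    .header("If-Match", "\"5e19c2\"") // ETag from the last GET; guards against lost updates
                    .method("PATCH", HttpRequest.BodyPublishers.ofString(patch))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // success, or 412 if the ETag no longer matches
        }
    }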
While patching gives you the ability to apply the changes atomically, I feel that grouping resources together, when they naturally form a group as in the player/team example, is more common and also reuses the interaction model proposed by REST better, IMO.

REST design principles: Referencing related objects vs Nesting objects

My team and I are refactoring a REST API and I have come to a question.
For brevity, let us assume that we have an SQL database with 4 tables: Teachers, Students, Courses, and Classrooms.
Right now all the relations between the items are represented in the REST API by referencing the URL of the related item. For example, for a course we could have the following:
{ "id":"Course1", "teacher": "http://server.com/teacher1", ... }
In addition, if I ask for a list of courses through a GET call to /courses, I get a list of references as shown below:
{
... //pagination details
"items": [
{"href": "http://server1.com/course1"},
{"href": "http://server1.com/course2"}...
]
}
All this is nice and clean, but if I want a list of all the course titles with the teachers' names, and I have 2000 courses and 500 teachers, I have to do the following:
1. Approximately 2500 queries just to read the data.
2. Implement the join between the teachers and the courses myself.
3. Optimize with caching etc., so that it runs as fast as possible.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently.
Colleagues say that this approach is the standard way of implementing a REST API, but then a relatively simple query becomes a big hassle.
My question therefore is:
1. Is it wrong if we nest the teacher information in the courses?
2. Should the listing of items, e.g. GET /courses, return a list of references or a list of items?
Edit: After some research I would say the model I have in mind corresponds mainly to the one shown in jsonapi.org. Is this a good approach?
My problem is that this method creates a lot of network traffic with thousands of REST API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST API, but then a relatively simple query becomes a big hassle.
Your colleagues have lost the plot.
Here's your heuristic - how would you support this use case on a web site?
You would probably do it by defining a new web page that produces the report you need. You'd run the query, use the result set to generate a bunch of HTML, and ta-da! The client has the information they need in a standardized representation.
A REST API is the same thing, with more emphasis on machine readability. Create a new document, with a schema so that your clients can understand the semantics of the document you return to them, tell the clients how to find the target URI for the document, and voila.
Creating new resources to handle new use cases is the normal approach to REST.
Yes, I totally think you should design something similar to jsonapi.org. As a rule of thumb, I would say "prefer a solution that requires fewer network calls". That's especially true if the number of network calls drops by an order of magnitude.
Of course it doesn't eliminate the need to limit the request/response size if it becomes unreasonable.
Real-life solutions must strike a proper balance. A clean API is nice as long as it works.
So in your case I would do something like:
GET /courses?include=teachers
Or
GET /courses?includeTeacher=true
Or
GET /courses?includeTeacher=brief|full
In the last one, the response contains only the teacher's id for brief, and full teacher details for full.
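As a rough sketch of how the first variant might be handled server-side -- here with JAX-RS, and with the resource, repository, and view types invented for illustration:

    import java.util.List;

    import jakarta.ws.rs.GET;
    import jakarta.ws.rs.Path;
    import jakarta.ws.rs.Produces;
    import jakarta.ws.rs.QueryParam;
    import jakarta.ws.rs.core.MediaType;

    @Path("/courses")
    public class CourseResource {

        public record TeacherView(String href, String name) {}
        public record CourseView(String id, String title, TeacherView teacher) {}

        public interface CourseRepository {
            List<CourseView> findCourses(boolean withTeachers); // does the join server-side when asked
        }

        private final CourseRepository repository;

        public CourseResource(CourseRepository repository) {
            this.repository = repository;
        }

        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public List<CourseView> list(@QueryParam("include") String include) {
            boolean withTeachers = "teachers".equals(include);
            // Without ?include=teachers, "teacher" carries only the href reference;
            // with it, the name is joined in by a single server-side query.
            return repository.findCourses(withTeachers);
        }
    }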
My problem is that this method creates a lot of network traffic with thousands of REST API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST API, but then a relatively simple query becomes a big hassle.
Have you actually measured the overhead generated by each request? If not, how do you know the overhead will be too high? From an object-oriented programmer's perspective it may sound bad to perform each call on its own; your design, however, lacks one important asset that helped the Web grow to its current size: caching.
Caching can occur at multiple levels: you can do it at the API level, the client might do it, or an intermediary server might do it. Fielding even made it a constraint of REST! So, if you want to comply with the REST architectural philosophy, you should also support caching of responses. Caching helps reduce the number of requests that have to be calculated or even processed by a single server. With the help of stateless communication you might even introduce a multitude of servers that all perform calculations for billions of requests and act as one cohesive system to the client. An intermediary cache may further significantly reduce the number of requests that actually reach the server.
A URI as a whole (including any path, matrix, or query parameters) is actually the key for a cache. Upon receiving a GET request, a caching application checks whether its cache already contains a stored response for that URI and, if the stored data is "fresh enough", returns the stored response on behalf of the server directly to the client. If the stored data has already exceeded its freshness threshold, the cache throws it away and routes the request to the next hop in line (which might be the actual server, or a further intermediary).
Spotting resources that are ideal for caching might not always be easy, though the majority of data doesn't change so quickly that caching should be neglected entirely. Thus, introducing caching should at least be of general interest, especially the more traffic your API produces.
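A hedged JAX-RS sketch of declaring cacheability on a resource (the resource and values are made up): intermediaries and clients may reuse the response for its max-age, and conditional requests can short-circuit with 304 Not Modified.

    import jakarta.ws.rs.GET;
    import jakarta.ws.rs.Path;
    import jakarta.ws.rs.PathParam;
    import jakarta.ws.rs.Produces;
    import jakarta.ws.rs.core.CacheControl;
    import jakarta.ws.rs.core.Context;
    import jakarta.ws.rs.core.EntityTag;
    import jakarta.ws.rs.core.MediaType;
    import jakarta.ws.rs.core.Request;
    import jakarta.ws.rs.core.Response;

    @Path("/teachers/{id}")
    public class TeacherResource {

        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public Response get(@PathParam("id") String id, @Context Request request) {
            String body = "{\"id\":\"" + id + "\",\"name\":\"Jane Doe\"}"; // stand-in for the real lookup
            EntityTag etag = new EntityTag(Integer.toHexString(body.hashCode()));

            // If the client's If-None-Match still matches, reply 304 without recomputing or resending anything.
            Response.ResponseBuilder notModified = request.evaluatePreconditions(etag);
            if (notModified != null) {
                return notModified.build();
            }

            CacheControl cacheControl = new CacheControl();
            cacheControl.setMaxAge(300); // caches may serve this response for up to 5 minutes
            return Response.ok(body).cacheControl(cacheControl).tag(etag).build();
        }
    }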
While certain media types such as HAL JSON, jsonapi, ... allow you to embed content gathered from related resources into the response, embedding content has some potential drawbacks, such as:
Utilization of the cache might be low due to mixing data that changes quickly with data that is more static
Server might calculate data the client won't need
One server calculates the whole response
If related resources are only linked to instead of directly embedded, a client certainly has to fire off a further request to obtain that data, though that request is more likely to be (partly) served by a cache, which, as mentioned a couple of times throughout this post, reduces the workload on the server. Besides that, a positive side effect could be that you gain more insight into what clients are actually interested in (if an intermediary cache is run by you, for example).
Is it wrong if we nest the teacher information in the courses?
It is not wrong, but it might not be ideal, as explained above.
Should the listing of items e.g. GET /courses return a list of references or a list of items?
It depends. There is no right or wrong.
As REST is just a generalization of the interaction model used on the Web, basically the same concepts apply to REST as well. Depending on the size of the "item", it might be beneficial to return a short summary of the item's content and add a link to the item itself; similar things are done on the Web as well. For a list of students enrolled in a course, this might be the name and matriculation number, plus a link under which further details of that student can be requested, accompanied by a link-relation name that gives the link some semantic context, which a client can use to decide whether invoking that URI makes sense or not.
Such link-relation names are either standardized by IANA, defined by common approaches such as Dublin Core or schema.org, or custom extensions as defined in RFC 8288 (Web Linking). For the above-mentioned list of students enrolled in a course you could, for example, make use of the about relation name to hint to a client that further information on the current item can be found by following the link. If you want to enable pagination, the relations first, next, prev, and last can and probably should be used as well, and so forth.
This is actually what HATEOAS is all about: linking data together and giving the links meaningful relation names to span a kind of semantic net between resources. By simply embedding things into a response, such semantic graphs might be harder to build and maintain.
In the end it basically boils down to an implementation choice whether you want to embed or reference resources. I hope I could shed some light on the usefulness of caching and the benefits it can yield, especially in large-scale systems, as well as on the benefit of providing link-relation names for URIs, which enhance the semantic context of the relations used within your API.

How to retrieve data from another bounded context in ddd?

Initially, the app runs on the desktop; however, it will run on the web platform in the future.
There are some bounded contexts in the app, and some of them need to retrieve data from another. I don't know which approach to use in this case.
I thought of using the mediator pattern: bounded context "A" requests data "X", then the mediator calls another bounded context, like "B", and gets the correct data "X". Finally, the mediator brings data "X" back to BC "A".
This scenario will change when the app runs on the web; then I've thought of having one microservice request data from another microservice, using the mediator pattern there too.
Are both approaches reasonable, or is there a better solution?
Could anyone help me, please?
Thanks a lot!
If you're retrieving data from other bounded contexts through either DB or API calls, your architecture might fall into the "death star" anti-pattern, because it introduces unwanted coupling and knowledge into the client context.
A better approach might be to look at event-driven mechanisms like webhooks or message queues as a way of emitting the data you want to share to subscribing context(s). This is good because it reduces the coupling between your bounded contexts through data replication across contexts, which results in greater bounded-context independence.
This gives you the feeling of "Who cares if bounded context B is not available at the moment? Bounded contexts A and C have the data they need inside them, and I can resume syncing later, since my data-update events are recorded on my queue."
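A toy, in-memory Java sketch of that idea (in a real system the bus would be a broker or webhook delivery, and all names here are invented):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.Consumer;

    public class EventDrivenSync {

        record CustomerRenamed(String customerId, String newName) {} // event published by context B

        static class EventBus {
            private final List<Consumer<CustomerRenamed>> subscribers = new CopyOnWriteArrayList<>();
            void subscribe(Consumer<CustomerRenamed> handler) { subscribers.add(handler); }
            void publish(CustomerRenamed event) { subscribers.forEach(s -> s.accept(event)); }
        }

        public static void main(String[] args) {
            EventBus bus = new EventBus();

            // Context A replicates only the data it needs, so it keeps working even
            // when context B is unavailable.
            Map<String, String> contextACustomerNames = new ConcurrentHashMap<>();
            bus.subscribe(event -> contextACustomerNames.put(event.customerId(), event.newName()));

            bus.publish(new CustomerRenamed("42", "Acme Ltd."));
            System.out.println(contextACustomerNames); // {42=Acme Ltd.}
        }
    }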
The answer to this question breaks down into two distinct areas:
the logical challenge of communicating between different contexts, where the same data could be used in very different ways. How does one context interpret the meaning of the data?
and the technical challenge of synchronizing data between independent systems. How do we guarantee the correctness of each system's behavior when they both have independent copies of the "same" data?
Logically, a context map is used to define the relationship between any bounded contexts that need to communicate (share data) in any way. The domain models that control the data are only applicable within a single bounded context, so some method for interpreting data from another context is needed. That's where the patterns from Evans' book come into play: customer/supplier, conformist, published language, open host, anti-corruption layer, or (the cop-out pattern) separate ways.
Using a mediator between services can be thought of as an implementation of the anti-corruption layer pattern: the services don't need to speak the same language, because there's an independent buffer between them doing the translation. In a microservice architecture, this could be some kind of integration service between two very different contexts.
From a technical perspective, direct API calls between services in different bounded contexts introduce dependencies between those services, so an event-driven approach like what Allan mentioned is preferred, assuming your application is okay with the implications of that (eventual consistency of the data). Picking a messaging platform that gives you the guarantees necessary to keep the data in sync is important. Most asynchronous messaging protocols guarantee "at least once" delivery, but ordering of messages and de-duplication of repeats is up to the application.
Sometimes it's simpler to use a synchronous API call, especially if you find yourself doing a lot of request/response type messaging (which can happen if you have services sending command-type messages to each other).
A composite UI is another pattern that allows you to do data integration in the presentation layer, by having each component pull data from the relevant service, then share/combine the data in the UI itself. This can be easier to manage than a tangled web of cross-service API calls in the backend, especially if you use something like an IT/Ops service, NGINX, or MuleSoft's Experience API approach to implement a "backend-for-frontend".
What you need is a DDD pattern for integration. BC "B" is upstream, "A" is downstream. You could go for an OHS (Open Host Service) with a PL (Published Language) upstream, and an ACL (Anti-Corruption Layer) downstream. In practice this is a REST API upstream and an adapter downstream. Every time A needs data from B, the adapter calls the REST API and adapts the returned info to A's domain model. This would be synchronous. If you want an async integration, B would publish events with the info to an MQ, and A would listen for those events and pick up the info.
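A minimal sketch of the downstream side (the upstream endpoint, its JSON shape, and the domain types are assumptions made up for illustration); the translate step is the ACL that keeps B's published language out of A's model:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CustomerCreditAdapter {

        record CreditRating(String customerId, int score) {} // A's own domain model

        private final HttpClient http = HttpClient.newHttpClient();

        public CreditRating fetch(String customerId) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://billing.example.com/api/accounts/" + customerId))
                .header("Accept", "application/json")
                .build();
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            return translate(response.body(), customerId);
        }

        // The translation is the anti-corruption layer: B's published language is
        // mapped onto A's terms, so changes in B stay contained here.
        private CreditRating translate(String upstreamJson, String customerId) {
            int score = upstreamJson.contains("\"standing\":\"good\"") ? 700 : 500; // placeholder mapping
            return new CreditRating(customerId, score);
        }
    }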
I want to add a comment about analytics in DDD. There are several approaches for sending data to analytics:
1) If you have a big enterprise application and you have to collect a lot of statistics from all bounded contexts, it's better to move analytics into a separate service and use a message queue to send data there.
2) If you have a simple application, separate your analytics from your app into another context and use an event or command to communicate with it.

Is this generic REST service a good idea?

I have a question on whether or not a particular REST-service design is good or not.
The background is an in-house monolithic system (I'll call this "the main system") dealing with, e.g., customers. Then there are external components that have additional information on persons, who may or may not correspond 1-1 with a customer in the main system.
At present there is no definite specification of what kind of data is or may be associated with a person/customer in these external components.
The proposed design I have been presented with is a REST service that exposes an API for the external systems to call in order to feed the component with this arbitrary data associated with persons.
The idea is that by doing so, the main system will have a single place to go to for the external data on customers/persons.
A proposed requirement of this REST service is that as new types of data are loaded into it by an external component, the data is automatically made accessible by the service, without it needing to be changed or redeployed in any way. "New data" generally means a new type of key-value set. E.g., initially the service might provide data for a customer identified by a customerId. Then an external component decides to post some kind of data associated with an SSN. This should automatically entail that the service can be queried for this data by supplying an SSN in the request.
In order to avoid the need to change/redeploy the service, I'm assuming the solution will have to have a very generic scheme of reference, e.g.
http://url/generic-resource-name/?id=[customerId]&keyType=customerId
There is really nothing in the requirements that limits the data to being associated with a person, only that its key be made up of one value.
An example use case sequence could be:
So to the question:
Is it a good idea to implement such a general-purpose service? And how does it square with the principles of REST? The noun the service will operate on has to be very generic, really nothing short of "resource" or "data", which in itself seems like a smell to me.
So to the question: Is it a good idea to implement such a general purpose service?
I believe not. You are heading straight into the inner-platform effect anti-pattern. You must be very careful, or you might end up like Vision.
Please also read the chapter "The allure of distributed objects" from Fowler's PoEAA book, just to be careful.