Where is the best place to store constraints in an API using MongoDB?

I'm writing an API which uses MongoDB as the storage backend. Let's say the API allows a consumer to query for upcoming events, and some of those events are private and should not come up in the results for the current user.
Should I:
Implement this at the API level. The API code will be responsible for these checks. The advantage seems to be that if I change storage engines (unlikely), the business code will remain intact.
Implement this as a stored JavaScript function.

At the API level, for the reason you mentioned: you're independent of the underlying storage mechanism of your application.
A good guideline is Persistence Ignorance: make your business logic as little aware of the storage mechanism as possible. This implies that business logic shouldn't be located in your storage layer either. So stored procedures or stored JavaScript functions shouldn't contain business logic. Some advantages of this:
You can swap the underlying database with minimal effort and without having to re-implement your business logic in the new database.
All of your business logic is contained inside your application layer, making the code base easier to understand and debug.
The only functions you should store in MongoDB are 'utility' functions; functions that simplify common operations, such as string operations, but are not tied to your business logic in any way.
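As a rough illustration, here is what the API-level check could look like in Go with the official MongoDB driver. It is only a sketch: the events collection, the private/ownerId field names, and the absence of a date filter are all assumptions made for this example.

```go
package events

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// Event is a hypothetical document shape for this example.
type Event struct {
	ID      string `bson:"_id"`
	Title   string `bson:"title"`
	Private bool   `bson:"private"`
	OwnerID string `bson:"ownerId"`
}

// VisibleEventsFor enforces the business rule in the API layer:
// public events, plus private events owned by the current user.
func VisibleEventsFor(ctx context.Context, col *mongo.Collection, userID string) ([]Event, error) {
	filter := bson.M{"$or": bson.A{
		bson.M{"private": false},
		bson.M{"ownerId": userID},
	}}
	cur, err := col.Find(ctx, filter)
	if err != nil {
		return nil, err
	}
	var events []Event
	if err := cur.All(ctx, &events); err != nil {
		return nil, err
	}
	return events, nil
}
```

The visibility rule lives in application code; MongoDB only ever sees an ordinary query, so swapping the storage engine means rewriting the filter, not relocating the rule.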

Related

Managing UI requirements in a microservice architecture

We have different client applications (each is built with a different UI and is targeted to a different sales channels) that are used to capture orders that ultimately need to be processed by our factory.
At first we decided to offer a single "order" microservice that would be used by all these client applications for business rules execution and data storage. This microservice will also trigger our backoffice processes such as client profile updates, order analysis, document storage to our electronic vault, invoicing, communications, etc.
The challenge we are facing is that these client applications are developed by teams that are external to ours (we are a backoffice team only). Each team responsible for developing a client application will be able to offer a different UX to its users (some will allow saving orders in an incomplete state, some will allow capturing data using a specific workflow, some will use text fields instead of list boxes for some values, etc.).
This diversity of behaviors across client applications is an issue because our microservice logic will become very complex in order to support all those UI requirements. Moreover, every time a change is made to one of the client applications, we will have to modify our microservice, which is a case of strong coupling.
My questions are: What would be your best advice to manage this issue? Should we let each application capture the data the way it wants (and persist it if needed in its own database) and let them call our microservice only when an order is complete and compliant to our API contract?
Should we keep our idea of having a single "order" microservice for everyone and force each client application to capture the data the same way?
Any other option?
We want to reduce the duplication of data and business rules in our ecosystem, but at the same time we don't want our 'order' microservice to become a mess.
Many thanks for your help.
Moreover, every time a change is made to one of the client applications, we will have to modify our microservice, which is a case of strong coupling.
This rings alarm bells for me. A change to a UI shouldn't require a change to a backend service. (The exception would be if a new feature were being added to a system and the backend service needed to play a part in supporting that feature, but I wouldn't just call that a change to a client.) As you have said, it's strong coupling, and that's something to be avoided in a microservices environment.
Ideally, your service should provide a generic, programmatic API that is flexible enough to support multiple UIs (or other non-UI applications) without having any knowledge of how the UIs work.
It sounds like you have some decisions to make about what responsibilities your service will and won't take on:
Does it make more sense for your generic orders service to facilitate the storage/retrieval/completion of incomplete orders, or to force its clients to manage this somewhere else?
Does it make more sense for your generic service to provide facilities to assist in the tracking of workflows, or to force the UIs that need that functionality to find it elsewhere?
For clients that want to show list boxes, does it make sense for your generic orders service to provide APIs that aid in populating those boxes?
Should we let each application capture the data the way it wants (and persist it if needed in its own database) and let them call our microservice only when an order is complete and compliant to our API contract?
It really depends on whether you think that's the most sensible way for your service to behave. Something that will play into that is how similar or dissimilar the needs of each UI are. If 4 out of 5 UIs have the same needs, it could well make sense to support that generically in your service. If every single UI behaves differently from the others, putting that functionality in your generic orders service would amount to storing frontend code somewhere it doesn't belong.
It seems like there might also be some organisational considerations to these decisions. If the teams using your service are only frontend teams (i.e. without capacity/skills to build backend services), then someone will still have to build the backend functionality they require.
Should we keep our idea of having a single "order" microservice for everyone and force each client application to capture the data the same way?
Yes to the idea of having a single order service with a generic interface for everyone. With regards to forcing client applications to capture data a certain way, your API will only dictate what they need to do to create an order. You can't (and shouldn't) force anything on them about the way they capture the data before calling your service. They can do it however they like. The questions are really around whether your service supports various models of capture or pushes that responsibility back to the frontend.
What would be your best advice to manage this issue?
Collaborate with the teams that will use the service. Gather as much information as you can about the use cases in which they intend to use it. Discover what is common to the majority and decide which of those needs you will support. Create a semi-formal spec (e.g. a well-documented OpenAPI definition), share it with the client teams, ask for feedback, and iterate. For the parts of the UIs that aren't common across clients, strongly consider telling those teams they'll need to support those elements of their design themselves, especially if they represent significant work on your end.
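To make the idea of a generic, UI-agnostic contract a bit more concrete, here is a minimal sketch in Go. Every name and field in it is hypothetical; the real contract would come out of the collaboration described above.

```go
package orders

import (
	"errors"
	"time"
)

// Order is a hypothetical, UI-agnostic contract: it says nothing about
// how a client captured the data (wizard, free text, list boxes, ...).
type Order struct {
	CustomerID  string     `json:"customerId"`
	Items       []LineItem `json:"items"`
	RequestedAt time.Time  `json:"requestedAt"`
}

type LineItem struct {
	SKU      string `json:"sku"`
	Quantity int    `json:"quantity"`
}

// Validate enforces only what the backoffice needs in order to process
// an order, not how any particular UI chose to collect it.
func (o Order) Validate() error {
	if o.CustomerID == "" {
		return errors.New("customerId is required")
	}
	if len(o.Items) == 0 {
		return errors.New("an order needs at least one line item")
	}
	for _, it := range o.Items {
		if it.SKU == "" || it.Quantity <= 0 {
			return errors.New("each line item needs a sku and a positive quantity")
		}
	}
	return nil
}
```

The point of the sketch is that Validate encodes what the factory needs to process an order, while staying silent about how any particular UI captured it.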

Shared library vs REST service. What are the pros and cons?

There is:
a requirement to have a key-value pair store shared between multiple services
a simple table in DynamoDB
very simple logic for creating key-value pairs
Intuitively I want to put the DynamoDB table behind a REST service that will implement all the simple logic I have. Unfortunately, this means adding a lot of reliability and performance challenges to the solution, since making my service as good, resilient, and performant as DynamoDB isn't easy.
For a while now I have been thinking about creating a shared library for this purpose. The library would implement the logic and connect directly to the DynamoDB table. I don't anticipate a lot of changes either in the DynamoDB table or in the logic that will be implemented in the library.
What are the possible pros and cons of both approaches?
A service is simply a packaging and deployment selection for a library. Both are absolutely valid depending on your particular needs.
I'm curious why you feel the need to wrap DynamoDB at all. Is there some particular domain logic you would like to place on top of it to constrain it? DynamoDB is already a RESTful service... Putting your own RESTful service on top of it may be advantageous, but you would have to convince me of the value of doing so. If you have particular business logic that requires constraining the functionality, packaging it as a shared library has certain advantages, especially if you can encapsulate that business logic and separate it from the implementation of DynamoDB.
I am assuming that updating the shared library will not be in your control, and that clients (library users) will update whenever it suits them.
If the above assumption is true, you should always go with a REST service, considering a few things:
Your REST API may use a cache instead of calling DynamoDB all the time.
You might want to update the schema of the data you put in DynamoDB.
You may use another database altogether.
You may have some validation logic which will certainly evolve over time.
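One way to keep both options open is to hide the decision behind a small interface, so consumers never know whether a shared library or a remote service sits behind it. The following is only a sketch with made-up names; the in-memory type stands in for a real implementation that would wrap the AWS SDK or an HTTP client.

```go
package kvstore

import "context"

// Store is all that consumers see; whether it is backed by a shared
// library talking to DynamoDB directly or by a thin REST service is an
// implementation detail behind this interface.
type Store interface {
	Put(ctx context.Context, key, value string) error
	Get(ctx context.Context, key string) (value string, found bool, err error)
}

// inMemoryStore is a stand-in so the sketch is self-contained; a real
// implementation would wrap the AWS SDK or an HTTP client instead.
type inMemoryStore struct {
	data map[string]string
}

func NewInMemoryStore() Store {
	return &inMemoryStore{data: map[string]string{}}
}

func (s *inMemoryStore) Put(ctx context.Context, key, value string) error {
	s.data[key] = value
	return nil
}

func (s *inMemoryStore) Get(ctx context.Context, key string) (string, bool, error) {
	v, ok := s.data[key]
	return v, ok, nil
}
```

Validation logic, caching, or a schema change (the points listed above) can then be added behind Store in one place, without every consumer caring whether a network hop is involved.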

How to structure a RESTful backend API with a database?

I want to make an API using REST which interacts (stores) data in a database.
While I was reading about some design patterns I came across the remote facade, and the book I was reading mentions that the role of this facade is to translate the coarse-grained methods from the remote calls into fine-grained local calls, and that it should not have any extra logic. As an explanation, it says that the program should still work without this facade.
Yet I have two questions:
Considering I also have a database, does it make sense to split the general call into specific calls for each attribute? Doesn't it make more sense to just have a general "get data" method that runs one query against the database and converts it into a usable object, to reduce the number of database calls? So instead of splitting get address into get street, get city, and get zip, make one DB call for all that info.
With all this in mind, and, in my case using golang, how should the project be structured in terms of files and functions?
I will have the main file with all the endpoints from the REST API, calling the controllers that handle these requests.
I will have a set of files that define those controllers. Are these controllers the remote facade? Should those methods not have logic in that case, and just call the equivalent local methods?
Should the local methods call the database directly, or should they use some sort of helper class that accesses the database?
Assuming the answers to all of these questions are positive, does the following structure make sense?
Main
Controllers
Domain
Database helper
First and foremost, as Mike Amundsen has stated
Your data model is not your object model is not your resource model is not your affordance model
Jim Webber said something very similar: by implementing a REST architecture you have an integration model, in the form of the Web, which is governed by HTTP, and separately your domain model. Resources adapt and project your domain model to the world, though there is no 1:1 mapping between the data in your database and the representations you send out. A typical REST system has many more resources than you have DB entries in your domain model.
With that being said, it is hard to give concrete advice on how you should structure your project, especially in terms of a certain framework you want to use. According to Robert "Uncle Bob" C. Martin, looking at the code structure should tell you something about the intent of the application and not about the framework you use; architecture is about intent. What you usually see instead is the default structure imposed by a framework such as Maven, Ruby on Rails, and so on. For golang you should probably read through certain documentation or blogs, which may or may not give you some ideas.
In terms of accessing the database you might either follow a microservice architecture where each service maintains its own database, or attempt something like a distributed monolith that acts as one cohesive system and shares the database among all its parts. In case you scale out and a couple of parallel services consume data, e.g. via a message broker, you might need a distributed lock and/or queue to guarantee that the data is not consumed by multiple instances at the same time.
What you should do, however, is design your data layer in a way that scales well. What many developers often forget or underestimate is the benefit they can gain from caching. Links are basically used on the Web to reference from one resource to another, giving the relation some semantic context through the use of well-defined link-relation names. Link relations also allow a server to control its own namespace and change URIs as needed. But URIs are not only pointers to a resource a client can invoke; they are also keys for a cache. Caching can take place in multiple locations: on the server side to avoid costly calculations or lookups, on the client side to avoid sending requests out at all, or on intermediary hops which take pressure off heavily requested servers. Fielding even made caching a constraint that needs to be respected.
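As a small illustration of that caching point, an origin server can mark a representation as cacheable and honour conditional requests. The sketch below uses Go's standard library with a hypothetical team resource and a deliberately naive ETag.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"net/http"
)

func teamHandler(w http.ResponseWriter, r *http.Request) {
	body := []byte(`{"team":"A","players":["/players/1","/players/2"]}`)
	etag := fmt.Sprintf(`"%x"`, sha1.Sum(body)) // naive ETag derived from the payload

	// Allow clients and intermediaries to reuse the response for an hour.
	w.Header().Set("Cache-Control", "public, max-age=3600")
	w.Header().Set("ETag", etag)

	// A matching conditional request can be answered without resending the body.
	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

func main() {
	http.HandleFunc("/teams/a", teamHandler)
	http.ListenAndServe(":8080", nil)
}
```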
Which attributes you should create queries for depends entirely on the use case you attempt to depict. In the case of the address example given, it makes sense to return the address information all at once, as the street or zip code is rarely queried on its own. If the address is part of some user or employee data, it is less clear whether to return that information as part of the user or employee data or just as a link that should be queried on its own as part of a further request. What you return may also depend on the capabilities of the media type your client and your service agree upon (content-type negotiation).
If you implement something like a grouping for, e.g., some football players and certain categories they belong to, such as their teams and whether they are offense or defense players, you might have a Team A resource that includes all of the players as embedded data. Within the DB you could either have a separate table for teams with references to the respective players, or the team could just be a column in the player table. We don't know, and a client usually doesn't care either. From a design perspective you should, however, be aware of the benefits and consequences of including all the players at the same time, of providing only links to the respective players, or of using a mixed approach of presenting some base data plus a link to learn further details.
The latter approach is probably the most sensible way, as it gives a client enough information to determine whether more detailed data is needed or not. If needed, a simple GET request to the provided URI is enough, which might be served by a cache and thus never reach the actual server at all. The first approach certainly has the disadvantage that it doesn't use caching optimally and may return far more data than is actually needed. The links-only approach may not provide enough information, forcing the client to perform a follow-up request to learn data about the team member. But as mentioned before, you as the service designer decide which URIs or queries are returned to the client and can therefore design your system and data model accordingly.
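The mixed approach could be expressed, purely as an assumed shape, with types like these (Go structs with JSON tags; the field names are illustrative only):

```go
package api

// TeamResource embeds just enough player data for a list view plus a
// link per player for clients that need the full details.
type TeamResource struct {
	Name    string          `json:"name"`
	Players []PlayerSummary `json:"players"`
}

type PlayerSummary struct {
	Name     string `json:"name"`
	Position string `json:"position"` // e.g. "offense" or "defense"
	Href     string `json:"href"`     // link to the full player resource
}
```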
In general what you do in a REST architecture is providing a client with choices. It is good practice to design the overall interaction flow as a state machine which is traversed through receiving requests and returning responses. As REST uses the same interaction model as the Web, it probably feels more natural to design the whole system as if you'd implement it for the Web and then apply the design to your REST system.
Whether controllers should contain business logic or not is primarily an opinionated question. As Jim Webber correctly stated, HTTP, which is the de-facto transport layer of REST, is an
application protocol whose application domain is the transfer of documents over a network. That is what HTTP does. It moves documents around. ... HTTP is an application protocol, but it is NOT YOUR application protocol.
He further points out that you have to narrow HTTP down into a domain application protocol and trigger business activities as a side effect of moving documents around the network. So it's the side effect of moving documents over the network that triggers your business logic. There is no strict rule on whether to include business logic in your controller or not, but usually you try to keep the business logic in its own layer, e.g. as a service that you just invoke from within the controller. That allows you to test the business logic without the controller and thus without the need for a real HTTP request.
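A minimal sketch of that separation in Go might look like the following; the names are made up, and the controller does nothing but translate the HTTP document into a call on the service and the result back into a response.

```go
package api

import (
	"encoding/json"
	"net/http"
)

// OrderService holds the business logic and knows nothing about HTTP,
// so it can be unit-tested without a request in sight.
type OrderService interface {
	PlaceOrder(customerID string, items []string) (orderID string, err error)
}

// OrderController is the facade/controller part: it only translates the
// incoming document into a call on the service and the result back.
type OrderController struct {
	Svc OrderService
}

func (c *OrderController) Create(w http.ResponseWriter, r *http.Request) {
	var req struct {
		CustomerID string   `json:"customerId"`
		Items      []string `json:"items"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid payload", http.StatusBadRequest)
		return
	}
	id, err := c.Svc.PlaceOrder(req.CustomerID, req.Items)
	if err != nil {
		http.Error(w, err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusCreated)
	json.NewEncoder(w).Encode(map[string]string{"orderId": id})
}
```

Because OrderService is an interface with no HTTP in its signature, the business rules behind PlaceOrder can be exercised with plain function calls in a test.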
While this answer can't provide more detailed information, partly due to the broad nature of the question itself, I hope I could shed some light in what areas you should put in some thoughts and that your data model is not necessarily your resource or affordance model.

What is the real benefit of using GraphQL?

I have been reading articles on the web about the benefits of GraphQL, but so far I have not been able to find a single benefit of it.
Some of the most common benefits mentioned in those articles are below:
No Overfetching with GraphQL.
Reducing the number of calls made from the client side.
Data Load Control Granularity
Evolve your API without versions.
Those all make sense, but it is not GraphQL itself that provides these benefits. Any second-layer API written in Java/Python or any other language would be able to provide these benefits too. It is basically introducing another layer of abstraction above the data retrieval systems, REST or whatever, and decoupling the client side from that layer. After you do that, everything you can do with GraphQL can also be done with any other language.
Anyone can implement, say, a Scala server that retrieves the data from various APIs, integrates them, creates objects internally, and feeds the client with only the relevant part of the data, with total control over the data. This API can be easily versioned and released accordingly. Considering the syntax of GraphQL, how cumbersome it is, and the difficulty of creating a good cache around it, I can't see why you would really use it.
So the overall question is: are there any benefits of GraphQL that the application gets because of GraphQL itself, and not because you implement another layer of abstraction between your applications and your APIs?
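For concreteness, the field-selection behaviour those articles describe looks roughly like the sketch below, written in Go against the third-party github.com/graphql-go/graphql library with a made-up event type. Whether this requires GraphQL rather than any other hand-rolled abstraction layer is exactly the question being asked.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/graphql-go/graphql"
)

func main() {
	// A made-up event type with more fields than most clients need.
	eventType := graphql.NewObject(graphql.ObjectConfig{
		Name: "Event",
		Fields: graphql.Fields{
			"title":    &graphql.Field{Type: graphql.String},
			"location": &graphql.Field{Type: graphql.String},
			"details":  &graphql.Field{Type: graphql.String},
		},
	})

	queryType := graphql.NewObject(graphql.ObjectConfig{
		Name: "Query",
		Fields: graphql.Fields{
			"nextEvent": &graphql.Field{
				Type: eventType,
				Resolve: func(p graphql.ResolveParams) (interface{}, error) {
					return map[string]interface{}{
						"title":    "Launch party",
						"location": "Berlin",
						"details":  "A very long description...",
					}, nil
				},
			},
		},
	})

	schema, _ := graphql.NewSchema(graphql.SchemaConfig{Query: queryType})

	// The client asks only for the title, so "location" and "details"
	// never leave the server: the "no overfetching" claim.
	result := graphql.Do(graphql.Params{
		Schema:        schema,
		RequestString: `{ nextEvent { title } }`,
	})
	out, _ := json.Marshal(result.Data)
	fmt.Println(string(out)) // {"nextEvent":{"title":"Launch party"}}
}
```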
Best practices known as REST existed earlier, too.
GraphQL is more standardized than REST, safer (no injections), and its syntax gives great flexibility in the area of quickly changing client needs.
It's just a good standard of best practices.
I feel GraphQL is another example of overengineering. I would say the best standards and practices are "keeping it simple."
Breaking down an object and building a custom one before sending it to the client is very basic.

What specific issue does the repository pattern solve?

(Note: My question has very similar concerns as the person who asked this question three months ago, but it was never answered.)
I recently started working with MVC3 + Entity Framework and I keep reading that the best practice is to use the repository pattern to centralize access to the DAL. This is also accompanied with explanations that you want to keep the DAL separate from the domain and especially the view layer. But in the examples I've seen the repository is (or appears to be) simply returning DAL entities, i.e. in my case the repository would return EF entities.
So my question is, what good is the repository if it only returns DAL entities? Doesn't this add a layer of complexity that doesn't eliminate the problem of passing DAL entities around between layers? If the repository pattern creates a "single point of entry into the DAL", how is that different from the context object? If the repository provides a mechanism to retrieve and persist DAL objects, how is that different from the context object?
Also, I read in at least one place that the Unit of Work pattern centralizes repository access in order to manage the data context object(s), but I don't grok why this is important either.
I'm 98.8% sure I'm missing something here, but from my readings I didn't see it. Of course I may just not be reading the right sources... :\
I think the term "repository" is commonly thought of in the way the "repository pattern" is described by the book Patterns of Enterprise Application Architecture by Martin Fowler.
A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.
On the surface, Entity Framework accomplishes all of this, and can be used as a simple form of a repository. However, there can be more to a repository than simply a data layer abstraction.
According to the book Domain Driven Design by Eric Evans, a repository has these advantages:
They present clients with a simple model for obtaining persistent objects and managing their life cycle
They decouple application and domain design from persistence technology, multiple database strategies, or even multiple data sources
They communicate design decisions about object access
They allow easy substitution of a dummy implementation, for unit testing (typically using an in-memory collection).
The first point roughly equates to the paragraph above, and it's easy to see that Entity Framework itself easily accomplishes it.
Some would argue that EF accomplishes the second point as well. But commonly EF is used simply to turn each database table into an EF entity and pass it through to the UI. It may be abstracting the mechanism of data access, but it's hardly abstracting away the relational data structure behind the scenes.
In simpler applications that are mostly data-oriented, this might not seem to be an important point. But as the application's domain rules / business logic become more complex, you may want to be more object-oriented. It's not uncommon that the relational structure of the data contains idiosyncrasies that aren't important to the business domain, but are side effects of the data storage. In such cases it's not enough to abstract the persistence mechanism; you also need to abstract the nature of the data structure itself. EF alone generally won't help you do that, but a repository layer will.
As for the third advantage, EF will do nothing (from a DDD perspective) to help. Typically DDD uses the repository not just to abstract the mechanism of data persistence, but also to provide constraints around how certain data can be accessed:
We also need no query access for persistent objects that are more convenient to find by traversal. For example, the address of a person could be requested from the Person object. And most important, any object internal to an AGGREGATE is prohibited from access except by traversal from the root.
In other words, you would not have an 'AddressRepository' just because you have an Address table in your database. If your design chooses to manage how the Address objects are accessed in this way, the PersonRepository is where you would define and enforce the design choice.
Also, a DDD repository would typically be where certain business concepts relating to sets of domain data are encapsulated. An OrderRepository may have a method called OutstandingOrdersForAccount which returns a specific subset of Orders. Or a Customer repository may contain a PreferredCustomerByPostalCode method.
Entity Framework's DataContext classes don't lend themselves well to such functionality without the added repository abstraction layer. They do work well for what DDD calls Specifications, which can be simple boolean expressions sent in to a simple method that will evaluate the data against the expression and return a match.
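A repository interface in that DDD spirit might look like the following sketch (Go, with hypothetical types): access goes through the aggregate root and through intent-revealing methods, and there is deliberately no AddressRepository or generic query method.

```go
package domain

import "context"

// OrderRepository exposes Orders only through the aggregate root and
// through intent-revealing operations; there is deliberately no generic
// query method and no separate AddressRepository.
type OrderRepository interface {
	ByID(ctx context.Context, id OrderID) (*Order, error)
	OutstandingOrdersForAccount(ctx context.Context, account AccountID) ([]*Order, error)
	Save(ctx context.Context, o *Order) error
}

// Minimal stand-in types so the sketch is self-contained.
type (
	OrderID   string
	AccountID string
	Order     struct {
		ID      OrderID
		Account AccountID
		Shipped bool
	}
)
```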
As for the fourth advantage, while I'm sure there are certain strategies that might let one substitute for the datacontext, wrapping it in a repository makes it dead simple.
Regarding 'Unit of Work', here's what the DDD book has to say:
Leave transaction control to the client. Although the REPOSITORY will insert into and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.
Entity Framework's DbContext basically resembles a Repository (and a Unit of Work as well). You don't necessarily have to abstract it away in simple scenarios.
The main advantage of the repository is that your domain can be ignorant and independent of the persistence mechanism. In a layer based architecture, the dependencies point from the UI layer down through the domain (or usually called business logic layer) to the data access layer. This means the UI depends on the BLL, which itself depends on the DAL.
In a more modern architecture (as propagated by domain-driven design and other object-oriented approaches) the domain should have no outward-pointing dependencies. This means the UI, the persistence mechanism and everything else should depend on the domain, and not the other way around.
A repository will then be represented through its interface inside the domain but have its concrete implementation outside the domain, in the persistence module. This way the domain depends only on the abstract interface, not the concrete implementation.
That basically is object-orientation versus procedural programming on an architectural level.
See also the Ports and Adapters a.k.a. Hexagonal Architecture.
Another advantage of the repository is that you can create similar access mechanisms to various data sources. Not only to databases but to cloud-based stores, external APIs, third-party applications, etc.
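As a sketch of that dependency direction (hypothetical names, and Go rather than C#/EF): the interface lives in the domain package, and only the persistence package knows that SQL exists.

```go
// Package domain owns the abstraction. It imports no database package,
// so the dependency points inward, toward the domain.
package domain

type Customer struct {
	ID   string
	Name string
}

type CustomerRepository interface {
	ByID(id string) (*Customer, error)
}
```

```go
// Package persistence holds the concrete implementation and is the only
// place that knows SQL exists; it depends on domain, never the reverse.
package persistence

import (
	"database/sql"

	"example.com/app/domain" // hypothetical module path
)

type SQLCustomerRepository struct {
	DB *sql.DB
}

func (r *SQLCustomerRepository) ByID(id string) (*domain.Customer, error) {
	var c domain.Customer
	err := r.DB.QueryRow("SELECT id, name FROM customers WHERE id = ?", id).
		Scan(&c.ID, &c.Name)
	if err != nil {
		return nil, err
	}
	return &c, nil
}
```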
You're right, in those simple cases the repository is just another name for a DAO and it brings only one benefit: the fact that you can switch EF to another data access technique. Today you're using MSSQL, tomorrow you'll want cloud storage, or a micro-ORM instead of EF, or a switch from MSSQL to MySQL.
In all those cases it's good that you use a repository, as the rest of the app won't care about what storage you're using now.
There's also the limited case where you get information from multiple sources (DB + file system); a repo will act as the facade, but it's still just another name for a DAO.
A 'real' repository is valid only when you're dealing with domain/business objects; for data-centric apps which won't change storage, the ORM alone is enough.
It would be useful in situations where you have multiple data sources, and want to access them using a consistent coding strategy.
For example, you may have multiple EF data models, and some data accessed using traditional ADO.NET with stored procs, and some data accessed using a 3rd party API, and some accessed from an Access database living on a Windows NT4 server sitting under a blanket of dust in your broom closet.
You may not want your business or front-end layers to care about where the data is coming from, so you build a generic repository pattern to access "data", rather than to access "Entity Framework data".
In this scenario, your actual repository implementations will be different from each other, but the code that calls them wouldn't know the difference.
Given your scenario, I would simply opt for a set of interfaces that represent what data structures (your Domain Models) need to be returned from your data layer. Your implementation can then be a mixture of EF, Raw ADO.Net or any other type of Data Store/Provider. The key strategy here is that the implementation is abstracted away from the immediate consumer - your Domain layer. This is useful when you want to unit test your domain objects and, in less common situations - change your data provider / database platform altogether.
You should, if you haven't already, consider using an IoC container, as they make loose coupling of your solution very easy by way of dependency injection. There are many available; personally I prefer Ninject.
The domain layer should encapsulate all of your business logic - the rules and requirements of the problem domain, and can be consumed directly by your MVC3 web application. In certain situations it makes sense to introduce a services layer that sits above the domain layer, but this is not always necessary, and can be overkill for straightforward web applications.
Another thing to consider is that even when you know you will be working with a single data store, it still might make sense to create a repository abstraction. The reason is that there might be a function your application needs that your ORM du jour either does badly (performance), doesn't do at all, or that you just don't know how to make the ORM do.
If you are wrapping your ORM behind a well thought out repository interface, you can easily switch between different technologies as you see fit. It's not uncommon in my repositories to see some methods use EF for their work and others to use something like PetaPoco, or (gasp) ADO.net code. The repository abstraction enables you to use exactly the right tool for the job at hand without leaking these complexities into the client code.
I think there is a big misunderstanding of what many articles call "repository." And that's why there are doubts about what real value those abstractions bring.
In my opinion the repository in its pure form is IEnumerable, while you and many articles are talking about a "data access service."
I've blogged about it here.