We are currently using a direct DB connection to query MongoDB from our scripts and retrieve the required data.
Is it advisable, or considered best practice, to expose data retrieval from the DB as a microservice?
It does until it doesn't :)
A service needs to get its data from somewhere, and a database is a good start. If you have high loads you may find that you need to add a cache in the middle; see this post from Instagram engineering: https://instagram-engineering.com/thundering-herds-promises-82191c8af57d
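To illustrate the "cache in the middle" idea, here is a minimal sketch of a cache that stores a pending result per key, so that many concurrent misses for the same key trigger only one load. The class name `CoalescingCache` and the `loader` callable are assumptions made up for this example, and expiry/eviction is left out entirely; it is not taken from the linked post.

```python
import threading
from concurrent.futures import Future


class CoalescingCache:
    """Cache one Future per key so that concurrent misses share a single load."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable: key -> value (e.g. a DB query)
        self._futures = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            future = self._futures.get(key)
            is_owner = future is None
            if is_owner:
                # First caller for this key: register a placeholder for the others.
                future = Future()
                self._futures[key] = future
        if is_owner:
            try:
                future.set_result(self._loader(key))
            except Exception as exc:
                # Do not cache failures; drop the entry so a later call retries.
                with self._lock:
                    self._futures.pop(key, None)
                future.set_exception(exc)
        # The owner returns right away; other callers block until the load finishes.
        return future.result()
```

Callers simply use `cache.get(key)`. Under load, the database sees one query per key instead of one query per request.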
edit (after comment)
Generally speaking, a service should own its database, and other services shouldn't access that database directly, only via the owning service's API. The idea is to keep services autonomous and enable them to evolve independently.
Depending on the size of the microservice, that's not always practical, since the overhead of having the service can outweigh the utility it provides (I call these nanoservices). Also, if you have a lot of services you don't want to allow each one to talk to any other (even if not via the DB), since you just get a huge mess. The way I see it, there should be clear logical boundaries (services or microservices), and then within each such logical service you may find that it makes sense to have more than one "part" (which I call aspects), e.g. because they have different scaling needs or different suitable technologies. When you set things up this way, aspects can access the same database and services shouldn't (and you can still tame the chaos :) )
One last thing to think about: who said an API is only a REST API? You can add views on top of the data that belongs to another service, and as long as you treat them like an API (security, versioning etc.) you can have other services access them as well.
Related
I have been developing monolithic applications for many years and want to try microservices and containers now.
For learning microservices and containers I am planning a small calendar application (as a web application), where you can create dates and invite others to them. I identified three services that I have to implement:
Authentication service -> handles login, gives back a JWT
UserData services -> handles registration, shows user data, handles edits of user data (profile picture, name, short description)
Calendar service -> for creating, editing and deleting dates, inviting others to dates, viewing dates and so on.
This is the part I feel confident about, but if I have already done something wrong, please correct me.
Where I am not sure what to do is how to implement the database and frontend part.
Database
I have never used Neo4j or any graph-based database before, but they seem to work well for this use case and I want to learn something new. My database choice may not be relevant here, but my question is:
Should every microservice have its own database or should they access the same database?
Some data sets are connected, like every date has a creator and multiple participants.
And should the database run in a container or should it be hosted on the server directly? (I want to self-host the database)
For storing profile pictures I decided to use a shared container volume, so multiple instances of a service can access the same files. But I never used containers before, so if you have a better idea, I am open to hear it.
Frontend
Should I build a single (monolithic) frontend application, or is it useful to build some kind of micro frontend, which contains only a navigation and loads other parts like the calendar view or user data view from the above defined services? And if yes, should I pack the UserData frontend and the UserData service into one container?
MVP
I want the whole application to be usable by others, so they can install it on their own servers/cloud. Is it possible to pack the whole application (all services and frontend) into one package and make it installable in Kubernetes with a single step, but in such a way that each service still has its own container? In my head this sounds necessary, because the calendar service won’t work without the userdata service, since every date needs a user who creates it.
Additional optional services
Imagine I want to add some additional but optional features like real time chat. What is the best way to check if these features are installed? Should I add a management service which checks which services exist and tells the frontend which navigation links it should show, or should the frontend application ping all possible services to check if they are installed?
I know, these are many questions, but I think they are tied together, because choices on one part can influence others.
I am looking forward to your answers.
I'm further along than you but far from "microservice expert". But I'll try my best:
Should every microservice have its own database or should they access the same database?
The rule of thumb is that a microservice should have its own database. This decouples the services, so that the only contract between them is the API. Furthermore, it keeps the logic within a service simpler, because there are fewer types of data that the service is responsible for handling.
And should the database run in a container or should it be hosted on the server directly?
I'm not a fan of running a database in a container. In general, a database:
can consume a lot of resources (depending on the query)
sustains long-lived connections
is something you vertically scale, rather than horizontally
These qualities reflect a poor case for containerization, imho.
For storing profile pictures I decided to use a shared container volume
Nothing wrong with this, but using cloud storage like Amazon S3 is a popular move here, just so you don't have to manage the state of the volume. i.e. if the volume goes away, do you still want your pictures around?
Should I build a single (monolithic) frontend application?
I would say yes: this would be the simplest approach. The other big motivation for microservices is to allow separate development teams to work independently. If that's not the case here, I think you should spare yourself the headache of micro-frontends.
And if yes, should I pack the UserData frontend and the UserData service into one container?
I'm not sure I perfectly understand here. I think it's fine to have the frontend and backend service as part of the same codebase, which can get "deployed" together. How you want to serve the frontend is completely up to you and how it's implemented. Some like to serve it through a CDN to minimize latency, but I've never seen the need.
Is it possible to pack the whole application (all services and frontend) into one package and make it installable in kubernetes with a single step?
This is a use case for Helm charts: Helm is a package manager for Kubernetes.
Imagine I want to add some additional but optional features like real time chat. What is the best way to check if these features are installed? Should I add a management service which checks which services exist and tells the frontend which navigation links it should show, or should the frontend application ping all possible services to check if they are installed?
There's a thin line between what you're describing and monitoring. If it were me, I'd pick the service that radiates this information and just set some booleans in a config file or something. That's because this state isn't going to change until reinstall, so there's no need to check it on an interval.
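As a rough sketch of the "booleans in a config file" idea: the service that owns this information could read a file written at install time and hand the frontend only the links for installed features. The config layout, the feature names and the URLs below are all assumptions made up for illustration.

```python
import json

# Hypothetical config written once at install time.
INSTALL_CONFIG = """
{
  "features": {"calendar": true, "user_data": true, "chat": false}
}
"""

# Hypothetical navigation entries, one per feature.
NAV_LINKS = {
    "calendar": "/calendar",
    "user_data": "/profile",
    "chat": "/chat",
}


def enabled_nav_links(config_text):
    """Return the navigation links for features marked as installed."""
    features = json.loads(config_text)["features"]
    # Features missing from the config are treated as not installed.
    return {name: url for name, url in NAV_LINKS.items() if features.get(name, False)}


print(enabled_nav_links(INSTALL_CONFIG))
```

The frontend would fetch this once at startup rather than pinging every possible service.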
--
Those were my thoughts but, as with all things architecture, there's no universal truth. Sometimes exceptions are warranted depending on your actual circumstances.
Background:
I'm building an API service app. The app is just like any other, you send an HTTP request and receive a response. This seems simple up until I start thinking about user registration, payments, authentication, logging and so on.
Application:
tl;dr simple app diagram
Endpoints listening for HTTP requests and doing all the request-related work. This is the core of the service, what the service user would use this app for. Not directly accessible to the end user (unless they somehow know the URL). Python Flask server, deployed on Google Cloud Run.
API gateway acting like a proxy and a single access point, forwarding the requests to the endpoints. This is the service access point for the end users. This part will also be responsible for authentication, limitations, logging and tracking the use of the API endpoints. Python Flask server, deployed on Google Cloud Run.
Website including documentation, a demo showing off API calls through the API gateway, registration, payment (thinking of Stripe) etc. VueJS app on a NodeJS server on a Google Cloud compute VM.
Database storing credentials of registered users, payment information and auth keys. Not implemented yet.
Problems:
Is this architecture sound? What could be done differently or improved? How could I further simplify the interactions between the separate parts of the app? Am I missing any essential parts?
I haven't implemented the database part yet and I'm not sure what I should use. There are plenty of options on Google Cloud. I could also go with something simple and just install a DB with an HTTP/JSON interface on a Google Cloud compute VM. How do I choose the DB? Given such an app, what would be the best choice?
Please recommend literature, blogs or other sources of information on similar app architecture for developers who are new to it.
This is pretty open ended, but here are some general comments:
Think about how your UI will work. Are you setting up a static app served directly from cloud storage or do you need something rendered on the server? Personally I prefer separating UI from API when I can but you need to be aware of things like search engine optimization. Even if you need to render some content dynamically your site can still be static. Take a look at static site generators like Gatsby. I haven't had to implement a server rendered UI in years and that makes me happy.
API gateway might be fine, but you don't really need it for anything. It might be simpler to start without it and concentrate on what actually matters. If your APIs are being called by an external client you can't trust the calls anyways and any API key you might be using will be exposed. I'd say don't worry about it for a single app. That being said, if you definitely want to use a GW then use one, just be aware that it is mostly a glorified proxy and not some core part of your architecture.
Make sure your API implementations don't store any local state so you can rely on Cloud Run scaling your services up and down. Definitely don't ever store state directly inside your containers. If you need state on the server it needs to be in some external data store.
Use JWTs or an external IDM (that will generate JWTs) for authentication. Keep session data on the client side as much as possible and pass the JWT in every API call to authenticate the caller. If you are implementing login on your own the only APIs you need to expose without tokens are for auth and password recovery, which you can separate into their own service.
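To make the "pass the JWT in every API call" point concrete, here is a minimal sketch of issuing and verifying an HS256-signed token using only the Python standard library. The function names and the claim values are assumptions for illustration; in a real service you would more likely rely on a maintained JWT library or an external identity provider than on hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    # A token without an "exp" claim is treated as expired in this sketch.
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Because the server only needs the shared secret to validate a token, no session state has to live in the container, which fits the stateless Cloud Run advice above.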
Database selection depends on how well you understand your processes, how transactional your services are and your existing skillset. Overall I would use what you are comfortable with, you can probably succeed with a lot of things. Certain NoSQL flavors can seem simple on the surface but if you don't have a clear understanding of the types of queries you need to run they can get tedious to work with. Generally you should stick to relational databases for OLAP style implementations and consider NoSQL for OLTP. Personally I like MongoDB and it is very popular, probably because it sort of sits in the middle of the pack which makes it fit a lot of applications. Using MongoDB also makes you cloud agnostic since it is available on every platform. Using platform specific database flavors can lock you down to a specific vendor.
Whatever you do, don't start installing things on VMs. You can be almost 100% sure you are doing it wrong if this comes up. Remember, the services you consume don't all have to be managed by Google or even run on GCP. You can get MongoDB capacity directly from MongoDB who manage it on your behalf on all of the Big3 cloud vendors.
At least think about the long term, even if it doesn't necessarily need to impact your architecture right now. If you are expecting your app to be up for years, try to make it more platform agnostic rather than less. This might mean staying away from some really platform-specific serverless features that will force you to jump through a couple of extra hoops. If you are using Cloud Run you are using containers, which already makes your app pretty portable, so don't lock it to one platform by using a lot of platform-specific features. That being said, don't stay away from them entirely either. You should always go for the low-hanging fruit, so don't try to avoid using things like a secrets manager etc. If your app has a short lifespan and you need really fast time to market, then don't worry about it.
Just my 2c, what you are doing is very generic and can be done in a lot of different ways.
For example I have 2 databases. One of them is called ecommerce which contains real customer information. Another is called ec1 which basically contains only views from tables of ecommerce.
We use our ec1 database to connect to our website or apps. How secure is this method in terms of back end security?
Only exposing ec1 is better than exposing ecommerce because you can reset ec1 using your "safe" values in case of corruption and you can keep some secret data only stored in ecommerce if it doesn't need to be used by your website or your app.
However, this is only a small portion of backend security. Having two different databases with real data and data views doesn't matter a lot if someone can access your server OR can corrupt your data.
I mean, if someone finds a way to get data they are not authorized to read, it is bad even if it comes from ec1 and not from ecommerce.
So yeah, exposing only views is a BETTER solution, but nothing can be said about the overall security, because it mostly doesn't depend on that.
EDIT: A detailed explanation of backend security is way beyond the scope of a simple Stack Overflow answer (and I am probably not the best teacher), but for basic server security you must take care of:
- A firewall that blocks every request except those from your web apps
- Up-to-date software
- Good database passwords
- The user your application queries run as must only be able to perform operations on the ec1 database, while the views should be generated by a cron job running as a different user
These are the main security tips that come to mind.
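As a small illustration of what "exposing only views" buys you, the sketch below builds a table with a sensitive column and a view that leaves it out. It uses an in-memory SQLite database purely as a stand-in, and the table, view and column names are made up. SQLite has no user accounts, so the separate restricted database user from the list above is not shown here.

```python
import sqlite3


def build_demo_db():
    """Create a table with a sensitive column and a view that hides it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT,
            email TEXT,
            card_number TEXT
        );
        INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com', '4111-0000-0000-0000');
        -- The view is the only object the website is meant to query.
        CREATE VIEW customers_public AS SELECT id, name FROM customers;
    """)
    return conn


conn = build_demo_db()
cursor = conn.execute("SELECT * FROM customers_public")
exposed_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(exposed_columns, rows)
```

Even a `SELECT *` against the view cannot return the card number, but that only helps if the application's database user has no access to the underlying table.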
Imagine we have 2 services: Product and Order. Based on my understanding of SOA, I know that each service can have its own data store (a separate database, or a group of tables in the same database). But no Service is allowed to touch the data store of another Service directly.
Now, imagine we have stored the product and order data independently inside Product and Order Services. In the Order Service, we can identify products by their ID.
My question is: With this architecture, how can I display the list of orders and product details on the "same" page?
My understanding is that I should get the list of OrderItems from OrderService. Each OrderItem has a ProductID. Now, if I make a separate call to ProductService to retrieve the details about each Product, that would be very inefficient.
How would you approach this problem?
Cheers,
Mosh
I did some research and found 2 different solutions for this.
1- Services can cache data of other Services locally. This requires a pub/sub mechanism: any changes at the source of the data must be published so that subscribing Services can update their local cache. This is costly to implement, but it is the fastest solution because the Service has the required data locally. It also increases the availability of a Service by removing its dependency on the data of other Services. In other words, if the other Service is not available, it can still do its job using its cached data.
2- Alternatively, a Service can query a "list" of objects from another Service by supplying a list of identifiers. This avoids making a separate call to the target Service for the details of each object. This is easier to implement but, performance-wise, is not as fast as solution 1. Also, if the target Service is not available, the source Service cannot do its job.
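Solution 2 can be sketched as follows. The two classes are in-memory stand-ins for the real services, and the names `get_many`, `list_items` and `order_page` are assumptions for illustration; in practice `get_many` would be a single remote call that accepts a list of IDs.

```python
class ProductService:
    """Stand-in for the remote Product service."""

    def __init__(self, products):
        self._products = products
        self.calls = 0  # counts round trips, for illustration only

    def get_many(self, product_ids):
        """Return details for all requested IDs in one call."""
        self.calls += 1
        return {pid: self._products[pid] for pid in set(product_ids) if pid in self._products}


class OrderService:
    """Stand-in for the Order service; it stores product IDs only."""

    def __init__(self, items_by_order):
        self._items_by_order = items_by_order

    def list_items(self, order_id):
        return list(self._items_by_order.get(order_id, []))


def order_page(order_id, orders, products):
    """Combine order items with product details using one batched lookup."""
    items = orders.list_items(order_id)
    details = products.get_many([item["product_id"] for item in items])
    return [{**item, "product": details.get(item["product_id"])} for item in items]


products = ProductService({10: {"name": "Keyboard"}, 11: {"name": "Mouse"}})
orders = OrderService({1: [{"product_id": 10, "quantity": 1}, {"product_id": 11, "quantity": 2}]})
page = order_page(1, orders, products)
print(page, products.calls)
```

The page needs one call to each service regardless of how many items the order contains, instead of one product call per item.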
Hope this helps others who have come across this issue.
Mosh
DB integration (which is really what you are talking about when two services share a table in a DB) is wrong at so many levels!
It completely breaks some of the major principles of software engineering:
loose coupling,
encapsulation,
separation of concerns.
A service should be (to earn that name) completely independent, namely:
it must not rely on others to ensure the consistency and coherence of its data
it must not rely on others to guarantee the security of its data
it must not depend on external implementations (only interfaces)
Two services that share data at the DB level are unable to guarantee any of the above.
The fact that you "control" both services is completely irrelevant. Today you control... tomorrow you might want to outsource or replace one of the services. That should be as simple as ensuring the proper interfaces are in place.
Imagine two services that share a table with some field (varchar) in it. Now one service needs to change that field to numeric... bang, the other service stops functioning and loose coupling goes down the drain.
Most of the time the trick lies in properly defining the service scope and clearly stipulating what a service does and what it doesn't do. You should also avoid turning everything into a service. Set your service granularity too high and services will start popping up everywhere and integration headaches will escalate.
That being said, there are some situations where data integration between services poses some challenges. The main premise, though, should always be: data can belong to only one service. Data is intrinsically tied to the business logic that affects its consistency and coherence, and as such there should never be more than one service controlling any given data.
Another approach would be to have some sort of data source that lives outside of the SOA services. This data source could be considered your cache of the data, your operational data source or even a data warehouse. Extraction packages can export the data from the services (and/or some sort of real time mechanism). You can query this data source how you want.
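One way to picture such an outside data source: the services publish change events, and a separate read-only store subscribes to them and keeps its own queryable copy. The sketch below is a toy, in-process version under several assumptions: the `EventBus` class, the topic names and the table layout are invented for illustration, and an in-memory SQLite database stands in for the operational data store.

```python
import sqlite3
from collections import defaultdict


class EventBus:
    """Toy in-process stand-in for a real message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subscribers[topic]:
            handler(payload)


class ReadStore:
    """Query-only copy of service data, kept up to date from published events."""

    def __init__(self, bus):
        self._db = sqlite3.connect(":memory:")
        self._db.executescript("""
            CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER);
        """)
        bus.subscribe("product.saved", self._on_product_saved)
        bus.subscribe("order_item.added", self._on_order_item_added)

    def _on_product_saved(self, product):
        # Upsert, so that renames published by the Product service are picked up.
        self._db.execute(
            "INSERT OR REPLACE INTO products (id, name) VALUES (?, ?)",
            (product["id"], product["name"]),
        )

    def _on_order_item_added(self, item):
        self._db.execute(
            "INSERT INTO order_items (order_id, product_id, quantity) VALUES (?, ?, ?)",
            (item["order_id"], item["product_id"], item["quantity"]),
        )

    def order_lines(self, order_id):
        """Join order items with product names inside the read store."""
        return self._db.execute(
            """SELECT p.name, oi.quantity
               FROM order_items AS oi JOIN products AS p ON p.id = oi.product_id
               WHERE oi.order_id = ? ORDER BY p.name""",
            (order_id,),
        ).fetchall()
```

The services never see each other's tables; only the read store does the join, and it can be rebuilt from the services at any time.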
The advantage of this approach is that the SOA black box is maintained and you can swap out a service, knowing how you have coupled it.
The disadvantage is the added complexity and maintenance overhead.
SOA is just a buzz-phrase for deploying components behind web services. How many data stores you have is entirely up to you. In some cases it makes sense to have partitioned data behind individual components, in other cases all the data lives behind one service, and in yet other cases many components that expose service interfaces connect to the same database via the database's connection protocol. Approach the problem by approaching the problem, not by imposing artificial constraints.
I don't think there is any principle in SOA that says services should have a separate data store. In general it is actually impractical. Yes, you can have a product service and an order service, and the client can do the join using web service calls as you said, and this may be acceptable in some scenarios. But that doesn't mean that you cannot have a specific service for a client if you already know the client's behaviour and performance requirements.
What I mean is that you should have a search service that returns orders and products with the join done in database. This is practical and would solve your business problem.
It is unfortunate to see this whole discussion deteriorate into a "can I use a shared database in SOA or not" debate, which is totally irrelevant and does not help answer the original question at all.
More often than not, in a real-world situation the data is already stored in different systems to start with. Customer data, for example, comes from CRM, product data from SAP, and contract data from yet another source.
It is not so much a quest to bring this data together technically as an understanding that there is only one source of the data. To put it differently, there is only one owner of the data within your enterprise, who is solely responsible for maintaining it and ensuring the correct data quality.
Storing data locally for performance reasons means replicating data, which more often than not is a dangerous venture unless you have a solid caching strategy in place. I think Mosh has given some sensible answers for when you are faced with an existing application landscape.
Let me start by saying: this is my first post here, it is a bit lengthy, and I haven't done Windows Forms development in years. With that in mind, please excuse me if this isn't directly a programming question, and please bear with me as I really need the help!
I have been asked to develop a Windows Forms app for our company that talks to a central (local area network) Linux server hosting a PostgreSQL database. The app is to allow users to authenticate themselves into the system and thereafter conduct the usual transactions with the PG database. Ordinarily, I would propose writing a webforms app against Mono, but the clients need to utilise local resources such as USB peripheral devices, so that is out of the question. While it might not seem clear, my questions are italicised below:
Dilemma #1:
The application is meant to be always connected. How should I structure my DAL/BLL - Should this reside on the server or with the client?
Dilemma #2:
I have been reading up on Client Application Services (CAS), and it seems like a great fit for authentication, as everything is exposed via URIs. I know that a .NET Data Provider exists for PostgreSQL, but I'm not sure whether CAS will work at all on a Linux (Debian) server. Believe me, I would get my hands dirty and try it myself, but I need to come up with a logical design first before resources are allocated to me for "trial purposes"!
Dilemma #3:
If the DAL/BLL is to reside on the server, is there any way I can create data services and expose only these services to authenticated clients? There is a (security) requirement whereby a connection string with username and password to the database cannot be present on any client machine, even if security on the database side is quite rigid. I'm guessing that the only way for this to work would be to create the various CRUD data service methods, exposed by an ASP.NET app, and have the Windows Forms app make a request for data or persist data to the ASP.NET app (through a URI) and have that return a resultset or value. Would I be correct in assuming this? Should I be looking into WCF Data Services? And will WCF work with a non-SQL Server database?
Thank you for taking the time out to read this, but know that I am desperately seeking any advice on this! THANKS A MILLION!!!!
EDIT:
I am considering also using NHibernate as my ORM
Some parts of your questions are complicated and beyond my expertise. However, in general you can do almost anything you put effort into, CAP theorem and the like aside.
DAL/BLL stuff in general can reside in any of the tiers. I put a lot of this in my database and some in the middle tier, however this is to allow re-use in different environments which may or may not be a goal for you. The thing is I would think through carefully the separation of concerns issues here and what sorts of centralization of logic you want to place. The further back, the more re-usable this becomes but this is not always a free tradeoff.
I am not entirely familiar with CAS but it looked like AJAX kinds of stuff from what I saw on the MSDN web site. That could be wrong, but if it is right, then you have an issue in that such requests may be stateless and this could be an issue if you need a constant connection.
On the whole, based on what you are saying, it sounds cleanest to build a two-tier rather than a three-tier app and have the DAL/BLL sit on the client, possibly supported by stored procedures on the server. You can then set PostgreSQL up to authenticate against whatever you use on your network (if that is AD, KRB5 is what I would recommend). This simplifies your data access, and it allows you to control permissions based on the authentication against the database. Since you can authenticate users based on AD, you can then set permissions accordingly.
One important consideration is going to be number of connections. PostgreSQL does have some places where every current connection must be checked and iterated through, and connection startup and tear-down overhead in some cases can be significant. So one important decision will involve connection pooling. Whether or not you use connection pooling to boost performance will depend on what you are doing but I have seen cases where PostgreSQL has handled 600 connections without serious problems.
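For readers unfamiliar with connection pooling, the idea is to open a fixed number of connections once and hand them out on demand, rather than paying the startup and tear-down cost per request. Below is a minimal, generic sketch of that idea. The `ConnectionPool` class is made up for this example, and SQLite connections stand in for PostgreSQL ones; a real application would more likely use the pooling offered by its database driver or a dedicated pooler.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while all connections are in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)


# Stand-in connection factory; check_same_thread=False lets threads share the pool.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)

with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())
```

The pool size caps how many connections the database ever sees from this process, which is the knob that matters for the connection-count concern above.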