Reading material for distributed systems from a practical perspective

I have been developing a distributed application on top of Akka. As the application matures, I am running into problems related to distributed systems. For example:
During a rolling update of my cluster, some nodes have the old jar files and some have the new ones (because some nodes have been updated and others have not). This means that within the code I have to support both the old and the new versions.
Similarly, during a rolling update I can have the old and the new config on different nodes at the same time.
Currently, I am using a Postgres database as the backend. If the VM hosting the database is down for an update, none of the other nodes can write any data.
I have a basic idea of how I can fix the above problems, but I would also like to know how others have been solving them. So, is there any book that focuses on distributed systems from a practical perspective?

Related

Kubernetes shared library among pods

I have 15 pods that run different PHP applications in a Kubernetes cluster. Each application must include a PHP library that is updated regularly, and it's always the same lib/version for all pods.
What is the best approach to share a library with all pods?
I tried to work out a solution, but my doubts persist.
What I thought:
Include the lib in the application container during the build. This solution creates a consistent application, but I have to re-deploy all 15 applications. Does that make sense?
Include the lib as a shared volume among all pods. In this case I can update only the lib and all applications will be updated. It seems better, even though I don't like this option because each application then depends on the shared volume.
What are your thoughts regarding this issue?
Always make your images be self-contained. Don't build an image that can't run unless something outside the container space is present, or try to use a volume to share code. (That is, pick your first option over the second.)
Say you have your shared library, and you have your 15 applications A through O that depend on it. Right now they're running last week's build image: service-a:20210929 and so on. Service A needs a new feature with support in the shared library, so you update that and redeploy it and announce the feature to your users. But then you discover that the implementation in the shared library causes a crash in service B on some specific customer requests. What now?
If you've built each service as a standalone image, this is easy. helm rollback service-b, or otherwise change service B back to the 20210929 tag while service A is still using the updated 20211006 tag. You're running mixed versions, but that's probably okay, and you can fix the library and redeploy everything tomorrow, and your system as a whole is still online.
But if you're trying to share the library, you're stuck. The new version breaks a service, so you need to roll it back; but rolling back the library would require you to also know to roll back the service that depends on the newer version, and so on. You'd have to undeploy your new feature even though the thing that's broken is only indirectly related to it.
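A minimal sketch of that per-service rollback, assuming each service is deployed as its own Helm release (the release name, Deployment name, container name, and image tag below are the hypothetical ones from the example above):

  # Revert only the broken service to its previous release; service A keeps its new image.
  helm rollback service-b

  # Or, without Helm, pin the Deployment back to last week's tag directly
  # (the container name "service-b" is an assumption about the pod spec):
  kubectl set image deployment/service-b service-b=service-b:20210929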
If the library is really used everywhere, and if disk space is actually your limiting factor, you could restructure your image setup to have three layers:
The language interpreter itself;
FROM language-interpreter, plus the shared library;
FROM shared-library:20211006, plus this service's application code.
If they're identical, the lower layers can be shared between images, and the image pull mechanics know to not pull layers that are already present on the system. However, this can lead to a more complex build setup that might be a little trickier to replicate in developer environments.
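A minimal sketch of that three-layer setup, with hypothetical image names and paths (php as the interpreter image, shared-library as the middle image):

  # Dockerfile for the middle layer: the interpreter plus the shared library.
  FROM php:8.1-apache
  COPY lib/ /usr/local/lib/shared-lib/

  # Dockerfile for one service: application code on top of the shared-library image.
  FROM shared-library:20211006
  COPY src/ /var/www/html/

Every service image then shares the php and shared-library layers; only the final application layer differs per service.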
Have you considered making a new microservice that holds that shared library and having the other 15 pods send requests to that one microservice to get what they need?
If this architecture works, you would only have to update one deployment when the shared library is updated.
In our company, we have lots of teams using a common dictionary library. Since we generate this library with tooling and make sure the generated code is good to go, what we did in the VM environment was to push this library to all servers, so no one needed to worry about library version issues.
We are now moving to Kubernetes and require each module/service owner to keep up with library version changes by deploying a new image. This changes the way people work. Last week we found a bug caused by one module forgetting to update the dictionary library :)
So we are thinking of moving some of the configuration-like libraries to be deployed by a configuration center, which can publish configurations dynamically.

MongoDB Atlas Projects/Clusters

I have recently finished a course on the MERNG tech stack and now I would like to take what I have learnt and create a project of my own. During the setup of MongoDB Atlas (Free tier) for my new project, I thought of these questions:
Should I start a new project on my MongoDB Atlas account and create a new cluster in that project? Or just create a new cluster in the previous project? (I would assume I should start a new project as the new one has no relevance to the previous one)
Why would/can you have more than one cluster for one project?
I'm still fairly new to this tech stack and would like some clarity on these questions, so I apologise in advance if these come across as stupid. Thanks.
If you're on the free tier, it's somewhat irrelevant, as you can only have a single free cluster per project.
As to why you might want more than one cluster for a single project, it's mostly relevant for bigger and more complex projects; I would expect a personal project to fit in a single cluster. Where I work, we mostly use clusters to separate domains between teams, and it's also one of the easiest permission boundaries to set. If you really get down to it, multiple clusters are a means of organizing a project, and you may want different configurations between clusters: maybe one cluster needs frequent backups less than another, and since backups are very costly, you want to back up frequently only what needs it.
Update
You might also want to explore sharding to stay on a single cluster, but that is a costly and complex solution compared to maintaining multiple clusters, since finding a shard key that distributes the load evenly is not a trivial task. We've also moved away from separating clusters by domain; we now separate databases by domain, and the databases are then distributed across clusters to balance the load.
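As a rough illustration of that databases-by-domain layout, application configuration ends up looking something like this (the cluster hosts, domain names, and URIs are all invented):

  # Each domain gets its own database; databases are spread across clusters to balance load.
  billing_uri:   mongodb+srv://cluster-a.abc12.mongodb.net/billing
  catalog_uri:   mongodb+srv://cluster-a.abc12.mongodb.net/catalog
  analytics_uri: mongodb+srv://cluster-b.xyz34.mongodb.net/analytics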

How can I compactly store a shared configuration with Kubernetes Kustomize?

First, I'm not sure this question is specific enough for Stack Overflow. Happy to remove or revise if someone has any suggestions.
We use Kubernetes to orchestrate our server side code, and have recently begun using Kustomize to modularize the code.
Most of our backend services fit nicely into that data model. For our main transactional system we have a base configuration that we overlay with tweaks for our development, staging, and different production flavors. This works really well and has helped us clean things up a ton.
We also use TensorFlow Serving to deploy machine learning models, each of which is trained and, at this point, deployed separately for each of our many clients. The only ways these configurations differ are the name and metadata annotations (e.g., we might have one called classifier-acme and another called classifier-bigcorp) and the bundle of weights pulled from our blob storage (e.g., one would pull from storage://models/acme/classifier and another from storage://models/bigcorp/classifier). We also use different namespaces to segregate development, production, etc.
From what I understand of the Kustomize system, we would need a different base and set of overlays for every one of our customers if we wanted to encode the entire state of our current cluster in Kustomize files. This seems like a huge number of directories, as we have many customers. If we have 100 customers and five different deployment environments, that's 500 directories with a kustomize.yml file.
Is there a tool or technique to encode this repetition with Kustomize? Or is there another tool that would help us generate Kubernetes configurations in a more systematic and compact way?
You can have more complex overlay structures than just a straight matrix approach. For example, for one app you might have apps/foo-base, and then apps/foo-dev and apps/foo-prod which both list ../foo-base in their bases; those in turn are pulled in by overlays/us-prod, overlays/eu-prod, and so on.
But if every combination of customer and environment really does need its own settings, then you might indeed end up with a lot of overlays.
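For the classifier case in the question, the per-customer overlays can stay very thin if the base holds everything that is common. A minimal sketch, where the classifier-base directory, the overlay layout, and the patched env var path are all assumptions about how the Deployment is written:

  # classifier-base/kustomization.yaml
  resources:
    - deployment.yaml
    - service.yaml

  # overlays/acme/prod/kustomization.yaml
  namespace: prod
  nameSuffix: -acme
  resources:
    - ../../../classifier-base
  patches:
    - target:
        kind: Deployment
        name: classifier
      patch: |-
        - op: replace
          path: /spec/template/spec/containers/0/env/0/value
          value: storage://models/acme/classifier

That is still one small directory per customer and environment, but each one is only a few lines, which also makes it practical to generate them from a customer list with a small script rather than maintain them by hand.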

How to synchronize deployments (especially of database object changes) on multiple environments

I have this challenge. I am the DevOps engineer and a software engineer in a team where, months back, the developers moved from having a central Oracle DB to having the DB on a CentOS VM on each of their individual laptops. The move away from a central DB was meant to reduce dependency on the DBAs and to eliminate issues that stemmed from inconsistent data.
The plan for sharing the database and keeping it synchronized across the team was that each person would share change scripts with everyone. The problem is that we use Skype for communication (we just set up Slack but have yet to start using it fully), and although people sometimes post the text of DB change scripts, they can be missed by some. The other problem is that some developers forget to post their changes at all. Further, new releases are deployed to Production without being deployed to the Test and Demo environments.
This has posed a serious challenge for us, especially for me, since I recently became responsible for ensuring that our Demo deployments are in sync with the Production deployments.
Most of the synchronization issues come down to the database being out of sync due to missing change scripts or missing DB objects. Oracle is our DB of choice.
A typical deployment to the Demo environment is a very painful process that involves testing the application and, as issues occur due to missing DB table columns, functions, or stored procedures, hunting down the missing DB objects, applying them to the DB, and continuing until all issues are resolved.
How can I solve this problem to ensure smooth, painless, and less time-consuming deployments? Can migrating our applications to Docker help with the DB synchronization issues and the associated lack of discipline among the developers? What process can we put in place to improve in this area?
Thank you very much in advance for your help.
Have a look at http://www.dbmaestro.com
I strongly recommend you join the live demo session.
DBmaestro TeamWork can help you merge the changes from multiple DBs into a single shared DB and to move the changes safely from one environment to another.
Danny
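Independent of any particular tool, the "change scripts" the team already writes become much easier to keep in sync when they live as numbered, append-only files in the application repository and are applied in order to every environment. A hand-rolled sketch of that convention (the file names, user, and connect string are hypothetical):

  # db/migrations/V001__create_orders_table.sql
  # db/migrations/V002__add_status_column.sql
  # db/migrations/V003__create_report_proc.sql

  # Apply every script, in order, against the target environment:
  for f in db/migrations/V*.sql; do
    sqlplus -s app_user/"$APP_PASSWORD"@DEMO_DB @"$f"
  done

This naive loop re-runs everything, so in practice a dedicated migration tool that records which versions have already been applied is safer; the important process change is that a schema change only counts as done once its script is committed alongside the application code.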

Version Management in Large SOA/Microservice Architectures

We are about to embark on a large programme of work to migrate a small number of hugely monolithic 3-tier frameworks to a SOA/microservice architecture. However, there is one thing that I haven't really managed to nail down: version management (note the use of the word management, not control).
One of the core principles of this programme is that each component is absolutely independent, and is therefore designed, developed, built, versioned, deployed, operated, monitored, and deprecated independently of all other consumers and services. This is the right principle, and it means that the future holds 15+ clients and 50+ services. In operation we need to know all the dependencies quickly and very reliably. In a world where a service may have 3 or 4 versions of its API in production and a consumer may use 20+ services, the dependency tree very quickly becomes large and complex.
So my question is how do you guys manage this? How do you maintain your "enterprise version matrix" (if that is even the correct terminology)?
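Purely as an illustration of the kind of data such an "enterprise version matrix" has to capture (the component names, versions, and fields below are invented), one common shape is a small machine-readable manifest that each component publishes at build time and that gets aggregated into a queryable registry:

  # dependency manifest published by one consumer at build time
  component: order-ui
  version: 4.2.1
  provides:
    - api: order-ui/v1
  consumes:
    - service: customer-service
      api_versions: [v2, v3]
    - service: billing-service
      api_versions: [v1]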