I am looking to keep some kind of baseline for everything applied to Kubernetes (or just to a namespace).
For example, versioning the microservices in a list and checking that in to GitHub, in case something needs to be rolled back.
Check out Velero; it is a backup tool for Kubernetes. I don't think it can use Git as a backend, but you could add that (or use S3 or similar).
You can write and deploy an application that watches the resources you are interested in, e.g. all Deployment, Service and Ingress objects, including all changes, and then stores those changes however you want. I can recommend client-go for this kind of service.
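For example, a namespace backup and restore looks roughly like this (the namespace and backup names are placeholders):

```bash
# Back up everything in one namespace
velero backup create my-namespace-baseline --include-namespaces my-namespace

# Later, roll the namespace back to that baseline
velero restore create --from-backup my-namespace-baseline
```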
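If you don't want to write a client-go service right away, the same watch-and-record idea can be sketched with plain kubectl (the namespace and resource kinds below are just examples):

```bash
# One-off snapshot of the resources you care about, checked into Git
kubectl get deployments,services,ingresses -n my-namespace -o yaml > state.yaml
git add state.yaml && git commit -m "cluster snapshot $(date -u +%FT%TZ)"

# kubectl can also stream changes as they happen
kubectl get deployments -n my-namespace --watch -o yaml
```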
However
For example, versioning the microservices in a list and checking that in to GitHub, in case something needs to be rolled back.
It is more common, and recommended, to work the other way around: first save your intended desired state in Git, then have a CI/CD service apply your changes to the cluster or clusters. This way of working is called Infrastructure as Code. The new edition of the book Kubernetes Up & Running has a new chapter (18) that describes how to work in this way.
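In its simplest form that just means a Git repository with your manifests and a pipeline step that applies them (the repository layout and namespace below are made up):

```bash
# repo layout:
#   manifests/
#     deployment.yaml
#     service.yaml
#     ingress.yaml

# CI/CD step: show the drift, then apply the desired state from Git
kubectl diff -f manifests/ -n my-namespace
kubectl apply -f manifests/ -n my-namespace
```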
I have been developing monolithic applications for many years and now want to try microservices and containers.
To learn microservices and containers, I am planning a small calendar web application where you can create dates and invite others to them. I have identified three services that I need to implement:
Authentication service -> handles login, gives back a JWT
UserData services -> handles registration, shows user data, handles edits of user data (profile picture, name, short description)
Calendar service -> for creating, editing and deleting dates, inviting others to dates, viewing dates and so on.
This is the part I feel confident about, but if I already did something wrong, please correct me.
Where I am not sure what to do is how to implement the database and frontend part.
Database
I have never used Neo4j or any graph-based database before, but they seem to work well for this use case and I want to learn something new. My database choice may not be relevant here, but my question is:
Should every microservice have its own database or should they access the same database?
Some data sets are connected, like every date has a creator and multiple participants.
And should the database run in a container or should it be hosted on the server directly? (I want to self-host the database)
For storing profile pictures I decided to use a shared container volume, so multiple instances of a service can access the same files. But I have never used containers before, so if you have a better idea, I am open to hearing it.
Frontend
Should I build a single (monolithic) frontend application, or is it useful to build some kind of micro frontend that contains only the navigation and loads other parts, like the calendar view or user data view, from the services defined above? And if yes, should I pack the UserData frontend and the UserData service into one container?
MVP
I want the whole application to be usable by others, so they can install it on their own servers/cloud. Is it possible to pack the whole application (all services and the frontend) into one package and make it installable in Kubernetes in a single step, but in a way that each service still has its own container? In my head this sounds necessary, because the calendar service won't work without the UserData service: every date needs a user who creates it.
Additional optional services
Imagine I want to add some additional but optional features like real time chat. What is the best way to check if these features are installed? Should I add a management service which checks which services exist and tells the frontend which navigation links it should show, or should the frontend application ping all possible services to check if they are installed?
I know these are a lot of questions, but I think they are tied together, because choices in one part can influence the others.
I am looking forward to your answers.
I'm further along than you, but far from a "microservice expert". Still, I'll try my best:
Should every microservice have its own database or should they access the same database?
The rule of thumb is that a microservice should have its own database. This decouples them, so that the contract between services is just the API. Furthermore, it keeps the logic within a service simpler, in that there are fewer types of data your service is responsible for handling.
And should the database run in a container or should it be hosted on the server directly?
I'm not a fan of running a database in a container. In general, a database:
can consume a lot of resources (depending on the query)
sustains long-lived connections
is something you vertically scale, rather than horizontally
These qualities make it a poor fit for containerization, imho.
For storing profile pictures I decided to use a shared container volume
Nothing wrong with this, but using cloud storage like Amazon S3 is a popular move here, just so you don't have to manage the state of the volume: if the volume goes away, do you still want your pictures around?
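If you do stick with the shared volume, the usual shape is a ReadWriteMany PersistentVolumeClaim that every replica mounts; rough sketch below (names are placeholders, and the storage class must actually support RWX, e.g. something NFS-backed):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: profile-pictures
spec:
  accessModes: ["ReadWriteMany"]   # several pods can mount it read-write at once
  storageClassName: nfs-client     # placeholder: must be a class that supports RWX
  resources:
    requests:
      storage: 10Gi
```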
Should I build a single (monolithic) frontend application?
I would say yes: this would be the simplest approach. The other big motivation for microservices is to allow separate development teams to work independently. If that's not the case here, I think you should spare yourself the headache of micro-frontends.
And if yes, should I pack the UserData frontend and the UserData service into one container?
I'm not sure I perfectly understand here. I think it's fine to have the frontend and backend service as part of the same codebase, which can get "deployed" together. How you want to serve the frontend is completely up to you and how it's implemented. Some like to serve it through a CDN to minimize latency, but I've never seen the need.
Is it possible to pack the whole application (all services and frontend) into one package and make it installable in kubernetes with a single step?
This is the use case for Helm charts: a package manager for Kubernetes.
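Roughly, the install and rollback then become single commands (the chart path, release name and namespace below are placeholders):

```bash
# Install every service (each in its own container/Deployment) in one step
helm install calendar ./calendar-chart --namespace calendar --create-namespace

# Roll the whole application back to a previous release revision
helm rollback calendar 1 --namespace calendar
```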
Imagine I want to add some additional but optional features like real time chat. What is the best way to check if these features are installed? Should I add a management service which checks which services exist and tells the frontend which navigation links it should show, or should the frontend application ping all possible services to check if they are installed?
There's a thin line between what you're describing and monitoring. If it were me, I'd pick the service that radiates this information and just set some booleans in a config file or something. That's because this state isn't going to change until a reinstall, so there's no need to check it on an interval.
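As a sketch, that could be as simple as a ConfigMap written once at install time and read by the frontend (the names and keys below are made up):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: frontend-features
data:
  # Set once at install time (e.g. from Helm values); the frontend reads this
  # instead of pinging every optional service at runtime.
  features.json: |
    { "chat": false, "calendar": true, "userdata": true }
```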
--
Those were my thoughts but, as with all things architecture, there's no universal truth. Sometimes exceptions are warranted, depending on your actual circumstances.
With Kubernetes you can use the Garbage Collector to automate the deletion of dependent resources when their owning resources are removed. I'm wondering what the easiest way is to print out the dependency tree of an owning resource, potentially limited to a given depth if need be.
I understand the potential for crashing the API server, given the ability to fan out to all resources in a cluster, and that this is likely why it isn't an easy feat to achieve. But I've been struggling to even find usable, community-supported workarounds, or even discussions/issues relating to this topic (likely my poor searching skills), so any help in achieving this would be great!
To make things more concrete, a specific example of the kind of abstract kubectl get query I'd like to achieve would be something like kubectl get scheduledworkflow <workflow name> --dependents:
This would find the Kubeflow Pipelines ScheduledWorkflow resource, then recurse:
That would find all Argo Workflow resources,
Then, for each Workflow resource, many Pod and Volume resources (there are a few other types, but I wanted to paint the picture of these being disparate resource types).
We typically only keep a small number of Argo Workflow resources in the cluster at any one time, as the majority of our Workflows spawn 1k+ Pods, so we have pretty aggressive GC policies in place. Even so, listing these is just painful at the moment and we need to use a custom script to do it, but I'm wondering if there is a higher-level CLI, SDK or API available (or any group working on this issue in the community!).
There are no ready-made solutions for this.
I see two options for how this can be approached:
1 - This is probably what you already mentioned: "need to use a custom script to do it".
The idea is to get the JSON for the required resource groups and then process it with any available/known language like bash/python/java/etc. and/or with jq. All dependent objects have an ownerReferences field, which allows you to match resources to their owners; see the sketch after the links below.
More information about owners and dependents
jq tool and examples
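A rough sketch of that approach (the namespace, resource kinds, workflow name and depth below are placeholders; it assumes the dependents live in the same namespace as the owner):

```bash
#!/usr/bin/env bash
# Print dependents of an owner by walking ownerReferences, up to a given depth.
NS=my-namespace
KINDS="workflows,pods,persistentvolumeclaims"   # resource kinds to scan

dependents() {
  local owner_uid=$1 depth=$2
  [ "$depth" -le 0 ] && return
  kubectl get "$KINDS" -n "$NS" -o json \
    | jq -r --arg uid "$owner_uid" \
        '.items[] | select(.metadata.ownerReferences[]?.uid == $uid)
         | "\(.kind)/\(.metadata.name) \(.metadata.uid)"' \
    | while read -r name uid; do
        echo "$name"
        dependents "$uid" $((depth - 1))
      done
}

# "my-workflow" is a placeholder for the ScheduledWorkflow name
root_uid=$(kubectl get scheduledworkflow my-workflow -n "$NS" -o jsonpath='{.metadata.uid}')
dependents "$root_uid" 3
```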
2 - Write your own tool based on the Kubernetes garbage collector.
The Kubernetes garbage collector works based on a graph built by the GraphBuilder:
garbage collector source code
The graph is kept up to date by using reflectors:
GarbageCollector runs reflectors to watch for changes of managed API
objects, funnels the results to a single-threaded
dependencyGraphBuilder, which builds a graph caching the dependencies
among objects
See the graph_builder source code to get the whole logic of it.
The built graph has a node type:
graph data structure
It's also worth mentioning that working with the API server is more convenient using the Kubernetes client libraries, which are available for different languages.
I'm fairly new to Kubernetes and only got started with an example project to learn everything. I'm currently running one .NET microservice which needs a MongoDB as a database. The microservice is packed into a Docker Image and I've created a single Helm chart to properly deploy my microservice and the required MongoDB.
Now I've thought about versioning and discovered one big problem with this approach: I can't just install multiple versions of that Helm chart to support running multiple versions of the same microservice, because that way each installation gets its own database, which is obviously not what I want.
Does this mean I have to create two Helm charts for my microservice? So one api chart containing my .NET service and one db chart containing the MongoDB? That way I can deploy the MongoDB once and have multiple versions of my .NET service pointing to this single instance. However, that way I don't have a single Helm chart per microservice but multiple ones, which increases deployment overhead, I'm guessing.
Is this how it's been done? Or is there something I'm missing? All clues that point me in the right direction are very welcome!
I would recommend one chart per service. The place where Helm dependencies work well is where you have a service that embeds/hides specific other parts. And as #christopher said, if your .NET service and MongoDB have different lifecycles, they shouldn't be packaged together in the same Helm chart.
The most basic way to use Helm is by having a single chart that holds a single application. The single chart will contain all the resources needed by your application such as deployments, services etc.
Chart versions and appVersions
There are two versions to be aware of: the version of the chart itself (the version field in Chart.yaml), and the version of the application contained in the chart (the appVersion field in Chart.yaml). These are unrelated and can be bumped up in any manner that you see fit. You can sync them together or have them increase independently. There is no right or wrong practice here as long as you stick to one.
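Concretely, the two fields sit side by side in Chart.yaml (the values below are just an example):

```yaml
# Chart.yaml
apiVersion: v2
name: my-service
version: 1.4.0        # version of the chart itself (templates, values, structure)
appVersion: "2.7.1"   # version of the application the chart deploys
```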
An important point here is that you need to adopt a policy in your team on what a “chart change” means.
Helm does not enforce chart version changes. You can deploy a different chart with the same version as the previous one. So, if this is something that you want to do, you need to make sure that all teams are on the same page for versioning practices.
On the plus side, this workflow allows you to individually version charts and applications, and is very flexible for companies with teams that manage the charts separately from the application source code.
Umbrella charts
However, you can also create a chart with dependencies on other charts, called an umbrella chart. The dependencies are kept completely external, using the requirements.yaml file. Using this strategy is optional and can work well in several organizations. Again, there is no definitive answer on right and wrong here; it depends on your team process.
Take a look: umbrella-charts-example.
In other words, a collection of software elements that each have their own individual charts but, for whatever reason (e.g. design choices, ease of deployability, versioning complexities), must be installed or upgraded as a single atomic unit.
An umbrella chart references the version of the Helm chart itself and not the underlying version of the container image. This means that any change to the image version will result in chart modifications to the individual component charts.
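A minimal sketch of such an umbrella chart's dependency list (names, versions and repositories are placeholders; in Helm 3 this block lives in Chart.yaml, in Helm 2 it goes in requirements.yaml):

```yaml
# Umbrella chart: Chart.yaml (Helm 3) or requirements.yaml (Helm 2)
dependencies:
  - name: api            # your .NET service chart
    version: 1.2.0
    repository: "https://charts.example.com"
  - name: mongodb        # the database chart, deployed once
    version: 10.0.0
    repository: "https://charts.bitnami.com/bitnami"
```

Running helm dependency update pulls the referenced charts into charts/ before you install or upgrade the umbrella.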
What to take into account when deciding on an option
There are two dimensions to take into account:
Team structure: You have to ask yourself questions like: do you have small autonomous teams that are responsible for each service? Do you have developers who have knowledge of DevOps?
Dependencies and reproducibility: You have to ask yourself questions like: How different are the dependencies for each service? What is the risk that a change to one service will break another? How do you reproduce the conditions of a specific deployment?
Read more in this useful article: helm-managing-microservices.
Speaking about versioning of microservices for backward compatibility, see Product Versioning Microservices; in general, Semantic Versioning is generally advised.
In the broader sense, there should be an agreed phase-out roadmap for major versions that is communicated to API consumers (together with SLAs). This is why tracking who uses your APIs is important. Take a look at this useful article about versioning management.
See an example tool for tracking microservice versions: DeployHub.
I am new to web development and the react/next/amplify ecosystem, but I have been playing around with it and it seems great. I am just having difficulties deploying my app. It seems to be an order-of-operations thing I might be doing wrong with the initial configuration, but I am not sure.
So I followed the 5-minute tutorial on how to set Next.js up with aws-amplify using the Git-based deployment (so no amplify init). I then started to follow along with the todo tutorial for aws-amplify that I had previously completed, which included the aws-exports.js file. I could not deploy it because I was getting an import error for not being able to resolve ./aws-exports, which made sense because it wasn't there. I eventually performed an amplify init and had a copy, but found that this file is in .gitignore, so it still failed when I tried to deploy. I took it out of my .gitignore just to see, and voilà, a successful build.
This seemed wrong to me: why would it be in the .gitignore if it wasn't supposed to be?
I found this post that says the info is sensitive, but the documentation says otherwise.
This file is consumed by the Amplify JavaScript library for configuration. It contains information which is non-sensitive and only required for external, unauthenticated actions from clients (such as user registration or sign-in flows in the case of Auth) or for constructing appropriate endpoint URLs after authorization has taken place.
So, can I leave this file out of the .gitignore? Is there a better way to do this? I experienced the same issue and solved it the same way deploying to Vercel, which may be my preferred deployment method because of the easy Lambda function integration (if that matters to the answer).
Thanks for any input.
Sorry for being late to the party. I don't disagree with the docs, which state that aws-exports.js is non-sensitive; if you look into that file, it is just a bunch of endpoints, and nothing there really enables you to "hack" into the system, unless you didn't configure your @auth directive correctly. But I understand your concern, and I don't like the style of exposing the endpoints to clients in a single file, but they will eventually be exposed to the user anyway, either in the network tab or in the code where you reference the endpoints.
A good reason to keep aws-exports.js in your .gitignore is that you don't want to deal with conflicts in Git, since this file will be auto-generated at build time.
If you are using server-side rendering, and it seems like you are since you use Next.js, you can easily achieve exposing only what is needed to the client; I won't get into it, as that's another topic.
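So the usual pattern is to keep it ignored and let the Amplify build regenerate it; roughly, the build spec looks like this (sketch only, assuming a fullstack Amplify Hosting app — adjust commands and paths to your setup):

```yaml
# amplify.yml
version: 1
backend:
  phases:
    build:
      commands:
        # provisions the backend and regenerates aws-exports.js in the build
        # environment, so the file can stay in .gitignore
        - amplifyPush --simple
frontend:
  phases:
    preBuild:
      commands:
        - npm ci
    build:
      commands:
        - npm run build
  artifacts:
    baseDirectory: .next
    files:
      - '**/*'
```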
We are currently serving around 140 webapps created by a bunch of different web agencies. The setup is the usual LEMP stack.
A Kubernetes 1.2 cluster has been installed to migrate them as microservices.
The problem we are facing is about serving static and dynamic content.
For this purpose we use, of course, two different containers (nginx and php-fpm), but we can't find an adequate solution for sharing data between them.
We hoped to be able to use versioned data containers but it is apparently not in the scope of k8s. Too bad.
gitRepo is not an option, as we don't want to depend on a working Git infrastructure to instantiate pods. If it goes down, we want to be autonomous and able to serve traffic.
The other options (flocker, etc.) look heavy and complex in comparison to a simple data container. We also would like to be independent of data storage.
Is there an option I am not aware of? Does anyone have any advice on this?
Let me emphasise that we want to be able to version things in order to roll forward / backward easily.
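To make it concrete, this is roughly the pod shape we have in mind (image names are generic placeholders); the open question is how to populate the shared volume from something versioned:

```yaml
# Both containers mount the same volume; we still need a versioned way to fill it.
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
    - name: nginx
      image: nginx:stable
      volumeMounts:
        - { name: code, mountPath: /var/www/html }
    - name: php-fpm
      image: php:fpm
      volumeMounts:
        - { name: code, mountPath: /var/www/html }
  volumes:
    - name: code
      emptyDir: {}
```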
Thank you for your time