Should dependencies between Helm charts reflect dependencies between microservices?

Given the following scheme of services and their dependencies, I would like to engineer a set of Helm charts.
API Gateway calls Service A and Service C
Service A calls Service B
Service B calls Database
Service C calls Service B and Service D
At the moment I see two alternatives:
Option 1: Each of the six components in the scheme above is a single chart, and each arrow is a single dependency.
Option 2: There's an umbrella chart that has a dependency on all other charts; the Database chart is a dependency of the Service B chart.
The Helm documentation suggests going with option 2. However, I am more keen on option 1 because it makes local development and the CI/CD pipeline easier.
Example scenario: a developer is refactoring Service C and wants to run and test the changed code.
Option 1: The developer installs the Service C chart only.
Option 2: The developer would have to either:
install the umbrella chart, which wastes CPU and memory by running unneeded services such as Service A or the API Gateway, and doesn't scale well with the complexity of the system; or
install Service C, then Service B, and then Service D, which also doesn't scale well with the complexity of the system, because it requires many manual steps and requires the developer to be familiar with the system's architecture in order to know which charts need to be installed.
I would like to make an educated decision on which alternative to take. I lean towards option 1, but the Helm docs and a few examples I found on the Internet (link) go with option 2, so I think I might be missing something.

I would recommend one chart per service, with the additional simplification of making the "service B" chart depend on its database. I would make these charts independent: none of the services depend on any other.
The place where Helm dependencies work well is where a chart embeds and hides a single specific other component. The database behind B is an implementation detail, for example, and nothing outside B needs to know about it. So B can depend on stable/postgres or some such, and this works well in Helm.
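As a rough sketch (Helm 3 style; with Helm 2 the same block would go in requirements.yaml, and the chart name, version range, and repository URL here are illustrative rather than taken from the question), the Service B chart would declare that dependency in its Chart.yaml:
# Chart.yaml of the Service B chart (illustrative names and versions)
apiVersion: v2
name: service-b
version: 0.1.0
dependencies:
  - name: postgresql
    version: ">=11.0.0"
    repository: "https://charts.bitnami.com/bitnami"
Running helm dependency update on the Service B chart then vendors the database chart into its charts/ directory, so installing Service B brings its database along.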
There's one specific mechanical problem with the umbrella-chart approach. Say service D also depended on a database, and it was the same "kind" of database (both use PostgreSQL, say). Operationally you want these two databases to be separate. Helm will see the two paths umbrella > B > database and umbrella > D > database, and only install one copy of the database chart, so you'll wind up with the two services sharing a database. You probably don't want that.
The other mechanical issue you'll encounter using Helm dependencies here is that most resources are named some variant of {{ .Release.Name }}-{{ .Chart.Name }}. In your option 1, say you do just install service C; you'd wind up with Deployments like service-C-C, service-C-B, service-C-database. If you then wanted to deploy service A alongside it, that would introduce its own service-A-B and service-A-database, which again isn't what you want.
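For context, names like service-C-B come from the standard naming helper most charts define; a simplified sketch of that helper (the one generated by helm create also honours nameOverride/fullnameOverride) looks like this:
# templates/_helpers.tpl (simplified sketch of the usual naming helper)
{{- define "mychart.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
Because .Release.Name is the parent release when a chart is installed as a subchart, every subchart's resources are prefixed with that single release name, which is why each parent release drags in its own copy of B and the database, as described above.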
I'm not aware of a great high-level open-source solution to this problem. A Make-based solution is hacky, but can work:
# -*- gnu-make -*-
# `make all` deploys everything in dependency order;
# `make c.deployed` deploys only Service C and its dependencies (B and D).
all: api-proxy.deployed

# Pattern rule: deploy ./charts/<name> as release <name>, then record a stamp file.
%.deployed:
	helm upgrade --install $* ./charts/$* -f values.yaml
	touch $@

# Dependency edges mirroring the arrows between the services.
api-proxy.deployed: a.deployed c.deployed
a.deployed: b.deployed
c.deployed: b.deployed d.deployed

Related

How to sync client and backend while they are in different repositories and have different pipelines

This isn't an actual problem I have, but I would like to know what different approaches people take to solve a very common scenario.
You have one or many microservices, and each of them has schemas and an interface that clients use to consume resources.
We have a website in a different repo that consumes data from one of those microservices, say via a REST API.
Something like
Microservice (API): I change the interface, meaning the JSON response is different.
Frontend: I make changes in the frontend to adapt to the new response from the microservice.
If we deploy the microservice before deploying the frontend, we break the frontend site.
So you need to make sure that the new frontend has been deployed first, and only then deploy the microservice.
That is the manual approach, but how do people track this in an automated way, e.g. making it impossible to deploy the microservice without the correct version of the frontend already being deployed?
One of the safest approaches is to always stay backward compatible by using versioning at the service level; that means having different versions of the same service when you need to introduce a backward-incompatible change.
Let's assume you have a microservice which serves products at a REST endpoint like this:
/api/v1/products
When you make your backward-incompatible change, you should introduce the new version while keeping the existing one working:
/api/v1/products
/api/v2/products
You should set a sunset date for your first endpoint and communicate it to your clients. In your case the client is the frontend, but in other situations there could be many other clients out there (different frontend services, different backend services, etc.).
The drawback of this approach is that you may need to support several versions of the same service, which can be tricky, but it is inevitable. Communication with clients can also be tricky in many situations.
On the other hand, it gives you the true power of microservice isolation and freedom.
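A minimal sketch of what this looks like in code, assuming a Python/Flask service (the framework, routes, and field names here are illustrative, not taken from the question):
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/v1/products")
def products_v1():
    # Old response shape, kept alive until the announced sunset date.
    return jsonify({"products": [{"id": 1, "name": "Widget"}]})

@app.route("/api/v2/products")
def products_v2():
    # New, backward-incompatible response shape; clients migrate here before v1 is removed.
    return jsonify({"items": [{"id": 1, "title": "Widget", "price_cents": 999}]})

if __name__ == "__main__":
    app.run()
The frontend keeps calling /api/v1/products until it has been updated and deployed, then switches to /api/v2/products, and only after that do you retire v1.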
If you use Docker in your DevOps environment, you can use docker-compose with the depends_on property to control startup order. Alternatively, you can add a bash script (for example) to your pipeline that checks that the correct version of the frontend is deployed before continuing.
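A minimal sketch of the depends_on idea (service and image names are made up); note that depends_on only orders container startup, it does not wait for the API to actually be ready:
# docker-compose.yml (illustrative)
version: "3.8"
services:
  api:
    image: my-org/products-api:2.0
  frontend:
    image: my-org/frontend:2.0
    depends_on:
      - api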

Advantages of templates (i.e. infrastructure as code) over API calls

I am trying to set up a module to deploy resources in the cloud (it could be any cloud provider). I don't see the advantages of using templates (i.e. a deployment manager) over direct API calls:
Creating a VM using a template:
# deployment.yaml
resources:
  - type: compute.v1.instance
    name: quickstart-deployment-vm
    properties:
      zone: us-central1-f
      machineType: f1-micro
      ...
# bash command to deploy yaml file
gcloud deployment-manager deployments create vm-deploy --config deployment.yaml
Creating a VM using an API call:
import urllib  # Python 2-style urlencode, as used in the original snippet

def addInstance(http, listOfHeaders):
    url = "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances"
    body = {
        "name": "quickstart-deployment-vm",
        "zone": "us-central1-f",
        "machineType": "f1-micro",
        ...
    }
    # Encode the body and send the request with the supplied headers.
    bodyContentURLEncoded = urllib.urlencode(body)
    http.request(uri=url, method="POST", body=bodyContentURLEncoded, headers=listOfHeaders)
Can someone explain to me what benefits I get using templates?
Readability / ease of use / authentication handled for you / no need to be a coder / etc. There can be many advantages; it really depends on how you look at it, on your background, and on the tools you use.
For you specifically, it might be more beneficial to use Python all the way.
It's easier to use templates, and you get a lot of built-in functionality, such as running validation on your template to scan for possible security vulnerabilities and similar. You can also easily delete your infra using the same template you created it with. FWIW, I've gone all the way with templates and do as much as I can with them, in smaller units. It makes it easy to move a part of the infra out or duplicate it to another project, using a pipeline in GitLab to deploy it, for example.
The reason to use templates over API calls is that templates can be used in use cases where a deterministic outcome is required.
Both templates and API calls have their own benefits. There is always a tradeoff between the two options. If you want more flexibility in the deployment, then API calls suit you better. On the other hand, if security and a complete revision history are your priority, then templates should be your choice. Details can be found in this online documentation.
When using a template, orchestration of the deployment is handled by the platform. When using API calls (or other imperative approaches), you need to handle orchestration yourself.
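As a small illustration of that point, using the deployment name from the example above: you re-apply the (possibly edited) config and let Deployment Manager work out and apply the difference, instead of scripting each API call and its ordering yourself:
# Deployment Manager diffs the config against the existing deployment and converges it
gcloud deployment-manager deployments update vm-deploy --config deployment.yaml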

Operator or Helm chart for MongoDB replicas

What are the pros/cons of using an operator (like https://github.com/kbst/mongodb) to manage MongoDB inside k8s versus using a Helm chart (like https://github.com/helm/charts/tree/master/stable/mongodb-replicaset)?
The operator you linked to does not appear to be very useful (or well documented), so please consider my answer a more general one...
Technically speaking all a Helm chart can do is use existing Kubernetes primitives, e.g., StatefulSet, Service, Deployment, and so forth.
But sometimes we need more custom/specialized tools that are more aware of specifically what they control and are responsible to run.
So for example, a MySQL operator might make it easier to take (reliable) backups or reliably restore the DB from those backups -- something specific to MySQL that Kubernetes doesn't (and shouldn't) know anything about.
Another example would be scaling-up; some distributed systems require steps beyond just running a new container in order for that container to join an existing cluster.
The operator can take care of that, whereas Helm/Tiller provide no such tools (and are not meant/designed to).
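To make the difference concrete, an operator typically extends the Kubernetes API with a custom resource that it watches and reconciles. A purely hypothetical example (neither of the linked projects necessarily uses this kind or these fields):
# A hypothetical custom resource an operator could reconcile
apiVersion: mongodb.example.com/v1
kind: MongoDBReplicaSet
metadata:
  name: my-mongo
spec:
  members: 3
  version: "4.4.0"
  backup:
    schedule: "0 2 * * *"
A chart, by contrast, only templates out built-in resources at install/upgrade time; there is no controller left running afterwards to react to backups, failovers, or scaling steps.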
Hope this helps!

How do micro services in Cloud Foundry communicate?

I'm a newbie in Cloud Foundry. In following the reference application provided by Predix (https://www.predix.io/resources/tutorials/tutorial-details.html?tutorial_id=1473&tag=1610&journey=Connect%20devices%20using%20the%20Reference%20App&resources=1592,1473,1600), the application consisted of several modules and each module is implemented as micro service.
My question is: how do these microservices talk to each other? I understand they must be using some sort of REST calls, but the problem is:
Service registry: say I have services A, B, and C. How do these components 'discover' the REST URLs of the other components, given that a component's URL is only known after the service is pushed to Cloud Foundry?
How does Cloud Foundry control the dependencies between components during service startup and shutdown? Say A cannot start until B is started, and B needs to be shut down if A is shut down.
The ref-app 'application' consists of several 'apps' and Predix 'services'. An app is bound to a service via an entry in the manifest.yml; thus, it gets the service endpoint and other important configuration information via this binding. When an app is bound to a service, the 'cf env <app>' command returns the needed info.
There might still be some Service endpoint info in a property file, but that's something that will be refactored out over time.
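For illustration (the app and service-instance names here are made up, not taken from the ref-app), the binding in manifest.yml looks roughly like this, and the bound endpoint/credentials then show up under VCAP_SERVICES, which 'cf env <app>' prints:
# manifest.yml (illustrative)
applications:
  - name: ref-app-frontend
    memory: 512M
    services:
      - my-timeseries-instance   # existing service instance to bind to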
The individual apps of the ref-app application are split into separate microservices, since they get used as components of other applications. Hence the microservices approach. If there were startup dependencies across apps, the CI/CD pipeline that pushes the apps to the cloud would need to manage those dependencies. The dependencies in ref-app are simply the obvious ones; read on.
While it's true that coupling of microservices is not in the design, there are some obvious reasons it might happen: language and function. If you have a "back-end" microservice written in Java used by a "front-end" UI microservice written in JavaScript on NodeJS, then these are pushed as two separate apps. Theoretically the UI won't work too well without the back-end, but there is a plan to actually make that happen with some canned JSON. Still, there is some logical coupling there.
The nice thing you get from microservices is that they might need to scale differently, and Cloud Foundry makes that quite easy with the 'cf scale' command. They might be used by multiple other microservices, hence creating new scale requirements. So, thinking about what needs to scale, and also the release cycle of the functionality, helps in deciding what comprises a microservice.
As for ordering: for example, the Google Maps API might be required by your application, so it could be said that it should be launched first and your application second. But in reality, your application should take into account that the Maps API might be down. Your goal should be that your app behaves well when a dependent microservice is not available.
The 'apps' of the 'application' know about each other via their names and the URLs that the cloud gives them. There are actually many copies of the reference app running in various clouds and spaces. They are prefaced with things like Dev or QA or Integration, etc. Could we get the Dev front end talking to the QA back-end microservice? Sure, it's just a URL.
In addition to the aforementioned etcd (which I haven't tried yet), you can also create a CUPS (user-provided) service 'definition'. This is also a set of key/value pairs, which you can tie to the space (dev/qa/stage/prod) and bind via the manifest. This way you get the props from the environment.
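As a sketch (the service name and values are made up), you create the CUPS definition once per space with that space's values and then list it under services: in the manifest like any other service:
# run once per space (dev/qa/stage/prod) with that space's values
cf create-user-provided-service backend-config -p '{"uri":"https://backend-qa.example.com","api_key":"changeme"}'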
If microservices do need to talk to each other, it's generally via REST, as you have noticed. However, microservice purists may be against such dependencies. That apart, service discovery is enabled by publishing available endpoints to a service registry (etcd in the case of Cloud Foundry). Once the endpoint is published, the various instances of a given service can register themselves to the registry using a POST operation. The client only needs to know about the published endpoint, not each individual service instance's endpoint; this is self-registration. The client will either talk to a load balancer such as ELB, which looks up the service registry, or the client itself must be aware of the service registry.
For (2), there should not be such a hard dependency between microservices, as per the microservice definition; designing such a coupled set of services indicates imminent issues with orchestration and synchronization. If such dependencies do emerge, you will have to rely on service registries, health checks, and circuit breakers for fallback.

How do you manage per-environment data in Docker-based microservices?

In a microservice architecture, I'm having a hard time grasping how one can manage environment-specific config (e.g. IP address and credentials for database or message broker).
Let's say you have three microservices ("A", "B", and "C"), each owned and maintained by a different team. Each team is going to need a team integration environment... where they work with the latest snapshot of their microservice, along with stable versions of all dependency microservices. Of course, you'll also need QA/staging/production environments as well. A simplified view of the big picture would look like this:
"Microservice A" Team Environment
Microservice A (SNAPSHOT)
Microservice B (STABLE)
Microservice C (STABLE)
"Microservice B" Team Environment
Microservice A (STABLE)
Microservice B (SNAPSHOT)
Microservice C (STABLE)
"Microservice C" Team Environment
Microservice A (STABLE)
Microservice B (STABLE)
Microservice C (SNAPSHOT)
QA / Staging / Production
Microservice A (STABLE, RELEASE, etc)
Microservice B (STABLE, RELEASE, etc)
Microservice C (STABLE, RELEASE, etc)
That's a lot of deployments, but that problem can be solved by a continuous integration server and perhaps something like Chef/Puppet/etc. The really hard part is that each microservice would need some environment data particular to each place in which it's deployed.
For example, in the "A" Team Environment, "A" needs one address and set of credentials to interact with "B". However, over in the "B" Team Environment, that deployment of "A" needs a different address and credentials to interact with that deployment of "B".
Also, as you get closer to production, environmental config info like this probably needs security restrictions (i.e. only certain people are able to modify or even view it).
So, with a microservice architecture, how do you maintain environment-specific config info and make it available to the apps? A few approaches come to mind, although they all seem problematic:
Have the build server bake them into the application at build-time - I suppose you could create a repo of per-environment properties files or scripts, and have the build process for each microservice reach out and pull in the appropriate script (you could also have a separate, limited-access repo for the production stuff). You would need a ton of scripts, though. Basically a separate one for every microservice in every place that microservice can be deployed.
Bake them into base Docker images for each environment - If the build server is putting your microservice applications into Docker containers as the last step of the build process, then you could create custom base images for each environment. The base image would contain a shell script that sets all of the environment variables you need. Your Dockerfile would be set to invoke this script prior to starting your application. This has similar challenges to the previous bullet-point, in that now you're managing a ton of Docker images.
Pull in the environment info at runtime from some sort of registry - Lastly, you could store your per-environment config inside something like Apache ZooKeeper (or even just a plain ol' database), and have your application code pull it in at runtime when it starts up. Each microservice application would need a way of telling which environment it's in (e.g. a startup parameter), so that it knows which set of variables to grab from the registry. The advantage of this approach is that now you can use the exact same build artifact (i.e. application or Docker container) all the way from the team environment up to production. On the other hand, you would now have another runtime dependency, and you'd still have to manage all of that data in your registry anyway.
How do people commonly address this issue in a microservice architecture? It seems like this would be a common thing to hear about.
Docker compose supports extending compose files, which is very useful for overriding specific parts of your configuration.
This is very useful at least for development environments and may be useful in small deployments too.
The idea is having a base shared compose file you can override for different teams or environments.
You can combine that with environment variables with different settings.
Environment variables are good if you want to replace simple values; if you need to make more complex changes, then you use an extension file.
For instance, you can have a base compose file like this:
# docker-compose.yml
version: '3.3'
services:
  service-a:
    image: "image-name-a"
    ports:
      - "${PORT_A}"
  service-b:
    image: "image-name-b"
    ports:
      - "${PORT_B}"
  service-c:
    image: "image-name-c"
    ports:
      - "${PORT_C}"
If you want to change the ports you could just pass different values for variables PORT_X.
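For example (values made up), Docker Compose reads those variables from the shell environment or from a .env file sitting next to the compose file:
# .env
PORT_A=8080:80
PORT_B=8081:80
PORT_C=8082:80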
For complex changes you can have separate files to override specific parts of the compose file. You can override specific parameters for specific services, any parameter can be overridden.
For instance you can have an override file for service A with a different image and add a volume for development:
# docker-compose.override.yml
services:
  service-a:
    image: "image-alternative-a"
    volumes:
      - /my-dev-data:/var/lib/service-a/data
Docker Compose picks up docker-compose.yml and docker-compose.override.yml by default; if you have more files, or files with different names, you need to specify them in order:
docker-compose -f docker-compose.yml -f docker-compose.dev.yml -f docker-compose.dev-service-a.yml up -d
For more complex environments the solution is going to depend on what you use. I know this is a Docker question, but nowadays it's hard to find pure Docker systems, as most people use Kubernetes. In any case, you are always going to have some sort of secret management provided by the environment and managed externally; from the Docker side of things you just have variables that are provided by that environment.
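As a rough Kubernetes sketch of the same idea (all names and values here are illustrative), the per-environment values live in a ConfigMap or Secret and the Deployment only consumes environment variables:
# configmap-and-deployment.yaml (illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: service-a-config
data:
  BROKER_URL: "amqp://broker.team-a.internal:5672"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: "image-name-a"
          envFrom:
            - configMapRef:
                name: service-a-config
Swapping the ConfigMap (or Secret) per environment lets the same image run unchanged from the team environments up to production.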