Complex environments with Ansible - deployment

I've read this question and this guide, but I still don't understand how to properly model our environment with Ansible.
Our environment:
Prod
    A, B, C, D, E, F, G, each installed on its own server, plus an HA backup
Pre-Prod
    A, B, C, D, E, F, G, each installed on its own server
External Dev Instances (5 instances, sometimes more)
    A, B, C, D, E, all installed on a single VM
    F, G on a shared environment for ext-dev
Internal Dev Instances (50+ instances)
    A, B, C, D, E, F, G, all installed on a single VM
Each of the parts A-G has lots of configuration key/value pairs spread over ~30 files, e.g. URLs, DB connection strings, the hostnames of the other instances (A requires the hostname/port of B, etc.), or various performance settings.
Many of the config key/value pairs are shared for dev instances, but not all. All external instances (Dev, Pre-Prod, Prod) share some further pairs.
How should I structure my environment so I can place all keys at the proper level, e.g. prod or ext-dev, in a way that I don't have to repeat the shared keys multiple times?
The answers in those resources all seem to work only for simpler environments. And even if I create a complex structure like this:
env/
    prod/
        group_vars/
            all.yml
            a.yml
            b.yml
            c.yml
            d.yml
            e.yml
            f.yml
            g.yml
        hosts
    pre_prod/
        group_vars/
            all.yml
            a.yml
            b.yml
            c.yml
            d.yml
            e.yml
            f.yml
            g.yml
        hosts
    ext_dev/
        group_vars/
            all.yml
            abcd.yml
            fg.yml
        hosts
    int_dev/
        group_vars/
            all.yml
            abcdefg.yml
        hosts
roles/... [not important for this part]
playbook.yml
I cannot seem to set up the main playbook.yml file so that everything is mapped correctly. But I'm not sure what I am doing wrong, as this seems (to me) to be the proper mapping of the setup described in the two resources above onto our environment.
Especially since some config items are instance-specific, some are prod/dev/... specific, and some are global.
How should I set up our ansible project structure so it can handle our complex environment?

"How should I structure my environment so I can place all keys at the proper level , e.g. prod, ext-dev, in a way that I don't have to repeat the shared keys multiple times?"
You shouldn't. The keys might have the same values now, but you cannot guarantee that they will always be the same in the future.
You will be better off separating your environments completely.
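As a hedged sketch of what that can look like with the layout above (role and group names are illustrative; the groups a..g are assumed to be defined in each environment's hosts file), playbook.yml simply maps each group to its role:

# playbook.yml -- a minimal sketch; one play per component, up to g
- hosts: a
  roles:
    - role_a

- hosts: b
  roles:
    - role_b

# ... and so on for c through g ...

You then pick the environment purely by inventory, so every value an environment needs lives in its own group_vars, even if that means some values are duplicated across environments:

ansible-playbook -i env/prod/hosts playbook.yml
ansible-playbook -i env/int_dev/hosts playbook.yml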

Related

How to know to upgrade my pod when a container is updated?

I have a pod with 3 containers (ca, cb, cc).
This pod is owned by Team A. Team A creates and owns ca; the other two containers are developed by two other teams: Team B for cb and Team C for cc.
Both cb and cc also run independently (outside the TeamA pod) as services within this same cluster.
How can Team A's pod find out when cb or cc deploys a newer version in the cluster, and how can we ensure that this triggers a refresh?
As you may have guessed, cb and cc are services that ca relies on heavily, and they are also services in their own right.
Is there a way to ensure that Team A's pod keeps cb and cc updated whenever Team B and Team C deploy new versions to the cluster?
This is not a task for Kubernetes. You should configure that in your CI/CD tool. For example, whenever a new commit is pushed to service A, it will first trigger the pipeline for A, and then trigger corresponding pipelines for services B and C. Every popular CI/CD system has this ability and this is how it's normally done.
Proper CI/CD tests will also protect you from mistakes. If service A update breaks compatibility with B and C, your pipeline should fail and notify you about that.
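As a hedged sketch of that wiring (assuming GitLab CI; the project path and script name below are hypothetical), the pipeline in cb's repository can trigger a downstream pipeline that redeploys Team A's pod after every successful deploy of cb:

# .gitlab-ci.yml in cb's repository -- names are illustrative
stages:
  - deploy
  - downstream

deploy_cb:
  stage: deploy
  script:
    - ./deploy.sh                      # hypothetical deployment script for cb

redeploy_team_a_pod:
  stage: downstream
  trigger:
    project: team-a/pod-deployment     # hypothetical project owned by Team A
    branch: main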
There's no one specific answer. https://github.com/jetstack/version-checker provides a metrics/alerting approach. Numerous kubectl plugins give a CLI reporting approach. Stuff like https://fluxcd.io/docs/guides/image-update/ can do upgrades automatically within certain parameters.
Or maybe just actually talk to your coworkers: this is both a technological and a social problem, and you'll need answers from both sides.

How to use deploy keys to automatically clone the same repository on multiple servers?

I have created a deploy key from a server instance, say host A, for repository R1. Now I am scaling to 3 servers, i.e. adding host B and host C, and I want to access R1 from them.
How to use deploy keys when it comes to multiple server instances? Do I have to copy the deploy key to other servers or do I have to generate new ones on host B and host C and add them to the GitHub repository settings?
Which of these is the better way to support auto-scaling?
If you have a central service whose job it is to deploy code, then the best approach is to use one deploy key for that service and control your deployments using that service. However, it doesn't sound like that's what's going on here.
Absent that, you generally want to use one deploy key for each environment that you're deploying to, not each server. For example, if you have code that needs to be deployed to production application servers as well as developer shell servers, then use a separate deploy key for each environment and copy each environment's key to all of the servers in that environment. That way, you can later restrict access to the repository if a certain environment no longer needs it without impacting other environments.
Using a separate key for each environment means that if you have deployment scripts that need to work with multiple deploy keys (for multiple repositories), they'll function in the same way on every server in that deployment.
You should make sure that your deploy keys are well labeled so that future you (or your successor) can figure out at a glance what each key is for.
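As a rough sketch of the per-environment approach (the key file names, labels, and host names below are made up):

# One key pair per environment, not per server
ssh-keygen -t ed25519 -f deploy_key_production -C "deploy key: production app servers, repo R1"
ssh-keygen -t ed25519 -f deploy_key_devshell -C "deploy key: developer shell servers, repo R1"

# Add each .pub file as a read-only deploy key in the repository settings on GitHub,
# then copy that environment's private key to every server in the environment
for host in host-a host-b host-c; do
  scp deploy_key_production "${host}:~/.ssh/"
done

New servers added by auto-scaling just get the existing key for their environment copied in (ideally by your provisioning tooling), so nothing needs to change on the GitHub side.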

Publishing metadata to Service Fabric

So, I have this idea I'm working on, where services on some nodes need to discover other services dynamically at runtime, based on metadata that they might publish. And I'm trying to figure out the best way to go about this.
Some of this metadata will be discovered from the local machine at runtime, but it then has to be published to the Fabric so that other services can make decisions on it.
I see the Extension stuff in the ServiceManifests. This is a good start. But it doesn't seem like you can alter or add extensions at runtime. That would be nice!
Imagine my use case. I have a lot of machines on a Fabric, with a lot of services deployed to them. What I'm advertising is the audio codecs that a given machine might support. Some nodes have DirectShow. So, they would publish the local codecs available. Some machines are running 32 bit services, and publish the 32 bit DirectShow codecs they have (this is actually what I need, since I have some proprietary ACM codecs that only run in 32 bit). Some machines are Linux machines, and want to make available their GStreamer codecs.
Each of these needs to publish the associated metadata about what they can do, so that other services can string together from that metadata a graph about how to process a given media file.
And then each will nicely report their health and load information, so the fabric can determine how to scale.
Each of these services would support the same IService interface, but each would only be used by clients that decided to use them based on the published metadata.
In Service Fabric the way to think about this kind of problem is from a service point of view, rather than a machine point of view. In other words, what does each service in the cluster support, rather than what does each machine support. This way you can use a lot of Service Fabric's built-in service discovery and querying stuff, because the abstraction the platform provides is really about services more than it is about machines.
One way you can do this is with placement constraints and service instances representing each codec that the cluster supports as a whole. What that means is that you'll have an instance of a service representing a codec that only runs on machines that support that codec. Here's a simplified example:
Let's say I have a Service Type called "AudioProcessor" which does some audio processing using whatever codec is available.
And let's say I have 5 nodes in the cluster, where each node supports one of codecs A, B, C, D, and E. I will mark each node with a node property corresponding to the codec it supports (a node property can just be any string I want). Note this assumes that I, the owner of the cluster, know the codecs supported by each machine.
Now I can create 5 instances of the AudioProcessor Service Type, one for each codec. Because each instance gets a unique service name that is in URI format, I can create a hierarchy with the codec names in it for discovery through Service Fabric's built-in Naming Service and querying tools, e.g., "fabric:/AudioApp/Processor/A" for codec A. Then I use a placement constraint for each instance that corresponds to the node property I set on each node to ensure the codec represented by the service instance is available on the node.
Here's what all this looks like when everything is deployed:
Node 1 - Codec: A Instance: fabric:/AudioApp/Processor/A
Node 2 - Codec: B Instance: fabric:/AudioApp/Processor/B
Node 3 - Codec: C Instance: fabric:/AudioApp/Processor/C
Node 4 - Codec: D Instance: fabric:/AudioApp/Processor/D
Node 5 - Codec: E Instance: fabric:/AudioApp/Processor/E
So now I can do things like:
Find all the codecs the cluster supports by querying for a list of AudioProcessor service instances and examining their names (similar to getting a list of URIs in an HTTP API).
Send a processing request to the service that supports codec B by resolving fabric:/AudioApp/Processor/B
Scale out processing capacity of codec C by adding more machines that support codec C - Service Fabric will automatically put a new "C" AudioProcessor instance on the new node.
Add machines that support multiple codecs: by setting multiple node properties on such a machine, Service Fabric will automatically place the correct service instances on it.
The way a consumer thinks about this application now is along the lines of "is there a service that supports codec E?" or "I need to talk to services A, C, and D to process this file because they have the codecs I need."
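For illustration, here is a hedged PowerShell sketch of creating one such instance. It assumes the node property named codec has already been defined on the nodes (e.g. in the cluster manifest) and that the parameter names match your SDK version; the application and type names are just the ones from the example:

# Connect to the cluster, then create one AudioProcessor instance pinned
# to nodes whose "codec" property is A
Connect-ServiceFabricCluster

New-ServiceFabricService -ApplicationName "fabric:/AudioApp" `
    -ServiceName "fabric:/AudioApp/Processor/A" `
    -ServiceTypeName "AudioProcessor" `
    -Stateless -PartitionSchemeSingleton -InstanceCount 1 `
    -PlacementConstraint "(codec == A)"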

How do you manage per-environment data in Docker-based microservices?

In a microservice architecture, I'm having a hard time grasping how one can manage environment-specific config (e.g. IP address and credentials for database or message broker).
Let's say you have three microservices ("A", "B", and "C"), each owned and maintained by a different team. Each team is going to need a team integration environment... where they work with the latest snapshot of their microservice, along with stable versions of all dependency microservices. Of course, you'll also need QA/staging/production environments as well. A simplified view of the big picture would look like this:
"Microservice A" Team Environment
Microservice A (SNAPSHOT)
Microservice B (STABLE)
Microservice C (STABLE)
"Microservice B" Team Environment
Microservice A (STABLE)
Microservice B (SNAPSHOT)
Microservice C (STABLE)
"Microservice C" Team Environment
Microservice A (STABLE)
Microservice B (STABLE)
Microservice C (SNAPSHOT)
QA / Staging / Production
Microservice A (STABLE, RELEASE, etc)
Microservice B (STABLE, RELEASE, etc)
Microservice C (STABLE, RELEASE, etc)
That's a lot of deployments, but that problem can be solved by a continuous integration server and perhaps something like Chef/Puppet/etc. The really hard part is that each microservice would need some environment data particular to each place in which it's deployed.
For example, in the "A" Team Environment, "A" needs one address and set of credentials to interact with "B". However, over in the "B" Team Environment, that deployment of "A" needs a different address and credentials to interact with that deployment of "B".
Also, as you get closer to production, environmental config info like this probably needs security restrictions (i.e. only certain people are able to modify or even view it).
So, with a microservice architecture, how do you maintain environment-specific config info and make it available to the apps? A few approaches come to mind, although they all seem problematic:
Have the build server bake them into the application at build-time - I suppose you could create a repo of per-environment properties files or scripts, and have the build process for each microservice reach out and pull in the appropriate script (you could also have a separate, limited-access repo for the production stuff). You would need a ton of scripts, though. Basically a separate one for every microservice in every place that microservice can be deployed.
Bake them into base Docker images for each environment - If the build server is putting your microservice applications into Docker containers as the last step of the build process, then you could create custom base images for each environment. The base image would contain a shell script that sets all of the environment variables you need. Your Dockerfile would be set to invoke this script prior to starting your application. This has similar challenges to the previous bullet-point, in that now you're managing a ton of Docker images.
Pull in the environment info at runtime from some sort of registry - Lastly, you could store your per-environment config inside something like Apache ZooKeeper (or even just a plain ol' database), and have your application code pull it in at runtime when it starts up. Each microservice application would need a way of telling which environment it's in (e.g. a startup parameter), so that it knows which set of variables to grab from the registry. The advantage of this approach is that now you can use the exact same build artifact (i.e. application or Docker container) all the way from the team environment up to production. On the other hand, you would now have another runtime dependency, and you'd still have to manage all of that data in your registry anyway.
How do people commonly address this issue in a microservice architecture? It seems like this would be a common thing to hear about.
Docker Compose supports extending compose files, which is very useful for overriding specific parts of your configuration.
This is handy at least for development environments and may work for small deployments too.
The idea is to have a base shared compose file that you override for different teams or environments.
You can combine that with environment variables holding different settings.
Environment variables are good for replacing simple values; if you need to make more complex changes, use an extension file.
For instance, you can have a base compose file like this:
# docker-compose.yml
version: '3.3'
services:
  service-a:
    image: "image-name-a"
    ports:
      - "${PORT_A}"
  service-b:
    image: "image-name-b"
    ports:
      - "${PORT_B}"
  service-c:
    image: "image-name-c"
    ports:
      - "${PORT_C}"
If you want to change the ports, you can just pass different values for the PORT_X variables.
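For example, each team environment can keep its values in a .env file next to the compose file (Docker Compose reads .env from the project directory by default); the variable names here match the base file above and the values are just examples:

# .env for the "Microservice A" team environment
PORT_A=8080
PORT_B=8081
PORT_C=8082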
For complex changes you can have separate files that override specific parts of the compose file. You can override specific parameters for specific services; any parameter can be overridden.
For instance, you can have an override file for service A with a different image, and add a volume for development:
# docker-compose.override.yml
services:
  service-a:
    image: "image-alternative-a"
    volumes:
      - /my-dev-data:/var/lib/service-a/data
Docker Compose picks up docker-compose.yml and docker-compose.override.yml by default. If you have more files, or files with different names, you need to specify them in order:
docker-compose -f docker-compose.yml -f docker-compose.dev.yml -f docker-compose.dev-service-a.yml up -d
For more complex environments the solution is going to depend on what you use. I know this is a Docker question, but nowadays it's hard to find pure Docker systems, as most people use Kubernetes. In any case you are always going to have some sort of secret management provided by the environment and managed externally; from the Docker side of things you just have variables that are provided by that environment.
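If you do end up on Kubernetes, the same idea usually becomes a per-environment Secret injected into the container as environment variables; a minimal sketch with hypothetical names and values:

# service-a-config.yaml, one per environment
apiVersion: v1
kind: Secret
metadata:
  name: service-a-config
stringData:
  DB_URL: "postgres://db.staging.internal:5432/app"
  BROKER_ADDR: "amqp://broker.staging.internal:5672"

The Deployment for service A then references it with envFrom/secretRef, so the same image runs unchanged in every environment.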

Lite virtualization of processes

I'm trying to figure out whether (and how) it is possible to run a user process in an isolated context (memory, network and other resources).
Let's assume that the program x is stored in the filesystem of a host machine (h).
I would like to execute x in an isolated hosted context (c), in other words without creating a full virtual guest OS.
The process writes its output files inside its context c. Then I would like to use those files in the host context h.
I have heard about LXC, Docker, dockerlite, OpenVZ, etc., but it seems that one must create a container starting from an OS image.
So, in short, is there a way to run x in c and get the results (if any) back in h?
Using Docker you could create c (a container) and share a directory from the host (h) where you put your results from x. Please see the volume docs on docs.docker.io.
c doesn't need to contain a full OS image. The busybox base container, for example, is about 2.5MB.
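A hedged sketch of that invocation (the paths and the program name are hypothetical):

# Bind-mount both the program and a results directory from h into the container;
# whatever x writes to /results inside c appears directly in /data/results on h.
# Note: x must not depend on libraries missing from the image (busybox is minimal),
# so this works best for statically linked binaries.
docker run --rm \
  -v /usr/local/bin/x:/x:ro \
  -v /data/results:/results \
  busybox /x --out /results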