How to cache resources that haven't changed rather than rebuild or delete? - pulumi

I have a Pulumi repository set up for an AWS project that contains a directory of services.
Each service has its own Dockerfile and application code (e.g. a Node or Go microservice).
A Pulumi script in the root index.ts scans the services directory for subdirectories whose names match the pattern *-service.
For each service directory, a Fargate-type ECS service is created.
These services are then added to their own target groups and attached to an Application Load Balancer using an ALB listener with path-based routing conditions, so that:
/user/* -> user service
/recommendation/* -> recommendation service
/chat/* -> chat service
...etc
This is all working fine and dandy!!
The only issue is that I want to build a Git pipeline with incremental builds. If there is no diff for the user-service, I do not want to build its Docker image or have Pulumi calculate a diff of its AWS resources; I want to skip all of that without deleting the resource. It would be simple enough to check whether files have been modified, either by using git to see what has changed since the last commit or by using a checksum.
I can do that but currently pulumi will delete those resources if they are skipped in the "pulumi up" script.
I would like to do this without creating a separate stack for each service, as it is convenient to reproduce the entire environment by creating a single new stack for all resources.
I want those resources to stay as they were if there is no change without pulumi having to create all those resources.
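For illustration, the checksum idea mentioned above could look roughly like the sketch below. It is written in Python rather than as part of the index.ts program, the function names and the state file are hypothetical, and it only reports which *-service directories changed since the last run. It does not by itself solve the deletion behaviour described above.

```python
import fnmatch
import hashlib
import json
import os


def find_service_dirs(root):
    """Return the sorted names of sub-directories of root matching *-service."""
    return sorted(
        name for name in os.listdir(root)
        if fnmatch.fnmatch(name, "*-service")
        and os.path.isdir(os.path.join(root, name))
    )


def dir_checksum(path):
    """Hash relative file paths and file contents under path in a stable order."""
    digest = hashlib.sha256()
    for current, dirs, files in os.walk(path):
        dirs.sort()  # make the walk order deterministic
        for filename in sorted(files):
            full = os.path.join(current, filename)
            digest.update(os.path.relpath(full, path).encode("utf-8"))
            with open(full, "rb") as handle:
                digest.update(handle.read())
    return digest.hexdigest()


def changed_services(root, state_file):
    """Return the services whose checksum differs from the one saved last run.

    state_file is a hypothetical JSON file kept outside the service
    directories; it is rewritten with the current checksums on every call.
    """
    previous = {}
    if os.path.exists(state_file):
        with open(state_file) as handle:
            previous = json.load(handle)
    current = {
        name: dir_checksum(os.path.join(root, name))
        for name in find_service_dirs(root)
    }
    with open(state_file, "w") as handle:
        json.dump(current, handle, indent=2)
    return [name for name, value in current.items() if previous.get(name) != value]
```

A pipeline step could call changed_services("services", "checksums.json") and run the Docker build only for the names it returns.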


Deploying and Update Process of On Premise Kubernetes Environment Application

We are developing a microservice based system that is orchestrated using Kubernetes. Part of our use case is supplying our clients an On-Premise installation where they receive an Image (VMDK / QCOW2) with all the system deployed.
One of our main challenges is handling the update process of such a system. Currently the plan is to have an API endpoint that receives an encrypted and signed package containing all the images and an update shell script. The API endpoint will start an asynchronous process that extracts the images and executes the shell script, which eventually calls Kubernetes to update all the images with the new code.
The question is where this API endpoint should be defined. I see two options:
A: In a special "Maintenance" service that sits outside Kubernetes and controls it. This service would be updated last, in case its own code also needs to be updated.
B: As part of one of the microservice containers that run inside Kubernetes. But then this image may itself be among the updated images, so any API that should return the update status could be unavailable during the update.
What is the common way to export an interface to System Update or System Deployment wizard processes?

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files using scp into the /home/docker/ directory, however these files are not persisted over reboots of the VM. I have read files/directories are persisted if stored in the /data/ directory on the VM (among others) however I get permission denied when trying to copy files to these directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persisting data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either get translated to environment variables or mounted as artificial volumes in running containers.
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good resource to tie them together. You can check the chart into source control along with your application, and deploy the whole thing in one shot. While it has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself) you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
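As a rough sketch of the Secret and ConfigMap objects mentioned above, the Python below builds both manifests as plain dictionaries and prints them as JSON, which kubectl accepts as well as YAML. The object names, file names and contents are made up for the example. The one detail worth noting is that values under a Secret's data field must be base64-encoded, while ConfigMap data holds plain strings.

```python
import base64
import json


def make_secret(name, files):
    """Build a Secret manifest; files maps a file name to its bytes content."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # Secret "data" values must be base64-encoded strings.
        "data": {
            key: base64.b64encode(value).decode("ascii")
            for key, value in files.items()
        },
    }


def make_config_map(name, files):
    """Build a ConfigMap manifest; files maps a file name to its text content."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


if __name__ == "__main__":
    # Hypothetical names and contents, for illustration only.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            make_secret("api-credentials", {"credentials.json": b"{}"}),
            make_config_map("app-templates", {"welcome.tmpl": "Hello {{ name }}"}),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

The output can be piped into kubectl apply -f - and the two objects then mounted into pods as volumes, in place of the files copied into /home/docker/.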
For minikube, files placed in the $HOME/.minikube/files directory on the host are copied by minikube into the / directory of the VM.

How to handle recurring short-lived tasks with Kubernetes

I have a setup with a webserver (NGINX) and a react-based frontend that uses webpack to build the final static sources.
The webserver has its own kubernetes deployment + service.
The frontend needs to be built before the webserver can serve the static html/js/css files, but after that the pod/container can stop.
My idea was to share a volume between the webserver and the frontend pod. The frontend will write the generated files to the volume and the webserver can serve them from there. Whenever there is an update to the frontend source code, the files need to be regenerated.
What is the best way to accomplish that using kubernetes tools?
Right now, I'm using an init container to build, but this leads to a restart of the webserver pod as well, which shouldn't be necessary.
Is this the best/only solution to this problem, or should I use Kubernetes Jobs for this kind of task?
There are multiple ways to do this. Here's how I think about this:
Option 1: The static files represent built source code
In this case, the static files that you want to serve should actually be packaged and built into the docker image of your nginx webserver (in the html directory say). When you want to update your frontend, you update the version of the image used and update the pod.
Option 2: The static files represent state
In this case, your approach is correct. Your 'state' (like a database) is stored in a folder. You then run an init container/job to initialise 'state' and then your webserver pod works fine.
I believe option 1 to be better for 2 reasons:
You can horizontally scale your webserver trivially by increasing the pod replica number. In option 2, you're actually dealing with state so that's a problem when you want to add more nodes to your underlying k8s cluster (you'll have to copy files/folders from one volume/folder to another).
The static files are actually the source code of your app. These are not uploaded media files or similar. In this case, it absolutely makes sense to make them a part of your docker image. Otherwise, it kind of defeats the advantage of containerising and deploying.
Jobs, Init containers, or alternatively a gitRepo type of Volume would work for you.
It is not clear in your question why you want to update the static content without simply re-deploying / updating the Pod.
Since somewhere, somehow, you have to build the webserver Docker image, it seems best to build the static content into the image: no moving parts once deployed, no need for volumes or storage. Overall it is simpler.
If you use any kind of automation tool for Docker builds, it's easy.
I personally use Jenkins to build Docker images based on a hook from git repo, and the image is simply rebuilt and deployed whenever the code changes.
Running a Job or Init container doesn't gain you much: sure, the web server keeps running, but it's just as easy to have a Deployment with rolling updates, which will deploy the new Pod before the old one is torn down, so your server will always be up too.
Keep it simple...
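To make the rolling-update suggestion in this answer concrete, here is a minimal sketch of such a Deployment, built as a Python dictionary and printed as JSON for kubectl. The deployment name, image tag and port are assumptions for the example. Setting maxUnavailable to 0 with maxSurge 1 asks Kubernetes to start a new Pod before removing an old one.

```python
import json


def make_deployment(name, image, replicas=2):
    """Build a Deployment manifest that replaces Pods one at a time."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired replica count during an update;
                # allow one extra Pod while the new version starts.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image that already contains the built static files.
    deployment = make_deployment("webserver", "registry.example.com/webserver:1.0.1")
    print(json.dumps(deployment, indent=2))
```

Releasing a new frontend then means building an image with a new tag and applying the manifest again with that tag.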

Best practice to deploy wso2 esb policies

I have set up an ESB cluster using JDBC connections to MS SQL databases for the local and remotely mounted config and governance registries. The cluster has one management node and two worker nodes.
Our .car file contains some WS-Security policy artifacts which go to the config registry. When I deploy to the management node, it deploys OK. I have SVN deployment synchronization set up for the cluster, and when a worker picks up the .car it starts to deploy but fails when loading the policy files into the config registry. It is trying to duplicate the policy in the shared config registry and fails. That failure is of course correct, but how should I deploy these 'shared' artifacts when a .car file is distributed by SVN? I need to be able to control the deployment properly. The only way I can see is via the dev studio, which is terrible for our change management practice.
Thanks for your help.
I can recommend multiple solutions. You can decide what to choose from them.
Since you have only 2 worker nodes, you can disable deployment synchronization and deploy the car files to all the nodes. I believe you have some automated process, so it won't be a problem to deploy to all nodes. While doing so, modify your project to bundle the policies into one car file and the services into another. When deploying, you deploy the policies only to the management node and the services to all nodes.
The second option is to add the policies to the local registry, i.e. not the config registry and not the governance registry. Then, when you deploy the car to the management node, it will add the policies to the local registry of the management node. When the car file is dep-synced, the worker nodes will deploy it and add the policies to their own local registries. This avoids the worker nodes trying to add the policies to the same location.
From the question, I get the impression that you use external databases for the local registry too. That is not necessary; you can use the internal H2 database for the local registry. H2 databases sometimes get corrupted. If that happens, all you have to do is delete the H2 database and restart the server with the -Dsetup option. Having an external DB is fine, but it is overkill.

Using Ansible to automatically configure AWS autoscaling group instances

I'm using Amazon Web Services to create an autoscaling group of application instances behind an Elastic Load Balancer. I'm using a CloudFormation template to create the autoscaling group + load balancer and have been using Ansible to configure other instances.
I'm having trouble wrapping my head around how to design things such that when new autoscaling instances come up, they can automatically be provisioned by Ansible (that is, without me needing to find out the new instance's hostname and run Ansible for it). I've looked into Ansible's ansible-pull feature but I'm not quite sure I understand how to use it. It requires a central git repository which it pulls from, but how do you deal with sensitive information which you wouldn't want to commit?
Also, the current way I'm using Ansible with AWS is to create the stack using a CloudFormation template, then I get the hostnames as output from the stack, and then generate a hosts file for Ansible to use. This doesn't feel quite right – is there "best practice" for this?
Yes, another way is just to simply run your playbooks locally once the instance starts. For example you can create an EC2 AMI for your deployment that in the rc.local file (Linux) calls ansible-playbook -i <inventory-only-with-localhost-file> <your-playbook>.yml. rc.local is almost the last script run at startup.
You could just store that sensitive information in your EC2 AMI, but this is a very wide topic and really depends on what kind of sensitive information it is. (You can also use private git repositories to store sensitive data).
If, for example, your playbooks get updated regularly, you can create a cron entry in your AMI that runs every so often and runs your playbook to make sure your instance configuration is always up to date. This avoids having to "push" from a remote workstation.
This is just one approach; there could be many others, and it depends on what kind of service you are running, what kind of data you are using, etc.
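The local-run and cron approach described in this answer could be sketched as below. This is only an illustration: the playbook path, inventory path, schedule and log file are invented, and the helper just assembles the command line and a user-crontab line as strings. The --connection=local option tells ansible-playbook to run against the machine itself instead of connecting over SSH.

```python
import shlex


def local_playbook_command(playbook, inventory):
    """Return the argv for running a playbook against the local machine."""
    return ["ansible-playbook", "--connection=local", "-i", inventory, playbook]


def cron_line(schedule, command, log_file):
    """Format a user-crontab entry that runs command and appends output to log_file."""
    quoted = " ".join(shlex.quote(part) for part in command)
    return "{} {} >> {} 2>&1".format(schedule, quoted, shlex.quote(log_file))


if __name__ == "__main__":
    # Hypothetical paths baked into the AMI.
    command = local_playbook_command("/opt/ansible/site.yml", "/opt/ansible/localhost.ini")
    print(cron_line("*/30 * * * *", command, "/var/log/ansible-local.log"))
```

The same command, without the schedule, is what an rc.local entry would call once at boot.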
I don't think you should use Ansible to configure new auto-scaled instances. Instead, use Ansible to configure a new image, from which you create an AMI (Amazon Machine Image), and tell AWS autoscaling to launch from that instead.
On top of this, you should also use Ansible to easily update your existing running instances whenever you change your playbook.
There are a few ways to do this. First, I wanted to cover some alternative ways.
One option is to use Ansible Tower. This creates a dependency though: your Ansible Tower server needs to be up and running at the time autoscaling or similar happens.
The other option is to use something like and build fully-functioning server AMIs. You can install all your code into these using Ansible. This doesn't have any non-AWS dependencies, and has the advantage that it means servers start up fast. Generally speaking building AMIs is the recommended approach for autoscaling.
Ansible Config in S3 Buckets
The alternative route is a bit more complex, but has worked well for us when running a large site (millions of users). It's "serverless" and only depends on AWS services. It also supports multiple Availability Zones well, and doesn't depend on running any central server.
I've put together a GitHub repo that contains a fully-working example with Cloudformation. I also put together a presentation for the London Ansible meetup.
Overall, it works as follows:
Create S3 buckets for storing the pieces that you're going to need to bootstrap your servers.
Save your Ansible playbook and roles etc in one of those S3 buckets.
Have your Autoscaling process run a small shell script. This script fetches things from your S3 buckets and uses it to "bootstrap" Ansible.
Ansible then does everything else.
All secret values such as Database passwords are stored in CloudFormation Parameter values. The 'bootstrap' shell script copies these into an Ansible fact file.
So that you're not dependent on external services being up you also need to save any build dependencies (eg: any .deb files, package install files or similar) in an S3 bucket. You want this because you don't want to require or similar to be up and running for your Autoscale bootstrap script to be able to run. Generally speaking I've tried to only depend on Amazon services like S3.
In our case, we then also use AWS CodeDeploy to actually install the Rails application itself.
The key bits of the config relating to the above are:
S3 Bucket Creation
Script that copies things to S3
Script to copy Bootstrap Ansible. This is the core of the process. This also writes the Ansible fact files based on the CloudFormation parameters.
Use the Facts in the template.
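The scripts themselves are not reproduced here, so the following is only a guess at what the "write the Ansible fact files" step might look like, written in Python rather than shell. The parameter names and values are hypothetical. It relies on Ansible's local facts mechanism: a non-executable JSON file named <name>.fact in /etc/ansible/facts.d is exposed to playbooks as ansible_local.<name>.

```python
import json
import os


def write_fact_file(facts_dir, name, values):
    """Write values as a JSON local-fact file and return its path."""
    os.makedirs(facts_dir, exist_ok=True)
    path = os.path.join(facts_dir, name + ".fact")
    with open(path, "w") as handle:
        json.dump(values, handle, indent=2)
    # Readable by the owner only, and not executable, so Ansible parses the
    # file as static data instead of running it.
    os.chmod(path, 0o600)
    return path


if __name__ == "__main__":
    # Hypothetical values; in the setup described above they would come from
    # CloudFormation parameters passed to the bootstrap script.
    params = {"db_host": "db.example.internal", "db_password": "change-me"}
    print(write_fact_file("/tmp/ansible-facts-demo", "bootstrap", params))
```

With the file written to /etc/ansible/facts.d instead of the demo directory, a template could then refer to ansible_local.bootstrap.db_password.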