Why does Airflow not have features for building DAGs and tasks with a UI, such as drag and drop?

I have been using Apache Airflow for months, and I also have experience with GCP Composer, AWS Data Pipeline, and Glue, which are managed services.
In Airflow, I know that DAGs and tasks are written in Python rather than built in a GUI; the Airflow UI is not for building DAGs and tasks. However, many pipeline solutions (e.g. AWS Data Pipeline, Glue) have features for building DAGs and tasks through a UI, such as drag and drop, with minimal coding.
Can someone explain why those capabilities are not needed in Airflow?
Thanks

Airflow pipelines are configuration as code.
Advantages of configuration as code:
Automation and standardisation
Versioning of changes
Traceability of changes
Coding assistance and validation
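To make the "validation" point concrete, here is a minimal sketch in plain Python. It does not use Airflow's API; the task names and the `execution_order` helper are made up for illustration. The idea is only that a pipeline defined as code can be checked (undefined tasks, dependency cycles) before anything runs, and the definition itself can live in version control.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(pipeline):
    """Return tasks in dependency order, raising ValueError on a bad definition."""
    referenced = {dep for deps in pipeline.values() for dep in deps}
    unknown = referenced - set(pipeline)
    if unknown:
        raise ValueError(f"undefined tasks: {sorted(unknown)}")
    try:
        return list(TopologicalSorter(pipeline).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


print(execution_order(PIPELINE))
```

A check like this can run in CI on every commit, which is hard to do with a pipeline that only exists as a drawing in a UI.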

Related

Kubernetes Preview environments

I would like to ask what people use to provision an ephemeral preview environment in AWS EKS for a service under test. I am also curious how you provision any dependent services (such as a database).
E.g. I am working on a back-end service and would like to deploy an isolated, ephemeral version of this service packaged from my feature branch, including the database. I would also like a copy of a front-end service in my isolated environment to test my back-end.
Any thoughts would be appreciated
Thanks
Sachin
You can roll your own solution by wiring your CI/CD system (Jenkins, CircleCI, BuildKite, GitHub Actions, etc.) to webhooks on your source repository so that it builds and deploys a preview environment. This has to include building the modified code, deploying that code to a staging environment, and then seeding that environment with some type of data.
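One small piece of a roll-your-own setup is giving each branch its own isolated namespace. The sketch below is an assumption about how you might name them, not part of any of the tools mentioned: `preview_namespace` and the `preview` prefix are hypothetical. It only relies on the fact that Kubernetes namespace names must be valid DNS labels (lowercase alphanumerics and `-`, at most 63 characters).

```python
import re

MAX_LABEL_LENGTH = 63  # Kubernetes namespace names must be valid DNS labels


def preview_namespace(branch, prefix="preview"):
    """Turn a Git branch name into a string usable as a Kubernetes namespace."""
    # Collapse every run of characters outside [a-z0-9] into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    name = f"{prefix}-{slug}" if slug else prefix
    # Truncate to the label limit and make sure the name does not end in a dash.
    return name[:MAX_LABEL_LENGTH].rstrip("-")


print(preview_namespace("feature/ABC-123_add login"))
```

Note that truncating long branch names can make two branches collide; appending a short hash of the branch name is one way around that.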
There is a bit of nuance to getting this right. You should check out https://ephemeralenvironments.io/ which is a good template of what needs to go in to these environments.
A lot of other folks use services that provide this as a SaaS platform; Shipyard.build, Release, and Velocity.tech are a few of your options.
Disclaimer: I'm on the Operations team at Shipyard
Hope this helps!

AWS Glue - version control and setting up for continuous integration

We are in the process of setting up CI/CD for an AWS Glue ETL process. The existing ETL process contains the following AWS Glue components: crawlers, registered tables in the catalog, jobs, triggers, and workflows.
Obviously the first step is to set up a code repository and link the existing artifacts from the components mentioned above to it, ideally in a way that lets developers perform check-ins and pull requests from the tool (something similar to ADF and Databricks). However, as far as we have explored, AWS Glue does not integrate with any source code repository that can directly provide this feature, unless we are missing something.
So what is the method for setting up the environment for CI (I'm not talking about CD yet)? The link below gives a reference for CI/CD:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/
However, it mentions at the beginning that AWS CloudFormation template file for deploying the ETL jobs are both committed to version control, so it is not clear how this is done for the ongoing regular commits from the developers.
However as far as we have explored, AWS glue does not have integration
to any of the source code repository which can directly provide this
feature unless we are missing something.
Correct, Glue does not have VC integration.
I develop (Python and CloudFormation) locally in VS Code and use its Git integration plugin. I use a container if I want to test something locally, but Glue also has a Dev Endpoint for similar tasks.

How to manage software updates on docker-compose with one machine per user architecture?

We are deploying a Java backend and React UI application using docker-compose. Our Docker containers are running Java, Caddy, and Postgres.
What's unusual about this architecture is that we are not running the application as a cluster. Each user gets their own server with their own subdomain. Everything is working nicely, but we need a strategy for managing/updating machines as the number of users grows.
We can accept some downtime in the middle of the night, so we don't need high availability.
We're just not sure what would be the best way to update software on all machines. And we are pretty new to Docker and have no experience with Kubernetes or Ansible, Chef, Puppet, etc. But we are quick to pick things up.
We expect to have hundreds to thousands of users. Each machine runs the same code but has environment variables that are unique to the user. Our original provisioning takes care of that, so we do not anticipate having to change those with software updates. But a solution that can also provide that ability would not be a bad thing.
So, the question is, when we make code changes and want to deploy the updated Java jar or the React application, what would be the best way to get those out there in an automated fashion?
Some things we have considered:
Docker Hub (concerns about rate limiting)
Deploying our own Docker repo
Kubernetes
Ansible
https://containrrr.dev/watchtower/
Other things that we probably need include GitHub Actions to build and update the Docker images.
We are open to ideas that are not listed here, because there is a lot we don't know about managing many machines running docker-compose. So please feel free to offer suggestions. Many thanks!
In your case I advise you to use Kubernetes in combination with CD tools. One of them is Buddy. I think it is the best way to make such updates in an automated fashion. Of course you can use Kubernetes on its own, but with Buddy or other CD tools it will be faster and easier. In my answer I describe Buddy, but there are many popular CD tools for automating workflows in Kubernetes, for example GitLab or CodeFresh.io; you should pick whichever is best for you. Take a look: CD-automation-tools-Kubernetes.
With Buddy you can avoid most of these steps while automating updates (executing the kubectl apply and kubectl set image commands) by doing a simple push to Git.
Every time you update your application code or Kubernetes configuration, you have two ways to update your cluster: kubectl apply or kubectl set image.
Such workflow most often looks like:
1. Edit application code or configuration .YML file
2. Push changes to your Git repository
3. Build a new Docker image
4. Push the Docker image
5. Log in to your K8s cluster
6. Run kubectl apply or kubectl set image commands to apply changes into K8s cluster
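Steps 3, 4 and 6 above could be scripted roughly as follows. This is only a sketch: the image, deployment and container names are placeholders, and it assumes you are already logged in to the registry and the cluster (step 5). It builds the command lines without running them, so you can inspect them first.

```python
import shlex


def release_commands(image, tag, deployment, container, context_dir="."):
    """Return the docker/kubectl commands for one release as argument lists."""
    ref = f"{image}:{tag}"
    return [
        ["docker", "build", "-t", ref, context_dir],  # step 3: build the image
        ["docker", "push", ref],                      # step 4: push the image
        # step 6: point the Deployment's container at the new image
        ["kubectl", "set", "image", f"deployment/{deployment}", f"{container}={ref}"],
    ]


# Placeholder names, for illustration only.
for command in release_commands("registry.example.com/web", "1.1.0", "web", "web"):
    print(shlex.join(command))
```

To actually execute the steps, each list can be passed to `subprocess.run(command, check=True)`, which stops at the first failing command.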
Buddy is a CD tool that you can use to automate your whole K8s release workflows like:
managing Dockerfile updates
building Docker images and pushing them to the Docker registry
applying new images on your K8s cluster
managing configuration changes of a K8s Deployment
etc.
With Buddy you will have to configure just one pipeline.
With every change in your app code or the YAML config file, this tool will apply the deployment and Kubernetes will start transforming the containers to the desired state.
Pipeline configuration for running Kubernetes pods or jobs
Assume that we have an application on a K8s cluster and that its repository contains:
source code of our application
a Dockerfile with instructions on creating an image of your app
DB migration scripts
a Dockerfile with instructions on creating an image that will run the migration during the deployment (db migration runner)
In this case, we can configure a pipeline that will:
1. Build the application and migration images
2. Push them to Docker Hub
3. Trigger the DB migration using the previously built image. We can define the image, commands and deployment in a YAML file.
4. Use either Apply K8s Deployment or Set K8s Image to update the image in your K8s application.
You can adjust the above workflow to suit your environment and application.
Buddy supports GitLab as a Git provider. Integrating the two tools is easy and only requires authorizing GitLab in your profile. Thanks to this integration you can create pipelines that build, test and deploy your app code to the server. Of course, if you are using GitLab there is no need to set up Buddy as an extra tool, because GitLab is also a CD tool for automating workflows in Kubernetes.
You can find more information here: buddy-workflow-kubernetes.
Read also: automating-workflows-kubernetes.
As it turns out, we found that a paid Docker Hub plan addressed all of our needs. I appreciate the excellent information from @Malgorzata.

Creating a Kubernetes deployment in a GitLab pipeline

I have a private GitLab instance with multiple projects and GitLab CI enabled. The infrastructure is provided by Google Cloud Platform and the GitLab Runner is configured in a Kubernetes cluster.
This setup works very well for basic pipelines running tests etc. Now I'd like to start with CD, and to do that I need some manual acceptance step in the pipeline, which means the person reviewing it needs access to the current state of the app.
What I'm thinking of is a Kubernetes deployment for the pipeline that would be started once you try to access it (so we don't waste cluster resources) and destroyed once the reviewer accepts the pipeline or after some threshold.
So the deployment would be executed in the same cluster as Gitlab Runner (or different?) and would be accessible by unique URI (we're mostly talking about web-server apps) e.g. https://pipeline-58949526.git.mydomain.com
While in theory, it all makes sense to me, I don't really know how to set this up properly.
Does anyone have a similar setup? Is my view on this topic too simple? Let me know!
Thanks
If you want to see how to automate CI/CD with multiple environments on GKE, using GitOps for promotion between environments and Preview Environments on Pull Requests, you might want to check out my recent talk on Jenkins X at Devoxx UK, where I do a live demo of this on GKE.

How should I manage deployments with Kubernetes?

I am hoping to find a good way to automate the process of going from code to a deployed application on my Kubernetes cluster.
In order to build and deploy my app I first need to build the Docker image, tag it, and push it to ECR. I then need to update my deployment.yaml with the new tag for the Docker image and run the deployment with kubectl apply -f deployment.yaml.
This performs a rolling deployment on the Kubernetes cluster, updating the pods to the new version of the container image. Once this deployment has completed I may need to do other application-specific things, such as running database migrations or clearing/warming a cache, which may or may not need to run for a given deployment.
I suppose I could just write a shell script that runs all of these commands, and run it whenever I want to start up a new deployment, but I am hoping there is a better/industry standard way to solve these problems that I have missed.
As I was writing this question I noticed that Stack Overflow recommended this question: Kubernetes Deployments. One of the answers to it seems to imply that at least some of what I am looking for is coming soon to Kubernetes, but if there is a better solution I could be using now, I want to at least know about it.
My colleague has a good blog post about this topic:
http://blog.jonparrott.com/building-a-paas-on-kubernetes/
Basically, Kubernetes is not a Platform-as-a-Service; it's a toolkit on which you can build your own Platform-as-a-Service. It's not very opinionated by design. Instead it focuses on solving some tricky problems with scheduling, networking, and coordinating containers, and lets you layer your own opinions on top of it.
One of the simplest ways to automate the workflows you're describing is using a Makefile.
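Whether it is driven by a Makefile or a script, the step that is awkward to do by hand is rewriting the image tag in deployment.yaml before running kubectl apply. A minimal sketch of that step is below; `set_image_tag` and the registry name are made up for illustration, and it assumes the manifest has an unquoted `image: repository:tag` line.

```python
import re

# Hypothetical manifest fragment; a real deployment.yaml has more fields.
MANIFEST = """\
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0.0
"""


def set_image_tag(manifest, repository, new_tag):
    """Return the manifest text with the tag of `repository` set to `new_tag`."""
    pattern = re.compile(
        rf"^([ \t]*(?:-[ \t]+)?image:[ \t]*){re.escape(repository)}(?::\S+)?[ \t]*$",
        re.MULTILINE,
    )
    updated, count = pattern.subn(
        lambda match: f"{match.group(1)}{repository}:{new_tag}", manifest
    )
    if count == 0:
        raise ValueError(f"no image line found for {repository}")
    return updated


print(set_image_tag(MANIFEST, "registry.example.com/web", "1.1.0"))
```

The updated text can then be written back to deployment.yaml and applied. Templating tools handle this more robustly; the regular expression here is only meant to show the idea.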
A step up from that, you can design your own miniature PaaS, which the author of the first blog post did here:
https://github.com/jonparrott/noel
Or, you could get involved in more sophisticated efforts to build an open source PaaS on Kubernetes, like OpenShift:
https://www.openshift.com/
or Deis, which is building a Heroku-like platform on Kubernetes:
https://deis.com/
or Redspread, which is building "Git for Kubernetes cluster":
https://redspread.com/
and there are many other examples of people building a PaaS on top of Kubernetes. But I think it will be a long time, if ever, before there is an "industry standard" way to deploy to Kubernetes, since half the purpose is to enable multiple deployment workflows for different use cases.
I do want to note that as far as building container images, Google Cloud Container Builder can be a useful tool, since you can do things like use it to automatically build an image any time you push to a repository which could then get deployed. Alternatively, Jenkins is a popular way to automate CI/CD flows with Kubernetes.
I suppose I could just write a shell script that runs all of these commands, and run it whenever I want to start up a new deployment, but I am hoping there is a better/industry standard way to solve these problems that I have missed.
The company I work for (Weaveworks) and other folks in the space have been advocating an approach that we call GitOps. Please take a look at our series of blog posts covering the topic:
GitOps - Operations by Pull Request
The GitOps Pipeline - Part 2
GitOps Part 3 - Observability
Storing Secure Sealed Secrets using GitOps
The gist of it is that you push images from CI and keep your YAML manifests checked in to Git (usually in a different repo from the app code). This repo with manifests is then applied to each of your clusters (dev/prod) by a reconciliation operator. You can automate it all yourself quite easily, but do also take a look at what we have built.
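To illustrate what "reconciliation" means here, the sketch below compares the desired state (what is in Git) with the actual state (what is running) and lists the changes needed to converge. It is a toy model, not the API of any real operator: `plan` and the dictionaries are hypothetical, and a real operator works on full Kubernetes objects rather than image strings.

```python
def plan(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (here just an image string).
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


# Hypothetical state: Git says web and worker should run app:2.
desired_state = {"web": "app:2", "worker": "app:2"}
actual_state = {"web": "app:1", "cron": "app:1"}
print(plan(desired_state, actual_state))
```

Running this comparison in a loop against the Git repo is what makes a pull request the deployment mechanism: merging a manifest change is enough for the cluster to follow.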
Disclaimer: I am a Kubernetes contributor and Weaveworks employee. We build open-source and commercial tools that help people to get to production with Kubernetes sooner.
We're working on an open source project called Jenkins X which is a proposed sub project of the Jenkins foundation aimed at automating CI/CD on Kubernetes using Jenkins and GitOps for promotion.
When you merge a change to the master branch, Jenkins X creates a new semantically versioned distribution of your app (pom.xml, jar, docker image, helm chart). The pipeline then automates the generation of Pull Requests to promote your application through all of the Environments via GitOps.
Here's a demo of how to automate CI/CD with multiple environments on Kubernetes using GitOps for promotion between environments and Preview Environments on Pull Requests - using Spring Boot and nodejs apps (but we support many languages + frameworks).