How to ingest the Boston Housing dataset into Cassandra in Kubernetes?

I am new to Kubernetes and have tried to set up my first cluster using minikube. I have installed Cassandra from the Bitnami Helm chart with the following command:
helm install bitnami/cassandra
Cassandra is now running in a single pod. I would like to explore and understand how I can interact with Cassandra inside my Kubernetes cluster.
My goal right now is therefore to ingest the Boston Housing dataset into Cassandra, and I have tried to read up on how this is done in Kubernetes. Has anyone done something similar? What is the correct way to ingest data into Cassandra in Kubernetes? I have a hard time finding the right information on how to do this. Is it done through Jobs?
I would love any tips or insights.

Before installing Cassandra via Helm, you can fetch the chart into your current local folder with:
$ helm fetch bitnami/cassandra --untar
$ cd cassandra
Then create a Job template in that folder and add the hook annotations below to it; Helm will then treat the Job as a hook rather than as part of the release.
...
metadata:
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": post-install  # runs after all release resources are loaded into Kubernetes
    # The Job is deleted once it has completed successfully
    "helm.sh/hook-delete-policy": hook-succeeded
...
You can find a full example of a hook template in the official Helm documentation.
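A fuller sketch of such a hook, one that actually loads the dataset, might look like the template below. It is a sketch under several assumptions, not a tested recipe: the CSV is stored in the chart as `files/boston.csv` (hypothetical path) with an added leading `id` column, because Cassandra needs a primary key and the dataset has none; the Cassandra Service and Secret are named after the release (which should hold for `helm install cassandra .`, but check with `kubectl get svc,secret`); the user `cassandra` and Secret key `cassandra-password` are assumed Bitnami defaults; and the `bitnami/cassandra` image is assumed to ship `bash` and `cqlsh`.

```yaml
# templates/boston-ingest.yaml (hypothetical file name)
# The CSV goes into a ConfigMap. It is a normal release resource, so it
# already exists when post-install hooks run.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-boston-data
data:
  boston.csv: |-
{{ .Files.Get "files/boston.csv" | indent 4 }}
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-boston-ingest
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ingest
          # Assumption: this image provides cqlsh; ideally use the chart's tag
          image: bitnami/cassandra
          env:
            - name: CASSANDRA_HOST
              value: {{ .Release.Name | quote }}   # assumption: Service named after the release
            - name: CASSANDRA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Release.Name }}        # assumption: chart's Secret name
                  key: cassandra-password          # assumption: Bitnami default key
          command: ["/bin/bash", "-c"]
          args:
            - |
              set -e
              cql() { cqlsh "$CASSANDRA_HOST" 9042 -u cassandra -p "$CASSANDRA_PASSWORD" "$@"; }
              # post-install means "resources created", not "Cassandra ready", so retry
              until cql -e "DESCRIBE KEYSPACES" > /dev/null 2>&1; do
                echo "waiting for Cassandra..."; sleep 10
              done
              cql -e "CREATE KEYSPACE IF NOT EXISTS housing
                WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"
              cql -e "CREATE TABLE IF NOT EXISTS housing.boston (
                id int PRIMARY KEY, crim double, zn double, indus double,
                chas double, nox double, rm double, age double, dis double,
                rad double, tax double, ptratio double, b double,
                lstat double, medv double);"
              # List the columns explicitly: without a list, COPY uses the
              # table's own column order, which need not match the CSV's.
              cql -e "COPY housing.boston (id, crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat, medv) FROM '/data/boston.csv' WITH HEADER = TRUE"
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          configMap:
            name: {{ .Release.Name }}-boston-data
```

While the Job exists, `kubectl logs job/cassandra-boston-ingest` (for a release named `cassandra`) should show its progress. Because of `hook-succeeded`, the Job disappears once the load has finished. For datasets far larger than this one, a ConfigMap (limited to about 1 MiB) would no longer fit, and the file would have to come from a volume or be downloaded by the Job instead.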
After adding your hook job template, you can install your chart via:
$ # Make sure you are in cassandra folder
$ pwd
~/cassandra
$ # And install
$ helm install cassandra .
For more about Kubernetes Jobs, see the official documentation.
Hope it helps!

Related

How to export Helm chart values programmatically and periodically from a pod

I thought this would be a simple task, but I can't find a solution. The goal is to get the full set of values of a Helm chart that is actually installed and in use in a k8s cluster. Helm is installed locally, so running helm get values is easy, but what if I want these values extracted periodically and sent somewhere else by a pod running in the same cluster? For example, exporting them as JSON and saving them in a database.
Can I write a Go/Python script that runs in a pod with Helm installed? Is there an API for this?
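One possible approach, sketched here as an assumption rather than a tested answer: Helm 3 stores release data in Secrets in the release's namespace, so a pod that runs the `helm` CLI under a ServiceAccount allowed to read Secrets can call `helm get values <release> --all -o json` on a schedule. The names `values-exporter` and `my-release`, the `default` namespace, the hourly schedule and the `alpine/helm` image are placeholders. Sending the JSON to a database is left out; this sketch only prints it to the pod log.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: values-exporter                  # hypothetical name
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: values-exporter
rules:
  # Helm 3 keeps release state in Secrets in the release namespace
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: values-exporter
subjects:
  - kind: ServiceAccount
    name: values-exporter
    namespace: default                   # assumption: the namespace you deploy this into
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: values-exporter
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: values-exporter
spec:
  schedule: "0 * * * *"                  # hourly; adjust as needed
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: values-exporter
          restartPolicy: OnFailure
          containers:
            - name: export
              image: alpine/helm         # assumption: any image with the helm CLI works
              command: ["helm"]
              # --all includes chart defaults, not only user-supplied values
              args: ["get", "values", "my-release", "--all", "-o", "json"]
              env:
                # Point helm at the pod's own namespace, where the release lives
                - name: HELM_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
```

Since the Role is namespaced, this only works when the exporter runs in the same namespace as the release. From a Go program, the same data should be reachable through Helm's own Go packages instead of the CLI, but that is a separate exercise.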

Kubernetes single deployment yaml file for spinning up the application

I am setting up Kubernetes for an application with 8 microservices, ActiveMQ, Postgres, Redis and MongoDB.
After configuring all the pods and deployments, is there any way to create a single master deployment YAML file that creates the entire set of services, replicas, etc. for the whole application?
Note: I will be using multiple deployment YAML files, StatefulSets, etc. for all of the services mentioned above.
You can use this script:
NAMESPACE="your_namespace"
RESOURCES="configmap secret daemonset deployment service hpa"
for resource in ${RESOURCES}; do
  # jq -r prints the names without surrounding quotes
  rsrcs=$(kubectl -n "${NAMESPACE}" get -o json "${resource}" | jq -r '.items[].metadata.name')
  for r in ${rsrcs}; do
    dir="${NAMESPACE}/${resource}"
    mkdir -p "${dir}"
    kubectl -n "${NAMESPACE}" get -o yaml "${resource}" "${r}" > "${dir}/${r}.yaml"
  done
done
Remember to specify what resources you want exported in the script.
More info here
Is there any way to create a single master deployment YAML file which will create the entire set of services, replicas, etc. for the entire application?
Since you already mentioned kubernetes-helm, why not use it for exactly this purpose? In short, Helm is a sort of package manager for Kubernetes, often compared to yum or apt. It deploys charts, which you can think of as packaged applications: a bundle of all your pre-configured resources that can be deployed as one unit. It's not a single file but a collection of files that together make up a so-called Helm chart.
What are Helm charts?
They are basically Kubernetes YAML manifests combined into a single package that can be installed into your cluster, and installing the package is as simple as running a single command such as helm install. Charts are also highly reusable, which reduces the time needed to create dev, test and prod environments.
As an example of a complex Helm chart deploying multiple resources, you may want to look at StackStorm's.
Deployed without any custom configuration, that chart runs 2 replicas of each StackStorm component as well as backends such as RabbitMQ, MongoDB and Redis.
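For the application in the question, one way to get a single `helm install` for everything is an umbrella chart: a parent chart whose `Chart.yaml` lists each service as a dependency. The sketch below assumes Helm 3 (`apiVersion: v2`), your own services as separate charts in sibling directories (names and paths hypothetical), and Bitnami charts for the backing stores. The `"*"` version constraints are placeholders; pin real versions in practice.

```yaml
# Chart.yaml of a hypothetical umbrella chart
apiVersion: v2
name: my-app
version: 0.1.0
dependencies:
  # Your own microservices, one chart each (hypothetical local paths)
  - name: orders-service
    version: "*"
    repository: "file://../orders-service"
  - name: billing-service
    version: "*"
    repository: "file://../billing-service"
  # Backing services from a public chart repository
  - name: postgresql
    version: "*"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: redis
    version: "*"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
  - name: mongodb
    version: "*"
    repository: "https://charts.bitnami.com/bitnami"
    condition: mongodb.enabled
```

`helm dependency update` fetches the subcharts into `charts/`, after which `helm install my-app .` deploys the whole set in one go. Each `condition` lets a values file or `--set redis.enabled=false` switch a backend off. ActiveMQ would need its own chart, added the same way.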

Dependency between pods in helm charts

I am trying to deploy a helm chart and I need help for my use case.
My requirement: in my Helm chart's templates folder I have a few deployment YAML files and .tpl files. When I run helm install, one of the YAML files in the templates folder deploys a resource of kind Job with only one pod. The other deployments in the templates folder should wait for this Job to finish successfully and only then be deployed to Kubernetes as pods.
When I trigger helm install, Helm reads all the YAML files and tries to deploy all the pods at once, which I don't want. I want the Job to succeed first, and only then should the other pods start being deployed. While the Job is running, all the other pods should wait or not start, because they all depend on the Job succeeding.
How can I achieve this with Helm? How can I make the other pods wait and let them know that the Job has completed successfully?
You are looking for Helm hooks:
Helm provides a hook mechanism to allow chart developers to intervene at certain points in a release's life cycle. For example, you can use hooks to:
Load a ConfigMap or Secret during install before any other charts are loaded.
Execute a Job to back up a database before installing a new chart, and then execute a second job after the upgrade in order to restore data.
Run a Job before deleting a release to gracefully take a service out of rotation before removing it.
Add the following annotation to your Job:
metadata:
  annotations:
    "helm.sh/hook": "pre-install"
You can also configure the hook to run before both installs and upgrades (the other hook types are listed in the Helm documentation):
metadata:
  annotations:
    "helm.sh/hook": "pre-install,pre-upgrade"
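The resources that a hook creates are not tracked or managed as part of the release. For a Job hook, Helm waits until the Job has run to completion before continuing, and the release fails if the Job fails; this is what makes your other pods wait. Once Tiller (Helm 2; in Helm 3 the helm client itself) verifies that the hook has reached its ready state, it leaves your Job resource alone, unless you set "helm.sh/hook-delete-policy" to delete it.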
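Applied to the question, the Job template might look like the sketch below. The name, image and command are hypothetical stand-ins for your own Job. Because it is a `pre-install` hook, it runs before the chart's regular resources are created, so the Deployments in the same chart need no extra wiring. `hook-weight` only matters when there are several hooks, and `before-hook-creation` removes a previous Job of the same name before the hook runs again (for example on upgrade).

```yaml
# templates/setup-job.yaml (hypothetical)
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-setup
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"           # must be a string; lower weights run first
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2                        # fail the hook (and the release) after 2 retries
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: setup
          image: busybox                 # hypothetical: replace with your Job's image
          command: ["sh", "-c", "echo running one-off setup"]
```

One consequence of this ordering: if the Job itself needs a ConfigMap or Secret from the chart, that resource must also be a pre-install hook with a lower weight, because the regular release resources do not exist yet when the Job runs.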

Different deployment configurations using Helm

I would like slightly different deployment configurations in different environments. That is, in Prod and Ver, I don't want all containers to be deployed.
With docker-compose we solve this with incremental docker-compose files that we combine, like: docker-compose -f docker-compose.yml -f docker-compose-prod.yml up
How can that be done using Helm charts?
We have a structure with Chart.yaml and values.yaml at the top, and then one YAML file per container in a subfolder. The naive solution would be to copy that structure and leave out some of the chart files, but I would prefer to have only one file (at most one file!) per service.
We deploy to AKS using CircleCI.
To summarize:
Today, each service has its own YAML file, and on every deploy all of them get deployed. I want to configure my charts so that only a subset of the services gets deployed in certain environments.
EDIT:
kubectl can use selectors, like kubectl create -f cfg.yaml --selector=tier=frontend or kubectl create -f cfg.yaml --selector=environment=prod, and I already label my containers, so that would have been simple. But helm install has no similar flag that it passes on to kubectl.
Just create one values file for each environment and target it:
helm install . -f values.production.yaml
helm install . -f values.development.yaml
You can use a condition to toggle deployments. Imagine you have something.yaml, which you want deployed only conditionally:
{{ if .Values.something }}
something.yaml original content goes here
{{ end }}
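The two ideas combine as follows. Assume a hypothetical `frontend` service whose template is wrapped in a condition like the one above, on `.Values.frontend.enabled`. The chart's `values.yaml` enables everything by default, and each environment file overrides only what differs. Helm merges `-f` files over the chart's `values.yaml`, and when several `-f` files are given the right-most one wins, much like stacking docker-compose files.

```yaml
# values.yaml: defaults shared by all environments (service names hypothetical)
frontend:
  enabled: true
worker:
  enabled: true
---
# values.production.yaml: a separate file holding only the production overrides
frontend:
  enabled: false
```

With these files, `helm install . -f values.production.yaml` renders the worker but skips the frontend, and a one-off toggle is also possible with `--set frontend.enabled=false`.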

Setting up Spring Cloud Data Flow on Kubernetes

Do I need to install an instance of Spring Cloud Data Flow on the master server myself, or is this getting installed "automatically" as part of the deployment?
This isn't quite clear from the description at
http://docs.spring.io/spring-cloud-dataflow-server-kubernetes/docs/current-SNAPSHOT/reference/htmlsingle/#_deploying_streams_on_kubernetes
I've followed the guide, but removed every MySQL configuration; maybe MySQL is required. I'm stuck because the service is never assigned an external IP, and I don't see why, how to debug it, or whether I failed to install some required component.
Edit:
To clarify, I see a scdf service entry when I run
kubectl get svc
But this service never gets an external IP.
Do I need to install an instance of Spring Cloud Data Flow on the master server myself, or is this getting installed "automatically" as part of the deployment?
The Spring Cloud Data Flow server needs to be set up either outside the cluster (configured so that it knows how to connect to the Kubernetes environment) or inside it, by running the Spring Cloud Data Flow server Docker image in Kubernetes; the latter approach is preferable.
Step 6 in the link you posted above runs the SCDF docker image inside the kubernetes cluster:
```
Deploy the Spring Cloud Data Flow Server for Kubernetes using the Docker image and the configuration settings you just modified.
$ kubectl create -f src/etc/kubernetes/scdf-config-kafka.yml
$ kubectl create -f src/etc/kubernetes/scdf-secrets.yml
$ kubectl create -f src/etc/kubernetes/scdf-service.yml
$ kubectl create -f src/etc/kubernetes/scdf-controller.yml
```
MySQL is required; that's why it's in the steps:
Spring Cloud Data Flow uses an RDBMS instead of Redis for stream/task definitions, application registration, and for job repositories.
You can also use any of the other supported RDBMSes.
You can also install it using Helm charts:
https://dataflow.spring.io/docs/installation/kubernetes/helm/
First install Helm, then install Spring Cloud Data Flow:
helm install --name my-release stable/spring-cloud-data-flow
(This is Helm 2 syntax; with Helm 3 the release name is passed positionally, without --name.)
This installs and configures the relevant pods, such as spring-cloud-dataflow-server, mysql, skipper, rabbitmq, etc.
You can also customize versions and configurations.