Issues when using AKS to manage containers - kubernetes

I am using Docker + AKS to manage my containers. When I run my containers locally or on a VM using docker-compose, my services (which are containerized) can communicate with my databases, which are also in containers. The bridge between these containers is created using networks. After I converted the docker-compose file for all of my applications to the respective YAML counterparts and deployed my containers to AKS (single node), my containerized services are not able to reach the database.
All my containers have three YAML files:
PVC
Deployment (for the Pods)
Service
I've gone through many of the getting-started examples for AKS and for some reason am not able to figure it out. All application services are exposed publicly using load balancers. My question is: how do I tell the application services which database to connect to, now that the concept of docker-compose networks no longer exists?
In the provided AKS examples, all the front-end services do is set an environment variable containing the name of the backend service. I tried that as well and my application still doesn't work. The sample I referred to in order to validate my setup is https://learn.microsoft.com/en-gb/azure/aks/kubernetes-walkthrough#run-the-application.
Any help would be great.

If you only need these services internally, you should not expose them publicly using load balancers.
Kubernetes has two mechanisms for service discovery: DNS and environment variables. While DNS is technically an optional component, I have not seen a cluster without it, and I assume AKS ships with it too.
So, say you have a Postgres database and want to use it from another service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: db
        image: postgres:11
        ports:
        - name: postgres
          containerPort: 5432
This creates a Deployment that exposes port 5432. The label app: postgres is also important here, since we need it later to identify the created Pods.
Now we need to create a service for it:
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  type: ClusterIP # default value
  selector:
    app: postgres
  ports:
  - port: 5432
This creates a virtual IP address and registers all ready Pods with the label app: postgres behind it. Since the service is named postgres and lives in the default namespace, it is now reachable at postgres.default.svc.cluster.local:5432 (or simply postgres:5432 from within the same namespace). You can use this address and port in your other application (e.g. Python) to connect to the database.
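For example, the consuming application can be told about that address through an environment variable in its own Deployment. A minimal sketch, assuming a hypothetical my-api Deployment and that your application reads a DATABASE_HOST/DATABASE_PORT pair (all of these names are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api                        # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
      - name: api
        image: my-api:latest          # placeholder image
        env:
        - name: DATABASE_HOST         # whatever variable your app actually reads
          value: postgres.default.svc.cluster.local
        - name: DATABASE_PORT
          value: "5432"
The only thing the application needs to know is the service name; Kubernetes DNS resolves it to the current ClusterIP.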

Related

K8s service doesn't use autoscaled mongodb pods

I am trying to deploy MongoDB with Kubernetes (GKE, more precisely). This database is used by a microservice which only needs to read from the database, so I thought of deploying multiple pods, each running a MongoDB container, so that the work is shared between them. To do that I created a MongoDB image into which I copied the database I previously used with a single Docker container.
(Here is its Dockerfile, a single deployment of this image works in k8s so I guess that this may not be linked to the issue)
FROM mongo:latest
EXPOSE 27017
COPY /mdb/ /data/db
As the number of requests to the db varies during the day, I want to use GKE horizontal autoscaling for those "mongodb pods". Autoscaling works, in the sense that new pods are created when the CPU utilization goes over the target I set in my Horizontal Pod Autoscaler, but these new pods are not used by the service I created for the deployment, and that's my issue.
Something strange to me is that the new pods' local IP addresses appear in my service endpoints, and when I delete the initial pod, which is the only one doing any work at that moment, the other pods created by the autoscaler start being used and I finally get a performance improvement. However, this is obviously not a solution, and moreover pods created after I deleted the initial one don't get used either.
Here are the YAML files for my MongoDB deployment and service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb-deployment
  labels:
    app: mongodb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: "$my_mongodb_image_in_which_I_have_my_db"
        ports:
        - containerPort: 27017
        resources:
          requests:
            memory: "1800Mi"
            cpu: "3000m"
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  type: LoadBalancer
  loadBalancerIP: $IP_reserved_for_this_service
  selector:
    app: mongodb
  ports:
  - protocol: TCP
    port: 80
    targetPort: 27017
And I am accessing MongoDB through pymongo, in programs that run in another pod in the same GKE cluster:
from pymongo import MongoClient

def get_db(database: str):
    client = MongoClient(host="$IP_reserved_for_this_service",
                         port=80,
                         username="...",
                         password="...",
                         authSource="admin")
    return client.get_database(database)
This way of using and autoscaling mongodb might be weird and quite impractical but it's only a first model for me and I would like to make it work (or understand why it can't work).
Here are screenshots showing the behaviour of those pods:
State 1: only the initial pod is doing any work, but all the pod IPs appear in the service endpoints.
State 2: after deleting the initial pod, the others are used (except the new one created by the autoscaler after the deletion), and the endpoints are updated in the service (the update is in the "+ 1 more ...", I checked in the Google console).
I feel that the problem might come either from the configuration of my mongodb-service or from the way k8s or gke deals with mongodb images (anyway since I'm new to k8s I might be completely wrong on that too).
Any help or comment will be appreciated, and if you need more information let me know.
This is connection stickiness, a common and well-known behaviour of Kubernetes. Kubernetes doesn't balance individual packets, it balances connections. Once your app has established a connection to a service, all requests sent over that connection go to the same pod; Kubernetes only makes a new balancing decision when a new connection is opened.
One option to solve this is a headless service: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
Or a service mesh, but that's too much for your case.
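For the headless-service option, a minimal sketch of what the service could look like (same selector as your current one; with clusterIP: None the service name resolves to the individual pod IPs instead of a single virtual IP, so the client has to be able to handle multiple addresses):
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  clusterIP: None          # headless: DNS returns the pod IPs instead of one virtual IP
  selector:
    app: mongodb
  ports:
  - protocol: TCP
    port: 27017
    targetPort: 27017
Note that this removes the external LoadBalancer IP, so the client would connect to mongodb-service:27017 from inside the cluster instead of the reserved IP.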

Redis master/slave replication on Kubernetes for ultra-low latency

A diagram is always better than a long explanation, so here is what I would like to do:
To sum up:
I want to have a Redis master instance outside (or inside, this is not relevant here) my K8S cluster
I want to have a Redis slave instance per node, each replicating the master instance
I want that, when a node is removed, its Redis slave pod gets unregistered from the master
I want that, when a node is added, a Redis slave pod is added to that node and registered with the master
I want all pods on one node to consume only the data of the local Redis slave (easy part, I think)
Why do I want such an architecture?
I want to take advantage of Redis master/slave replication to avoid dealing with cache invalidation myself
I want to have ultra-low latency calls to Redis cache, so having one slave per node is the best I can get (calling on local host network)
Is it possible to automate such deployments, using Helm for instance? Are there documentation resources for building such an architecture with clean dynamic master/slave binding/unbinding?
And most of all, is this architecture a good idea for what I want to do? Is there any alternative that could be as fast?
I remember we had a discussion on this topic previously here; no worries, adding more here.
Read more about the Redis Helm chart: https://github.com/bitnami/charts/tree/master/bitnami/redis#choose-between-redis-helm-chart-and-redis-cluster-helm-chart
You should also be asking the question of how your application will connect to the Redis POD on the same node without going through the Redis service. For that, you can use environment variables and expose the node information to the application POD.
Something like:
env:
- name: HOST_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
It will give you the IP of the node on which the POD is running; you can then use that IP to connect to the DaemonSet (the Redis slave, if that is how you run it).
You can read more at: https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
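To make that work, the Redis slaves would typically run as a DaemonSet exposing a hostPort, so each slave is reachable on the node IP obtained above. A rough sketch, assuming a plain redis image configured as a replica of your master (the master address is a placeholder):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: redis-slave
spec:
  selector:
    matchLabels:
      app: redis-slave
  template:
    metadata:
      labels:
        app: redis-slave
    spec:
      containers:
      - name: redis
        image: redis:6
        # placeholder master address; --replicaof makes this instance a read replica
        args: ["redis-server", "--replicaof", "redis-master.example.com", "6379"]
        ports:
        - containerPort: 6379
          hostPort: 6379   # reachable on the node IP from status.hostIP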
Is it possible to automate such deployments, using Helm for instance?
Yes, you can write your own Helm chart and deploy the generated YAML manifests.
And most of all, is this architecture a good idea for what I want to do? Is there any alternative that could be as fast?
You may think it is a good idea, but in my opinion this could create a cost ($$$) issue and higher cluster resource usage.
What if you are running 200 nodes, each with its own Redis slave? That would consume resources on every node and add cost to your infra.
OR, if you are planning this for a specific deployment only:
Your suggestion above is also good, but if you are planning to use Redis with only a specific deployment, you can use the sidecar pattern instead and connect the Redis instances together through configuration.
apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    app: web
spec:
  ports:
  - port: 80
    name: redis
    targetPort: 5000
  selector:
    app: web
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: web
  replicas: 3
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
      - name: web-app
        image: web-app
        env:
        - name: "REDIS_HOST"
          value: "localhost"

Why Do I Need a NodePort in My Local Kubernetes Cluster?

Excuse my relative networking ignorance, but I've read a lot of docs and still have trouble understanding this (perhaps due to lack of background in networks).
Given this Dockerfile:
FROM node:lts-slim
RUN mkdir /code
COPY package.json /code/
WORKDIR /code
RUN npm install
COPY server.js /code/
EXPOSE 3000
CMD ["node", "server.js"]
...this deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-pod
  template:
    metadata:
      labels:
        app: web-pod
    spec:
      containers:
      - name: web
        image: kahunacohen/hello-k8s
        ports:
        - containerPort: 3000
          protocol: TCP
and this service:
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: NodePort
  selector:
    app: web-pod
  ports:
  - port: 80
    targetPort: 3000
    protocol: TCP
    name: http
My understanding is that:
The app in my container is exposing itself to the outside world on 3000
my deployment yaml is saying, "the container is listening on 3000"
my service is saying map 3000 internally to port 80, which is the default port, so you don't have to add the port to the host.
I'm using the NodePort type because on local clusters like Docker Desktop it works out of the box, unlike LoadBalancer. It opens up a random port between 30000 and 32767 on every node (pod?) that is reachable from outside the cluster. That node port is how I access my app from outside, e.g. localhost:30543.
Are my assumptions correct? I am unclear why I can't access my app at localhost:80, or just localhost, if the service makes the mapping between the container port and the outside world. What's the point of the mapping between 3000 and 80 in the service?
In short, why do I need NodePort?
There are two networking layers, which we could call "inside the cluster" and "outside the cluster". The Pod and the Service each have their own IP address, but these are only inside the cluster. You need the NodePort to forward a request from outside the cluster to inside the cluster.
In a "real" Kubernetes cluster, you'd make a request...
...to http://any-kubernetes-node.example.com:31245/, with a "normal" IP address in the way you'd expect a physical system to have, connecting to the NodePort port, which forwards...
...to http://web-service.default.svc.cluster.local:80/, with a cluster-internal IP address and the service port, which looks at the pods it selects and forwards...
...to http://10.20.30.40:3000/, using the cluster-internal IP address of any of the matching pods and the target port from the service.
The containerPort: in the pod spec isn't strictly required (but if you give it name: http then you can have the service specify targetPort: http without knowing the specific port number). EXPOSE in the Dockerfile means pretty much nothing in this sequence.
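As a sketch of that naming trick, the relevant fragments could look like this (the name http is arbitrary, any valid port name works):
# In the Deployment's pod template:
      ports:
      - name: http
        containerPort: 3000
        protocol: TCP

# In the Service:
  ports:
  - port: 80
    targetPort: http   # refers to the named container port, not a number
    protocol: TCP
    name: http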
This sequence also gives you some flexibility in not needing to know where things are running. Say you have 100 nodes and 3 replicas of your pod; the initial connection can be to any node, and the service will forward to all of the target pods, without you needing to know any of these details from the caller.
(For completeness, a LoadBalancer type service requests that a load balancer be created outside the cluster; for example, an AWS ELB. This forwards to any of the cluster nodes as in step 1 above. If you're not in a cloud environment and the cluster doesn't know how to create the external load balancer automatically, it's the same as NodePort.)
If we reduce this to a local Kubernetes installation (Docker Desktop, minikube, kind) the only real difference is that there's only one node; the underlying infrastructure is still built as though it were a multi-node distributed cluster. How exactly you access a service differs across these installations. In Docker Desktop, from the host system, you can use localhost as the "normal" "external" node IP address in the first step.

Connecting two containers with Kubernetes using environment variables

I'm new to k8s and need some direction on how to troubleshoot.
I have a postgres container and a graphql container. The graphql container tries to connect to postgres on startup.
Problem
The graphql container can't connect to postgres. This is the error on startup:
{"internal":"could not connect to server: Connection refused\n\tIs the server running on host "my-app" (xxx.xx.xx.xxx) and accepting\n\tTCP/IP connections on port 5432?\n",
"path":"$","error":"connection error","code":"postgres-error"}
My understanding is that the graphql-container doesn't recognize the IP my-app (xxx.xx.xx.xxx). This is the actual Pod Host IP, so I'm confused as to why it doesn't recognize it. How do I troubleshoot errors like these?
What I tried
Hardcoding the host in the connection uri in deployment.yaml to the actual pod host IP. Same error.
Bashed into the graphql container and verified that it had the correct env values with the env command.
deployment.yaml
spec:
  selector:
    matchLabels:
      service: my-app
  template:
    metadata:
      labels:
        service: my-app
    ...
      - name: my-graphql-container
        image: image-name:latest
        env:
        - name: MY_POSTGRES_HOST
          value: my-app
        - name: MY_DATABASE
          value: db
        - name: MY_POSTGRES_DB_URL # the postgres connection url that the graphql container uses
          value: postgres://$(user):$(pw)@$(MY_POSTGRES_HOST):5432/$(MY_DATABASE)
    ...
      - name: my-postgres-db
        image: image-name:latest
In k8s docs about pods you can read:
Pods in a Kubernetes cluster are used in two main ways:
Pods that run a single container. [...]
Pods that run multiple containers that need to work together. [...]
Note: Grouping multiple co-located and co-managed containers in a
single Pod is a relatively advanced use case. You should use this
pattern only in specific instances in which your containers are
tightly coupled.
Each Pod is meant to run a single instance of a given
application. [...]
Notice that your deployment doesn't fit this description, because you are trying to run two applications in one pod.
Remember to run one application per pod, and only put multiple containers in a single pod if it's impossible to separate them (and for some reason they have to run together).
And the rest was already mentioned by David.
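As a sketch of that split (the names below are illustrative), Postgres would get its own Deployment and Service, and the graphql container would point at the Service name instead of at my-app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-postgres-db
spec:
  replicas: 1
  selector:
    matchLabels:
      service: my-postgres-db
  template:
    metadata:
      labels:
        service: my-postgres-db
    spec:
      containers:
      - name: postgres
        image: image-name:latest   # your postgres image
        ports:
        - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: my-postgres-db
spec:
  selector:
    service: my-postgres-db
  ports:
  - port: 5432
    targetPort: 5432
With that in place, MY_POSTGRES_HOST in the graphql Deployment would be set to my-postgres-db (the Service name), which cluster DNS resolves to the database.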

Access local Kubernetes cluster running in Virtualbox

I have configured a Kubernetes cluster using kubeadm, by creating 3 Virtualbox nodes, each node running CentOS (master, node1, node2). Each virtualbox virtual machine is configured using 'Bridge' networking.
As a result, I have the following setup:
Master node 'master.k8s' running at 192.168.19.87 (virtualbox)
Worker node 1 'node1.k8s' running at 192.168.19.88 (virtualbox)
Worker node 2 'node2.k8s' running at 192.168.19.89 (virtualbox)
Now I would like to access services running in the cluster from my local machine (the physical machine where the virtualbox nodes are running).
Running kubectl cluster-info I see the following output:
Kubernetes master is running at https://192.168.19.87:6443
KubeDNS is running at ...
As an example, let's say I deploy the dashboard inside my cluster, how do I open the dashboard UI using a browser running on my physical machine?
The traditional way is to use kubectl proxy or a LoadBalancer, but since you are on a development machine, a NodePort can be used to publish the applications, as a load balancer is not available in VirtualBox.
The following example deploys 3 replicas of an echo server running nginx and publishes the http port using a NodePort:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: my-echo
        image: gcr.io/google_containers/echoserver:1.8
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service-np
  labels:
    name: nginx-service-np
spec:
  type: NodePort
  ports:
  - port: 8082          # Cluster IP http://10.109.199.234:8082
    targetPort: 8080    # Application port
    nodePort: 30000     # Example (EXTERNAL-IP VirtualBox IPs) http://192.168.50.11:30000/ http://192.168.50.12:30000/ http://192.168.50.13:30000/
    protocol: TCP
    name: http
  selector:
    app: nginx
You can access the servers using any of the VirtualBox IPs, like
http://192.168.50.11:30000 or http://192.168.50.12:30000 or http://192.168.50.13:30000
See a full example at Building a Kubernetes Cluster with Vagrant and Ansible (without Minikube).
The traditional way of getting access to the kubernetes dashboard is documented in their readme and is to use kubectl proxy.
One should not have to ssh into the cluster to access any kubernetes service, since that would defeat the purpose of having a cluster, and would absolutely shoot a hole in the cluster's security model. Any ssh to Nodes should be reserved for "in case of emergency, break glass" situations.
More generally speaking, a well-configured Ingress controller will surface services en masse, and also has the very pleasing side effect that your local cluster will operate exactly the same as your "for real" cluster, without any underhanded ssh-ery required.
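For illustration, a minimal Ingress for the echo service above might look like this (it assumes an ingress controller such as ingress-nginx is installed in the cluster, and the host name is a placeholder):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
spec:
  rules:
  - host: echo.example.local         # placeholder host name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-service-np   # the Service defined above
            port:
              number: 8082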