Kubernetes rollout gives 503 errors when switching web pods

I'm running this command:
kubectl set image deployment/www-deployment VERSION_www=newImage
It works fine, but there's a 10-second window where the website returns 503s, and I'm a perfectionist.
How can I configure Kubernetes to wait for the new image to be available before switching the ingress?
I'm using the nginx ingress controller from here:
gcr.io/google_containers/nginx-ingress-controller:0.8.3
And this yaml for the web server:
# Service and Deployment
apiVersion: v1
kind: Service
metadata:
  name: www-service
spec:
  ports:
  - name: http-port
    port: 80
    protocol: TCP
    targetPort: http-port
  selector:
    app: www
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: www-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: www
    spec:
      containers:
      - image: myapp/www
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: http-port
        name: www
        ports:
        - containerPort: 80
          name: http-port
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/env-volume
          name: config
          readOnly: true
      imagePullSecrets:
      - name: cloud.docker.com-pull
      volumes:
      - name: config
        secret:
          defaultMode: 420
          items:
          - key: www.sh
            mode: 256
            path: env.sh
          secretName: env-secret
The Docker image is based on a node.js server image.
/healthz is a path on the web server that returns "ok". I thought the liveness probe would make sure the server was up and ready before switching to the new version.
Thanks in advance!

Within the Pod lifecycle it's defined that:
The default state of Liveness before the initial delay is Success.
To make sure you don't run into issues, configure a readinessProbe for your Pods as well, and consider setting .spec.minReadySeconds on your Deployment.
You'll find the details in the Deployment documentation.
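As a minimal sketch of what that could look like in the Deployment above (the probe path reuses your /healthz endpoint; the timings are assumptions to tune for your app):

spec:
  minReadySeconds: 5              # keep a new Pod out of rotation for a few extra seconds after it reports Ready
  template:
    spec:
      containers:
      - name: www
        readinessProbe:           # the Service only routes traffic to the Pod once this probe succeeds
          httpGet:
            path: /healthz        # assumed: same endpoint as the liveness probe
            port: http-port
          initialDelaySeconds: 5
          periodSeconds: 5

With a readinessProbe in place, the rolling update keeps the old Pod in the Service endpoints until the new one is actually ready to serve, which is what closes the 503 window.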

Error syncing load balancer: failed to ensure load balancer: failed to build load-balancer

We are only trying out the Kubernetes setup and strictly following the docs (at this point).
We are on DigitalOcean, and there are a bunch of tutorials and docs related to it as well (all of these are added below for reference).
At this point I have managed to deploy the two pods and am now trying to configure the load balancer for them in the simplest way possible. Everything gets deployed, but the load balancer fails to initialize with the following error:
Error syncing load balancer: failed to ensure load balancer: failed to build load-balancer request: specified health check port 8080 does not exist on service default/https-with-cert
I verified that the health check is actually working on the pods if I hit them directly. In fact, this is the same health check that we have been using for the last 2 years on our manually set up infrastructure.
The build runs through GitHub Actions and everything passes without issues; the deployment.yml looks like this:
---
kind: Service
apiVersion: v1
metadata:
  name: https-with-cert
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-algorithm: "round_robin"
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "c1eae56c-42cd-4953-9ab9-1a6facae87f8"
    # "api.priz.guru" should be configured to point at the IP address of the DO load-balancer
    service.beta.kubernetes.io/do-loadbalancer-hostname: "api.priz.guru"
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
    service.beta.kubernetes.io/do-loadbalancer-disable-lets-encrypt-dns-records: "false"
    service.beta.kubernetes.io/do-loadbalancer-size-unit: "2"
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-port: "8080"
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-path: "/v1/ping"
spec:
  type: LoadBalancer
  selector:
    app: priz-api
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: priz-api
  labels:
    app: priz-api
spec:
  # modify replicas according to your case
  replicas: 2
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: priz-api
  template:
    metadata:
      labels:
        app: priz-api
    spec:
      containers:
        - name: priz-api
          image: <IMAGE>
          env:
            - name: PRIZ_DATABASE_URL
              value: "${PRIZ_DATABASE_URL_PROD}"
            - name: PRIZ_DATABASE_USER
              value: "${PRIZ_DATABASE_USER_PROD}"
            - name: PRIZ_DATABASE_PASSWORD
              value: "${PRIZ_DATABASE_PASSWORD_PROD}"
            - name: PRIZ_AUTH0_DOMAIN
              value: "${PRIZ_AUTH0_DOMAIN_PROD}"
            - name: PRIZ_AUTH0_API_DOMAIN
              value: "${PRIZ_AUTH0_API_DOMAIN_PROD}"
            - name: PRIZ_AUTH0_API_CLIENT_ID
              value: "${PRIZ_AUTH0_API_CLIENT_ID_PROD}"
            - name: PRIZ_AUTH0_API_CLIENT_SECRET
              value: "${PRIZ_AUTH0_API_CLIENT_SECRET_PROD}"
            - name: PRIZ_APP_BASE_URL
              value: "${PRIZ_APP_BASE_URL_PROD}"
            - name: PRIZ_STRIPE_API_KEY_SECRET
              value: "${PRIZ_STRIPE_API_KEY_SECRET_PROD}"
            - name: PRIZ_SEARCH_HOST
              value: "${PRIZ_SEARCH_HOST_PROD}"
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 500m
              memory: 500Mi
            limits:
              cpu: 2000m
              memory: 2000Mi
How do I even debug this issue? What is missing?
Some references that we used:
https://docs.digitalocean.com/products/kubernetes/how-to/add-load-balancers/
https://docs.digitalocean.com/products/kubernetes/how-to/configure-load-balancers/
https://github.com/digitalocean/digitalocean-cloud-controller-manager/tree/master/docs/controllers/services/examples
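Judging by the error text alone, the do-loadbalancer-healthcheck-port annotation is validated against the ports declared on the Service itself (https-with-cert only exposes 80 and 443), not against the containerPort of the pods. A hedged sketch of one way to make the two consistent, assuming that is indeed what the DigitalOcean cloud controller is checking:

spec:
  type: LoadBalancer
  selector:
    app: priz-api
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8080
    - name: healthcheck      # assumed name; declares port 8080 on the Service so the annotation can reference it
      protocol: TCP
      port: 8080
      targetPort: 8080

The alternative is to drop the healthcheck-port annotation (or point it at one of the existing service ports) and keep only the healthcheck-path and healthcheck-protocol annotations. For debugging, kubectl describe service https-with-cert surfaces the same sync errors as events, which is usually the quickest way to see what the cloud controller is objecting to.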

Converting a docker-compose redis file to work on kubernetes

I'm migrating our Swarm cluster to a k8s one, which means I need to rewrite all the compose files as k8s manifests. Everything was going smoothly until I reached the Redis compose file...
The compose file for Redis:
Yes, it's simple, because it's just used for caching during development...
version: "3"
services:
  db:
    image: redis:alpine
    ports:
      - "6380:6379"
    deploy:
      labels:
        - traefik.frontend.rule=Host:our-redis-url.com
      placement:
        constraints:
          - node.labels.so==linux
    networks:
      - traefik
networks:
  traefik:
    external: true
So, we have 4 nodes in that Swarm... my DNS (our-redis-url.com) points to one of them, and it works like a charm. I simply connect to Redis using that URL plus port 6380.
Now I have created the same thing, but for k8s, as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-ms
  namespace: prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-ms
  template:
    metadata:
      labels:
        app: redis-ms
    spec:
      containers:
      - name: redis-ms
        image: redis:alpine
        ports:
        - containerPort: 6379
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-ms
  namespace: prod
spec:
  selector:
    app: redis-ms
  ports:
  - protocol: TCP
    port: 6380
    targetPort: 6379
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: redis-ms
  namespace: prod
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: our-redis-url.com
    http:
      paths:
      - backend:
          service:
            name: redis-ms
            port:
              number: 6380
        path: /
        pathType: Prefix
And that didn't work.
The pod runs, and from the logs I can see it's waiting for connections, BUT I don't know how to do the trick like in docker-compose (traefik.frontend.rule=Host:redis-ms.mstech.com.br, to bind the URL and the port).
I have tried to use the tool kompose to convert this compose file... It didn't work either, lol.
If anyone could give me some advice or help me fix the problem, I'd be thankful.
I'm using k8s with Traefik as the ingress controller.
As mentioned in the comments, the Ingress system is only for HTTP traffic. Traefik does also support TCP and UDP traffic, but that's separate from the Ingress stuff and has to be configured through Traefik's more specific tools (either its custom resources or a config file). More commonly you would use a LoadBalancer-type Service, which creates a TCP LB in your cloud provider.
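For illustration, a minimal sketch of the LoadBalancer route, assuming your cloud provider can provision plain TCP load balancers (the names mirror the manifests above):

apiVersion: v1
kind: Service
metadata:
  name: redis-ms-lb
  namespace: prod
spec:
  type: LoadBalancer          # asks the cloud provider for an external TCP load balancer
  selector:
    app: redis-ms
  ports:
  - protocol: TCP
    port: 6380                # external port, matching the compose mapping
    targetPort: 6379          # redis container port

You would then point the our-redis-url.com DNS record at the load balancer's external IP instead of at a node. If you prefer to keep everything on Traefik, its TCP routing goes through a dedicated TCP entrypoint and the IngressRouteTCP custom resource rather than a standard Ingress.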

gRPC socket closed on kubernetes with ingress

I have a gRPC server that works fine on my local machine. I can send grpc requests from a python app and get back the right responses.
I put the server into a GKE cluster (with only one node). I had a normal TCP load balancer in front of the cluster. In this setup my local client was able to get the correct response from some requests, but not others. I think it was the gRPC streaming that didn't work.
I assumed that this is because the streaming requires an HTTP/2 connection which requires SSL.
The standard load balancer I got in GKE didn't seem to support SSL, so I followed the docs to set up an ingress load balancer, which does. I'm using a Let's Encrypt certificate with it.
Now all gRPC requests return
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "{"created":"#1556172211.931158414","description":"Error received from peer ipv4:ip.of.ingress.service:443","file":"src/core/lib/surface/call.cc","file_line":1041,"grpc_message":"Socket closed","grpc_status":14}"
The IP address is the external IP address of my ingress service.
The ingress yaml looks like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: rev79-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "rev79-ip"
    ingress.gcp.kubernetes.io/pre-shared-cert: "lets-encrypt-rev79"
    kubernetes.io/ingress.allow-http: "false" # disable HTTP
spec:
  rules:
  - host: sub-domain.domain.app
    http:
      paths:
      - path: /*
        backend:
          serviceName: sandbox-nodes
          servicePort: 60000
The subdomain and domain of the request from my python app match the host in the ingress rule.
It connects to a NodePort service that looks like this:
apiVersion: v1
kind: Service
metadata:
  name: sandbox-nodes
spec:
  type: NodePort
  selector:
    app: rev79
    environment: sandbox
  ports:
  - protocol: TCP
    port: 60000
    targetPort: 9000
The Deployment itself has two containers and looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rev79-sandbox
  labels:
    app: rev79
    environment: sandbox
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rev79
        environment: sandbox
    spec:
      containers:
      - name: esp
        image: gcr.io/endpoints-release/endpoints-runtime:1.31
        args: [
          "--http2_port=9000",
          "--service=rev79.endpoints.rev79-232812.cloud.goog",
          "--rollout_strategy=managed",
          "--backend=grpc://0.0.0.0:3011"
        ]
        ports:
        - containerPort: 9000
      - name: rev79-uac-sandbox
        image: gcr.io/rev79-232812/uac:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 3011
        env:
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rev79-secrets
              key: rails-master-key
The target of the NodePort is the ESP container, which connects to the gRPC service deployed in the cloud and to the backend, which is a Rails app that implements the API. This Rails app isn't running the Rails server, but a specialised gRPC server that comes with the grpc_for_rails gem.
The gRPC server in the Rails app doesn't record any action in the logs, so I don't think the requests get that far.
kubectl get ingress reports this:
NAME            HOSTS                   ADDRESS             PORTS   AGE
rev79-ingress   sub-domain.domain.app   my.static.ip.addr   80      7h
showing port 80, even though it's set up with SSL. That seems to be a bug. When I check with curl -kv https://sub-domain.domain.app, the ingress server handles the request fine and uses HTTP/2. It returns an HTML-formatted server error, but I'm not sure what generates that.
The API requires an API key, which the python client inserts into the metadata of each request.
When I go to the Endpoints page of my GCP console I see that the API is not registering any requests since putting in the ingress load balancer, so it looks like the requests are not reaching the ESP container.
So why am I getting "socket closed" errors with gRPC?
I said I would come back and post an answer here once I got it working. It looks like I never did. Being a man of my word, I'll now post the config files that are working for me.
In my deployment I've put liveness and readiness probes on the ESP container. This made deployments happen smoothly, without downtime:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rev79-sandbox
  labels:
    app: rev79
    environment: sandbox
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: rev79
        environment: sandbox
    spec:
      volumes:
      - name: nginx-ssl
        secret:
          secretName: nginx-ssl
      - name: gcs-creds
        secret:
          secretName: rev79-secrets
          items:
          - key: gcs-credentials
            path: "gcs.json"
      containers:
      - name: esp
        image: gcr.io/endpoints-release/endpoints-runtime:1.45
        args: [
          "--http_port", "8080",
          "--ssl_port", "443",
          "--service", "rev79-sandbox.endpoints.rev79-232812.cloud.goog",
          "--rollout_strategy", "managed",
          "--backend", "grpc://0.0.0.0:3011",
          "--cors_preset", "cors_with_regex",
          "--cors_allow_origin_regex", ".*",
          "-z", " "
        ]
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 5
          periodSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          timeoutSeconds: 5
          failureThreshold: 1
        volumeMounts:
        - name: nginx-ssl
          mountPath: /etc/nginx/ssl
          readOnly: true
        ports:
        - containerPort: 8080
        - containerPort: 443
          protocol: TCP
      - name: rev79-uac-sandbox
        image: gcr.io/rev79-232812/uac:29eff5e
        imagePullPolicy: Always
        volumeMounts:
        - name: gcs-creds
          mountPath: "/app/creds"
        ports:
        - containerPort: 3011
          name: end-grpc
        - containerPort: 3000
        env:
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rev79-secrets
              key: rails-master-key
This is my service config that exposes the deployment to the load balancer:
apiVersion: v1
kind: Service
metadata:
  name: rev79-srv-ingress-sandbox
  labels:
    type: rev79-srv
  annotations:
    service.alpha.kubernetes.io/app-protocols: '{"rev79":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: NodePort
  ports:
  - name: rev79
    port: 443
    protocol: TCP
    targetPort: 443
  selector:
    app: rev79
    environment: sandbox
And this is my ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: rev79-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "rev79-global-ip"
spec:
  tls:
  - secretName: sandbox-api-rev79-app-tls
  rules:
  - host: sandbox-api.rev79.app
    http:
      paths:
      - backend:
          serviceName: rev79-srv-ingress-sandbox
          servicePort: 443
I'm using cert-manager to manage the certificates.
It was a long time ago now. I can't remember if there was anything else I did to solve the issue I was having.
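For reference, the sandbox-api-rev79-app-tls secret named in the Ingress tls section is the sort of thing cert-manager produces from a Certificate resource. A rough sketch using the current cert-manager API (the issuer name is an assumption, and older cert-manager releases used a different apiVersion):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sandbox-api-rev79-app-tls
spec:
  secretName: sandbox-api-rev79-app-tls   # the Ingress references this secret in its tls section
  dnsNames:
  - sandbox-api.rev79.app
  issuerRef:
    name: letsencrypt-prod                 # assumed ClusterIssuer name
    kind: ClusterIssuer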

defining 2 ports in deployment.yaml in Kubernetes

I have a Docker image that I run with:
docker run --name test -h test -p 9043:9043 -p 9443:9443 -d ibmcom/websphere-traditional:install
I am trying to put it into a Kubernetes deployment file and I have this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: websphere
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: websphere
    spec:
      containers:
      - name: websphere
        image: ibmcom/websphere-traditional:install
        ports:
        - containerPort: 9443
        resources:
          requests:
            memory: 500Mi
            cpu: 0.5
          limits:
            memory: 500Mi
            cpu: 0.5
        imagePullPolicy: Always
My service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: websphere
  labels:
    app: websphere
spec:
  type: NodePort  # Exposes the service as a NodePort
  ports:
  - port: 9443
    protocol: TCP
    targetPort: 9443
  selector:
    app: websphere
May I have guidance on how to map 2 ports in my deployment file?
You can add as many ports as you need.
Here is your deployment.yml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: websphere
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: websphere
    spec:
      containers:
      - name: websphere
        image: ibmcom/websphere-traditional:install
        ports:
        - containerPort: 9043
        - containerPort: 9443
        resources:
          requests:
            memory: 500Mi
            cpu: 0.5
          limits:
            memory: 500Mi
            cpu: 0.5
        imagePullPolicy: IfNotPresent
Here is your service.yml:
apiVersion: v1
kind: Service
metadata:
  name: websphere
  labels:
    app: websphere
spec:
  type: NodePort  # Exposes the service as a NodePort
  ports:
  - port: 9043
    name: hello
    protocol: TCP
    targetPort: 9043
    nodePort: 30043
  - port: 9443
    name: privet
    protocol: TCP
    targetPort: 9443
    nodePort: 30443
  selector:
    app: websphere
Check your kube-apiserver configuration for the allowed nodePort range (usually 30000-32767, but it's configurable).
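For example, on clusters where the control plane runs as static pods (a kubeadm-style layout is assumed here), the range shows up as a kube-apiserver flag:

# fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (path and surrounding fields vary by distribution)
spec:
  containers:
  - command:
    - kube-apiserver
    - --service-node-port-range=30000-32767   # nodePort values in Services must fall inside this range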
EDIT
If I remove the resources section from deployment.yml, it starts correctly (after about 5 minutes).
Here is a snippet of the logs:
[9/10/18 8:08:06:004 UTC] 00000051 webcontainer I com.ibm.ws.webcontainer.VirtualHostImpl addWebApplication SRVE0250I: Web Module Default Web Application has been bound to default_host[:9080,:80,:9443,:5060,:5061,:443].
Problems arise when connecting to it (I use an Ingress with Traefik), because of certificates (I suppose):
[9/10/18 10:15:08:413 UTC] 000000a4 SSLHandshakeE E SSLC0008E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
To solve that (I didn't go further) this may help: SSLHandshakeE E SSLC0008E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired
Trying to connect with port-forward and using the browser to connect, I land on this page.
Well, in Kubernetes you can define your ports using the port field. This field comes under the ports configuration in your deployment. You can simply define as many ports as you wish. The following example shows how to define two ports.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 9376
  - name: https
    protocol: TCP
    port: 443
    targetPort: 9377

How to use a custom scheduler for kubernetes (on google cloud), written in bash language?

At http://blog.kubernetes.io/2017/03/advanced-scheduling-in-kubernetes.html an example of a custom scheduler for Kubernetes is given, which is written in bash.
My question is how can such a custom scheduler be used for a pod?
It says "Note that you need to run this along with kubectl proxy for it to work", which is not clear to me.
I would appreciate any help.
thanks
You would need to deploy the scheduler, then associate that scheduler with your pod.
This is a great write-up: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
Here is an example deployment of my-scheduler:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --address=0.0.0.0
        - --leader-elect=false
        - --scheduler-name=my-scheduler
        image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10251
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10251
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts: []
      hostNetwork: false
      hostPID: false
      volumes: []
Here is how to connect a pod to your scheduler:
apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0
The key part in the above is spec.schedulerName.
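On the kubectl proxy point from the question: the bash scheduler in that blog post talks to the API server over plain HTTP on localhost:8001, which is exactly what kubectl proxy provides, so the script needs to run somewhere such a proxy is reachable. A hypothetical sketch of running the two side by side in one Pod (the image names, service account, and placeholder command are assumptions, not part of the blog post):

apiVersion: v1
kind: Pod
metadata:
  name: bash-scheduler
  namespace: kube-system
spec:
  serviceAccountName: my-scheduler-sa           # assumed; needs RBAC permissions to list pods/nodes and create bindings
  containers:
  - name: kubectl-proxy
    image: bitnami/kubectl                       # assumed image that ships kubectl
    command: ["kubectl", "proxy", "--port=8001"]
  - name: scheduler
    image: alpine                                # assumed; the blog's scheduler.sh also needs curl and jq
    command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]   # placeholder; mount or bake in the blog post's scheduler.sh here

Containers in the same Pod share localhost, so the script can curl http://localhost:8001 without any extra auth handling. For a quick local test you can equally just run kubectl proxy in one terminal and the bash script in another on your own machine.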