Skaffold deploys an extra pod when deploying with helm - minikube

I have skaffold build that creates 2 docker images - myimg1 and myimg2.
When I try to deploy via helm, skaffold deploys them fine, then tries to deploy and additional pod for myimg1. I can see that in minikube dashboard. At first see myimg1:<tag> but in a few seconds that changes to myimg1-0
deploy:
kubectl:
manifests:
- k8s/pgadmin.yaml
kubeContext: minikube
helm:
releases:
- name: myrelease
chartPath: charts/mychart
artifactOverrides:
myimg1.container.image: myimg1
- name: myrelease
chartPath: charts/mychart
artifactOverrides:
myimg2.container.image: myimg2
Values.yam looks like this:
myimg1:
name: myimg1
container:
image: jfrog.host.com/docker_repo/myimg1:latest
pullPolicy: IfNotPresent
port: 5432
service:
type: ClusterIP
port: 5432
myimg2:
name: myimag2
container:
image: jfrog.host.com/docker_repo/myimg1:latest
pullPolicy: IfNotPresent
port: 8080
service:
type: ClusterIP
port: 8080
portName: http
Now, if I run helm manually and override the image tags via command line arguments, it deploys fine:
helm install --atomic --debug myrelease ./charts/mychart/ -f ./charts/mychart/values.yaml --set-string myimg1.container.image=myimg1:latest --set-string myimg2.container.image: myimg2:latest
When I run skaffold dev however, both containers are deployed, then the second image is deployed again and it tries to pulls from the remote registry.
As far as I can tell everything is set up fine:
Is there anyway to debug this?
I see the following log message:
time="2022-10-07T10:40:50-04:00" level=info msg="Streaming logs from pod: myimg1-0 container: myimg1" subtask=-1 task=DevLoop
time="2022-10-07T10:40:50-04:00" level=debug msg="Running command: [kubectl --context minikube logs --since=7s -f myimg1-0 -c myrelease --namespace default]" subtask=-1 task=DevLoop

Related

Gitlab CI/CD pipeline passed, but no changes were applied to the server

I am testing automation by applying Gitlab CI/CD to a GKE cluster. The app is successfully deployed, but the source code changes are not applied (eg renaming the html title).
I have confirmed that the code has been changed in the gitlab repository master branch. No other branch.
CI/CD simply goes through the process below.
push code to master branch
builds the NextJS code
builds the docker image and pushes it to GCR
pulls the docker image and deploys it in.
The content of the menifest file is as follows.
.gitlab-ci.yml
stages:
- build-push
- deploy
image: docker:19.03.12
variables:
GCP_PROJECT_ID: PROJECT_ID..
GKE_CLUSTER_NAME: cicd-micro-cluster
GKE_CLUSTER_ZONE: asia-northeast1-b
DOCKER_HOST: tcp://docker:2375/
DOCKER_TLS_CERTDIR: ""
REGISTRY_HOSTNAME: gcr.io/${GCP_PROJECT_ID}
DOCKER_IMAGE_NAME: ${CI_PROJECT_NAME}
DOCKER_IMAGE_TAG: latest
services:
- docker:19.03.12-dind
build-push:
stage: build-push
before_script:
- docker info
- echo "$GKE_ACCESS_KEY" > key.json
- docker login -u _json_key --password-stdin https://gcr.io < key.json
script:
- docker build --tag $REGISTRY_HOSTNAME/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG .
- docker push $REGISTRY_HOSTNAME/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG
deploy:
stage: deploy
image: google/cloud-sdk
script:
- export USE_GKE_GCLOUD_AUTH_PLUGIN=True
- echo "$GKE_ACCESS_KEY" > key.json
- gcloud auth activate-service-account --key-file=key.json
- gcloud config set project $GCP_PROJECT_ID
- gcloud config set container/cluster $GKE_CLUSTER_NAME
- gcloud config set compute/zone $GKE_CLUSTER_ZONE
- gcloud container clusters get-credentials $GKE_CLUSTER_NAME --zone $GKE_CLUSTER_ZONE --project $GCP_PROJECT_ID
- kubectl apply -f deployment.yaml
- gcloud container images list-tags gcr.io/$GCP_PROJECT_ID/${CI_PROJECT_NAME} --filter='-tags:*' --format="get(digest)" --limit=10 > tags && while read p; do gcloud container images delete "gcr.io/$GCP_PROJECT_ID/${CI_PROJECT_NAME}#$p" --quiet; done < tags
Dockerfile
# Install dependencies only when needed
FROM node:16-alpine AS deps
# Check https://github.com/nodejs/docker-node/tree/b4117f9333da4138b03a546ec926ef50a31506c3#nodealpine to understand why libc6-compat might be needed.
RUN apk add --no-cache libc6-compat
WORKDIR /app
# Install dependencies based on the preferred package manager
COPY package.json yarn.lock* package-lock.json* pnpm-lock.yaml* ./
RUN \
if [ -f yarn.lock ]; then yarn --frozen-lockfile; \
elif [ -f package-lock.json ]; then npm ci; \
elif [ -f pnpm-lock.yaml ]; then yarn global add pnpm && pnpm i --frozen-lockfile; \
else echo "Lockfile not found." && exit 1; \
fi
# Rebuild the source code only when needed
FROM node:16-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
# Next.js collects completely anonymous telemetry data about general usage.
# Learn more here: https://nextjs.org/telemetry
# Uncomment the following line in case you want to disable telemetry during the build.
# ENV NEXT_TELEMETRY_DISABLED 1
RUN yarn build
# If using npm comment out above and use below instead
# RUN npm run build
# Production image, copy all the files and run next
FROM node:16-alpine AS runner
WORKDIR /app
ENV NODE_ENV production
# Uncomment the following line in case you want to disable telemetry during runtime.
# ENV NEXT_TELEMETRY_DISABLED 1
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
COPY --from=builder /app/public ./public
# Automatically leverage output traces to reduce image size
# https://nextjs.org/docs/advanced-features/output-file-tracing
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
USER nextjs
EXPOSE 3000
ENV PORT 3000
CMD ["node", "server.js"]
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontweb-lesson-prod
labels:
app: frontweb-lesson
spec:
selector:
matchLabels:
app: frontweb-lesson
template:
metadata:
labels:
app: frontweb-lesson
spec:
containers:
- name: frontweb-lesson-prod-app
image: gcr.io/PROJECT_ID../REPOSITORY_NAME..:latest
ports:
- containerPort: 3000
resources:
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: frontweb-lesson-prod-svc
labels:
app: frontweb-lesson
spec:
selector:
app: frontweb-lesson
ports:
- name: http
port: 80
protocol: TCP
targetPort: 3000
type: LoadBalancer
loadBalancerIP: "EXTERNAL_IP.."
Is there something I'm missing?
By default,imagepullpolicy will be Always but there could be chances if there is no change in the deployment file when applying it might not update the deployment. As you are using the same label each time latest.
As there different between kubectl apply and kubectl patch command
What you can do is add minor label change or annotation change in deployment and check image will get updated with kubectl apply command too otherwise it will be mostly unchange response of kubectl apply
Ref : imagepullpolicy
You should avoid using the :latest tag when deploying containers in
production as it is harder to track which version of the image is
running and more difficult to roll back properly.

Unable to import local minikube cluster to rancher on Mac (cattle-cluster-agent fails on :8080/health)

I have installed rancher on Mac, and used custom port numbers; am able to connect to localhost:443 and work through rancher GUI.
docker run --privileged -d --restart=unless-stopped -p 980:80 -p 981:443 --name rancher rancher/rancher
I then created a minikube local cluster, & tried to import that into rancher via GUI.
minikube -p dxmcs3 start
As suggested by Rancher as part of GUI, I ran the following to import; since the minikube POD needs to be able to access rancher endpoint on my host machine (mac), I updated the CATTLE_SERVER host to point to https://host.minikube.internal:981 prior to running.
curl --insecure -sfL https://localhost:981/v3/import/zsh5mtnkkrtz7tj7scbpb59q5tsjzmhg7r5476z7gdnh4xgjczt7cd_c-6sfj7.yaml | sed 's/https:\/\/localhost:981/https:\/\/host.minikube.internal:981/g' | kubectl apply -f -
The deployments go well, but the cluster's import shows up as "pending" forever on Rancher UI.
on digging, I noticed that the rancher/rancher-agent:v2.5.9 POD fails to start. More specifically, "cluster-register" container fails on health check up start. It tries to to http://(pod-ip):8080/health & fails.
containers:
- env:
- name: CATTLE_FEATURES
- name: CATTLE_IS_RKE
value: "false"
- name: CATTLE_SERVER
value: https://host.minikube.internal:981
- name: CATTLE_CA_CHECKSUM
value: d8e8de5d121fac709e414123ff792458931530569555a3d233d555deb62a9490
- name: CATTLE_CLUSTER
value: "true"
- name: CATTLE_K8S_MANAGED
value: "true"
- name: CATTLE_CLUSTER_REGISTRY
image: rancher/rancher-agent:v2.5.9
imagePullPolicy: IfNotPresent
name: cluster-register
readinessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 2
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
Appreciate the community's help with resolving this. Or, Any pointer to a page I can refer. I really want to be able to have Rancher manage multiple minikube based local k8s environments for my learning & experimentation.

Deploying Node.js apps with Kubernetes

I was trying to deploy a very basic Express app, a small server listening on 8080 on a EC2 server (Ubuntu 16.04) following this tutorial. On that server, it was created a Kubernetes cluster through kops 1.8.0.
After that, I created a Dockerfile like the following:
FROM node:carbon
ENV NPM_CONFIG_PREFIX=/home/node/.npm-global
ENV PATH=$PATH:/home/node/.npm-global/bin
# Create app directory
WORKDIR /usr/src/app
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm#5+)
COPY package*.json ./
RUN npm install
# Bundle app source
COPY . .
EXPOSE 8080
CMD [ "node", "server.js" ]
# At the end, set the user to use when running this image
USER node
After that, I built the image with docker build -t ccastelli/stupid_server:test1, I specified my credentials with docker login -u ccastelli, I copied the imaged ID from docker images, tagged it docker tag c549618dcd86 org/test:first_try and pushed with docker push org/test on a private repository in cloud.docker.com.
After that I created a cluster secret with kubectl create secret docker-registry ccastelli-regcred --docker-server=docker.com --docker-username=ccastelli --docker-password='pass' --docker-email=myemail#gmail.com
After that I created a deployment file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: stupid-server-deployment
spec:
replicas: 1
template:
metadata:
labels:
app: stupid-server
spec:
containers:
- name: stupid-server
image: org/test:first_try
imagePullPolicy: Always
ports:
- containerPort: 8080
imagePullSecrets:
- name: ccastelli-regcred
I see from kubectl get pods that the image transitioned from ErrPullImage to ImagePullBackOff and it's not ready. Anyway the docker container was working on the client instance but not in the cluster. At this point, I'm a bit lost. What am I doing wrong?
Thanks
Edit: message error:
Failed to pull image "org/test:first_try": rpc error: code =
Unknown desc = Error response from daemon: repository pycomio/test not
found: does not exist or no pull access
your --docker-server should be index.docker.io
DOCKER_REGISTRY_SERVER=https://index.docker.io/v1/
DOCKER_USER=Type your dockerhub username, same as when you `docker login`
DOCKER_EMAIL=Type your dockerhub email, same as when you `docker login`
DOCKER_PASSWORD=Type your dockerhub pw, same as when you `docker login`
kubectl create secret docker-registry myregistrykey \
--docker-server=$DOCKER_REGISTRY_SERVER \
--docker-username=$DOCKER_USER \
--docker-password=$DOCKER_PASSWORD \
--docker-email=$DOCKER_EMAIL

kubectl pull image from gitlab unauthorized: HTTP Basic: Access denied

I am trying to configure gitlab ci to deploy app to google compute engine. I have succesfully pushed image to gitlab repository but after applying kubernetes deployment config i see following error in kubectl describe pods:
Failed to pull image "registry.gitlab.com/proj/subproj/api:v1": rpc error: code = 2
desc = Error response from daemon: {"message":"Get https://registry.gitlab.com/v2/proj/subproj/api/manifests/v1: unauthorized: HTTP Basic: Access denied"}
Here is my deployment gitlab-ci job:
docker:
stage: docker_images
image: docker:latest
services:
- docker:dind
script:
- docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
- docker build -t registry.gitlab.com/proj/subproj/api:v1 -f Dockerfile .
- docker push registry.gitlab.com/proj/subproj/api:v1
only:
- master
dependencies:
- build_java
k8s-deploy:
image: google/cloud-sdk
stage: deploy
script:
- echo "$GOOGLE_KEY" > key.json # Google Cloud service account key
- gcloud auth activate-service-account --key-file key.json
- gcloud config set compute/zone us-central1-c
- gcloud config set project proj
- gcloud config set container/use_client_certificate True
- gcloud container clusters get-credentials proj-cluster
- kubectl delete secret registry.gitlab.com --ignore-not-found
- kubectl create secret docker-registry registry.gitlab.com --docker-server=https://registry.gitlab.com/v1/ --docker-username="$CI_REGISTRY_USER" --docker-password="$CI_REGISTRY_PASSWORD" --docker-email=some#gmail.com
- kubectl apply -f cloud-kubernetes.yml
and here is cloud-kubernetes.yml:
apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
name: proj
labels:
app: proj
spec:
type: LoadBalancer
ports:
- port: 8082
name: proj
targetPort: 8082
nodePort: 32756
selector:
app: proj
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: projdeployment
spec:
replicas: 1
template:
metadata:
labels:
app: proj
spec:
containers:
- name: projcontainer
image: registry.gitlab.com/proj/subproj/api:v1
imagePullPolicy: Always
env:
- name: SPRING_PROFILES_ACTIVE
value: "cloud"
ports:
- containerPort: 8082
imagePullSecrets:
- name: registry.gitlab.com
I have followed this article
There is workaround, image could be pushed to google container registry, and then pulled from gcr without security. We can push image to gcr without gcloud cli using json token file. So .gitlab-ci.yaml could look like:
docker:
stage: docker_images
image: docker:latest
services:
- docker:dind
script:
- docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
- docker build -t registry.gitlab.com/proj/subproj/api:v1 -f Dockerfile .
- docker push registry.gitlab.com/proj/subproj/api:v1
- docker tag registry.gitlab.com/proj/subproj/api:v1 gcr.io/proj/api:v1
- docker login -u _json_key -p "$GOOGLE_KEY" https://gcr.io
- docker push gcr.io/proj/api:v1
only:
- master
dependencies:
- build_java
k8s-deploy:
image: google/cloud-sdk
stage: deploy
script:
- echo "$GOOGLE_KEY" > key.json # Google Cloud service account key
- gcloud auth activate-service-account --key-file key.json
- gcloud config set compute/zone us-central1-c
- gcloud config set project proj
- gcloud config set container/use_client_certificate True
- gcloud container clusters get-credentials proj-cluster
- kubectl apply -f cloud-kubernetes.yml
And image in cloud-kubernetes.yaml should be:
gcr.io/proj/api:v1
You must use --docker-server=CI_REGISTRY.
The same as you sue for docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY.
Also note that your docker secrets must be in the same namespace with Deployment/ReplicaSet/DaemonSet/StatefullSet/Job.

Why Kubernetes Pod gets into Terminated state giving Completed reason and exit code 0?

I am struggling to find any answer to this in the Kubernetes documentation. The scenario is the following:
Kubernetes version 1.4 over AWS
8 pods running a NodeJS API (Express) deployed as a Kubernetes Deployment
One of the pods gets restarted for no apparent reason late at night (no traffic, no CPU spikes, no memory pressure, no alerts...). Number of restarts is increased as a result of this.
Logs don't show anything abnormal (ran kubectl -p to see previous logs, no errors at all in there)
Resource consumption is normal, cannot see any events about Kubernetes rescheduling the pod into another node or similar
Describing the pod gives back TERMINATED state, giving back COMPLETED reason and exit code 0. I don't have the exact output from kubectl as this pod has been replaced multiple times now.
The pods are NodeJS server instances, they cannot complete, they are always running waiting for requests.
Would this be internal Kubernetes rearranging of pods? Is there any way to know when this happens? Shouldn't be an event somewhere saying why it happened?
Update
This just happened in our prod environment. The result of describing the offending pod is:
api:
Container ID: docker://7a117ed92fe36a3d2f904a882eb72c79d7ce66efa1162774ab9f0bcd39558f31
Image: 1.0.5-RC1
Image ID: docker://sha256:XXXX
Ports: 9080/TCP, 9443/TCP
State: Running
Started: Mon, 27 Mar 2017 12:30:05 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 24 Mar 2017 13:32:14 +0000
Finished: Mon, 27 Mar 2017 12:29:58 +0100
Ready: True
Restart Count: 1
Update 2
Here it is the deployment.yaml file used:
apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
namespace: "${ENV}"
name: "${APP}${CANARY}"
labels:
component: "${APP}${CANARY}"
spec:
replicas: ${PODS}
minReadySeconds: 30
revisionHistoryLimit: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
template:
metadata:
labels:
component: "${APP}${CANARY}"
spec:
serviceAccount: "${APP}"
${IMAGE_PULL_SECRETS}
containers:
- name: "${APP}${CANARY}"
securityContext:
capabilities:
add:
- IPC_LOCK
image: "134078050561.dkr.ecr.eu-west-1.amazonaws.com/${APP}:${TAG}"
env:
- name: "KUBERNETES_CA_CERTIFICATE_FILE"
value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
- name: "NAMESPACE"
valueFrom:
fieldRef:
fieldPath: "metadata.namespace"
- name: "ENV"
value: "${ENV}"
- name: "PORT"
value: "${INTERNAL_PORT}"
- name: "CACHE_POLICY"
value: "all"
- name: "SERVICE_ORIGIN"
value: "${SERVICE_ORIGIN}"
- name: "DEBUG"
value: "http,controllers:recommend"
- name: "APPDYNAMICS"
value: "true"
- name: "VERSION"
value: "${TAG}"
ports:
- name: "http"
containerPort: ${HTTP_INTERNAL_PORT}
protocol: "TCP"
- name: "https"
containerPort: ${HTTPS_INTERNAL_PORT}
protocol: "TCP"
The Dockerfile of the image referenced in the above Deployment manifest:
FROM ubuntu:14.04
ENV NVM_VERSION v0.31.1
ENV NODE_VERSION v6.2.0
ENV NVM_DIR /home/app/nvm
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/v$NODE_VERSION/bin:$PATH
ENV APP_HOME /home/app
RUN useradd -c "App User" -d $APP_HOME -m app
RUN apt-get update; apt-get install -y curl
USER app
# Install nvm with node and npm
RUN touch $HOME/.bashrc; curl https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash \
&& /bin/bash -c 'source $NVM_DIR/nvm.sh; nvm install $NODE_VERSION'
ENV NODE_PATH $NVM_DIR/versions/node/$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/$NODE_VERSION/bin:$PATH
# Create app directory
WORKDIR /home/app
COPY . /home/app
# Install app dependencies
RUN npm install
EXPOSE 9080 9443
CMD [ "npm", "start" ]
npm start is an alias for a regular node app.js command that starts a NodeJS server on port 9080.
Check the version of docker you run, and whether the docker daemon was restarted during that time.
If the docker daemon was restarted, all the container would be terminated (unless you use the new "live restore" feature in 1.12). In some docker versions, docker may incorrectly reports "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
If this is still relevant, we just had a similar problem in our cluster.
We have managed to find more information by inspecting the logs from docker itself. ssh onto your k8s node and run the following:
sudo journalctl -fu docker.service
I hade similar problem when we upgraded to version 2.x Pos get restarted even after the Dags ran successfully.
I later resolved it after a long time of debugging by overriding the pod template and specifying it in the airflow.cfg file.
[kubernetes]
....
pod_template_file = {{ .Values.airflow.home }}/pod_template.yaml
---
# pod_template.yaml
apiVersion: v1
kind: Pod
metadata:
name: dummy-name
spec:
serviceAccountName: default
restartPolicy: Never
containers:
- name: base
image: dummy_image
imagePullPolicy: IfNotPresent
ports: []
command: []