Kubernetes cluster won't send logs to Loggly - kubernetes

I have a Kubernetes cluster which I am trying to get to run fluentd to send logs to Loggly for viewing. I am having an issue where I know my token is correct since it was working on another cluster before.
Below is my manifet.yaml file, I have no logs to share as the pod isn't logging anything.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd-es-v1.20
namespace: kube-system
labels:
k8s-app: fluentd-loggly
kubernetes.io/cluster-service: "true"
version: v1.20
spec:
selector:
matchLabels:
k8s-app: fluentd-loggly
template:
metadata:
labels:
k8s-app: fluentd-loggly
kubernetes.io/cluster-service: "true"
version: v1.20
spec:
containers:
- name: fluentd-loggly
image: garland/kubernetes-fluentd-loggly:1.0
command:
- '/bin/sh'
- '-c'
- '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
env:
- name: LOGGLY_URL
value: "https://logs-01.loggly.com/inputs/<token>/tag/<tag>"
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
Edit 1
Logs requested
2021-06-21 16:33:43 +0000 [warn]: pattern not match: "2021-06-21T16:33:43.548145693Z stdout F 2021-06-21 16:33:43 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider docker: temporary failure in dockerutil, will retry later: try delay not elapsed yet"
This repeats over and over with nothing else in the logs.

Related

Nexus on k3s on restart does not persist Users and data

I have installed on K3S raspberry pi cluster nexus with the following setups for kubernetes learning purposes. First I created a StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nexus
namespace: dev-ops
spec:
serviceName: "nexus"
replicas: 1
selector:
matchLabels:
app: nexus-server
template:
metadata:
labels:
app: nexus-server
spec:
containers:
- name: nexus
image: klo2k/nexus3:latest
env:
- name: MAX_HEAP
value: "800m"
- name: MIN_HEAP
value: "300m"
resources:
limits:
memory: "4Gi"
cpu: "1000m"
requests:
memory: "2Gi"
cpu: "500m"
ports:
- containerPort: 8081
volumeMounts:
- name: nexusstorage
mountPath: /sonatype-work
volumes:
- name: nexusstorage
persistentVolumeClaim:
claimName: nexusstorage
Storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nexusstorage
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "30"
fsType: "ext4"
diskSelector: "ssd"
nodeSelector: "ssd"
pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nexusstorage
namespace: dev-ops
spec:
accessModes:
- ReadWriteOnce
storageClassName: nexusstorage
resources:
requests:
storage: 50Gi
Service
apiVersion: v1
kind: Service
metadata:
name: nexus-server
namespace: dev-ops
annotations:
prometheus.io/scrape: 'true'
prometheus.io/path: /
prometheus.io/port: '8081'
spec:
selector:
app: nexus-server
type: LoadBalancer
ports:
- port: 8081
targetPort: 8081
nodePort: 32000
this setup will spin up nexus, but if I restart the pod the data will not persist and I have to create all the setups and users from scratch.
What I'm missing in this case?
UPDATE
I got it working, nexus needs on mount permissions on directory. The working StatefulSet looks as it follow
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nexus
namespace: dev-ops
spec:
serviceName: "nexus"
replicas: 1
selector:
matchLabels:
app: nexus-server
template:
metadata:
labels:
app: nexus-server
spec:
securityContext:
runAsUser: 200
runAsGroup: 200
fsGroup: 200
containers:
- name: nexus
image: klo2k/nexus3:latest
env:
- name: MAX_HEAP
value: "800m"
- name: MIN_HEAP
value: "300m"
resources:
limits:
memory: "4Gi"
cpu: "1000m"
requests:
memory: "2Gi"
cpu: "500m"
ports:
- containerPort: 8081
volumeMounts:
- name: nexus-storage
mountPath: /nexus-data
volumes:
- name: nexus-storage
persistentVolumeClaim:
claimName: nexus-storage
important snippet to get it working
securityContext:
runAsUser: 200
runAsGroup: 200
fsGroup: 200
I'm not familiar with that image, although checking dockerhub, they mention using a Dockerfile similar to that of Sonatype. Then, I would change the mountpoint for your volume, to /nexus-data
This is the default path storing data (they set this env var, then declare a VOLUME). Which we can confirm, looking at the repository that most likely produced your arm-capable image
And following up on your last comment, let's try to also mount it in /opt/sonatype/sonatype-work/nexus3...
In your statefulset, change volumeMounts, to this:
volumeMounts:
- name: nexusstorage
mountPath: /nexus-data
- name: nexusstorage
mountPath: /opt/sonatype/sonatype-work/nexus3
volumes:
- name: nexusstorage
persistentVolumeClaim:
claimName: nexusstorage
Although the second volumeMount entry should not be necessary, as far as I understand. Maybe something's wrong with your storage provider?
Are you sure your PVC is write-able? Reverting back to your initial configuration, enter your pod (kubectl exec -it) and try to write a file at the root of your PVC.

why does my daemonset crash when a node goes down?

I have configured this DaemonSet in my cluster that is on the official Kubernetes page and everything works fine since it repartitions the replicas of my applications between my two available work nodes. The problem comes when one node goes down, then all the replicas start running on the other node. Once the downed node recovers the pods are not automatically partitioned between my nodes, so I have to manually remove all replicas and scale them again to get the DaemonSet to work.
How can i fix this?
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd-elasticsearch
namespace: kube-system
labels:
k8s-app: fluentd-logging
spec:
selector:
matchLabels:
name: fluentd-elasticsearch
template:
metadata:
labels:
name: fluentd-elasticsearch
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: fluentd-elasticsearch
image: gcr.io/fluentd-elasticsearch/fluentd:v2.5.1
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers

Time Zone in Kubernetes Pods Using Environment Variable

I am trying to update my pod time to Asia/Kolkata zone as per kubernetes timezone in POD with command and argument. However, the time still remains the same UTC time. Only the time zone is getting updated from UTC to Asia.
I was able to fix it using the volume mounts as below. Create a config map and apply the deployment yaml.
kubectl create configmap tz --from-file=/usr/share/zoneinfo/Asia/Kolkata -n <required namespace>
Why is the environmental variable method not working? Will a pod eviction occur from one host to another if we use volume mount time and will if affect the volume mount time after pod eviction?
The EV deployment YAML is below which does not update the time
apiVersion: apps/v1
kind: Deployment
metadata:
name: connector
labels:
app: connector
namespace: clients
spec:
replicas: 1
selector:
matchLabels:
app: connector
template:
metadata:
labels:
app: connector
spec:
containers:
- image: connector
name: connector
resources:
requests:
memory: "32Mi" # "64M"
cpu: "250m"
limits:
memory: "64Mi" # "128M"
cpu: "500m"
ports:
- containerPort: 3307
protocol: TCP
env:
- name: TZ
value: Asia/Kolkata
volumeMounts:
- name: connector-rd
mountPath: /home/mongobi/mongosqld.conf
subPath: mongosqld.conf
volumes:
- name: connector-rd
configMap:
name: connector-rd
items:
- key: mongod.conf
Volume Mount yaml is below.
apiVersion: apps/v1
kind: Deployment
metadata:
name: connector
labels:
app: connector
namespace: clients
spec:
replicas: 1
selector:
matchLabels:
app: connector
template:
metadata:
labels:
app: connector
spec:
containers:
- image: connector
name: connector
resources:
requests:
memory: "32Mi" # "64M"
cpu: "250m"
limits:
memory: "64Mi" # "128M"
cpu: "500m"
ports:
- containerPort: 3307
protocol: TCP
volumeMounts:
- name: tz-config
mountPath: /etc/localtime
- name: connector-rd
mountPath: /home/mongobi/mongosqld.conf
subPath: mongosqld.conf
volumes:
- name: connector-rd
configMap:
name: connector-rd
items:
- key: mongod.conf
path: mongosqld.conf
- name: tz-config
hostPath:
path: /usr/share/zoneinfo/Asia/Kolkata
In this scenario you need to mention type attribute as File for hostPath in the deployment configuration. The below configuration should work for you.
- name: tz-config
hostPath:
path: /usr/share/zoneinfo/Asia/Kolkata
type: File
Simply setting TZ env variable in deployment works for me

How can I give grafana user appropriate permission so that it can start successfully?

env:
kubernetes provider: gke
kubernetes version: v1.13.12-gke.25
grafana version: 6.6.2 (official image)
grafana deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
name: grafana
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:6.6.2
ports:
- name: grafana
containerPort: 3000
# securityContext:
# runAsUser: 104
# allowPrivilegeEscalation: true
resources:
limits:
memory: "1Gi"
cpu: "500m"
requests:
memory: "500Mi"
cpu: "100m"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-storage
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
Problem
when I deployed this grafana dashboard first time, its working fine. after sometime I restarted the pod to check whether volume mount is working or not. after restarting, I getting below error.
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
what I understand from this error, user could create these files. How can I give this user appropriate permission to start grafana successfully?
I recreated your deployment with appropriate PVC and noticed that grafana pod was failing.
Output of command: $ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
grafana-6466cd95b5-4g95f 0/1 Error 2 65s
Further investigation pointed the same errors as yours:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
This error showed on first creation of a pod and the deployment. There was no need to recreate any pods.
What I did to make it work was to edit your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
name: grafana
labels:
app: grafana
spec:
securityContext:
runAsUser: 472
fsGroup: 472
containers:
- name: grafana
image: grafana/grafana:6.6.2
ports:
- name: grafana
containerPort: 3000
resources:
limits:
memory: "1Gi"
cpu: "500m"
requests:
memory: "500Mi"
cpu: "100m"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-storage
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
Please take a specific look on part:
securityContext:
runAsUser: 472
fsGroup: 472
It is a setting described in official documentation: Kubernetes.io: set the security context for a pod
Please take a look on this Github issue which is similar to yours and pointed me to solution that allowed pod to spawn correctly:
https://github.com/grafana/grafana-docker/issues/167
Grafana had some major updates starting from version 5.1. Please take a look: Grafana.com: Docs: Migrate to v5.1 or later
Please let me know if this helps.
On v8.0, I do that setting runAsUser: 0.
It works.
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- name: grafana-tcp
port: 3000
protocol: TCP
targetPort: 3000
selector:
project: grafana
type: LoadBalancer
status:
loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
project: grafana
name: grafana
spec:
replicas: 1
selector:
matchLabels:
project: grafana
strategy:
type: RollingUpdate
template:
metadata:
labels:
project: grafana
name: grafana
spec:
securityContext:
runAsUser: 0
containers:
- image: grafana/grafana
name: grafana
ports:
- containerPort: 3000
protocol: TCP
resources: {}
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-volume
volumes:
- name: grafana-volume
hostPath:
# directory location on host
path: /opt/grafana
# this field is optional
type: DirectoryOrCreate
restartPolicy: Always
status: {}

Deploy pods in different nodes

I have a namespace called airflow that has 2 pods: webserver and scheduler. I want to deploy scheduler on node A and webserver on node B.
And here you can see deployment files:
scheduler:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: airflow
name: airflow-scheduler
labels:
name: airflow-scheduler
spec:
replicas: 1
template:
metadata:
labels:
app: airflow-scheduler
spec:
terminationGracePeriodSeconds: 60
containers:
- name: scheduler
image: 123423.dkr.ecr.us-east-1.amazonaws.com/airflow:$COMMIT_SHA1
volumeMounts:
- name: logs
mountPath: /logs
command: ["airflow"]
args: ["scheduler"]
imagePullPolicy: Always
resources:
limits:
memory: "3072Mi"
requests:
cpu: "500m"
memory: "2048Mi"
volumes:
- name: logs
persistentVolumeClaim:
claimName: logs
webserver:
apiVersion: v1
kind: Service
metadata:
name: airflow-webserver
namespace: airflow
labels:
run: airflow-webserver
spec:
ports:
- port: 80
targetPort: 8080
selector:
run: airflow-webserver
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: airflow-webserver
namespace: airflow
annotations:
kubernetes.io/ingress.class: nginx
certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- airflow.awesome.com.br
secretName: airflow-crt
rules:
- host: airflow.awesome.com.br
http:
paths:
- path: /
backend:
serviceName: airflow-webserver
servicePort: 80
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: airflow
name: airflow-webserver
labels:
run: airflow-webserver
spec:
replicas: 1
template:
metadata:
labels:
run: airflow-webserver
spec:
terminationGracePeriodSeconds: 60
containers:
- name: webserver
image: 123423.dkr.ecr.us-east-1.amazonaws.com/airflow:$COMMIT_SHA1
volumeMounts:
- name: logs
mountPath: /logs
ports:
- containerPort: 8080
command: ["airflow"]
args: ["webserver"]
imagePullPolicy: Always
resources:
limits:
cpu: "200m"
memory: "3072Mi"
requests:
cpu: "100m"
memory: "2048Mi"
volumes:
- name: logs
persistentVolumeClaim:
claimName: logs
What's the proper way to ensure that pods will be deployed on different nodes?
edit1:
antiaffinity is not working:
I've tried to set podAntiAffinity on scheduler but it's not working:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: name
operator: In
values:
- airflow-webserver
topologyKey: "kubernetes.io/hostname"
If you want to have these pods run on different nodes but you don't care about which nodes exactly, you can use the Pod anti-affinity feature. It basically defines that the pod X should not run on the same node (it can be also used with failure domain / zones etc., not just with nodes) as pod Y and uses labels to specify the pods. So you will need to add some labels and specify them in the spec sections. More info about it is in Kube docs.
If in addition you want to also specify on which node it should run, you can use the Node affinity feature. See Kube docs for more details.