JBoss tracing JDBC on OpenShift - jboss

I have an OpenShift Deployment with 14 replicas in production environment.
I need to activate a trace on a single pod/replica and I have found the following jboss-cli.sh commands to do it
/subsystem=datasources/data-source=MySQLPool/:write-attribute(name=spy,value=true)
/subsystem=logging/logger=jboss.jdbc.spy/:add(level=TRACE)
/subsystem=jca/cached-connection-manager=cached-connection-manager/:write-attribute(name=error,value=true)
but when i enter those commands reload is required.
If I do a
:reload
the pod I am on restarts and the given configuration is lost.
Is there an altervative way to activate pool tracing ?
thanx a lot in advance!

I have found the problem: it wai in OpenShift Heath Check configuration
Liveness probe period was too small, so it caused killing while jboss reloading was in progress.
Rising the value gives Jboss time for reloading without killing the Pod.

Related

Kubernetes: Deployment generation is 35, but latest observed generation is 34

I have a Rancher Kubernetes Cluster running and my application containing sevral pods is running as a helm chart. When I wanted to update my application, I updated my container image and redeployed the pod. Since 3 years, this worked well. Suddenly, when I try to redeploy my frontend pod, I get the following error message from the rancher gui:
Deployment generation is 35, but latest observed generation is 34
I googled the error and "deployment generation", but it seems umcommon to have this problem. There are almost no results from google, what makes me wonder...The pod is not getting deployed currently.
Does anyone has a hint on why this suddenly happens and how to fix that?
Thanks in advance,
Ben
Just need to pause and redeploy the deployment.
It was due to consecutive failures of the deployment to try to be at a running state due to so internal errors of the app like the database being down or something else which was an internal server error.
I don't know the exact solution to this, but I had to reboot the whole cluster to get rid of this message and regain the ability to deploy. Seems to be something really odd.

Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short, I have a self managed k8s cluster, running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add controlPlaneEndpoint configuration to kubeadm-config config map. So I did, with masternodeHostname:6443.
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443. So I cannot run any kubectl commands.
Tried recreating the .kube folder, with all the config copied there, no luck.
Restarted kubelet, docker.
The containers running on the cluster seem ok, but I am locked out of any cluster configuration (dashboard is down, kubectl commands not working).
Is there any way I make it work again? Not losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: (put bare-metal if not on a public cloud) bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by short etcd timeout period. As far as I'm aware it is a hard coded value in source, and cannot be changed (feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version, and recreate your cluster.

Kubernetes Node NotReady: ContainerGCFailed / ImageGCFailed context deadline exceeded

Worker node is getting into "NotReady" state with an error in the output of kubectl describe node:
ContainerGCFailed rpc error: code = DeadlineExceeded desc = context deadline exceeded
Environment:
Ubuntu, 16.04 LTS
Kubernetes version: v1.13.3
Docker version: 18.06.1-ce
There is a closed issue on that on Kubernetes GitHub k8 git, which is closed on the merit of being related to Docker issue.
Steps done to troubleshoot the issue:
kubectl describe node - error in question was found(root cause isn't clear).
journalctl -u kubelet - shows this related message:
skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg has yet to be successful]
it is related to this open k8 issue Ready/NotReady with PLEG issues
Check node health on AWS with cloudwatch - everything seems to be fine.
journalctl -fu docker.service : check docker for errors/issues -
the output doesn't show any erros related to that.
systemctl restart docker - after restarting docker, the node gets into "Ready" state but in 3-5 minutes becomes "NotReady" again.
It all seems to start when I deployed more pods to the node( close to its resource capacity but don't think that it is direct dependency) or was stopping/starting instances( after restart it is ok, but after some time node is NotReady).
Questions:
What is the root cause of the error?
How to monitor that kind of issue and make sure it doesn't happen?
Are there any workarounds to this problem?
What is the root cause of the error?
From what I was able to find it seems like the error happens when there is an issue contacting Docker, either because it is overloaded or because it is unresponsive. This is based on my experience and what has been mentioned in the GitHub issue you provided.
How to monitor that kind of issue and make sure it doesn't happen?
There seem to be no clarified mitigation or monitoring to this. But it seems like the best way would be to make sure your node will not be overloaded with pods. I have seen that it is not always shown on disk or memory pressure of the Node - but this is probably a problem of not enough resources allocated to Docker and it fails to respond in time. Proposed solution is to set limits for your pods to prevent overloading the Node.
In case of managed Kubernetes in GKE (not sure but other vendors probably have similar feature) there is a feature called node auto-repair. Which will not prevent node pressure or Docker related issue but when it detects an unhealthy node it can drain and redeploy the node/s.
If you already have resources and limits it seems like the best way to make sure this does not happen is to increase memory resource requests for pods. This will mean fewer pods per node and the actual used memory on each node should be lower.
Another way of monitoring/recognizing this could be done by SSH into the node check the memory, the processes with PS, monitoring the syslog and command $docker stats --all
I have got the same issue. I have cordoned and evicted the pods.
Rebooted the server. automatically node came into ready state.

kubernetes minikube faster uptime

I am using minikube and building my projects by tearing down the previous project and rebuilding it with
kubectl delete -f myprojectfiles
kubectl apply -f myprojectfiles
The files are a deployment and a service.
When I access my website I get a 503 error as I'm waiting for kubernetes to bring up the deployment. Is there anyway to speed this up? I see that my application is already built because the logs show it is ready. However it stays showing 503 for what feels like a few minute before everything in kubernetes triggers and starts serving me the application.
What are some things I can do to speed up the uptime?
Configure what is called readinessProbe, it won't fasten your boot up time, but it will help you by not giving false sense that application is up and running. With this your traffic will only be sent to your application pod when it is ready to accept the connection. Please read about it here.
FWIW your application might be waiting on some dependency to be up and running, also add these kinda health checks to that dependency pod.
You should not delete your Kubernetes resources. Use either kubectl apply or kubectl replace to update your project.
If you delete it, the nginx ingress controller won't find any upstream for a short period of time and puts on a blacklist for some seconds.
Also you should make sure, that you use Deployment which is able to do a rolling update without any downtime.

Updating deployment in GCE leads to node restart

We have some odd issue happening with GCE.
We have 2 clusters dev and prod each consisting of 2 nodes.
Production nodes are n1-standard-2, dev - n1-standard-1.
Typically dev cluster is busier with more pods eating more resources.
We deploy updates mostly with deployments (few projects still recreate RCs to update to latest versions)
Normally, the process is: build project, build docker image, docker push, create new deployment config and kubectl apply new config.
What's constantly happening on production is after applying new config, single or both nodes restart. Cluster does not seem to be starving with memory/cpu and we could not find anything in the logs that would explain those restarts.
Same procedure on staging never causes nodes to restart.
What can we do to diagnose the issue? Any specific events,logs we should be looking at?
Many thanks for any pointers.
UPDATE:
This is still happening and I found following in Computer Engine - Operations:
repair-1481931126173-543cefa5b6d48-9b052332-dfbf44a1
Operation type: compute.instances.repair.recreateInstance
Status message : Instance Group Manager 'projects/.../zones/europe-west1-c/instanceGroupManagers/gke-...' initiated recreateInstance on instance 'projects/.../zones/europe-west1-c/instances/...'. Reason: instance's intent is RUNNING but instance's health status is TIMEOUT.
We still can't figure out why this is happening and it's having a negative effect on our production environment every time we deploy our code.