When I run the following command to get info from my on-prem cluster,
kubectl cluster-info dump
I see the following for each node.
On the master:
"addresses": [
{
"type": "ExternalIP",
"address": "10.10.15.47"
},
{
"type": "InternalIP",
"address": "10.10.15.66"
},
{
"type": "InternalIP",
"address": "10.10.15.47"
},
{
"type": "InternalIP",
"address": "169.254.6.180"
},
{
"type": "Hostname",
"address": "k8s-dp-masterecad4834ec"
}
],
On worker node 1:
"addresses": [
{
"type": "ExternalIP",
"address": "10.10.15.57"
},
{
"type": "InternalIP",
"address": "10.10.15.57"
},
{
"type": "Hostname",
"address": "k8s-dp-worker5887dd1314"
}
],
On worker node 2:
"addresses": [
{
"type": "ExternalIP",
"address": "10.10.15.33"
},
{
"type": "InternalIP",
"address": "10.10.15.33"
},
{
"type": "Hostname",
"address": "k8s-dp-worker6d2f4b4c53"
}
],
My questions here are:
1.) Why do some nodes have different ExternalIP and InternalIP values while others don't?
2.) For the node that does have different ExternalIP and InternalIP values, both addresses are in the same CIDR range and both can be reached from outside. What is so internal / external about these two IP addresses? (What is the purpose?)
3.) Why does one node have a random 169.x.x.x IP address?
I am still trying to learn more about Kubernetes, and it would be greatly helpful if someone could help me understand. I use Contiv as the network plug-in.
What you see is part of the status of these nodes:
InternalIP: IP address of the node accessible only from within the cluster
ExternalIP: IP address of the node accessible from everywhere
Hostname: hostname of the node as reported by the kernel
These fields are set when a node is added to the cluster; their exact meaning depends on the cluster configuration and is not completely standardised, as stated in the Kubernetes documentation.
So, the values that you see are what your specific Kubernetes configuration sets them to. With another configuration you would get different values.
For example, on Amazon EKS, each node has a distinct InternalIP, ExternalIP, InternalDNS, ExternalDNS, and Hostname (identical to InternalIP). Amazon EKS sets these fields to the corresponding values of the node in the cloud infrastructure.
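For reference, you can read the same addresses directly from a node's status with kubectl, assuming you have kubectl access to the cluster (the node name below is taken from the question):
# Print each node's name followed by the addresses reported in its status
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses}{"\n"}{end}'
# Or look at the Addresses section of a single node
kubectl describe node k8s-dp-worker5887dd1314 | grep -A 6 'Addresses:'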
Related
I have an application that runs some code and at the end sends an email with a report of the data. When I deploy the pods on GKE, certain pods get terminated and a new pod is created due to autoscaling, but the problem is that the termination happens after my code has finished, so the email is sent twice for the same data.
Here is the JSON file of the Job I deploy via the API:
{
"apiVersion": "batch/v1",
"kind": "Job",
"metadata": {
"name": "$name",
"namespace": "$namespace"
},
"spec": {
"template": {
"metadata": {
"name": "********"
},
"spec": {
"priorityClassName": "high-priority",
"containers": [
{
"name": "******",
"image": "$dockerScancatalogueImageRepo",
"imagePullPolicy": "IfNotPresent",
"env": $env,
"resources": {
"requests": {
"memory": "2000Mi",
"cpu": "2000m"
},
"limits":{
"memory":"2650Mi",
"cpu":"2650m"
}
}
}
],
"imagePullSecrets": [
{
"name": "docker-secret"
}
],
"restartPolicy": "Never"
}
}
}
}
And here is a screenshot of the pod events:
Any idea how to fix that?
Thank you in advance.
"Perhaps you are affected by this "Note that even if you specify .spec.parallelism = 1 and .spec.completions = 1 and .spec.template.spec.restartPolicy = "Never", the same program may sometimes be started twice." from doc. What happens if you increase terminationgraceperiodseconds in your yaml file? – "
#danyL
My problem was that I had other Jobs that deploy pods on my nodes with higher priority, so Kubernetes was trying to terminate my running pods, but the Job was already done and the email had already been sent. I fixed the problem by adjusting the request and limit resources in all my JSON files. I don't know if it's the perfect solution, but for now it solved my problem.
Thank you all for your help.
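In case it helps someone else, here is a rough sketch of how to check the effect of that kind of change, assuming you have kubectl access; the pod name is a placeholder for your Job's pod:
# With requests equal to limits, the pod gets the "Guaranteed" QoS class,
# which makes it less likely to be evicted under node pressure.
kubectl get pod my-job-pod-xxxxx -o jsonpath='{.status.qosClass}{"\n"}'
# Look for recent preemption or eviction events in the namespace.
kubectl get events --sort-by=.lastTimestamp | grep -i -E 'preempt|evict'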
What thresholds should be set in the Service Fabric Placement / Load Balancing config for a cluster with a large number of guest executable applications?
I am having trouble with Service Fabric trying to place too many services onto a single node too fast.
To give an example of cluster size: there are 2-4 worker node types, 3-6 worker nodes per node type, each node type may run 200 guest executable applications, and each application will have at least 2 replicas. The nodes are more than capable of running the services once they are up; it is just at startup time that CPU is too high.
The problem seems to be the thresholds or defaults for the placement and load balancing rules set in the cluster config. As examples of what I have tried: I have turned on InBuildThrottlingEnabled and set InBuildThrottlingGlobalMaxValue to 100, and I have set the Global Movement Throttle settings to various percentages of the total application count.
At this point there are two distinct scenarios I am trying to solve for. In both cases, the nodes go to 100% CPU for long enough that Service Fabric declares the node as down.
1st: Starting an entire cluster from all nodes being off without overwhelming nodes.
2nd: A single node being overwhelmed by too many services starting after a host comes back online.
Here are my current parameters on the cluster:
"Name": "PlacementAndLoadBalancing",
"Parameters": [
{
"Name": "UseMoveCostReports",
"Value": "true"
},
{
"Name": "PLBRefreshGap",
"Value": "1"
},
{
"Name": "MinPlacementInterval",
"Value": "30.0"
},
{
"Name": "MinLoadBalancingInterval",
"Value": "30.0"
},
{
"Name": "MinConstraintCheckInterval",
"Value": "30.0"
},
{
"Name": "GlobalMovementThrottleThresholdForPlacement",
"Value": "25"
},
{
"Name": "GlobalMovementThrottleThresholdForBalancing",
"Value": "25"
},
{
"Name": "GlobalMovementThrottleThreshold",
"Value": "25"
},
{
"Name": "GlobalMovementThrottleCountingInterval",
"Value": "450"
},
{
"Name": "InBuildThrottlingEnabled",
"Value": "false"
},
{
"Name": "InBuildThrottlingGlobalMaxValue",
"Value": "100"
}
]
},
Based on the discussion in the answer below, I wanted to leave a graph image: if a node goes down, the act of shuffling services onto the remaining nodes will cause a second node to go down, as noted here. The green node goes down, then the purple node goes down due to too many resources being shuffled onto it.
From SF's perspective, 1 & 2 are the same problem. Also as a note, SF doesn't evict a node just because CPU consumption is high. So: "The nodes go to 100% for an amount of time such that service fabric declares the node as down." needs some more explanation. The machines might be failing for other reasons, or I guess could be so loaded that the kernel level failure detectors can't ping other machines, but that isn't very common.
For config changes: I would remove all of these to go with the defaults:
{
"Name": "PLBRefreshGap",
"Value": "1"
},
{
"Name": "MinPlacementInterval",
"Value": "30.0"
},
{
"Name": "MinLoadBalancingInterval",
"Value": "30.0"
},
{
"Name": "MinConstraintCheckInterval",
"Value": "30.0"
},
For the inbuild throttle to work, this needs to flip to true:
{
"Name": "InBuildThrottlingEnabled",
"Value": "false"
},
Also, since these are likely constraint violations and placement (not proactive rebalancing), we need to explicitly instruct SF to throttle those operations as well. There is config for this in SF; although it is not documented or publicly supported at this time, you can see it in the settings. By default only balancing is throttled, but you should be able to turn on throttling for all phases and set appropriate limits via something like the below.
These first two settings are also within PlacementAndLoadBalancing, like the ones above.
{
"Name": "ThrottlePlacementPhase",
"Value": "true"
},
{
"Name": "ThrottleConstraintCheckPhase",
"Value": "true"
},
The next settings, which set the limits, are in their own sections; each is a map from node type name to the limit you want to throttle at for that node type.
{
"name": "MaximumInBuildReplicasPerNodeConstraintCheckThrottle",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
},
{
"name": "MaximumInBuildReplicasPerNodePlacementThrottle",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
},
{
"name": "MaximumInBuildReplicasPerNodeBalancingThrottle",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
},
{
"name": "MaximumInBuildReplicasPerNode",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
}
I would make these changes and then try again. Additional information like what is actually causing the nodes to be down (confirmed via events and SF health info) would help identify the source of the problem. It would probably also be good to verify that starting 100 instances of the apps on the node actually works and whether that's an appropriate threshold.
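One way to pull that health information, if you have the Service Fabric CLI (sfctl) installed and connected to the cluster (the node name below is a placeholder):
# Overall cluster health, including any unhealthy evaluations
sfctl cluster health
# Health of a single node, including the events that led to it being marked down
sfctl node health --node-name _YourNodeType_0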
New to Kubernetes.
I have a private Docker Hub image deployed on a Kubernetes instance. When I exec into the pod, I can run the following, so I know my Docker image is running:
root@private-reg:/# curl 127.0.0.1:8085
Hello world!root@private-reg:/#
From the dashboard I can see my service has an external endpoint which ends with port 8085. When I try to load it I get a 404. My Service definition is below:
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "test",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/services/test",
"uid": "a1a2ae23-339b-11e9-a3db-ae0f8069b739",
"resourceVersion": "3297377",
"creationTimestamp": "2019-02-18T16:38:33Z",
"labels": {
"k8s-app": "test"
}
},
"spec": {
"ports": [
{
"name": "tcp-8085-8085-7vzsb",
"protocol": "TCP",
"port": 8085,
"targetPort": 8085,
"nodePort": 31859
}
],
"selector": {
"k8s-app": "test"
},
"clusterIP": "******",
"type": "LoadBalancer",
"sessionAffinity": "None",
"externalTrafficPolicy": "Cluster"
},
"status": {
"loadBalancer": {
"ingress": [
{
"ip": "******"
}
]
}
}
}
Can anyone point me in the right direction?
What is the output of the command below?
curl clusterIP:8085
If you get the Hello world message, it means the service is routing traffic correctly to the backend pod.
curl HostIP:NodePort should also work.
Most likely the service is not bound to the backend pod. Did you define the label below on the pod?
labels: {
"k8s-app": "test"
}
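You can confirm whether the selector matches anything with kubectl (the service name test is taken from your manifest):
# A service whose selector matches no pods has an empty endpoints list
kubectl get endpoints test
# Compare the pod labels against the service selector
kubectl get pods --show-labels
kubectl describe service test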
You didn't mention what type of load balancer or cloud provider you are using, but if your load balancer provisioned correctly (which you should be able to see in your kube-controller-manager logs), then you should be able to access your service with what you see here:
"status": {
"loadBalancer": {
"ingress": [
{
"ip": "******"
}
]
}
Then you could check by running:
$ curl <ip>:<whatever external port your lb is fronting>
It's likely that the load balancer didn't provision if, as described in the other answers, these work:
$ curl <clusterIP for svc>:8085
and
$ curl <NodeIP>:31859 # NodePort
Take a look at the Service types in Kubernetes; there are a few:
https://kubernetes.io/docs/concepts/services-networking/service/
ClusterIP: exposes the service only inside the cluster.
NodePort: exposes the service through a given port on each node.
LoadBalancer: makes the service externally accessible through a load balancer.
I am assuming you are running on GKE.
What kind of service is the one you launched?
I was given the task of putting a Kubernetes setup in place 2 days ago, with no background in that technology. So sorry if my questions or setup are not good.
The topology is quite simple: a public IP, and a dedicated HAProxy configured to forward requests to a Kubernetes service backed by a deployment of 2 pods. (Stickiness required!)
Service setup
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "api-admin2",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/services/api-admin2",
"uid": "98121d0d-698b-11e8-8d90-262e68d4dba8",
"resourceVersion": "245163",
"creationTimestamp": "2018-06-06T13:14:50Z",
"labels": {
"app": "api-admin"
},
"annotations": {
"service.beta.kubernetes.io/azure-load-balancer-internal": "true"
}
},
"spec": {
"ports": [
{
"protocol": "TCP",
"port": 80,
"targetPort": 6543,
"nodePort": 31302
}
],
"selector": {
"app": "api-admin"
},
"clusterIP": "10.100.22.118",
"type": "LoadBalancer",
"sessionAffinity": "ClientIP",
"externalTrafficPolicy": "Local",
"healthCheckNodePort": 32660,
"sessionAffinityConfig": {
"clientIP": {
"timeoutSeconds": 10800
}
}
},
"status": {
"loadBalancer": {
"ingress": [
{
"ip": "10.100.21.97"
}
]
}
}
}
The traffic arrives at the pods, but not in round robin: all of the traffic goes to the same pod. To have traffic go to another pod, I have to stop the one receiving it... which is not the purpose of this...
Any idea how to have the traffic properly load-balanced with stickiness?
Thanks!
From the Service documentation, for proxy mode IPVS:
In any of these proxy model, any traffic bound for the Service’s IP:Port is proxied to an appropriate backend without the clients knowing anything about Kubernetes or Services or Pods. Client-IP based session affinity can be selected by setting service.spec.sessionAffinity to “ClientIP” (the default is “None”), and you can set the max session sticky time by setting the field service.spec.sessionAffinityConfig.clientIP.timeoutSeconds if you have already set service.spec.sessionAffinity to “ClientIP” (the default is “10800”).
In your configuration, the session affinity, which is responsible for choosing the pod, is set to ClientIP, and 10800 seconds is the sticky time: all traffic coming from the same client will be forwarded to the same pod for 3 hours. Since all of your requests arrive through the HAProxy, they most likely share the same source IP, which is why everything lands on the same pod.
If you want to change the sticky time as well, this is what needs to be changed:
sessionAffinityConfig:
  clientIP:
    timeoutSeconds: _TIME_
This will allow you to change the duration of stickiness; if you changed _TIME_ to 10, the affinity would expire after 10 seconds and traffic could switch to another pod.
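If you prefer not to re-apply the manifest, the timeout can also be patched on the live service, assuming you have kubectl access (the service name is taken from your setup):
# Lower the ClientIP affinity timeout on the existing service to 10 seconds
kubectl patch service api-admin2 -p '{"spec":{"sessionAffinityConfig":{"clientIP":{"timeoutSeconds":10}}}}'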
I have a Kubernetes pod to which I attach a GCE persistent volume using a persistent volume claim. (For the even worse issue without a volume claim see: Mounting a gcePersistentDisk kubernetes volume is very slow)
When there is no volume attached, the pod starts in no time (at most 2 seconds). But when the pod has a GCE persistent volume mount, the Running state is only reached somewhere between 20 and 60 seconds. I was testing with different disk sizes (10, 200, 500 GiB) and multiple pod creations, and the size does not seem to be correlated with the delay.
This delay happens not only in the beginning but also when rolling updates are performed with the replication controllers or when the code crashes during runtime.
Below I have the kubernetes specifications:
The replication controller
{
"apiVersion": "v1",
"kind": "ReplicationController",
"metadata": {
"name": "a1"
},
"spec": {
"replicas": 1,
"template": {
"metadata": {
"labels": {
"app": "a1"
}
},
"spec": {
"containers": [
{
"name": "a1-setup",
"image": "nginx",
"ports": [
{
"containerPort": 80
},
{
"containerPort": 443
}
]
}
]
}
}
}
}
The volume claim
{
"apiVersion": "v1",
"kind": "PersistentVolumeClaim",
"metadata": {
"name": "myclaim"
},
"spec": {
"accessModes": [
"ReadWriteOnce"
],
"resources": {
"requests": {
"storage": "10Gi"
}
}
}
}
And the volume
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "mydisk",
"labels": {
"name": "mydisk"
}
},
"spec": {
"capacity": {
"storage": "10Gi"
},
"accessModes": [
"ReadWriteOnce"
],
"gcePersistentDisk": {
"pdName": "a1-drive",
"fsType": "ext4"
}
}
}
GCE (along with AWS and OpenStack) must first attach a disk/volume to the node before it can be mounted and exposed to your pod. The time required for attachment is dependent on the cloud provider.
In the case of pods created by a ReplicationController, there is an additional detach operation that has to happen. The same disk cannot be attached to more than one node (at least not in read/write mode). Detaching and pod cleanup happen in a different thread than attaching. To be specific, Kubelet running on a node has to reconcile the pods it currently has (and the sum of their volumes) with the volumes currently present on the node. Orphaned volumes are unmounted and detached. If your pod was scheduled on a different node, it must wait until the original node detaches the volume.
The cluster eventually reaches the correct state, but it might take time for each component to get there. This is your wait time.
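A rough way to see where that time goes, if you have kubectl access (the pod name is a placeholder):
# The event timestamps show how long the attach and mount steps took
kubectl describe pod my-pod-with-pd | grep -i -E 'attach|mount'
kubectl get events --sort-by=.lastTimestamp | grep -i -E 'attach|mount|volume'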