Google Container Engine Auto deleting services/pods - kubernetes

I am testing Google Container Engine and everything was fine until I ran into this really weird issue.
bash-3.2# kubectl get services --namespace=es
NAME                    CLUSTER_IP      EXTERNAL_IP   PORT(S)    SELECTOR                     AGE
elasticsearch-logging   10.67.244.176   <none>        9200/TCP   name=elasticsearch-logging   5m
bash-3.2# kubectl describe service elasticsearch-logging --namespace=es
Name:              elasticsearch-logging
Namespace:         es
Labels:            k8s-app=elasticsearch-logging,kubernetes.io/cluster-service=true,kubernetes.io/name=Elasticsearch
Selector:          name=elasticsearch-logging
Type:              ClusterIP
IP:                10.67.248.242
Port:              <unnamed>  9200/TCP
Endpoints:         <none>
Session Affinity:  None
No events.
After exactly 5 minutes, the service was deleted automatically.
kubectl get events --namespace=es
1m   1m   1   elasticsearch-logging   Service   DeletingLoadBalancer   {service-controller }   Deleting load balancer
1m   1m   1   elasticsearch-logging   Service   DeletedLoadBalancer    {service-controller }   Deleted load balancer
Anyone got a clue why? Thanks.

The label kubernetes.io/cluster-service=true is a special, reserved label that shouldn't be used by user resources. That's used by a system process that manages the cluster's addons, like the DNS and kube-ui pods that you'll see in your cluster's kube-system namespace.
The reason your service is being deleted is that the system process checks for resources with that label, sees one it doesn't know about, and assumes it's something it started previously that isn't meant to exist anymore. This is explained a little more in this doc about cluster addons.
In general, you shouldn't have any labels that are prefixed with kubernetes.io/ on your resources, since that's a reserved namespace.

After removing the following from metadata/labels in the yaml file, the problem went away:
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "Elasticsearch"
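For comparison, a minimal sketch of metadata that keeps descriptive labels in user-owned keys (the label names below are illustrative, not from the original manifest):

# Keep user labels out of the reserved kubernetes.io/ prefix.
metadata:
  name: elasticsearch-logging
  namespace: es
  labels:
    app: elasticsearch-logging   # user-owned label, safe
    component: logging           # illustrative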

Related

Kubernetes clusterIP does not load balance requests [duplicate]

My Environment: Mac dev machine with latest Minikube/Docker
I built (locally) a simple docker image with a simple Django REST API "hello world". I'm running a deployment with 3 replicas. This is my yaml file for defining it:
apiVersion: v1
kind: Service
metadata:
  name: myproj-app-service
  labels:
    app: myproj-be
spec:
  type: LoadBalancer
  ports:
  - port: 8000
  selector:
    app: myproj-be
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myproj-app-deployment
  labels:
    app: myproj-be
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myproj-be
  template:
    metadata:
      labels:
        app: myproj-be
    spec:
      containers:
      - name: myproj-app-server
        image: myproj-app-server:4
        ports:
        - containerPort: 8000
        env:
        - name: DATABASE_URL
          value: postgres://myname:#10.0.2.2:5432/myproj2
        - name: REDIS_URL
          value: redis://10.0.2.2:6379/1
When I apply this yaml, it creates things correctly:
- one deployment
- one service
- three pods
Deployments:
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
myproj-app-deployment   3/3     3            3           79m
Services:
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
kubernetes           ClusterIP      10.96.0.1     <none>        443/TCP          83m
myproj-app-service   LoadBalancer   10.96.91.44   <pending>     8000:31559/TCP   79m
Pods:
NAME                                     READY   STATUS    RESTARTS   AGE
myproj-app-deployment-77664b5557-97wkx   1/1     Running   0          48m
myproj-app-deployment-77664b5557-ks7kf   1/1     Running   0          49m
myproj-app-deployment-77664b5557-v9889   1/1     Running   0          49m
The interesting thing is that when I SSH into the Minikube VM and hit the service using curl 10.96.91.44:8000, it respects the LoadBalancer type of the service and rotates between all three pods as I hit the endpoint again and again. I can see that in the returned results, which I have made sure include the HOSTNAME of the pod.
However, when I try to access the service from my host Mac -- using kubectl port-forward service/myproj-app-service 8000:8000 -- every time I hit the endpoint, the same pod responds. It doesn't load balance. I can see that clearly when I kubectl logs -f <pod> on all three pods: only one of them is handling the hits, while the other two are idle...
Is this a kubectl port-forward limitation or issue? or am I missing something greater here?
kubectl port-forward looks up the first Pod from the Service information provided on the command line and forwards directly to a Pod rather than forwarding to the ClusterIP/Service port. The cluster doesn't get a chance to load balance the service like regular service traffic.
The Kubernetes API only provides port-forward operations for Pods (CREATE and GET). Similar API operations don't exist for Service endpoints.
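One way to see the difference (a sketch, reusing the service name from the question): run a throwaway pod inside the cluster and hit the Service from there, so requests go through the ClusterIP and kube-proxy load balances them across the pods:

# Throwaway busybox pod; each wget goes through the Service's ClusterIP.
kubectl run lb-test -it --rm --image=busybox --restart=Never -- \
  sh -c 'for i in 1 2 3 4 5 6; do wget -qO- http://myproj-app-service:8000/; done'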
kubectl code
Here's a little bit of the flow from the kubectl code that seems to back that up (I'll just add that Go isn't my primary language)
The portforward.go Complete function is where kubectl port-forward does the first lookup of a Pod from the options, via AttachablePodForObjectFn:
The AttachablePodForObjectFn is defined as attachablePodForObject in this interface, then here is the attachablePodForObject function.
To my (inexperienced) Go eyes, it appears that attachablePodForObject is the thing kubectl uses to look up a Pod from a Service defined on the command line.
From there on, everything deals with filling in the Pod-specific PortForwardOptions (which don't include a service), which are then passed to the Kubernetes API.
The reason was that my pods were randomly in a crashing state due to stale Python *.pyc files left in the container. This causes issues when Django is running in a multi-pod Kubernetes deployment. Once I removed this issue and all pods ran successfully, the round-robin started working.
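Not the poster's exact fix, but a common way to keep stale bytecode out of a Django image looks like this (the /app path is an assumption):

# Illustrative Dockerfile lines, not from the original answer:
ENV PYTHONDONTWRITEBYTECODE=1         # don't write *.pyc files at runtime
RUN find /app -name '*.pyc' -delete   # drop any bytecode copied in at build time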

ambassador service stays "pending"

Currently I'm running a fresh "all in one VM" (stacked master/worker approach) Kubernetes v1.21.1-00 on Ubuntu Server 20 LTS, using
cri-o as the container runtime interface
calico for networking/security
I also installed the kubernetes-dashboard (but I guess that's not important for my issue 😉). Following this guide for installing ambassador: https://www.getambassador.io/docs/edge-stack/latest/topics/install/yaml-install/ I ran into the issue that the service is stuck in status "pending".
kubectl get svc -n ambassador prints out the following stuff
NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ambassador         LoadBalancer   10.97.117.249    <pending>     80:30925/TCP,443:32259/TCP   5h
ambassador-admin   ClusterIP      10.101.161.169   <none>        8877/TCP,8005/TCP            5h
ambassador-redis   ClusterIP      10.110.32.231    <none>        6379/TCP                     5h
quote              ClusterIP      10.104.150.137   <none>        80/TCP                       5h
Changing the type from LoadBalancer to NodePort in the service sets it up correctly, but I'm not sure of the implications that come with that. Again, I want to use ambassador as an ingress component here - with my setup (only one machine), "real" load balancing might not be necessary.
To cover all the subdomain stuff, I set up a wildcard record pointing to my machine, i.e. I have a CNAME for *.k8s.my-domain.com which points to this host. I don't know if this approach was that smart for setting up an ingress.
Edit: List of events, as requested below:
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  116s  default-scheduler  Successfully assigned ambassador/ambassador-redis-584cd89b45-js5nw to dev-bvpl-099
  Normal  Pulled     116s  kubelet            Container image "redis:5.0.1" already present on machine
  Normal  Created    116s  kubelet            Created container redis
  Normal  Started    116s  kubelet            Started container redis
Additionally, here's the pending service in yaml presentation (exported via kubectl get svc -n ambassador -o yaml ambassador):
apiVersion: v1
kind: Service
metadata:
  annotations:
    a8r.io/bugs: https://github.com/datawire/ambassador/issues
    a8r.io/chat: http://a8r.io/Slack
    a8r.io/dependencies: ambassador-redis.ambassador
    a8r.io/description: The Ambassador Edge Stack goes beyond traditional API Gateways
      and Ingress Controllers with the advanced edge features needed to support developer
      self-service and full-cycle development.
    a8r.io/documentation: https://www.getambassador.io/docs/edge-stack/latest/
    a8r.io/owner: Ambassador Labs
    a8r.io/repository: github.com/datawire/ambassador
    a8r.io/support: https://www.getambassador.io/about-us/support/
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"a8r.io/bugs":"https://github.com/datawire/ambassador/issues","a8r.io/chat":"http://a8r.io/Slack","a8r.io/dependencies":"ambassador-redis.ambassador","a8r.io/description":"The Ambassador Edge Stack goes beyond traditional API Gateways and Ingress Controllers with the advanced edge features needed to support developer self-service and full-cycle development.","a8r.io/documentation":"https://www.getambassador.io/docs/edge-stack/latest/","a8r.io/owner":"Ambassador Labs","a8r.io/repository":"github.com/datawire/ambassador","a8r.io/support":"https://www.getambassador.io/about-us/support/"},"labels":{"app.kubernetes.io/component":"ambassador-service","product":"aes"},"name":"ambassador","namespace":"ambassador"},"spec":{"ports":[{"name":"http","port":80,"targetPort":8080},{"name":"https","port":443,"targetPort":8443}],"selector":{"service":"ambassador"},"type":"LoadBalancer"}}
  creationTimestamp: "2021-05-22T07:18:23Z"
  labels:
    app.kubernetes.io/component: ambassador-service
    product: aes
  name: ambassador
  namespace: ambassador
  resourceVersion: "4986406"
  uid: 68e4582c-be6d-460c-909e-dfc0ad84ae7a
spec:
  clusterIP: 10.107.194.191
  clusterIPs:
  - 10.107.194.191
  externalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 32542
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    nodePort: 32420
    port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    service: ambassador
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
EDIT#2: I wonder if https://stackoverflow.com/a/44112285/667183 applies to my setup as well?
The answer is pretty much here: https://serverfault.com/questions/1064313/ambassador-service-stays-pending . After installing a load balancer, the whole setup worked. I went with metallb (https://metallb.universe.tf/installation/#installation-by-manifest for installation) and used the following configuration for a single-node kubernetes cluster:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.16.0.99-10.16.0.99
After a few seconds the load balancer is detected and everything goes fine.
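For reference, the manifest-based install from the linked MetalLB docs looked roughly like this at the time (the v0.9.6 tag is an assumption; check the docs for the current release):

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.6/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.6/manifests/metallb.yaml
# On first install only: the secret MetalLB's speakers use to encrypt memberlist traffic
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"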

What is IP field in the output of "kubectl describe pod" command

This is my Pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: pod-nginx-container
spec:
  containers:
  - name: nginx-alpine-container-1
    image: nginx:alpine
    ports:
    - containerPort: 80
Below is the output of my "kubectl describe pod" command:
C:\Users\so.user\Desktop\>kubectl describe pod pod-nginx-container
Name:         pod-nginx-container
Namespace:    default
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Mon, 15 Feb 2021 23:44:22 +0530
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.0.29
IPs:
  IP:  10.244.0.29
Containers:
  nginx-alpine-container-1:
    Container ID:   cri-o://01715e35d3d809bdfe70badd53698d6e26c0022d16ae74f7053134bb03fa73d2
    Image:          nginx:alpine
    Image ID:       docker.io/library/nginx@sha256:01747306a7247dbe928db991eab42e4002118bf636dd85b4ffea05dd907e5b66
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 15 Feb 2021 23:44:24 +0530
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sxlc9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-sxlc9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sxlc9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  7m52s  default-scheduler  Successfully assigned default/pod-nginx-container to minikube
  Normal  Pulled     7m51s  kubelet            Container image "nginx:alpine" already present on machine
  Normal  Created    7m50s  kubelet            Created container nginx-alpine-container-1
  Normal  Started    7m50s  kubelet            Started container nginx-alpine-container-1
I couldn't understand what the IP address mentioned in the "IPs:" field of this output is. I am sure it is not my Node's IP, so I am wondering what IP this is. And please note that I have not exposed a Service; in fact there is no Service in my Kubernetes cluster at all, so I am not able to figure it out.
Also, how are "Port" and "Host Port" different? From Googling I could understand a little bit, but if someone could explain with an example that would be great.
NOTE: I have already Googled "explanation of kubectl describe pod command" and searched a lot, but I couldn't find my answers, so I'm posting this question.
Pods
A pod in Kubernetes is the smallest deployment unit. A pod is a group of one or more containers. The containers in a pod share storage and network resources.
Pod networking
In Kubernetes, each pod is assigned a unique IP address, this IP address is local within the cluster. Containers within the same pod use localhost to communicate with each other. Networking with other pods or services is done with IP networking.
When doing kubectl describe pod <podname> you see the IP address for the pod.
See Pod networking
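For instance, a minimal two-container pod sketch (names and images illustrative) where a sidecar reaches the web container over localhost:

apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo        # illustrative
spec:
  containers:
  - name: web
    image: nginx:alpine
    ports:
    - containerPort: 80
  - name: sidecar
    image: curlimages/curl
    # Containers in a pod share the network namespace, so nginx is reachable on localhost:80
    command: ["sh", "-c", "sleep 5 && curl -s http://localhost:80/ && sleep 3600"]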
Application networking in a cluster
A pod is a single instance of an application. You typically run an application as a Deployment with one or more replicas (instances). When you upgrade a Deployment with a new version of your container image, new pods are created - this means that all your instances get new IP addresses.
To keep a stable network address for your application, create a Service - and always use the service name when sending traffic to other applications within the cluster. The traffic addressed to a service is load balanced to the replicas (instances).
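A minimal sketch of such a Service (the names and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-app            # stable name other apps address
spec:
  selector:
    app: my-app           # must match the pod labels of the Deployment
  ports:
  - port: 80              # port the Service listens on
    targetPort: 8080      # port the container listens on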
Exposing an application outside the cluster
To expose an application to clients outside the cluster, you typically use an Ingress resource - it typically represents a load balancer (e.g. cloud load balancer) with reverse proxy functionality - and route traffic for some specific paths to your services.
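A sketch of such an Ingress (host and service names are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app    # the Service from the sketch above
            port:
              number: 80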
That's the pod's IP.
Every Pod gets its own IP address.
When you create a Service, the Service will internally map to this pod's IP.
If you delete the pod and recreate it, you will notice a new IP. That's the reason why it is recommended to create a Service object, which keeps track of pod IPs based on a label selector.
I don't know about the difference between the port and hostPort fields under the container spec.
PodIP is the local IP of the pod within the cluster. Each pod gets a dynamic IP allocated to it.
You can see the explanation with the kubectl command:
kubectl explain po.status.podIP
IP address allocated to the pod. Routable at least within the cluster.
Empty if not yet allocated.

Configuring Istio, Kubernetes and MetalLB to use a Istio LoadBalancer

I’m struggling with the last step of a configuration using MetalLB, Kubernetes, and Istio on a bare-metal instance: having a web page returned from a service to the outside world via an Istio VirtualService route. I’ve just updated the instance to
MetalLB (version 0.7.3)
Kubernetes (version 1.12.2)
Istio (version 1.0.3)
I’ll start with what does work.
All complementary services have been deployed and most are working:
Kubernetes Dashboard on http://localhost:8001
Prometheus Dashboard on http://localhost:10010 (I had something else on 9009)
Envoy Admin on http://localhost:15000
Grafana (Istio Dashboard) on http://localhost:3000
Jaeger on http://localhost:16686
I say most because since the upgrade to Istio 1.0.3 I've lost the telemetry from istio-ingressgateway in the Jaeger dashboard, and I'm not sure how to bring it back. I've dropped the pod and recreated it, to no avail.
Outside of that, MetalLB and K8S appear to be working fine and the load-balancer is configured correctly (using ARP).
kubectl get svc -n istio-system
NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                 AGE
grafana                  ClusterIP      10.109.247.149   <none>          3000/TCP                                9d
istio-citadel            ClusterIP      10.110.129.92    <none>          8060/TCP,9093/TCP                       28d
istio-egressgateway      ClusterIP      10.99.39.29      <none>          80/TCP,443/TCP                          28d
istio-galley             ClusterIP      10.98.219.217    <none>          443/TCP,9093/TCP                        28d
istio-ingressgateway     LoadBalancer   10.108.175.231   192.168.1.191   80:31380/TCP,443:31390/TCP,31400:31400/TCP,15011:30805/TCP,8060:32514/TCP,853:30601/TCP,15030:31159/TCP,15031:31838/TCP   28d
istio-pilot              ClusterIP      10.97.248.195    <none>          15010/TCP,15011/TCP,8080/TCP,9093/TCP   28d
istio-policy             ClusterIP      10.98.133.209    <none>          9091/TCP,15004/TCP,9093/TCP             28d
istio-sidecar-injector   ClusterIP      10.102.158.147   <none>          443/TCP                                 28d
istio-telemetry          ClusterIP      10.103.141.244   <none>          9091/TCP,15004/TCP,9093/TCP,42422/TCP   28d
jaeger-agent             ClusterIP      None             <none>          5775/UDP,6831/UDP,6832/UDP,5778/TCP     27h
jaeger-collector         ClusterIP      10.104.66.65     <none>          14267/TCP,14268/TCP,9411/TCP            27h
jaeger-query             LoadBalancer   10.97.70.76      192.168.1.193   80:30516/TCP                            27h
prometheus               ClusterIP      10.105.176.245   <none>          9090/TCP                                28d
zipkin                   ClusterIP      None             <none>          9411/TCP                                27h
I can expose my deployment using:
kubectl expose deployment enrich-dev --type=LoadBalancer --name=enrich-expose
It all works perfectly fine, and I can hit the webpage from the external load-balanced IP address (I deleted the exposed service after this).
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)           AGE
enrich-expose    LoadBalancer   10.108.43.157   192.168.1.192   31380:30170/TCP   73s
enrich-service   ClusterIP      10.98.163.217   <none>          80/TCP            57m
kubernetes       ClusterIP      10.96.0.1       <none>          443/TCP           36d
If I create a K8S Service in the default namespace (I've tried multiple)
apiVersion: v1
kind: Service
metadata:
  name: enrich-service
  labels:
    run: enrich-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
  selector:
    app: enrich
followed by a gateway and a route (VirtualService), the only response I get is a 404 from outside the mesh. You'll see that in the gateways field I'm using the reserved word mesh, but I've tried both that and naming the specific gateway. I've also tried different match prefixes for specific URIs and the port you can see below.
Gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: enrich-dev-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
VirtualService
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: enrich-virtualservice
spec:
  hosts:
  - "enrich-service.default"
  gateways:
  - mesh
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: enrich-service.default
        subset: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: enrich-destination
spec:
  host: enrich-service.default
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
  subsets:
  - name: v1
    labels:
      app: enrich
I've double-checked that it's not the DNS playing up, because I can go into the shell of the ingress-gateway either via busybox or using the K8S dashboard
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/shell/istio-system/istio-ingressgateway-6bbdd58f8c-glzvx/?namespace=istio-system
and do both an
nslookup enrich-service.default
and
curl -f http://enrich-service.default/
and both work successfully, so I know the ingress-gateway pod can see those. The sidecars are set for auto-injection in both the default namespace and the istio-system namespace.
The logs for the ingress-gateway show the 404:
[2018-11-01T03:07:54.351Z] "GET /metadata HTTP/1.1" 404 - 0 0 1 - "192.168.1.90" "curl/7.58.0" "6c1796be-0791-4a07-ac0a-5fb07bc3818c" "enrich-service.default" "-" - - 192.168.224.168:80 192.168.1.90:43500
[2018-11-01T03:26:39.339Z] "GET / HTTP/1.1" 404 - 0 0 1 - "192.168.1.90" "curl/7.58.0" "ed956af4-77b0-46e6-bd26-c153e29837d7" "enrich-service.default" "-" - - 192.168.224.168:80 192.168.1.90:53960
192.168.224.168:80 is the IP address of the gateway.
192.168.1.90:53960 is the IP address of my external client.
Any suggestions? I've tried hitting this from multiple angles for a couple of days now and I feel I'm just missing something simple. Suggested logs to look at, perhaps?
Just to close this question out, here's the solution to the problem in my instance. The configuration mistake started all the way back at Kubernetes cluster initialisation. I had applied:
kubeadm init --pod-network-cidr=n.n.n.n/n --apiserver-advertise-address 0.0.0.0
with the pod-network-cidr using the same address range as the local LAN on which the Kubernetes installation was deployed, i.e. the desktop for the Ubuntu host used the same IP subnet as what I'd assigned to the container network.
For the most part, everything operated fine as detailed above, until the Istio proxy tried to route packets from an external load-balancer IP address to an internal IP address that happened to be on the same subnet. Project Calico with Kubernetes seemed able to cope with it, as that's effectively Layer 3/4 policy, but Istio had a problem with it at L7 (even though it was sitting on Calico underneath).
The solution was to tear down my entire Kubernetes deployment. I was paranoid and went so far as to uninstall Kubernetes, then deploy again with a pod network in the 172 range, which had nothing to do with my local LAN. I also made the same changes in the Project Calico configuration file to match the pod networks. After that change, everything worked as expected.
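For illustration, the re-initialisation might look like this (the CIDR shown is an assumption; any private range that doesn't overlap your LAN will do):

kubeadm reset
# Re-initialise with a pod CIDR that cannot collide with the host LAN;
# 172.16.0.0/16 here is illustrative, not the exact range from the post.
kubeadm init --pod-network-cidr=172.16.0.0/16
# Remember to set the same range in Calico's manifest (CALICO_IPV4POOL_CIDR).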
I suspect that a more public configuration, where your cluster is directly attached to a BGP router, as opposed to using MetalLB in an L2 configuration with a subset of your LAN, wouldn't exhibit this issue. I've documented it more in this post:
Microservices: .Net, Linux, Kubernetes and Istio make a powerful combination

How to troubleshoot why the Endpoints in my service don't get updated?

I have a Kubernetes cluster running on the Google Kubernetes Engine.
I have a deployment that I manually (by editing the hpa object) scaled up from 100 replicas to 300 replicas to do some load testing. When I was load testing the deployment by sending HTTP requests to the service, it seemed that not all pods were getting an equal amount of traffic; only around 100 pods showed that they were processing traffic (judging by their CPU load and our custom metrics). So my suspicion was that the service is not load balancing the requests among all the pods equally.
If I checked the deployment, I could see that all 300 replicas were ready.
$ k get deploy my-app --show-labels
NAME     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE   LABELS
my-app   300       300       300          300         21d   app=my-app
On the other hand, when I checked the service, I saw this:
$ k describe svc my-app
Name:              my-app
Namespace:         production
Labels:            app=my-app
Selector:          app=my-app
Type:              ClusterIP
IP:                10.40.9.201
Port:              http  80/TCP
TargetPort:        http/TCP
Endpoints:         10.36.0.5:80,10.36.1.5:80,10.36.100.5:80 + 114 more...
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         10.36.0.5:443,10.36.1.5:443,10.36.100.5:443 + 114 more...
Session Affinity:  None
Events:            <none>
What was strange to me is this part
Endpoints: 10.36.0.5:80,10.36.1.5:80,10.36.100.5:80 + 114 more...
I was expecting to see 300 endpoints there, is that assumption correct?
(I also found this post, which is about a similar issue, but there the author was experiencing only a few minutes delay until the endpoints were updated, but for me it didn't change even in half an hour.)
How could I troubleshoot what was going wrong? I read that this is done by the Endpoints controller, but I couldn't find any info about where to check its logs.
Update: We managed to reproduce this a couple more times. Sometimes it was less severe; for example, 381 endpoints instead of 445. One interesting thing we noticed is that when we retrieved the details of the endpoints:
$ k describe endpoints my-app
Name:         my-app
Namespace:    production
Labels:       app=my-app
Annotations:  <none>
Subsets:
  Addresses:          10.36.0.5,10.36.1.5,10.36.10.5,...
  NotReadyAddresses:  10.36.199.5,10.36.209.5,10.36.239.2,...
Then a bunch of IPs were "stuck" in the NotReadyAddresses state (not the ones that were "missing" from the service, though; if I summed the number of IPs in Addresses and NotReadyAddresses, that was still less than the total number of ready pods). I don't know if this is related at all, but I couldn't find much info online about this NotReadyAddresses field.
It turned out that this was caused by using preemptible VMs in our node pools; it doesn't happen if the nodes are not preemptible.
We couldn't figure out more details of the root cause, but using preemptible VMs as nodes is not an officially supported scenario anyway, so we switched to regular VMs.
Pod IPs can be added to NotReadyAddresses if a health/readiness probe is failing. This will in turn cause the pod IP to not be added to the Endpoints object, meaning that the Kubernetes service can't route traffic to the pod.
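A readiness probe sketch for reference (the path, port, and image are illustrative); a pod failing this check stays in NotReadyAddresses and receives no Service traffic:

containers:
- name: my-app
  image: my-app:latest      # illustrative
  readinessProbe:
    httpGet:
      path: /healthz        # illustrative health endpoint
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10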
I refer to your first try with 300 pods.
I would check the following (a command sketch follows the list):
kubectl get po -l app=my-app, to see if you get a 300-item list. Your service says you have 300 available pods, which makes your issue very interesting to analyze.
whether your pod/deployment manifest defines resource limits and requests; this helps the scheduler make better decisions
whether some of your nodes have taints incompatible with your pod/deployment manifest
whether your pod/deployment manifest has liveness and readiness probes (please post them)
whether you defined a ResourceQuota object, which limits the creation of pods/deployments
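Rough commands for the checks above (the label and namespace come from the question):

kubectl get po -l app=my-app --no-headers | wc -l     # expect 300
kubectl describe deploy my-app | grep -i -A2 -e requests -e limits -e liveness -e readiness
kubectl describe nodes | grep -i taint                # look for incompatible taints
kubectl get resourcequota -n production               # any quota limiting pod creation?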