I'm looking to configure Redis for Sidekiq and Rails in k8s, using Google Cloud Memorystore, which is reached by IP address.
I have a Helm template like the following (with gcpRedisMemorystore specified separately). My question is: what does the Service object add to the system? Is it necessary, or does the Endpoints object provide all the needed access?
charts/app/templates/app-memorystore.service.yaml
kind: Service
apiVersion: v1
metadata:
  name: app-memorystore
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: redis
      port: {{ .Values.gcpredis.port }}
      protocol: TCP
---
kind: Endpoints
apiVersion: v1
metadata:
  name: app-memorystore
subsets:
  - addresses:
      - ip: {{ .Values.gcpredis.ip }}
    ports:
      - port: {{ .Values.gcpredis.port }}
        name: redis
        protocol: TCP
Yes, you still need it.
Generally speaking, the Service is the name that applications use to reach a set of endpoints. A Service with a selector automatically gets a corresponding Endpoints object containing the IP addresses of the Pods matched by the selector.
When you define a Service without a selector, you need to provide an Endpoints object of the same name yourself so the Service has somewhere to go. This is in the documentation, but a bit buried. At https://kubernetes.io/docs/concepts/services-networking/service/#without-selectors it is mentioned in the second bullet point for headless services without selectors:
For headless services that do not define selectors, the endpoints controller does not create Endpoints records. However, the DNS system looks for and configures either:
CNAME records for ExternalName-type services.
A records for any Endpoints that share a name with the service, for all other types.
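To make the role of the Service name concrete, here is a minimal sketch of how a workload might consume it. The container name, the image, the `REDIS_URL` variable and port 6379 are assumptions for illustration and are not taken from the chart above. The only thing that comes from the manifests is the Service name `app-memorystore`, which cluster DNS resolves to the Memorystore IP listed in the Endpoints object.

```yaml
# Fragment of a Deployment pod template (hypothetical names throughout).
spec:
  containers:
    - name: sidekiq                 # hypothetical container name
      image: example/app:latest     # hypothetical image
      env:
        - name: REDIS_URL           # assumed variable; use whatever your app reads
          # The host is the Service name, not the raw Memorystore IP.
          # 6379 is assumed to be the value of .Values.gcpredis.port.
          value: "redis://app-memorystore:6379/0"
```

The short name only resolves from Pods in the same namespace as the Service. If the Memorystore IP ever changes, only the Endpoints object needs updating; the application configuration stays the same.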
I am new to the Kubernetes world. I was going through an interesting concept called the headless service.
I have read about it, understand it, and can create a headless service, but I am still not convinced about the use cases. Why do we need it? There are already three types of Service (ClusterIP, NodePort and LoadBalancer), each with its own use cases.
Could you please tell me what exactly a headless service solves that the other three service types cannot?
I have read that headless services are mainly used with stateful applications such as database pods (for example Cassandra, MongoDB, etc.). But my question is: why?
A headless service doesn't provide any sort of proxy or load balancing -- it simply provides a mechanism by which clients can look up the ip address of pods. This means that when they connect to your service, they're connecting directly to the pods; there's no intervening proxy.
Consider a situation in which you have a service that matches three pods; e.g., I have this Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: example
  name: example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - image: docker.io/traefik/whoami:latest
          name: whoami
          ports:
            - containerPort: 80
              name: http
If I'm using a typical ClusterIP type service, like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: example
  name: example
spec:
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    app: example
Then when I look up the service in a client pod, I will see the ip address of the service proxy:
/ # host example
example.default.svc.cluster.local has address 10.96.114.63
However, when using a headless service, like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: example
  name: example-headless
spec:
  clusterIP: None
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    app: example
I will instead see the addresses of the pods:
/ # host example-headless
example-headless.default.svc.cluster.local has address 10.244.0.25
example-headless.default.svc.cluster.local has address 10.244.0.24
example-headless.default.svc.cluster.local has address 10.244.0.23
By removing the proxy from the equation, clients are aware of the actual pod ips, which may be important for some applications. This also simplifies the path between clients and the service, which may have performance benefits.
Kubernetes Services of type ClusterIP, NodePort and LoadBalancer have one thing in common: they load-balance between all pods that match the service's selector, so you can talk to all pods via a virtual IP.
That's nice because you can have multiple pods of the same application and all requests will be spread between those, avoiding overloading one pod while others are still idle.
For some applications you might need to talk to the pods directly instead of over a virtual IP. Still, you'll want a stable hostname that points to the same pod even if the pod's IP changes, e.g. because it needs to be rescheduled on a different host.
The use cases are mostly databases, where applications should always connect to the same instance for data and session consistency.
A headless service does not provide a virtual IP covering all the endpoints in its endpoint slice. If you query that service DNS, you will get all the endpoint IP addresses back. A regular service on the other hand will have sort of a virtual IP, so clients connecting to it are not aware of the underlying endpoints.
To answer your question of why this is important for a StatefulSet: usually, members of a StatefulSet need to be aware of each other. Let's say you want to run a distributed database in a cluster formation. The members of this cluster need a way to discover each other. This is where the headless service comes into play, because the individual database pods can get the addresses of the other database pods by using the headless service.
The headless service also provides a stable network identity for each member of the StatefulSet. You can read about that here. This is, again, useful as the members of the StatefulSet need to coordinate with each other in some way. Let's say pod-0 will always take the initial leader role, and every other member knows it should report itself to pod-0 to form a cluster-like configuration.
I used Cloud Foundry a lot previously. There, when an app is bound to a service, all the service connection info is injected into the app's environment variables. In the Kubernetes world, I think this is the same for a normal service.
I am trying to use a headless service to describe an external PostgreSQL, using the service YAML below.
---
kind: "Service"
apiVersion: "v1"
metadata:
  name: "postgresql"
spec:
  clusterIP: None
  ports:
    - protocol: "TCP"
      port: 5432
      targetPort: 5432
      nodePort: 0
---
kind: "Endpoints"
apiVersion: "v1"
metadata:
  name: "postgresql"
subsets:
  - addresses:
      - ip: "10.29.0.123"
    ports:
      - port: 5432
After deploying the headless service to the cluster, the container does not have any environment variables for it. I guess this is because clusterIP is None.
The apps can use postgresql:5432 to access it by DNS, but I wonder why Kubernetes does not inject the headless service and its endpoints into the app's environment variables, so the app can get both IP and port from them.
Is there any way to do so?
Thanks!
kube-proxy does not handle headless Services; a request made to such a Service is not proxied but goes straight to the endpoints behind it.
Kubernetes does not really acknowledge these endpoints (cf. https://kubernetes.io/docs/concepts/services-networking/service/#headless-services).
To pass the IP of your PostgreSQL DB, you will have to add an environment variable in your deployment, like this:
env:
  - name: POSTGRESQL_ADDR
    value: "10.29.0.123:5432"
I found the answer to the question. For a headless service, the service info is not exposed in the pod's environment variables. If the service info needs to be available as environment variables, use a regular (non-headless) Service without selectors: simply remove the "clusterIP: None".
The client pod can then use both DNS and environment variables for external service discovery.
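A sketch of that change, assuming the same external address 10.29.0.123 is kept. The Service is the one from the question with `clusterIP: None` (and the `nodePort` line) dropped, so Kubernetes allocates a cluster IP. Pods created after the Service exists should then see `POSTGRESQL_SERVICE_HOST` and `POSTGRESQL_SERVICE_PORT`. Note that the host variable holds the allocated cluster IP, not 10.29.0.123; traffic to it is forwarded to the address in the Endpoints object.

```yaml
---
kind: Service
apiVersion: v1
metadata:
  name: postgresql
spec:
  # No selector and no "clusterIP: None": a cluster IP is allocated, so the
  # usual <NAME>_SERVICE_HOST / <NAME>_SERVICE_PORT variables are injected.
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
---
kind: Endpoints
apiVersion: v1
metadata:
  name: postgresql        # must match the Service name
subsets:
  - addresses:
      - ip: "10.29.0.123"
    ports:
      - port: 5432
```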
I have a very specific case where my Pod needs to access another LoadBalancer service via its external IP.
Is there any way to assign the LoadBalancer's external IP as an ENV variable in Deployment.yaml?
Thank you in advance!
I don't think this is directly possible in any of the standard templating tools. Part of the problem is that creating a cloud-hosted load balancer is an asynchronous operation, so that external-IP value won't be available until some time after kubectl apply (or the equivalent helm install) has finished.
If you can create the Service in advance then you can hard-code its external IP address or host name into other configuration, but this is intrinsically two steps. (If you're bought into Kubernetes operators, this should be possible with custom code: watch the Service, and once it gets its external address, create a corresponding ConfigMap that holds the address.)
Depending on your specific use case it may also work to just target the LoadBalancer Service within your cluster, the same as any other Service. This won't go out through the cloud provider's load-balancer tier, but it should be indistinguishable otherwise.
I found a way to do it, but @David Maze was perfectly right: there is no straightforward way to do it.
So, my solution was to add DNS with public and private zones:
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  labels:
    app.kubernetes.io/name: nginx-lb
  annotations:
    external-dns.alpha.kubernetes.io/hostname: mycoolservice.{{ .Values.dns_external_zone }}.
    external-dns.alpha.kubernetes.io/zone-type: public,private
    external-dns.alpha.kubernetes.io/ttl: "1"
spec:
  type: LoadBalancer
  ports:
    - name: https
      port: 443
      targetPort: https
    - name: http
      port: 80
      targetPort: http
  selector:
    app.kubernetes.io/name: nginx
Gist
I have a ConfigMap which provides necessary environment variables to my pods:
apiVersion: v1
kind: ConfigMap
metadata:
  name: global-config
data:
  NODE_ENV: prod
  LEVEL: info
  # I need to set API_URL to the public IP address of the Load Balancer
  API_URL: http://<SOME IP>:3000
  DATABASE_URL: mongodb://database:27017
  SOME_SERVICE_HOST: some-service:3000
I am running my Kubernetes Cluster on Google Cloud, so it will automatically create a public endpoint for my service:
apiVersion: v1
kind: Service
metadata:
  name: gateway
spec:
  selector:
    app: gateway
  ports:
    - name: http
      port: 3000
      targetPort: 3000
      nodePort: 30000
  type: LoadBalancer
Issue
I have a web application that needs to make HTTP requests from the client's browser to the gateway service. But in order to make a request to the external service, the web app needs to know its IP address.
So I've set up the pod that serves the web application so that it picks up an environment variable "API_URL" and makes all HTTP requests to this URL.
So I just need a way to set the API_URL environment variable to the public IP address of the gateway service, so it can be passed into a pod when it starts.
I know this isn't the exact approach you were going for, but I've found that creating a static IP address and explicitly passing it in tends to be easier to work with.
First, create a static IP address:
gcloud compute addresses create gke-ip --region <region>
where region is the GCP region your GKE cluster is located in.
Then you can get your new IP address with:
gcloud compute addresses describe gke-ip --region <region>
Now you can add your static IP address to your service by specifying an explicit loadBalancerIP. [1]
apiVersion: v1
kind: Service
metadata:
  name: gateway
spec:
  selector:
    app: gateway
  ports:
    - name: http
      port: 3000
      targetPort: 3000
      nodePort: 30000
  type: LoadBalancer
  loadBalancerIP: "1.2.3.4"
At this point, you can also hard-code it into your ConfigMap and not worry about grabbing the value from the cluster itself.
[1] If you've already created a LoadBalancer with an auto-assigned IP address, setting an IP address won't change the IP of the underlying GCP load balancer. Instead, you should delete the LoadBalancer service in your cluster, wait ~15 minutes for the underlying GCP resources to get cleaned up, and then recreate the LoadBalancer with the explicit IP address.
You are trying to access the gateway service from the client's browser.
I would like to suggest another solution that is slightly different from what you are currently trying to achieve,
but that can solve your problem.
From your question I was able to deduce that your web app and gateway app are on the same cluster.
In my solution you don't need a Service of type LoadBalancer; a basic Ingress is enough to make it work.
You only need to create a Service object (notice that the option type: LoadBalancer is now gone):
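apiVersion: v1
kind: Service
metadata:
  name: gateway
spec:
  selector:
    app: gateway
  ports:
    - name: http
      port: 3000
      targetPort: 3000
      # nodePort is omitted: it is rejected on a ClusterIP Service
      # (it is only valid for types NodePort and LoadBalancer).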
You also need an Ingress object like the one below (remember that an Ingress controller needs to be deployed to the cluster for it to work).
More on how to deploy the NGINX Ingress controller can be found here;
if you are already using one (maybe a different one), you can skip this step.
apiVersion: networking.k8s.io/v1  # networking.k8s.io/v1beta1 was removed in Kubernetes 1.22
kind: Ingress
metadata:
  name: gateway-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: gateway.foo.bar.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gateway
                port:
                  number: 3000
Notice the host field.
You need to repeat the same for your web application. Remember to use the appropriate host name (DNS name),
e.g. foo.bar.com for the web app and gateway.foo.bar.com for the gateway,
and then just use the gateway.foo.bar.com DNS name to connect to the gateway app from the client's web browser.
You also need to create a DNS entry that points *.foo.bar.com to the Ingress's public IP address,
as the Ingress controller will create its own load balancer.
The flow of traffic would be like below:
+-------------+ +---------+ +-----------------+ +---------------------+
| Web Browser |-->| Ingress |-->| gateway Service |-->| gateway application |
+-------------+ +---------+ +-----------------+ +---------------------+
This approach is better because it won't cause issues with Cross-Origin Resource Sharing (CORS) in the client's browser.
I took the Ingress and Service manifests from the official Kubernetes documentation and modified them slightly.
More on Ingress you can find here
and on Services here
The following deployment reads the external IP of a given service using kubectl every 10 seconds and patches a given configmap with it:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: configmap-updater
  labels:
    app: configmap-updater
spec:
  selector:
    matchLabels:
      app: configmap-updater
  template:
    metadata:
      labels:
        app: configmap-updater
    spec:
      containers:
        - name: configmap-updater
          image: alpine:3.10
          command: ['sh', '-c']
          args:
            - |
              #!/bin/sh
              set -x
              apk --update add curl
              curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.16.0/bin/linux/amd64/kubectl
              chmod +x kubectl
              export CONFIGMAP="configmap/global-config"
              export SERVICE="service/gateway"
              while true
              do
                # Read the external IP from the Service ...
                IP=`./kubectl get $SERVICE -o go-template --template='{{ (index .status.loadBalancer.ingress 0).ip }}'`
                PATCH=`printf '{"data":{"API_URL": "https://%s:3000"}}' $IP`
                echo ${PATCH}
                # ... and write it into the ConfigMap.
                ./kubectl patch --type=merge -p "${PATCH}" $CONFIGMAP
                sleep 10
              done
You probably have RBAC enabled in your GKE cluster and would still need to create the appropriate Role and RoleBinding for this to work correctly.
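The Role and RoleBinding mentioned above might look like the following sketch. The ServiceAccount name `configmap-updater` and the `default` namespace are assumptions. The Deployment above does not set a service account, so it would also need `serviceAccountName: configmap-updater` in its pod spec for these permissions to apply.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: configmap-updater        # hypothetical name
  namespace: default             # assumed namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-updater
  namespace: default
rules:
  # Read the Service to obtain its external IP.
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["gateway"]
    verbs: ["get"]
  # Read and patch the one ConfigMap that is being updated.
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["global-config"]
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: configmap-updater
  namespace: default
subjects:
  - kind: ServiceAccount
    name: configmap-updater
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-updater
```

Restricting the rules with `resourceNames` keeps the updater from touching any other Service or ConfigMap in the namespace.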
You've got a few possibilities:
If you really need this to be hacked into your setup, you could use a similar approach with a sidecar container in your pod or a global service like above. Keep in mind that you would need to recreate your pods if the configmap actually changed for the changes to be picked up by the environment variables of your containers.
Watch and query the Kubernetes-API for the external IP directly in your application, eliminating the need for an environment variable.
Adapt your applications so they do not depend directly on the external IP.
I am running Kubernetes on "Docker Desktop" in Windows.
I have a LoadBalancer Service for a deployment which has 3 replicas.
I would like to access a SPECIFIC pod through some means (such as via a URL path: <serviceIP>:8090/pod1).
Is there any way to achieve this use case?
deployment.yaml :
apiVersion: v1
kind: Service
metadata:
  name: my-service1
  labels:
    app: stream
spec:
  ports:
    - port: 8090
      targetPort: 8090
      name: port8090
  selector:
    app: stream
  # clusterIP: None
  type: LoadBalancer
---
apiVersion: apps/v1  # apps/v1beta2 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: stream-deployment
  labels:
    app: stream
spec:
  replicas: 3
  selector:
    matchLabels:
      app: stream
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: stream
    spec:
      containers:
        - image: stream-server-mock:latest
          name: stream-server-mock
          imagePullPolicy: Never
          env:
            - name: STREAMER_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: STREAMER_ADDRESS
              value: stream-server-mock:8090
          ports:
            - containerPort: 8090
My end goal is to achieve horizontal auto-scaling of pods.
How the application is designed and works as of now (without Kubernetes):
There are 3 components: REST-Server, Stream-Server (3 instances running locally on different JVMs on different ports), and RabbitMQ.
1 - The client sends a request to the REST-Server for a stream URL.
2 - The REST-Server puts it in the RabbitMQ queue.
3 - One of the Stream-Servers picks it up, populates its IP, and sends it back to the REST-Server through RabbitMQ.
4 - The client receives the IP and establishes a direct WS connection using the IP.
The problem I face is:
1 - When the client requests a stream IP, one of the pods (let's say POD1) picks it up and sends its URL (which is the service URL, since it comes through the LoadBalancer Service).
2 - The next time the client tries to connect (WebSocket connection) using the Service IP, it won't necessarily reach the same pod that accepted the request.
It should be the same pod that accepted the request, and it must be accessible by the client.
You can use a StatefulSet if you are not required to use a Deployment.
With 3 replicas, you will have 3 pods named
stream-deployment-0
stream-deployment-1
stream-deployment-2
You can access each pod as $(podname).$(service name).$(namespace).svc.cluster.local
For details, check this
You may also want to set up an ingress to point each pod from outside of the cluster.
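For those per-pod DNS names to exist, the StatefulSet has to reference a headless Service through `serviceName`. A minimal sketch, reusing the image, labels and port from the question; the Service name `stream` is an illustrative choice, and the StatefulSet keeps the name `stream-deployment` only so the pod names match the ones listed above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: stream               # hypothetical name for the governing Service
  labels:
    app: stream
spec:
  clusterIP: None            # headless: DNS returns the pod addresses directly
  selector:
    app: stream
  ports:
    - name: port8090
      port: 8090
      targetPort: 8090
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stream-deployment
spec:
  serviceName: stream        # must name the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: stream
  template:
    metadata:
      labels:
        app: stream
    spec:
      containers:
        - name: stream-server-mock
          image: stream-server-mock:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 8090
```

With this in place, the first pod should be reachable from inside the cluster as stream-deployment-0.stream.<namespace>.svc.cluster.local:8090. These names are cluster-internal, so clients outside the cluster still need something like the Ingress mentioned above.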
As mentioned by aerokite, you can use StatefulSets. However, if you don't want to modify your deployments, you can simply use Headless Services. As specified in the documentation:
For headless Services, a cluster IP is not allocated.
For headless Services that define selectors, the endpoints controller
creates Endpoints records in the API, and modifies the DNS
configuration to return records (addresses) that point directly to the
Pods backing the Service.
This means that whenever you query the DNS name for your Service (i.e. my-svc.my-namespace.svc.cluster-domain.example), what you get is a list of all the Pod IPs (unlike regular services where you get the cluster IP). You can then select your Pods using your own mechanisms.
Regarding your new question, if that is your only issue, you can use session affinity. If you set service.spec.sessionAffinity to ClientIP, then connections from a particular client will always go to the same Pod each time. You don't need other modifications like the headless Services mentioned above.
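As a sketch, session affinity on the Service from the question would look roughly like this. The `sessionAffinityConfig` block is optional and only spells out the default timeout.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service1
  labels:
    app: stream
spec:
  type: LoadBalancer
  selector:
    app: stream
  sessionAffinity: ClientIP        # the default is None
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # Kubernetes default (3 hours), shown explicitly
  ports:
    - name: port8090
      port: 8090
      targetPort: 8090
```

Affinity is keyed on the client IP as the cluster sees it, so it may be less effective when many clients arrive through the same NAT or proxy address.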
IMO, the only way to achieve this will be:
Instead of using a deployment with 3 replicas, use 3 deployments with 1 replica each (or just create bare pods): deployment1 -> pod1, deployment2 -> pod2, deployment3 -> pod3
Expose each deployment through its own service: service1 -> deployment1, service2 -> deployment2, service3 -> deployment3
Create an ingress resource and route to each pod using the service for each deployment. For example:
ingress-url/service1
ingress-url/service2
ingress-url/service3
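A sketch of such an Ingress, assuming the three Services are named service1 to service3 as above and each exposes port 8090 as in the question. The Ingress name is hypothetical, and an Ingress controller must already be running in the cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stream-ingress            # hypothetical name
spec:
  rules:
    - http:
        paths:
          - path: /service1
            pathType: Prefix
            backend:
              service:
                name: service1
                port:
                  number: 8090    # assumed port, taken from the question
          - path: /service2
            pathType: Prefix
            backend:
              service:
                name: service2
                port:
                  number: 8090
          - path: /service3
            pathType: Prefix
            backend:
              service:
                name: service3
                port:
                  number: 8090
```

The backend receives the request path including the /serviceN prefix unless the Ingress controller is configured to rewrite it.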