API auth error connecting Prometheus to Kubernetes (OpenShift Origin)

I have a Kubernetes cluster (OpenShift Origin v3.6) and Prometheus (v1.8.1) running on two separate servers. I am trying to monitor Kubernetes with Prometheus.
I got the default service account token and put it in /etc/prometheus/token:
oc sa get-token default
Then I added this to the Prometheus configuration file:
...
...
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
    - role: endpoints
      api_server: 'https://my.kubernetes.master:8443'
  scheme: https
  bearer_token_file: /etc/prometheus/token
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
...
After restarting Prometheus, I see the following error log repeating over and over again:
Nov 23 22:43:05 prometheus prometheus[17830]: time="2017-11-23T22:43:05Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:183: Failed to list *v1.Pod: User "system:anonymous" cannot list all pods in the cluster" component="kube_client_runtime" source="kubernetes.go:76"
I found this here:
If no access token or certificate is presented, the authentication layer assigns the system:anonymous virtual user and the system:unauthenticated virtual group to the request. This allows the authorization layer to determine which requests, if any, an anonymous user is allowed to make.
I believe my configuration is wrong somewhere, and Prometheus is not using the token to authenticate.
So, what's wrong with my setup and how can I fix it? Thanks in advance.

Let's begin with authentication. Since you have provided Prometheus with the default service account token, it is authenticated normally: the API server knows who it is.
Now, authorization is what is giving you the problem here, as you can see in the error:
"system:anonymous" cannot list all pods in the cluster"
It means the authenticated service account does not have the permission to execute this operation, so you cannot do it.
Solution to your problem:
Check if there is a suitable ClusterRole available for Prometheus, as Prometheus needs cluster-wide permissions to do its job. If not, create a ClusterRole.
Check if there is a ClusterRoleBinding for the default service account. If not, create a ClusterRoleBinding.
I have attached a link for further reading: RBAC
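As a rough sketch, the pair of objects could look like this. I'm assuming the default service account lives in the default namespace, and the names and resource list are illustrative; on OpenShift 3.6 the RBAC API version may be rbac.authorization.k8s.io/v1beta1 rather than v1:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# Read-only access to the objects Prometheus discovers and scrapes
- apiGroups: [""]
  resources: ["nodes", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
# The service account whose token Prometheus is using
- kind: ServiceAccount
  name: default
  namespace: default
```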

Related

Istio enabled GKE cluster not reliably communicating with Google Service Infrastructure APIs

I have been unable to reliably allow my Istio-enabled Google Kubernetes Engine cluster to connect to Google Cloud Endpoints (Service Management API) via the Extensible Service Proxy. When I deploy my Pods, the proxy always fails to start up, causing the Pod to be restarted, and outputs the following error:
INFO:Fetching an access token from the metadata service
WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fea4abece90>: Failed to establish a new connection: [Errno 111] Connection refused',)': /computeMetadata/v1/instance/service-accounts/default/token
ERROR:Failed fetching metadata attribute: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
However after restarting, the proxy reports everything is fine, it was able to grab an access token and I am able to make requests to the Pod successfully:
INFO:Fetching an access token from the metadata service
INFO:Fetching the service config ID from the rollouts service
INFO:Fetching the service configuration from the service management service
INFO:Attribute zone: europe-west2-a
INFO:Attribute project_id: my-project
INFO:Attribute kube_env: KUBE_ENV
nginx: [warn] Using trusted CA certificates file: /etc/nginx/trusted-ca-certificates.crt
10.154.0.5 - - [23/May/2020:21:19:36 +0000] "GET /domains HTTP/1.1" 200 221 "-" "curl/7.58.0"
After about an hour, presumably because the access token has expired, the proxy logs indicate that it was again unable to fetch an access token and I can no longer make requests to my Pod.
2020/05/23 22:14:04 [error] 9#9: upstream timed out (110: Connection timed out)
2020/05/23 22:14:04 [error] 9#9: Failed to fetch service account token
2020/05/23 22:14:04 [error] 9#9: Fetch access token unexpected status: INTERNAL: Failed to fetch service account token
I have in place a ServiceEntry resource that should be allowing the proxy to make requests to the metadata server on the GKE node:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: google-metadata-server
spec:
  hosts:
  - metadata.google.internal # GCE metadata server
  addresses:
  - 169.254.169.254 # GCE metadata server
  location: MESH_EXTERNAL
  ports:
  - name: http
    number: 80
    protocol: HTTP
  - name: https
    number: 443
    protocol: HTTPS
I have confirmed this is working by execing into one of the containers and running:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
How can I prevent this behaviour and reliably have the proxy communicate with the Google Service Infrastructure APIs?
Although I am not entirely convinced this is the solution, it appears that using a dedicated service account to generate access tokens within the Extensible Service Proxy container prevents the behaviour reported above, and I am able to reliably make requests to the proxy and upstream service, even after an hour.
The service account I am using has the following roles:
roles/cloudtrace.agent
roles/servicemanagement.serviceController
Assuming this is a stable solution to the problem, I am much happier with this outcome because I am not 100% comfortable using the metadata server, since it relies on the service account associated with the GKE node. That service account is often more powerful than it needs to be for ESP to do its job.
I will however be continuing to monitor this just in case the proxy upstream becomes unreachable again.

How do I forward headers to different services in Kubernetes (Istio)

I have a sample application (web-app, backend-1, backend-2) deployed on minikube, all under a JWT policy, and they all have proper destination rules, Istio sidecars, and mTLS enabled in order to secure the east-west traffic.
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: oidc
spec:
  targets:
  - name: web-app
  - name: backend-1
  - name: backend-2
  peers:
  - mtls: {}
  origins:
  - jwt:
      issuer: "http://myurl/auth/realms/test"
      jwksUri: "http://myurl/auth/realms/test/protocol/openid-connect/certs"
  principalBinding: USE_ORIGIN
When I run the following command, I receive a 401 Unauthorized response when requesting data from the backend, which is due to $TOKEN not being forwarded in the headers of the HTTP request to backend-1 and backend-2.
$> curl http://minikubeip/api -H "Authorization: Bearer $TOKEN"
Is there a way to forward http headers to backend-1 and backend-2 using native kubernetes/istio? Am I forced to make application code changes to accomplish this?
Edit:
This is the error I get after applying my oidc policy. When I curl web-app with the auth token I get
{"errors":[{"code":"APP_ERROR_CODE","message":"401 Unauthorized"}
Note that when I curl backend-1 or backend-2 with the same auth-token I get the appropriate data. Also, there is no other destination rule/policy applied to these services currently, policy enforcement is on, and my istio version is 1.1.15.
This is the policy I am applying:
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: default
  namespace: default
spec:
  # peers:
  # - mtls: {}
  origins:
  - jwt:
      issuer: "http://10.148.199.140:8080/auth/realms/test"
      jwksUri: "http://10.148.199.140:8080/auth/realms/test/protocol/openid-connect/certs"
  principalBinding: USE_ORIGIN
Should the token be propagated to backend-1 and backend-2 without any other changes?
Yes, the policy should transfer the token to both backend-1 and backend-2.
There is a GitHub issue where users had the same problem as you.
A few pieces of information from there:
The JWT is verified by an Envoy filter, so you'll have to check the Envoy logs. For the code, see https://github.com/istio/proxy/tree/master/src/envoy/http/jwt_auth
Pilot retrieves the JWKS to be used by the filter (it is inlined into the Envoy config); you can find the code for that in pilot/pkg/security
There is also another question about this on Stack Overflow,
where the accepted answer is:
The problem was resolved with two options: 1. Replace the service name and port with the external server IP and external port (for issuer and jwksUri). 2. Disable the usage of mTLS and its policy (known issue: https://github.com/istio/istio/issues/10062).
From istio documentation
For each service, Istio applies the narrowest matching policy. The order is: service-specific > namespace-wide > mesh-wide. If more than one service-specific policy matches a service, Istio selects one of them at random. Operators must avoid such conflicts when configuring their policies.
To enforce uniqueness for mesh-wide and namespace-wide policies, Istio accepts only one authentication policy per mesh and one authentication policy per namespace. Istio also requires mesh-wide and namespace-wide policies to have the specific name default.
If a service has no matching policies, both transport authentication and origin authentication are disabled.
Istio now supports header propagation; it probably didn't when this thread was created.
You can allow the original header to be forwarded by setting forwardOriginalToken: true in JWTRules, or forward a valid JWT payload using outputPayloadToHeader in JWTRules.
Reference: Istio JWTRule documentation
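A minimal sketch of what that looks like with the newer RequestAuthentication API (which replaced the Policy resource in later Istio releases); the selector label, issuer, and jwksUri values here are placeholders based on the question:

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-forward
  namespace: default
spec:
  selector:
    matchLabels:
      app: web-app
  jwtRules:
  - issuer: "http://myurl/auth/realms/test"
    jwksUri: "http://myurl/auth/realms/test/protocol/openid-connect/certs"
    # Keep the original Authorization header on requests forwarded upstream
    forwardOriginalToken: true
    # Optionally expose the verified JWT payload to upstream services in a header
    outputPayloadToHeader: x-jwt-payload
```

With forwardOriginalToken enabled, backend-1 and backend-2 receive the same bearer token that web-app was called with, without application code changes.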

Is there a way to apply authentication to every service apart from a single specific service in Istio?

I have the following policy, which integrates with our Auth0 account:
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: auth-policy
spec:
  targets:
  - name: my-service
  origins:
  - jwt:
      issuer: "https://<redacted>.eu.auth0.com/"
      jwksUri: "https://<redacted>.eu.auth0.com/.well-known/jwks.json"
  principalBinding: USE_ORIGIN
which applies our Auth0 config to everything, but I want to disable this for a single service. I suppose I want to set targets as 'everything except x'.
Is there a way of doing this?
On the Policy's spec.targets you specified only one service name, my-service, so it applies only to the service with that name. In the target selector docs you can find that this approach requires you to specify each service name (you can also specify a port number). Three types of policies are mentioned there: mesh-wide, namespace-wide, and service-specific.
Regarding a solution to your question, I can suggest 3 options:
1. Separate namespaces for enabled and disabled services
You can place the services which should be authenticated in another namespace and enable a namespace-wide policy. A variation of this option is to move the services which should not be authenticated to a different namespace and enable authentication in the default namespace.
2. Namespace-wide policy with an individual service
It is similar to the previous option, but instead of creating a new namespace-wide policy you just create a service-specific policy which disables authentication for one service.
It should work as the priority order is
service-specific > namespace-wide > mesh-wide
3. MeshPolicy with Individual service
If you want to keep all services in one namespace, you can use a MeshPolicy to apply the policy to all services in all namespaces in the mesh, and then create a service-specific policy to disable authentication. As mentioned before, a service-specific policy has higher priority than a MeshPolicy.
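The service-specific override in options 2 and 3 could be sketched like this, assuming the excluded service is named my-open-service (a hypothetical name); a Policy that targets the service but declares no origins effectively turns origin authentication off for it:

```yaml
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: disable-auth-my-open-service
spec:
  targets:
  - name: my-open-service
  # No origins (and no peers) listed: origin authentication
  # is not required for this service, overriding the broader policy
```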
Hope it will help.

openshift Crash Loop Back Off error with turbine-server

Hi, I created a project in OpenShift and attempted to add a turbine-server image to it. A Pod was added, but I keep receiving the following error in the logs. I am very new to OpenShift and I would appreciate any advice or suggestions on how to resolve this error. I can supply any further information that is required.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/booking/pods/turbine-server-2-q7v8l . Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
How to diagnose
Make sure you have configured a service account, a role, and a role binding to the account. Make sure the service account is set in the pod spec.
spec:
  serviceAccountName: your-service-account
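For reference, a minimal sketch of those three objects together; the names are illustrative, and I'm using the booking namespace from the error message in the question:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: your-service-account
  namespace: booking
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: booking
rules:
# turbine-server was denied GET on pods, so grant read access to pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: booking
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- kind: ServiceAccount
  name: your-service-account
  namespace: booking
```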
Start monitoring the atomic-openshift-node service on the node where the pod is deployed, and the API server.
$ journalctl -b -f -u atomic-openshift-node
Run the pod and monitor the journald output. You will see "Forbidden":
Jan 28 18:27:38 <hostname> atomic-openshift-node[64298]:
logging error output: "Forbidden (user=system:serviceaccount:logging:appuser, verb=get, resource=nodes, subresource=proxy)"
This means the service account appuser does not have the authorisation to do get on the nodes/proxy resource. Update the role to allow the verb "get" on that resource:
- apiGroups: [""]
resources:
- "nodes"
- "nodes/status"
- "nodes/log"
- "nodes/metrics"
- "nodes/proxy" <----
- "nodes/spec"
- "nodes/stats"
- "namespaces"
- "events"
- "services"
- "pods"
- "pods/status"
verbs: ["get", "list", "view"]
Note that some resources are not in the default legacy "" API group, as in Unable to list deployments resources using RBAC.
How to verify the authorisations
To verify who can execute a given verb against a resource, for example the patch verb against pods:
$ oadm policy who-can patch pod
Namespace: default
Verb:      patch
Resource:  pods
Users:     auser
           system:admin
           system:serviceaccount:cicd:jenkins
Groups:    system:cluster-admins
           system:masters
OpenShift vs K8S
OpenShift has the oc policy and oadm policy commands:
oc policy add-role-to-user <role> <user-name>
oadm policy add-cluster-role-to-user <role> <user-name>
These are the same as K8s role bindings. You can use K8s RBAC, but the API version in OpenShift needs to be v1, instead of rbac.authorization.k8s.io/v1 as in K8s.
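For comparison, the equivalent binding expressed in plain K8s RBAC looks like this (a sketch; the role, user, and namespace names are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: edit-binding
  namespace: myproject
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit       # the <role> from the oc policy command above
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: some-user  # the <user-name> from the oc policy command above
```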
References
Managing Authorization Policies
Using RBAC Authorization
User and Role Management
Hi, thank you for the replies. I was able to resolve the issue by executing the following commands using the oc command-line utility:
oc policy add-role-to-group view system:serviceaccounts -n <project>
oc policy add-role-to-group edit system:serviceaccounts -n <project>

forbidden returned when mounting the default tokens in HA kubernetes cluster

I have a problem with mounting the default tokens in Kubernetes; it no longer works for me. I wanted to ask directly before creating an issue on GitHub. My setup is basically an HA bare-metal cluster with a manually deployed etcd (which includes certs, CA, and keys). The nodes register, but I just cannot deploy pods; it always gives the error:
MountVolume.SetUp failed for volume "default-token-ddj5s" : secrets "default-token-ddj5s" is forbidden: User "system:node:tweak-node-1" cannot get secrets in the namespace "default": no path found to object
where tweak-node-1 is one of my node names and hostnames. I have found some similar issues:
- https://github.com/kubernetes/kubernetes/issues/18239
- https://github.com/kubernetes/kubernetes/issues/25828
but none came close to fixing my issue, as the problem was not the same. I only use the default namespace when trying to run pods, and I tried setting both RBAC and ABAC; both gave the same result. This is the template I use for deployment, showing the version and etcd config:
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: IP1
  bindPort: 6443
authorizationMode: ABAC
kubernetesVersion: 1.8.5
etcd:
  endpoints:
  - https://IP1:2379
  - https://IP2:2379
  - https://IP3:2379
  caFile: /opt/cfg/etcd/pki/etcd-ca.crt
  certFile: /opt/cfg/etcd/pki/etcd.crt
  keyFile: /opt/cfg/etcd/pki/etcd.key
  dataDir: /var/lib/etcd
etcdVersion: v3.2.9
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- IP1
- IP2
- IP3
- DNS-NAME1
- DNS-NAME2
- DNS-NAME3
Your node must use credentials that match its Node API object name, as described in https://kubernetes.io/docs/admin/authorization/node/#overview
In order to be authorized by the Node authorizer, kubelets must use a credential that identifies them as being in the system:nodes group, with a username of system:node:<nodeName>. This group and user name format match the identity created for each kubelet as part of kubelet TLS bootstrapping.
Update
So the specific solution: the problem was that, because I was using version 1.8.x and copying the certs and keys manually, each kubelet didn't have its own system:node binding or specific key, as specified in https://kubernetes.io/docs/admin/authorization/node/#overview:
RBAC Node Permissions In 1.8, the binding will not be created at all.
When using RBAC, the system:node cluster role will continue to be
created, for compatibility with deployment methods that bind other
users or groups to that role.
I fixed it using either of two ways:
1 - Using kubeadm join instead of copying the /etc/kubernetes files from master1
2 - After deployment, patching the clusterrolebinding for system:node:
kubectl patch clusterrolebinding system:node -p '{"apiVersion": "rbac.authorization.k8s.io/v1beta1", "kind": "ClusterRoleBinding", "metadata": {"name": "system:node"}, "subjects": [{"kind": "Group", "name": "system:nodes"}]}'
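Equivalently, the binding that the patch produces can be written out as a full manifest, a sketch of which (with the subject apiGroup made explicit) would be:

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
# Bind the role to every kubelet in the system:nodes group
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
```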