GKE Config Connector StorageBucket resource times out on kubectl apply - kubernetes

I'm trying to apply the following StorageBucket resource from Google's sample manifest:
apiVersion: storage.cnrm.cloud.google.com/v1alpha2
kind: StorageBucket
metadata:
  labels:
    label-one: "value-one"
  name: dmacthedestroyer-hdjkwhekhjewkjeh-storagebucket-sample
spec:
  lifecycleRule:
    - action:
        type: Delete
      condition:
        age: 7
  versioning:
    enabled: true
  cors:
    - origin: ["http://example.appspot.com"]
      responseHeader: ["Content-Type"]
      method: ["GET", "HEAD", "DELETE"]
      maxAgeSeconds: 3600
The response times out with the following errors:
$ kubectl apply -f sample.yaml
Error from server (Timeout): error when creating "sample.yaml": Timeout: request did not complete within requested timeout 30s
UPDATE:
For some unknown reason, the error message has changed to this:
Error from server (InternalError): error when creating "sample.yaml": Internal error occurred: failed calling webhook "cnrm-deny-unknown-fields-webhook.google.com": Post https://cnrm-validating-webhook-service.cnrm-system.svc:443/deny-unknown-fields?timeout=30s: net/http: TLS handshake timeout
I've tested this on two different networks, with the same error result.
I installed the Config Connector components as described in their documentation, using a dedicated service account granted the roles/owner role, exactly as those instructions state.
I have successfully deployed IAMServiceAccount and IAMServiceAccountKey resources with this setup.
How should I proceed to troubleshoot this?

My issue was due to an incorrect service account configuration.
In particular, I was assigning the owner role to a different project.
After properly configuring my service account, the timeout errors were resolved.
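For anyone hitting the same issue, a quick way to confirm which roles your Config Connector service account holds on the project it is supposed to manage (the project ID and service account name below are placeholders, not the ones from my setup):
# List the roles bound to the Config Connector service account on the target project
gcloud projects get-iam-policy MY_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:cnrm-system@MY_PROJECT_ID.iam.gserviceaccount.com" \
  --format="table(bindings.role)"

# Grant roles/owner on the correct project if it is missing
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member="serviceAccount:cnrm-system@MY_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/owner"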

Related

HTTPRoute set a timeout

I am trying to set up a multi-cluster architecture. I have a Spring Boot API that I want to run on a second cluster (for isolation purposes). I have set that up using the gateway.networking.k8s.io API. I am using a Gateway that has an SSL certificate and matches an IP address that's registered to my domain in the DNS registry. I am then setting up an HTTPRoute for each service that I am running on the second cluster. That works fine and I can communicate between our clusters and everything works as intended but there is a problem:
There is a timeout of 30s by default and I cannot change it. I want to increase it, as the application in the second cluster is a WebSocket service and I obviously would like our WebSocket connections to stay open for more than 30s at a time. I can see that in the backend service that's created from our HTTPRoute there is a timeout specified as 30s. I found a command to increase it: gcloud compute backend-services update gkemcg1-namespace-store-west-1-8080-o1v5o5p1285j --timeout=86400
When I run that command, the timeout is increased and the WebSocket connection is kept alive. But after a few minutes the change gets overridden (I suspect because the backend service is managed by the YAML file). This is the YAML for my HTTPRoute, from which the backend service is created:
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: public-store-route
  namespace: namespace
  labels:
    gateway: external-http
spec:
  hostnames:
  - "my-website.example.org"
  parentRefs:
  - name: external-http
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /west
    backendRefs:
    - group: net.gke.io
      kind: ServiceImport
      name: store-west-1
      port: 8080
I have tried to add a timeout, timeoutSec, or timeoutSeconds at every level, with no success. I always get the following error:
error: error validating "public-store-route.yaml": error validating data: ValidationError(HTTPRoute.spec.rules[0].backendRefs[0]): unknown field "timeout" in io.k8s.networking.gateway.v1beta1.HTTPRoute.spec.rules.backendRefs; if you choose to ignore these errors, turn validation off with --validate=false
Surely there must be a way to configure this. But I wasn't able to find anything in the documentation referring to a timeout. Am I missing something here?
How do I configure the timeout?
Edit:
I have found this resource: https://cloud.google.com/kubernetes-engine/docs/how-to/configure-gateway-resources
I have been trying to set up an LBPolicy and attach it to the Gateway, HTTPRoute, Service, or ServiceImport, but nothing has made a difference. Am I doing something wrong, or is this not working the way it is supposed to? This is my YAML:
kind: LBPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: store-timeout-policy
  namespace: sandstone-test
spec:
  default:
    timeoutSec: 50
  targetRef:
    name: public-store-route
    group: gateway.networking.k8s.io
    kind: HTTPRoute

Kubernetes: Receiving Server returned HTTP response code: 403 for URL: https://kubernetes.default.svc.cluster.local:443 while deploying Ignite

While deploying Apache Ignite on Kubernetes, I am getting the following error in the logs of the Apache Ignite pods:
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/ignite-stack/endpoints/ignite-service
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at org.apache.ignite.internal.kubernetes.connection.KubernetesServiceAddressResolver.getServiceAddresses(KubernetesServiceAddressResolver.java:109)
... 21 more
This kind of error:
Server returned HTTP response code: 403 for URL: https://kubernetes.default.svc.cluster.local:443
usually happens when the ServiceAccount, the secret token for the ServiceAccount, and the ClusterRoleBinding are not in sync (meaning there is some discrepancy: either the token does not belong to the ServiceAccount, or they live in different namespaces). The API server therefore cannot validate the request as authentic and returns a 403 error.
In my case, the service account was being referenced with the wrong namespace, whereas, as you can see from my service account details, it was actually configured with the following namespace:
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-09-12T14:35:38Z"
  name: ignite
  namespace: ignite-stack
  resourceVersion: "854530"
  uid: b6177528-b05c-4d10-9446-09f54164ca39
secrets:
- name: ignite-secret
So make sure to check these mistakes first.
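A minimal sketch of those checks (resource names follow the example above; the ClusterRoleBinding name is an assumption, use whatever your grep turns up):
# Confirm the ServiceAccount exists in the namespace Ignite runs in
kubectl get serviceaccount ignite -n ignite-stack -o yaml

# Check which ServiceAccount (and namespace) the ClusterRoleBinding points to
kubectl get clusterrolebinding -o wide | grep ignite
kubectl describe clusterrolebinding ignite-binding   # name assumed

# Verify the pods actually use that ServiceAccount
kubectl get pods -n ignite-stack -o jsonpath='{.items[*].spec.serviceAccountName}'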

Webhook failing in rabbitmq

I have installed a RabbitMQ cluster using the RabbitMQ cluster operator. I have also added the RabbitMQ topology operator. I am trying to create queues with the topology operator using the following YAML file:
apiVersion: rabbitmq.com/v1beta1
kind: Queue
metadata:
  name: software-results
  namespace: rabbitmq-system
spec:
  name: software-results # name of the queue
  type: quorum # without providing a queue type, rabbitmq creates a classic queue
  autoDelete: false
  durable: true # setting 'durable' to false means this queue won't survive a server restart
  rabbitmqClusterReference:
    name: client-queues
I am getting the following error:
Error from server (InternalError): error when creating "singleQueue.yml": Internal error occurred: failed calling webhook "vqueue.kb.io": failed to call webhook: Post "https://webhook-service.rabbitmq-system.svc:443/validate-rabbitmq-com-v1beta1-queue?timeout=10s": dial tcp 10.97.65.156:443: connect: connection refused
I tried searching for this but didn't find much. Can anyone help me understand what exactly is going wrong here?
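A "connection refused" from the webhook address usually means nothing is listening behind the Service. A rough sketch of checks that can narrow it down (the operator deployment name below is an assumption based on the topology operator's defaults):
# Is the topology operator pod running and ready?
kubectl get pods -n rabbitmq-system

# Does the webhook Service have any endpoints behind it?
kubectl get endpoints webhook-service -n rabbitmq-system

# Any certificate or startup errors in the operator logs?
kubectl logs -n rabbitmq-system deploy/messaging-topology-operator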

GKE config connector issue - Post i/o timeout

I am running into the error below when creating a compute IP.
Config Connector is already enabled, and it is a private cluster hosted on a shared network.
Version 1.17.15-gke.800
$ kubectl apply -f webapp-compute-ip.yaml
Error from server (InternalError): error when creating "webapp-compute-ip.yaml": Internal error occurred: failed calling webhook "annotation-defaulter.cnrm.cloud.google.com": Post https://cnrm-validating-webhook.cnrm-system.svc:443/annotation-defaulter?timeout=30s: dial tcp 192.168.66.130:9443: i/o timeout
$ cat webapp-compute-ip.yaml
apiVersion: compute.cnrm.cloud.google.com/v1beta1
kind: ComputeAddress
metadata:
  name: webapp-ip-test
  namespace: sandbox
  labels:
    app: webapp
    environment: test
  annotations:
    cnrm.cloud.google.com/project-id: "cluster-name"
spec:
  location: global
This problem was due to a Config Connector version issue.
There was a change in the webhook default port, from 443 to 9443.
The Config Connector version depends on the GKE version; I did not have any control over it, and there is no public documentation on which Config Connector version ships with which GKE version. There is an existing request for this here.
The solution for me was to add port 9443 to the firewall rule.
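For a private cluster on a shared VPC, this typically means allowing the control plane's IP range to reach the nodes on the webhook port. A rough sketch of such a rule (network name, control-plane CIDR, and node tag are placeholders to replace with your own values):
gcloud compute firewall-rules create allow-cnrm-webhook \
  --network=SHARED_VPC_NETWORK \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:9443 \
  --source-ranges=CONTROL_PLANE_CIDR \
  --target-tags=GKE_NODE_TAG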

GKE: Service account for Config Connector lacks permissions

I'm attempting to get Config Connector up and running on my GKE project and am following this getting started guide.
So far I have enabled the appropriate APIs:
> gcloud services enable cloudresourcemanager.googleapis.com
Created my service account and added policy binding:
> gcloud iam service-accounts create cnrm-system
> gcloud iam service-accounts add-iam-policy-binding cnrm-system@test-connector.iam.gserviceaccount.com --member="serviceAccount:test-connector.svc.id.goog[cnrm-system/cnrm-controller-manager]" --role="roles/iam.workloadIdentityUser"
> kubectl wait -n cnrm-system --for=condition=Ready pod --all
Annotated my namespace:
> kubectl annotate namespace default cnrm.cloud.google.com/project-id=test-connector
And then ran through applying the Spanner YAML from the example and describing the resulting resource:
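For reference, a SpannerInstance manifest matching the spec shown below would look roughly like this (reconstructed from the describe output, not copied from the guide):
apiVersion: spanner.cnrm.cloud.google.com/v1beta1
kind: SpannerInstance
metadata:
  labels:
    label-one: value-one
  name: spannerinstance-sample
spec:
  config: northamerica-northeast1-a
  displayName: Spanner Instance Sample
  numNodes: 1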
~ >>> kubectl describe spannerinstance spannerinstance-sample
Name:         spannerinstance-sample
Namespace:    default
Labels:       label-one=value-one
Annotations:  cnrm.cloud.google.com/management-conflict-prevention-policy: resource
              cnrm.cloud.google.com/project-id: test-connector
API Version:  spanner.cnrm.cloud.google.com/v1beta1
Kind:         SpannerInstance
Metadata:
  Creation Timestamp:  2020-09-18T18:44:41Z
  Generation:          2
  Resource Version:    5805305
  Self Link:           /apis/spanner.cnrm.cloud.google.com/v1beta1/namespaces/default/spannerinstances/spannerinstance-sample
  UID:
Spec:
  Config:        northamerica-northeast1-a
  Display Name:  Spanner Instance Sample
  Num Nodes:     1
Status:
  Conditions:
    Last Transition Time:  2020-09-18T18:44:41Z
    Message:               Update call failed: error fetching live state: error reading underlying resource: Error when reading or editing SpannerInstance "test-connector/spannerinstance-sample": googleapi: Error 403: Request had insufficient authentication scopes.
    Reason:                UpdateFailed
    Status:                False
    Type:                  Ready
Events:
  Type     Reason        Age    From                        Message
  ----     ------        ----   ----                        -------
  Warning  UpdateFailed  6m41s  spannerinstance-controller  Update call failed: error fetching live state: error reading underlying resource: Error when reading or editing SpannerInstance "test-connector/spannerinstance-sample": googleapi: Error 403: Request had insufficient authentication scopes.
I'm not really sure what's going on here, because my cnrm service account has ownership of the project my cluster is in, and I have the APIs listed in the guide enabled.
The CC pods themselves appear to be healthy:
~ >>> kubectl wait -n cnrm-system --for=condition=Ready pod --all
pod/cnrm-controller-manager-0 condition met
pod/cnrm-deletiondefender-0 condition met
pod/cnrm-resource-stats-recorder-58cb6c9fc-lf9nt condition met
pod/cnrm-webhook-manager-7658bbb9-kxp4g condition met
Any insight in to this would be greatly appreciated!
Based on the error message you have posted, I suspect it is an issue with your GKE access scopes.
For GKE to access other GCP APIs, you must allow that access when creating the cluster. You can check the enabled scopes with the command:
gcloud container clusters describe <cluster-name>
and look for oauthScopes in the result.
Here you can see the scope name for Cloud Spanner; you must enable at minimum the scope https://www.googleapis.com/auth/cloud-platform.
To verify this in the GUI, go to Kubernetes Engine > <cluster-name>, expand the Permissions section, and look for Cloud Platform.
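As a rough sketch of how to apply this (node scopes are fixed at node-pool creation, so the usual workaround is to add a node pool with the broader scope; cluster, zone, and pool names below are placeholders):
# Inspect the scopes currently assigned to the nodes
gcloud container clusters describe CLUSTER_NAME --zone=ZONE --format="value(nodeConfig.oauthScopes)"

# Add a node pool whose nodes carry the cloud-platform scope
gcloud container node-pools create pool-with-cloud-platform \
  --cluster=CLUSTER_NAME \
  --zone=ZONE \
  --scopes=https://www.googleapis.com/auth/cloud-platform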