What am I missing to set up an HA vault cluster using raft and autounseal with transit engine - hashicorp-vault

I am setting up Vault with a Raft backend and I'm attempting to set up a cluster using this guide: https://learn.hashicorp.com/tutorials/vault/raft-storage
I got it working without TLS; however, I am experiencing errors when trying to implement TLS. Setting up the transit engine and the first Raft node is fine, but when trying to set up the third node (which would be the second in the Raft cluster), I get the following errors.
> 2022-07-21T11:08:41.407Z [INFO] core: stored unseal keys supported, attempting fetch
> 2022-07-21T11:08:41.407Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
> 2022-07-21T11:08:41.407Z [INFO] core: security barrier not initialized
> 2022-07-21T11:08:41.408Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://10.20.30.42:8200
> 2022-07-21T11:08:41.462Z [WARN] core: join attempt failed: error="error during raft bootstrap init call: Put "https://10.20.30.42:8200/v1/sys/storage/raft/bootstrap/challenge": x509: certificate signed by unknown authority"
> 2022-07-21T11:08:41.462Z [INFO] core: security barrier not initialized
> 2022-07-21T11:08:41.462Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://10.20.30.43:8200
> 2022-07-21T11:08:41.477Z [WARN] core: join attempt failed: error="error during raft bootstrap init call: Put "https://10.20.30.43:8200/v1/sys/storage/raft/bootstrap/challenge": x509: certificate signed by unknown authority"
> 2022-07-21T11:08:41.477Z [ERROR] core: failed to retry join raft cluster: retry=2s
> 2022-07-21T11:08:41.477Z [INFO] http: TLS handshake error from 172.17.0.1:56062: remote error: tls: bad certificate
> 2022-07-21T11:08:43.477Z [INFO] core: security barrier not initialized
I thought that setting the VAULT_CACERT environment variable with the path of the correct cert was enough to stop the unknown authority error; this worked when setting up the original node, but for some reason it doesn't work when setting up the transit-unsealed cluster.

Raft consensus occurs over the cluster port and uses a custom, Vault-managed certificate. A valid TLS connection is required to call the join API. I don't know whether Vault (when running as a server) honors the VAULT_SKIP_VERIFY environment variable, but even if it does, setting it would lower the security of your installation.
The error log shows that Vault is trying to reach the leader with its IP address:
> error="error during raft bootstrap init call: Put
> "https://10.20.30.43:8200/v1/sys/storage/raft/bootstrap/challenge":
> x509: certificate signed by an unknown authority"
Make sure your configuration file sets the api_addr parameter to a name that matches the certificate you are using.
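As a minimal sketch of what that looks like (the hostname and certificate paths below are placeholders, not the original values), the listener certificate and the advertised addresses should agree on a name the CA actually signed:
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/vault-2.example.com.crt"
  tls_key_file  = "/etc/vault/tls/vault-2.example.com.key"
}
# Advertise a name covered by the certificate rather than a bare IP
api_addr     = "https://vault-2.example.com:8200"
cluster_addr = "https://vault-2.example.com:8201"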

The issue was that I needed to pass
leader_ca_cert_file = "route/to/pem/file"
into the retry_join block in the config file. I thought that having declared it as an environment variable was enough, but apparently not.
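For reference, a sketch of the storage stanza with that parameter in place (addresses and paths are placeholders, not the original values):
storage "raft" {
  path    = "/vault/data"
  node_id = "vault-2"
  retry_join {
    leader_api_addr     = "https://vault-1.example.com:8200"
    leader_ca_cert_file = "/etc/vault/tls/ca.pem"
  }
}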

Related

Consul agent on kubernetes, on node or pod?

I deployed an AWS EKS cluster via Terraform. I also deployed Consul following HashiCorp's tutorial, and I see the nodes in Consul's UI.
Now I'm wondering how all the Consul agents will know about the pods I deploy. I deploy something and it's not shown anywhere in Consul.
I can't find any documentation on how to register pods (services) in Consul via the node's Consul agent; do I need to configure that somewhere? Should I not use the node's agent and instead register the service straight from the pod? HashiCorp discourages this since it may increase resource utilization depending on how many pods one deploys on a given node. But then how does the node's agent know about my services deployed on that node?
Moreover, when I deploy a pod on a node, ssh into the node, and install Consul, Consul's agent can't find the Consul server (as opposed to the node itself, which can find it).
EDIT:
Bottom line is I can't find WHERE to add the configuration. If I execute ON THE POD:
consul members
It works properly and I get:
Node Address Status Type Build Protocol DC Segment
consul-consul-server-0 10.0.103.23:8301 alive server 1.10.0 2 full <all>
consul-consul-server-1 10.0.101.151:8301 alive server 1.10.0 2 full <all>
consul-consul-server-2 10.0.102.112:8301 alive server 1.10.0 2 full <all>
ip-10-0-101-129.ec2.internal 10.0.101.70:8301 alive client 1.10.0 2 full <default>
ip-10-0-102-175.ec2.internal 10.0.102.244:8301 alive client 1.10.0 2 full <default>
ip-10-0-103-240.ec2.internal 10.0.103.245:8301 alive client 1.10.0 2 full <default>
ip-10-0-3-223.ec2.internal 10.0.3.249:8301 alive client 1.10.0 2 full <default>
But if I execute:
# consul agent -datacenter=voip-full -config-dir=/etc/consul.d/ -log-file=log-file -advertise=$(wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4)
I get the following error:
==> Starting Consul agent...
Version: '1.10.1'
Node ID: 'f10070e7-9910-06c7-0e12-6edb6cc4c9b9'
Node name: 'ip-10-0-3-223.ec2.internal'
Datacenter: 'voip-full' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.0.3.223 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2021-08-16T18:23:06.936Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.936Z [WARN] agent: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.946Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.947Z [WARN] agent.auto_config: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.948Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: ip-10-0-3-223.ec2.internal 10.0.3.223
2021-08-16T18:23:06.948Z [INFO] agent.router: Initializing LAN area manager
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2021-08-16T18:23:06.950Z [WARN] agent.client.serf.lan: serf: Failed to re-join any previously known node
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2021-08-16T18:23:06.951Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
2021-08-16T18:23:06.951Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
2021-08-16T18:23:06.953Z [INFO] agent: started state syncer
2021-08-16T18:23:06.953Z [INFO] agent: Consul agent running!
2021-08-16T18:23:06.953Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:06.954Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2021-08-16T18:23:34.169Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:34.169Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
So where do I add the config?
I also tried adding a Service in k8s pointing to the pod, but the service doesn't come up in Consul's UI...
What do you guys recommend?
Thanks
Consul knows where these services are located because each service registers with its local Consul client. Operators can register services manually, configuration management tools can register services when they are deployed, or container orchestration platforms can register services automatically via integrations.
If you are planning to use the manual option, you have to register the service with Consul yourself. Something like:
echo '{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
}' > ./consul.d/web.json
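After the file is in place, the agent only picks it up on a (re)load; assuming ./consul.d is the directory passed to the agent via -config-dir, this is enough:
consul reload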
You can find a good example at: https://thenewstack.io/implementing-service-discovery-of-microservices-with-consul/
This is also a very nice document for detailed configuration of health checks and service discovery: https://cloud.spring.io/spring-cloud-consul/multi/multi_spring-cloud-consul-discovery.html
Official documentation: https://learn.hashicorp.com/tutorials/consul/get-started-service-discovery
BTW, I was finally able to figure out the issue.
consul-dns is not deployed by default; I had to deploy it manually, then forward all .consul requests from CoreDNS to consul-dns.
All is working now. Thanks!
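For anyone hitting the same thing, the CoreDNS side of that forwarding is a small extra server block in the coredns ConfigMap's Corefile; a sketch, assuming the consul-dns service ClusterIP is 10.0.200.10 (look up the real value, e.g. with kubectl get svc consul-consul-dns):
consul:53 {
    errors
    cache 30
    forward . 10.0.200.10
}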

Hashicorp Vault: "Code: 400. Errors" Error Message

When using Vault Agent with a secret ID file, I received the following error message:
$ ./vault agent --config auth_config.hcl
==> Vault server started! Log data will stream in below:
==> Vault agent configuration:
Api Address 1: http://127.0.0.1:8300
Cgo: disabled
Log Level: info
Version: Vault v1.3.0
2020-02-04T14:08:28.352-0800 [INFO] auth.handler: starting auth handler
2020-02-04T14:08:28.352-0800 [INFO] auth.handler: authenticating
2020-02-04T14:08:28.352-0800 [INFO] sink.server: starting sink server
2020-02-04T14:08:28.352-0800 [INFO] template.server: starting template server
2020-02-04T14:08:28.352-0800 [INFO] template.server: no templates found
2020-02-04T14:08:28.352-0800 [INFO] template.server: template server stopped
2020-02-04T14:08:28.354-0800 [ERROR] auth.handler: error authenticating: error="Error making API request.
URL: PUT http://127.0.0.1:8200/v1/auth/approle/login
Code: 400. Errors:
* invalid secret id" backoff=2.190384035
The command I executed was:
vault agent --config auth_config.hcl
The contents of my auth_config.hcl file are:
vault {
  address = "http://127.0.0.1:8200"
}
auto_auth {
  method "approle" {
    config {
      role_id_file_path                   = "./role_id"
      secret_id_file_path                 = "./secret_id"
      remove_secret_id_file_after_reading = false
    }
  }
}
cache {
  use_auto_auth_token = true
}
listener "tcp" {
  address     = "127.0.0.1:8300"
  tls_disable = true
}
My secret ID was generated using the following command:
vault write -f auth/approle/role/payments_service/secret-id -format=json | sed -E -n 's/.*"secret_id": "([^"]*).*/\1/p' > secret_id
Why is this error happening?
I found that the usual reason this happens is that the secret ID file wasn't generated correctly in the first place. See this GitHub thread for an example. Unfortunately, in my case the file was generated: the secret_id file referenced in auth_config.hcl contained the secret ID.
In my case, the problem was that after I generated the secret_id file, I executed the command vault write -f auth/approle/role/payments_service/secret-id a second time. This new command didn't overwrite the original file with a new secret ID. Its consequence was that it spawned a new secret ID, which invalidated the previous secret ID that had been written to the secret_id file.
My solution was to rerun the command that wrote the secret ID to the secret_id file, and then immediately run the Vault Agent. Problem solved.
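One way to catch this situation before starting the agent is to try the AppRole login by hand with the same files; just a sketch, reusing the role_id and secret_id files from the config above:
vault write auth/approle/login \
    role_id="$(cat ./role_id)" \
    secret_id="$(cat ./secret_id)"
If this returns "invalid secret id", the secret_id file is stale and needs to be regenerated before running the agent.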
My case was because the app (kes) was trying to use HTTP instead of HTTPS to connect to Vault, while TLS was enabled both in Vault and in the app (kes). Once that was updated, the app could connect to Vault without any issue:
Error: failed to connect to Vault: Error making API request.
URL: PUT http://vault.vault:8200/v1/auth/approle/login
Code: 400. Raw Message:
Client sent an HTTP request to an HTTPS server.
Authenticating to Hashicorp Vault 'http://vault.vault:8200'

Rundeck 3.x error for remote execution - Failed: AuthenticationFailure: Authentication failure connecting to node. Could not authenticate

Getting this error in the Rundeck 3.x (latest) console when trying to do a remote uptime on the host in question. I can ssh into the host from the Rundeck server as the rundeck user and as root, and have set the necessary public keys there and in key storage on the Rundeck server.
For the resources.xml properties file, what should the settings be, as that is where the node is discovered?
The error in the Rundeck server's /var/log/rundeck/service.log is:
[2020-01-21 01:15:50.826] ERROR ExecutionUtilService --- [eduler_Worker-1] Execution failed: 29 in project TestProject: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [some-random-host: AuthenticationFailure: Authentication failure connecting to node: "some-random-host". Could not authenticate. + {dataContext=MultiDataContextImpl(map={ContextView(node:some-random-host)=BaseDataContext{{exec={exitCode=-1}}}, ContextView(step:1, node:some-random-host)=BaseDataContext{{exec={exitCode=-1}}}}, base=null)} ]}, Node failures: {some-random-host=[AuthenticationFailure: Authentication failure connecting to node: "some-random-host". Could not authenticate. + {dataContext=MultiDataContextImpl(map={ContextView(node:some-random-host)=BaseDataContext{{exec={exitCode=-1}}}, ContextView(step:1, node:some-random-host)=BaseDataContext{{exec={exitCode=-1}}}}, base=null)} ]}, status: failed]
Thanks.
Make sure that you have the remote node configured correctly (and referenced correctly in your node definition); you have a good guide here.
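As a sketch of what a key-based SSH node entry in resources.xml can look like (the hostname, username, and key storage path below are placeholders, not from the original setup):
<?xml version="1.0" encoding="UTF-8"?>
<project>
  <node name="some-random-host"
        hostname="some-random-host"
        osFamily="unix"
        username="rundeck"
        ssh-key-storage-path="keys/rundeck/id_rsa"/>
</project>
The ssh-key-storage-path attribute has to point at the private key entry in Rundeck's key storage, and the matching public key must be in the authorized_keys of that username on the remote node.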

Error instantiating chaincode in Hyperledger Fabric 1.4 over AKS kubernetes

I am trying to instantiate the "sacc" chaincode (which comes with the fabric samples) in a Hyperledger Fabric network deployed over Kubernetes on AKS. After hours of trying different adjustments, I've not been able to finish the task. I'm always getting the error:
Error: could not assemble transaction, err proposal response was not successful, error code 500, msg timeout expired while starting chaincode sacc:0.1 for transaction
Please note that there is no transaction ID in the error (I've googled some similar cases, but in all of them there was an ID for the transaction; not my case, even though the error is the same).
The message in the orderer:
2019-07-23 12:40:13.649 UTC [orderer.common.broadcast] Handle -> WARN 047 Error reading from 10.1.0.45:52550: rpc error: code = Canceled desc = context canceled
2019-07-23 12:40:13.649 UTC [comm.grpc.server] 1 -> INFO 048 streaming call completed {"grpc.start_time": "2019-07-23T12:34:13.591Z", "grpc.service": "orderer.AtomicBroadcast", "grpc.method": "Broadcast", "grpc.peer_address": "10.1.0.45:52550", "error": "rpc error: code = Canceled desc = context canceled", "grpc.code": "Canceled", "grpc.call_duration": "6m0.057953469s"}
I am calling for instantiation from inside a peer CLI container, defining the variables CORE_PEER_LOCALMSPID, CORE_PEER_TLS_ROOTCERT_FILE, CORE_PEER_MSPCONFIGPATH, CORE_PEER_ADDRESS and ORDERER_CA with appropriate values before issuing the instantiation call:
peer chaincode instantiate -o -n sacc -v 0.1 -c '{"Args":["init","hi","1"]}' -C mychannelname --tls 'true' --cafile $ORDERER_CA
All the peers have declared a dockersocket volume pointing to /run/docker.sock
All the peers have declared the variable CORE_VM_ENDPOINT to unix:///host/var/run/docker.sock
All the orgs were joined to the channel
All the peers have the chaincode installed
I can't see any further message/error in the CLI, nor in the orderer, nor in the peers involved in the channel.
Any ideas on what could be going wrong, or how I could continue troubleshooting the problem? Is it possible to see logs from the Docker container that is being created by the peers? How?
Thanks
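Regarding the last question (seeing logs from the container the peer creates), a sketch of the usual check, assuming the default Docker-based chaincode runtime and access to the Docker daemon that CORE_VM_ENDPOINT points at: the chaincode runs in a container whose name starts with dev-<peer-name>-<chaincode-name>-<version>, so
docker ps -a --filter "name=dev-"
docker logs <container-id>
will show whether the chaincode container was ever created and anything it printed before the timeout expired.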

Zalenium Readiness probe failed: HTTP probe failed with statuscode: 502

I am trying to deploy the Zalenium Helm chart in my newly deployed AKS Kubernetes (1.9.6) cluster in Azure, but I can't get it to work. The pod is giving the log below:
[bram#xforce zalenium]$ kubectl logs -f zalenium-zalenium-hub-6bbd86ff78-m25t2 Kubernetes service account found. Copying files for Dashboard... cp: cannot create regular file '/home/seluser/videos/index.html': Permission denied cp: cannot create directory '/home/seluser/videos/css': Permission denied cp: cannot create directory '/home/seluser/videos/js': Permission denied Starting Nginx reverse proxy... Starting Selenium Hub... ..........08:49:14.052 [main] INFO o.o.grid.selenium.GridLauncherV3 - Selenium build info: version: '3.12.0', revision: 'unknown' 08:49:14.120 [main] INFO o.o.grid.selenium.GridLauncherV3 - Launching Selenium Grid hub on port 4445 ...08:49:15.125 [main] INFO d.z.e.z.c.k.KubernetesContainerClient - Initialising Kubernetes support ..08:49:15.650 [main] WARN d.z.e.z.c.k.KubernetesContainerClient - Error initialising Kubernetes support. io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [zalenium-zalenium-hub-6bbd86ff78-m25t2] in namespace: [default] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62) at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:206) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162) at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.(KubernetesContainerClient.java:87) at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(ContainerFactory.java:35) at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(ContainerFactory.java:22) at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.(DockeredSeleniumStarter.java:59) at de.zalando.ep.zalenium.registry.ZaleniumRegistry.(ZaleniumRegistry.java:74) at de.zalando.ep.zalenium.registry.ZaleniumRegistry.(ZaleniumRegistry.java:62) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at org.openqa.grid.web.Hub.(Hub.java:93) at org.openqa.grid.selenium.GridLauncherV3$2.launch(GridLauncherV3.java:291) at org.openqa.grid.selenium.GridLauncherV3.launch(GridLauncherV3.java:122) at org.openqa.grid.selenium.GridLauncherV3.main(GridLauncherV3.java:82) Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc not verified: certificate: sha256/OyzkRILuc6LAX4YnMAIGrRKLmVnDgLRvCasxGXDhSoc= DN: CN=client, O=system:masters subjectAltNames: [10.0.0.1] at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:308) at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:268) at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:160) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:256) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:134) at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:113) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:125) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:56) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:107) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200) at okhttp3.RealCall.execute(RealCall.java:77) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:770) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:195) ... 16 common frames omitted 08:49:15.651 [main] INFO d.z.e.z.c.k.KubernetesContainerClient - About to clean up any left over selenium pods created by Zalenium Usage: [options] Options: --debug, -debug : enables LogLevel.FINE. Default: false --version, -version Displays the version and exits. Default: false -browserTimeout in seconds : number of seconds a browser session is allowed to hang while a WebDriver command is running (example: driver.get(url)). If the timeout is reached while a WebDriver command is still processing, the session will quit. Minimum value is 60. An unspecified, zero, or negative value means wait indefinitely. -matcher, -capabilityMatcher class name : a class implementing the CapabilityMatcher interface. Specifies the logic the hub will follow to define whether a request can be assigned to a node. For example, if you want to have the matching process use regular expressions instead of exact match when specifying browser version. ALL nodes of a grid ecosystem would then use the same capabilityMatcher, as defined here. -cleanUpCycle in ms : specifies how often the hub will poll running proxies for timed-out (i.e. hung) threads. Must also specify "timeout" option -custom : comma separated key=value pairs for custom grid extensions. NOT RECOMMENDED -- may be deprecated in a future revision. Example: -custom myParamA=Value1,myParamB=Value2 -host IP or hostname : usually determined automatically. 
Most commonly useful in exotic network configurations (e.g. network with VPN) Default: 0.0.0.0 -hubConfig filename: a JSON file (following grid2 format), which defines the hub properties -jettyThreads, -jettyMaxThreads : max number of threads for Jetty. An unspecified, zero, or negative value means the Jetty default value (200) will be used. -log filename : the filename to use for logging. If omitted, will log to STDOUT -maxSession max number of tests that can run at the same time on the node, irrespective of the browser used -newSessionWaitTimeout in ms : The time after which a new test waiting for a node to become available will time out. When that happens, the test will throw an exception before attempting to start a browser. An unspecified, zero, or negative value means wait indefinitely. Default: 600000 -port : the port number the server will use. Default: 4445 -prioritizer class name : a class implementing the Prioritizer interface. Specify a custom Prioritizer if you want to sort the order in which new session requests are processed when there is a queue. Default to null ( no priority = FIFO ) -registry class name : a class implementing the GridRegistry interface. Specifies the registry the hub will use. Default: de.zalando.ep.zalenium.registry.ZaleniumRegistry -role options are [hub], [node], or [standalone]. Default: hub -servlet, -servlets : list of extra servlets the grid (hub or node) will make available. Specify multiple on the command line: -servlet tld.company.ServletA -servlet tld.company.ServletB. The servlet must exist in the path: /grid/admin/ServletA /grid/admin/ServletB -timeout, -sessionTimeout in seconds : Specifies the timeout before the server automatically kills a session that hasn't had any activity in the last X seconds. The test slot will then be released for another test to use. This is typically used to take care of client crashes. For grid hub/node roles, cleanUpCycle must also be set. -throwOnCapabilityNotPresent true or false : If true, the hub will reject all test requests if no compatible proxy is currently registered. If set to false, the request will queue until a node supporting the capability is registered with the grid. -withoutServlet, -withoutServlets : list of default (hub or node) servlets to disable. Advanced use cases only. Not all default servlets can be disabled. Specify multiple on the command line: -withoutServlet tld.company.ServletA -withoutServlet tld.company.ServletB org.openqa.grid.common.exception.GridConfigurationException: Error creating class with de.zalando.ep.zalenium.registry.ZaleniumRegistry : null at org.openqa.grid.web.Hub.(Hub.java:97) at org.openqa.grid.selenium.GridLauncherV3$2.launch(GridLauncherV3.java:291) at org.openqa.grid.selenium.GridLauncherV3.launch(GridLauncherV3.java:122) at org.openqa.grid.selenium.GridLauncherV3.main(GridLauncherV3.java:82) Caused by: java.lang.ExceptionInInitializerError at de.zalando.ep.zalenium.registry.ZaleniumRegistry.(ZaleniumRegistry.java:74) at de.zalando.ep.zalenium.registry.ZaleniumRegistry.(ZaleniumRegistry.java:62) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at org.openqa.grid.web.Hub.(Hub.java:93) ... 
3 more Caused by: java.lang.NullPointerException at java.util.TreeMap.putAll(TreeMap.java:313) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(BaseOperation.java:411) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(BaseOperation.java:48) at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.deleteSeleniumPods(KubernetesContainerClient.java:393) at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.initialiseContainerEnvironment(KubernetesContainerClient.java:339) at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(ContainerFactory.java:38) at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(ContainerFactory.java:22) at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.(DockeredSeleniumStarter.java:59) ... 11 more ...........................................................................................................................................................................................GridLauncher failed to start after 1 minute, failing... % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 182 100 182 0 0 36103 0 --:--:-- --:--:-- --:--:-- 45500
A describe pod gives:
Warning Unhealthy 4m (x12 over 6m) kubelet, aks-agentpool-93668098-0 Readiness probe failed: HTTP probe failed with statuscode: 502
Zalenium Image Version(s):
dosel/zalenium:3
If using Kubernetes, specify your environment, and if relevant your manifests:
I use the templates as is from https://github.com/zalando/zalenium/tree/master/docs/k8s/helm
I guess it has something to do with RBAC because of this part:
"Error initialising Kubernetes support. io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [zalenium-zalenium-hub-6bbd86ff78-m25t2] in namespace: [default] failed. at "
I created a clusterrole and clusterrolebinding for the service account zalenium-zalenium that is automatically created by the Helm chart.
kubectl create clusterrole zalenium --verb=get,list,watch,update,delete,create,patch --resource=pods,deployments,secrets
kubectl create clusterrolebinding zalenium --clusterrole=zalnium --serviceaccount=zalenium-zalenium --namespace=default
The issue had to do with Azure's AKS and Kubernetes. It has been fixed; see GitHub issue 399.
If the typo mentioned by Ignacio is not the case, why don't you just do:
kubectl create clusterrolebinding zalenium --clusterrole=cluster-admin --serviceaccount=default:zalenium-zalenium
Note: there is no need to specify a namespace flag when creating a cluster role binding, as it is cluster-wide (a role binding goes with namespaces); the service account itself is still referenced as namespace:name.
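The declarative equivalent of that command, as a sketch (assuming the service account lives in the default namespace, as the logs above suggest):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: zalenium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: zalenium-zalenium
    namespace: default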