Connect Consul client on VM to Consul Server in Kubernetes

I'm having trouble setting up this scenario. I have a server that's running 4 VMs:
k8s master
k8s worker 1
k8s worker 2
vm1
I've deployed a Consul cluster with the Helm chart into the k8s cluster, resulting in 1 server and 1 client on each worker node. Here's the current config (I'm trying all sorts of things, so some options might be commented out/disabled):
global:
  name: consul
  enabled: true
  datacenter: dc1
  gossipEncryption:
    autoGenerate: true
  tls:
    enabled: true
    enableAutoEncrypt: true
    verify: true
  acls:
    manageSystemACLs: true
# client:
#   exposeGossipPorts: true
server:
  replicas: 2
  # exposeGossipAndRPCPorts: true
  # ports:
  #   serflan:
  #     port: 9301
  extraConfig: |
    { "log_level": "debug" }
  exposeService:
    enabled: true
    type: NodePort
    nodePort:
      http: 31500  # 8500 + 23k
      https: 31501 # 8501 + 23k
      grpc: 31503  # 8503 + 23k
      serf: 32301  # 9301 + 23k
      rpc: 31300   # 8300 + 23k
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
  storageClass: nfs-synology-test
connectInject:
  enabled: true
controller:
  enabled: true
syncCatalog:
  enabled: true
dns:
  enabled: true
This works mostly fine in the cluster itself (I can run a Job that does a dig against a Consul service name and I do get a response). Now I wanted to install another Consul client, this time on the vm1 VM, and join it into my Consul cluster in k8s.
As you can see from the comments, I've tried exposing the gossip and RPC ports as host ports; now I'm instead exposing the server service as a NodePort service on the given ports.
From the VM I can verify with nmap that those ports are indeed open on TCP, but for the love of all that's holy I can't figure out what to configure in the vm1 client. Here's my current config:
{
  "server": false,
  "domain": "consul",
  "datacenter": "dc1",
  "data_dir": "/etc/consul/data",
  "tls": {
    "defaults": {
      "ca_file": "/etc/consul/tls/ca/tls.crt",
      "verify_incoming": false,
      "verify_outgoing": true
    },
    "internal_rpc": {
      "verify_server_hostname": true
    }
  },
  "auto_encrypt": { "tls": true },
  "encrypt": "redacted",
  "log_level": "DEBUG",
  "enable_syslog": true,
  "leave_on_terminate": true,
  "retry_join": [
    "192.168.1.207:32301",
    "192.168.1.208:32301",
    "10.233.94.138:8300",
    "10.233.119.94:8300"
  ],
  "advertise_addr": "192.168.1.230",
  "bind_addr": "0.0.0.0",
  "ports": { "server": 31300 },
  "acl": {
    "tokens": {
      "agent": "redacted",
      "default": "redacted"
    }
  }
}
I've taken the value of encrypt from the secret in k8s, same as the tls.crt. I've tried generating a token in the GUI, assigned to a client-policy defined as:
node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
But all to no avail. The client generally fails UDP connections, tries to connect to internal k8s cluster IPs (even without me adding them to retry_join; again, just trying things), and overall gets timeouts and rpc error: lead thread didn't get connection.
I'm out of ideas and I'm at the stage of just trying random ports and configs until I hit the jackpot. Can anyone help?

I somewhat figured out my answer, so I'm posting it in case anyone is stuck on the same thing. While I like exposing a service on node ports, it didn't quite work out. The servers were accessible on port 32301 (so the VM client was able to join), but the servers themselves were advertising themselves on the pod IP, which was inaccessible from outside.
The solution in my case was:
actually use the commented-out client.exposeGossipPorts: true, server.exposeGossipAndRPCPorts: true and server.ports.serflan.port: 9301 (see the sketch below)
retry_join in the config should use the node IP + the 9301 port
the way I set up the TLS cert, tokens and encrypt was probably right
as for RPC and the failing UDP: the problem was that the Kube cluster had failed to deploy the servers and clients correctly (the Serflan-UDP ports weren't exposed via UDP). When I kubectl patched it, it started to work (eventually we fixed the cluster itself instead)
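For reference, a minimal sketch of the relevant Helm values with those options enabled (everything else unchanged from the config above):
server:
  exposeGossipAndRPCPorts: true
  ports:
    serflan:
      port: 9301
client:
  exposeGossipPorts: true
and, on the VM, retry_join pointing at the worker node IPs with that serf LAN port (the node IPs are the same ones from my original retry_join):
"retry_join": [
  "192.168.1.207:9301",
  "192.168.1.208:9301"
]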

Related

Access kubernetes cluster outside of VPN

I configured a kubernetes cluster with rke on premise (for now a single node: control plane, worker and etcd).
The VM which I launched the cluster on is inside a VPN.
After successfully initializing the cluster, I managed to access it with kubectl from inside the VPN.
I tried to access the cluster from outside the VPN, so I updated the kubeconfig file and changed the following:
server: https://<the VM IP> to be server: https://<the external IP>.
I also exposed port 6443.
When trying to access the cluster I get the following error:
E0912 16:23:39 proxy_server.go:147] Error while proxying request: x509: certificate is valid for <the VM IP>, 127.0.0.1, 10.43.0.1, not <the external IP>
My question is: how can I add the external IP to the certificate so that I will be able to access the cluster with kubectl from outside the VPN?
The rke configuration yml:
# config.yml
nodes:
  - address: <VM IP>
    hostname_override: control-plane-telemesser
    role: [controlplane, etcd, worker]
    user: motti
    ssh_key_path: /home/<USR>/.ssh/id_rsa

ssh_key_path: /home/<USR>/.ssh/id_rsa
cluster_name: my-cluster
ignore_docker_version: false
kubernetes_version:

services:
  etcd:
    backup_config:
      interval_hours: 12
      retention: 6
    snapshot: true
    creation: 6h
    retention: 24h
  kube-api:
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: 30000-32767
    pod_security_policy: false
  kube-controller:
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  kubelet:
    cluster_domain: cluster.local
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    extra_args:
      max-pods: 110

network:
  plugin: flannel
  options:
    flannel_backend_type: vxlan

dns:
  provider: coredns

authentication:
  strategy: x509

authorization:
  mode: rbac

ingress:
  provider: nginx
  options:
    use-forwarded-headers: "true"

monitoring:
  provider: metrics-server
Thanks,
So I found the solution for the RKE cluster configuration.
You need to add sans to the cluster.yml file in the authentication section:
authentication:
  strategy: x509
  sans:
    - "10.18.160.10"
After you save the file, just run rke up again and it will update the cluster.
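For illustration, once the certificate includes the new SAN, the kubeconfig change from the question (external IP, port 6443) should be accepted; the cluster entry ends up roughly like this sketch (placeholders as in the question, everything else unchanged):
clusters:
  - name: my-cluster
    cluster:
      server: https://<the external IP>:6443
      certificate-authority-data: <unchanged>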

Vault on EKS with KMS auto-unseal: "http://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused

I've installed Vault 1.9.1 using the Vault Helm Chart ver. 0.18.0 in my AWS EKS cluster with Kubernetes 1.21.0, and I'm not able to init it by typing the command:
kubectl --namespace=vault exec vault-0 -- vault operator init
I get the error
Error initializing: Put "http://127.0.0.1:8200/v1/sys/init": dial tcp 127.0.0.1:8200: connect: connection refused
The pod is running but not in READY status; the readiness probe fails due to:
Error checking seal status: Get "http://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused
These are my chart values:
vault:
  injector:
    enabled: false
  csi:
    enabled: true
  server:
    enabled: true
    extraVolumes:
      - name: vault-storage-config
        type: secret
    extraArgs: -config=/vault/userconfig/vault-storage-config/config.hcl
    ha:
      enabled: true
      replicas: 3
and the config.hcl:
ui = true

storage "postgresql" {
  connection_url = "postgres://<user>:<pwd>@<rds.url>/vault"
  table          = "vault_kv_store"
  ha_enabled     = "true"
  ha_table       = "vault_ha_locks"
}

service_registration "kubernetes" {}

seal "awskms" {
  kms_key_id = "<my_kms_key_id>"
}
I've enabled the auto-unseal feature, leveraging the integration with AWS KMS.
I've already checked that the EKS worker nodes are able to reach the Postgres RDS instance and to call the AWS KMS service; they are granted the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:DescribeKey",
        "ec2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
Any help?
Found the solution on my own. I was focusing on the auto-unseal feature, supposing an issue with permissions; actually, I later found out it was an issue related to the Postgres data storage.
Because of the restarts I lost the logs, which I was later able to get by typing
kubectl logs vault-0 -n mynamespace --previous
So I noticed the error
Error initializing storage of type postgresql:
failed to check for native upsert: dial tcp <rds_ip_instance>:5342: connect: connection timed out
Earlier I had checked the reachability of the RDS instance using psql from the host, but I didn't notice that I had misconfigured the Postgres port as 5342 instead of 5432.
Basically, I had been a victim of the jumbled-word effect, which drove me crazy for 2 long days!

What's the correct connection profile for accessing hyperledger fabric v2.x deployed on k8s

I have my hyperledger fabric blockchain deployed on k8s in the namespace hlf-blockchain, and my client app is deployed in another namespace, hlf-app.
The cpp-profile template is below. The URLs follow grpcs://<service-name>.<namespace>:<port>, which enables cross-namespace communication.
{
  "name": "test-network",
  "version": "1.0.0",
  "client": {
    "organization": "MyOrg",
    "connection": {
      "timeout": {
        "peer": {
          "endorser": "10000"
        }
      }
    }
  },
  "organizations": {
    "MyOrg": {
      "mspid": "MyOrg",
      "peers": [
        "peer0",
        "peer1",
        "peer2"
      ],
      "certificateAuthorities": [
        "ca-myorg"
      ]
    }
  },
  "peers": {
    "peer0": {
      "url": "grpcs://peer0.hlf-blockchain:${P0PORT}",
      "tlsCACerts": {
        "pem": "${PEERPEM}"
      },
      "grpcOptions": {
        "ssl-target-name-override": "peer0",
        "hostnameOverride": "peer0",
        "request-timeout": 10000,
        "grpc.keepalive_time_ms": 60000
      }
    },
    "peer1": {
      "url": "grpcs://peer1.hlf-blockchain:${P1PORT}",
      "tlsCACerts": {
        "pem": "${PEERPEM}"
      },
      "grpcOptions": {
        "ssl-target-name-override": "peer1",
        "hostnameOverride": "peer1",
        "request-timeout": 10000,
        "grpc.keepalive_time_ms": 60000
      }
    },
    "peer2": {
      "url": "grpcs://peer2.hlf-blockchain:${P2PORT}",
      "tlsCACerts": {
        "pem": "${PEERPEM}"
      },
      "grpcOptions": {
        "ssl-target-name-override": "peer2",
        "hostnameOverride": "peer2",
        "request-timeout": 10000,
        "grpc.keepalive_time_ms": 60000
      }
    }
  },
  "certificateAuthorities": {
    "ca-myorg": {
      "url": "https://ca-myorg.hlf-blockchain:${CAPORT}",
      "caName": "ca-myorg",
      "tlsCACerts": {
        "pem": ["${CAPEM}"]
      },
      "httpOptions": {
        "verify": false
      }
    }
  }
}
From my client-app, using fabric-sdk-go, I am able to connect to the network using the gateway. While invoking the chaincode I am getting the following error:
Endorser Client Status Code: (2) CONNECTION_FAILED. Description: dialing connection on target [peer0:7051]: connection is in TRANSIENT_FAILURE\nTransaction processing for endorser
I am able to invoke the transactions using the cli command from the same namespace: hlf-blockchain
My peer service configuration:
kind: Service
apiVersion: v1
metadata:
  name: peer0
  labels:
    app: peer0
spec:
  selector:
    name: peer0
  type: ClusterIP
  ports:
    - name: grpc
      port: 7051
      protocol: TCP
    - name: event
      port: 7061
      protocol: TCP
    - name: couchdb
      port: 5984
      protocol: TCP
I believe this error is due to a communication issue between the different namespaces, based on the addresses the client app gets from the cpp-profile.
What's the correct way to configure the peer service or the cpp connection profile?
You are correct, the discovery service is returning network URLs that are unreachable from outside the hlf-blockchain namespace.
It is possible to run a Gateway client in a different namespace from the Fabric network. If you are using Kube DNS, each of the fabric nodes can be referenced with a fully qualified host name <service-name>.<namespace>.svc.cluster.local.
In order to connect a gateway client across namespaces, you will need to introduce the .svc.cluster.local Fully Qualified Domain Name into the fabric URLs returned by discovery:
In your TLS CA enrollments, make sure that the certificate signing requests include a valid Subject Alternative Name with the FQDN. For example, if your peer0 TLS certificate is only valid for the host peer0, then the grpcs:// connection will be rejected in the TLS handshake when connecting to grpcs://peer0.hlf-blockchain.svc.cluster.local.
In the Gateway Client Connection Profile, use the FQDN when connecting to the discovery peers. In addition to the peer url attribute, make sure to address the host names in the grpcOptions stanzas.
Discovery will return the peer host names as specified in the core.yaml peer.gossip.externalendpoint (CORE_PEER_GOSSIP_EXTERNALENDPOINT env) parameter. Make sure that this specifies the FQDN for all peers visible to discovery (see the sketch below).
Discovery will return the orderer host names as specified in the configtx.yaml organization OrdererEndpoints stanza. Make sure that these URLs specify the FQDN for all orderers.
Regarding general networking, make sure to double-check that the gateway client application has visibility and a network route to the pods running Fabric services in the other namespace. Depending on your Calico configuration and Kube permissions, it's possible that traffic is getting blocked before it ever reaches the Fabric services.
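For illustration, a minimal sketch of how a peer Deployment could advertise the namespace-qualified FQDN (the service name, namespace and port are taken from the question; treat the manifest shape as an assumption, not your exact setup):
containers:
  - name: peer0
    env:
      # address that discovery hands out, reachable from other namespaces
      - name: CORE_PEER_GOSSIP_EXTERNALENDPOINT
        value: "peer0.hlf-blockchain.svc.cluster.local:7051"
      # keep the peer's own advertised address consistent with the FQDN
      - name: CORE_PEER_ADDRESS
        value: "peer0.hlf-blockchain.svc.cluster.local:7051"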
Please, feel free to correct me on this.
So the problem lies with the discovery service. When the client tries to establish a connection to a peer through the gateway (which is embedded in the peer) in order to invoke a transaction, the client receives the network topology through the Discovery service, which describes the number of endorser peers, their URLs and other metadata.
This metadata is built from configtx/core.yaml, where we specify the peers and orderer (including their host and port).
Since my client app is in a different namespace than the hyperledger fabric blockchain, the client needs the URLs as grpcs://<service-name>.<namespace>:<port>, but my configuration in configtx/core.yaml used <service-name>:<port>, which only works if the client app is in the same namespace as the blockchain nodes.
Also, one might think to name the service peer0-myorg.<namespace>, but k8s does not allow . in the Service name.
To fix my issue, I just moved the client app into the same namespace as the blockchain network. Depending on your use-case, if you're using DNS names for the services then you might not face this.

Metricbeat kubernetes module can’t connect to kubelet

We have a setup where Metricbeat is deployed as a DaemonSet on a Kubernetes cluster (specifically, AWS EKS).
All seems to be functioning properly, except for the kubelet connection.
To clarify, the following module:
- module: kubernetes
  enabled: true
  metricsets:
    - state_pod
  period: 10s
  hosts: ["kube-state-metrics.system:8080"]
works properly (the events flow into logstash/elastic).
This module configuration, however, doesn't work with any variant of the hosts value (localhost/kubernetes.default/whatever):
- module: kubernetes
  period: 10s
  metricsets:
    - pod
  hosts: ["localhost:10255"]
  enabled: true
  add_metadata: true
  in_cluster: true
NOTE: using the cluster IP instead of localhost (so that it goes to the control plane) also works (although it doesn't retrieve the needed information, of course).
The configuration above was taken directly from the Metricbeat documentation and immediately struck me as odd: how does localhost get translated (from within the Metricbeat docker container) to the corresponding kubelet?
The error is, as one would expect in light of the above:
error making http request: Get http://localhost:10255/stats/summary: dial tcp [::1]:10255: connect: cannot assign requested address
which indicates some sort of connectivity issue.
However, when SSH-ing to any node Metricbeat is deployed on, http://localhost:10255/stats/summary provides the correct output:
{
  "node": {
    "nodeName": "...",
    "systemContainers": [
      {
        "name": "pods",
        "startTime": "2018-12-06T11:22:07Z",
        "cpu": {
          "time": "2018-12-23T06:54:06Z",
          ...
        },
        "memory": {
          "time": "2018-12-23T06:54:06Z",
          "availableBytes": 17882275840,
          ....
I must be missing something very obvious. Any suggestion would do.
NOTE: I cross-posted the same question on the Elasticsearch forums (and got no response for a couple of days).
Inject the Pod's Node's IP via the valueFrom provider in the env: list:
env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
and then update the metricbeat config file to use the host's IP:
hosts: ["${HOST_IP}:10255"]
which metricbeat will resolve via its environment-variable config injection.
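Putting the two pieces together, the kubelet-scoped module from the question would then look roughly like this (only the hosts value changes; HOST_IP is the variable injected above):
- module: kubernetes
  period: 10s
  metricsets:
    - pod
  hosts: ["${HOST_IP}:10255"]
  enabled: true
  add_metadata: true
  in_cluster: true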

Connecting to Google Cloud SQL from Container Engine: can't resolve cloud sql proxy

I'm trying to connect to Google Cloud SQL from a node app, which is running in a Google Container Engine pod managed by Kubernetes. I've followed the instructions here to create a Cloud SQL proxy.
When I run the app, I receive:
{
  "code": "ENOTFOUND",
  "errno": "ENOTFOUND",
  "syscall": "getaddrinfo",
  "hostname": "127.0.0.1:3306",
  "host": "127.0.0.1:3306",
  "port": 3306,
  "fatal": true
}
So it looks as though the proxy can't be resolved.
I've run kubectl describe pods <pod_name> and the proxy appears to be healthy:
cloudsql-proxy:
  Container ID:  docker://47dfb6d22d5e0924f0bb4e1df85220270b4f21e971228d03148fef6b3aad6c6c
  Image:         b.gcr.io/cloudsql-docker/gce-proxy:1.05
  Image ID:      docker://sha256:338793fcb60d519482682df9d6f88da99888ba69bc6da96b18a636e1a233e5ec
  Port:
  Command:
    /cloud_sql_proxy
    --dir=/cloudsql
    -instances=touch-farm:asia-east1:api-staging=tcp:3306
    -credential_file=/secrets/cloudsql/credentials.json
  Requests:
    cpu:         100m
  State:         Running
    Started:     Sat, 01 Oct 2016 20:38:40 +1000
  Ready:         True
  Restart Count: 0
  Environment Variables: <none>
The only thing that seems unusual to me is that the Port field is blank; however, there was no instruction in the guide referenced above to expose a port in the deployment config file. I've also tried specifying port 3306 in the configuration file, but although the port then shows up in the kubectl describe pods output, node still can't find the proxy.
What am I missing here? Why can't I resolve the proxy?
Edit (more info)
Logs from the cloudsql-proxy container:
2016-10-01T11:44:40.108529344Z 2016/10/01 11:44:40 Listening on 127.0.0.1:3306 for touch-farm:asia-east1:api-staging
2016-10-01T11:44:40.108561194Z 2016/10/01 11:44:40 Ready for new connections
It looks like you are specifying the host as 127.0.0.1:3306 instead of 127.0.0.1.
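In other words, the port must not be appended to the host string. As a rough sketch of the corrected connection settings (field names mirror the error output above, shown in YAML form for readability):
host: 127.0.0.1   # host only, no port appended
port: 3306        # port supplied separately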