Vault server deployment error, address already in use - hashicorp-vault

I use the latest version of the Docker image hashicorp/vault
and followed all the steps from the official documentation.
I mount the config file config.hcl at /vault/config/config.hcl
and set the environment variables:
VAULT_DEV_ROOT_TOKEN_ID: 'myroot_id'
VAULT_ADDR: "http://127.0.0.1:8200"
But the container exits with: Error parsing listener configuration.
Error initializing listener of type tcp: listen tcp 127.0.0.1:8200: bind: address already in use
This is the config file, taken from the official documentation:
storage "raft" {
path = "./vault/data"
node_id = "node1"
}
listener "tcp" {
address = "127.0.0.1:8200"
tls_disable = "true"
}
api_addr = "http://127.0.0.1:8200"
cluster_addr = "https://127.0.0.1:8201"
ui = true
I tried changing the environment variables, IP addresses, and ports, but with no success.

Related

Terraform kubectl provider error: failed to create kubernetes rest client for read of resource

I have a Terraform config that (among other resources) creates a Google Kubernetes Engine cluster on Google Cloud. I'm using the kubectl provider to add YAML manifests for a ManagedCertificate and a FrontendConfig, since these are not part of the kubernetes or google providers.
This works as expected when applying the Terraform config from my local machine, but when I try to execute it in our CI pipeline, I get the following error for both of the kubectl_manifest resources:
Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused
Since I'm only facing this issue during CI, my first guess is that the service account is missing the right scopes, but as far as I can tell, all scopes are present. Any suggestions and ideas are greatly appreciated!
The provider is trying to connect to localhost, which means you either need to provide a proper kube-config file or set the connection details dynamically in Terraform.
You didn't mention how you are setting up authentication, but here are two ways.
Poor way
resource "null_resource" "deploy-app" {
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
command = <<EOT
kubectl apply -f myapp.yaml ./temp/kube-config.yaml;
EOT
}
# will run always, its bad
triggers = {
always_run = "${timestamp()}"
}
depends_on = [
local_file.kube_config
]
}
resource "local_file" "kube_config" {
content = var.my_kube_config # pass the config file from ci variable
filename = "${path.module}/temp/kube-config.yaml"
}
Proper way
data "google_container_cluster" "cluster" {
name = "your_cluster_name"
}
data "google_client_config" "current" {
}
provider "kubernetes" {
host = data.google_container_cluster.cluster.endpoint
token = data.google_client_config.current.access_token
cluster_ca_certificate = base64decode(
data.google_container_cluster.cluster.master_auth[0].cluster_ca_certificate
)
}
data "kubectl_file_documents" "app_yaml" {
content = file("myapp.yaml")
}
resource "kubectl_manifest" "app_installer" {
for_each = data.kubectl_file_documents.app_yaml.manifests
yaml_body = each.value
}
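Note that kubectl_manifest comes from the separate kubectl provider rather than the kubernetes provider, so that provider usually needs its own, matching connection settings as well (the accepted fix further down does exactly this). A sketch under that assumption, mirroring the data sources above:
provider "kubectl" {
  # https:// prefix on the endpoint, as in the accepted fix below
  host  = "https://${data.google_container_cluster.cluster.endpoint}"
  token = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.cluster.master_auth[0].cluster_ca_certificate
  )
  # don't fall back to a local kubeconfig in CI
  load_config_file = false
}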
If the cluster is in the same module, then the provider should be:
provider "kubernetes" {
load_config_file = "false"
host = google_container_cluster.my_cluster.endpoint
client_certificate = google_container_cluster.my_cluster.master_auth.0.client_certificate
client_key = google_container_cluster.my_cluster.master_auth.0.client_key
cluster_ca_certificate = google_container_cluster.my_cluster.master_auth.0.cluster_ca_certificate
}
Fixed the issue by adding load_config_file = false to the kubectl provider config. My provider config now looks like this:
data "google_client_config" "default" {}
provider "kubernetes" {
host = "https://${endpoint from GKE}"
token = data.google_client_config.default.access_token
cluster_ca_certificate = base64decode(CA certificate from GKE)
}
provider "kubectl" {
host = "https://${endpoint from GKE}"
token = data.google_client_config.default.access_token
cluster_ca_certificate = base64decode(CA certificate from GKE)
load_config_file = false
}

Error core: failed to lookup token: error=failed to read entry, dial tcp [::1]:8500: getsockopt: connection refused in Vault log

We are performing a load test on our application using JMeter. Our application uses Consul and Vault as backend services for reading/storing application configuration data. During the load test, our application queries Vault for authentication data on each incoming request. Initially it runs fine for some time (10 to 15 minutes) and I can see success responses in JMeter, but after a while the responses start failing for all requests. I see the following error in the Vault log for each request, but no error/exception in the Consul log.
Error in Vault log
[ERROR] core: failed to lookup token: error=failed to read entry: Get http://localhost:8500/v1/kv//vault/sys/token/id/87f7b82131cb8fa1ef71aa52579f155d4cf9f095: dial tcp [::1]:8500: getsockopt: connection refused
As of now the load is 100 requests (users) every 10 milliseconds with a ramp-up period of 60 seconds, and this executes in a loop. What could be the cause of this error? Is it due to a limited number of connections to port 8500?
Below are my Vault and Consul configurations.
Vault
backend "consul" {
address = "localhost:8500"
path = "app/vault/"
}
listener "tcp" {
address = "10.88.97.216:8200"
cluster_address = "10.88.97.216:8201"
tls_disable = 0
tls_min_version = "tls12"
tls_cert_file = "/var/certs/vault.crt"
tls_key_file = "/var/certs/vault.key"
}
Consul
{
  "data_dir": "/var/consul",
  "log_level": "info",
  "server": true,
  "leave_on_terminate": true,
  "ui": true,
  "client_addr": "127.0.0.1",
  "ports": {
    "dns": 53,
    "serf_lan": 8301,
    "serf_wan": 8302
  },
  "disable_update_check": true,
  "enable_script_checks": true,
  "disable_remote_exec": false,
  "domain": "primehome",
  "limits": {
    "http_max_conns_per_client": 1000,
    "rpc_max_conns_per_client": 1000
  },
  "service": {
    "name": "nginx-consul-https",
    "port": 443,
    "checks": [{
      "http": "https://localhost/nginx_status",
      "tls_skip_verify": true,
      "interval": "10s",
      "timeout": "5s",
      "status": "passing"
    }]
  }
}
I have also configured http_max_conns_per_client and rpc_max_conns_per_client, thinking the error might be due to a limited number of connections per client, but I am still seeing it in the Vault log.
After taking another look at this, the issue appears to be that Vault is attempting to contact Consul over the IPv6 loopback address (likely because both the v4 and v6 addresses are present in /etc/hosts), but Consul is only listening on the IPv4 loopback address.
You can likely resolve this through one of the following methods.
1. Use 127.0.0.1 instead of localhost for Consul's address in the Vault config.
backend "consul" {
address = "127.0.0.1:8500"
path = "app/vault/"
}
2. Configure Consul to listen on both the IPv4 and IPv6 loopback addresses.
{
  "client_addr": "127.0.0.1 [::1]"
}
(Rest of the config omitted for brevity.)
3. Remove the localhost hostname from the IPv6 loopback entry in /etc/hosts.
127.0.0.1 localhost
# Old hosts entry for ::1
#::1 localhost ip6-localhost ip6-loopback
# New entry
::1 ip6-localhost ip6-loopback

Tiller: dial tcp 127.0.0.1:80: connect: connection refused

Ever since I upgraded the versions in my EKS Terraform script, I keep getting error after error.
Currently I am stuck on this error:
Error: Get http://localhost/api/v1/namespaces/kube-system/serviceaccounts/tiller: dial tcp 127.0.0.1:80: connect: connection refused
Error: Get http://localhost/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/tiller: dial tcp 127.0.0.1:80: connect: connection refused
The script works fine and I can still use it with the old versions, but I am trying to upgrade the cluster version.
provider.tf
provider "aws" {
region = "${var.region}"
version = "~> 2.0"
assume_role {
role_arn = "arn:aws:iam::${var.target_account_id}:role/terraform"
}
}
provider "kubernetes" {
config_path = ".kube_config.yaml"
version = "~> 1.9"
}
provider "helm" {
service_account = "${kubernetes_service_account.tiller.metadata.0.name}"
namespace = "${kubernetes_service_account.tiller.metadata.0.namespace}"
kubernetes {
config_path = ".kube_config.yaml"
}
}
terraform {
backend "s3" {
}
}
data "terraform_remote_state" "state" {
backend = "s3"
config = {
bucket = "${var.backend_config_bucket}"
region = "${var.backend_config_bucket_region}"
key = "${var.name}/${var.backend_config_tfstate_file_key}" # var.name == CLIENT
role_arn = "${var.backend_config_role_arn}"
skip_region_validation = true
dynamodb_table = "terraform_locks"
encrypt = "true"
}
}
kubernetes.tf
resource "kubernetes_service_account" "tiller" {
#depends_on = ["module.eks"]
metadata {
name = "tiller"
namespace = "kube-system"
}
automount_service_account_token = "true"
}
resource "kubernetes_cluster_role_binding" "tiller" {
depends_on = ["module.eks"]
metadata {
name = "tiller"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "cluster-admin"
}
subject {
kind = "ServiceAccount"
name = "tiller"
api_group = ""
namespace = "kube-system"
}
}
terraform version: 0.12.12
eks module version: 6.0.2
It means the server: entry in your .kube_config.yaml is pointing to the wrong port (and perhaps even the wrong protocol, since normal Kubernetes communication travels over HTTPS and is secured via mutual TLS authentication), or there is no longer a proxy listening on localhost:80, or perhaps the --insecure-port used to be 80 and is now 0 (as is strongly recommended).
Regrettably, without more specifics, no one can guess what the correct value was or what it should be changed to.
I am sure you need to set up the Kubernetes provider in your Terraform configuration.
Something like this:
provider "kubernetes" {
config_path = module.EKS_cluster.kubeconfig_filename
}
This happened to me when I had misconfigured the credentials Terraform uses for the cluster and there was no access to the cluster. If you configure kubectl / whatever you are using to authenticate, this should be solved.
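For example, here is a minimal sketch of pointing the Kubernetes provider at an EKS cluster with a token instead of a local kubeconfig file (var.cluster_name is an illustrative variable, not from the question):
data "aws_eks_cluster" "this" {
  name = var.cluster_name # hypothetical variable holding the EKS cluster name
}

data "aws_eks_cluster_auth" "this" {
  name = var.cluster_name
}

provider "kubernetes" {
  # authenticate with a short-lived token instead of a kubeconfig file
  host                   = data.aws_eks_cluster.this.endpoint
  token                  = data.aws_eks_cluster_auth.this.token
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
}
This avoids depending on a .kube_config.yaml that may not exist in CI.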

EKS kube-system deployments CrashLoopBackOff

I am trying to deploy Kube State Metrics into the kube-system namespace in my EKS Cluster (eks.4) running Kubernetes v1.14.
Kubernetes Connection
provider "kubernetes" {
host = var.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster_auth.token
cluster_ca_certificate = base64decode(var.cluster.certificate)
load_config_file = true
}
Deployment Manifest (as .tf)
resource "kubernetes_deployment" "kube_state_metrics" {
metadata {
name = "kube-state-metrics"
namespace = "kube-system"
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
replicas = 1
selector {
match_labels = {
k8s-app = "kube-state-metrics"
}
}
template {
metadata {
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
container {
name = "kube-state-metrics"
image = "quay.io/coreos/kube-state-metrics:v1.7.2"
port {
name = "http-metrics"
container_port = 8080
}
port {
name = "telemetry"
container_port = 8081
}
liveness_probe {
http_get {
path = "/healthz"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
readiness_probe {
http_get {
path = "/"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
}
service_account_name = "kube-state-metrics"
}
}
}
}
I have deployed all the required RBAC manifests from https://github.com/kubernetes/kube-state-metrics/tree/master/kubernetes as well - redacted here for brevity.
When I run terraform apply on the deployment above, the Terraform output is as follows:
kubernetes_deployment.kube_state_metrics: Still creating... [6m50s elapsed]
Eventually timing out at 10m.
Here is the log output from the kube-state-metrics pod:
I0910 23:41:19.412496 1 main.go:140] metric white-blacklisting: blacklisting the following items:
W0910 23:41:19.412535 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0910 23:41:19.412565 1 client_config.go:546] error creating inClusterConfig, falling back to default config: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
F0910 23:41:19.412782 1 main.go:148] Failed to create client: invalid configuration: no configuration has been provided
Adding the following to the pod spec got me to a successful deployment:
automount_service_account_token = true
For posterity:
resource "kubernetes_deployment" "kube_state_metrics" {
metadata {
name = "kube-state-metrics"
namespace = "kube-system"
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
replicas = 1
selector {
match_labels = {
k8s-app = "kube-state-metrics"
}
}
template {
metadata {
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
automount_service_account_token = true
container {
name = "kube-state-metrics"
image = "quay.io/coreos/kube-state-metrics:v1.7.2"
port {
name = "http-metrics"
container_port = 8080
}
port {
name = "telemetry"
container_port = 8081
}
liveness_probe {
http_get {
path = "/healthz"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
readiness_probe {
http_get {
path = "/"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
}
service_account_name = "kube-state-metrics"
}
}
}
}
I didn't try with Terraform.
I just ran this deployment locally and got the same error.
Please run your deployment locally to see the state of your deployment and pods.
I0910 13:25:49.632847 1 main.go:140] metric white-blacklisting: blacklisting the following items:
W0910 13:25:49.632871 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
and finally:
I0910 13:25:49.634748 1 main.go:185] Testing communication with server
I0910 13:25:49.650994 1 main.go:190] Running with Kubernetes cluster version: v1.12+. git version: v1.12.8-gke.10. git tree state: clean. commit: f53039cc1e5295eed20969a4f10fb6ad99461e37. platform: linux/amd64
I0910 13:25:49.651028 1 main.go:192] Communication with server successful
I0910 13:25:49.651598 1 builder.go:126] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses
I0910 13:25:49.651607 1 main.go:226] Starting metrics server: 0.0.0.0:8080
I0910 13:25:49.652149 1 main.go:201] Starting kube-state-metrics self metrics server: 0.0.0.0:8081
verification:
Connected to kube-state-metrics (xx.xx.xx.xx) port 8080 (#0)
GET /metrics HTTP/1.1
Host: kube-state-metrics:8080
User-Agent: curl/7.58.0
Accept: */*
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4
Date: Tue, 10 Sep 2019 13:39:52 GMT
Transfer-Encoding: chunked
[49027 bytes data]
# HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.
If you are building your own image, please follow the issues on GitHub and the docs.
Update: just to clarify. As mentioned in my answer, I didn't try with Terraform, but it seemed that the original question described only one problem: W0910 13:25:49.632871 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
So I suggested running this deployment locally and verifying all the errors from the logs. It later turned out that there is a problem with automount_service_account_token; this important error wasn't included in the original question.
So please follow the Terraform issues on GitHub for how you can solve this problem.
As per the description on GitHub:
I spent hours trying to figure out why a service account and deployment wasn't working in Terraform, but worked with no issues in kubectl - it was the AutomountServiceAccountToken being hardcoded to False in the deployment resource.
At a minimum this should be documented in the Terraform docs for the resource with something noting the resource does not behave like kubectl does.
I hope this explains the problem.
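If the service account itself is also managed in Terraform, the same flag exists on that resource too; a minimal sketch (the resource name is illustrative), with the matching automount_service_account_token = true on the deployment's pod template shown in the full example above:
resource "kubernetes_service_account" "kube_state_metrics" {
  metadata {
    name      = "kube-state-metrics"
    namespace = "kube-system"
  }

  # enable the token mount explicitly, mirroring the fix applied to the pod spec above
  automount_service_account_token = true
}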

Connect two machines in AKKA remotely ,connection refused

I'm new to Akka and wanted to connect two PCs using Akka remoting, just to run some code on both (as 2 actors). I tried the example in the Akka docs, but when I add the 2 IP addresses to the config file I always get this error.
The first machine gives me this error:
[info] [ERROR] [11/20/2018 13:58:48.833]
[ClusterSystem-akka.remote.default-remote-dispatcher-6]
[akka.remote.artery.Association(akka://ClusterSystem)] Outbound
control stream to [akka://ClusterSystem#192.168.1.2:2552] failed.
Restarting it. Handshake with [akka://ClusterSystem#192.168.1.2:2552]
did not complete within 20000 ms
(akka.remote.artery.OutboundHandshake$HandshakeTimeoutException:
Handshake with [akka://ClusterSystem#192.168.1.2:2552] did not
complete within 20000 ms)
And the second machine:
Exception in thread "main"
akka.remote.RemoteTransportException: Failed to bind TCP to
[192.168.1.3:2552] due to: Bind failed because of
java.net.BindException: Cannot assign requested address: bind
Config file content:
akka {
  actor {
    provider = cluster
  }
  remote {
    artery {
      enabled = on
      transport = tcp
      canonical.hostname = "192.168.1.3"
      canonical.port = 0
    }
  }
  cluster {
    seed-nodes = [
      "akka://ClusterSystem@192.168.1.3:2552",
      "akka://ClusterSystem@192.168.1.2:2552"
    ]
    # auto downing is NOT safe for production deployments.
    # you may want to use it during development, read more about it in the docs.
    auto-down-unreachable-after = 120s
  }
}
# Enable metrics extension in akka-cluster-metrics.
akka.extensions=["akka.cluster.metrics.ClusterMetricsExtension"]
# Sigar native library extract location during tests.
# Note: use per-jvm-instance folder when running multiple jvm on one host.
akka.cluster.metrics.native-library-extract-folder=${user.dir}/target/native
First of all, you don't need cluster configuration for Akka remoting. Both PCs (nodes) should have remoting enabled with a concrete port instead of "0"; that way you know which port to connect to.
Use the configurations below.
PC1
akka {
  actor {
    provider = remote
  }
  remote {
    artery {
      enabled = on
      transport = tcp
      canonical.hostname = "192.168.1.3"
      canonical.port = 19000
    }
  }
}
PC2
akka {
  actor {
    provider = remote
  }
  remote {
    artery {
      enabled = on
      transport = tcp
      canonical.hostname = "192.168.1.4"
      canonical.port = 18000
    }
  }
}
Use the actor path below to connect to any remote actor from PC1 to PC2:
akka://<PC2-ActorSystem>@192.168.1.4:18000/user/<actor deployed in PC2>
Use the actor path below to connect from PC2 to PC1:
akka://<PC1-ActorSystem>@192.168.1.3:19000/user/<actor deployed in PC1>
The port numbers and IP addresses are samples.