How to reinitialize hashicorp vault - hashicorp-vault

I'm working on automating a HashiCorp Vault process, and I need to repeatedly run the vault operator init command because of trial-and-error testing. I tried uninstalling Vault and installing it again, but it seems like that doesn't remove the previous unseal keys + root token it generates. How can I do this?
I read somewhere that I needed to delete my storage "file" path, which I already did, but it's not working (actually my /opt/vault/data/ directory is empty). Here is my vault.hcl file:
# Full configuration options can be found at
# https://www.vaultproject.io/docs/configuration

ui = true

#mlock = true
#disable_mlock = true

storage "file" {
  path = "/opt/vault/data"
}

#storage "consul" {
#  address = "127.0.0.1:8500"
#  path    = "vault"
#}

# HTTP listener
#listener "tcp" {
#  address     = "127.0.0.1:8200"
#  tls_disable = 1
#}

# HTTPS listener
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/tls.crt"
  tls_key_file  = "/opt/vault/tls/tls.key"
}

# Enterprise license_path
# This will be required for enterprise as of v1.8
#license_path = "/etc/vault.d/vault.hclic"

# Example AWS KMS auto unseal
#seal "awskms" {
#  region     = "us-east-1"
#  kms_key_id = "REPLACE-ME"
#}

# Example HSM auto unseal
#seal "pkcs11" {
#  lib            = "/usr/vault/lib/libCryptoki2_64.so"
#  slot           = "0"
#  pin            = "AAAA-BBBB-CCCC-DDDD"
#  key_label      = "vault-hsm-key"
#  hmac_key_label = "vault-hsm-hmac-key"
#}

Best practice for this type of setup is actually Terraform or Chef or any other stateful transformer. That way you can bring the environment to an ideal state (terraform apply) and easily remove it (terraform destroy).
To reinit Vault, you can bring it down and delete the data folder ("/opt/vault/data" in your case), then bring up another instance.

Delete /opt/vault/data
Reboot your computer
(You may also need to delete the file located at ~/.vault-token)
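The reset the two answers above describe (stop Vault, empty the file backend's path, drop the cached client token, initialise again), written out as an added example; it is not from either answer. It assumes a systemd unit named vault, the /opt/vault/data path from the question's vault.hcl and the default ~/.vault-token location, all of which can be overridden through the environment.

```sh
#!/bin/sh
# Reset a file-backend Vault so that `vault operator init` can be run again.
# Assumed paths (override via the environment): the storage path from vault.hcl
# and the CLI token cache that `vault login` writes.
VAULT_DATA="${VAULT_DATA:-/opt/vault/data}"
VAULT_TOKEN_FILE="${VAULT_TOKEN_FILE:-$HOME/.vault-token}"

# wipe_vault_state DATA_DIR TOKEN_FILE
# The file backend keeps the barrier keyring, the sealed root key material and
# the root token accessor under DATA_DIR, so emptying it is what actually
# un-initialises Vault. The cached client token is removed too: it belongs to
# the old root token and is worthless now. Returns 0 only when nothing is left.
wipe_vault_state() {
  data_dir=$1
  token_file=$2
  # refuse an empty or root path: `find / -delete` is not a reset
  [ -n "$data_dir" ] && [ "$data_dir" != "/" ] || return 1
  [ -d "$data_dir" ] || return 1
  # keep the directory itself (ownership/mode stay as the package set them)
  find "$data_dir" -mindepth 1 -delete || return 1
  rm -f -- "$token_file" || return 1
  [ -z "$(ls -A -- "$data_dir")" ] && [ ! -e "$token_file" ]
}

# reinit_vault: full cycle against the live service.
reinit_vault() {
  # 1. Stop the server first: a running Vault holds the unseal state in memory,
  #    so deleting files underneath it changes nothing until it restarts.
  sudo systemctl stop vault || return 1
  # 2. Wipe storage and the stale client token.
  wipe_vault_state "$VAULT_DATA" "$VAULT_TOKEN_FILE" || return 1
  # 3. Start clean and initialise again; write the key shares and root token
  #    to a file with tight permissions instead of leaving them in scrollback.
  sudo systemctl start vault || return 1
  vault operator init -key-shares=5 -key-threshold=3 -format=json > vault-init.json || return 1
  chmod 600 vault-init.json
  # `vault status` exits non-zero while sealed, so check the field instead
  vault status 2>/dev/null | grep -q 'Initialized *true'
}

# Only touch the live service when asked to explicitly: `sh reinit-vault.sh run`
if [ "${1:-}" = "run" ]; then
  reinit_vault
fi
```

Run it as a user who can stop the unit and who has VAULT_ADDR pointing at the listener. vault-init.json holds the new unseal keys and root token, so move it to wherever you keep secrets for the test loop and delete it afterwards; the next iteration overwrites it anyway.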

If you only want to do testing, why don't you use Vault in dev mode?

Related

GCP: using image from one account's Artifact Registry on other account

Hello and wish you a great time!
I've got the following Terraform service account definition:
resource "google_service_account" "gke_service_account" {
  project      = var.context
  account_id   = var.gke_account_name
  display_name = var.gke_account_description
}
That I use in a GCP Kubernetes node pool:
resource "google_container_node_pool" "gke_node_pool" {
  name     = "${var.context}-gke-node"
  location = var.region
  project  = var.context
  cluster  = google_container_cluster.gke_cluster.name

  management {
    auto_repair  = "true"
    auto_upgrade = "true"
  }

  autoscaling {
    min_node_count = var.gke_min_node_count
    max_node_count = var.gke_max_node_count
  }

  initial_node_count = var.gke_min_node_count

  node_config {
    machine_type    = var.gke_machine_type
    service_account = google_service_account.gke_service_account.email

    metadata = {
      disable-legacy-endpoints = "true"
    }

    # Needed for correctly functioning cluster, see
    # https://www.terraform.io/docs/providers/google/r/container_cluster.html#oauth_scopes
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}
However, the current solution requires the prod and dev envs to be in different GCP accounts but use the same image from the prod Artifact Registry.
As of now I have a JSON key file for a service account in prod that has access to its registry. Maybe there's a pretty way to use the JSON file as a second service account for Kubernetes, or to update the current k8s service account with the JSON file so it has additional permissions on the remote registry?
I've seen solutions like putting it in a secret or using a cross-account service account.
But that's not the way I want to resolve it, since we have some internal restrictions.
Hope someone has faced a similar task and has a solution to share; it'll save me real time.
Thanks in advance!

Terraform kubectl provider error: failed to created kubernetes rest client for read of resource

I have a Terraform config that (among other resources) creates a Google Kubernetes Engine cluster on Google Cloud. I'm using the kubectl provider to add YAML manifests for a ManagedCertificate and a FrontendConfig, since these are not part of the kubernetes or google providers.
This works as expected when applying the Terraform config from my local machine, but when I try to execute it in our CI pipeline, I get the following error for both of the kubectl_manifest resources:
Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused
Since I'm only facing this issue during CI, my first guess is that the service account is missing the right scopes, but as far as I can tell, all scopes are present. Any suggestions and ideas are greatly appreciated!
The provider is trying to connect to localhost, which means either you need to provide a proper kube-config file or set it dynamically in Terraform.
Although you didn't mention how you are setting the auth, here are two ways.
Poor way
resource "null_resource" "deploy-app" {
  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    # the kubeconfig has to be handed to kubectl with --kubeconfig; a second
    # -f argument would try to apply it as a manifest
    command = <<EOT
kubectl --kubeconfig "${path.module}/temp/kube-config.yaml" apply -f myapp.yaml
EOT
  }

  # will run always, its bad
  triggers = {
    always_run = "${timestamp()}"
  }

  depends_on = [
    local_file.kube_config,
  ]
}

resource "local_file" "kube_config" {
  content  = var.my_kube_config # pass the config file from a CI variable (it ends up on disk and in state)
  filename = "${path.module}/temp/kube-config.yaml"
}
Proper way
data "google_container_cluster" "cluster" {
  name     = "your_cluster_name"
  location = "your_cluster_location" # zone or region; needed unless the google provider has a default
}

data "google_client_config" "current" {}

# Short-lived OAuth token from the credentials Terraform itself runs with;
# nothing is read from ~/.kube/config, so CI and a laptop behave the same.
provider "kubernetes" {
  host  = "https://${data.google_container_cluster.cluster.endpoint}"
  token = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.cluster.master_auth[0].cluster_ca_certificate
  )
}

# kubectl_manifest belongs to the kubectl provider, which needs its own
# configuration; it does not inherit the kubernetes provider's.
provider "kubectl" {
  host  = "https://${data.google_container_cluster.cluster.endpoint}"
  token = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.cluster.master_auth[0].cluster_ca_certificate
  )
  load_config_file = false
}

data "kubectl_file_documents" "app_yaml" {
  content = file("myapp.yaml")
}

resource "kubectl_manifest" "app_installer" {
  for_each  = data.kubectl_file_documents.app_yaml.manifests
  yaml_body = each.value
}
If the cluster is in the same module, then the provider should be
provider "kubernetes" {
  load_config_file = false # a boolean, not the string "false"
  host             = "https://${google_container_cluster.my_cluster.endpoint}"
  # master_auth returns these base64-encoded and the provider expects PEM;
  # GKE only issues client certificates when the cluster was created with
  # master_auth.client_certificate_config.issue_client_certificate = true
  client_certificate     = base64decode(google_container_cluster.my_cluster.master_auth.0.client_certificate)
  client_key             = base64decode(google_container_cluster.my_cluster.master_auth.0.client_key)
  cluster_ca_certificate = base64decode(google_container_cluster.my_cluster.master_auth.0.cluster_ca_certificate)
}
Fixed the issue by adding load_config_file = false to the kubectl provider config. My provider config now looks like this:
data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${endpoint from GKE}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(CA certificate from GKE)
}

provider "kubectl" {
  host                   = "https://${endpoint from GKE}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(CA certificate from GKE)
  load_config_file       = false
}

How to customise config.toml on Kubernetes?

I have a GitLab cloud connected to a k8s cluster running on Google (GKE).
The cluster was created via Gitlab cloud.
I want to customise the config.toml because I want to fix the cache on k8s as suggested in this issue.
I found the config.toml configuration in the runner-gitlab-runner ConfigMap.
I updated the ConfigMap to contain this config.toml setup:
config.toml: |
  concurrent = 4
  check_interval = 3
  log_level = "info"
  listen_address = '[::]:9252'
  [[runners]]
    executor = "kubernetes"
    cache_dir = "/tmp/gitlab/cache"
    [runners.kubernetes]
      memory_limit = "1Gi"
      [runners.kubernetes.node_selector]
        gitlab = "true"
      [[runners.kubernetes.volumes.host_path]]
        name = "gitlab-cache"
        mount_path = "/tmp/gitlab/cache"
        host_path = "/home/core/data/gitlab-runner/data"
To apply the changes I deleted the runner-gitlab-runner-xxxx-xxx pod so a new one gets created with the updated config.toml.
However, when I look into the new pod, the /home/gitlab-runner/.gitlab-runner/config.toml now contains 2 [[runners]] sections:
listen_address = "[::]:9252"
concurrent = 4
check_interval = 3
log_level = "info"

[session_server]
  session_timeout = 1800

[[runners]]
  name = ""
  url = ""
  token = ""
  executor = "kubernetes"
  cache_dir = "/tmp/gitlab/cache"
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = ""
    namespace = ""
    namespace_overwrite_allowed = ""
    privileged = false
    memory_limit = "1Gi"
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.node_selector]
      gitlab = "true"
    [runners.kubernetes.volumes]
      [[runners.kubernetes.volumes.host_path]]
        name = "gitlab-cache"
        mount_path = "/tmp/gitlab/cache"
        host_path = "/home/core/data/gitlab-runner/data"

[[runners]]
  name = "runner-gitlab-runner-xxx-xxx"
  url = "https://gitlab.com/"
  token = "<my-token>"
  executor = "kubernetes"
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "ubuntu:16.04"
    namespace = "gitlab-managed-apps"
    namespace_overwrite_allowed = ""
    privileged = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.volumes]
The file /scripts/config.toml is the configuration as I created it in the ConfigMap.
So I suspect the /home/gitlab-runner/.gitlab-runner/config.toml is somehow updated when registering the GitLab Runner with the GitLab cloud.
If changing the config.toml via the ConfigMap does not work, how should I then change the configuration? I cannot find anything about this in the GitLab or GitLab Runner documentation.
Inside the mapping you can try to append the volume and the extra configuration parameters:
# Add docker volumes
cat >> /home/gitlab-runner/.gitlab-runner/config.toml << EOF
[[runners.kubernetes.volumes.host_path]]
  name = "var-run-docker-sock"
  mount_path = "/var/run/docker.sock"
EOF
I did the runner deployment using a Helm chart; I guess you did the same. In the following link you will find more information about the approach I mention: https://gitlab.com/gitlab-org/gitlab-runner/issues/2578
If after appending the config your pod is not able to start, check the logs. I tested the appending approach and had some errors like "Directory not Found", and it was because I was appending in the wrong path, but after fixing those issues the runner works fine.
Seems to me you should be modifying config.template.toml (within your relevant configmap, that is)
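What the two [[runners]] sections in the question collapse to when the operator's section is supplied as a template (the approach the answer just above suggests), plus a check on the result, as an added example that is not GitLab's code or the answerers'. Both sections are constructed from the config.toml the question shows (token redacted). The merge rule implemented is my reading of GitLab's template-config behaviour, not a quote of it: values that registration sets (name, url, token, executor) win, everything else is filled from the template. The audit function flags the privileged = true that registration wrote and any host_path volume, including the Docker-socket mount the earlier answer appends.

```python
"""Merge an operator [[runners]] template into the section `gitlab-runner
register` wrote, then audit the single runner that results."""
from copy import deepcopy
from typing import Dict, List

# Keys that `gitlab-runner register` fills in from its arguments / environment.
REGISTRATION_KEYS = ("name", "url", "token", "executor")

# [[runners]] section the operator put in the ConfigMap (constructed from the question).
TEMPLATE_RUNNER: Dict = {
    "executor": "kubernetes",
    "cache_dir": "/tmp/gitlab/cache",
    "kubernetes": {
        "memory_limit": "1Gi",
        "node_selector": {"gitlab": "true"},
        "volumes": {"host_path": [{
            "name": "gitlab-cache",
            "mount_path": "/tmp/gitlab/cache",
            "host_path": "/home/core/data/gitlab-runner/data",
        }]},
    },
}

# [[runners]] section that registration appended at pod start (constructed; token redacted).
REGISTERED_RUNNER: Dict = {
    "name": "runner-gitlab-runner-xxx-xxx",
    "url": "https://gitlab.com/",
    "token": "<my-token>",
    "executor": "kubernetes",
    "cache": {"s3": {}, "gcs": {}},
    "kubernetes": {
        "host": "",
        "image": "ubuntu:16.04",
        "namespace": "gitlab-managed-apps",
        "privileged": True,
        "volumes": {},
    },
}


def _fill_gaps(base: Dict, extra: Dict) -> Dict:
    """Copy of `base` with keys from `extra` added only where `base` is empty or absent."""
    out = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = _fill_gaps(out[key], value)      # descend into sub-tables
        elif out.get(key) in ("", None, {}, []):
            out[key] = value                            # template said nothing here
    return out


def merge_runner_sections(template: Dict, registered: Dict) -> Dict:
    """Return the one [[runners]] table the template mechanism would leave behind."""
    merged = _fill_gaps(template, registered)
    for key in REGISTRATION_KEYS:                       # registration arguments always win
        if key in registered:
            merged[key] = registered[key]
    return merged


def audit_runner(runner: Dict) -> List[str]:
    """Flag settings of a kubernetes-executor runner that let a CI job escape its pod."""
    findings: List[str] = []
    k8s = runner.get("kubernetes", {})
    if k8s.get("privileged"):
        findings.append("privileged = true: job containers run privileged on the node")
    for vol in k8s.get("volumes", {}).get("host_path", []):
        # GitLab documents host_path as defaulting to mount_path when it is omitted.
        host = vol.get("host_path") or vol.get("mount_path")
        if host == "/var/run/docker.sock":
            findings.append(f"volume {vol['name']!r} mounts the node's Docker socket (root on the node)")
        elif host:
            findings.append(f"volume {vol['name']!r} exposes node path {host} to every job")
    return findings


if __name__ == "__main__":
    runner = merge_runner_sections(TEMPLATE_RUNNER, REGISTERED_RUNNER)
    print("merged runner keys:", sorted(runner))
    for finding in audit_runner(runner):
        print("finding:", finding)
```

The merged table keeps the name, url and token that registration supplied and everything the operator wrote (cache_dir, memory_limit, node_selector, the cache volume), which is what the question was after. The audit is the part to keep around: a runner registered with privileged = true and a host path into the node lets any pipeline author read or alter that path, and mounting /var/run/docker.sock hands them the node's Docker daemon.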
If you want to modify the existing config.toml in /home/gitlab-runner/.gitlab-runner, you need to set environment variables in the deployment. For example, this is the default set of variables in case you have installed gitlab-runner by pressing the install button in GitLab.
Environment:
  CI_SERVER_URL: http://git.example.com/
  CLONE_URL:
  RUNNER_REQUEST_CONCURRENCY: 1
  RUNNER_EXECUTOR: kubernetes
  REGISTER_LOCKED: true
  RUNNER_TAG_LIST:
  RUNNER_OUTPUT_LIMIT: 4096
  KUBERNETES_IMAGE: ubuntu:16.04
  KUBERNETES_PRIVILEGED: true
  KUBERNETES_NAMESPACE: gitlab-managed-apps
  KUBERNETES_POLL_TIMEOUT: 180
  KUBERNETES_CPU_LIMIT:
  KUBERNETES_CPU_LIMIT_OVERWRITE_MAX_ALLOWED:
  KUBERNETES_MEMORY_LIMIT:
  KUBERNETES_MEMORY_LIMIT_OVERWRITE_MAX_ALLOWED:
  KUBERNETES_CPU_REQUEST:
  KUBERNETES_CPU_REQUEST_OVERWRITE_MAX_ALLOWED:
  KUBERNETES_MEMORY_REQUEST:
  KUBERNETES_MEMORY_REQUEST_OVERWRITE_MAX_ALLOWED:
  KUBERNETES_SERVICE_ACCOUNT:
  KUBERNETES_SERVICE_CPU_LIMIT:
  KUBERNETES_SERVICE_MEMORY_LIMIT:
  KUBERNETES_SERVICE_CPU_REQUEST:
  KUBERNETES_SERVICE_MEMORY_REQUEST:
  KUBERNETES_HELPER_CPU_LIMIT:
  KUBERNETES_HELPER_MEMORY_LIMIT:
  KUBERNETES_HELPER_CPU_REQUEST:
  KUBERNETES_HELPER_MEMORY_REQUEST:
  KUBERNETES_HELPER_IMAGE:
Modify existing values or add new ones; they will appear in the correct section of config.toml.

unable to use ssh key when spawning digitalocean droplet using terraform

I am using Terraform v0.10.6 to spin up a droplet on DigitalOcean. I am referencing a key and SSH fingerprint that have already been added to DigitalOcean in my Terraform config (copied below). I am able to log onto existing droplets using this SSH key, but not onto a newly formed droplet (SSH simply fails). Any thoughts on how to troubleshoot this, so that when I launch the droplet via Terraform I can log onto it with the key that has already been added on DigitalOcean (and is visible on the DO console)? Currently, the droplet appears on the DigitalOcean admin console, but I am never able to SSH onto the server (the connection gets denied).
test.tf
# add base droplet with name
resource "digitalocean_droplet" "do-mail" {
  image              = "ubuntu-16-04-x64"
  name               = "tmp.validdomain.com"
  region             = "nyc3"
  size               = "1gb"
  private_networking = true

  ssh_keys = [
    "${var.ssh_fingerprint}",
  ]

  connection {
    user        = "root"
    type        = "ssh"
    private_key = "${file(var.private_key)}"
    timeout     = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "export PATH=$PATH:/usr/bin",
      "sudo apt-get update",
    ]
  }
}
terraform.tfvars
digitalocean_token = "correcttoken"
public_key         = "~/.ssh/id_rsa.pub"
private_key        = "~/.ssh/id_rsa"
ssh_fingerprint    = "correct:finger:print"
provider.tf
provider "digitalocean" {
  token = "${var.digitalocean_token}"
}
variables.tf
##variables used by terraform

# DO token
variable "digitalocean_token" {
  type = "string"
}

# DO public key file location on local server
variable "public_key" {
  type = "string"
}

# DO private key file location on local server
variable "private_key" {
  type = "string"
}

# DO ssh key fingerprint
variable "ssh_fingerprint" {
  type = "string"
}
I was able to setup a new droplet with the SSH key at initialization time when I specified the digitalocean token as an environment variable (as opposed to relying on the terraform.tfvars file).
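One thing to rule out before blaming the token: the ssh_keys entry in the droplet resource is matched by fingerprint, and DigitalOcean uses the MD5 fingerprint of the public-key blob in the colon-separated form that ssh-keygen -E md5 -lf prints. If the ssh_fingerprint in terraform.tfvars belongs to a different key than the ~/.ssh/id_rsa the connection block uses, the droplet comes up with the wrong key installed and root login is refused, which matches the symptom in the question. The check below, an added example and not part of the question or the answer, computes the fingerprint locally and compares it with the tfvars value; the file names are the ones from the question's tfvars.

```sh
#!/bin/sh
# Verify that ssh_fingerprint in terraform.tfvars is the fingerprint of the
# local public key, so the droplet is created with the key we will log in with.

# do_key_fingerprint PUBKEY_FILE -> prints the MD5 fingerprint as aa:bb:...:ff
do_key_fingerprint() {
  # second field of an OpenSSH public key line is the base64 key blob
  blob=$(awk 'NF >= 2 { print $2; exit }' "$1") || return 1
  [ -n "$blob" ] || return 1
  if command -v md5sum >/dev/null 2>&1; then
    hex=$(printf '%s' "$blob" | base64 -d | md5sum | cut -c1-32)
  else
    hex=$(printf '%s' "$blob" | base64 -d | openssl dgst -md5 | sed 's/.* //')
  fi
  [ "${#hex}" -eq 32 ] || return 1
  # insert a colon after every byte and drop the trailing one
  printf '%s\n' "$hex" | sed 's/\(..\)/\1:/g; s/:$//'
}

# tfvars_value FILE NAME -> the double-quoted value assigned to NAME
tfvars_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# check_do_fingerprint PUBKEY_FILE EXPECTED -> 0 when the key has that fingerprint
check_do_fingerprint() {
  actual=$(do_key_fingerprint "$1") || return 1
  [ "$actual" = "$2" ]
}

# Wire the question's files together: `sh check-do-key.sh run` from the Terraform directory.
if [ "${1:-}" = "run" ]; then
  pub=$(tfvars_value terraform.tfvars public_key)
  expected=$(tfvars_value terraform.tfvars ssh_fingerprint)
  # expand a leading ~ the way the shell would
  case $pub in "~/"*) pub="$HOME/${pub#~/}" ;; esac
  if check_do_fingerprint "$pub" "$expected"; then
    echo "ok: $pub has fingerprint $expected"
  else
    echo "MISMATCH: $pub has fingerprint $(do_key_fingerprint "$pub"), tfvars says $expected" >&2
    exit 1
  fi
fi
```

If the fingerprints differ, fix ssh_fingerprint (or upload the right key) before re-running terraform apply; if they match and login still fails, the key is not the problem and the environment-variable workaround in the answer is the next thing to try.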

getting timeout when submitting fat jar to spark-jobserver (akka.pattern.AskTimeoutException)

I have built my job jar using sbt assembly to have all dependencies in one jar. When I try to submit my binary to spark-jobserver I get akka.pattern.AskTimeoutException.
I modified my configuration to be able to submit large jars (I added parsing.max-content-length = 300m to my configuration). I also increased some of the timeouts in the configuration, but nothing helped.
After I run:
curl --data-binary @matching-ml-assembly-1.0.jar localhost:8090/jars/matching-ml
I get:
{
  "status": "ERROR",
  "result": {
    "message": "Ask timed out on [Actor[akka://JobServer/user/binary-manager#1785133213]] after [3000 ms]. Sender[null] sent message of type \"spark.jobserver.StoreBinary\".",
    "errorClass": "akka.pattern.AskTimeoutException",
    "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"]
  }
}
My configuration:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  master = "local[4]"
  # master = "mesos://vm28-hulk-pub:5050"
  # master = "yarn-client"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    context-per-jvm = false
    # Note: JobFileDAO is deprecated from v0.7.0 because of issues in
    # production and will be removed in future, now defaults to H2 file.
    jobdao = spark.jobserver.io.JobSqlDAO

    filedao {
      rootdir = /tmp/spark-jobserver/filedao/data
    }

    datadao {
      # storage directory for files that are uploaded to the server
      # via POST/data commands
      rootdir = /tmp/spark-jobserver/upload
    }

    sqldao {
      # Slick database driver, full classpath
      slick-driver = slick.driver.H2Driver

      # JDBC driver, full classpath
      jdbc-driver = org.h2.Driver

      # Directory where default H2 driver stores its data. Only needed for H2.
      rootdir = /tmp/spark-jobserver/sqldao/data

      # Full JDBC URL / init string, along with username and password. Sorry, needs to match above.
      # Substitutions may be used to launch job-server, but leave it out here in the default or tests won't pass
      jdbc {
        url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
        user = ""
        password = ""
      }

      # DB connection pool settings
      dbcp {
        enabled = false
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }

    # When using chunked transfer encoding with scala Stream job results, this is the size of each chunk
    result-chunk-size = 1m
  }

  # Predefined Spark contexts
  # contexts {
  #   my-low-latency-context {
  #     num-cpu-cores = 1           # Number of cores to allocate. Required.
  #     memory-per-node = 512m      # Executor memory per node, -Xmx style eg 512m, 1G, etc.
  #   }
  #   # define additional contexts here
  # }

  # Universal context configuration. These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2           # Number of cores to allocate. Required.
    memory-per-node = 2G        # Executor memory per node, -Xmx style eg 512m, 1G, etc.

    # In case spark distribution should be accessed from HDFS (as opposed to being installed on every Mesos slave)
    # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"

    # URIs of Jars to be loaded into the classpath for this context.
    # Uris is a string list, or a string separated by commas ','
    # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]

    # Add settings you wish to pass directly to the sparkConf as-is such as Hadoop connection
    # settings that don't use the "spark." prefix
    passthrough {
      #es.nodes = "192.1.1.1"
    }
  }

  # This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
  # home = "/home/spark/spark"
}

# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.

spray.can.server {
  # uncomment the next lines for making this an HTTPS example
  # ssl-encryption = on
  # path to keystore
  #keystore = "/some/path/sjs.jks"
  #keystorePW = "changeit"
  # see http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#SSLContext for more examples
  # typical are either SSL or TLS
  encryptionType = "SSL"
  keystoreType = "JKS"
  # key manager factory provider
  provider = "SunX509"
  # ssl engine provider protocols
  # (SSLv3 and TLSv1 are both deprecated; if ssl-encryption is ever switched on,
  # restrict this to TLSv1.2 or later)
  enabledProtocols = ["SSLv3", "TLSv1"]
  idle-timeout = 60 s
  request-timeout = 20 s
  connecting-timeout = 5s
  pipelining-limit = 2 # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)
  # Needed for HTTP/1.0 requests with missing Host headers
  default-host-header = "spray.io:8765"
  # Increase this in order to upload bigger job jars
  parsing.max-content-length = 300m
}

akka {
  remote.netty.tcp {
    # This controls the maximum message size, including job results, that can be sent
    # maximum-frame-size = 10 MiB
  }
}
I came across a similar issue. The way to solve it is a bit tricky. First you need to add spark.jobserver.short-timeout to your configuration. Just modify your configuration like this:
jobserver {
  port = 8090
  context-per-jvm = false
  short-timeout = 60s
  ...
}
The second (tricky) part is that you can't fix it without modifying the code of the spark-jobserver application. The attribute which causes the timeout is in the class BinaryManager:
implicit val daoAskTimeout = Timeout(3 seconds)
The default is set to 3 seconds, which apparently is not enough for a big jar. You can increase it to, for example, 60 seconds, which solved the problem for me.
implicit val daoAskTimeout = Timeout(60 seconds)
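The answer's short-timeout next to the transport settings from the question's spray.can.server block, as an added example and not the answerer's configuration. The stanza the question posted enables SSLv3 and TLSv1 and keeps the keystore password as a literal in the file; this version shows the same block with only TLS 1.2 allowed and the password pulled from the environment. The keystore path and the SJS_KEYSTORE_PW variable name are assumptions, as is the request-timeout value; whether TLSv1.2 is available depends on the JVM the job server runs on.

```hocon
spark.jobserver {
  port = 8090
  context-per-jvm = false
  # the ask timeout the answer above raises; only takes effect together with
  # the BinaryManager change it describes
  short-timeout = 60s
}

spray.can.server {
  ssl-encryption = on
  keystore     = "/etc/spark-jobserver/sjs.jks"   # assumed path
  keystorePW   = ${?SJS_KEYSTORE_PW}              # HOCON env-var substitution instead of a literal
  keystoreType = "JKS"
  provider     = "SunX509"
  encryptionType = "TLS"
  # the template's ["SSLv3", "TLSv1"] are deprecated protocols; offer TLS 1.2 only
  enabledProtocols = ["TLSv1.2"]

  # a 300 MB assembly takes a while to stream in; these are the HTTP-level
  # timeouts, separate from the actor ask timeout in the error message
  request-timeout = 120 s
  idle-timeout    = 180 s
  parsing.max-content-length = 300m
}
```

With ssl-encryption on, the curl call from the question becomes curl --data-binary @matching-ml-assembly-1.0.jar https://<host>:8090/jars/matching-ml, and the JAR still has to fit under parsing.max-content-length.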
Actually, you can bring down the size of the jars easily. Also, some of the dependent jars can be passed using dependent-jar-uris instead of assembling them into one big fat jar.