How to implement a HashiCorp Vault 3 node cluster? - hashicorp-vault

We have implemented HashiCorp open source Vault as a single node with a Consul backend.
We need help implementing a 3-node Vault cluster for HA in a single datacenter, as well as across multiple datacenters.
Could you please help me with this?

The Vault Deployment guide has more on this.
https://learn.hashicorp.com/vault/operations/ops-deployment-guide#help-and-reference
Combine it with this guide: https://learn.hashicorp.com/vault/operations/ops-vault-ha-consul
I'll assume, based on the fact that you've already gotten a single node up with a Consul backend, that you know a little about scripting, Git, configuring new SSH connections, installing software, and virtual machines.
Those topics are hard to explain here and have much better resources elsewhere, so if you get stuck with the prerequisites, the tools to install, or downloading the code, please look at the resources available online.
If you hit an error with Vault itself working improperly, though, open a GitHub issue ASAP.
Anyway, with that out of the way, the short answer is this:
Step 1: Set up 3 Consul servers, each with references to each other.
Step 2: Set up 3 Vault servers, each of them independent, but with a reference to a Consul address as their Storage Backend (a minimal config sketch follows below).
Step 3: Initialize the Cluster with your brand new Vault API.
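As a rough illustration of Step 2, here is a minimal sketch of a Vault server configuration that points at a local Consul agent for storage. The file path, the addresses, and the decision to disable TLS are assumptions for a demo setup like the one below, not the only way to do it:
## hypothetical config file location; adjust addresses to your own hosts
cat > /etc/vault.d/config.hcl <<'EOF'
storage "consul" {
  address = "127.0.0.1:8500"   # local Consul agent joined to the 3-node Consul cluster
  path    = "vault/"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = 1              # demo only; use TLS in production
}

api_addr     = "http://192.168.13.35:8200"   # this node's own address; differs per Vault server
cluster_addr = "https://192.168.13.35:8201"
EOF
vault server -config=/etc/vault.d/config.hcl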
Now for the long answer.
Prerequisites
OS-Specific Prerequisites
MacOS: OSX 10.13 or later
Windows: Windows must have Powershell 3.0 or later. If you're on Windows 7, I recommend Windows Management Framework 4.0, because it's easier to install
Vagrant
Set up Vagrant, because it will take care of all of the networking and resource setup for the underlying infrastructure used here.
Especially for Vagrant, the Getting Started guide takes about 30 minutes once you have Vagrant and Virtualbox installed: https://www.vagrantup.com/intro/getting-started/index.html
Install Tools
Make sure you have Git installed
Install the latest version of Vagrant (NOTE: WINDOWS 7 AND WINDOWS 8 REQUIRE POWERSHELL >= 3)
Install the latest version of VMware or VirtualBox
Download the Code for this demonstration
Related Vendor Documentation Link: https://help.github.com/articles/cloning-a-repository
git clone https://github.com/v6/super-duper-vault-train.git
Use this Code to Make a Vault Cluster
Related Vagrant Vendor Documentation Link: https://www.vagrantup.com/intro/index.html#why-vagrant-
cd super-duper-vault-train
vagrant up ## NOTE: You may have to wait a while for this, and there will be some "connection retry" errors for a long time before a successful connection occurs, because the VM is booting. Make sure you have the latest version, and try the Vagrant getting started guide, too
vagrant status
vagrant ssh instance5
After you do this, you'll see your command prompt change to show vagrant@instance5.
You can also vagrant ssh to other VMs listed in the output of vagrant status.
You can now use Vault or Consul from within the VM for which you ran vagrant ssh.
Vault
Explore the Vault Cluster
ps -ef | grep vault ## Check the Vault process (run while inside a Vagrant-managed Instance)
ps -ef | grep consul ## Check the Consul process (run while inside a Vagrant-managed Instance)
vault version ## Output should be Vault v0.10.2 ('3ee0802ed08cb7f4046c2151ec4671a076b76166')
consul version ## Output should show Consul Agent version and Raft Protocol version
The Vagrant boxes have the following IP addresses:
192.168.13.35
192.168.13.36
192.168.13.37
Vault is on port 8200.
Consul is on port 8500.
Click the Links
http://192.168.13.35:8200 (Vault)
http://192.168.13.35:8500 (Consul)
http://192.168.13.36:8200 (Vault)
http://192.168.13.36:8500 (Consul)
http://192.168.13.37:8200 (Vault)
http://192.168.13.37:8500 (Consul)
Initialize Vault
Related Vendor Documentation Link: https://www.vaultproject.io/api/system/init.html
Initialize Vault by running this command on one of the Vagrant-managed VMs, or anywhere on your computer that has curl installed.
curl -s --request PUT -d '{"secret_shares": 3,"secret_threshold": 2}' http://192.168.13.35:8200/v1/sys/init
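The init call returns a JSON body containing the unseal key shares and the initial root token, which you will need for the next steps. A small sketch of saving and inspecting that output (assuming jq is installed on the machine where you run curl):
curl -s --request PUT -d '{"secret_shares": 3,"secret_threshold": 2}' http://192.168.13.35:8200/v1/sys/init | tee init.json
jq -r '.keys[]' init.json        ## the unseal key shares
jq -r '.root_token' init.json    ## the initial root token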
Unseal Vault
Related Vendor Documentation Link: https://www.vaultproject.io/api/system/unseal.html
This will unseal the Vault at 192.168.13.35:8200. You can use the same process for 192.168.13.36:8200 and 192.168.13.37:8200.
Use one of your unseal keys to replace the placeholder value abcd12345678..., and run this on the Vagrant-managed VM.
curl --request PUT --data '{"key":"abcd12345678..."}' http://192.168.13.35:8200/v1/sys/unseal
Run that curl command again, but use a different value for "key":. Replace efgh910111213... with a different key than you used in the previous step, from the keys you received when calling the v1/sys/init endpoint.
curl --request PUT --data '{"key":"efgh910111213..."}' http://192.168.13.35:8200/v1/sys/unseal
Non-Vagrant
Please refer to the file PRODUCTION_INSTALLATION.md in this repository.
Codified Vault Policies and Configuration
To provision Vault via its API, please refer to the provision_vault folder. It has data and scripts.
The data folder's tree corresponds to the HashiCorp Vault API endpoints, similar to the following: https://www.hashicorp.com/blog/codifying-vault-policies-and-configuration#layout-and-design
You can use the Codified Vault Policies and Configuration with your initial Root token, after initializing and unsealing Vault, to configure Vault quickly via its API.
The .json files inside each folder correspond to the payloads to send to Vault via its API, but there may also be .hcl, .sample, and .sh files for convenience's sake.
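For example, pushing one of those JSON payloads to Vault generally looks like the following sketch. The policy name and payload file here are hypothetical; substitute the folder and payload you actually want to apply:
export VAULT_TOKEN=<your initial root token>
curl --header "X-Vault-Token: $VAULT_TOKEN" --request PUT --data @my-policy.json http://192.168.13.35:8200/v1/sys/policy/my-policy   ## hypothetical policy name and payload file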

HashiCorp has written some guidance on how to get started with setting up an HA Vault and Consul configuration:
https://www.vaultproject.io/guides/operations/vault-ha-consul.html

Related

HashiCorp: same Vault binary started on different Linux distributions fails to work with the same etcd

I'm fighting a weird issue.
In one of my environments, Vault is not able to work stably with etcd as storage.
So here is the story.
I have etcd server version 3.5 installed. It works perfectly with the etcdctl tool.
When I run Vault on one system (Ubuntu 20.04.2 LTS), I have issues with the JWT token.
In the Vault logs I see:
{"level":"warn","ts":"2022-03-29T16:23:27.614Z","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002b7500/#initially=[https://etcd-server:2379]","attempt":99,"error":"rpc error: code = Unauthenticated desc = etcdserver: invalid auth token"}
But Vault is able to read some records, so sometimes it thinks the JWT is OK.
When I copy the same binary to Fedora 35 and run it there, I do not have the issue.
From the etcd logs I can extract the JWT token in both cases and verify it using JWT tools.
Both are correct, and the signature is OK as well.
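(For anyone reproducing this kind of check, a minimal sketch of decoding the JWT payload to compare its iat/exp claims against the system clock; this assumes the token string from the etcd logs has been copied into $TOKEN and is not part of the original post:)
payload=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
## pad to a multiple of 4 so base64 can decode the base64url payload
case $(( ${#payload} % 4 )) in 2) payload="${payload}==";; 3) payload="${payload}=";; esac
printf '%s' "$payload" | base64 -d | python3 -m json.tool   ## shows iat/exp and other claims
date -u +%s                                                  ## current epoch time for comparison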
The etcd token auth is running with:
name: ETCD_AUTH_TOKEN
value: jwt,priv-key=jwt-token.pem,sign-method=RS256,ttl=10m
The interesting thing is that if I run the same thing on another Fedora 35 box, I get the JWT issue as well.
If I set ETCD_AUTH_TOKEN to 'simple', then, as expected, all systems start working without issues.
So I am really lost: first, why it does not work with JWT everywhere, and second, why it works on only one system.
The Vault binary is static and downloaded from the HashiCorp site as is, so it does not depend on system libs.
Time is synced on all systems.
I will appreciate any help and ideas.
Thank you

Connect containerized self-hosted agent with Azure DevOps

I followed the instructions in the MS docs guide, and the agent started without any issues. However, it never showed up in my agent pool. I tried a different version of the start.sh script found on GitHub and it connected immediately. Is there anything else I can do to try and troubleshoot this? Logs from the non-working agent are below:
❯ kubectl logs azpagent-55864668dc-zgdrn
1. Determining matching Azure Pipelines agent...
2. Downloading and installing Azure Pipelines agent...
3. Configuring Azure Pipelines agent...
>> End User License Agreements:
Building sources from a TFVC repository requires accepting the Team Explorer Everywhere End User License Agreement. This step is not required for building sources from Git repositories.
A copy of the Team Explorer Everywhere license agreement can be found at:
/azp/agent/externals/tee/license.html
>> Connect:
Connecting to server ...
>> Register Agent:
Scanning for tool capabilities.
Connecting to the server.
Successfully replaced the agent
Testing agent connection.
2019-08-03 04:22:56Z: Settings Saved.
4. Running Azure Pipelines agent...
Starting Agent listener interactively
Started listener process
Started running service
Scanning for tool capabilities.
Connecting to the server.
2019-08-03 04:23:08Z: Agent connect error: The signature is not valid.. Retrying until reconnected.
Not really sure what else to try -- has anyone else seen this issue, or had success with the linux agent guide?
Looking at the error message:
The signature is not valid.
There might be a problem with the provided PAT. I'd suggest generating a new PAT, as described by this guide, and trying again.
Let me know if this has helped.
Update
According to the error info The signature is not valid.:
Are you building sources from a TFVC repository, which requires accepting the Team Explorer Everywhere End User License Agreement? (This step is not required for building sources from Git repositories.)
If so, try building from a Git repo instead.
The doc you referred to uses a different version of the start.sh script, which is deprecated; it's for an old build agent.
Based on this and the related error The signature is not valid.. Retrying until reconnected., a few things I would suggest:
You may be on a pretty old agent version; try the latest version of the agent:
https://github.com/microsoft/azure-pipelines-agent/releases
You need to restart the agent process in order to make those environment variables take effect (see the sketch after this list).
Check with your IT department and make sure the network between your build machine and the TFS server/Azure DevOps Service is reliable; see whether there has been any change in your network.
Also make sure your build machine/VM has not run out of resources.
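A rough sketch of the "restart the agent" suggestion above. The directory is hypothetical and the svc.sh commands assume a Linux agent installed from the standard tarball; for the containerized agent in this question, deleting the pod so the Deployment recreates it has the same effect:
cd /azp/agent              ## hypothetical install directory of the agent
sudo ./svc.sh stop
sudo ./svc.sh start
## containerized agent: recreate the pod so it picks up new env vars and re-registers
kubectl delete pod azpagent-55864668dc-zgdrn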
In case this or a similar issue occurs for anyone else, the suggestion from @juliobbv was very helpful. If you comment out the last line of the script and replace it with
./bin/Agent.Listener run & wait $!
you can get a clearer view of any error messages.
In my case, I didn't realize that AGENT_NAME and POOL were no longer the same variable, and the original error message didn't indicate that the issue was my lack of permissions to the default pool.
My final changes to the script are below -- I defaulted the agent name to the hostname, and maintained the previous behavior of using a custom pool:
./config.sh --unattended \
--agent "$(hostname)" \
--url "$AZP_URL" \
--auth PAT \
--token $(cat "$AZP_TOKEN_FILE") \
--pool "${AZP_POOL:-Default}" \
--work "${AZP_WORK:-_work}" \
--replace \
--acceptTeeEula & wait $!

How to completely remove an installed CentOS RPM file?

Folks, how do I make sure all files of an RPM (CentOS) were removed? The problem is that I installed a piece of software called ShinyProxy https://www.shinyproxy.io/ - and after 3 days running as a test server, we received a "NetScan Detected" message from Germany. Now we want to clean everything up by removing the RPM, but it seems not that easy, as something else is left on the system that continues to send and receive lots of packets (40kps). I really apologize to the ShinyProxy folks if this is not related to their software; so far this is the last system under investigation.
Your Docker API is bound to your public IP and is therefore directly reachable from an external network. You should not do this, as it allows anybody to run arbitrary Docker instances and even commands on your Docker host.
You should secure your Docker install:
- bind it to the 127.0.0.1 (lo) interface and adapt the ShinyProxy yml file accordingly
- set up TLS mutual auth (client certificates) on the Docker API (it is supported by ShinyProxy)
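Two hedged sketches of how the cleanup check and the loopback binding described above might look. The package name shinyproxy and the daemon.json path are assumptions; on a systemd host the "hosts" setting may conflict with the -H flag in the docker.service unit, so adjust to your install:
rpm -ql shinyproxy            ## list every file the package owns (hypothetical package name)
rpm -e shinyproxy             ## remove it
ss -tulpn                     ## see what is still listening, and which process owns it
cat > /etc/docker/daemon.json <<'EOF'
{
  "hosts": ["unix:///var/run/docker.sock", "tcp://127.0.0.1:2375"]
}
EOF
systemctl restart docker
ss -tlnp | grep 2375          ## should now show 127.0.0.1:2375, not the public IP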

Apache CloudStack: No templates showing when adding instance

I have set up Apache CloudStack on a CentOS 6.8 machine following the quick installation guide. The management server and KVM are set up on the same machine. The management server is running without problems. I was able to add a zone, pod, cluster, and primary and secondary storage from the web interface. But when I tried to add an instance, it did not show any templates in the second stage, as you can see in the screenshot.
However, I am able to see two templates under the Templates link in the web UI.
But when I select the template and navigate to the Zone tab, I see Timeout waiting for response from storage host and the Ready field shows No.
When I check the management server logs, it seems there is an error when CloudStack tries to mount the secondary storage for use. The segment below from the cloudstack-management.log file describes this error.
2017-03-09 23:26:43,207 DEBUG [c.c.a.t.Request] (AgentManager-Handler-14:null) (logid:) Seq 2-7686800138991304712: Processing: { Ans: , MgmtId: 279278805450918, via: 2, Ver: v1, Flags: 10,
[{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: GetRootDir for nfs://172.16.10.2/export/secondary failed due to com.cloud.utils.exception.CloudRuntimeException: Unable to mount 172.16.10.2:/export/secondary at /mnt/SecStorage/6e26529d-c659-3053-8acb-817a77b6cfc6 due to mount.nfs: Connection timed out
\n\tat org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.getRootDir(NfsSecondaryStorageResource.java:2080)
\n\tat org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.execute(NfsSecondaryStorageResource.java:1829)
\n\tat org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.executeRequest(NfsSecondaryStorageResource.java:265)
\n\tat com.cloud.agent.Agent.processRequest(Agent.java:525)
\n\tat com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:833)
\n\tat com.cloud.utils.nio.Task.call(Task.java:83)
\n\tat com.cloud.utils.nio.Task.call(Task.java:29)
\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:262)
\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
\n\tat java.lang.Thread.run(Thread.java:745)\n","wait":0}}] }
Can anyone please guide me how to resolve this issue? I have been trying to figure it out for some hours now and don't know how to proceed further.
Edit 1: Please note that my LAN address was 10.103.72.50, which I assume is not a /24 address. I tried to give CentOS a static IP by making the following settings in the ifcfg-eth0 file:
DEVICE=eth0
HWADDR=52:54:00:B9:A6:C0
NM_CONTROLLED=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.16.10.2
NETMASK=255.255.255.0
GATEWAY=172.16.10.1
DNS1=8.8.8.8
DNS2=8.8.4.4
But doing this would stop my internet. As a workaround, I reverted these changes and installed all the packages first. Then I changed the IP to static using the same configuration settings as above and ran the CloudStack management server. Everything worked fine until I bumped into this template issue. Please help me figure out what might have gone wrong.
I know I'm late, but for people trying out in the future, here it goes:
I hope you successfully added a host as mentioned in the Quick Install Guide before you changed your IP to static, as that autoconfigures VLANs for the different traffic types and creates two bridges - generally named 'cloud' or 'cloudbr'. CloudStack uses the Secondary Storage System VM (SSVM) for all the storage-related operations in each Zone and Cluster. What seems to be the problem is that the SSVM is not able to communicate with the management server at port 8250. If that is not the case, try manually mounting the NFS server's mount points in the SSVM shell. You can ssh into the SSVM using the command below:
ssh -i /var/cloudstack/management/.ssh/id_rsa -p 3922 root@<Private or Link local IP address of SSVM>
I suggest you run /usr/local/cloud/systemvm/ssvm-check.sh after ssh-ing into the secondary storage system VM (assuming it is running and has its private, public, and link local IP addresses). If that doesn't help you much, take a look at the secondary storage troubleshooting docs for CloudStack.
I would further recommend, if anyone runs into similar issues in the future: check that the SSVM is running and is in the "Up" state in the System VMs section of the Infrastructure tab, and that you are able to open a console session to it from the browser. If that is working, go on to run the ssvm-check.sh script mentioned above, which systematically checks each point of operation that the SSVM executes. Even if a console session cannot be opened, you can still ssh using the link local IP address of the SSVM, which can be found by opening up the details of the SSVM, and then execute the script.

If the script says it cannot communicate with the Management Server at port 8250, check the iptables rules on the management server and make sure all traffic is allowed on port 8250. A quick command to check this is nc -v <mngmnt-server-ip> 8250. You can do a simple search to learn how to add port 8250 to your iptables rules if it is not open.

Next, you mentioned you used CentOS 6.8, which probably uses older versions of NFS, so execute exportfs -a on your NFS server to make sure all the NFS shares are properly exported and there are no errors.

I would also recommend that you wait for the download of the "CentOS 5.5 no GUI kvm" template to complete and its Ready status to show 'Yes' before you start importing your own templates and ISOs to run on VMs.

Finally, if the ssvm-check.sh script shows everything is good and the download still does not start, you can run service cloud restart inside the SSVM and check whether the service actually got a PID with service cloud status, as older versions of the system VM templates sometimes need you to manually start the cloud service with service cloud start even after the restart command. Restarting the cloud service in the SSVM triggers the restart of the download of all remaining templates and ISOs.

Side note: the system VMs use a Debian kernel if you want to do some more troubleshooting. Hope this helps.
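A compact, hedged recap of those checks as commands. The IPs and mount path come from this question; the iptables line and the temporary mount point are illustrative, so adapt them to your environment:
nc -v <mngmnt-server-ip> 8250                             ## from inside the SSVM: can it reach the management server on port 8250?
mkdir -p /tmp/secstorage-test
mount -t nfs 172.16.10.2:/export/secondary /tmp/secstorage-test   ## from inside the SSVM: manual NFS mount test
exportfs -a                                               ## on the NFS server: re-export the shares
exportfs -v                                               ## and list them
iptables -I INPUT -p tcp --dport 8250 -j ACCEPT           ## on the management server: allow port 8250 if it is blocked
service iptables save
service cloud restart                                     ## inside the SSVM: restart the cloud service
service cloud status                                      ## confirm it actually got a PID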

NFS v4 hosted on FreeBSD, both client and server: mounts OK but reads and writes on the filesystem fail with Input/output error

I have successfully mounted and used NFS version 4 with a Solaris server and a FreeBSD client.
The problem is when I have a FreeBSD server and a FreeBSD client at version 4. Version 3 works excellently.
I have used the FreeBSD NFS server since FreeBSD version 4.5 (back then with IBM AIX clients).
The problem:
The mount succeeds, but no principals appear in the Kerberos cache, and when trying to read or write on the mounted filesystem I get the error: Input/output error.
nfs/server-fqdn@REALM and nfs/client-fqdn@REALM principals are created on the Kerberos server and stored properly in keytab files on both sides.
I issue TGTs from the KDC using the above, for both sides, into root's Kerberos cache.
I start services properly:
file /etc/rc.conf
rpcbind_enable="YES"
gssd_enable="YES"
rpc_statd_enable="YES"
rpc_lockd_enable="YES"
mountd_enable="YES"
nfsuserd_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
Then I start the services:
at the client: rpcbind, gssd, nfsuserd;
at the server: all of the above, with the exports file:
V4: /marble/nfs -sec=krb5:krb5i:krb5p -network 10.20.30.0 -mask 255.255.255.0
I mount:
# mount_nfs -o nfsv4 servername:/ /my/mounted/nfs
#
# mkdir /my/mounted/nfs/e
# mkdir: /my/mounted/nfs/e: Input/output error
#
Even an ls command gives the same result.
klist does not show any new principals in root's cache, or any other cache.
I love the amazing performance of version 3, but I need the local file locking feature of NFSv4.
The second reason is security: I need kerberised RPC calls (-sec=krb5p).
If anyone of you has achieved this using a FreeBSD server for NFS version 4, please give feedback on this question; I'll be glad if you do.
Comments are not a good place for code examples, so here is the setup of a FreeBSD client and FreeBSD server that works for me. I don't use Kerberos, but if you get it working with this minimal configuration then you can add Kerberos afterwards (I believe).
Server rc.conf:
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 4"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
mountd_flags="-r"
Server /etc/exports:
/parent/path1 -mapall=1001:1001 192.168.2.200
/parent/path2 -mapall=1001:1001 192.168.2.200
... (more shares)
V4: /parent/ -sec=sys 192.168.2.200
Client rc.conf:
nfs_client_enable="YES"
nfs_client_flags="-n 4"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
Client fstab:
192.168.2.100:/path1/ /mnt/path1/ nfs rw,bg,late,failok,nfsv4 0 0
192.168.2.100:/path2/ /mnt/path2/ nfs rw,bg,late,failok,nfsv4 0 0
... (more shares)
As you can see, the client mounts only what's after the /parent/ path specified in the V4: line on the server. 192.168.2.100 is the server IP and 192.168.2.200 is the client IP. This setup will only allow that one client to connect to the server.
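To apply and sanity-check a setup like this, the following commands are a rough sketch using standard FreeBSD tooling; they are not part of the original answer:
service nfsd start           ## on the server, after editing /etc/rc.conf and /etc/exports
service mountd reload        ## re-read /etc/exports after any change
showmount -e 192.168.2.100   ## confirm the exported paths are visible
mount -a                     ## on the client: mount everything listed in /etc/fstab
mount | grep nfs             ## confirm the shares are mounted via NFSv4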
I hope I haven't missed anything. BTW, please raise questions like this on Super User or Server Fault rather than Stack Overflow. I am surprised this question hasn't been closed yet because of that ;)