How to bring up one master in one AZ with kops? - kubernetes

I deployed a cluster in AWS across 3 AZs and I want to run one master in each AZ. Everything else works, except that I cannot start the master in one of the AZs.
Here is my validation:
INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
bastions Bastion t2.micro 1 1 utility-us-east-1a,utility-us-east-1c,utility-us-east-1d
master-us-east-1a Master m3.medium 1 1 us-east-1a
master-us-east-1c Master m3.medium 2 2 us-east-1c
master-us-east-1d Master m3.medium 1 1 us-east-1d
nodes Node m4.xlarge 3 3 us-east-1a,us-east-1c,us-east-1d
workers Node m4.2xlarge 2 2 us-east-1a,us-east-1c,us-east-1d
NODE STATUS
NAME ROLE READY
ip-10-0-100-34.ec2.internal node True
ip-10-0-107-127.ec2.internal master True
ip-10-0-120-160.ec2.internal node True
ip-10-0-35-184.ec2.internal node True
ip-10-0-39-224.ec2.internal master True
ip-10-0-59-109.ec2.internal node True
ip-10-0-87-169.ec2.internal node True
VALIDATION ERRORS
KIND NAME MESSAGE
InstanceGroup master-us-east-1c InstanceGroup "master-us-east-1c" did not have enough nodes 0 vs 2
Validation Failed
And if I run a rolling update, it shows that one master has not started:
NAME STATUS NEEDUPDATE READY MIN MAX NODES
bastions Ready 0 1 1 1 0
master-us-east-1a Ready 0 1 1 1 1
master-us-east-1c Ready 0 0 1 1 0
master-us-east-1d Ready 0 1 1 1 1
nodes Ready 0 3 3 3 3
workers Ready 0 2 2 2 2
What shall I do to bring that machine up?

I solved this problem. The m3.medium instance type (the kops default for masters) is no longer available in that AZ. Changing it to m4.large makes it work.
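For anyone hitting the same issue, a minimal sketch of applying that fix with kops (the cluster name and state store variables are placeholders for your own values):
# open the failing instance group for editing
kops edit ig master-us-east-1c --name=$CLUSTER_NAME --state=$KOPS_STATE_STORE
# in the editor, change the machineType field of the InstanceGroup spec, e.g.
#   spec:
#     machineType: m4.large
# apply the change and roll the group so the master is recreated
kops update cluster $CLUSTER_NAME --yes
kops rolling-update cluster $CLUSTER_NAME --yes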


Where does the igmp version get set in RedHat 7

Is there a different location/method to set the default IGMP version for multicast on a RedHat 7 server, other than using the force parameter (net.ipv4.conf.eth0.force_igmp_version = 0) in sysctl.conf or sysctl.d etc.? In the example above the 0 implies that there is a default, which I assume is V3. The output below shows a value of V2 on eth0, but it is not set or forced anywhere that I can find.
Idx Device : Count Querier Group Users Timer Reporter
1 lo : 1 V3
010000E0 1 0:00000000 0
2 eth0 : 2 V2
0A0707E7 1 0:00000000 1
010000E0 1 0:00000000 0
3 eth1 : 1 V3
010000E0 1 0:00000000 0
4 eth2 : 1 V3
010000E0 1 0:00000000 0
Any Linux expert out there with an idea?
You can try net.ipv4.conf.all.force_igmp_version=2: instead of forcing it in one place, force it for every interface.
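To make that persistent, a minimal sketch assuming a drop-in file under /etc/sysctl.d (the file name is only an example):
# /etc/sysctl.d/90-igmp.conf
net.ipv4.conf.all.force_igmp_version = 2
net.ipv4.conf.default.force_igmp_version = 2
# apply without a reboot
sysctl -p /etc/sysctl.d/90-igmp.conf
# verify which version each interface now reports
cat /proc/net/igmp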

Dangling discs after cluster removal

As part of a university course I had to deploy an application to IBM Kubernetes.
I have a pay-as-you-go account with my credit card attached to it.
I deployed the application to the cluster (the paid tier with a public IP), and after a few days and the demonstration the cluster was no longer needed.
The cluster was configured to use dynamic provisioning of persistent storage via ibmcloud-block-storage-plugin.
The problem is that the cluster provisioned tens of discs, and when I removed it using the IBM Cloud UI (with the option to remove all persistent volumes checked), the discs are still displayed as active.
Result of invoking ibmcloud sl block volume-list:
77394321 SL02SEL1854117-1 dal13 endurance_block_storage 20 - 161.26.114.100 0 1
78180815 SL02SEL1854117-2 dal10 endurance_block_storage 20 - 161.26.98.107 0 1
78180817 SL02SEL1854117-3 dal10 endurance_block_storage 20 - 161.26.98.107 1 1
78180827 SL02SEL1854117-4 dal10 endurance_block_storage 20 - 161.26.98.106 3 1
78180829 SL02SEL1854117-5 dal10 endurance_block_storage 20 - 161.26.98.108 2 1
78184235 SL02SEL1854117-6 dal10 endurance_block_storage 20 - 161.26.98.88 4 1
78184249 SL02SEL1854117-7 dal10 endurance_block_storage 20 - 161.26.98.86 5 1
78184285 SL02SEL1854117-8 dal10 endurance_block_storage 20 - 161.26.98.107 6 1
78184289 SL02SEL1854117-9 dal10 endurance_block_storage 20 - 161.26.98.105 7 1
78184457 SL02SEL1854117-10 dal10 endurance_block_storage 20 - 161.26.98.85 9 1
78184465 SL02SEL1854117-11 dal10 endurance_block_storage 20 - 161.26.98.88 8 1
78184485 SL02SEL1854117-12 dal10 endurance_block_storage 20 - 161.26.98.86 10 1
78184521 SL02SEL1854117-13 dal10 endurance_block_storage 20 - 161.26.98.106 0 1
78184605 SL02SEL1854117-14 dal10 endurance_block_storage 20 - 161.26.98.87 1 1
78184643 SL02SEL1854117-15 dal10 endurance_block_storage 20 - 161.26.98.85 2 1
78184689 SL02SEL1854117-16 dal10 endurance_block_storage 20 - 161.26.98.87 3 1
78184725 SL02SEL1854117-17 dal10 endurance_block_storage 20 - 161.26.98.108 11 1
[ ... more entries there ... ]
All of those discs were created using the default IBM bronze block storage class for Kubernetes clusters and have the standard Delete reclaim policy set (so they should have been deleted automatically).
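For reference, a quick sketch of how the reclaim policy can be checked from the cluster side while it still exists (the storage class name ibmc-block-bronze is an assumption and may differ on your account):
# reclaim policy of the storage class (assumed name)
kubectl get storageclass ibmc-block-bronze -o jsonpath='{.reclaimPolicy}'
# reclaim policy and claim of each provisioned volume
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name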
When I try to delete any of those with ibmcloud sl block volume-cancel --immediate --force 77394321, I get:
Failed to cancel block volume: 77394321.
No billing item is found to cancel.
What's more, the IBM Cloud UI displays those discs as active and there's no option to delete them (the option in the menu is grayed out).
I don't want to be billed for more than 40 x 20GB discs, as the cluster doesn't even need that many resources (the fault was in badly defined Kubernetes configs).
What is the correct way to remove the discs, or is this just a delay on IBM Cloud's side and my billing will be fine (my billing shows only around $19 for the cluster's public IP, nothing more)?
Edit
It seems that after some time the problem was resolved (I created a ticket, but I don't know whether the sales team solved it. Probably it was enough to just wait, as @Sandip Amin suggested in the comments).
Opening a support case would probably be the best course of action here, as we'll likely need some account info from you to figure out what happened (or rather, why the expected actions didn't happen).
Log in to IBM Cloud and visit https://cloud.ibm.com/unifiedsupport/supportcenter (or click the Support link in the masthead of the page). If you comment back here with your case number, I'll help follow up on it.

Error opening app when deploying; using the GitHub Watson Voice Bot tutorial

https://github.com/IBM/watson-voice-bot
I am fairly new to using Watson Assistant and the IBM CLI, but I am trying to link a Watson Assistant to IBM's Speech to Text plugin/API. There's a great tutorial provided on GitHub, but I have run into problems trying to deploy the app and have been unable to get any assistance so far (from people on GitHub).
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 crashed
FAILED
Error restarting application: Start unsuccessful
TIP: use 'cf logs watson-voice-bot-20181121030328640 --recent' for more information
Finished: FAILED
This is what occurs when I try to deploy it. What should I do?
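As a first diagnostic step, a minimal sketch following the TIP printed above (the app name is the one from that output; adjust it to your own deployment):
# pull the recent logs for the crashed start
cf logs watson-voice-bot-20181121030328640 --recent
# check the app's state, recent events and bound service credentials
cf app watson-voice-bot-20181121030328640
cf events watson-voice-bot-20181121030328640
cf env watson-voice-bot-20181121030328640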

kubernetes federation : controllermanager crash : Could not find resources from API Server

Using the 1.6.6 release for everything.
I am trying to deploy Kubernetes federation using this guide.
I am using the command below to create the federation components in the k8s cluster.
kubefed -v=9 init fellowship --dns-provider="kube-dns" --dns-zone-name="example.com" --host-cluster-context="kubernetes-admin@kubernetes" --api-server-service-type="NodePort" --api-server-advertise-address="xx.yy.zz.aa" --etcd-persistent-storage=false --kubeconfig="/etc/kubernetes/admin.conf"
This is what is happening:
federation-system fellowship-apiserver-1032646596-pc3bh 2/2 Running 0 14m
federation-system fellowship-controller-manager-2770733854-g593b 0/1 CrashLoopBackOff 7 14m
And the logs are as below.
# more /var/log/pods/042190ab-576e-11e7-9706-0800270541db/controller-manager_2.log
{"log":"I0622 17:14:03.919937 1 controllermanager.go:93] v1.6.6\n","stream":"stderr","time":"2017-06-22T17:14:03.920258584Z"}
{"log":"I0622 17:14:03.921996 1 controllermanager.go:159] Loading client config for cluster controller \"cluster-controller\"\n","stream":"stderr","time":
"2017-06-22T17:14:03.922263896Z"}
{"log":"I0622 17:14:03.923489 1 controllermanager.go:161] Running cluster controller\n","stream":"stderr","time":"2017-06-22T17:14:03.923739515Z"}
{"log":"F0622 17:14:33.924245 1 controllermanager.go:166] Could not find resources from API Server: Get https://fellowship-apiserver/api: dial tcp: i/o timeout\n","stream":"stderr","time":"2017-06-22T17:14:33.927101427Z"}
Any guess what is happening here? Am I missing something?
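A minimal troubleshooting sketch for that i/o timeout, assuming the controller manager cannot resolve or reach the fellowship-apiserver service through kube-dns (the service and namespace names are taken from the output above):
# confirm the federation API server service and its endpoints exist
kubectl -n federation-system get svc fellowship-apiserver
kubectl -n federation-system get endpoints fellowship-apiserver
# check that the service name resolves from inside the cluster
kubectl run dns-test -it --rm --restart=Never --image=busybox -- nslookup fellowship-apiserver.federation-system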

What is the use of health-check-type attribute

I have deployed one app both in Bluemix and Pivotal. Below is the manifest file:
---
applications:
- name: test
  memory: 128M
  instances: 1
  no-route: true
  health-check-type: none   # Why do we have to use this?
In Bluemix, the app starts without the health-check-type attribute. But in Pivotal, I continuously get the message below, and eventually the app crashes.
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
FAILED
After setting health-check-type: none in manifest.yml (in Pivotal), the app starts without any issues.
So can someone tell me: is it mandatory to use the health-check-type attribute?
IBM Bluemix is on the older "DEA" architecture, while Pivotal is on the current "Diego" architecture. You can see how the two differ when it comes to the no-route option here.
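For illustration, a minimal manifest sketch for a routeless worker app on the Diego architecture (on newer CF versions health-check-type: process is the preferred spelling of none; the values here mirror the example above rather than any specific app):
---
applications:
- name: test
  memory: 128M
  instances: 1
  no-route: true
  # Diego should not wait for a listening port; the running process itself is the health signal
  health-check-type: process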