OpenShift 3.11 cloud integration fails with "RequestError: send request failed, caused by: Post https://ec2.eu-west-.amazonaws.com"

Following the docs: https://docs.openshift.com/container-platform/3.11/install_config/configuring_aws.html#aws-cluster-labeling
I'm configuring the cloud integration after the cluster build.
When the cluster services are restarted on the masters, it fails looking up AWS instances:
22 16:32:10.112895 75995 server.go:261] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0c5cbd50923f9c6d2: "error listing AWS instances: \"Request.service: main process exited, code=exited, status=255/n/a Error: send request failed\\ncaused by: Post https://ec2.eu-west-.amazonaws.com/: dial tcp: lookup ec2.eu-west-.amazonaws.com: no such host\""
On closer inspection it seems to be due to an incorrect hostname:
https://ec2.eu-west-.amazonaws.com/ VS https://ec2.eu-west-2.amazonaws.com/
So I double checked the config, which seems to be correct:
# cat /etc/origin/cloudprovider/aws.conf
[Global]
Zone = eu-west-2
Had a google and it seems to be a similar issue to this:
https://github.com/kubernetes-sigs/kubespray/issues/4345
Is there a way to work around this? Moving off 3.11 isn't an option right now.
Thanks.

Looks as though it needs to be the availability zone, rather than the region:
# cat /etc/origin/cloudprovider/aws.conf
[Global]
Zone = eu-west-2a
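If you need to double-check which availability zone an instance is actually in before editing aws.conf, the EC2 metadata endpoint can confirm it (just a quick sanity check run on the node itself, not an official OpenShift step); it should print something like eu-west-2a:
# curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone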

Related

EC2 metadata upgrade from IMDSv1 to IMDSv2 causes 403 and 401 errors - kube2iam

I recently updated my EC2 instances to use IMDSv2 but had to roll back because of the following issue:
It looks like after I did the upgrade my init containers started failing, and I saw the following in the logs:
time="2022-01-11T14:25:01Z" level=info msg="PUT /latest/api/token (403) took 0.753220 ms" req.method=PUT req.path=/latest/api/token req.remote=XXXXX res.duration=0.75322 res.status=403
time="2022-01-11T14:25:37Z" level=error msg="Error getting instance id, got status: 401 Unauthorized"
We are using kube2iam for this. Any advice on what changes need to be made on the kube2iam side to support IMDSv2? Below is some info from my kube2iam DaemonSet:
EKS = 1.21
image = "jtblin/kube2iam:0.10.9"
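No confirmed fix here, but one setting commonly revisited when containerized workloads start getting 401/403 on the IMDSv2 token PUT is the instance's PUT response hop limit, since the token request can need an extra network hop from inside a container. Purely as an illustration (the instance ID is a placeholder):
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 2
Whether that helps also depends on whether the kube2iam version in use requests IMDSv2 tokens at all, so treat it as a starting point rather than a fix.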

Consul agent on Kubernetes: on the node or in a pod?

I deployed an AWS EKS cluster via Terraform. I also deployed Consul following HashiCorp's tutorial, and I see the nodes in Consul's UI.
Now I'm wondering how all the Consul agents will know about the pods I deploy. I deploy something and it's not shown anywhere in Consul.
I can't find any documentation on how to register pods (services) with Consul via the node's Consul agent. Do I need to configure that somewhere? Should I skip the node's agent and register the service straight from the pod? HashiCorp discourages this, since it may increase resource utilization depending on how many pods one deploys on a given node. But then how does the node's agent know about my services deployed on that node?
Moreover, when I deploy a pod on a node, ssh in, and install Consul, the Consul agent can't find the Consul server (as opposed to the node, which can find it).
EDIT:
Bottom line is I can't find WHERE to add the configuration. If I execute ON THE POD:
consul members
It works properly and I get:
Node Address Status Type Build Protocol DC Segment
consul-consul-server-0 10.0.103.23:8301 alive server 1.10.0 2 full <all>
consul-consul-server-1 10.0.101.151:8301 alive server 1.10.0 2 full <all>
consul-consul-server-2 10.0.102.112:8301 alive server 1.10.0 2 full <all>
ip-10-0-101-129.ec2.internal 10.0.101.70:8301 alive client 1.10.0 2 full <default>
ip-10-0-102-175.ec2.internal 10.0.102.244:8301 alive client 1.10.0 2 full <default>
ip-10-0-103-240.ec2.internal 10.0.103.245:8301 alive client 1.10.0 2 full <default>
ip-10-0-3-223.ec2.internal 10.0.3.249:8301 alive client 1.10.0 2 full <default>
But if I execute:
# consul agent -datacenter=voip-full -config-dir=/etc/consul.d/ -log-file=log-file -advertise=$(wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4)
I get the following error:
==> Starting Consul agent...
Version: '1.10.1'
Node ID: 'f10070e7-9910-06c7-0e12-6edb6cc4c9b9'
Node name: 'ip-10-0-3-223.ec2.internal'
Datacenter: 'voip-full' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.0.3.223 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2021-08-16T18:23:06.936Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.936Z [WARN] agent: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.946Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.947Z [WARN] agent.auto_config: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.948Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: ip-10-0-3-223.ec2.internal 10.0.3.223
2021-08-16T18:23:06.948Z [INFO] agent.router: Initializing LAN area manager
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2021-08-16T18:23:06.950Z [WARN] agent.client.serf.lan: serf: Failed to re-join any previously known node
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2021-08-16T18:23:06.951Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
2021-08-16T18:23:06.951Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
2021-08-16T18:23:06.953Z [INFO] agent: started state syncer
2021-08-16T18:23:06.953Z [INFO] agent: Consul agent running!
2021-08-16T18:23:06.953Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:06.954Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2021-08-16T18:23:34.169Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:34.169Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
So where do I add the config?
I also tried adding a Service in k8s pointing to the pod, but the service doesn't come up in Consul's UI...
What do you guys recommend?
Thanks
Consul knows where these services are located because each service registers with its local Consul client. Operators can register services manually, configuration management tools can register services when they are deployed, or container orchestration platforms can register services automatically via integrations.
If you are planning to use the manual option, you have to register the service with Consul yourself.
Something like:
echo '{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
}' > ./consul.d/web.json
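After writing the file into the agent's configuration directory (assuming it matches the -config-dir the agent was started with, e.g. /etc/consul.d/), tell the agent to pick it up:
consul reload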
You can find a good example at: https://thenewstack.io/implementing-service-discovery-of-microservices-with-consul/
Also, this is a very nice document covering detailed configuration of health checks and service discovery: https://cloud.spring.io/spring-cloud-consul/multi/multi_spring-cloud-consul-discovery.html
Official document : https://learn.hashicorp.com/tutorials/consul/get-started-service-discovery
BTW, I was finally able to figure out the issue.
consul-dns is not deployed by default; I had to deploy it manually, then forward all .consul requests from CoreDNS to consul-dns.
All is working now. Thanks!
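For reference, that CoreDNS-to-consul-dns forwarding typically ends up as an extra server block in the coredns ConfigMap, along these lines (the IP is a placeholder for the ClusterIP of the consul-dns Service, which you can look up with kubectl get svc):
consul:53 {
    errors
    cache 30
    forward . 10.100.0.10
}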

localstack v0.11.5 and kcl v1.13.3

I am using KCL v1.13.3 with the latest LocalStack v0.11.5.
The KCL client now uses the edge service port 4566.
Are the KCL and LocalStack versions compatible?
I keep getting the following error:
com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException (AmazonHttpClient.java:1163)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper (AmazonHttpClient.java:1109)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute (AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer (AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute (AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500 (AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute (AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute (AmazonHttpClient.java:520)
at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke (AmazonKinesisClient.java:2782)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:141)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse (AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader (DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader (CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse (HttpRequestExecutor.java:273)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse (SdkHttpRequestExecutor.java:82)
at org.apache.http.protocol.HttpRequestExecutor.execute (HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute (MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute (ProtocolExec.java:185)
at org.apache.http.impl.client.InternalHttpClient.doExecute (InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute (SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest (AmazonHttpClient.java:1285)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper (AmazonHttpClient.java:1101)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute (AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer (AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute (AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500 (AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute (AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute (AmazonHttpClient.java:520)
Did you already confirm that the LocalStack endpoint is functional on that port? For example:
aws kinesis list-streams --endpoint-url http://localhost:4566
(If you do not have, and do not want to install, the AWS CLI, there's always the option of running it via Docker, as in the sketch below.)
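A rough example, assuming a Linux Docker host where --network host lets the container reach localhost:4566 (the dummy credentials are only there to satisfy the CLI; LocalStack doesn't validate them):
docker run --rm --network host \
  -e AWS_ACCESS_KEY_ID=test -e AWS_SECRET_ACCESS_KEY=test -e AWS_DEFAULT_REGION=eu-west-1 \
  amazon/aws-cli kinesis list-streams --endpoint-url http://localhost:4566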
Moreover, it might be helpful for you to share how you are bootstrapping the AWS client. It should be something along the lines of:
AwsClientBuilder.EndpointConfiguration endpointConfig =
        new AwsClientBuilder.EndpointConfiguration("http://localhost:4566",
                Regions.EU_WEST_1.getName());

return AmazonDynamoDBClientBuilder.standard()
        .withEndpointConfiguration(endpointConfig)
        .build();
Note that if you are running your kcl app inside another docker container, then you might want to change from "http://localhost:4566" to "http://localstack:4566".
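If it helps, the endpoint override looks roughly like this on the KCL 1.x side as well. This is only a sketch: it assumes the withKinesisEndpoint/withDynamoDBEndpoint setters present in recent KCL 1.x releases, and the application, stream, and worker names are placeholders.
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;

// Point the KCL's internal Kinesis and DynamoDB clients at LocalStack's edge port.
KinesisClientLibConfiguration kclConfig =
        new KinesisClientLibConfiguration("my-app", "my-stream",
                new DefaultAWSCredentialsProviderChain(), "worker-1")
            .withRegionName(Regions.EU_WEST_1.getName())
            .withKinesisEndpoint("http://localhost:4566")
            .withDynamoDBEndpoint("http://localhost:4566");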

Failed to send instantiate transaction and get notifications within the timeout period. undefined [fabric1.0 k8s]

I am trying to deploy Hyperledger Fabric 1.0.5 on k8s, and am using the balance-transfer sample to test it. Everything is fine until instantiate-chaincode, where I get this:
[2019-01-02 23:23:14.392] [ERROR] instantiate-chaincode - Failed to send instantiate transaction and get notifications within the timeout period. undefined
[2019-01-02 23:23:14.393] [ERROR] instantiate-chaincode - Failed to order the transaction. Error code: undefined
and I used kubectl logs to get peer0's log, which looks like this:
[ConnProducer] NewConnection -> ERRO 61a Failed connecting to orderer2.orderer1:7050 , error: context deadline exceeded
[ConnProducer] NewConnection -> ERRO 61b Failed connecting to orderer1.orderer1:7050 , error: context deadline exceeded
[ConnProducer] NewConnection -> ERRO 61c Failed connecting to orderer0.orderer1:7050 , error: context deadline exceeded
[deliveryClient] connect -> DEBU 61d Connected to
[deliveryClient] connect -> ERRO 61e Failed obtaining connection: Could not connect to any of the endpoints: [orderer2.orderer1:7050 orderer1.orderer1:7050 orderer0.orderer1:7050]
I checked the connectivity of orderer0:7050 and found no problem.
What should I do next?
Thanks for any help!
You didn't describe what runbook you followed to deploy Hyperledger Fabric, but it looks like your pods cannot find each other through DNS. If you are following Kubernetes standards, your pods should be in the orderer1 namespace, and hopefully you have Kubernetes Services for orderer0, orderer1, and orderer2.
You can read more about communication between the Fabric components here, in the "Communication between Fabric components" section. Also, read the "Work around the chaincode sandbox" section, which shows a workaround for --dns-search.
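A quick way to check the DNS side (just a sketch; the pod name and the peer's namespace are placeholders, and the nslookup only works if the peer image ships it):
kubectl -n orderer1 get svc
kubectl -n <peer-namespace> exec <peer0-pod> -- nslookup orderer0.orderer1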
It looks like a firewall problem.
In my case, to run HLF on k8s, I disabled the firewall service.

Cloud Foundry on Google Compute Engine can't create container

I am very new to Cloud Foundry. I have set up Cloud Foundry on the Google Compute Engine platform following these guides: source1 and source2.
Terraform was used to create the needed infrastructure. Everything seemed fine: I didn't get any errors while deploying Cloud Foundry itself, and the bosh cck command reports that there are no problems. But when I tried to deploy my hello-world app, I got the following error message in the terminal after cf push:
Creating container
Failed to create container
FAILED
Error restarting application: StagingError.
After checking the log files I found the following message:
{
"timestamp":"1474637304.026303530",
"source":"garden-linux",
"message":"garden-linux.loop-mounter.mount-file.mounting",
"log_level":2,
"data":{
"destPath":"/var/vcap/data/garden/aufs_graph/aufs/diff/08829a3252c1d60729e3b5482b0fb109652c9ab5beff9724e4e4ae756a0bc3ce",
"error":"exit status 32",
"filePath":"/var/vcap/data/garden/aufs_graph/backing_stores/08829a3252c1d60729e3b5482b0fb109652c9ab5beff9724e4e4ae756a0bc3ce",
"output":"mount: wrong fs type, bad option, bad superblock on /dev/loop0,\n missing codepage or helper program, or other error\n In some cases useful info is found in syslog - try\n dmesg | tail or so\n\n",
"session":"2.276"
}
}{
"timestamp":"1474637304.026949406",
"source":"garden-linux",
"message":"garden-linux.pool.acquire.provide-rootfs-failed",
"log_level":2,
"data":{
"error":"mounting file: mounting file: exit status 32",
"handle":"ec6e7469-0ef0-48a8-bcd0-82f4a2ea173f-5de2e641d9284aeea209ca447ffffb6d",
"session":"9.545"
}
}
{
"timestamp":"1474637304.027062416",
"source":"garden-linux",
"message":"garden-linux.garden-server.create.failed",
"log_level":2,
"data":{
"error":"mounting file: mounting file: exit status 32",
"request":{
"Handle":"ec6e7469-0ef0-48a8-bcd0-82f4a2ea173f-5de2e641d9284aeea209ca447ffffb6d",
"GraceTime":0,
"RootFSPath":"/var/vcap/packages/rootfs_cflinuxfs2/rootfs",
"BindMounts":[
{
"src_path":"/var/vcap/data/executor_cache/6942123d3462ad9d21a45729c3cae183-1474475979582384649-1.d",
"dst_path":"/tmp/lifecycle"
}
],
"Network":"",
"Privileged":true,
"Limits":{
"bandwidth_limits":{
},
"cpu_limits":{
"limit_in_shares":512
},
"disk_limits":{
"inode_hard":200000,
"byte_hard":6442450944,
"scope":1
},
"memory_limits":{
"limit_in_bytes":1073741824
}
}
},
"session":"11.44187"
}
}{
"timestamp":"1474637304.034646988",
"source":"garden-linux",
"message":"garden-linux.garden-server.destroy.failed",
"log_level":2,
"data":{
"error":"unknown handle: ec6e7469-0ef0-48a8-bcd0-82f4a2ea173f-5de2e641d9284aeea209ca447ffffb6d",
"handle":"ec6e7469-0ef0-48a8-bcd0-82f4a2ea173f-5de2e641d9284aeea209ca447ffffb6d",
"session":"11.44188"
}
}
And meanwhile dmesg | tail gave me the following:
[161023.238082] aufs test_add:283:garden-linux[7681]: uid/gid/perm
/var/vcap/data/garden/aufs_graph/aufs/diff/d350dcd30f6d6f8b37eabe06a3b73bcea0a87f9aff4edf15f12792269fc9f97c
4294967294/4294967294/0755, 0/0/0755 [161023.238109] aufs
au_opts_verify:1597:garden-linux[7681]: dirperm1 breaks the protection
by the permission bits on the lower branch [161023.413392] device
wtj3qdqhig0t-0 entered promiscuous mode
I'm not sure whether these issues are connected, or whether they are issues at all, but I'm posting them here to be sure I didn't miss anything.
I don't know how to fix this problem or where to look: should I look for a solution in the Terraform scripts or in the BOSH manifest files? We have a microservice architecture with three services on Node.js and one on Ruby, so deployment is a very important question for us.
Here is my application manifest.yml file:
---
applications:
- name: hello_cloud
  memory: 128M
  buildpack: https://github.com/cloudfoundry/nodejs-buildpack
  instances: 1
  random-route: true
  command: "node server.js"
My goal is to be able to deploy applications using Cloud Foundry. If you have any additional questions, or if I wrote something unclear, feel free to write to me.
This issue is related to a conflict between Garden and the 4.4 Linux kernel. To use the example Cloud Foundry manifest, use the following stemcell:
bosh upload stemcell https://bosh.io/d/stemcells/bosh-google-kvm-ubuntu-trusty-go_agent?v=3262.19
bosh deploy
You may need to delete your cf deployment before re-deploying due to quota issues.
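If you want to confirm you're hitting the Garden / 4.4 kernel conflict before re-deploying, you can check the kernel on the affected cell; the job name below is a placeholder that depends on your manifest:
bosh ssh <cell-or-runner-job>/0
uname -r    # a 4.4.x kernel here is consistent with the aufs mount failures above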