Unable to bring up Eclipse Che on Kubernetes

Getting ERR_TIMEOUT: Timeout set to pod wait timeout 300000 while downloading images
I am new to Eclipse Che and Kubernetes. I got Kubernetes installed on Ubuntu and am trying to run chectl server:start, but it is failing. What am I doing wrong? Below is the trace I get. Is there a log file where I could get more details? Please help.
Details:
✔ Verify Kubernetes API...OK
✔  Looking for an already existing Che instance
✔ Verify if Che is deployed into namespace "che"
✔ Found running che deployment
✔ Found running plugin registry deployment
✔ Found running devfile registry deployment
✔  Starting already deployed Che
✔ Scaling up Che Deployments...done.
❯ ✅ Post installation checklist
❯ Che pod bootstrap
✔ scheduling...done.
✖ downloading images
→ ERR_TIMEOUT: Timeout set to pod wait timeout 300000
starting
Retrieving Che Server URL
Che status check
Error: ERR_TIMEOUT: Timeout set to pod wait timeout 300000
at KubeHelper.<anonymous> (/usr/local/lib/chectl/lib/api/kube.js:578:19)
at Generator.next (<anonymous>)
at fulfilled (/usr/local/lib/chectl/node_modules/tslib/tslib.js:107:62)
Values.yaml
#
# Copyright (c) 2012-2017 Red Hat, Inc.
# This program and the accompanying materials are made
# available under the terms of the Eclipse Public License 2.0
# which is available at https://www.eclipse.org/legal/epl-2.0/
#
# SPDX-License-Identifier: EPL-2.0
#
# the following section is for secure registries. when uncommented, a pull secret will be created
#registry:
# host: my-secure-private-registry.com
# username: myUser
# password: myPass
cheWorkspaceHttpProxy: ""
cheWorkspaceHttpsProxy: ""
cheWorkspaceNoProxy: ""
cheImage: eclipse/che-server:nightly
cheImagePullPolicy: Always
cheKeycloakRealm: "che"
cheKeycloakClientId: "che-public"
#customOidcUsernameClaim: ""
#customOidcProvider: ""
#workspaceDefaultRamRequest: ""
#workspaceDefaultRamLimit: ""
#workspaceSidecarDefaultRamLimit: ""
global:
  cheNamespace: ""
  multiuser: false
  # This value can be passed if custom Oidc provider is used, and there is no need to deploy keycloak in multiuser mode
  # default (if empty) is true
  #cheDedicatedKeycloak: false
  ingressDomain: <xx.xx.xx.xx.nip.io>
  # See --annotations-prefix flag (https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/cli-arguments.md)
  ingressAnnotationsPrefix: "nginx."
  # options: default-host, single-host, multi-host
  serverStrategy: multi-host
  tls:
    enabled: false
    useCertManager: true
    useStaging: true
    secretName: che-tls
  gitHubClientID: ""
  gitHubClientSecret: ""
  pvcClaim: "1Gi"
  cheWorkspacesNamespace: ""
  workspaceIdleTimeout: "-1"
  log:
    loggerConfig: ""
    appenderName: "plaintext"

Try increasing the timeout by setting --k8spodreadytimeout=500000.
[1] https://github.com/che-incubator/chectl
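For example, the flag can be passed straight to the start command (a sketch; confirm the exact flag name for your chectl version with chectl server:start --help):
chectl server:start --k8spodreadytimeout=500000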

Following https://github.com/eclipse/che/issues/13871 (which is for Minishift), delete the namespace and give it another try:
kubectl delete namespaces che
chectl server:start --platform minikube
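If it still stalls on "downloading images", you can watch the Che pods directly to see how the image pulls are going. These are generic kubectl checks (not chectl-specific), and <che-pod-name> is a placeholder for whatever pod name the first command shows:
kubectl get pods -n che -w
kubectl describe pod <che-pod-name> -n che
kubectl logs <che-pod-name> -n che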

I hope you have managed to install Eclipse Che on Kubernetes successfully by now.

Related

`ddev magento` results in `permission denied: unknown`

After running ddev start I cannot run magento commands from outside of the container.
% ddev magento
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "/mnt/ddev_config/.global_commands/web/magento": permission denied: unknown
Failed to run magento : exit status 126
The above-mentioned path does exist inside the container.
ddev exec magento works.
ddev composer works.
name: myproject
type: magento2
docroot: pub
php_version: "7.4"
webserver_type: nginx-fpm
router_http_port: "80"
router_https_port: "443"
xdebug_enabled: false
additional_hostnames: []
additional_fqdns: []
mariadb_version: "10.3"
mysql_version: ""
use_dns_when_possible: true
composer_version: ""
web_environment: []
You don't mention your environment, but I imagine you're on macOS with Docker and have enabled the experimental settings. Please turn them off... they don't really work right yet. See macOS DDEV drush command Permission denied (Experimental docker settings).
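If you want to confirm whether your Docker daemon is actually running with experimental features on, a quick generic check (not from the original answer) is:
docker version --format '{{.Server.Experimental}}'
It prints true or false for the server side.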

Consul agent on Kubernetes, on node or pod?

I deployed an AWS EKS cluster via Terraform. I also deployed Consul following HashiCorp's tutorial, and I see the nodes in Consul's UI.
Now I'm wondering: how will all the Consul agents know about the pods I deploy? I deploy something and it's not shown anywhere on Consul.
I can't find any documentation on how to register pods (services) on Consul via the node's Consul agent. Do I need to configure that somewhere? Should I not use the node's agent and register the service straight from the pod? HashiCorp discourages this, since it may increase resource utilization depending on how many pods one deploys on a given node. But then how does the node's agent know about my services deployed on that node?
Moreover, when I deploy a pod on a node, ssh into the node, and install Consul, Consul's agent can't find the Consul server (as opposed to the node itself, which can find it).
EDIT:
Bottom line is I can't find WHERE to add the configuration. If I execute ON THE POD:
consul members
It works properly and I get:
Node Address Status Type Build Protocol DC Segment
consul-consul-server-0 10.0.103.23:8301 alive server 1.10.0 2 full <all>
consul-consul-server-1 10.0.101.151:8301 alive server 1.10.0 2 full <all>
consul-consul-server-2 10.0.102.112:8301 alive server 1.10.0 2 full <all>
ip-10-0-101-129.ec2.internal 10.0.101.70:8301 alive client 1.10.0 2 full <default>
ip-10-0-102-175.ec2.internal 10.0.102.244:8301 alive client 1.10.0 2 full <default>
ip-10-0-103-240.ec2.internal 10.0.103.245:8301 alive client 1.10.0 2 full <default>
ip-10-0-3-223.ec2.internal 10.0.3.249:8301 alive client 1.10.0 2 full <default>
But if I execute:
# consul agent -datacenter=voip-full -config-dir=/etc/consul.d/ -log-file=log-file -advertise=$(wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4)
I get the following error:
==> Starting Consul agent...
Version: '1.10.1'
Node ID: 'f10070e7-9910-06c7-0e12-6edb6cc4c9b9'
Node name: 'ip-10-0-3-223.ec2.internal'
Datacenter: 'voip-full' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.0.3.223 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2021-08-16T18:23:06.936Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.936Z [WARN] agent: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.946Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.947Z [WARN] agent.auto_config: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.948Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: ip-10-0-3-223.ec2.internal 10.0.3.223
2021-08-16T18:23:06.948Z [INFO] agent.router: Initializing LAN area manager
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2021-08-16T18:23:06.950Z [WARN] agent.client.serf.lan: serf: Failed to re-join any previously known node
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2021-08-16T18:23:06.951Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
2021-08-16T18:23:06.951Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
2021-08-16T18:23:06.953Z [INFO] agent: started state syncer
2021-08-16T18:23:06.953Z [INFO] agent: Consul agent running!
2021-08-16T18:23:06.953Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:06.954Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2021-08-16T18:23:34.169Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:34.169Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
So where do I add the config?
I also tried adding a service in k8s pointing to the pod, but the service doesn't come up on Consul's UI...
What do you guys recommend?
Thanks
Consul knows where these services are located because each service
registers with its local Consul client. Operators can register
services manually, configuration management tools can register
services when they are deployed, or container orchestration platforms
can register services automatically via integrations.
If you are planning to use the manual option, you have to register the service with Consul yourself.
Something like:
echo '{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
}' > ./consul.d/web.json
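After dropping the file into the agent's configuration directory, reload the agent so it picks up the new service definition:
consul reload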
You can find a good example at: https://thenewstack.io/implementing-service-discovery-of-microservices-with-consul/
This is also a very nice document for detailed configuration of health checks and service discovery: https://cloud.spring.io/spring-cloud-consul/multi/multi_spring-cloud-consul-discovery.html
Official document: https://learn.hashicorp.com/tutorials/consul/get-started-service-discovery
BTW, I was finally able to figure out the issue.
consul-dns is not deployed by default; I had to deploy it manually, then forward all .consul requests from CoreDNS to consul-dns.
All is working now. Thanks!
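For anyone else landing here: the forwarding described above typically means adding a stanza like the following to the coredns ConfigMap, where <consul-dns-cluster-ip> is the ClusterIP of the consul-dns service (a sketch; the service name consul-consul-dns below assumes this cluster's Helm release naming):
consul {
  errors
  cache 30
  forward . <consul-dns-cluster-ip>
}
The IP itself can be looked up with kubectl get svc consul-consul-dns -o jsonpath='{.spec.clusterIP}'.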

Kubespray Kubernetes Installation Fails - dockerd[8296]: unable to configure the Docker daemon with file /etc/docker/daemon.json

The above error occurred when installing Kubernetes using Kubespray.
The installation fails, and through journalctl -xe I see the following:
node1 systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has begun starting up.
Dec 09 23:37:01 node1 dockerd[8296]: unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: lo
Dec 09 23:37:01 node1 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Dec 09 23:37:01 node1 systemd[1]: Failed to start Docker Application Container Engine.
How do I troubleshoot and fix the issue? Is there something that I am missing or should be looking into?
The JSON file is as follows:
[root@k8s-master01 kubespray]# cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
The docker.yml file is as follows:
cat inventory/sample/group_vars/all/docker.yml
---
## Uncomment this if you want to force overlay/overlay2 as docker storage driver
## Please note that overlay2 is only supported on newer kernels
# docker_storage_options: -s overlay2
## Enable docker_container_storage_setup, it will configure devicemapper driver on Centos7 or RedHat7.
docker_container_storage_setup: false
## It must be define a disk path for docker_container_storage_setup_devs.
## Otherwise docker-storage-setup will be executed incorrectly.
# docker_container_storage_setup_devs: /dev/vdb
## Uncomment this if you have more than 3 nameservers, then we'll only use the first 3.
docker_dns_servers_strict: false
# Path used to store Docker data
docker_daemon_graph: "/var/lib/docker"
## Used to set docker daemon iptables options to true
docker_iptables_enabled: "false"
# Docker log options
# Rotate container stderr/stdout logs at 50m and keep last 5
docker_log_opts: "--log-opt max-size=50m --log-opt max-file=5"
# define docker bin_dir
docker_bin_dir: "/usr/bin"
# keep docker packages after installation; speeds up repeated ansible provisioning runs when '1'
# kubespray deletes the docker package on each run, so caching the package makes sense
docker_rpm_keepcache: 0
## An obvious use case is allowing insecure-registry access to self hosted registries.
## Can be ipaddress and domain_name.
## example define 172.19.16.11 or mirror.registry.io
# docker_insecure_registries:
# - mirror.registry.io
# - 172.19.16.11
## Add other registry,example China registry mirror.
# docker_registry_mirrors:
# - https://registry.docker-cn.com
# - https://mirror.aliyuncs.com
## If non-empty will override default system MountFlags value.
## This option takes a mount propagation flag: shared, slave
## or private, which control whether mounts in the file system
## namespace set up for docker will receive or propagate mounts
## and unmounts. Leave empty for system default
# docker_mount_flags:
## A string of extra options to pass to the docker daemon.
## This string should be exactly as you wish it to appear.
docker_options: >-
The setup.cfg file is as below:
[root@k8s-master01 kubespray]# cat setup.cfg
[metadata]
name = kubespray
summary = Ansible modules for installing Kubernetes
description-file =
    README.md
author = Kubespray
author-email = smainklh@gmail.com
license = Apache License (2.0)
home-page = https://github.com/kubernetes-sigs/kubespray
classifier =
    License :: OSI Approved :: Apache Software License
    Development Status :: 4 - Beta
    Intended Audience :: Developers
    Intended Audience :: System Administrators
    Intended Audience :: Information Technology
    Topic :: Utilities
[global]
setup-hooks =
    pbr.hooks.setup_hook
[files]
data_files =
    usr/share/kubespray/playbooks/ =
        cluster.yml
        upgrade-cluster.yml
        scale.yml
        reset.yml
        remove-node.yml
        extra_playbooks/upgrade-only-k8s.yml
    usr/share/kubespray/roles = roles/*
    usr/share/kubespray/library = library/*
    usr/share/doc/kubespray/ =
        LICENSE
        README.md
    usr/share/doc/kubespray/inventory/ =
        inventory/sample/inventory.ini
    etc/kubespray/ =
        ansible.cfg
    etc/kubespray/inventory/sample/group_vars/ =
        inventory/sample/group_vars/etcd.yml
    etc/kubespray/inventory/sample/group_vars/all/ =
        inventory/sample/group_vars/all/all.yml
        inventory/sample/group_vars/all/azure.yml
        inventory/sample/group_vars/all/coreos.yml
        inventory/sample/group_vars/all/docker.yml
        inventory/sample/group_vars/all/oci.yml
        inventory/sample/group_vars/all/openstack.yml
[wheel]
universal = 1
[pbr]
skip_authors = True
skip_changelog = True
[bdist_rpm]
group = "System Environment/Libraries"
requires =
    ansible
    python-jinja2
    python-netaddr
Take a look at what you have defined in the daemon.json file as the storage driver:
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
At the same time, in the docker.yml file you didn't enable the storage driver options:
## Uncomment this if you want to force overlay/overlay2 as docker storage driver
## Please note that overlay2 is only supported on newer kernels
# docker_storage_options: -s overlay2
Please uncomment the docker_storage_options: -s overlay2 line.
Also make sure you have followed every step from this tutorial.
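For clarity, the line in inventory/sample/group_vars/all/docker.yml would then read as below (a sketch of the suggestion above; note that the daemon.json error complains about a directive being set both as a flag and in the file, so keep each option defined in only one of the two places):
docker_storage_options: -s overlay2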

Fresh macOS install - kubectl outputs error message

On a MacBook Pro, I tried installing from a binary with curl and then with brew.
Both installs generate an error at the end of the output:
~ via 🐘 v7.1.23
➜ kubectl version --output=yaml
clientVersion:
buildDate: "2019-04-19T22:12:47Z"
compiler: gc
gitCommit: b7394102d6ef778017f2ca4046abbaa23b88c290
gitTreeState: clean
gitVersion: v1.14.1
goVersion: go1.12.4
major: "1"
minor: "14"
platform: darwin/amd64
error: unable to parse the server version: invalid character '<' looking for beginning of value
Is there a way to fix this?
I think there is another application listening on port 8080. By default, kubectl will try to connect to localhost:8080 if no server is passed.
If you have deployed the Kubernetes apiserver on some other machine or port, pass --server=IP:PORT to kubectl.
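Two quick checks along those lines (generic diagnostics, not part of the original answer; the apiserver address is a placeholder):
# Whatever answers on 8080 is returning HTML, hence the leading '<' in the parse error
curl -s http://localhost:8080 | head -n 5
# Point kubectl at the real apiserver explicitly
kubectl version --output=yaml --server=https://<apiserver-host>:6443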

Spinnaker "Create Application" menu doesn't load

I'm quite new to Spinnaker and have to ask for some help, I guess. Does anyone know why it could be that I can't create any Application and I just keep seeing this screen?
My installation is through Halyard 1.5.0 on Ubuntu 14.04.
We don't use any cloud provider, but I did configure the Docker and Kubernetes parts.
And here is the error I see in the /var/log/spinnaker/echo/echo.log:
2017-11-16 13:52:29.901 INFO 13877 --- [ofit-/pipelines] c.n.s.echo.services.Front50Service : java.net.SocketTimeoutException: timeout
at okio.Okio$3.newTimeoutException(Okio.java:207)
at okio.AsyncTimeout.exit(AsyncTimeout.java:261)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:215)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)
at com.squareup.okhttp.internal.http.Http1xStream.readResponse(Http1xStream.java:186)
at com.squareup.okhttp.internal.http.Http1xStream.readResponseHeaders(Http1xStream.java:127)
at com.squareup.okhttp.internal.http.HttpEngine.readNetworkResponse(HttpEngine.java:739)
at com.squareup.okhttp.internal.http.HttpEngine.access$200(HttpEngine.java:87)
at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:724)
at com.squareup.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:578)
at com.squareup.okhttp.Call.getResponse(Call.java:287)
at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
at com.squareup.okhttp.Call.execute(Call.java:80)
at retrofit.client.OkClient.execute(OkClient.java:53)
at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:326)
at retrofit.RestAdapter$RestHandler.access$100(RestAdapter.java:220)
at retrofit.RestAdapter$RestHandler$1.invoke(RestAdapter.java:265)
at retrofit.RxSupport$2.run(RxSupport.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at retrofit.Platform$Base$2$1.run(Platform.java:94)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at okio.Okio$2.read(Okio.java:139)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:211)
... 24 more
2017-11-16 13:52:29.901 INFO 13877 --- [ofit-/pipelines] c.n.s.echo.services.Front50Service : ---- END ERROR
@grizzthedj
Thanks again for the recommendations. It doesn't seem, however, to have solved the issue. I wonder if it has something to do with my Docker registry or Kubernetes.
Here is what I have in my .hal/config:
dockerRegistry:
  enabled: true
  accounts:
  - name: <hidden-name>
    requiredGroupMembership: []
    address: https://docker-registry.<hidden-name>.net/
    cacheIntervalSeconds: 30
    repositories:
    - hellopod
    - demoapp
  primaryAccount: <hidden-name>
kubernetes:
  enabled: true
  accounts:
  - name: <username>
    requiredGroupMembership: []
    dockerRegistries:
    - accountName: <hidden-name>
      namespaces: []
    context: sre-os1-dev
    namespaces:
    - spinnaker
    omitNamespaces: []
    kubeconfigFile: /home/<username>/.kube/config
I suspect you may be using Redis as the persistent storage type (I ran into the same issue).
If this is the case, persistent storage using Redis doesn't seem to be working properly out-of-the-box, and it is not supported. I would try using an S3 target, if available.
More info here on support for Redis.
To configure S3 using Halyard, use the following commands:
echo <SECRET_ACCESS_KEY> | hal config storage s3 edit --access-key-id <ACCESS_KEY_ID> --endpoint <S3_ENDPOINT> --bucket <BUCKET_NAME> --root-folder spinnaker --secret-access-key
hal config storage edit --type s3
hal deploy apply
@grizzthedj,
Here is what I've found inside front50.log (I wiped out IDs, of course, for security reasons).
You may be right.
2017-11-20 12:40:29.151 INFO 682 --- [0.0-8080-exec-1] com.amazonaws.latency : ServiceName=[Amazon S3], AWSErrorCode=[NoSuchKey], StatusCode=[404], ServiceEndpoint=[https://s3-us-west-2.amazonaws.com], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: ...; S3 Extended Request ID: ...), S3 Extended Request ID: ...], RequestType=[GetObjectRequest], AWSRequestID=[...], HttpClientPoolPendingCount=0, RetryCapacityConsumed=0, HttpClientPoolAvailableCount=1, RequestCount=1, Exception=1, HttpClientPoolLeasedCount=0, ClientExecuteTime=[39.634], HttpClientSendRequestTime=[0.072], HttpRequestTime=[39.213], RequestSigningTime=[0.067], CredentialsRequestTime=[0.001, 0.0], HttpClientReceiveResponseTime=[39.059],
I had a similar issue on Kubernetes/AWS. When I opened up the Chrome dev console, I was getting lots of 404 errors trying to connect to localhost:8084, so I had to reconfigure the deck and gate base URLs. This is what I did using Halyard:
hal config security ui edit --override-base-url http://<deck-loadbalancer-dns-entry>:9000
hal config security api edit --override-base-url http://<gate-loadbalancer-dns-entry>:8084
I did hal deploy apply, and when it came back I noticed the developer console was throwing CORS errors, so I had to do the following:
echo "host: 0.0.0.0" | tee ~/.hal/default/service-settings/gate.yml ~/.hal/default/service-settings/deck.yml
You may note the lack of TLS and CORS config; this is a test system, so make better choices in production :)