microk8s dashboard-proxy timed out waiting for the condition on deployments/kubernetes-dashboard - kubernetes

I am getting a timeout while starting up my cluster dashboard with
microk8s dashboard-proxy
This issue occurs again and again in my Kubernetes cluster. I don't really know the cause of it.
error: timed out waiting for the condition on deployments/kubernetes-dashboard
Traceback (most recent call last):
File "/snap/microk8s/1609/scripts/wrappers/dashboard-proxy.py", line 50, in <module>
dashboard_proxy()
File "/snap/microk8s/1609/scripts/wrappers/dashboard-proxy.py", line 13, in dashboard_proxy
check_output(command)
File "/snap/microk8s/1609/usr/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/snap/microk8s/1609/usr/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['microk8s.kubectl', '-n', 'kube-system', 'wait', '--timeout=240s', 'deployment', 'kubernetes-dashboard', '--for', 'condition=available']' returned non-zero exit status 1

I am running a Kubernetes cluster on Vagrant machines (CentOS) using microk8s. This issue can have many causes. Below are some of them:
Lack of memory
To fix: increase the memory in your Vagrant machine settings.
Something went wrong during your microk8s installation
To fix: remove microk8s and reinstall it. In my case I used snap to install it. This is how I did it:
snap remove microk8s --purge
snap install microk8s --classic --channel=1.18/stable
Sometimes you need to kill the process and restart your Vagrant machine. In my case I did:
lsof -Pi :10443 (where 10443 is the port my dashboard is running on)
kill -9 xxxx (where xxxx is the PID retrieved from the previous command)
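For the memory fix, a minimal Vagrantfile sketch, assuming VirtualBox as the provider (the box name and sizes below are illustrative, not from the original setup):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"              # illustrative box
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096                      # MB; raise this if the dashboard still times out
    vb.cpus   = 2
  end
end
```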

Related

When I run the sentinel register tesla_ai.jir -set_active true -mode ir command in Jaseci after closing the terminal, I get a Python error

When I run the sentinel register tesla_ai.jir -set_active true -mode ir command in Jaseci after closing the terminal, I get a Python error. What could be causing this issue?
Here is the error message:
File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
During handling of the above exception, another exception occurred:
This occurred while typing jaseci > sentinel register tesla_ai.jir -set_active true -mode ir
Three things are happening here:
First, we registered the jir we compiled earlier to a new sentinel. This means the new sentinel now has access to all of our walkers, nodes and edges. The -mode ir option specifies that a jir program is registered instead of a jac program.
Second, with -set_active true we set this new sentinel to be the active sentinel. In other words, this sentinel is the default one to be used when requests hit the Jac APIs, if no specific sentinels are specified.
Third, sentinel register automatically creates a new graph (if there is no currently active graph) and runs the init walker on that graph. This behavior can be customized with the options -auto_run and -auto_create_graph.
Restart the terminal or the entire computer to see if the issue is temporary.
Check that you have the latest version of Jaseci installed; try updating it to see if that resolves the issue.
Ensure that your environment has the required dependencies and libraries installed for Jaseci to run correctly.
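A minimal sketch of how to confirm that diagnosis: "Connection refused" (Errno 61 on macOS, 111 on Linux) means nothing is listening on the host/port the Jaseci client tries to reach, so checking the socket directly tells you whether the server is up before re-running sentinel register. The host and port below are assumptions; substitute the address your jsserv instance uses.

```python
import socket

def port_open(host, port, timeout=2.0):
    # Attempt a plain TCP connection; "refused" or a timeout both
    # mean the server is not reachable at this address.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Assumed address of the Jaseci server; adjust to your setup.
print(port_open("127.0.0.1", 8000))
```

If this prints False, start the Jaseci server first and then retry the sentinel register command.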

Proxmox lxc add linux.kernel_modules

I am trying to setup an LXC container (debian) as a Kubernetes node.
I have gotten far enough that the only thing in the way is the kubeadm init script...
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.4.44-2-pve/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/5.4.44-2-pve\n", err: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
After some research I figured out that I probably need to add the following: linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
But adding this to /etc/pve/lxc/107.conf doesn't do anything.
Does anybody have a clue how to add the linux kernel modules?
To allow loading any modules with modprobe inside a privileged Proxmox LXC container, you need to add these options to the container config:
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
lxc.mount.auto: proc:rw sys:rw
lxc.mount.entry: /lib/modules lib/modules none bind 0 0
Before that, you must first create the /lib/modules folder inside the container.
I'm not sure what guide you are following, but assuming that you have the required kernel modules on the host, this would do it (note that the lxc config command and the linux.kernel_modules key belong to LXD, which is why putting that key in Proxmox's /etc/pve/lxc/107.conf has no effect):
lxc config set my-container linux.kernel_modules overlay
You can follow this guide from K3s too. Basically:
lxc config edit k3s-lxc
and
config:
linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
raw.lxc: lxc.mount.auto=proc:rw sys:rw
security.privileged: "true"
security.nesting: "true"
✌️
To fix the ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file error, run from the host:
pct set $VMID --mp0 /usr/lib/modules/$(uname -r),mp=/lib/modules/$(uname -r),ro=1,backup=0
To fix the [ERROR SystemVerification]: failed to parse kernel config error, run from the host:
pct push $VMID /boot/config-$(uname -r) /boot/config-$(uname -r)
Where $VMID is your container id.
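If the pct command succeeds, the container config picks up a read-only bind mount of the host's module tree; the resulting line in /etc/pve/lxc/107.conf should look roughly like this (the kernel version is taken from the error message above and will differ on your host):

```
mp0: /usr/lib/modules/5.4.44-2-pve,mp=/lib/modules/5.4.44-2-pve,ro=1,backup=0
```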

Airflow- dag_id could not be found issue when using kubernetes executor

I am using the Airflow stable helm chart with the Kubernetes Executor. A new pod is scheduled for the DAG, but it fails with a "dag_id could not be found" error. I am using git-sync to get the DAGs. Below are the error and the Kubernetes config values. Can someone please help me resolve this issue?
Error:
[2020-07-01 23:18:36,939] {__init__.py:51} INFO - Using executor LocalExecutor
[2020-07-01 23:18:36,940] {dagbag.py:396} INFO - Filling up the DagBag from /opt/airflow/dags/dags/etl/sampledag_dag.py
Traceback (most recent call last):
File "/home/airflow/.local/bin/airflow", line 37, in <module>
args.func(args)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 75, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 523, in run
dag = get_dag(args)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 149, in get_dag
'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: sampledag . Either the dag did not exist or it failed to parse.
Config:
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: false
AIRFLOW__KUBERNETES__GIT_REPO: git#git.com/dags.git
AIRFLOW__KUBERNETES__GIT_BRANCH: master
AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: /dags
AIRFLOW__KUBERNETES__GIT_SSH_KEY_SECRET_NAME: git-secret
AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-repo
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: tag
AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
sampledag
import logging
import datetime
from airflow import models
from airflow.contrib.operators import kubernetes_pod_operator
import os

args = {
    'owner': 'airflow'
}

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

try:
    print("Entered try block")
    with models.DAG(
            dag_id='sampledag',
            schedule_interval=datetime.timedelta(days=1),
            start_date=YESTERDAY) as dag:
        print("Initialized dag")
        kubernetes_min_pod = kubernetes_pod_operator.KubernetesPodOperator(
            # The ID specified for the task.
            task_id='trigger-task',
            # Name of the task you want to run, used to generate the Pod ID.
            name='trigger-name',
            namespace='scheduler',
            in_cluster=True,
            cmds=["./docker-run.sh"],
            is_delete_operator_pod=False,
            image='imagerepo:latest',
            image_pull_policy='Always',
            dag=dag)
        print("done")
except Exception as e:
    print(str(e))
    logging.error("Error at {}, error={}".format(__file__, str(e)))
    raise
I had the same issue. I solved it by adding the following to my config:
AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: repo/
What was happening is that the init container downloads your DAGs into [AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT]/[AIRFLOW__KUBERNETES__GIT_SYNC_DEST], and AIRFLOW__KUBERNETES__GIT_SYNC_DEST defaults to repo (https://airflow.apache.org/docs/stable/configurations-ref.html#git-sync-dest).
I am guessing that the problem comes from a mismatch in your setup between /opt/airflow/dags/dags/etl/sampledag_dag.py and AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: /dags.
I'd double check that these are what you want, and are what you expect.
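As a sketch of the path arithmetic involved: git-sync clones into GIT_SYNC_DEST underneath the mount point, so the folder the workers must scan is the join of the two settings. The defaults below are taken from the Airflow configuration reference; the values in your chart may differ.

```python
import os

# Where git-sync places the cloned DAGs inside the worker pod.
mount_point = os.environ.get(
    "AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT", "/dags")
# GIT_SYNC_DEST defaults to "repo" per the Airflow configuration reference.
sync_dest = os.environ.get("AIRFLOW__KUBERNETES__GIT_SYNC_DEST", "repo")

# Every worker must load DAG files from under this directory, which is why
# AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH must point at the same subfolder.
dags_root = os.path.join(mount_point, sync_dest)
print(dags_root)
```

With the defaults this yields /dags/repo, matching the DAGS_VOLUME_SUBPATH: repo/ fix above.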
I was facing the same issue while trying to use Kubernetes Executor using stable helm airflow chart. In my case, I was able to resolve it by changing
AIRFLOW__KUBERNETES__RUN_AS_USER: "50000" to AIRFLOW__KUBERNETES__GIT_SYNC_RUN_AS_USER: "65533" in the env section of the helm chart.
The same value is mentioned in this link.
I came to this conclusion because the init container (git-sync), which ran before the temporary worker pod came up, was not able to clone/sync the git DAGs to the worker pods. In my case, there was a permissions error (even though the kube secret for the ssh clone was correctly passed).
Note:
the git-sync init container returns no error even if it fails to fetch the DAGs
Kubernetes debugging information for init containers
kubectl get pods -n [NAMESPACE]
kubectl logs -n [NAMESPACE] [POD_ID] -c git-sync
Getting the same issue, I solved it with the suggestion from #gtrip to set the UID of the git-sync run user to 65533.

python-memcache memcached -- I installed them on a centos virtualbox but get/set never seem to work

I'm using Python. I did a yum install memcached followed by an easy_install python-memcached.
I used the simple test program from help(memcache). When I wasn't getting the proper answers I threw in some print statements:
[~/test]$ cat m2.py
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
x = mc.set("some_key", "Some value")
print 'Just set a key and value into the cache (suposedly)'
value = mc.get("some_key")
print 'Just retrieved that value from the cache using the key'
print 'X %s' % x
print 'Value %s' % value
[~/test]$ python m2.py
Just set a key and value into the cache (suposedly)
Just retrieved that value from the cache using the key
X 0
Value None
[~/test]$
The question now is, what have I failed to do in my installation? It appears to be working from an API perspective, but it fails to put anything into the memcached shared area.
I'm using a VirtualBox VM running CentOS:
[~]# cat /proc/version
Linux version 2.6.32-358.6.2.el6.i686 (mockbuild#c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Thu May 16 18:12:13 UTC 2013
Is there a daemon that is supposed to be running? I don't see an obviously named one when I do a ps.
I tried to get pylibmc installed on my VM but was unable to find a working installation, so for now I will see if I can get the above stuff working first.
I discovered that if I run straight from the Python console I get a bit more output when I set debug=1:
>>> mc = memcache.Client(['127.0.0.1:11211'], debug=1)
>>> mc.stats
{}
>>> mc.set('test','value')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
0
>>> mc.get('test')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
When I try, per the example, to use telnet to connect to the port, I get a connection refused:
[root#~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root#~]#
I tried the instructions I found on the net for configuring telnet so localhost wouldn't be disabled:
vi /etc/xinetd.d/telnet
service telnet
{
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
disable = no
}
And then ran the commands to restart the service(s):
service iptables stop
service xinetd stop
service iptables start
service xinetd start
service iptables stop
I ran both cases (iptables started and stopped) but it had no effect. So I am out of ideas. What do I need to do so that the port will be allowed, if that is the problem?
Or is there a memcached service that needs to be running that needs to open up the port ?
Well, this is what it took to get it working (a series of manual steps):
1) su -
cd /var/run
mkdir memcached # this was missing
In the memcached file I added "-l 127.0.0.1" to the OPTIONS statement. It's apparently a listen option. Do this for steps 2 & 3. I'm not certain which file is actually used at runtime.
2) cd /etc/sysconfig
cp memcached memcached.old
vi memcached
3) cd /etc/init.d
cp memcached memcached.old
vi memcached
4) Try some commands to see if the server starts now
/etc/init.d/memcached start
/etc/init.d/memcached status
/etc/init.d/memcached stop
/etc/init.d/memcached restart
I tried opening a browser, but it never seemed to actually display anything, so I don't really know how valid this approach is. I'm not running Apache or anything like that, so perhaps it's not relevant to my case. Perhaps I would have to supply a ?key=blah or something.
5) http://127.0.0.1:11211
6) Now it should be ready to go. If one runs the test shown with the following, it should work; at least it did for me. Doing help(memcache) will display a simple program. Just paste that in and it should work just fine.
[~]$ python
>>> import memcache
>>> help(memcache)
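As a sketch, the OPTIONS edit from steps 2 and 3 amounts to an /etc/sysconfig/memcached file along these lines (the first four values are the CentOS defaults; only the -l listen address in OPTIONS was added):

```
PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="64"
OPTIONS="-l 127.0.0.1"
```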

Issues with running two instances of searchd

I have just updated our Sphinx server from 1.10-beta to 2.0.6-release, and now I have run into some issues with searchd. Previously we were able to run two instances of searchd next to each other by specifying two different config-files, i.e:
searchd --config /etc/sphinx/sphinx.conf
searchd --config /etc/sphinx/sphinx.staging.conf
sphinx.conf listens on 9306:mysql41 and 9312, while sphinx.staging.conf listens on 9307:mysql41 and 9313.
After we updated to 2.0.6, however, a second instance is never started. Or rather, the output makes it seem like it starts, and a pid-file is created, etc. But for some reason only the first searchd instance keeps running, and the second seems to shut down right away. So while trying to run searchd --config /etc/sphinx/sphinx.conf twice (if that was the first one started) complains that the pid-file is in use, trying to run searchd --config /etc/sphinx/sphinx.staging.conf (if that is the second started instance) "starts" the daemon again and again, only no new process is created.
Note that if I switch these commands around when first creating the process, then sphinx.conf is the instance not really started.
I have checked, and rechecked, that these ports are only used by searchd.
Does anyone have any idea what I can do or try next? I've installed it from source on Ubuntu 10.04 LTS with:
./configure --prefix /etc/sphinx --with-mysql --enable-id64 --with-libstemmer
make -j4 install
Note to self: Check the logs!
RT indices use binary logs to enable crash recovery. Since my old config files did not specify a path where these should be stored, both instances of searchd tried to write to the same binary logs. The instance started last was of course not permitted to manipulate these files, and thus exited with a fatal error:
[Fri Nov 2 17:13:32.262 2012] [ 5346] FATAL: failed to lock
'/etc/sphinx/var/data/binlog.lock': 11 'Resource temporarily unavailable'
[Fri Nov 2 17:13:32.264 2012] [ 5345] Child process 5346 has been finished,
exit code 1. Watchdog finishes also. Good bye!
The solution was simple: specify a binlog_path inside the searchd configuration section of each configuration file:
searchd
{
[...]
binlog_path = /path/to/writable/directory
[...]
}
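Concretely, for the two config files from the question, each searchd section gets its own listen ports and its own binlog directory (the binlog paths below are illustrative; any two distinct writable directories will do):

```
# sphinx.conf
searchd
{
    listen      = 9306:mysql41
    listen      = 9312
    binlog_path = /var/data/sphinx/production
}

# sphinx.staging.conf
searchd
{
    listen      = 9307:mysql41
    listen      = 9313
    binlog_path = /var/data/sphinx/staging
}
```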