SUMMARY
I have installed Zabbix on an OpenShift cluster. I am trying to monitor a host (VM) outside the cluster, but the Zabbix server is unable to connect to it. In the /etc/zabbix/zabbix_agentd.conf file I have set the Server parameter to the DNS name zabbix-server, but it looks like the server is trying to connect from a different public IP. I am not sure what this IP is.
OS / ENVIRONMENT / Used docker-compose files
I applied the kubernetes.yaml file present in this repo - https://github.com/zabbix/zabbix-docker/blob/6.2/kubernetes.yaml - on an OpenShift cluster.
CONFIGURATION
In the /etc/zabbix/zabbix_agentd.conf file on the VM I have set Server=zabbix-server.
STEPS TO REPRODUCE
Apply the kubernetes.yaml file on an OpenShift cluster and try to monitor any external VM.
EXPECTED RESULTS
The Zabbix server should be able to connect to the VM.
ACTUAL RESULTS
Zabbix server logs:
Defaulted container "zabbix-server" out of: zabbix-server, zabbix-snmptraps
** Updating '/etc/zabbix/zabbix_server.conf' parameter "DBHost": 'mysql-server'...added
287:20230120:060843.131 Zabbix agent item "system.cpu.load[all,avg5]" on host "Host-C" failed: first network error, wait for 15 seconds
289:20230120:060858.592 Zabbix agent item "system.cpu.num" on host "Host-C" failed: another network error, wait for 15 seconds
289:20230120:060913.843 Zabbix agent item "system.sw.arch" on host "Host-C" failed: another network error, wait for 15 seconds
289:20230120:060929.095 temporarily disabling Zabbix agent checks on host "Host-C": interface unavailable
Logs from the agent installed on the VM:
350446:20230122:103232.230 failed to accept an incoming connection: connection from "9.x.x.219" rejected, allowed hosts: "zabbix-server"
350444:20230122:103332.525 failed to accept an incoming connection: connection from "9.x.x.219" rejected, allowed hosts: "zabbix-server"
350445:20230122:103432.819 failed to accept an incoming connection: connection from "9.x.x.210" rejected, allowed hosts: "zabbix-server"
350446:20230122:103533.114 failed to accept an incoming connection: connection from "9.x.x.217" rejected, allowed hosts: "zabbix-server"
If I add this IP to Server= in /etc/zabbix/zabbix_agentd.conf it works. But what IP is this? Is it a service, or a node/pod IP? It keeps changing, and I cannot update the IP in the conf file every time it does. I need something more stable.
Kindly help me out with this issue.
I don't know Zabbix, so I have to make some educated guesses about how both the agent and the server work.
But to summarize: unlike something like Docker Compose, where you run the Zabbix server on a known machine, in OpenShift/Kubernetes you are deploying into a cluster of machines with its own networking. In other words, the whole point of OpenShift is that OpenShift controls where the application's pod gets deployed and will relocate/restart that pod as needed, with a different IP every time. (And the DNS name is meaningless here, since the two systems aren't sharing DNS anyway.) Most likely the IPs you are seeing are the pods' randomly assigned IPs.
So, what are you to do in a situation like yours, where an external application requires a predictable IP? Option 1 is to remove that requirement: using something like a certificate is obviously more secure and more reliable than depending on an IP anyway. Another option is to use an egress IP. This is a feature of OpenShift where you essentially use a proxy to give an external application a consistent source IP.
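As a rough sketch of the egress IP route (assuming the cluster uses the openshift-sdn network plugin; OVN-Kubernetes clusters use an EgressIP custom resource instead, and the project name, node name, and address below are only placeholders):

# Reserve an egress IP for the project the Zabbix server runs in
oc patch netnamespace zabbix --type=merge -p '{"egressIPs": ["9.x.x.50"]}'
# Allow one of the nodes to host that egress IP
oc patch hostsubnet <node-name> --type=merge -p '{"egressIPs": ["9.x.x.50"]}'

With something like that in place, the agent on the VM only has to allow one stable address, e.g. Server=9.x.x.50 in /etc/zabbix/zabbix_agentd.conf.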
I'm trying to install SonarQube onto a bare-metal kubernetes cluster.
All is working except when the software inside the pod tries to make an HTTPS request.
I've checked using wget (the pod doesn't have curl and cannot use ping) and by following the Kubernetes DNS debugging guide; however, whenever I make a call such as wget https://google.com I get the following error:
Connecting to google.com (192.168.1.179:443)
ssl_client: google.com: TLS connect failed
wget: error getting response: Connection reset by peer
command terminated with exit code 1
The IP address 192.168.1.179 is the address of another server on the host network.
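For reference, a minimal way to repeat the check from the DNS debugging guide is to resolve names from a throwaway pod (the pod name and busybox image here are just placeholders):

# Run nslookup and inspect resolv.conf from inside the cluster network
kubectl run dns-test --image=busybox:1.36 --restart=Never -- sleep 3600
kubectl exec -it dns-test -- nslookup google.com
kubectl exec -it dns-test -- cat /etc/resolv.conf
kubectl delete pod dns-test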
The resolv.conf I have (Ubuntu host) is:
nameserver 1.1.1.1
nameserver 1.0.0.1
I can't figure out why this is happening or how to fix it. DNS lookups seem to get answers, but HTTPS connections fail.
I'm using Calico, the Kubernetes dashboard, MetalLB, ingress-nginx and SonarQube.
Edit:
After restarting the host, the DNS servers successfully changed to 1.1.1.1.
However, now I'm presented with the following:
Connecting to google.com (142.250.204.14:443)
ssl_client: google.com: TLS connect failed
wget: error getting response: Connection reset by peer
command terminated with exit code 1
This error went away when I:
Disabled my firewall (ufw), and
Restarted the machine for the DNS changes to take effect.
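Rather than leaving ufw disabled, opening it up for the cluster's own networks should also work. A rough sketch, where the CIDRs are the Calico and kubeadm defaults and are only assumptions (substitute your cluster's pod and service ranges):

sudo ufw allow from 192.168.0.0/16    # pod network (Calico default)
sudo ufw allow from 10.96.0.0/12      # service network (kubeadm default)
sudo ufw allow 6443/tcp               # kube-apiserver
sudo ufw allow 179/tcp                # Calico BGP between nodes
sudo ufw reload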
I want to create a Nagios Core event handler that runs whenever I stop the Apache service.
The Nagios log is being written and it looks like the event handler script is being invoked, but it is not actually executed.
I am following these documents.
These are the Nagios logs:
SERVICE ALERT: tecmint;HTTP load;CRITICAL;HARD;4;connect to address <ip> and port 80: Connection refused
[1607493385] SERVICE EVENT HANDLER: tecmint;HTTP load;CRITICAL;HARD;4;restart-httpd
Why is Apache not starting?
If you want to monitor and restart Apache on a remote server, then you need to use SSH or NRPE. NRPE is preferred in this case, as it is faster and doesn't require an SSH key pair exchange.
Briefly, you would have one master Nagios server and one or more Nagios agents.
The master runs check_nrpe with some arguments to ask an agent to check a service and, optionally, run an event handler (script),
like this:
/usr/local/nagios/libexec/check_nrpe -H agent_IP_Address -c command
where command is something like check_http, which will be installed on the agent as a plugin.
The master should have Nagios Core installed.
The agent should have the NRPE agent and the plugins (libexec) installed,
as in this manual:
https://assets.nagios.com/downloads/nagiosxi/docs/Installing_The_XI_Linux_Agent.pdf
Command, host, and service definitions stay on the master.
The script that restarts Apache (the event handler) should be on the agent; see the sketch below the reference link.
This is a full reference on how to install and configure the NRPE master-agent model:
https://assets.nagios.com/downloads/nagioscore/docs/nrpe/NRPE.pdf
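As a minimal sketch of how the pieces fit together (the object names, paths, and the apache2 service name are assumptions based on the log above, not a verified configuration):

# On the master, in a Nagios object configuration file:
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define command {
    command_name    restart-httpd
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c restart_httpd
}

define service {
    use                     generic-service
    host_name               tecmint
    service_description     HTTP load
    check_command           check_nrpe!check_http
    event_handler           restart-httpd
    event_handler_enabled   1
}

# On the agent, in nrpe.cfg (the nagios user needs sudo rights for the restart):
command[check_http]=/usr/local/nagios/libexec/check_http -I 127.0.0.1
command[restart_httpd]=/usr/bin/sudo /usr/sbin/service apache2 restart

The point is that the event handler command ultimately executes on the agent through NRPE, so the restart command and its sudo permission live there, not on the master.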
I can't communicate with my Hyperledger Fabric first-network...
I can query and invoke from inside the CLI Docker container. Works fine!
But if I want to use Postman and JSON to invoke or query from a client PC, then I get an error message in the orderer log:
[grpc] Printf -> DEBU fc9 grpc: Server.Serve failed to complete security handshake from "10.xx.xx.xxx:56694": tls: oversized record received with length 21536
The Docker containers are on a SUSE Linux server, not on a local VM.
I can ping my server, and the orderer container's port is mapped with the default config (7050:7050).
I don't really know where to find the right cert.pem and key.pem files on the Linux server's filesystem. I tried different ones in Postman's client certificates option.
I also tried searching for a solution but couldn't find a working one.
Hyperledger Fabric peer and orderer nodes only support direct communication over their gRPC APIs (protocol buffers over HTTP/2). They do not offer an HTTP/REST interface. Postman only supports HTTP endpoints, so it will not work with peer or orderer nodes. (The error you see is also likely due to the fact that Postman was not using HTTPS.)
If you want to attempt to use REST with the peer and orderer nodes, you might want to check out https://github.com/hyperledger/fabric-sdk-rest, which aims to provide a REST server in front of Hyperledger Fabric nodes.
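If you just want to confirm that the port you are hitting is a TLS-enabled gRPC endpoint rather than plain HTTP, something like this works (the hostname is a placeholder for your orderer's address):

# Show the orderer's TLS certificate chain; a server-side "oversized record received"
# error usually means the client sent plaintext to a TLS port.
openssl s_client -connect orderer.example.com:7050 -showcerts </dev/null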
I'm trying to have my DC/OS 1.8 Docker containers send log messages to a Logstash that is also running in DC/OS, by using the service address of the Logstash service.
That doesn't appear to work, as Docker throws an error: logstash.marathon.l4lb.thisdcos.directory: no such host
Are service addresses not exposed to the host systems (or do I need to configure something for this)?
On DC/OS 1.7 I used a fixed host port in my Logstash config and logstash.marathon.mesos as the host, but these .marathon.mesos hostnames seem to not exist in 1.8 anymore.
The service addresses work fine when I use them from within a container (for example, to link my Prometheus service to my Alertmanager service), but from the host level they don't exist.
EDIT:
My statement about the missing .marathon.mesos URLs was wrong. They do work, but I used the wrong one. For now this more or less fixes my problem: I configured logging using this host and a fixed container port.
For everybody trying the same thing: you have to configure the fixed host port via the JSON mode every time you make changes to the service config in the UI. The fixed host port setting is no longer available in the network tab of the UI, so the DC/OS UI will DELETE the host port config on every load.
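For reference, this is roughly the part of the app definition that has to be re-added in JSON mode (the image, port, and protocol are placeholders for your Logstash setup):

{
  "id": "/logstash",
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "logstash:2.4",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 12201, "hostPort": 12201, "protocol": "udp" }
      ]
    }
  }
}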
Still no idea why the l4lb URLs don't work.
EDIT 2:
Still no idea, but I figured out that Minuteman generates crash and error logs every other second:
/opt/mesosphere/active/minuteman/minuteman/error.log:
CRASH REPORT Process <0.25809.2> with 0 neighbours exited with reason: {timeout,{gen_server,call,[{lashup_kv,'navstar#10.2.140.216'},{start_kv_sync_fsm,'minuteman#10.2.103.143',<0.25809.2>}]}} in gen_server:call/2 line 204
/opt/mesosphere/active/minuteman/minuteman/log/crash.log
2016-10-12 13:16:49 =CRASH REPORT====
crasher:
initial call: lashup_kv_sync_tx_fsm:init/1
pid: <0.29002.2>
registered_name: []
exception exit: {{timeout,{gen_server,call,[{lashup_kv,'navstar#10.2.140.216'},{start_kv_sync_fsm,'minuteman#10.2.103.143',<0.29002.2>}]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,204}]},{lashup_kv_sync_tx_fsm,init,1,[{file,"/pkg/src/minuteman/_build/default/lib/lashup/src/lashup_kv_sync_tx_fsm.erl"},{line,23}]},{gen_statem,init_it,6,[{file,"gen_statem.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
ancestors: [lashup_kv_aae_sup,lashup_kv_sup,lashup_platform_sup,lashup_sup,<0.916.0>]
messages: []
links: [<0.992.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 127
neighbours:
The DC/OS UI claims Spartan and Minuteman are healthy, but while the crash.log of the DNS dispatcher (Spartan) is empty, the l4lb (Minuteman) gets new crashes every other second.
They should certainly be available from the host OS. Are these hosts running the "Spartan" and "Minuteman" services?
My problem was twofold:
The l4lb did not run properly; that was only fixed after a total reinstall of the cluster.
The l4lb only supports TCP traffic. Because I wanted to use it to send container logs to Logstash over UDP (the Docker GELF driver only supports UDP), this failed.
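So the workaround from the edit above stands: point the Docker GELF driver at the .marathon.mesos name and the fixed host port directly, along these lines (the port and image name are placeholders):

# Ship container logs to Logstash via GELF over UDP, bypassing the l4lb
docker run --log-driver=gelf \
  --log-opt gelf-address=udp://logstash.marathon.mesos:12201 \
  my-app-image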