Running a Sensu handler on the client instead of the server - sensu

I have the following problem: I am using Sensu to monitor some Raspberry Pis. I'm using standalone checks, which work just fine. Sometimes one of the Pis loses its Wi-Fi connection, or is restarted manually and DHCP fails, or has no internet connection for some other reason. The idea is to let the Pi check itself for an internet connection and, if the check fails, fix the problem by itself, for example by restarting Wi-Fi or rebooting.
Of course a simple bash script with a cron job would do the job, but I want to do the check with Sensu. The problem is obvious: if the check fails, I don't have an internet connection and therefore can't send the check result to the Sensu server.
Long story short ;) is it possible to implement something like the remediation feature just on the client, so that a handler on the client itself starts the script that should resolve the problem?

I don't think this is possible. Standalone checks are scheduled by the client, but the check result is still published to the server. The result is then handled by the handler, which resides on the server.
You could write a standalone "check" plugin which monitors the wifi and if it is off then it will turn it on. It isn't using a handler though.
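A minimal sketch of such a self-healing "check" plugin, written in Python. The probe target (8.8.8.8 on TCP port 53) and the repair command are assumptions for illustration only; replace them with whatever fits your Pi. The exit codes follow the usual Sensu check convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import socket
import subprocess

# Sensu check exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumptions for illustration only: probe target and repair command.
PROBE_HOST, PROBE_PORT = "8.8.8.8", 53
REPAIR_COMMAND = ["sudo", "systemctl", "restart", "networking"]


def is_online(host=PROBE_HOST, port=PROBE_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_network(command=REPAIR_COMMAND):
    """Run the (hypothetical) repair command; True if it exited with 0."""
    try:
        return subprocess.run(command, check=False).returncode == 0
    except OSError:
        return False


def run_check(probe=is_online, repair=restart_network):
    """Probe connectivity, try a repair on failure, return (status, message)."""
    if probe():
        return OK, "OK: network reachable"
    if repair() and probe():
        return WARNING, "WARNING: network was down, repair succeeded"
    return CRITICAL, "CRITICAL: network down, repair failed"
```

As a plugin, the script would end by printing the message from run_check() and exiting with the returned status. The Wi-Fi may need a few seconds to come back after the repair command, so the second probe might need a short wait in front of it. Either way, the result can only reach the Sensu server once the connection is back.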

Related

Local web server on windows stopped being reachable by devices on the same network

I use a local Python web server on my Windows machine. It’s simple, but good enough while in the static web page development stage. I just run it with something like this on my WSL command line:
python3 -m http.server
I can also access it on mobile devices on the same network, by going to my local address, e.g.: http://192.168.1.12:8000. All was good until suddenly I could no longer access it from other devices; I got a “server not responding” type of message. Also, I could clearly see that when I refreshed the page on my phone, there was no GET request in the logs.
Immediately I tested on the local machine, and it was still working fine. This obviously smelled like a firewall problem. In Linux, I’d know what to do, but it’s the first time I had to deal with this on Windows. This is what I’ve tried, without resolving the connection problem:
I opened the Event Viewer but could not see any obvious logs to check
I stopped the server (CTRL+C) and started it again on another port (5000). The Windows Firewall message popped up again asking for permission for Python3 to access the “Public network” and the “Private network”. Normally I just tick the “private network” but this time I checked both, as a troubleshooting step, in case my Wi-Fi was incorrectly being considered “public”.
I went to Windows Firewall and temporarily shut it down on the private network.
I installed and tried running nmap on the WSL, but it failed to run and prompted me to install the Windows version instead.
I installed and ran the Windows version of nmap but it told me that port 5000 was open.
What is the recommended way to troubleshoot and fix this issue?
Still suspecting the firewall, I tried something new: I switched off the “public network” firewall. I tested on my mobile and the page loaded as normal again! I immediately turned the firewall back on and tested the page on my mobile once more; still fine. So, the solution was to toggle the public network firewall off and on again. To make it more generic, I would toggle all firewall categories on Windows. And of course, I would make sure that the firewall ends up switched on; this was a very quick operation.
I thought I’d put this here rather than ServerFault or SuperUser as it could potentially be more useful to developers, and it took a precious hour of my time. I still don’t know why it stopped working on its own in the first place. Better troubleshooting steps or suggestions are welcome, but I probably won’t be able to verify it as I don’t know how to purposely induce the issue.
Another solution, which worked on a different occasion, was to delete all instances of Python 3.8 from the list of allowed apps (I don't know why Windows shows the same app multiple times), then (re)start the Python server and allow it through when the firewall prompt pops up again.
In Windows Firewall you have four options for the rule type when creating a new inbound rule for your local web server:
1 Program
2 Port
3 Predefined
4 Custom
Choose Port, select TCP, and enter the port your server uses.
Allow the connection.
Tick all profiles: Domain, Private and Public.
Enter a name.
That's all.
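To verify whether the firewall is really letting traffic through, it helps to probe the port from another device on the network rather than from the server itself (a scan run on the same machine may not pass through the inbound rules). A minimal sketch in Python; the function name is made up for this example:

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from a second machine, is_port_open("192.168.1.12", 8000) should return True when the server is up and the firewall allows the connection. A timeout rather than an immediate refusal usually hints at packets being dropped by a firewall.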

TCP retransmission on RST - Different socket behaviour on Windows and Linux?

Summary:
I am guessing that the issue here has something to do with how Windows and Linux handle TCP connections, or sockets, but I have no idea what it is. I'm initiating a TCP connection to a piece of custom hardware that someone else has developed and I am trying to understand its behaviour. In doing so, I've created a .NET Core 2.2 application; run on a Windows system, I can initiate the connection successfully, but on Linux (latest Raspbian), I cannot.
It appears that it may be because Linux systems do not retransmit a SYN after a RST, whereas Windows ones do, and this behaviour seems key to how this peculiar piece of hardware works.
Background:
We have a black box piece of hardware that can be controlled and queried over a network, by using a manufacturer-supplied Windows application. Data is unencrypted and requires no authentication to connect to it and the application has some other issues. Ultimately, we want to be able to relay data from it to another system, so we decided to make our own application.
I've spent quite a long time trying to understand the packet format and have created a library, which targets .NET Core 2.2, that can be used to successfully communicate with this kit. In doing so, I discovered that the device seems to require a kind of "request to connect" command to be sent via UDP. Straight afterwards, I am able to initiate a TCP connection on port 16000, although the first TCP attempt always results in a RST,ACK being returned, so a second attempt needs to be made.
What I've developed works absolutely fine on both Windows (x86) and Linux (Raspberry Pi/ARM) systems once connected, and I can send and receive data. However, when run on the Raspbian system, there seem to be problems when initiating the TCP connection. I could have sworn that we had it working absolutely fine on a previous build, but none of the previous commits seem to work, so it may well be a system/kernel update that has changed something.
The issue:
When initiating a TCP connection to this device, it will - straight away - reset the connection. It does this even with the manufacturer-supplied software, which itself then immediately re-attempts the connection again and it succeeds; so this kind of reset-once-then-it-works-the-second-time behaviour in itself isn't a "problem" that I have any control over.
What I am trying to understand is why a Windows system immediately re-attempts the connection through a retransmission, whereas the Linux system just gives up after one attempt (its packet capture ends there).
To prove it is not an application-specific issue, I've tried using ncat/netcat on both the Windows system and the Raspbian system, as well as a Kali system on a separate laptop to prove it isn't an ARM/Raspberry issue. Since the UDP "request" hasn't been sent, the connection will never succeed anyway, but this simply demonstrates different behaviour between the OSes.
The Linux captures look pretty much the same as above, whereby they send a single packet that gets reset, whereas the Windows attempt demonstrates the multiple retransmissions.
So, does anyone have any answer for this behaviour difference? I am guessing it isn't a .NET Core specific issue, but is there any way I can set socket options to attempt a retransmission? Or can it be set at the OS level with sysctl settings or something? I did try to see if there are any SocketOptionNames in .NET that look like they'd control attempts/retries, as this answer had me wonder, but no luck so far.
If anyone has any suggestions as to how to better align this behaviour across platforms, or can explain the reason for this difference, I would very much appreciate it!
Nice find! According to this, Windows' TCP will retry a connection if it receives a RST/ACK from the remote host after sending a SYN:
... Upon receiving the ACK/RST client from the target host, the client determines that there is indeed no service listening there. In the Microsoft Winsock implementation of TCP, a pending connection will keep attempting to issue SYN packets until a maximum retry value is reached (set in the registry, this value defaults to 3 extra times)...
The value used to limit those retries is set in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxConnectRetransmissions according to the same article. At least in Win10 Pro it doesn't seem to be present by default.
Although this is a convenience on Windows machines, an application should still determine its own criteria for handling a failed connect attempt IMO (i.e. number of attempts, timeouts etc.).
Anyhow, as I said, surprising fact! Living and learning I guess ...
Cristian.
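To illustrate the application-level retry suggested above, here is a minimal sketch. It is written in Python rather than the asker's .NET Core, and the function name and defaults are assumptions for this example; the default of 4 attempts simply mirrors the "3 extra times" quoted from the article.

```python
import socket
import time


def connect_with_retry(address, attempts=4, delay=0.2, timeout=5.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Open a TCP connection, retrying when the peer refuses it.

    A RST in reply to a SYN surfaces as ConnectionRefusedError in Python.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(address, timeout=timeout)
        except ConnectionRefusedError as error:
            last_error = error
            # Pause before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

For the device in the question this would be called with its address and port 16000 after the UDP "request to connect" has been sent. Doing the retry in the application gives the same behaviour on Windows and Linux, independent of what the OS does on its own.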

My Netty TCP/IP server stops listening a few hours after starting

I have written a TCP/IP server using Netty 4.0, running on a Linux machine and listening to small GPS tracking devices. I have been facing a weird problem: the server suddenly stops listening to them several hours after I start it. There is no error in the logs that I can see, and the server is still running. It looks like only the channel is not working. When I run a client to do a health check, the client socket is still alive and keeps sending packets to the server, but the server does not receive them.
If you have any idea how to solve it, please tell me about it. It would be appreciated.
It is impossible to tell without more information. I would check different things, such as whether there was an OOM exception, or with telnet whether the server really refuses connections, etc. Also, jstack may show you if there is a deadlock.

Sockets won't connect after Windows 8.1 update

I am currently working on a project that involves the use of sockets. The program was working just fine, but after a Windows 8.1 update on my computer, either the client socket isn't sending out a signal to be accepted or the server socket isn't receiving the signal so that it can accept. I tried it on my emulator and it worked just fine. In addition, I have updated my GPU drivers to see if this could fix it, but they still won't connect. The program doesn't crash or give me an error; it just sits there and waits. Does anyone have any ideas? Perhaps the update is blocking my peer-to-peer connection? Any help is much appreciated. Thanks.

How do I know if a system has powered on?

I am writing a script that powers on a system via the network. Then I need to run a few commands on the other host. How do I know whether the system has powered on?
My programming language is Perl and the target host is RHEL5.
Is there any kernel interrupt or network boot information that indicates the system has powered on and the OS has loaded?
[In a different scenario] I was also wondering: if I just switch on my machine manually, when exactly is it said to have powered on, and when is the OS considered to have booted completely enough for a network-related operation, such as executing a network command there? And if the system is on DHCP, how would a remote system find this machine? [I guess it is possible via the MAC address, but I may be wrong.]
If I have missed out any info, please feel free to ask me. If you have any suggestions to make the task easier, please share them :)
Thanks
imkin
Well, I'd say the system is booted when it can perform the request you've made of it. That is, the sshd daemon is running. That's booted sufficiently for your purposes (I assume - substitute for whatever daemon you really need).
So, I'd send the power-on signal, and check back every 15-30 seconds to see if I could connect. If I've failed to connect within whatever is a reasonable time for that machine (2 minutes or 5 minutes or whatever), then I'd send an alert to the IT support team. Well, I'd send it to myself first, and only once I've investigated a few failures or so and found them to all be legitimate would I start sending it directly to IT.
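The polling idea could be sketched roughly like this. It is shown in Python rather than the asker's Perl, and the function names, the 20-second interval and the 5-minute deadline are assumptions for illustration:

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if the host accepts TCP connections on the sshd port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_boot(probe, interval=20.0, deadline=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call probe() every `interval` seconds until it returns True.

    Returns False once `deadline` seconds have passed without success.
    """
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline:
            return False
        sleep(interval)
```

A caller would use something like wait_for_boot(lambda: ssh_port_open("target-host")) right after sending the power-on signal, and raise the alert when it returns False. "target-host" is a placeholder name.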
DHCP is kind of a different question. You'd have to start learning about broadcasting, or having a daemon on that machine "call home" during boot to register its current IP address. And it would have to "call home" every time a DHCP renewal changed its IP address. This is decidedly more convoluted. Try to avoid DHCP on such server machines if at all possible.
On the rebooting machine you can install a script in your crontab with the special @reboot schedule (see man 5 crontab). That script could send a notification of some kind to the other machine, notifying it that it's up now.
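A rough sketch of what such a "call home" script might look like, in Python. The function name, the script path in the comment and the use of a single UDP datagram are all assumptions for this example, not a fixed recipe:

```python
import socket

# Hypothetical crontab entry on the rebooting machine:
#   @reboot /usr/bin/python3 /opt/announce_boot.py


def announce_boot(monitor_host, monitor_port, message=b"booted"):
    """Send one UDP datagram telling the monitoring host that we are up."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (monitor_host, monitor_port))
```

UDP gives no delivery guarantee, and the @reboot job may run before the network is fully configured, so in practice the script would probably retry a few times or use a TCP connection instead.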
I think checking for sshd sounds like a good approach.
As for the DHCP problem: if the other computer is on the same subnet you can look it up by MAC address using Net::ARP.
How about adding a script to the remote machine that runs on startup and tells you when it is ready?