How to override Computer Not Reachable in SCOM?

I'm trying to override the Computer Not Reachable monitor for my group and management pack so that it alerts me when the connection fails, but nothing happens when the connection breaks. Can anyone help me? Is it possible to override Computer Not Reachable?
Thanks!

The "Computer Not Reachable" pseudo-monitor triggers an alert when SCOM cannot ping a monitored computer. So it has nothing to do with a broken connection between a SCOM agent and a SCOM management group. NB: the ping test only starts in response to a communication failure (see below); it does not run continuously.
On the other hand, the "Health Service Heartbeat Failure" pseudo-monitor is exactly what you are looking for: it alerts when a SCOM Agent stops reporting back to its Management Group.
I say "pseudo-monitor" because neither of them is a real unit monitor. Both are fake roll-up monitors with no underlying monitors, so they have nothing to roll up; their state is set programmatically instead.
Cheers
Max

Related

Google Cloud SQL always on

I have a Google Cloud SQL instance with the following settings:
pricingPlan: PER_USE
activationPolicy: ON_DEMAND
I have added an IPv4 address.
I have verified through the Google Cloud SQL API that the settings were saved correctly.
Problem: I have no active queries, but the instance never stops, so I am charged 24 hours a day.
I'm sure the connections don't come from me. I've deleted all authorized networks and rebooted the instance, but I still always have 1 active connection.
Does anyone else have the same problem?
Many thanks,
Loic
As Juan Munoz says, there is always 1 active connection while your instance is running, but that won't make your instance keep running.
If you are being charged continuously even though you have set activationPolicy=ON_DEMAND and have no authorized networks you might want to check if you have an authorized App Engine app and whether it is connecting to your instance.
Also your instance will be constantly active, regardless of activationPolicy, if it is a replication master. Because the slave keeps a connection open the master will never be able to shut down. I doubt this is occurring here as I imagine the slave's connection would have appeared on your active connections graph.

Azure drops TCP connection after a few packets

I'm having a strange problem trying to maintain TCP connections from my local PC to Azure (oddly remote desktop works fine). I first noticed the problem with my own software, but it's not limited to it. What I've noticed is:
TCP 3 way handshake completes
Some data is successfully sent and received
Something bad happens and no more data is sent
To rule out my software, I tested netcat. On my Azure machine I set up a netcat server to echo a large text file. On my local PC I established the netcat connection to the Azure server and observed some of the text file being printed and then it just stopped.
The first Wireshark image is from the Azure server, and the second image is from my PC. Both were captured at the same time doing the netcat test I described above.
Here is my Azure endpoint configuration (same result with both endpoints):
I'm currently at a loss, and don't know enough about what the problem may be to continue my debugging efforts. Any suggestions would be greatly appreciated.
Thanks!
Someone from Azure support told me that the maximum TCP packet size on the Azure network is 1350. If your packets are larger than this, that might be the problem. Try limiting them to 1300 and test again.
Several things can cause this kind of network problem, such as:
Azure SLB/SNAT
Azure physical network issues (ACLs on the Azure routers where your VMs reside)
Proxies in front of your client applications, etc.
We should systematically narrow the problem space when tackling this kind of problem. For example, in this instance, run your client app on one of the Azure VMs and check whether the issue is reproducible. If it is not, then 99% of the time the problem lies in your local PC's network (or in the proxy your machine sits behind).
If the issue is reproducible from another Azure VM too, then 99% of the time something is wrong on the Azure side.
Some tips for identifying issues on the Azure side:
Check whether your TCP connection is idle (not sending data) for 4 minutes. If so, the Azure SLB/SNAT layer drops the TCP connection on the Azure side. You can prevent this either by sending TCP keep-alives or by increasing the VM endpoint idle timeout using AzureEndpoint.
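As a rough illustration of the keep-alive option, here is a minimal Python sketch that turns on TCP keep-alives for a client socket. The function name `enable_keepalive` and the timing values are assumptions chosen for illustration (the only constraint taken from the answer is that the idle time should be well under 4 minutes). The per-socket timing options are platform-specific, so the sketch only sets the ones the running platform exposes.

```python
import socket


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Turn on TCP keep-alives and tune their timing where the platform allows it.

    idle_s, interval_s and probes are illustrative values, not Azure recommendations.
    """
    # SO_KEEPALIVE is the portable switch that enables keep-alive probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The timing options below are not defined on every platform,
    # so each one is applied only if the socket module exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Call it on the socket before connecting, for example `enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))`. Application-level heartbeat messages would be an alternative if you control both ends of the protocol.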
Hope this helps.

MassTransit MSMQ remote queues not reachable

We've developed a MassTransit based demo which is working well as long as all processes run on the same server.
However, as soon as my receiver tries to subscribe on another machine, it hangs for a while and then we receive the following exception:
"System.InvalidOperationException: Timeout waiting for subscription service to respond."
Checked already: Firewall rules for MSMQ (inbound and outbound), network, etc.
What could have gone wrong?
The subscription queues on the other machine are private. Is this a problem?
Do we have to change the address format in some special way? Is msmq://computerName/queueName not OK for remote connections?
Looks like we have forgotten some tiny thing, as nobody else had this problem before...
The most likely cause is that MT assumes something about remote queues: that they are transactional. Local queues can be queried to discover that, but remote queues cannot. I would add ?tx=false to the end of your remote queue URI if you aren't using transactional queues for the subscription service.
Next, double check the outgoing queues on the local machine. Are the messages stuck there or did they disappear? If you are using transactional queues, can the machines enroll in a DTC transaction together?
Answering your question, all queues are private. This is not a problem, they are still remotely addressable.
I hope this helps get you further. After that, I would consider joining the mailing list and posting your questions there: https://groups.google.com/forum/?fromgroups#!forum/masstransit-discuss
For reference: the problem was a wrong URL in the receiver queue. The receiver queue always resides on the local system, of course. Sorry for any inconvenience.

Session getting disconnected in the middle of working

Sessions are getting disconnected automatically (in the middle of working).
The disconnections happen while users are working over a telnet connection to a Linux server using the PuTTY telnet client.
During the disconnections, network bandwidth utilization is high, and there is no limit on the total number of users on the network.
Error "Hangup signal received (562)"
Any ideas about this?
The network connection was interrupted or a hangup signal was sent via "kill".
You mention network utilization being "high" when disconnects happen. How do you know that? What measurement are you looking at that tells you it is "high"? That might be a symptom of a networking issue that is at the root of the problem.
There are a few directions to try:
OpenEdge has published this article with links to implementing keep-alive packets:
https://knowledgebase.progress.com/articles/Article/Telnet-connection-times-out-after-15-minutes
Increase the number of "instances" in xinetd.conf, and then restart the service.
Make sure that the database watchdog is up and running: https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dmadm/prowdog-command.html
Check the database log file, to find out what happened just before the hangup (https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/gsins/openedge-database-log-file.html)

How do I know if a system has powered on?

I am writing a script that powers on a system via the network. Then I need to run a few commands on the other host. How do I know whether the system has powered on?
My programming language is Perl and the target host is RHEL5.
Is there any kernel interrupt or network boot information that indicates the system has powered on and the OS has loaded?
[In a different scenario] I was also wondering: if I just switch on my machine manually, when exactly is it said to have powered on, and when is the OS considered to have booted completely enough for a network-related operation, such as executing a network command on it? And if the system is on DHCP, how would a remote system find this machine? (I guess it is possible via the MAC address, but I may be wrong.)
If I have missed any info, please feel free to ask. If you have any suggestions to make the task easier, please share them :)
thanx
imkin
Well, I'd say the system is booted when it can perform the request you've made of it. That is, the sshd daemon is running. That's booted sufficiently for your purposes (I assume - substitute for whatever daemon you really need).
So, I'd send the power-on signal, and check back every 15-30 seconds to see if I could connect. If I've failed to connect within whatever is a reasonable time for that machine (2 minutes or 5 minutes or whatever), then I'd send an alert to the IT support team. Well, I'd send it to myself first, and only once I've investigated a few failures or so and found them to all be legitimate would I start sending it directly to IT.
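The polling approach above can be sketched as follows. This is a minimal Python sketch rather than the Perl the question asks for, and the function name `wait_for_port` and its default values are assumptions for illustration. It treats the machine as "booted" once a TCP connection to the chosen port (22 for sshd here) succeeds.

```python
import socket
import time


def wait_for_port(host, port=22, interval_s=15.0, deadline_s=300.0, connect_timeout_s=5.0):
    """Return True once a TCP connect to host:port succeeds, False after deadline_s."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            # A successful connect means the daemon is accepting connections.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: not ready yet
        # Stop if the next attempt would start after the deadline.
        if time.monotonic() + interval_s > give_up_at:
            return False
        time.sleep(interval_s)
```

After sending the power-on signal, the caller would check the result and raise an alert on `False`, for example `if not wait_for_port("target-host"): ...` where `target-host` is a placeholder name. A successful connect only shows that something is listening on the port; running a trivial command over SSH would be a stronger check.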
DHCP is kind of a different question. You'd have to start learning about broadcasting, or having a daemon on that machine "call home" during boot to register its current IP address. And it would have to "call home" every time a DHCP renewal changed its IP address. This is decidedly more convoluted. Try to avoid DHCP on such server machines if at all possible.
On the rebooting machine you can install a script in your crontab with the special @reboot time specification (see man 5 crontab). That script could send a notification of some kind to the other machine, notifying it that it's up now.
I think checking for sshd sounds like a good approach.
As for the DHCP problem: if the other computer is on the same subnet you can look it up by MAC address using Net::ARP.
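For illustration only, here is a Python sketch of a MAC-to-IP lookup that does not use Net::ARP. Instead it reads the Linux kernel's ARP cache from `/proc/net/arp`, which is an assumption about the controlling machine (Linux only). The function names `ip_for_mac` and `lookup_mac` are hypothetical. The cache only lists hosts this machine has exchanged traffic with recently, so a freshly booted target may not appear until something on the subnet has talked to it.

```python
def ip_for_mac(mac, arp_table_text):
    """Return the IP address listed for mac in /proc/net/arp-style text, or None."""
    wanted = mac.lower()
    for line in arp_table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 4 and fields[3].lower() == wanted:
            return fields[0]
    return None


def lookup_mac(mac, path="/proc/net/arp"):
    """Look up mac in the local ARP cache (Linux only)."""
    with open(path) as table:
        return ip_for_mac(mac, table.read())
```

A `None` result means the address is not in the cache at the moment, not that the machine is down.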
How about adding a script to the remote machine that runs on startup and tells you when it is ready?
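A "call home" script of the kind suggested in the answers above could look roughly like this. It is a minimal Python sketch: the function name `notify_ready`, the one-line message format (the hostname followed by a newline), and the idea of a controller listening on a TCP port are all assumptions made for illustration.

```python
import socket


def notify_ready(controller_host, controller_port, timeout_s=5.0):
    """Connect to the controller and send this machine's hostname as one line."""
    message = (socket.gethostname() + "\n").encode("utf-8")
    with socket.create_connection((controller_host, controller_port), timeout=timeout_s) as conn:
        conn.sendall(message)
    return message
```

The controller side would accept the connection and read the line. It also learns the sender's current IP address from the connection itself, which helps in the DHCP case. If the script is started very early in boot, the network may not be up yet, so retrying the call a few times is probably worthwhile.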