Custom Munin plugin won't report - plugins

I've built my first Munin plugin to give us the size of our Redis queue, but it won't report for some reason. Every other plugin on the node, including the other Redis-centric plugins, works fine.
Here's the plugin code:
#!/bin/sh
case $1 in
config)
cat <<'EOM'
multigraph redis_queue_size
graph_title Redis Queue Size
graph_info The size of Redis queue
graph_category redis
graph_vlabel Messages
redisqueue.label redisqueue
redisqueue.type GAUGE
redisqueue.min 0
EOM
exit 0;;
esac
queuelength=`redis-cli llen mykeyname`
printf "redisqueue.value "
echo $queuelength
The plugin is in /usr/share/munin/plugins/redis_queue_
The plugin is symlinked to /etc/munin/plugins/redis_queue_
I made sure to restart the service
$ sudo service munin-node force-reload
If I run sudo munin-run redis_queue_ I get the correct output:
redisqueue.value 1567595
If I run munin-node-config I get the following:
redis_queue_ | yes |
If I connect to the instance from the master using telnet to fetch the plugin, I get:
$ telnet 10.101.21.56 4949
Trying 10.101.21.56...
Connected to 10.101.21.56.
Escape character is '^]'.
# munin node at redis01.example.com
fetch redis_queue_
redisqueue.value 1035336
The master shows an empty graph for it, and the "last updated" time isn't advancing. I initially had the plugin configured a little differently (it wasn't producing good output), so all the stored values are -nan. Once I fixed the output, I expected the plugin to start working, but nothing I have tried has helped.
Everything looks right, yet there are still no values in the graph.
Edit: Munin v1.4.6
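One way to double-check the two outputs shown above is to compare what the plugin declares under config with what it prints on fetch. The sketch below is Python 3 and is not part of the original post; the function names are made up for illustration. It only reports mismatches. Note that the config output declares a multigraph line while the fetch output does not; whether that difference explains the empty graph is an assumption, since nothing in the post confirms it.

```python
CONFIG_OUTPUT = """multigraph redis_queue_size
graph_title Redis Queue Size
graph_info The size of Redis queue
graph_category redis
graph_vlabel Messages
redisqueue.label redisqueue
redisqueue.type GAUGE
redisqueue.min 0
"""

FETCH_OUTPUT = "redisqueue.value 1567595\n"


def parse_plugin_output(text):
    """Return (multigraph names, field names) found in plugin output."""
    graphs, fields = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(" ")
        if key == "multigraph":
            graphs.add(rest.strip())
        elif "." in key:
            # "redisqueue.label" and "redisqueue.value" both belong to "redisqueue"
            fields.add(key.split(".", 1)[0])
    return graphs, fields


def compare_config_and_fetch(config_text, fetch_text):
    """List the differences between what config declares and what fetch emits."""
    cfg_graphs, cfg_fields = parse_plugin_output(config_text)
    fetch_graphs, fetch_fields = parse_plugin_output(fetch_text)
    problems = []
    for name in sorted(cfg_graphs - fetch_graphs):
        problems.append("multigraph %r is in config but not in fetch" % name)
    for name in sorted(fetch_graphs - cfg_graphs):
        problems.append("multigraph %r is in fetch but not in config" % name)
    for name in sorted(cfg_fields - fetch_fields):
        problems.append("field %r is configured but has no value" % name)
    for name in sorted(fetch_fields - cfg_fields):
        problems.append("field %r has a value but no config" % name)
    return problems


if __name__ == "__main__":
    for problem in compare_config_and_fetch(CONFIG_OUTPUT, FETCH_OUTPUT):
        print(problem)
```

The strings are copied from the post; in practice they could be captured from munin-run redis_queue_ config and munin-run redis_queue_.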

Q: rails-tutorial Preview page not working

I get this screen after following the rails tutorial instructions to show the Hello, world text in a browser window (Figure 1.15 in the book).
I seem to recall having to specify both the PORT and IP environment variables the last time I ran through the tutorial...but now can't find any reference to these in the book text.
I had the same error, and I fixed it by running like this:
$ rails server -p $PORT -b $IP
=> Booting Puma
=> Rails 5.1.4 application starting in development
=> Run `rails server -h` for more startup options
Puma starting in single mode...
* Version 3.9.1 (ruby 2.4.0-p0), codename: Private Caller
* Min threads: 5, max threads: 5
* Environment: development
* Listening on tcp://0.0.0.0:8080
Use Ctrl-C to stop
I haven't yet figured out why I am getting the error, though.

PlayFramework Hangs After Days

The server runs successfully at first, but after several days it hangs with no error logs. From then on, no request gets a response.
This is the start command with options
sudo /opt/dev -Dhttps.port=443 -Dhttp.port=9000 -J-Xms3277m -J-Xmx3277m -J-XX:ParallelGCThreads=2 -J-Xmn2574M -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-server &
/opt/dev is the script file generated from activator stage
===========server info==========
linux: Ubuntu 14.04.5 LTS
ram: 4G
openjdk version "1.8.0_141"
===========process info========
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME COMMAND
15037 root 20 0 5978800 2.280g 31216 S 0.0 58.3 63:33.82 java
===========port info ===================
tcp6 :::9000 :::* LISTEN 15037/java
tcp6 :::443 :::* LISTEN 15037/java
===========other info==========
play version 2.3.2
scala version 2.11.1
akka setting
akka.jvm-exit-on-fatal-error = false
play.akka.jvm-exit-on-fatal-error = false
akka.default-dispatcher.fork-join-executor.pool-size-max =64
akka.actor.debug.receive = on
===========================================
These steps may help identify the problem, or at least serve as first steps in that direction.
Start by adding -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/where/to/put/hprof. Judging by your start script parameters, I think you need to write -J-XX instead of -XX. This creates a heap dump if an OutOfMemoryError occurs.
Add logging to your endpoints (at the start and at the end) so you can check whether Play receives the request at all.
While Play is unresponsive, check the number of open file descriptors and compare it with your limits. Find the PID of your Java process and run sudo ls -al /proc/7333/fd/ | wc -l (7333 stands for that PID); to see your limits, use ulimit -a.
It would also be worth monitoring the Akka queues, in case you use the same dispatcher both for frontend requests and for back-office processing (the dispatcher could be filled with long-running background tasks).
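The file descriptor check described above can also be scripted. The following is a minimal Python 3 sketch for Linux, not something from the original answer; the function names are invented for illustration. It reads /proc/<pid>/limits, which shows the limits of the running process itself (ulimit -a reports those of the current shell, which may differ).

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    A limit reported as 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open descriptor count, soft limit, hard limit) for a process."""
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as handle:
        soft, hard = parse_max_open_files(handle.read())
    return open_fds, soft, hard


if __name__ == "__main__":
    # Inspecting our own process needs no privileges; for the Java process,
    # pass its PID instead and run the script with sudo.
    print(fd_usage(os.getpid()))
```

Unlike ls -al piped to wc -l, this counts only the descriptor entries, without the "total", "." and ".." lines.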
I would do all the diagnostic steps that Evgeny suggested, plus:
Change akka.jvm-exit-on-fatal-error and play.akka.jvm-exit-on-fatal-error to true; the current setting may be masking your problem.
Take a stack dump of the running process while it is in this state and use it to identify the problem, or post it here. See "How to get a complete stack trace of a running java program that is taking 100% cpu?"

wsadmin script timing out when executing against DMGR via SOAP

I'm attempting to start and stop an application on a single JVM via the wsadmin console since the Web UI for IBM BPM PS Adv. doesn't allow for that kind of operation. So, I have the following script:
https://gist.github.com/predatorian3/b8661c949617727630152cbe04f78d7e
and when I run it against the DMGR from the Cell Host, I receive the following errors.
[wasadmin#server01 ~]$ cat /usr/local/bin/Run_wsadmin.sh
#!/bin/bash
#
#
#
/opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -user serviceAccount -password password $*
[wasadmin#cessoapscrt00 ~]$ time Run_wsadmin.sh -f /opt/IBM/wsadmin/wsadmin_Restart_Application.py WPS00 CRT00WPS01 redirectResource_war
WASX7209I: Connected to process "dmgr" on node CRTDMGR using SOAP connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[WPS00, CRT00WPS01, redirectResource_war]"
WASX7017E: Exception received while running file "/opt/IBM/wsadmin/wsadmin_Restart_Application.py"; exception information: com.ibm.websphere.management.exception.ConnectorException
org.apache.soap.SOAPException: [SOAPException: faultCode=SOAP-ENV:Client; msg=Read timed out; targetException=java.net.SocketTimeoutException: Read timed out]
real 3m21.275s
user 0m17.411s
sys 0m0.796s
I'm not specifying a connection type, so the default, SOAP, is used. However, after reading about the other connection types, none of them seems any better, though I attribute that to the vagueness of the IBM documentation. Is there an option to increase the timeout or turn it off, or is there a better connection type?
Also, when I run this directly in the wsadmin console, it seems to hang while retrieving the application manager string.
[wasadmin#server01 ~]$ Run_wsadmin.sh
WASX7209I: Connected to process "dmgr" on node CRTDMGR using SOAP connector; The type of process is: DeploymentManager
WASX7031I: For help, enter: "print Help.help()"
wsadmin>appManager = AdminControl.queryNames('cell=CRTCELL,node=WPS00,type=ApplicatoinManager,process=CRT00WPS01,*')
WASX7015E: Exception running command: "appManager = AdminControl.queryNames('cell=CRTCELL,node=WPS00,type=ApplicationManager,process=CRT00WPS01,*')"; exception information:
com.ibm.websphere.management.exception.ConnectorException
org.apache.soap.SOAPException: [SOAPException: faultCode=SOAP-ENV:Client; msg=Read timed out; targetException=java.net.SocketTimeoutException: Read timed out]
wsadmin>
You can increase the timeout value in {profile}/properties/soap.client.props:
com.ibm.SOAP.requestTimeout=180
The value is in seconds. To turn the timeout off, set com.ibm.SOAP.requestTimeout=0; for a longer timeout, change 180 to a larger value.
Also, about your query command: you have a typo in the MBean type. You wrote type=ApplicatoinManager; it should be type=ApplicationManager.
I had the same issue and wanted to override the timeout property temporarily. This worked like a champ. Make sure you follow the steps below exactly; I made some mistakes at first and the property was not passed through, but once I figured that out it worked.
Copy the soap.client.props file from /properties and give it a new name such as mysoap.client.props.
Edit mysoap.client.props and update the value of com.ibm.SOAP.requestTimeout as required
Create a new Java properties file soap_override.props and enter the following line:
com.ibm.SOAP.ConfigURL=file:/mysoap.client.props
Pass soap_override.props into wsadmin using the -p option: wsadmin -p soap_override.props...
REFERENCE:
https://www.ibm.com/developerworks/community/blogs/timdp/entry/avoiding_wsadmin_request_timeouts_the_neat_way32?lang=en
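The edit-a-copy approach in the steps above could be scripted roughly as follows. This is a Python 3 sketch, not taken from the answer or the referenced blog post: the function names are invented, and the path in the usage part is a hypothetical placeholder, since the post does not give the full location of the files.

```python
import re

TIMEOUT_KEY = "com.ibm.SOAP.requestTimeout"


def set_request_timeout(props_text, seconds):
    """Return props_text with com.ibm.SOAP.requestTimeout set to `seconds`."""
    new_line = "%s=%d" % (TIMEOUT_KEY, seconds)
    # Match only an active (uncommented) assignment on a single line.
    pattern = re.compile(r"^[ \t]*" + re.escape(TIMEOUT_KEY) + r"[ \t]*=.*$",
                         re.MULTILINE)
    if pattern.search(props_text):
        return pattern.sub(new_line, props_text)
    # Property not present: append it instead of failing.
    if props_text and not props_text.endswith("\n"):
        props_text += "\n"
    return props_text + new_line + "\n"


def override_props(copy_path):
    """Content of the small override file that points wsadmin at the copy."""
    return "com.ibm.SOAP.ConfigURL=file:%s\n" % copy_path


if __name__ == "__main__":
    original = "com.ibm.SOAP.securityEnabled=true\ncom.ibm.SOAP.requestTimeout=180\n"
    print(set_request_timeout(original, 600), end="")
    # Hypothetical location of the copied file; adjust to the real one.
    print(override_props("/path/to/mysoap.client.props"), end="")
```

The first output would be saved as mysoap.client.props and the second as soap_override.props, which is then passed to wsadmin with -p as described.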

Solaris svcs command shows wrong status

I have freshly installed an application on Solaris 5.10. When I check with ps -ef | grep hyperic | grep agent, the processes are up and running. When I check the status with svcs hyperic-agent, the output shows that the agent is in maintenance mode. The application is working fine and I don't have any issues with it. Please help.
There are several reasons that can lead to this behavior:
The starter (the start/exec property of the service) returned a status other than SMF_EXIT_OK (zero). In that case, check the logs:
# svcs -x ssh
...
See: /var/svc/log/network-ssh:default.log
In the logs you may see a message like the following, which means that the start script failed or is incorrectly written:
[ Aug 11 18:40:30 Method "start" exited with status 96 ]
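To pick such lines out of a longer service log, something like the Python 3 sketch below could be used. It is an illustration rather than part of the original answer. The exit code names are the ones I believe are defined in /lib/svc/share/smf_include.sh; treat the table as an assumption and verify it on the system in question.

```python
import re

# Assumed mapping of method exit codes to their SMF names (partial).
SMF_EXIT_CODES = {
    0: "SMF_EXIT_OK",
    95: "SMF_EXIT_ERR_FATAL",
    96: "SMF_EXIT_ERR_CONFIG",
    99: "SMF_EXIT_ERR_NOSMF",
    100: "SMF_EXIT_ERR_PERM",
}

METHOD_EXIT = re.compile(r'Method "(\w+)" exited with status (\d+)')


def failed_methods(log_text):
    """Return (method, status, name) for every non-zero method exit in the log."""
    results = []
    for method, status in METHOD_EXIT.findall(log_text):
        code = int(status)
        if code != 0:
            results.append((method, code, SMF_EXIT_CODES.get(code, "unknown")))
    return results


if __name__ == "__main__":
    sample = '[ Aug 11 18:40:30 Method "start" exited with status 96 ]\n'
    print(failed_methods(sample))
```

In practice the text would come from the log file named by svcs -x for the service.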
Another reason for such behavior is that the service faults while it is running (i.e. one of its processes dumps core or receives a kill signal, or all of its processes exit), as described here: https://blogs.oracle.com/lianep/entry/smf_5_fault_retry_models
The facility that actually provides SMF with this monitoring is System Contracts. You can determine the contract ID of an online service with svcs -v (field CTID):
# svcs -vp svc:/network/smtp:sendmail
STATE NSTATE STIME CTID FMRI
online - Apr_14 68 svc:/network/smtp:sendmail
Apr_14 1679 sendmail
Apr_14 1681 sendmail
Then watch events with ctwatch:
# ctwatch 68
CTID EVID CRIT ACK CTTYPE SUMMARY
68 28 crit no process contract empty
Then there are two options for handling that:
There is a real problem with the service, so it eventually faults. In that case, debug the application.
It is normal behavior for the service, so you should edit and re-import your service manifest to make SMF less paranoid, i.e. configure the ignore_error and duration properties.

python-memcache memcached -- I installed on CentOS VirtualBox but get/set never seem to work

I'm using Python. I did a yum install memcached followed by an easy_install python-memcached.
I used the simple test program from help(memcache). When I wasn't getting the proper answers, I threw in some print statements:
[~/test]$ cat m2.py
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
x = mc.set("some_key", "Some value")
print 'Just set a key and value into the cache (suposedly)'
value = mc.get("some_key")
print 'Just retrieved that value from the cache using the key'
print 'X %s' % x
print 'Value %s' % value
[~/test]$ python m2.py
Just set a key and value into the cache (suposedly)
Just retrieved that value from the cache using the key
X 0
Value None
[~/test]$
The question now is, what have I failed to do in my installation? It appears to be working from an API perspective but it fails to put anything into the memcache share area.
I'm using a VirtualBox VM running CentOS:
[~]# cat /proc/version
Linux version 2.6.32-358.6.2.el6.i686 (mockbuild#c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Thu May 16 18:12:13 UTC 2013
Is there a daemon that is supposed to be running? I don't see an obvious named one when I do a ps.
I tried to get pylibmc installed on my vm but was unable to find a working installation so for now will see if I can get the above stuff working first.
I discovered that if I run straight from the Python console GUI, I get a bit more output when I set debug=1:
>>> mc = memcache.Client(['127.0.0.1:11211'], debug=1)
>>> mc.stats
{}
>>> mc.set('test','value')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
0
>>> mc.get('test')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
When I try to use telnet to connect to the port, as in the example, I get a connection refused:
[root#~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root#~]#
I tried the instructions I found on the net for configuring telnet so localhost wouldn't be disabled:
vi /etc/xinetd.d/telnet
service telnet
{
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
disable = no
}
And then ran the commands to restart the service(s):
service iptables stop
service xinetd stop
service iptables start
service xinetd start
service iptables stop
I ran with both cases (iptables started and stopped), but it has no effect, so I am out of ideas. What do I need to do to allow the port, if that is the problem?
Or is there a memcached service that needs to be running to open up the port?
Well, this is what it took to get it working (a series of manual steps):
1) su -
cd /var/run
mkdir memcached # this was missing
In the memcached file I added "-l 127.0.0.1" to the OPTIONS statement. It's apparently a listen option. Do this for steps 2 & 3. I'm not certain which file is actually used at runtime.
2) cd /etc/sysconfig
cp memcached memcached.old
vi memcached
3) cd /etc/init.d
cp memcached memcached.old
vi memcached
4) Try some commands to see if the server starts now
/etc/init.d/memcached start
/etc/init.d/memcached status
/etc/init.d/memcached stop
/etc/init.d/memcached restart
5) I tried opening http://127.0.0.1:11211 in a browser, but it never seemed to display anything, so I don't really know how valid this check is. I'm not running Apache or anything like it, so perhaps it's not relevant to my case. Perhaps I would have to supply a ?key=blah or something.
6) Now it should be ready to go. If you run the test shown below, it should work; at least it did for me. Running help(memcache) displays a simple program; just paste that in and it should work fine.
[~]$ python
>>> import memcache
>>> help(memcache)
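As an alternative to the telnet and browser checks, a plain TCP connection attempt is enough to tell whether the daemon is listening after step 4. The sketch below is Python 3 (the session above uses Python 2 print statements) and uses only the standard library; the function name is made up for illustration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" as well as timeouts.
        return False


if __name__ == "__main__":
    # 11211 is the port used throughout the post.
    if port_is_open("127.0.0.1", 11211):
        print("something is listening on 127.0.0.1:11211")
    else:
        print("nothing is listening on 127.0.0.1:11211; is memcached started?")
```

A True result only shows that some process accepts connections on that port; the mc.set and mc.get test is still what confirms that it is memcached and that it works.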