Using Ubuntu 12.04, I am trying to set up an IPython LAN cluster. The details:
Controller Config
# Configuration file for ipcontroller.
c = get_config()
c.IPControllerApp.reuse_files = True
c.IPControllerApp.engine_ssh_server = u'bar@bar1'
c.HubFactory.ip = '*'
c.HubFactory.db_class = 'NoDB'
Cluster Config
# Configuration file for ipcluster.
c = get_config()
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engine_args = ['--profile-dir=~/.config/ipython/profile_foo']
c.SSHEngineSetLauncher.engines = {'foo@foo1' : 1, 'foo@foo2' : 1, 'foo@foo3' : 1, 'foo@foo4' : 1}
Engine Config
# Configuration file for ipengine.
c = get_config()
c.EngineFactory.timeout = 10
So, then running
ipcluster start --profile=foo --debug
yields the following:
2013-09-03 19:43:45.772 [IPClusterStart] Process 'ssh' started: 5198
2013-09-03 19:43:45.773 [IPClusterStart] Process 'engine set' started: [None, None, None, None]
2013-09-03 19:43:47.086 [IPClusterStart] 2013-09-03 19:44:02.726 [IPEngineApp] Completed registration with id 0
2013-09-03 19:43:47.795 [IPClusterStart] 2013-09-03 19:43:53.737 [IPEngineApp] Completed registration with id 1
2013-09-03 19:43:48.561 [IPClusterStart] 2013-09-03 19:43:59.793 [IPEngineApp] Completed registration with id 2
2013-09-03 19:43:49.667 [IPClusterStart] 2013-09-03 19:44:03.859 [IPEngineApp] Completed registration with id 3
2013-09-03 19:44:15.773 [IPClusterStart] Engines appear to have started successfully
Looks good to me. But when I try to connect with a Client, I get fewer than the anticipated number of engines. This occurs even for one or two engines running on a single remote machine.
In [22]: rc=Client(profile='foo')
In [23]: rc.ids
Out[23]: [1, 2]
I set the timeout high in case that was the issue, but it persists.
If I run ipcontroller and the ipengines separately, everything works, but I would really prefer to be able to start and stop the cluster with ipcluster.
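Roughly, the manual procedure that does work looks like the sketch below (host and profile names are taken from the configs above; the connection-file path is an assumption based on the default IPython profile layout):

# On the controller host (bar1):
ipcontroller --profile=foo

# Copy the engine connection file to each engine host; the security/ path is
# the default location under the profile directory and may differ on your setup:
scp ~/.config/ipython/profile_foo/security/ipcontroller-engine.json \
    foo1:~/.config/ipython/profile_foo/security/

# On each engine host (foo1..foo4):
ipengine --profile-dir=~/.config/ipython/profile_foo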
Related
Is it possible to add a comment/description to a supervisor process?
Right now, when I run a status action, I get:
supervisor> status api_stock
api_stock RUNNING pid 7875, uptime 0:10:09
And I would like something like:
api_stock RUNNING pid 7875, uptime 0:10:09 desc: This app is maintained by dev team 2
Setup Details
2 ejabberd nodes with PostgreSQL as the database (OS: Ubuntu 16.04)
I am trying to cluster the two ejabberd nodes as described in
https://docs.ejabberd.im/admin/guide/clustering/
After starting the master node, the following steps were performed on the slave node:
copy .erlang.cookie to the slave node
copy ejabberd.yml from the master to the slave
The slave started successfully but shows the error below.
=====Error=========
Eshell V9.2 (abort with ^G)
(ejabberd@gim-Veriton-M6650G)1> 18:29:41.856 [notice] Changed loghwm of /usr/local/var/log/ejabberd/error.log to 100
18:29:41.856 [notice] Changed loghwm of /usr/local/var/log/ejabberd/ejabberd.log to 100
18:29:41.857 [info] Application lager started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.860 [info] Application crypto started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.865 [info] Application sasl started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.871 [info] Application asn1 started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.871 [info] Application public_key started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.880 [info] Application ssl started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.881 [info] Application p1_utils started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.883 [info] Application fast_yaml started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.888 [info] Application fast_tls started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.892 [info] Application fast_xml started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.895 [info] Application stringprep started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.899 [info] Application xmpp started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.903 [info] Application cache_tab started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.910 [info] Application eimp started on node 'ejabberd@gim-Veriton-M6650G'
18:29:41.910 [info] Loading configuration from /usr/local/etc/ejabberd/ejabberd.yml
18:29:41.913 [error] CRASH REPORT Process <0.67.0> with 0 neighbours exited with reason: no case clause matching <<>> in ejabberd_config:get_config_option_key/2 line 473 in application_master:init/4 line 134
18:29:41.913 [info] Application ejabberd exited with reason: no case clause matching <<>> in ejabberd_config:get_config_option_key/2 line 473
(ejabberd@gim-Veriton-M6650G)1>
I've tried re-creating the Mnesia DB as well, but that didn't help.
ejabberdctl status shows that ejabberd is not running on that node.
Can someone please look into the issue and help?
Finally I found the solution to the problem.
The issue was with the node names: the master's node name was a fully qualified name, but the slave's node name had no domain.
I also added both node names to the /etc/hosts file.
For ejabberd clustering, please follow the steps below.
Before starting, configure proper entries in the /etc/hosts files of both nodes, i.e. the nodes should be able to resolve each other by hostname.
Set the ejabberd node name in the ejabberdctl.cfg file; the two nodes must have different node names.
1. Configure ejabberd on the master node with a proper node name (either an FQDN or just a name of your convenience).
2. Configure the slave node with the same config as the master, i.e. both nodes should have the same configuration in the ejabberd.yml file.
3. Copy .erlang.cookie from the master node to the slave; the ejabberd user must be able to read the cookie file.
4. Start the master node in live mode (ejabberdctl live).
5. Start the slave node in live mode.
6. Check the cookie value in the Erlang console of both nodes using the command 'erlang:get_cookie().'; both nodes should show the same value.
7. If both nodes show the same value, execute "ejabberdctl --no-timeout join_cluster ejabberd@nodename" on the slave (see the consolidated sketch after this list).
Change ejabberd@nodename according to your environment.
In my case I ran ejabberd as the 'ejabberd' user with the node name ejabberd@cluster-node1 (if you want, you can also use an FQDN like ejabberd@example.com).
8. If the above command executed without any error, the nodes are now in a cluster.
9. Confirm the cluster from the Erlang console of either node using the command mnesia:info(); you will see the node details under "running_db_nodes".
10. Hurray, you are done!
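For convenience, here is a consolidated shell sketch of steps 3 to 9, run on the joining (slave) node as the ejabberd user; the 'master' hostname, the cookie path, and the node name ejabberd@cluster-node1 are placeholders from the example above and must match your installation:

# Same Erlang cookie on both nodes, readable by the ejabberd user
# (the cookie lives in the ejabberd user's home directory; adjust the path):
scp master:/var/lib/ejabberd/.erlang.cookie /var/lib/ejabberd/.erlang.cookie
chmod 400 /var/lib/ejabberd/.erlang.cookie

# Start the node, then join the master's cluster:
ejabberdctl start
ejabberdctl --no-timeout join_cluster ejabberd@cluster-node1

# Verify cluster membership (list_cluster is available in recent ejabberd
# releases; alternatively run mnesia:info(). from the live console as in step 9):
ejabberdctl list_cluster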
For load balancing the cluster you can use HAProxy.
Please refer to https://blog.onefellow.com/post/76702632637/haproxy-and-ejabberd for details.
I've not done load balancing with a hardware load balancer; I still need to check on that.
If anyone has done that, please do post here.
I am getting an error when submitting a COMPSs application on a cluster.
I execute this:
$ enqueue_compss --lang=python /home/bsc21/bsc21863/simple.py 3
and receive this error:
/gpfs/apps/MN3/COMPSs/1.4/Runtime/scripts/user/../queues/lsf/submit.sh:
line 1: #!/bin/bash: No such file or directory
Queue: default
Reservation disabled
Num Nodes: 2
Num Switches: 0
Job dependency: None
Exec-Time: 00:10
Network: ethernet
Node memory: disabled
Tasks per Node: 16
Tasks in Master: 0
Master Port: 43306
Master WD: .
Worker WD: scratch
Library Path: .
Classpath: .
COMM: integratedtoolkit.nio.master.NIOAdaptor
To COMPSs: --lang=python /home/bsc21/bsc21863/simple.py 3
The submission seems correct. Ignore the log message line 1: #!/bin/bash: No such file or directory; I do not know why it appears, but it does not stop the submission. Have you checked whether a job was actually submitted after the command execution, or whether any .out or .err files appeared later?
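A few follow-up checks you could run, assuming the LSF batch system implied by the .../queues/lsf/submit.sh path in the output (the commands below are standard LSF, not COMPSs-specific):

bjobs                # is the job pending or running?
bjobs -a             # also list recently finished jobs
ls -lt *.out *.err   # look for job output/error files in the submission directory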
I'm trying to install the check_mk agent with the standard check_mk xinetd config file via Puppet on a Debian 7 server.
Check_mk installs without a problem, but I've got an issue with the xinetd config.
When I change the port in the source config file on the Puppet master and run puppet agent -t on the client host, the new configuration is deployed correctly, but Puppet doesn't reload the xinetd service because the system can't determine the state of the xinetd service.
The Puppet manifest looks like this:
class basic::check-mk {
  case $operatingsystem {
    debian: {
      package { 'check-mk-agent':
        ensure => present,
      }
      file { '/etc/xinetd.d/check_mk':
        ensure => file,
        source => 'puppet:///modules/basic/etc--xinetd--checkmk',
        mode   => '0644',
        notify => Service['xinetd'],
      }
      service { 'xinetd':
        ensure  => running,
        enable  => true,
        restart => '/etc/init.d/xinetd reload',
      }
    }
  }
}
The debug output looks like this:
info: Applying configuration version '1464186485'
debug: /Stage[main]/Ntp::Config/notify: subscribes to Class[Ntp::Service]
debug: /Stage[main]/Ntp/Anchor[ntp::begin]/before: requires Class[Ntp::Install]
debug: /Stage[main]/basic::Check-mk/Service[xinetd]/subscribe: subscribes to File[/etc/xinetd.d/check_mk]
debug: /Stage[main]/Ntp::Install/before: requires Class[Ntp::Config]
debug: /Stage[main]/Ntp::Service/before: requires Anchor[ntp::end]
debug: /Schedule[daily]: Skipping device resources because running on a host
debug: /Schedule[monthly]: Skipping device resources because running on a host
debug: /Schedule[hourly]: Skipping device resources because running on a host
debug: Prefetching apt resources for package
debug: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n''
debug: Puppet::Type::Package::ProviderApt: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n''
debug: /Schedule[never]: Skipping device resources because running on a host
debug: file_metadata supports formats: b64_zlib_yaml pson raw yaml; using pson
debug: /Stage[main]/basic::Check-mk/File[/etc/xinetd.d/check_mk]/content: Executing 'diff -u /etc/xinetd.d/check_mk /tmp/puppet-file20160525-10084-1vrr8zf-0'
notice: /Stage[main]/basic::Check-mk/File[/etc/xinetd.d/check_mk]/content:
--- /etc/xinetd.d/check_mk 2016-05-25 14:57:26.220873468 +0200
+++ /tmp/puppet-file20160525-10084-1vrr8zf-0 2016-05-25 16:28:06.393363702 +0200
@@ -25,7 +25,7 @@
service check_mk
{
type = UNLISTED
- port = 6556
+ port = 6554
socket_type = stream
protocol = tcp
wait = no
debug: Finishing transaction 70294357735140
info: FileBucket got a duplicate file {md5}cb0264ad1863ee2b3749bd3621cdbdd0
info: /Stage[main]/basic::Check-mk/File[/etc/xinetd.d/check_mk]: Filebucketed /etc/xinetd.d/check_mk to puppet with sum cb0264ad1863ee2b3749bd3621cdbdd0
notice: /Stage[main]/basic::Check-mk/File[/etc/xinetd.d/check_mk]/content: content changed '{md5}cb0264ad1863ee2b3749bd3621cdbdd0' to '{md5}56ac5c1a50c298de4999649b27ef6277'
debug: /Stage[main]/basic::Check-mk/File[/etc/xinetd.d/check_mk]: The container Class[basic::Check-mk] will propagate my refresh event
info: /Stage[main]/basic::Check-mk/File[/etc/xinetd.d/check_mk]: Scheduling refresh of Service[xinetd]
debug: Service[ntp](provider=debian): Executing '/etc/init.d/ntp status'
debug: Service[xinetd](provider=debian): Executing '/etc/init.d/xinetd status'
debug: Service[xinetd](provider=debian): Executing '/etc/init.d/xinetd start'
notice: /Stage[main]/basic::Check-mk/Service[xinetd]/ensure: ensure changed 'stopped' to 'running'
debug: /Stage[main]/basic::Check-mk/Service[xinetd]: The container Class[basic::Check-mk] will propagate my refresh event
debug: Service[xinetd](provider=debian): Executing '/etc/init.d/xinetd status'
debug: /Stage[main]/basic::Check-mk/Service[xinetd]: Skipping restart; service is not running
notice: /Stage[main]/basic::Check-mk/Service[xinetd]: Triggered 'refresh' from 1 events
debug: /Stage[main]/basic::Check-mk/Service[xinetd]: The container Class[basic::Check-mk] will propagate my refresh event
debug: Class[basic::Check-mk]: The container Stage[main] will propagate my refresh event
debug: /Schedule[weekly]: Skipping device resources because running on a host
debug: /Schedule[puppet]: Skipping device resources because running on a host
debug: Finishing transaction 70294346109840
debug: Storing state
debug: Stored state in 0.01 seconds
notice: Finished catalog run in 1.43 seconds
debug: Executing '/etc/puppet/etckeeper-commit-post'
debug: report supports formats: b64_zlib_yaml pson raw yaml; using pson
The following line seems suspicious to me:
debug: /Stage[main]/basic::Check-mk/Service[xinetd]: Skipping restart; service is not running
And service --status-all says [ ? ] xinetd. Why does the system not recognize the state of the service?
Your debug log and the output of your manual service command suggest that your xinetd does not have a working status subcommand. As a result, Puppet does not know how (or whether) to manage its run state.
You could consider fixing the initscript to recognize the status subcommand and make an LSB-compliant response (or at least to exit with code 0 if the service is running and anything else otherwise). Alternatively, you can add a status attribute to the Service resource, giving an alternative command that Puppet can use to ascertain the service's run state. (I have linked to the current docs, but I'm pretty sure that Service has had that attribute since well before Puppet 2.7.)
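For the second option, the status attribute only needs a shell command that exits 0 when the service is running and non-zero otherwise; a minimal candidate (an assumption about your environment, not something Puppet ships) would be:

# Exits 0 only if an xinetd process exists; point the Service resource's
# status attribute at a command like this.
pidof xinetd > /dev/null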
SOLVED: To fix the problem I had to add a status section to the init.d script of xinetd. Afterwards, both service xinetd status and Puppet were able to recognize the status of the service. The added section looks like this:
  status)
    if pidof xinetd > /dev/null
    then
      echo "xinetd is running."
      exit 0
    else
      echo "xinetd is NOT running."
      exit 1
    fi
    ;;
Additionally, I added the status option to the usage line:
  *)
    echo "Usage: /etc/init.d/xinetd {start|stop|reload|force-reload|restart|status}"
    exit 1
    ;;
This solved the problem.
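A quick way to verify the fix (commands as used earlier in this question):

service xinetd status; echo "exit code: $?"   # should print 0 while xinetd is running
puppet agent -t   # on the next config change, the refresh is no longer skipped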
When I run Chef-Client in PowerShell and allow the process to output to the screen using the following command:
& Chef-Client -z -r "chef-cookbook"
I get this output:
[2014-11-10T07:20:40-08:00] WARN: No config file found or specified on command line, using command line options.
Starting Chef Client, version 11.16.4
resolving cookbooks for run list: ["chef-cookbook"]
Synchronizing Cookbooks:
- chef-cookbook
- powershell-automation
Compiling Cookbooks...
Converging 2 resources
Recipe: powershell-automation::Port_Configuration
* powershell_script[Port_Configuration] action run (skipped due to not_if)
Recipe: powershell-automation::IIS_InstallAutomation
* powershell_script[IIS_InstallAutomation] action run (skipped due to not_if)
Running handlers:
Running handlers complete
Chef Client finished, 0/0 resources updated in 43.69728 seconds
When I run the same command, but capture it to a variable, using the following command:
$chefOutput = & Chef-Client -z -r "chef-cookbook"
The $chefOutput variable contains:
[2014-11-10T07:23:01-08:00] WARN: No config file found or specified on command line, using command line options.
[2014-11-10T07:23:01-08:00] INFO: Auto-discovered chef repository at C:/Temp
[2014-11-10T07:23:01-08:00] INFO: Starting chef-zero on host localhost, port 8889 with repository at repository at C:/Temp
One version per cookbook
[2014-11-10T07:23:06-08:00] INFO: *** Chef 11.16.4 ***
[2014-11-10T07:23:06-08:00] INFO: Chef-client pid: 3364
[2014-11-10T07:23:37-08:00] INFO: Setting the run_list to [recipe[chef-cookbook]] from CLI options
[2014-11-10T07:23:37-08:00] INFO: Run List is [recipe[chef-cookbook]]
[2014-11-10T07:23:37-08:00] INFO: Run List expands to [chef-cookbook]
[2014-11-10T07:23:37-08:00] INFO: Starting Chef Run for XXXXX.XXX.XXX.XXX.com
[2014-11-10T07:23:37-08:00] INFO: Running start handlers
[2014-11-10T07:23:37-08:00] INFO: Start handlers complete.
[2014-11-10T07:23:37-08:00] INFO: HTTP Request Returned 404 Not Found : Object not found: /reports/nodes/XXXXX.XX.XX.XX.com/runs
[2014-11-10T07:23:37-08:00] INFO: Loading cookbooks [chef-cookbook#2015.1.0, powershell-automation#2015.1.0]
[2014-11-10T07:23:37-08:00] INFO: Processing powershell_script[Port_Configuration] action run (powershell-automation::Port_Configuration line 22)
[2014-11-10T07:23:37-08:00] INFO: Processing bash[Guard resource] action run (dynamically defined)
[2014-11-10T07:23:38-08:00] INFO: bash[Guard resource] ran successfully
[2014-11-10T07:23:38-08:00] INFO: Processing powershell_script[IIS_InstallAutomation] action run (powershell-automation::IIS_InstallAutomation line 16)
[2014-11-10T07:23:43-08:00] INFO: Chef Run complete in 6.346486 seconds
[2014-11-10T07:23:43-08:00] INFO: Running report handlers
[2014-11-10T07:23:43-08:00] INFO: Report handlers complete
Why does this discrepancy between outputs happen?
Note: I see that the output captured in the variable also contains timestamps and INFO tags on each line. Based on this, I believe this has to do with how Chef writes its output rather than with PowerShell.
Chef checks whether stdout is a TTY. When stdout is an interactive terminal, chef-client prints the human-friendly formatter output you see on screen; when stdout is redirected (for example, captured into a PowerShell variable), it falls back to the timestamped logger-style output you see in $chefOutput.
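To illustrate the general mechanism, here is a shell sketch of the same kind of TTY test (this is not Chef's code):

# test -t 1 succeeds only when file descriptor 1 (stdout) is attached to a terminal.
if [ -t 1 ]; then
  echo "stdout is a TTY: print the human-friendly formatter output"
else
  echo "stdout is not a TTY: fall back to timestamped logger output"
fi

Run interactively, this takes the first branch; piped or captured into a variable, it takes the second, which mirrors what chef-client is doing.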