MongoDB MMS monitoring agent does not find group members

I have installed the latest MongoDB MMS agent (6.5.0.456) on Ubuntu 16.04 and initialised the replica set, so I am running a single-node replica set with the monitoring agent enabled. The agent works fine; however, it does not seem to actually find the replica set member:
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:170] Received new configuration: Primary agent, Assigned 0 out of 0 plus 0 chunk monitor(s)
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:182] Nothing to do. Either the server detected the possibility of another monitoring agent running, or no Hosts are configured on the Group.
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Run:199] Done. Sleeping for 55s...
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:746] Performing discovery with 0 hosts
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:803] Received discovery responses from 0/0 requests after 891ns
I can see two processes for the monitoring agent:
/bin/sh -c /usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config >> /var/log/mongodb-mms/monitoring-agent.log 2>&1
/usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config
However, if I terminate one, it also tears down the other, so I do not think that is the problem.
So the question is: what is the Group that the agent is referring to? Where is that configured? Or how do I find out which Group the agent refers to, and how do I check whether that Group is configured correctly?
The rs.config() output looks fine: it shows one replica set member with a host field, and I can use that host value to connect to the instance with the mongo shell. No auth is configured.
EDIT
It looks as if Cloud Manager now needs to be configured with a seed host; it then starts to discover all the other nodes in the replica set. This seems to differ from pre-Cloud-Manager days, when the agent was able to track the replica set on its own, if I remember correctly... There is probably still an easier way to get this done, so I am leaving this question open for now...

So the question is: what is the Group that the agent is referring to? Where is that configured? Or how do I find out which Group the agent refers to, and how do I check whether that Group is configured correctly?
Configuration values for the Cloud Manager agent (such as mmsGroupId and mmsApiKey) are set in the config file, which is /etc/mongodb-mms/monitoring-agent.config by default. The agent needs this information in order to communicate with the Cloud Manager servers.
For more details, see Install or Update the Monitoring Agent and Monitoring Agent Configuration in the Cloud Manager documentation.
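For reference, a minimal sketch of what /etc/mongodb-mms/monitoring-agent.config typically contains (the group ID and API key are placeholders that you copy from the Cloud Manager UI; exact keys may vary slightly by agent version):

mmsGroupId=<your-project-or-group-id>
mmsApiKey=<your-agent-api-key>
mmsBaseUrl=https://cloud.mongodb.com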
It looks as if Cloud Manager now needs to be configured with a seed host; it then starts to discover all the other nodes in the replica set.
Unless a MongoDB process is already managed by Cloud Manager automation, I believe it has always been the case that you need to add an existing MongoDB process to monitoring to start the process of initial topology discovery. Once a deployment is monitored, any changes in deployment membership should automatically be discovered by the Cloud Manager agent.
Production deployments should have authentication and access control enabled, so in addition to adding a seed hostname and port via the Cloud Manager UI you usually need to provide appropriate credentials.

Related

Rundeck ansible inventory: static instead of dynamic

I deployed Rundeck (rundeck/rundeck:4.2.0), importing and discovering my inventory using the Ansible Resource Model Source. I have 300 nodes, of which statistically ~150 are accessible/online at any time; the rest are offline (IoT devices). All of this works fine.
My challenge is that when creating jobs I can assign only those nodes which are online, whereas I want to assign ALL nodes (including the offline ones) and keep retrying the job for the failed ones only. Only this way can I track the completeness of my deployment. Ideally I would love Rundeck to be intelligent enough to automatically run the job as soon as a node comes back online.
Any ideas/hints on how to achieve that?
Thanks,
The easiest way is to use the health checks feature (only available in PagerDuty Process Automation On-Prem, formerly "Rundeck Enterprise"); that way you can use a node filter that matches only "healthy" (up) nodes.
Using this approach (e.g. configuring a command health check against all nodes), you can dispatch your jobs only to "up" nodes out of the global set of nodes. This is possible by using .* as the node filter and !healthcheck:status: HEALTHY as the exclude node filter. If any "offline" node turns on, the filter/exclude filter picks it up automatically.
For the Ansible/Rundeck integration, this works with the environment variable ANSIBLE_HOST_KEY_CHECKING=False, or with host_key_checking=false in the ansible.cfg file (in the [defaults] section), as shown below.
That way you can see all Ansible hosts among your Rundeck nodes, and your commands/jobs are dispatched only to online nodes; if any "offline" node changes its status, the filter picks it up.
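For reference, the equivalent ansible.cfg entry is:

[defaults]
host_key_checking = false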

Dynamic port mapping for ECS tasks

I want to run a socket program in AWS ECS with the client and server in one task definition. I am able to run it when I use awsvpc network mode and connect to the server on localhost every time. This is good because I don't need to know the IP address of the server. The issue is that the server has to start on some port, and if I run 10 of these tasks, only 3 tasks (= the number of running instances) run at a time. This is clearly because 10 tasks cannot open the same port. I could manually check for open ports before starting the server and somehow write the port to a Docker shared volume that the client reads to connect, but this seems complicated and leaves unnecessary code in my server. For services there is dynamic port mapping via an Application Load Balancer, but there isn't anything for simply running tasks.
How can I run multiple socket programs in AWS ECS without having to manage the port numbers myself?
If you're using awsvpc mode, each task gets its own ENI, so there shouldn't be any port conflict. But each instance type has a limited number of ENIs available. You can increase that by enabling ENI trunking, which, however, is supported by only a handful of instance types:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types
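If you do go the ENI trunking route, note that it is an account-level opt-in; a rough sketch using the AWS CLI (assuming your credentials are allowed to change ECS account settings):

# Opt the current IAM user/role into awsvpc ENI trunking for ECS
aws ecs put-account-setting --name awsvpcTrunking --value enabled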

Can "spring.cloud.consul.host" config value have multiple Consul agents?

I'm a bit confused with this configuration. My Spring Boot app with @EnableDiscoveryClient has spring.cloud.consul.host set to localhost. I'm running a Consul agent on the host where my Boot app is running, but I have a few questions (I can't seem to find the answers in the documentation).
Can this config accept multiple values?
If so, I'd prefer to set the value to a list of Consul server addresses (but then what would be the point of running Consul agents at all, so this doesn't seem practical, which means I'm not understanding something here).
If not, are we expected to run a Consul agent on every node where a Boot app with @EnableDiscoveryClient is running? (This feels wrong as well; for one, that local agent would seem to be a single point of failure, even though one agent should be able to tell me everything about the cluster. What if I can't contact this one agent?)
What's the best practice for this configuration?
Actually, Consul itself is designed to solve your problem. An agent runs on every server to handle clustering, failures, data sharing, autodiscovery, etc. for you, so you don't need to know the other hosts in your Spring Boot configuration. The Spring Boot app always connects to the agent running on the same machine.
See https://www.consul.io/docs/agent/basics.html
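A minimal sketch of the corresponding Spring Boot configuration (application.properties), assuming the local agent listens on Consul's default HTTP port 8500:

spring.cloud.consul.host=localhost
spring.cloud.consul.port=8500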

How do I deploy an entire environment (group of servers) using Chef?

I have an environment (Graphite) that looks like the following:
N worker servers
1 relay server that forwards work to these worker servers
1 web server that can query the relay server.
I would like to use Chef to set up and deploy this environment in EC2 without having to create each worker server individually, get their IPs and set them as attributes in the relay cookbook, create the relay, get its IP, set it as an attribute in the web server cookbook, and so on.
Is there a way using chef in which I can make sure that the environment is properly deployed, configured and running without having to set the IPs manually? Particularly, I would like to be able to add a worker server and have the relay update its worker list, or swap the relay server for another one and have the web server update its reference accordingly.
Perhaps this is not what Chef is intended for and it is more for per-server configuration and deployment; if that is the case, what technology would facilitate this?
Things you will need are:
knife-ec2 - This is used to start/stop Amazon EC2 instances (see the sketch after this list).
chef-server - To be able to use search in your recipes. It should also be accessible from your EC2 instances.
search - With this you will be able to find, among the nodes provisioned by Chef, exactly the ones you need, using different queries.
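As a rough sketch only (the AMI, flavor and node name are placeholders, and the exact options depend on your knife-ec2 version), launching one worker instance and assigning it a role could look like:

knife ec2 server create --image ami-xxxxxxxx --flavor t2.micro --node-name worker-1 --run-list 'role[lr-node]'

Because each instance registers itself with the Chef server, recipes on other nodes can then find it via search, as the load balancer example below does.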
I recently wrote an article, How to Run Dynamic Cloud Tests with 800 Tomcats, Amazon EC2, Jenkins and LiveRebel. It involves installing a load balancer, and the load balancer must know the IP addresses of all the servers it balances. You can check out the recipe for a balanced node and how it looks up the load balancer:
search(:node, "roles:lr-loadbalancer").first
And check out the load balancer recipe, which looks up all the balanced nodes and updates the Apache config file:
lr_nodes = search(:node, "role:lr-node")

template ::File.join(node[:apache2][:home], 'conf.d', 'httpd-proxy-balancer.conf') do
  mode 0644
  variables(:lr_nodes => lr_nodes)
  notifies :restart, 'service[apache2]'
end
Perhaps you are looking for this?
http://www.infochimps.com/platform/ironfan

Condor central manager could not see the other computing nodes

I connected three servers to form an HPC cluster, using Condor as the middleware. When I run the command condor_status from the central manager, it does not show the other nodes. I can run jobs on the central manager and connect to the other nodes via SSH, but it seems something is missing in the Condor configuration files, where I set the central manager as the Condor host and allow reading and writing for everyone. I keep the daemons MASTER and STARTD in the daemon list for the worker nodes.
When I run condor_status on the central manager, it shows only the central manager, and when I run it on a compute node, it gives me the error "CEDAR:6001:Failed to connect to" followed by the central manager's IP and port number.
I managed to solve it. The problem was the central manager's firewall (in my case iptables), which was running.
So when I stopped the firewall (su -c "service iptables stop"), all nodes appeared normally when typing condor_status.
The firewall status can be checked using "service iptables status".
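If you would rather not leave the firewall disabled, a less drastic sketch (assuming a RHEL/CentOS-style iptables service and Condor's default collector port 9618) is to open only the Condor port on the central manager:

# Allow inbound connections to the Condor collector (default port 9618)
iptables -I INPUT -p tcp --dport 9618 -j ACCEPT
service iptables save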
There are a number of things that could be going on here. I'd suggest you follow this tutorial and see if it resolves your problems -
http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
In my case the "condor.exe" service was not running on the server; I had stopped it manually. I just started it again and everything went fine.