Ganglia: "No matching metrics detected"

We are getting the error "No matching metrics detected". Cluster-level metrics are visible.
ganglia core 3.6.0
ganglia web 3.5.12
Please help to resolve this issue.
Regards,
Jayendra

Somewhere, in a .conf file (or .pyconf, etc.), you must specify a 'collection_group' with a list of the metrics you want to collect. From the default gmond.conf, it should look similar to this:
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
}
You may use wildcards to match the name.
You'll also need to include the module that provides the metrics you are looking to collect. Again, the example gmond.conf contains something like this:
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "modcpu.so"
  }
}
among others.
You can generate an example gmond.conf by typing
gmond -t > /usr/local/etc/gmond.conf
This path is correct for ganglia-3.6.0; I know that many file paths have changed several times since 3.0...
A good reference book is 'Monitoring with Ganglia.' I'd recommend getting a copy if you're going to be deeply involved in configuring or maintaining a Ganglia installation.

When summary/cluster graphs are visible but individual host graph data is not, this might be caused by a mismatch in hostname case (between the reported hostname and the RRD graph directory names).
Check /var/lib/ganglia/rrds/CLUSTER-NAME/HOSTNAME
This shows you what case the host graphs are being generated under.
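If you'd rather script that check, here is a minimal Python sketch. It assumes the default RRD location shown above, uses CLUSTER-NAME as a placeholder for your cluster's directory, and guesses the reported hostname with socket.getfqdn(), which may differ from what gmond actually reports:

import os
import socket

# Default RRD location from above; CLUSTER-NAME is a placeholder for your cluster directory.
rrd_dir = "/var/lib/ganglia/rrds/CLUSTER-NAME"
# Best guess at the hostname this machine reports; gmond's value may differ.
reported = socket.getfqdn()

for entry in os.listdir(rrd_dir):
    if entry.lower() == reported.lower() and entry != reported:
        print("Case mismatch: RRD directory %r vs reported hostname %r" % (entry, reported))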
If the case does not match the reported hostname, edit /etc/ganglia/conf.php (which allows overriding the defaults in /usr/share/ganglia/conf_default.php).
Add the following line:
$conf['case_sensitive_hostnames'] = false;
Another place to check for case sensitivity is the gmetad settings at /etc/ganglia/gmetad:
case_sensitive_hostnames 0
Versions This Was Fixed On:
OS: CentOS 6
Ganglia Core: 3.7.2-2
Ganglia Web: 3.7.1-2
Installed via EPEL

Related

How to get the internal IP of a PostgreSQL DB in GCP created by Terraform

I am learning Terraform deployments coupled with GCP to streamline deployments.
I have successfully deployed a PostgreSQL DB.
Now I am trying to use Terraform outputs to write the private IP generated by the PostgreSQL DB server to the directory where Terraform is initiated from.
What is not clear to me is:
(1) Is the output defined within the same main.tf file?
(2) Where are the output parameters referenced from? I cannot find documentation that lines up properly, so I keep getting this error upon applying: Error: Reference to undeclared resource
My main.tf looks like this:
resource "google_sql_database_instance" "main" {
name = "db"
database_version = "POSTGRES_12"
region = "us-west1"
settings {
availability_type = "REGIONAL"
tier = "db-custom-2-8192"
disk_size = "10"
disk_type = "PD_SSD"
disk_autoresize = "true"
}
}
output "instance_ip_addr" {
value = google_sql_database_instance.private_network.id
description = "The private IP address of the main server instance."
}
As for code style, usually there would be a separate file called outputs.tf where you would add all the values you want to have output after a successful apply. The second part of the question is two-fold:
You have to understand how references to resource attributes/arguments work [1][2]
You have to reference the correct logical ID of the resource, i.e., the name you assigned to it, followed by the argument/attribute [3]
So, in your case that would be:
output "instance_ip_addr" {
value = google_sql_database_instance.main.private_ip_address # <RESOURCE TYPE>.<NAME>.<ATTRIBUTE>
description = "The private IP address of the main server instance."
}
[1] https://www.terraform.io/language/expressions/references#references-to-resource-attributes
[2] https://www.terraform.io/language/resources/behavior#accessing-resource-attributes
[3] https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/sql_database_instance#attributes-reference
To reference an attribute of a resource, you should put something like:
[resource type].[resource name].[attribute]
In this case, the output should be:
output "instance_ip_addr" {
value = google_sql_database_instance.main.private_ip_address
description = "The private IP address of the main server instance."
}
The output attributes are listed in the documentation. It's fine to put that in main.tf.
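To also write the value to a file in the directory Terraform is initiated from (the other part of the question), one option is to read the output back with the terraform output command after apply. A minimal Python sketch, assuming a Terraform version that supports the -raw flag (0.14+); the output file name is illustrative:

import subprocess

# Read a single output value as a plain string (terraform output -raw needs Terraform 0.14+).
ip = subprocess.run(
    ["terraform", "output", "-raw", "instance_ip_addr"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Write it into the current working directory; the file name is just an example.
with open("instance_ip_addr.txt", "w") as f:
    f.write(ip + "\n")

Run it from the same working directory as main.tf so the CLI reads the right state.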

How to get YARN "Memory Total" and "VCores Total" metrics programmatically in pyspark

I've been looking through this:
https://docs.actian.com/vectorhadoop/5.0/index.html#page/User/YARN_Configuration_Settings.htm
but none of those configs are what I need.
"yarn.nodemanager.resource.memory-mb" was promising, but it's only for the node manager it seems and only gets master's mem and cpu, not the cluster's.
int(hl.spark_context()._jsc.hadoopConfiguration().get('yarn.nodemanager.resource.memory-mb'))
You can access those metrics from the YARN ResourceManager REST API.
URL: http://rm-http-address:port/ws/v1/cluster/metrics
metrics:
totalMB
totalVirtualCores
Example response (can also be XML):
{ "clusterMetrics": {
"appsSubmitted":0,
"appsCompleted":0,
"appsPending":0,
"appsRunning":0,
"appsFailed":0,
"appsKilled":0,
"reservedMB":0,
"availableMB":17408,
"allocatedMB":0,
"reservedVirtualCores":0,
"availableVirtualCores":7,
"allocatedVirtualCores":1,
"containersAllocated":0,
"containersReserved":0,
"containersPending":0,
"totalMB":17408,
"totalVirtualCores":8,
"totalNodes":1,
"lostNodes":0,
"unhealthyNodes":0,
"decommissioningNodes":0,
"decommissionedNodes":0,
"rebootedNodes":0,
"activeNodes":1,
"shutdownNodes":0 } }
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API
All you need is to figure out your ResourceManager web address and port; check your configuration files. I can't help you with that since I don't know where you manage YARN.
When you have the URL, access it with Python:
import requests

url = 'http://rm-http-address:port/ws/v1/cluster/metrics'
response = requests.get(url)
# Parse the response JSON and pull out the relevant metrics
metrics = response.json()['clusterMetrics']
total_mb = metrics['totalMB']
total_vcores = metrics['totalVirtualCores']
Of course, no Hadoop or Spark context is needed for this solution.

JPAM Configuration for Apache Drill

I'm trying to configure PLAIN authentication based on JPAM 1.1 and am going crazy since it doesn't work after checking my syntax and settings x times. When I start Drill with cluster-id and zk-connect only, it works, but with both options for PLAIN authentication it fails. Since I started with pam4j and tried JPAM later on, I kept JPAM for this post. In general I don't have any preferences; I just want to get it done. I'm running Drill on CentOS in embedded mode.
I've done everything required according to the official documentation:
I downloaded JPAM 1.1, uncompressed it and put libjpam.so into a specific folder (/opt/pamfile/)
I've edited drill-env.sh with:
export DRILLBIT_JAVA_OPTS="-Djava.library.path=/opt/pamfile/"
I edited drill-override.conf with:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "local",
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  },
  security: {
    auth.mechanisms: ["PLAIN"],
  },
  security.user.auth: {
    enabled: true,
    packages += "org.apache.drill.exec.rpc.user.security",
    impl: "pam",
    pam_profiles: [ "sudo", "login" ]
  }
}
It throws the following error:
Error: Failure in starting embedded Drillbit: org.apache.drill.exec.exception.DrillbitStartupException: Problem in finding the native library of JPAM (Pluggable Authenticator Module API). Make sure to set Drillbit JVM option 'java.library.path' to point to the directory where the native JPAM exists.:no jpam in java.library.path (state=,code=0)
I've run that *.sh file by hand to make sure that the necessary path is exported, since I don't know if Drill expects that. The path to libjpam should be known. I've started SQLLine with sudo et cetera. No chance. The documentation doesn't help. I don't get why it's so bad and, in my opinion, incomplete. Sadly there is zero explanation of how to troubleshoot or configure basic user authentication in detail.
Or do I have to do something which is not stated but expected? Are there any prerequisites concerning PLAIN authentication which aren't mentioned by Apache Drill itself?
Try changing:
export DRILLBIT_JAVA_OPTS="-Djava.library.path=/opt/pamfile/"
to:
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Djava.library.path=/opt/pamfile/"
It works for me. (In embedded mode Drill appears to be launched through SQLLine, which picks up DRILL_JAVA_OPTS, while DRILLBIT_JAVA_OPTS only affects the Drillbit service; that would explain why the library path was never applied.)

Splunk Kafka Add-on doesn't read chef managed configuration files

We are using Chef to manage our infrastructure, and I'm running into an issue where the Splunk TA (Add-on for Kafka) simply refuses to acknowledge that I've dropped a kafka_credentials.conf file in the local directory of the plugin. If I use the Web UI, it generates an entry properly and it shows up in the add-on configuration.
[root@ip-10-14-1-42 local]# ls
app.conf inputs.conf kafka.conf kafka_credentials.conf
[root@ip-10-14-1-42 local]# grep -nr "" *.conf
app.conf:1:# MANAGED BY CHEF. PLEASE DO NOT MODIFY!
app.conf:2:[install]
app.conf:3:is_configured = 1
inputs.conf:1:# MANAGED BY CHEF. PLEASE DO NOT MODIFY!
inputs.conf:2:[kafka_mod]
inputs.conf:3:interval = 60
inputs.conf:4:start_by_shell = false
inputs.conf:5:
inputs.conf:6:[kafka_mod://my_app]
inputs.conf:7:kafka_cluster = default
inputs.conf:8:kafka_topic = log-my_app
inputs.conf:9:kafka_topic_group = my_app
inputs.conf:10:kafka_partition_offset = earliest
inputs.conf:11:index = main
kafka.conf:1:# MANAGED BY CHEF. PLEASE DO NOT MODIFY!
kafka.conf:2:[global_settings]
kafka.conf:3:log_level = INFO
kafka.conf:4:index = main
kafka.conf:5:use_kv_store = 0
kafka.conf:6:use_multiprocess_consumer = 1
kafka.conf:7:fetch_message_max_bytes = 1048576
kafka_credentials.conf:1:# MANAGED BY CHEF. PLEASE DO NOT MODIFY!
kafka_credentials.conf:2:[default]
kafka_credentials.conf:3:kafka_brokers = 10.14.2.164:9092,10.14.2.194:9092
kafka_credentials.conf:4:kafka_partition_offset = earliest
kafka_credentials.conf:5:index = main
Upon restarting Splunk, the add-on is installed and the input is even created under the Inputs section, but the cluster itself is "not available", and when examining the logs I see this:
2017-08-09 01:40:25,442 INFO pid=29212 tid=MainThread file=kafka_mod.py:main:168 | Start Kafka
2017-08-09 01:40:30,508 INFO pid=29212 tid=MainThread file=kafka_config.py:_get_kafka_clusters:228 | Clusters: {}
2017-08-09 01:40:30,509 INFO pid=29212 tid=MainThread file=kafka_config.py:__init__:188 | No Kafka cluster are configured
It seems like this plugin only respects clusters created through the Web UI. That is not going to work, as we want to be able to fully configure this through Chef. Short of hacking the REST API, or fudging around with the .py files in the add-on directory and forcing a dictionary in, what are my options?
Wondering if anyone has encountered this before.
If I had to guess, it is silently rejecting the files because # is not traditionally used for comments in INI files. Try a ; instead.
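If you want to script that change rather than editing by hand, here is a minimal Python sketch that rewrites leading '#' comment markers as ';' in the add-on's local .conf files. The directory path is an assumption (adjust it to your install), and the lasting fix belongs in the Chef template that generates these files:

import glob

# Hypothetical location of the add-on's local config; adjust for your deployment.
conf_dir = "/opt/splunk/etc/apps/TA-kafka/local"

for path in glob.glob(conf_dir + "/*.conf"):
    with open(path) as f:
        lines = f.readlines()
    # Replace a leading '#' comment marker with ';' and leave every other line alone.
    fixed = [";" + line[1:] if line.startswith("#") else line for line in lines]
    with open(path, "w") as f:
        f.writelines(fixed)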

Static resource reload with akka-http

In short: is it possible to reload static resources using akka-http?
A bit more:
I have a Scala project.
I'm using an App object to launch my Main class.
I'm using getFromResourceDirectory to locate my resource folder.
What I would like to have is to hot-swap my static resources during development.
For example, I have index.html or application.js, which I change, and I want to see the changes after I refresh my browser without restarting my server. What is the best practice for doing such a thing?
I know that Play! allows that, but I don't want to base my project on Play! only because of that.
Two options:
Easiest: when running locally, use the getFromDirectory directive instead and point it at the path where the files you want to 'hotload' live. It serves them directly from the file system, so every time you change a file and load it through Akka HTTP you get the latest version.
getFromResourceDirectory loads files from the classpath; the resources are available because sbt copies them into the class directory under target every time you build (copyResources). You could configure sbt using unmanagedClasspath to make it include the static resource directory in the classpath. If you want to package the resources in the artifact when running package, however, this would require some more sbt trickery (if you just put src/resources in unmanagedClasspath, it will depend on classpath ordering whether the copied or the modified files are used).
I couldn't get it to work by adding to unmanagedClasspath, so I used getFromDirectory instead. You can use getFromDirectory as a fallback if getFromResourceDirectory fails, like this:
val route =
  pathSingleSlash {
    getFromResource("static/index.html") ~
      getFromFile("../website/static/index.html")
  } ~
  getFromResourceDirectory("static") ~
  getFromDirectory("../website/static")
First it tries to look up the file in the static resource directory and, if that fails, checks whether ../website/static has the file.
The code below tries to find the file in the directory "staticContentDir". If the file is found, it is sent back to the client. If it is not found, it tries fetching the file from the directory "site" on the classpath.
The user URL is: http://server:port/site/path/to/file.ext
/site/ comes from "staticPath".
val staticContentDir = calculateStaticPath()
val staticPath = "site"
val routes = pathPrefix(staticPath) {
  entity(as[HttpRequest]) { requestData =>
    val fullPath = requestData.uri.path
    encodeResponse {
      if (Files.exists(staticContentDir.resolve(fullPath.toString().replaceFirst(s"/$staticPath/", "")))) {
        getFromBrowseableDirectory(staticContentDir.toString)
      } else {
        getFromResourceDirectory("site")
      }
    }
  }
}
I hope it is clear.