I have Grafana dashboards where the Pod dropdown shows None within a namespace, even though we have pods running in that namespace and Prometheus is pulling data.
Query:
"datasource": "Prometheus",
"definition": "",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "Pod",
"multi": false,
"name": "pod",
"options": [],
"query": {
"query": "query_result(sum(container_memory_working_set_bytes{namespace=\"$namespace\"}) by (pod_name))",
"refId": "Prometheus-pod-Variable-Query"
},
"refresh": 1,
"regex": "/pod_name=\\\"(.*?)(\\\")/",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
I imported the dashboard JSON from:
https://grafana.com/grafana/dashboards/6879
Edit your dashboard's JSON:
Rename "pod_name" to "pod" in the 2 places (and save), as sketched below.
It looks like this Grafana dashboard was created for an older Kubernetes version, and the metric labels have since changed.
You will probably also need similar edits changing "container_name" to "container" in these older dashboards.
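For example, after the rename, the variable's query and regex from the question would look like this (a minimal sketch; only these two places change):
"query": {
  "query": "query_result(sum(container_memory_working_set_bytes{namespace=\"$namespace\"}) by (pod))",
  "refId": "Prometheus-pod-Variable-Query"
},
"regex": "/pod=\\\"(.*?)(\\\")/",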
This might not be a full answer, but I cannot yet comment.
The linked dashboard imports and works fine for me, so I suspect one of these:
Prometheus scraping is not running (correctly). You could go directly into the Prometheus UI and check whether the container_memory_working_set_bytes metric has any value at all, anywhere.
The kube-system namespace might be restricted with respect to scraping and such. If another namespace works and only this one doesn't, then this is the case.
Is it possible to trigger alerts on the Prometheus dashboard by manually stopping the respective services on the Kubernetes cluster, in order to verify that I'm receiving alerts for issues on the Prometheus dashboard?
I would recommend using tools such as Chaos Toolkit to do this declaratively and automatically instead of doing it manually. More generally, this is called chaos engineering.
{
  "title": "Do we remain available in face of pod going down?",
  "description": "We expect Kubernetes to handle the situation gracefully when a pod goes down",
  "tags": ["kubernetes"],
  "steady-state-hypothesis": {
    "title": "Verifying service remains healthy",
    "probes": [
      {
        "name": "all-our-microservices-should-be-healthy",
        "type": "probe",
        "tolerance": true,
        "provider": {
          "type": "python",
          "module": "chaosk8s.probes",
          "func": "microservice_available_and_healthy",
          "arguments": {
            "name": "myapp"
          }
        }
      }
    ]
  },
  "method": [
    {
      "type": "action",
      "name": "terminate-db-pod",
      "provider": {
        "type": "python",
        "module": "chaosk8s.pod.actions",
        "func": "terminate_pods",
        "arguments": {
          "label_selector": "app=my-app",
          "name_pattern": "my-app-[0-9]$",
          "rand": true
        }
      },
      "pauses": {
        "after": 5
      }
    }
  ]
}
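You can then run this with the Chaos Toolkit CLI, e.g. chaos run experiment.json (assuming the JSON above is saved as experiment.json). The steady-state hypothesis is verified before and after the pod termination, so a failed run tells you the service did not stay healthy, and you can check whether the corresponding Prometheus alerts fired.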
You can use Gremlin to achieve this goal too. First, install the Gremlin agent on your Kubernetes cluster using the helm chart: https://github.com/gremlin/helm/
Next, shut down the specific services using the Kubernetes features within Gremlin. You can control the blast radius by selecting 1 pod/1 service or many pods/services. This is a tutorial I wrote on this topic: https://www.gremlin.com/community/tutorials/how-to-install-and-use-gremlin-with-kubernetes/.
Validating monitoring and alerting is a great use case for chaos engineering. As you said, you can trigger alerts on the Prometheus dashboard by manually stopping the respective services on the Kubernetes cluster; this will enable you to verify alerts for issues on your Prometheus dashboard. This tutorial explains how to use Gremlin webhooks with Grafana and Prometheus: https://www.gremlin.com/community/tutorials/visualize-chaos-experiments-in-grafana-with-gremlin-webhooks/
I am getting a "resource not found in resource group" error while deploying an ARM template. Could someone help, please? Below is the sample template used:
{
  "name": "[variables('AppName')]",
  "type": "Microsoft.Web/sites",
  "apiVersion": "2016-08-01",
  "kind": "app",
  "location": "xx",
  "identity": {
    "type": "SystemAssigned"
  },
  "properties": {
    "httpsOnly": true,
    "clientAffinityEnabled": false,
    "serverFarmId": "xx"
  },
  "resources": [
    {
      "name": "appsettings",
      "type": "config",
      "apiVersion": "2016-08-01",
      "properties": {
        "xx": "xx"
      },
      "dependsOn": [
        "[resourceId('Microsoft.Web/sites', variables('AppName'))]",
        "[resourceId('Microsoft.KeyVault/vaults/secrets', variables('keyVaultName'),'xx')]",
        "[resourceId('Microsoft.KeyVault/vaults/secrets', variables('keyVaultName'),'xx')]",
        "[resourceId('Microsoft.KeyVault/vaults/secrets', variables('keyVaultName'),'xx')]"
      ]
    }
  ]
},
{
  "type": "Microsoft.Web/sites/config",
  "apiVersion": "2016-08-01",
  "name": "[concat(variables('AppName'), '/web')]",
  "location": "xx",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', variables('AppName'))]"
  ]
}
Let me know if this is the right way to do it.
It's hard to tell without the exact template and all the variables/parameters, but generally this error means one of the following:
a wrong name is used somewhere for the resources that depend on the web app
a wrong location is used somewhere for the resources that depend on the web app
dependsOn isn't set up properly, so the deployment doesn't wait for the web app and attempts to create a resource in parallel with it (see the sketch after this list)
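As a minimal sketch of the third point (reusing the AppName variable from the question), a sites/config resource should declare its dependency on the site explicitly so it is never deployed in parallel with it:
{
  "type": "Microsoft.Web/sites/config",
  "apiVersion": "2016-08-01",
  "name": "[concat(variables('AppName'), '/appsettings')]",
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', variables('AppName'))]"
  ],
  "properties": {}
}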
Have you ever deployed successfully with this same ARM template?
Also, kindly check whether you can deploy successfully with a script run locally, without using Azure DevOps. This will help narrow down the issue.
##[error]ResourceNotFound: The Resource 'Microsoft.Web/sites/xx' under resource group 'yy' was not found in deploying ARM template
This error indicates that Resource Manager needs to retrieve the properties of a resource but can't find that resource in your subscriptions.
You could try the following solutions:
Solution 1 - check resource properties
Solution 2 - set dependencies
Solution 3 - get external resource
Solution 4 - get managed identity from resource
Solution 5 - check functions
For more details, please take a look at the official doc: Resolve resource not found errors.
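As a small hedged sketch of solutions 2 and 3: the depending resource declares an explicit dependsOn, and a resource that lives in a different resource group is referenced with a fully qualified resourceId (the resource group and plan name here are hypothetical placeholders):
{
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', variables('AppName'))]"
  ],
  "properties": {
    "serverFarmId": "[resourceId('otherResourceGroup', 'Microsoft.Web/serverfarms', 'existingPlanName')]"
  }
}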
I am trying to deploy a data factory using an ARM template. It is easy to use the exported template to create a deployment pipeline.
However, as the data factory needs to access an on-premises database server, I need a self-hosted integration runtime. The problem is: how can I include the runtime in the ARM template?
The template looks like this and we can see that it is trying to include the runtime:
{
  "name": "[concat(parameters('factoryName'), '/OnPremisesSqlServer')]",
  "type": "Microsoft.DataFactory/factories/linkedServices",
  "apiVersion": "2018-06-01",
  "properties": {
    "annotations": [],
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "[parameters('OnPremisesSqlServer_connectionString')]"
    },
    "connectVia": {
      "referenceName": "OnPremisesSqlServer",
      "type": "IntegrationRuntimeReference"
    }
  },
  "dependsOn": [
    "[concat(variables('factoryId'), '/integrationRuntimes/OnPremisesSqlServer')]"
  ]
},
{
  "name": "[concat(parameters('factoryName'), '/OnPremisesSqlServer')]",
  "type": "Microsoft.DataFactory/factories/integrationRuntimes",
  "apiVersion": "2018-06-01",
  "properties": {
    "type": "SelfHosted",
    "typeProperties": {}
  },
  "dependsOn": []
}
Running this template fails on the linked service resource:
"connectVia": {
  "referenceName": "OnPremisesSqlServer",
  "type": "IntegrationRuntimeReference"
}
and the error is: Failed to encrypted linked service credentials on self-hosted IR 'OnPremisesSqlServer', reason is: NotFound, error message is: No online instance..
The problem is that I need to type a key into the integration runtime's UI so it can be registered in Azure. But I can only get that key from my data factory instance's UI. So the ARM template deployment above will always fail at least once. I am wondering if there is a way to create the runtime independently?
It seems that you already know how to create a Self-Hosted IR in the ADF ARM template:
{
  "name": "[concat(parameters('dataFactoryName'), '/integrationRuntime1')]",
  "type": "Microsoft.DataFactory/factories/integrationRuntimes",
  "apiVersion": "2018-06-01",
  "properties": {
    "additionalProperties": {},
    "description": "jaygongIR1",
    "type": "SelfHosted"
  }
}
Your only concern is that the Windows IR tool needs to be configured with an AUTHENTICATION KEY before it can access the ADF self-hosted IR node, so the node will show an Unavailable status right after it is created. I think this flow makes sense: the authentication key has to be created first, and then you can use it to configure the on-premises tool. You can't implement everything in one step, because these operations happen on both the Azure side and the on-premises side.
Based on the Self-Hosted IR tool document, the Register step can't be implemented with PowerShell code. So the steps that can be automated in this flow are creating the IR and getting the authentication key, not registering the node in the tool.
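As a hedged sketch (assuming the integrationRuntime1 name from the snippet above), you could at least surface the authentication key as a deployment output via the listAuthKeys function, instead of copying it from the portal UI:
{
  "outputs": {
    "irAuthKey": {
      "type": "string",
      "value": "[listAuthKeys(resourceId('Microsoft.DataFactory/factories/integrationRuntimes', parameters('dataFactoryName'), 'integrationRuntime1'), '2018-06-01').authKey1]"
    }
  }
}
Registering the on-premises node with that key remains a manual step (or a script run on the on-premises machine itself).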
Grafana is showing the wrong disk-use percentage in the graph. Currently my GlusterFS disk usage is 8%, but the graph is showing 7%.
Below are the metrics I am currently using.
{
  "hide": true,
  "target": "sumSeries(collectd.gls--01.df-gluster.df_complex-used)",
  "refId": "A"
},
{
  "hide": true,
  "target": "sumSeries(collectd.gls--01.df-gluster.df_complex-{free,used})",
  "refId": "B"
},
{
  "hide": false,
  "target": "asPercent(#A, #B)",
  "refId": "C"
}
Also, I am unable to see the percent_bytes-used metric in the collectd directory.
Depending on the ReportReserved setting in your collectd configuration, you might need to account for reserved disk space. If it is true (the default on collectd > 4), you will have to change your second metric to 'df_complex-{free,used,reserved}'.
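For example, the B series from the question would become (a sketch assuming the same collectd.gls--01.df-gluster prefix):
{
  "hide": true,
  "target": "sumSeries(collectd.gls--01.df-gluster.df_complex-{free,used,reserved})",
  "refId": "B"
}
That way the denominator in asPercent(#A, #B) includes the reserved blocks that df counts against the filesystem, which would account for the 7% vs 8% discrepancy.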
I am unable to save the JSON file. If I make changes to the file, after some time it reverts to the same values.
I am trying to create all my Azure resources from a PowerShell script. All resources are getting created, but it is also throwing this exception.
A CNAME record pointing from mytmp.trafficmanager.net to mywebapp.azurewebsites.net was not found
But I can see that a Traffic Manager endpoint has been configured properly. What am I missing here, any ideas?
PS Code:
{
  "comments": "Generalized from resource: '/subscriptions/<subid>/resourceGroups/<rgid>/providers/Microsoft.Web/sites/<web_app_name>/hostNameBindings/<traffic_manager_dns>'.",
  "type": "Microsoft.Web/sites/hostNameBindings",
  "name": "[concat(parameters('<web_app_name>'), '/', parameters('hostNameBindings_<traffic_manager_dns>_name'))]",
  "apiVersion": "2016-08-01",
  "location": "South Central US",
  "scale": null,
  "properties": {
    "siteName": "<web_app_name>",
    "domainId": null,
    "hostNameType": "Verified"
  },
  "dependsOn": [
    "[resourceId('Microsoft.Web/sites', parameters('sites_<web_app_name>_name'))]"
  ]
}
The code above is actually what throws that exception. When I commented out this code block, everything was fine. But I want to understand the reason for the error.
A CNAME record pointing from mytmp.trafficmanager.net to mywebapp.azurewebsites.net was not found
It indicates the DNS record was not created when deploying the template. You need to prove that you are the owner of the hostname. You could also test it manually from the Azure portal.
Before deploying the template, you need to create a CNAME record with your DNS provider. For more information, refer to Map a CNAME record.
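If your zone happens to be hosted in Azure DNS, a minimal sketch of such a CNAME record as an ARM resource might look like this (the dnsZoneName parameter, the mytmp record name, and the target are hypothetical placeholders based on the hostnames in the error):
{
  "type": "Microsoft.Network/dnsZones/CNAME",
  "apiVersion": "2018-05-01",
  "name": "[concat(parameters('dnsZoneName'), '/mytmp')]",
  "properties": {
    "TTL": 3600,
    "CNAMERecord": {
      "cname": "mywebapp.azurewebsites.net"
    }
  }
}
Deploying the record (and letting it propagate) before the hostNameBindings resource should allow the ownership check to pass.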