I tried to use Grafana Tempo for distributed tracing.
I launch it from docker-compose:
version: "3.9"
services:
  # MY MICROSERVICES
  ...
  prometheus:
    image: prom/prometheus
    ports:
      - ${PROMETHEUS_EXTERNAL_PORT}:9090
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:cached
  promtail:
    image: grafana/promtail
    volumes:
      - ./log:/var/log
      - ./promtail/:/mnt/config
    command: -config.file=/mnt/config/promtail-config.yaml
  loki:
    image: grafana/loki
    command: -config.file=/etc/loki/local-config.yaml
  tempo:
    image: grafana/tempo
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo/tempo-local.yaml:/etc/tempo.yaml
      # - ./tempo-data/:/tmp/tempo
    ports:
      - "14268"  # jaeger ingest
      - "3200"   # tempo
      - "55680"  # otlp grpc
      - "55681"  # otlp http
      - "9411"   # zipkin
  tempo-query:
    image: grafana/tempo-query
    command: [ "--grpc-storage-plugin.configuration-file=/etc/tempo-query.yaml" ]
    volumes:
      - ./tempo/tempo-query.yaml:/etc/tempo-query.yaml
    ports:
      - "16686:16686"  # jaeger-ui
    depends_on:
      - tempo
  grafana:
    image: grafana/grafana
    volumes:
      - ./grafana/datasource-config/:/etc/grafana/provisioning/datasources:cached
      - ./grafana/dashboards/prometheus.json:/var/lib/grafana/dashboards/prometheus.json:cached
      - ./grafana/dashboards/loki.json:/var/lib/grafana/dashboards/loki.json:cached
      - ./grafana/dashboards-config/:/etc/grafana/provisioning/dashboards:cached
    ports:
      - ${GRAFANA_EXTERNAL_PORT}:3000
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
    depends_on:
      - prometheus
      - loki
with tempo-local.yaml:
server:
  http_listen_port: 3200
distributor:
  # this configuration will listen on all ports and protocols that tempo is capable of.
  # the receivers all come from the OpenTelemetry collector. more configuration information can
  # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
  #
  # for a production deployment you should only enable the receivers you need!
  receivers:
    jaeger:
      protocols:
        thrift_http:
        grpc:
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:
ingester:
  trace_idle_period: 10s               # the length of time after a trace has not received spans to consider it complete and flush it
  max_block_bytes: 1_000_000           # cut the head block when it hits this size or ...
  max_block_duration: 5m               # this much time passes
compactor:
  compaction:
    compaction_window: 1h              # blocks in this time window will be compacted together
    max_block_bytes: 100_000_000       # maximum size of compacted blocks
    block_retention: 1h
    compacted_block_retention: 10m
storage:
  trace:
    backend: local                     # backend configuration to use
    block:
      bloom_filter_false_positive: .05 # bloom filter false positive rate. lower values create larger filters but fewer false positives
      index_downsample_bytes: 1000     # number of bytes per index record
      encoding: zstd                   # block encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
    wal:
      path: /tmp/tempo/wal             # where to store the wal locally
      encoding: snappy                 # wal encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                 # worker pool determines the number of parallel requests to the object store backend
      queue_depth: 10000
tempo-query.yaml:
backend: "tempo:3200"
and datasource.yml for provisioning the data sources in Grafana:
apiVersion: 1
deleteDatasources:
  - name: Prometheus
  - name: Tempo
  - name: Loki
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo-query:16686
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
  - name: Tempo-Multitenant
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo-query:16686
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo-authed
    jsonData:
      httpHeaderName1: 'Authorization'
    secureJsonData:
      httpHeaderValue1: 'Bearer foo-bar-baz'
  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: \[.+,(.+),.+\]
          name: TraceID
          url: $${__value.raw}
But when I test the data source in Grafana, I get this error:
In the Loki view I can find the Tempo button for viewing traces, but I can't open the trace in Tempo because I get an error:
However, if I take the trace ID and search for it in Jaeger, I can see it correctly.
What am I missing in the Tempo configuration? How do I configure it correctly?
Grafana 7.5 and later can talk to Tempo natively and no longer needs the tempo-query proxy. I think this explains what is happening: Grafana is attempting to use the Tempo-native API against tempo-query, which exposes the Jaeger API instead. Try changing the URL of the Tempo data source in datasource.yml to http://tempo:3200.
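As a minimal sketch of that change, assuming the compose service is still named tempo and serves its HTTP API on port 3200 (as in the tempo-local.yaml from the question), the Tempo entry in datasource.yml might look like this:

```yaml
# datasource.yml (excerpt): only the Tempo entry is shown, other entries stay as they are
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    # Point at Tempo's own HTTP API instead of the tempo-query (Jaeger) proxy.
    url: http://tempo:3200
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    uid: tempo
```

The Tempo-Multitenant entry in the question uses the same tempo-query URL, so it would presumably need the same change. The tempo-query service can stay in the compose file if you still want the Jaeger UI on port 16686.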
This solution applies to Tempo installed via the Kubernetes Helm charts (so it applies to the question's title but not the exact question):
Use the URL http://helmreleasename-tempo:3100 when configuring the Tempo data source in Grafana. Check the name of your Tempo service in Kubernetes and use port 3100.
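If Grafana itself is also installed from the grafana Helm chart, the data source can be provisioned through that chart's datasources value. The sketch below assumes this setup; helmreleasename is a placeholder for your own release name, and the service name should be confirmed against what actually exists in your namespace:

```yaml
# values.yaml for the grafana Helm chart (excerpt)
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Tempo
        type: tempo
        access: proxy
        # "helmreleasename" is a placeholder: use the real Tempo service name.
        url: http://helmreleasename-tempo:3100
```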
Related
I am creating and configuring a service account (SA) in the Helm chart.
It is created (in the k8s namespace, as a secret); however, when I try to use its token in an HTTP/REST API call, e.g. to get folders, it says:
"invalid API key"
The idea is that whenever Grafana is installed from scratch, an SA should be provisioned. This SA's token will then be used for accessing the REST API.
Chart.yaml
apiVersion: v2
name: kraken-observability-stack
version: 0.1.0
#We don't have a built-in-house app so we dont set
#appVersion: 0.1.0
kubeVersion: "^1.20.0-0"
description: The kraken observability stack for collecting and visualizing metrics, logs and traces related to CI pipelines.
home: https://docs.net/
dependencies:
- name: grafana
repository: https://grafana.github.io/helm-charts
version: 6.50.x
- name: mimir-distributed
repository: https://grafana.github.io/helm-charts
version: 3.2.x
- name: loki-distributed
repository: https://grafana.github.io/helm-charts
version: 0.68.x
- name: tempo-distributed
repository: https://grafana.github.io/helm-charts
version: 1.0.x
- name: opentelemetry-collector
repository: https://open-telemetry.github.io/opentelemetry-helm-charts
version: 0.47.x
(partial) values.yaml
grafana:
testFramework:
enabled: false
resources:
limits:
#maybe we shouldn't set cpu limits to avoid overbooking of resources.
#cpu: 1000m
memory: 1Gi
requests:
memory: 200Mi
cpu: 200m
grafana.ini:
force_migration: true
data_proxy:
timeout: 60s
#feature_toggles:
# enable: tempoServiceGraph,tempoSearch,tempoBackendSearch,tempoApmTable
auth:
login_cookie_name: "kraken_grafana_session"
auth.anonymous:
enabled: true
org_name: 'CICDS Pipelines User'
org_role: 'Viewer'
analytics:
reporting_enabled: false
check_for_updates: false
check_for_plugin_updates: false
enable_feedback_links: false
log:
level: warn
mode: console
plugins:
enable_alpha: true
app_tls_skip_verify_insecure: true
allow_loading_unsigned_plugins: true
#podAnnotations for grafana to expose its own metrics
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/schema: "http"
prometheus.io/port: "http"
prometheus.io/path: "/metrics"
rbac:
#disable Create and use RBAC resources
create: false
#disable Create PodSecurityPolicy (we don't have privileges for that)
pspEnabled: false
#disable to enforce AppArmor in created PodSecurityPolicy
pspUseAppArmor: false
serviceAccount:
create: true
name: grafana-init-sa
labels: {kraken-init}
replicas: 3
image:
#repository: docker-virtual.repository.net/grafana/grafana
repository: grafana/grafana
downloadDashboardsImage:
repository: docker-virtual.repository.net/curlimages/curl
tag: 7.85.0
pullPolicy: IfNotPresent
persistence:
type: statefulset
enabled: true
initChownData:
## This allows the prometheus-server to be run with an arbitrary user
##
enabled: false
#image:
# repository: docker-virtual.repository.net/busybox
# Administrator credentials when not using an existing secret (see below)
adminUser: admin
adminPassword: changeit
# Use an existing secret for the admin user.
# grafana-admin-credentials name is reserved by the operator and thus -creds
admin:
existingSecret: "grafana-admin-user"
userKey: ADMIN_USER
passwordKey: ADMIN_PASSWORD
env:
HTTP_PROXY: http://p985nst:p985nst#proxyvip-se.sbcore.net:8080/
HTTPS_PROXY: http://p985nst:p985nst#proxyvip-se.sbcore.net:8080/
NO_PROXY: .cluster.local,.net,.sbcore.net,.svc,10.0.0.0/8,172.30.0.0/16,localhost
# ## Pass the plugins you want installed as a list.
# ##
# plugins:
# - digrich-bubblechart-panel
# - grafana-clock-panel
# - grafana-piechart-panel
# - natel-discrete-panel
extraSecretMounts:
- name: loki-credentials-secret-mount
secretName: loki-credentials
defaultMode: 0440
mountPath: /etc/secrets/.loki_credentials
readOnly: true
I am currently using Loki to store logs generated by my applications on EKS Fargate. Promtail runs as a sidecar to scrape the logs. A single Loki pod is used, with S3 configured as the destination for log storage. It works as expected. However, when I tested the availability of the logging system by deleting pods, I discovered that if Loki's pod was deleted, some logs were missing (roughly from 20 minutes before the deletion up to the time the pod was deleted), even after the pod restarted.
To solve this problem, I tried to use EFS as the persistent volume for Loki's pod, mounted at /loki. I followed this article (https://aws.amazon.com/blogs/aws/new-aws-fargate-for-amazon-eks-now-supports-amazon-efs/). But I got an error from the Loki pod with the message "error running loki" err="mkdir /loki/compactor: permission denied"
Therefore, I have two questions:
Should I use EFS as a solution for log backup in my case?
Why did I get permission denied inside the pod, and is there any way to solve this problem?
My Loki-config.yaml
auth_enabled: false
server:
http_listen_port: 3100
# grpc_listen_port: 9096
ingester:
wal:
enabled: true
dir: /loki/wal
lifecycler:
ring:
kvstore:
store: inmemory
replication_factor: 1
# final_sleep: 0s
chunk_idle_period: 3m
chunk_retain_period: 30s
max_transfer_retries: 0
chunk_target_size: 1048576
schema_config:
configs:
- from: 2020-05-15
store: boltdb-shipper
object_store: aws
schema: v11
index:
prefix: index_
period: 24h
storage_config:
boltdb_shipper:
active_index_directory: /loki/index
cache_location: /loki/index_cache
shared_store: s3
aws:
bucketnames: bucketnames
endpoint: s3.us-west-2.amazonaws.com
region: us-west-2
access_key_id: access_key_id
secret_access_key: secret_access_key
sse_encryption: true
compactor:
working_directory: /loki/compactor
shared_store: s3
compaction_interval: 5m
limits_config:
reject_old_samples: true
reject_old_samples_max_age: 48h
chunk_store_config:
max_look_back_period: 0s
table_manager:
retention_deletes_enabled: true
retention_period: 96h
querier:
query_ingesters_within: 0
analytics:
reporting_enabled: false
Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: fargate-api-dev
  name: dev-loki
spec:
  selector:
    matchLabels:
      app: dev-loki
  template:
    metadata:
      labels:
        app: dev-loki
    spec:
      volumes:
        - name: loki-config
          configMap:
            name: dev-loki-config
        - name: dev-loki-efs-pv
          persistentVolumeClaim:
            claimName: dev-loki-efs-pvc
      containers:
        - name: loki
          image: loki:2.6.1
          args:
            - -print-config-stderr=true
            - -config.file=/tmp/loki.yaml
          resources:
            limits:
              memory: "500Mi"
              cpu: "200m"
          ports:
            - containerPort: 3100
          volumeMounts:
            - name: dev-loki-config
              mountPath: /tmp
              readOnly: false
            - name: dev-loki-efs-pv
              mountPath: /loki
Promtail-config.yaml
server:
log_level: info
http_listen_port: 9080
clients:
- url: http://loki.com/loki/api/v1/push
positions:
filename: /run/promtail/positions.yaml
scrape_configs:
- job_name: api-log
static_configs:
- targets:
- localhost
labels:
job: apilogs
pod: ${POD_NAME}
__path__: /var/log/*.log
I had a similar issue using EFS as the volume to store the logs, and I found this solution: https://github.com/grafana/loki/issues/2018#issuecomment-1030221498
Basically, the Loki container on its own is not able to create the directories it needs to start working, so we used an initContainer to do it for it.
This solution worked like a charm for me.
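A rough sketch of such an initContainer, reusing the volume name from the question's Deploy.yaml. The container name and the busybox image are illustrative choices, and the UID/GID 10001 is an assumption about the user the Loki image runs as, so check it against your image before relying on it:

```yaml
# Deploy.yaml (excerpt): goes under spec.template.spec, next to "containers:"
initContainers:
  - name: fix-loki-permissions      # hypothetical name
    image: busybox                  # any small image that provides chown
    # Assumption: Loki runs as UID/GID 10001; adjust to match your image.
    command: ["sh", "-c", "chown -R 10001:10001 /loki"]
    securityContext:
      runAsUser: 0                  # chown needs to run as root
    volumeMounts:
      - name: dev-loki-efs-pv
        mountPath: /loki
```

If the EFS volume is mounted through an access point that enforces its own POSIX user, the chown may be rejected; in that case the ownership would have to be set on the access point instead.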
I am a beginner using Prometheus and Grafana to monitor values from a REST API.
Prometheus, json-exporter, and Grafana were all installed from Helm charts: Prometheus with the default values.yaml, and json-exporter with a custom values.yaml.
I checked that Prometheus picked up the ServiceMonitor of json-exporter as a target, but I couldn't see its metrics.
How can I check the metrics? Below are the environment, screenshots, and code.
environment :
kubernetes : v1.22.9
helm : v3.9.2
prometheus-json-exporter helm chart : v0.5.0
kube-prometheus-stack helm chart : 0.58.0
screenshots :
https://drive.google.com/drive/folders/1vfjbidNpE2_yXfxdX8oX5eWh4-wAx7Ql?usp=sharing
values.yaml
in custom_jsonexporter_values.yaml
# Default values for prometheus-json-exporter.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
image:
repository: quay.io/prometheuscommunity/json-exporter
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: ""
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
# Specifies whether a service account should be created
create: true
# Annotations to add to the service account
annotations: []
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name: ""
podAnnotations: []
podSecurityContext: {}
# fsGroup: 2000
# podLabels:
# Custom labels for the pod
securityContext: {}
# capabilities:
# drop:
# - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000
service:
type: ClusterIP
port: 7979
targetPort: http
name: http
serviceMonitor:
## If true, a ServiceMonitor CRD is created for a prometheus operator
## https://github.com/coreos/prometheus-operator
##
enabled: true
namespace: monitoring
scheme: http
# Default values that will be used for all ServiceMonitors created by `targets`
defaults:
additionalMetricsRelabels: {}
interval: 60s
labels:
release: prometheus
scrapeTimeout: 60s
targets:
- name : pi2
url: http://xxx.xxx.xxx.xxx:xxxx
labels: {} # Map of labels for ServiceMonitor. Overrides value set in `defaults`
interval: 60s # Scraping interval. Overrides value set in `defaults`
scrapeTimeout: 60s # Scrape timeout. Overrides value set in `defaults`
additionalMetricsRelabels: {} # Map of metric labels and values to add
ingress:
enabled: false
className: ""
annotations: []
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: chart-example.local
paths:
- path: /
pathType: ImplementationSpecific
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 80
# targetMemoryUtilizationPercentage: 80
nodeSelector: []
tolerations: []
affinity: []
configuration:
config: |
---
modules:
default:
metrics:
- name: used_storage_byte
path: '{ .used }'
help: used storage byte
values:
used : '{ .used }'
labels: {}
- name: free_storage_byte
path: '{ .free }'
help: free storage byte
labels: {}
values :
free : '{ .free }'
- name: total_storage_byte
path: '{ .total }'
help: total storage byte
labels: {}
values :
total : '{ .total }'
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ""
rules: []
additionalVolumes: []
# - name: password-file
# secret:
# secretName: secret-name
additionalVolumeMounts: []
# - name: password-file
# mountPath: "/tmp/mysecret.txt"
# subPath: mysecret.txt
First, check the Targets page in the Prometheus UI to see a) whether your desired target is even defined and b) whether the endpoint is reachable and being scraped.
If either of these is not the case, you may need to troubleshoot a little:
It is important to understand what is happening. You have deployed a Prometheus Operator to the cluster. If you used the default values from the Helm chart, you also deployed a Prometheus custom resource (CR). This resource tells the Prometheus Operator how to configure the Prometheus instance running inside the pod. Certain things are static, such as global metric relabeling, but most are dynamic, such as picking up new targets to scrape.
Inside the Prometheus CR you will find the options serviceMonitorSelector and serviceMonitorNamespaceSelector (the behaviour is the same for Probes and PodMonitors, so I'm only going over it once). Assuming you leave the default, serviceMonitorNamespaceSelector: {}, the Prometheus Operator will look for ServiceMonitors in all namespaces on the cluster that it can access via its serviceAccount. The serviceMonitorSelector field lets you specify a label and value combination that must be present on a ServiceMonitor for it to be picked up. Once one or more ServiceMonitors matching the selectors are found, the Prometheus Operator adjusts the configuration of the actual Prometheus instance (tl;dr version), so you end up with proper scrape targets.
Step 1 for troubleshooting: Do your selectors match the labels and namespace of the ServiceMonitors? Actually check those. The default in the Prometheus Operator Helm chart expects a label release: prometheus-operator, and in your config you don't seem to add that to your json-exporter's ServiceMonitor.
Step 2: The same mechanism that picks up ServiceMonitors is at work inside the ServiceMonitor itself, which selects Services by label, so make sure that your Service actually matches what is specified in the ServiceMonitor.
To dive deeper into the available options and what the fields do, check the API documentation.
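To make the two matching steps concrete, here is an illustrative pair of resources. All names and label values are assumptions for the sake of the example (in particular the release value and the app.kubernetes.io/name label), so compare them with what is actually deployed in your cluster rather than copying them:

```yaml
# Prometheus custom resource (excerpt): decides which ServiceMonitors are picked up
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                 # hypothetical name
  namespace: monitoring
spec:
  serviceMonitorNamespaceSelector: {}   # empty selector: look in all namespaces
  serviceMonitorSelector:
    matchLabels:
      release: prometheus       # Step 1: this label must be on the ServiceMonitor
---
# ServiceMonitor: decides which Services (and ports) are scraped
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: json-exporter           # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus         # matches serviceMonitorSelector above
spec:
  selector:
    matchLabels:
      # Step 2: must match the labels on the Service (assumed label, verify yours)
      app.kubernetes.io/name: prometheus-json-exporter
  endpoints:
    - port: http                # must match the *name* of the Service port
      interval: 60s
```

If the labels in either selector differ from what is on the selected object, the target will not appear on the Targets page.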
docker-compose.yml:
This is the docker-compose file that runs the Prometheus, node-exporter and Alertmanager services. All the services are running fine. Even the health status in the Targets menu of Prometheus shows OK.
version: '2'
services:
  prometheus:
    image: prom/prometheus
    privileged: true
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alertmanger/alert.rules:/alert.rules
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - '9090:9090'
  node-exporter:
    image: prom/node-exporter
    ports:
      - '9100:9100'
  alertmanager:
    image: prom/alertmanager
    privileged: true
    volumes:
      - ./alertmanager/alertmanager.yml:/alertmanager.yml
    command:
      - '--config.file=/alertmanager.yml'
    ports:
      - '9093:9093'
prometheus.yml
This is the Prometheus config file with the scrape targets and the Alertmanager target. The Alertmanager target URL is working fine.
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'my-monitor'
# this is where I have simple alert rules
rule_files:
  - ./alertmanager/alert.rules
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['some-ip:9093']
alert.rules:
Just a simple alert rule to raise an alert when a service is down:
ALERT service_down
IF up == 0
alertmanager.yml
This is to send a message to Slack when an alert fires.
global:
  slack_api_url: 'https://api.slack.com/apps/A90S3Q753'
route:
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        username: 'tara gurung'
        channel: '#general'
        api_url: 'https://hooks.slack.com/services/T52GRFN3F/B90NMV1U2/QKj1pZu3ZVY0QONyI5sfsdf'
Problems:
All the containers are working fine, but I am not able to figure out the exact problem. What am I missing? Checking the alerts page in Prometheus shows:
Alerts
No alerting rules defined
Your ./alertmanager/alert.rules file is not included in your docker config, so it is not available in the container. You need to add it to the prometheus service:
prometheus:
  image: prom/prometheus
  privileged: true
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
    - ./alertmanager/alert.rules:/alertmanager/alert.rules
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
  ports:
    - '9090:9090'
And probably give an absolute path inside prometheus.yml:
rule_files:
  - "/alertmanager/alert.rules"
You also need to make sure your alerting rules are valid. Please see the Prometheus docs for details and examples. Your alert.rules file should look something like this:
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
Once you have multiple files, it may be better to add the entire directory as a volume rather than individual files.
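A sketch of that directory-based variant, assuming all rule files live in ./alertmanager on the host and end in .rules (both are assumptions carried over from the layout in the question):

```yaml
# docker-compose.yml (prometheus service only)
prometheus:
  image: prom/prometheus
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
    # Mount the whole rules directory instead of individual files.
    - ./alertmanager:/alertmanager
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
  ports:
    - '9090:9090'
---
# prometheus.yml (excerpt): rule_files entries may be glob patterns
rule_files:
  - "/alertmanager/*.rules"
```

With this layout, adding another rules file to the directory only requires a Prometheus reload or restart, not a change to the compose file.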
For an answer to this question, see the explanation at this link:
How to make alert rules visible on Prometheus User Interface?
The rule_files entry in your prometheus.yml should look like this:
rule_files:
  - "/etc/prometheus/alert.rules.yml"
You need to stop the alertmanager and prometheus containers and run this:
docker run -d --name prometheus_ops -p 9191:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml prom/prometheus
To verify that the alert rules file is visible, open a shell in the Prometheus container (substitute your own container ID) and go to /etc/prometheus:
docker exec -it fa99f733f69b sh
I'm attempting to set up a service broker to add Postgres to our Cloud Foundry installation. We're running our system on VMware. I'm using this release to do that:
cf-services-contrib-release
I need to set up the networks: section in the manifest, and what I'm setting there isn't working.
This is what my networks look like in the VMware vCenter UI:
And this is what my clusters and resource pools look like in the vCenter UI:
I tried both with and without quotes around the 'name' of the network. But I'm now getting an error saying that BOSH can't find the network:
Failed compiling packages > rootfs_lucid64/9b3f611b46e076b94b37645c98f9100e7bcef5dd: Can't find network: VLAN1130_LB_100.114.130.0 (00:00:01)
Failed compiling packages > postgresql93/06163819b694f8d9836586d024f64c11efe30180: Can't find network: VLAN1130_LB_100.114.130.0 (00:00:01)
Failed compiling packages > postgresql92/2867893e714aae6e6b76bd06e7aa30d47023c46e: Can't find network: VLAN1130_LB_100.114.130.0 (00:00:01)
Error 100: Can't find network: VLAN1130_LB_100.114.130.0
Task 2430 error
This was my latest configuration attempt:
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: VLAN1130_LB_100.114.130.0
I also tried using single quotes as below. But I got the same error as above!
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: 'VLAN1130_LB_100.114.130.0'
Our network that we're on is this one: 100.114.130.0/24
So it makes sense to select VLAN1130_LB_100.114.130.0 in the config.
I've tried setting all of these options in the yaml file with no quotes. And none of them seem to work!
USH_UCS_CLOUD_FOUNDRY: postgres_2432_debug.txt (https://gist.github.com/bluethundr/18ac490e96a5e02fad65)
USH_UCS_CLOUD_FOUNDRY_DVS: postgres_2433_debug.txt
USH_UCS_CLOUD_FO-DVUplinks-435272: postgres_2434_debug.txt
VLAN1129_LB_100.114.129.0: postgres_2435_debug.txt
VLAN1130_LB_100.114.130.0: postgres_2436_debug.txt
VLAN14-ESXI_MGMT-3.156.14.0: postgres_2437_debug.txt (https://gist.github.com/bluethundr/dbde624e63842721a133)
I wouldn't expect VLAN1129_LB_100.114.129.0 to work, but I tried it anyway, just to be complete.
I've supplied debug dumps of each failed attempt next to each setting you see above. Surely one of them must work! But as you can see none of them did.
Here's my complete yaml file that I deployed with the 'bosh deploy' command:
name: cf-22b9f4d62bb6f0563b71
director_uuid: fd713790-b1bc-401a-8ea1-b8209f1cc90c
releases:
- name: cf-services-contrib
version: 6
compilation:
workers: 3
network: default
reuse_compilation_vms: true
cloud_properties:
ram: 5120
disk: 10240
cpu: 2
update:
canaries: 1
canary_watch_time: 30000-60000
update_watch_time: 30000-60000
max_in_flight: 4
networks:
- name: default
type: manual
subnets:
- range: 100.114.130.0/24
gateway: 100.114.130.1
cloud_properties:
name: VLAN1130_LB_100.114.130.0
resource_pools:
- name: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
network: default
stemcell:
name: bosh-vsphere-esxi-ubuntu-trusty-go_agent
version: '2865.1'
cloud_properties:
cpu: 2
ram: 4096
disk: 10240
datacenters:
- name: 'Universal City'
clusters:
- USH_UCS_CLOUD_FOUNDRY_NONPROD_01: {resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'}
jobs:
- name: gateways
release: cf-services-contrib
templates:
- name: postgresql_gateway_ng
instances: 1
resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
networks:
- name: default
default: [dns, gateway]
properties:
# Service credentials
uaa_client_id: "cf"
uaa_endpoint: http://uaa.devcloudwest.example.com
uaa_client_auth_credentials:
username: admin
password: secret
- name: postgresql_service_node
release: cf-services-contrib
template: postgresql_node_ng
instances: 1
resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
persistent_disk: 10000
properties:
postgresql_node:
plan: default
networks:
- name: default
default: [dns, gateway]
properties:
networks:
apps: default
management: default
cc:
srv_api_uri: http://api.devcloudwest.example.com
nats:
address: 100.114.130.11
port: 25555
user: nats #CHANGE
password: secret
authorization_timeout: 5
service_plans:
postgresql:
default:
description: "Developer, 250MB storage, 10 connections"
free: true
job_management:
high_water: 230
low_water: 20
configuration:
capacity: 125
max_clients: 10
quota_files: 4
quota_data_size: 240
enable_journaling: true
backup:
enable: false
lifecycle:
enable: false
serialization: enable
snapshot:
quota: 1
postgresql_gateway:
token: f75df200-4daf-45b5-b92a-cb7fa1a25660
default_plan: default
supported_versions: ["9.3"]
version_aliases:
current: "9.3"
cc_api_version: v2
postgresql_node:
supported_versions: ["9.3"]
default_version: "9.3"
max_tmp: 900
password: secret
How can we get past this issue?
From Amit's comment:
The name used in cloud_properties must include any nested sub-folders. In the provided configuration the network is nested under USH_UCS_CLOUD_FOUNDRY, so the value for name should reflect that, i.e. USH_UCS_CLOUD_FOUNDRY/VLAN1130_LB_100.114.130.0. No quotes are required.
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: USH_UCS_CLOUD_FOUNDRY/VLAN1130_LB_100.114.130.0