Cloud Foundry bosh Error 100: Can't find network - postgresql

I'm attempting to set up a service broker to add Postgres to our Cloud Foundry installation. We're running our system on VMware. I'm using this release to do that:
cf-services-contrib-release
I need to set up the networks: section in the manifest, and what I'm setting there isn't working.
This is what my networks look like in the VMware vCenter UI:
And this is what my clusters and resource pools look like in the vCenter UI:
I tried both with and without quotes around the 'name' of the network, but I'm getting an error saying that BOSH can't find the network:
Failed compiling packages > rootfs_lucid64/9b3f611b46e076b94b37645c98f9100e7bcef5dd: Can't find network: VLAN1130_LB_100.114.130.0 (00:00:01)
Failed compiling packages > postgresql93/06163819b694f8d9836586d024f64c11efe30180: Can't find network: VLAN1130_LB_100.114.130.0 (00:00:01)
Failed compiling packages > postgresql92/2867893e714aae6e6b76bd06e7aa30d47023c46e: Can't find network: VLAN1130_LB_100.114.130.0 (00:00:01)
Error 100: Can't find network: VLAN1130_LB_100.114.130.0
Task 2430 error
This was my latest configuration attempt:
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: VLAN1130_LB_100.114.130.0
I also tried using single quotes as below. But I got the same error as above!
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: 'VLAN1130_LB_100.114.130.0'
The network we're on is 100.114.130.0/24, so it makes sense to select VLAN1130_LB_100.114.130.0 in the config.
I've tried setting each of the following names in the YAML file, with no quotes, and none of them worked:
- USH_UCS_CLOUD_FOUNDRY: postgres_2432_debug.txt (https://gist.github.com/bluethundr/18ac490e96a5e02fad65)
- USH_UCS_CLOUD_FOUNDRY_DVS: postgres_2433_debug.txt
- USH_UCS_CLOUD_FO-DVUplinks-435272: postgres_2434_debug.txt
- VLAN1129_LB_100.114.129.0: postgres_2435_debug.txt
- VLAN1130_LB_100.114.130.0: postgres_2436_debug.txt
- VLAN14-ESXI_MGMT-3.156.14.0: postgres_2437_debug.txt (https://gist.github.com/bluethundr/dbde624e63842721a133)
I wouldn't expect VLAN1129_LB_100.114.129.0 to work, but I tried it anyway for completeness.
I've linked a debug dump of each failed attempt next to the corresponding setting above. Surely one of them should work, but as you can see none of them did.
Here's the complete YAML file that I deployed with the 'bosh deploy' command:
name: cf-22b9f4d62bb6f0563b71
director_uuid: fd713790-b1bc-401a-8ea1-b8209f1cc90c
releases:
- name: cf-services-contrib
  version: 6
compilation:
  workers: 3
  network: default
  reuse_compilation_vms: true
  cloud_properties:
    ram: 5120
    disk: 10240
    cpu: 2
update:
  canaries: 1
  canary_watch_time: 30000-60000
  update_watch_time: 30000-60000
  max_in_flight: 4
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: VLAN1130_LB_100.114.130.0
resource_pools:
- name: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
  network: default
  stemcell:
    name: bosh-vsphere-esxi-ubuntu-trusty-go_agent
    version: '2865.1'
  cloud_properties:
    cpu: 2
    ram: 4096
    disk: 10240
    datacenters:
    - name: 'Universal City'
      clusters:
      - USH_UCS_CLOUD_FOUNDRY_NONPROD_01: {resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'}
jobs:
- name: gateways
  release: cf-services-contrib
  templates:
  - name: postgresql_gateway_ng
  instances: 1
  resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
  networks:
  - name: default
    default: [dns, gateway]
  properties:
    # Service credentials
    uaa_client_id: "cf"
    uaa_endpoint: http://uaa.devcloudwest.example.com
    uaa_client_auth_credentials:
      username: admin
      password: secret
- name: postgresql_service_node
  release: cf-services-contrib
  template: postgresql_node_ng
  instances: 1
  resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
  persistent_disk: 10000
  properties:
    postgresql_node:
      plan: default
  networks:
  - name: default
    default: [dns, gateway]
properties:
  networks:
    apps: default
    management: default
  cc:
    srv_api_uri: http://api.devcloudwest.example.com
  nats:
    address: 100.114.130.11
    port: 25555
    user: nats #CHANGE
    password: secret
    authorization_timeout: 5
  service_plans:
    postgresql:
      default:
        description: "Developer, 250MB storage, 10 connections"
        free: true
        job_management:
          high_water: 230
          low_water: 20
        configuration:
          capacity: 125
          max_clients: 10
          quota_files: 4
          quota_data_size: 240
          enable_journaling: true
          backup:
            enable: false
          lifecycle:
            enable: false
            serialization: enable
            snapshot:
              quota: 1
  postgresql_gateway:
    token: f75df200-4daf-45b5-b92a-cb7fa1a25660
    default_plan: default
    supported_versions: ["9.3"]
    version_aliases:
      current: "9.3"
    cc_api_version: v2
  postgresql_node:
    supported_versions: ["9.3"]
    default_version: "9.3"
    max_tmp: 900
    password: secret
How can we get past this issue?

From Amit's comment:
The name used in cloud_properties must include any nested sub-folders. In the provided configuration the network is nested under USH_UCS_CLOUD_FOUNDRY, so the value for name should reflect that, i.e. USH_UCS_CLOUD_FOUNDRY/VLAN1130_LB_100.114.130.0; no quotes are required:
networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      name: USH_UCS_CLOUD_FOUNDRY/VLAN1130_LB_100.114.130.0
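By the same logic, if the port group sat one folder deeper in the vCenter inventory, the name would carry the whole folder path. A purely illustrative sketch (the SOME_SUBFOLDER segment is hypothetical, not part of the configuration above):

networks:
- name: default
  type: manual
  subnets:
  - range: 100.114.130.0/24
    gateway: 100.114.130.1
    cloud_properties:
      # hypothetical path: <top-level folder>/<sub-folder>/<port group>
      name: USH_UCS_CLOUD_FOUNDRY/SOME_SUBFOLDER/VLAN1130_LB_100.114.130.0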

Related

How to configure Grafana Tempo properly?

I'm trying to use Grafana Tempo for distributed tracing.
I launch it from docker-compose:
version: "3.9"
services:
# MY MICROSERVICES
...
prometheus:
image: prom/prometheus
ports:
- ${PROMETHEUS_EXTERNAL_PORT}:9090
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:cached
promtail:
image: grafana/promtail
volumes:
- ./log:/var/log
- ./promtail/:/mnt/config
command: -config.file=/mnt/config/promtail-config.yaml
loki:
image: grafana/loki
command: -config.file=/etc/loki/local-config.yaml
tempo:
image: grafana/tempo
command: [ "-config.file=/etc/tempo.yaml" ]
volumes:
- ./tempo/tempo-local.yaml:/etc/tempo.yaml
# - ./tempo-data/:/tmp/tempo
ports:
- "14268" # jaeger ingest
- "3200" # tempo
- "55680" # otlp grpc
- "55681" # otlp http
- "9411" # zipkin
tempo-query:
image: grafana/tempo-query
command: [ "--grpc-storage-plugin.configuration-file=/etc/tempo-query.yaml" ]
volumes:
- ./tempo/tempo-query.yaml:/etc/tempo-query.yaml
ports:
- "16686:16686" # jaeger-ui
depends_on:
- tempo
grafana:
image: grafana/grafana
volumes:
- ./grafana/datasource-config/:/etc/grafana/provisioning/datasources:cached
- ./grafana/dashboards/prometheus.json:/var/lib/grafana/dashboards/prometheus.json:cached
- ./grafana/dashboards/loki.json:/var/lib/grafana/dashboards/loki.json:cached
- ./grafana/dashboards-config/:/etc/grafana/provisioning/dashboards:cached
ports:
- ${GRAFANA_EXTERNAL_PORT}:3000
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
- GF_AUTH_DISABLE_LOGIN_FORM=true
depends_on:
- prometheus
- loki
with tempo-local.yaml:
server:
  http_listen_port: 3200
distributor:
  receivers:            # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:             # the receivers all come from the OpenTelemetry collector. more configuration information can
      protocols:        # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:
        grpc:           # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:
ingester:
  trace_idle_period: 10s        # the length of time after a trace has not received spans to consider it complete and flush it
  max_block_bytes: 1_000_000    # cut the head block when it hits this size or ...
  max_block_duration: 5m        # this much time passes
compactor:
  compaction:
    compaction_window: 1h             # blocks in this time window will be compacted together
    max_block_bytes: 100_000_000      # maximum size of compacted blocks
    block_retention: 1h
    compacted_block_retention: 10m
storage:
  trace:
    backend: local                        # backend configuration to use
    block:
      bloom_filter_false_positive: .05    # bloom filter false positive rate. lower values create larger filters but fewer false positives
      index_downsample_bytes: 1000        # number of bytes per index record
      encoding: zstd                      # block encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
    wal:
      path: /tmp/tempo/wal                # where to store the wal locally
      encoding: snappy                    # wal encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                    # worker pool determines the number of parallel requests to the object store backend
      queue_depth: 10000
tempo-query.yaml:
backend: "tempo:3200"
and datasource.yml for provisioning the datasources in Grafana:
apiVersion: 1
deleteDatasources:
  - name: Prometheus
  - name: Tempo
  - name: Loki
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo-query:16686
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
  - name: Tempo-Multitenant
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo-query:16686
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo-authed
    jsonData:
      httpHeaderName1: 'Authorization'
    secureJsonData:
      httpHeaderValue1: 'Bearer foo-bar-baz'
  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: \[.+,(.+),.+\]
          name: TraceID
          url: $${__value.raw}
But if I test the datasource in Grafana, I get this error:
In the Loki view I can find the Tempo button for viewing traces... but I can't see them in Tempo, because I get an error:
However, if I take the trace ID and search for it in Jaeger, I can see it correctly.
What am I missing in the Tempo configuration? How do I configure it correctly?
Grafana 7.5 and later can talk to Tempo natively and no longer need the tempo-query proxy. I think this explains what is happening: Grafana is attempting to use the Tempo-native API against tempo-query, which exposes the Jaeger API instead. Try changing the Grafana datasource in datasource.yml to http://tempo:3200.
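A sketch of what the corrected Tempo entry in datasource.yml might look like; only the url changes, with the port taken from http_listen_port in tempo-local.yaml and the host name from the tempo service in the docker-compose file:

datasources:
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200  # Tempo's native HTTP API instead of the tempo-query (Jaeger) proxy
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    uid: tempo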
This solution applies to the Tempo installation via the Kubernetes Helm charts (so it applies to the question's title, but not the exact question):
Use the URL http://helmreleasename-tempo:3100 when configuring the Tempo datasource in Grafana. You need to check your Tempo service name in Kubernetes and use port 3100.
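For that case, the provisioned datasource entry might look like the sketch below (helmreleasename is the placeholder from the answer above; substitute your actual Helm release name):

datasources:
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://helmreleasename-tempo:3100  # Kubernetes service created by the Helm chart, port 3100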

How to wait until env for appid is created in jelastic manifest installation?

I have the following manifest:
jpsVersion: 1.3
jpsType: install
application:
  id: shopozor-k8s-cluster
  name: Shopozor k8s cluster
  version: 0.0
  baseUrl: https://raw.githubusercontent.com/shopozor/services/dev
  settings:
    fields:
      - name: envName
        caption: Env Name
        type: string
        default: shopozor
      - name: topo
        type: radio-fieldset
        values:
          0-dev: '<b>Development:</b> one master (1) and one scalable worker (1+)'
          1-prod: '<b>Production:</b> multi master (3) with API balancers (2+) and scalable workers (2+)'
        default: 0-dev
      - name: version
        type: string
        caption: Version
        default: v1.16.3
  onInstall:
    - installKubernetes
    - enableSubDomains
  actions:
    installKubernetes:
      install:
        jps: https://github.com/jelastic-jps/kubernetes/blob/${settings.version}/manifest.jps
        envName: ${settings.envName}
        displayName: ${settings.envName}
        settings:
          deploy: cmd
          cmd: |-
            curl -fsSL ${baseUrl}/scripts/install_k8s.sh | /bin/bash
          topo: ${settings.topo}
          dashboard: version2
          ingress-controller: Nginx
          storage: true
          api: true
          monitoring: true
          version: ${settings.version}
          jaeger: false
    enableSubDomains:
      - jelastic.env.binder.AddDomains[cp]:
          domains: staging,api-staging,assets-staging,api,assets
Unfortunately, when I run that manifest, the k8s cluster gets installed, but the subdomains cannot be created (yet), because:
[15:26:28 Shopozor.cluster:3]: enableSubDomains: {"action":"enableSubDomains","params":{}}
[15:26:29 Shopozor.cluster:4]: api [cp]: {"method":"jelastic.env.binder.AddDomains","params":{"domains":"staging,api-staging,assets-staging,api,assets"},"nodeGroup":"cp"}
[15:26:29 Shopozor.cluster:4]: ERROR: api.response: {"result":2303,"source":"JEL","error":"env for appid [5ce25f5a6988fbbaf34999b08dd1d47c] not created."}
What jelastic API methods can I use to perform the necessary waiting until subdomain creation is possible?
My current workaround is to split that manifest into two manifests: one cluster installation manifest and one update manifest creating the subdomains. However, I'd like to have everything in the same manifest.
Please change this:
enableSubDomains:
  - jelastic.env.binder.AddDomains[cp]:
      domains: staging,api-staging,assets-staging,api,assets
to:
enableSubDomains:
  - jelastic.env.binder.AddDomains[cp]:
      envName: ${settings.envName}
      domains: staging,api-staging,assets-staging,api,assets

gke cluster deployment with custom network

I am trying to create a YAML file to deploy a GKE cluster in a custom network I created. I get this error:
JSON payload received. Unknown name \"network\": Cannot find field."
I have tried a few names for the resources, but I am still seeing the same issue:
resources:
- name: myclus
  type: container.v1.cluster
  properties:
    network: projects/project-251012/global/networks/dev-cloud
    zone: "us-east4-a"
    cluster:
      initialClusterVersion: "1.12.9-gke.13"
      currentMasterVersion: "1.12.9-gke.13"
      ## Initial NodePool config.
      nodePools:
      - name: "myclus-pool1"
        initialNodeCount: 3
        version: "1.12.9-gke.13"
        config:
          machineType: "n1-standard-1"
          oauthScopes:
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          - https://www.googleapis.com/auth/ndev.clouddns.readwrite
          preemptible: true
## Duplicates node pool config from v1.cluster section, to get it explicitly managed.
- name: myclus-pool1
  type: container.v1.nodePool
  properties:
    zone: us-east4-a
    clusterId: $(ref.myclus.name)
    nodePool:
      name: "myclus-pool1"
I expect it to place the cluster nodes in this network.
The network field needs to be part of the cluster spec. The top level of properties should just be zone and cluster; network should be at the same indentation level as initialClusterVersion. See the container.v1.cluster API reference page for more.
Your manifest should look more like:
EDIT: there is some confusion in the API reference docs concerning deprecated fields. I originally offered a YAML that applies to the new API, not the one you are using. I've updated it with the correct syntax for the basic v1 API, and further down I've added the newer API (which currently relies on gcp-types to deploy).
resources:
- name: myclus
  type: container.v1.cluster
  properties:
    projectId: [project]
    zone: us-central1-f
    cluster:
      name: my-clus
      zone: us-central1-f
      network: [network_name]
      subnetwork: [subnet]  ### leave this field blank if using the default network
      initialClusterVersion: "1.13"
      nodePools:
      - name: my-clus-pool1
        initialNodeCount: 0
        config:
          imageType: cos
- name: my-pool-1
  type: container.v1.nodePool
  properties:
    projectId: [project]
    zone: us-central1-f
    clusterId: $(ref.myclus.name)
    nodePool:
      name: my-clus-pool2
      initialNodeCount: 0
      version: "1.13"
      config:
        imageType: ubuntu
The newer API (which provides more functionality, including the v1beta1 API and beta features) would look something like this:
resources:
- name: myclus
  type: gcp-types/container-v1:projects.locations.clusters
  properties:
    parent: projects/shared-vpc-231717/locations/us-central1-f
    cluster:
      name: my-clus
      zone: us-central1-f
      network: shared-vpc
      subnetwork: local-only  ### leave this field blank if using the default network
      initialClusterVersion: "1.13"
      nodePools:
      - name: my-clus-pool1
        initialNodeCount: 0
        config:
          imageType: cos
- name: my-pool-2
  type: gcp-types/container-v1:projects.locations.clusters.nodePools
  properties:
    parent: projects/shared-vpc-231717/locations/us-central1-f/clusters/$(ref.myclus.name)
    nodePool:
      name: my-clus-separate-pool
      initialNodeCount: 0
      version: "1.13"
      config:
        imageType: ubuntu
Another note: you may want to modify your scopes. The current scopes will not allow you to pull images from gcr.io, so some system pods may not spin up properly, and if you are using Google's repository you will be unable to pull those images.
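As an illustration, a node-pool config with a scope that allows pulling from gcr.io might look like the sketch below (based on the pool from the question; the read-only storage scope is the addition, adjust the rest to your needs):

nodePools:
- name: "myclus-pool1"
  initialNodeCount: 3
  config:
    machineType: "n1-standard-1"
    oauthScopes:
    - https://www.googleapis.com/auth/devstorage.read_only  # lets nodes pull images from gcr.io
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring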
Finally, you don't want to repeat the node pool resource both in the cluster spec and separately. Instead, create the cluster with a basic (default) node pool; for all additional node pools, create them as separate resources so you can manage them without going through the cluster. There are very few updates you can perform on a node pool aside from resizing.

Bosh Concourse error when trying to add external workers | failed to configure SSH server: ssh: no key found

We are having a hard time setting up the cluster for an external worker. I think we don't understand all of the SSH keys that need to be created manually and exchanged. The TSA job errors with:
failed to configure SSH server: ssh: no key found.
Here is my BOSH manifest:
---
name: concourse
releases:
- name: concourse
  version: latest
- name: garden-runc
  version: latest
stemcells:
- alias: trusty
  os: ubuntu-trusty
  version: latest
instance_groups:
- name: web
  instances: 1
  vm_type: default
  stemcell: trusty
  azs: [z1]
  networks: [{name: default}]
  jobs:
  - name: atc
    release: concourse
    properties:
      no_really_i_dont_want_any_auth: true
      postgresql_database: &atc_db atc
  - name: tsa
    release: concourse
    properties:
      host_key: ((tsa_host_private_key))
      host_public_key: ((tsa_host_public_key))
      authorized_keys:
      - ((osx_worker_public_key))
- name: db
  instances: 1
  vm_type: default
  stemcell: trusty
  persistent_disk_type: default
  azs: [z1]
  networks: [{name: default}]
  jobs:
  - name: postgresql
    release: concourse
    properties:
      databases:
      - name: *atc_db
        role: postgresql
        password: password
- name: worker
  instances: 1
  vm_type: default
  stemcell: trusty
  azs: [z1]
  networks: [{name: default}]
  jobs:
  - name: groundcrew
    release: concourse
    properties:
      tsa:
        host_public_key: ((tsa_host_public_key))
  - name: baggageclaim
    release: concourse
    properties: {}
  - name: garden
    release: garden-runc
    properties:
      garden:
        listen_network: tcp
        listen_address: 0.0.0.0:7777
update:
  canaries: 1
  max_in_flight: 1
  serial: false
  canary_watch_time: 1000-60000
  update_watch_time: 1000-60000
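For reference, the ((...)) entries in this manifest are BOSH variables that have to be supplied at deploy time. A minimal sketch of a vars file that supplies them, assuming the key pairs are generated out of band (for example with ssh-keygen) and that the file and deployment names below are hypothetical:

# vars.yml (hypothetical name), passed as:
#   bosh deploy -d concourse concourse.yml -l vars.yml
tsa_host_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  (private key material generated for the TSA host; elided here)
  -----END RSA PRIVATE KEY-----
tsa_host_public_key: ssh-rsa AAAA... tsa-host
osx_worker_public_key: ssh-rsa AAAA... osx-worker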

Deployment manager fails with a cryptic error

So, I have a deployment manager configuration that is supposed to create a GKE cluster.
Here is the configuration:
resources:
- name: mycluster
  type: container.v1.cluster
  properties:
    zone: us-central1-a
    cluster:
      name: mycluster
      description: hello mycluster
      masterAuth:
        username: admin
        password: password
      loggingService: logging.googleapis.com
      monitoringService: monitoring.googleapis.com
      addonsConfig:
        httpLoadBalancing:
          disabled: false
        horizontalPodAutoscaling:
          disabled: false
      nodePools:
      - name: default
        initialNodeCount: 3
        config:
          machineType: n1-standard-1
          diskSizeGb: 100
          oauthScopes:
          - https://www.googleapis.com/auth/compute
          - https://www.googleapis.com/auth/devstorage.read_only
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          labels:
            nodepool: default
        autoscaling:
          enabled: false
      - name: other
        initialNodeCount: 2
        config:
          machineType: n1-standard-2
          diskSizeGb: 100
          oauthScopes:
          - https://www.googleapis.com/auth/compute
          - https://www.googleapis.com/auth/devstorage.read_only
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          labels:
            nodepool: other
        autoscaling:
          enabled: false
      locations:
      - us-central1-a
      - us-central1-b
      - us-central1-c
When I run gcloud deployment-manager deployments create mydeployment --config config.yaml, the deployment runs for a few minutes and fails with:
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation operation-xxxxxxxxxx-xxxxxxxxx-xxxxxxx-xxxxxxx:
errors:
- code: RESOURCE_ERROR
location: /deployments/mydeployment/resources/mycluster
message: 'Unexpected response from resource of type container.v1.cluster: 404 {"statusMessage":"Not
Found","requestPath":null}'
The cluster actually does get successfully created in GKE, and I can interact with it as normal. Deleting the failed deployment with gcloud deployment-manager deployments delete mydeployment deletes the deployment, but leaves the cluster hanging around.
What am I doing wrong here? I've tried other container.v1.cluster samples from around the web (such as https://github.com/mkarthikworld/caddy/tree/master/gke-caddy); they all fail for me with the same error.
Not sure where else to look.