What I'm looking for is a yq query that returns the service names that are using
a specified volume for a given docker-compose.yml file.
For example, in the stripped down docker-compose.yml file below, say I am looking for the names of all services that
use the volume v-app-olorin.
version: "3"
services:
arwen:
this: that
volumes:
- v-app-mithrandir:/data/mithrandir
- v-app-olorin:/data/olorin
boromir:
volumes:
- v-app-mithrandir:/data/mithrandir
- v-app-stormcrow:/data/stormcrow
cirdan:
volumes:
- v-app-mithrandir:/data/mithrandir
- v-app-olorin:/data/olorin
volumes:
v-app-mithrandir:
name: v-app-mithrandir
v-app-olorin:
name: v-app-olorin
v-app-stormcrow:
name: v-app-stormcrow
The expected response would be:
arwen
cirdan
I can match simple key values with something like this:
yq e '.services | with_entries(select(.value.this == "that")) | to_entries | .[] | .key' docker-compose.yml
arwen
But I'm having trouble matching an element of the volumes array. Thank you for any help.
Here's an expression that does that:
yq '.services[] | select(.volumes[] | contains("v-app-olorin")) | key' docker-compose.yml
Explanation:
splat out the services entries into their individual nodes: .services[]
select the ones that have "v-app-olorin" in their volumes array: select(.volumes[] | contains("v-app-olorin"))
get the key of each matching service entry: key
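Run against the sample file above, it should print exactly the two expected service names:
yq '.services[] | select(.volumes[] | contains("v-app-olorin")) | key' docker-compose.yml
arwen
cirdan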
Disclaimer: I wrote yq
Related
I am trying to read data from a table in a PostgreSQL database and proceed with an ETL project. I have a Docker environment using this docker-compose:
version: "3.3"
services:
spark-master:
image: docker.io/bitnami/spark:3.3
ports:
- "9090:8080"
- "7077:7077"
volumes:
- /opt/spark-apps
- /opt/spark-data
environment:
- SPARK_LOCAL_IP=spark-master
- SPARK_WORKLOAD=master
spark-worker-a:
image: docker.io/bitnami/spark:3.3
ports:
- "9091:8080"
- "7000:7000"
depends_on:
- spark-master
environment:
- SPARK_MASTER=spark://spark-master:7077
- SPARK_WORKER_CORES=1
- SPARK_WORKER_MEMORY=1G
- SPARK_DRIVER_MEMORY=1G
- SPARK_EXECUTOR_MEMORY=1G
- SPARK_WORKLOAD=worker
- SPARK_LOCAL_IP=spark-worker-a
volumes:
- /opt/spark-apps
- /opt/spark-data
spark-worker-b:
image: docker.io/bitnami/spark:3.3
ports:
- "9092:8080"
- "7001:7000"
depends_on:
- spark-master
environment:
- SPARK_MASTER=spark://spark-master:7077
- SPARK_WORKER_CORES=1
- SPARK_WORKER_MEMORY=1G
- SPARK_DRIVER_MEMORY=1G
- SPARK_EXECUTOR_MEMORY=1G
- SPARK_WORKLOAD=worker
- SPARK_LOCAL_IP=spark-worker-b
volumes:
- /opt/spark-apps
- /opt/spark-data
postgres:
container_name: postgres_container
image: postgres:11.7-alpine
environment:
POSTGRES_USER: admin
POSTGRES_PASSWORD: admin
volumes:
- /data/postgres
ports:
- "4560:5432"
restart: unless-stopped
# jupyterlab with pyspark
jupyter-pyspark:
image: jupyter/pyspark-notebook:latest
environment:
JUPYTER_ENABLE_LAB: "yes"
ports:
- "9999:8888"
volumes:
- /app/data
I was successful in connecting to the DB, but I can't print any data. Here's my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder\
    .appName("salesETL")\
    .config("spark.driver.extraClassPath", "./postgresql-42.5.1.jar")\
    .getOrCreate()

df = spark.read.format("jdbc").option("url", "jdbc:postgresql://postgres_container:5432/postgres")\
    .option("dbtable", "sales")\
    .option("driver", "org.postgresql.Driver")\
    .option("user", "admin")\
    .option("password", "admin").load()

df.show(10).toPandas()
With .toPandas() it gives me this error:
AttributeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 df.show(10).toPandas()
AttributeError: 'NoneType' object has no attribute 'toPandas'
Without .toPandas() it prints the columns but no data:
+--------+----------+-----------+-------------+-----------------+-------------+--------------+----------+--------+-----------+
|order_id|order_date|customer_id|customer_name|customer_lastname|customer_city|customer_state|product_id|quantity|order_value|
+--------+----------+-----------+-------------+-----------------+-------------+--------------+----------+--------+-----------+
+--------+----------+-----------+-------------+-----------------+-------------+--------------+----------+--------+-----------+
I am new to PySpark/Spark, so I can't figure out what I am missing. It's my very first project. What could it be?
PS: when I run type(df) it returns pyspark.sql.dataframe.DataFrame.
show returns nothing (None), so you should call the pandas conversion on the DataFrame directly. Moreover, I think it's to_pandas, not toPandas (https://spark.apache.org/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.to_pandas.html). So the error should vanish with something like:
df.to_pandas()
About the empty dataset: is there any error? If there is no error, are you sure that any records exist in the table?
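A minimal sketch of the point about show, with df being the JDBC DataFrame from the question (note that the camel-case toPandas() is the method on pyspark.sql.DataFrame, while the snake-case to_pandas() linked above belongs to the pandas-on-Spark API):
df.show(10)           # prints up to 10 rows to stdout and returns None, so nothing can be chained after it
pdf = df.toPandas()   # convert the Spark DataFrame itself to a pandas DataFrame
print(pdf.head(10))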
Well, I couldn't work out why this happened or how to fix it. Instead, I took a workaround: I loaded the data into Python using pandas and then converted the pandas DataFrame to a PySpark DataFrame.
Here's my code:
import psycopg2
import pandas as pd
from pyspark.sql import SparkSession
from sqlalchemy import create_engine

appName = "salesETL"
master = "local"
spark = SparkSession.builder.master(master).appName(appName).getOrCreate()

engine = create_engine(
    "postgresql+psycopg2://admin:admin@postgres_container/postgres?client_encoding=utf8")
pdf = pd.read_sql('select * from sales.sales', engine)

# Convert pandas dataframe to Spark DataFrame
df = spark.createDataFrame(pdf)
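From there, the usual DataFrame calls should show the data, for example (a quick sanity check, assuming sales.sales actually contains rows):
df.show(10)
df.printSchema()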
I have the following yaml file:
values: |
  nameOverride: my-service
  fullnameOverride: ""
  namespace: my-ns
  containerApps:
    - name: app-frontend
      image_tag: xxxxxxx
    - name: app-backend
      image_tag: xxxxxxx
I'm looking for a way to replace e.g. xxxxxxx with yyyyyy in containerApps.[app-frontend].image_tag within the multi-line value (values: |).
The output being:
values: |
  nameOverride: my-service
  fullnameOverride: ""
  namespace: my-ns
  containerApps:
    - name: app-frontend
      image_tag: yyyyyy
    - name: app-backend
      image_tag: xxxxxxx
How can this be accomplished using yq?
Any help is welcome.
Here's a solution using mikefarah/yq. It decodes the multiline string using @yamld, makes the substitution using sub, and encodes the result back using @yaml.
yq '
  .values |= (
    @yamld | (
      .containerApps[] | select(.name == "app-frontend") | .image_tag
    ) |= sub("xxxxxxx", "yyyyyy")
    | @yaml
  )
'
values: |
  nameOverride: my-service
  fullnameOverride: ""
  namespace: my-ns
  containerApps:
    - name: app-frontend
      image_tag: yyyyyy
    - name: app-backend
      image_tag: xxxxxxx
To update the file in-place (instead of just outputting it), use the -i flag.
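For example, assuming the document above is saved as values.yaml (the file name here is only illustrative):
yq -i '
  .values |= (
    @yamld | (
      .containerApps[] | select(.name == "app-frontend") | .image_tag
    ) |= sub("xxxxxxx", "yyyyyy")
    | @yaml
  )
' values.yaml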
I have the following (modified from https://stackoverflow.com/a/70152440/807037):
yq eval-all '
  .clusters = (
    (
      (.clusters[] | {.name: .}) as $item ireduce ({}; . * $item)
    ) as $uniqueMap |
    ($uniqueMap | to_entries | .[]) as $item ireduce([]; . + $item.value)
  ) |
  .contexts = (
    (
      (.contexts[] | {.name: .}) as $item ireduce ({}; . * $item)
    ) as $uniqueMap |
    ($uniqueMap | to_entries | .[]) as $item ireduce([]; . + $item.value)
  ) |
  select(fi == 0)' konfig monfig
How can the following common code be extracted so as to keep the script DRY:
.«KEY» = (
  (
    (.«KEY»[] | {.name: .}) as $item ireduce ({}; . * $item)
  ) as $uniqueMap |
  ($uniqueMap | to_entries | .[]) as $item ireduce([]; . + $item.value)
)
Input files:
# konfig
apiVersion: apiVersion-keep
clusters:
  - cluster:
      certificate-authority-data: cad-0
      server: server-0
    name: name-0
  - cluster:
      certificate-authority-data: cad-1-discard
      server: server-1-discard
    name: name-1
  - cluster:
      certificate-authority-data: cad-2
      server: server-2
    name: name-2
contexts:
  - context:
      cluster: cluster-0
      user: user-0
    name: name-0
  - context:
      cluster: cluster-1-discard
      user: user-1-discard
    name: name-1
  - context:
      cluster: cluster-2
      user: user-2
    name: name-2
current-context: name-keep
# monfig
apiVersion: apiVersion-discard
clusters:
  - cluster:
      certificate-authority-data: cad-1-keep
      server: server-1-keep
    name: name-1
contexts:
  - context:
      cluster: cluster-1-keep
      user: user-1-keep
    name: name-1
current-context: name-discard
Expected:
apiVersion: apiVersion-keep
clusters:
  - cluster:
      certificate-authority-data: cad-0
      server: server-0
    name: name-0
  - cluster:
      certificate-authority-data: cad-1-keep
      server: server-1-keep
    name: name-1
  - cluster:
      certificate-authority-data: cad-2
      server: server-2
    name: name-2
contexts:
  - context:
      cluster: cluster-0
      user: user-0
    name: name-0
  - context:
      cluster: cluster-1-keep
      user: user-1-keep
    name: name-1
  - context:
      cluster: cluster-2
      user: user-2
    name: name-2
current-context: name-keep
The issue is a little trickier than it appears, because one use case for |= is updating each matching left-hand-side node with respect to itself. .clusters results in two nodes (as does .contexts), and each of those nodes is updated independently; yq doesn't know to group the nodes together. After playing around a little I got this to work:
./yq eval-all '
  . ref $r |
  with(("clusters", "contexts");
    $r[.] = (
      (
        ($r[.] | .[] | {.name: .}) as $item ireduce ({}; . * $item)
      ) as $uniqueMap |
      ($uniqueMap | to_entries | .[]) as $item ireduce([]; . + $item.value)
    )
  ) | select(fi == 0)' file1.yaml file2.yaml
Explanation:
. ref $r creates a reference to the root context, called $r. This matches the top-level nodes (file1 and file2).
Using the with operator, I can parameterise the merge expression against $r, passing in the two paths that need to be merged. Each path is run against the root context $r via $r[.].
Hope that makes sense!
Disclaimer: I wrote yq
Generally speaking (due to the missing sample data), you can use the update operator |= (available since v4.3.0), which lets you address the current item by context . on the RHS. Then, as the absolute context only appears on the LHS, you can simply list at once all the contexts you want this applied to:
(.clusters, .contexts) |= ( .[] | ... )
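Filled in with the de-duplication logic from the question, the pattern could look like the sketch below (untested; note the caveat from the answer above: with eval-all over two files, each document's .clusters and .contexts would still be updated independently rather than merged across files):
(.clusters, .contexts) |= (
  (
    (.[] | {.name: .}) as $item ireduce ({}; . * $item)
  ) as $uniqueMap |
  ($uniqueMap | to_entries | .[]) as $item ireduce([]; . + $item.value)
)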
Here is my docker-compose.yaml:
version: '3.3'
mongo:
  build:
    context: '.'
    dockerfile: 'Dockerfile'
  environment:
    MONGO_INITDB_DATABASE: 'mydb'
  ports:
    - '27017:27017'
  volumes:
    - 'data-storage:/data/db'
  networks:
    mynet:
volumes:
  data-storage:
networks:
  mynet:
Here is my Dockerfile:
FROM mongo:latest
COPY ./initdb.js /docker-entrypoint-initdb.d/
And finally here is my initdb.js:
db.createCollection("strategyitems");
db.strategyitems.createIndex( {strategy: 1 }, { unique: false } );
db.strategyitems.createIndex( {strategy: 1, symbol: 1 }, { unique: true } );
db.strategyitems.insertMany([
{ strategy: "crypto", symbol: "btcusd", eval_period: 15, buy_booster: 8.0, sell_booster: 5.0, buy_lot: 0.2, sell_lot: 0.2 },
{ strategy: "crypto", symbol: "ethusd", eval_period: 15, buy_booster: 8.0, sell_booster: 5.0, buy_lot: 0.2, sell_lot: 0.2 },
{ strategy: "crypto", symbol: "neousd", eval_period: 15, buy_booster: 8.0, sell_booster: 5.0, buy_lot: 0.2, sell_lot: 0.2 }
]);
The container builds and starts successfully... but the db statements above never get executed.
If I log into the container, the folder /docker-entrypoint-initdb.d/ contains initdb.js... so I'd expect the db to get initialized.
Am I missing something?
The supplied compose file doesn't work for me as posted; I had to edit it to get it up & running (v18.06 CE), so heads-up on that.
version: '3.3'
services:
  mongo:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      MONGO_INITDB_DATABASE: 'mydb'
    ports:
      - '27017:27017'
    volumes:
      - 'data-storage:/data/db'
    networks:
      mynet:
volumes:
  data-storage:
networks:
  mynet:
Next, if you ran docker-compose up before adding the initdb.js file and then stopped with docker-compose down, note that docker-compose down stops the containers but doesn't remove the volume:
docker ps
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS         PORTS                      NAMES
c412bbd9a22b   lumberjack_mongo   "docker-entrypoint.s…"   7 minutes ago   Up 6 minutes   0.0.0.0:27017->27017/tcp   lumberjack_mongo_1
docker volume ls
DRIVER    VOLUME NAME
local     lumberjack_data-storage
docker-compose down
Removing lumberjack_mongo_1 ... done
Removing network lumberjack_mynet
docker volume ls
DRIVER    VOLUME NAME
local     lumberjack_data-storage
The problem arises when docker-compose up is run while the volume already exists: Docker mounts the volume before the container starts up, Mongo does some pre-checks, and if it finds that the data directories are already present it skips the initdb sequence.
If you remove the volume after docker-compose down and then run docker-compose up, the volume is created from scratch, the pre-check finds nothing, and the initdb sequence runs:
docker volume rm lumberjack_data-storage
lumberjack_data-storage
docker-compose up
Creating network "lumberjack_mynet" with the default driver
Creating volume "lumberjack_data-storage" with default driver
Creating lumberjack_mongo_1 ... done
Attaching to lumberjack_mongo_1
[....]
mongo_1 | /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/initdb.js
mongo_1 | 2018-08-04T18:08:47.699+0000 I INDEX [LogicalSessionCacheRefresh] build index on: config.system.sessions properties: { v: 2, key: { lastUse: 1 }, name: "lsidTTLIndex", ns: "config.system.sessions", expireAfterSeconds: 1800 }
mongo_1 | 2018-08-04T18:08:47.745+0000 I NETWORK [conn2] received client metadata from 127.0.0.1:45324 conn2: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.0" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
mongo_1 | 2018-08-04T18:08:47.747+0000 I STORAGE [conn2] createCollection: initdb.strategyitems with generated UUID: 585edb14-bc63-4879-bc5d-504867fb5e12
mongo_1 | 2018-08-04T18:08:47.851+0000 I INDEX [conn2] build index on: initdb.strategyitems properties: { v: 2, key: { strategy: 1.0 }, name: "strategy_1", ns: "initdb.strategyitems" }
mongo_1 | 2018-08-04T18:08:47.851+0000 I INDEX [conn2] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
mongo_1 | 2018-08-04T18:08:47.852+0000 I INDEX [conn2] build index done. scanned 0 total records. 0 secs
mongo_1 | 2018-08-04T18:08:47.881+0000 I INDEX [conn2] build index on: initdb.strategyitems properties: { v: 2, unique: true, key: { strategy: 1.0, symbol: 1.0 }, name: "strategy_1_symbol_1", ns: "initdb.strategyitems" }
mongo_1 | 2018-08-04T18:08:47.881+0000 I INDEX [conn2] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
mongo_1 | 2018-08-04T18:08:47.882+0000 I INDEX [conn2] build index done. scanned 0 total records. 0 secs
mongo_1 | 2018-08-04T18:08:47.886+0000 I NETWORK [conn2] end connection 127.0.0.1:45324 (0 connections now open)
[....]
mongo_1 | MongoDB init process complete; ready for start up.
mongo_1 |
mongo_1 | 2018-08-04T18:08:48.933+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
mongo_1 | 2018-08-04T18:08:48.939+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=e90c80083360
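Alternatively, docker-compose down supports a -v/--volumes flag that also removes the named volumes declared in the compose file, so the same reset can be done in one step:
docker-compose down -v
docker-compose up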
I'm attempting to set up a service broker to add Postgres to our Cloud Foundry installation. We're running our system on VMware. I'm using this release in order to do that:
cf-contrib-release
I added the release in bosh:
#bosh releases
Acting as user 'director' on 'microbosh-ba846726bed7032f1fd4'
+-----------------------+----------------------+-------------+
| Name | Versions | Commit Hash |
+-----------------------+----------------------+-------------+
| cf | 208.12* | a0de569a+ |
| cf-autoscaling | 13* | 927bc7ed+ |
| cf-metrics | 34* | 22f7e1e1 |
| cf-mysql | 20* | caa23b3d+ |
| | 22* | af278086+ |
| cf-rabbitmq | 161* | 4d298aec |
| cf-riak-cs | 10* | 5e7e46c9+ |
| cf-services-contrib | 6* | 57fd2098+ |
| docker | 23* | 82346881+ |
| newrelic_broker | 1.3* | 1ce3471d+ |
| notifications-with-ui | 18* | 490b6446+ |
| postgresql-docker | 4* | a53c9333+ |
| push-console-release | console-du-jour-203* | d2d31585+ |
| spring-cloud-broker | 1.0.0* | efd69612 |
+-----------------------+----------------------+-------------+
(*) Currently deployed
(+) Uncommitted changes
Releases total: 13
I set up my resource pools and jobs in my yaml file according to this documentation:
http://bosh.io/docs/vsphere-cpi.html#resource-pools
This is how our cluster looks:
[screenshot of the VMware cluster]
And here is what I put in the yaml file:
resource_pools:
- name: default
  network: default
  stemcell:
    name: bosh-vsphere-esxi-ubuntu-trusty-go_agent
    version: '2865.1'
  cloud_properties:
    cpu: 2
    ram: 4096
    disk: 10240
    datacenters:
    - name: 'Universal City'
      clusters:
      - USH_UCS_CLOUD_FOUNDRY_NONPROD_01: {resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'}

jobs:
- name: gateways
  release: cf-services-contrib
  templates:
  - name: postgresql_gateway_ng
  instances: 1
  resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
  networks:
  - name: default
    default: [dns, gateway]
  properties:
    # Service credentials
    uaa_client_id: "cf"
    uaa_endpoint: http://uaa.devcloudwest.example.com
    uaa_client_auth_credentials:
      username: admin
      password: secret
And I'm getting an error when I run 'bosh deploy' that says:
Error 140003: Job `gateways' references an unknown resource pool `USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
Here's my yaml file in its entirety:
name: cf-22b9f4d62bb6f0563b71
director_uuid: fd713790-b1bc-401a-8ea1-b8209f1cc90c

releases:
- name: cf-services-contrib
  version: 6

compilation:
  workers: 3
  network: default
  reuse_compilation_vms: true
  cloud_properties:
    ram: 5120
    disk: 10240
    cpu: 2

update:
  canaries: 1
  canary_watch_time: 30000-60000
  update_watch_time: 30000-60000
  max_in_flight: 4

networks:
- name: default
  type: manual
  subnets:
  - range: exam 10.114..130.0/24
    gateway: exam 10.114..130.1
    cloud_properties:
      name: 'USH_UCS_CLOUD_FOUNDRY'

#resource_pools:
#  - name: common
#    network: default
#    size: 8
#    stemcell:
#      name: bosh-vsphere-esxi-ubuntu-trusty-go_agent
#      version: '2865.1'

resource_pools:
- name: default
  network: default
  stemcell:
    name: bosh-vsphere-esxi-ubuntu-trusty-go_agent
    version: '2865.1'
  cloud_properties:
    cpu: 2
    ram: 4096
    disk: 10240
    datacenters:
    - name: 'Universal City'
      clusters:
      - USH_UCS_CLOUD_FOUNDRY_NONPROD_01: {resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'}

jobs:
- name: gateways
  release: cf-services-contrib
  templates:
  - name: postgresql_gateway_ng
  instances: 1
  resource_pool: 'USH_UCS_CLOUD_FOUNDRY_NONPROD_01_RP'
  networks:
  - name: default
    default: [dns, gateway]
  properties:
    # Service credentials
    uaa_client_id: "cf"
    uaa_endpoint: http://uaa.devcloudwest.example.com
    uaa_client_auth_credentials:
      username: admin
      password: secret

- name: postgresql_service_node
  release: cf-services-contrib
  template: postgresql_node_ng
  instances: 1
  resource_pool: common
  persistent_disk: 10000
  properties:
    postgresql_node:
      plan: default
  networks:
  - name: default
    default: [dns, gateway]

properties:
  networks:
    apps: default
    management: default
  cc:
    srv_api_uri: http://api.devcloudwest.example.com
  nats:
    address: exam 10.114..130.11
    port: 25555
    user: nats #CHANGE
    password: secret
    authorization_timeout: 5

  service_plans:
    postgresql:
      default:
        description: "Developer, 250MB storage, 10 connections"
        free: true
        job_management:
          high_water: 230
          low_water: 20
        configuration:
          capacity: 125
          max_clients: 10
          quota_files: 4
          quota_data_size: 240
          enable_journaling: true
          backup:
            enable: false
          lifecycle:
            enable: false
            serialization: enable
            snapshot:
              quota: 1

  postgresql_gateway:
    token: f75df200-4daf-45b5-b92a-cb7fa1a25660
    default_plan: default
    supported_versions: ["9.3"]
    version_aliases:
      current: "9.3"
    cc_api_version: v2
  postgresql_node:
    supported_versions: ["9.3"]
    default_version: "9.3"
    max_tmp: 900
    password: secret
And here's a gist with the debug output from that error:
postgres_2423_debug.txt
The docs for the jobs blocks say:
resource_pool [String, required]: A valid resource pool name from the Resource Pools block. BOSH runs instances of this job in a VM from the named resource pool.
This needs to match the name of one of your resource_pools, namely default, not the name of the resource pool in vSphere.
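Concretely, in the manifest above the gateways job would reference the logical pool name, for example (a sketch of just the relevant lines, everything else unchanged):
jobs:
- name: gateways
  release: cf-services-contrib
  templates:
  - name: postgresql_gateway_ng
  instances: 1
  resource_pool: default   # the name defined under resource_pools, not the vSphere resource pool
  # networks, properties, etc. as before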
The only sections that have direct references to the IaaS are things that say cloud_properties. Specific names of resources (like networks, clusters, or datacenters in your vSphere, or subnets, AZs, and instance types in AWS) only show up in places that say cloud_properties.
You use that data to define "networks" and "resource pools" at a higher level of abstraction that is IaaS-agnostic, e.g. except for cloud properties, the specifications you give for resource pools are the same whether you're deploying to vSphere, AWS, OpenStack, etc.
Then your jobs reference these networks, resource pools, etc. by the logical name you've given to the abstractions. In particular, jobs don't require any IaaS-specific configuration whatsoever, just references to a logical network(s) and a resource pool that you've defined elsewhere in your manifest.