How do you run snowplow-bigquery-loader?

Where do you find/make/use the command:
# "listen" can be replaced with "init" to create an empty table
./snowplow-bigquery-mutator \
  listen \
  --config $CONFIG \
  --resolver $RESOLVER
This command is given by the snowplow-bigquery-loader documentation.

You should run it from a GCP compute instance. Ensure that both your --config and --resolver arguments are base64-encoded JSON, matching the examples provided in the documentation.
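For example (a sketch only; the file name and JSON contents below are placeholders, not taken from the Snowplow docs), encoding a resolver file with GNU base64 looks like this:

```shell
# Placeholder resolver file; use your real Iglu resolver JSON here
cat > resolver.json <<'EOF'
{"schema":"iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1","data":{}}
EOF

# -w 0 (GNU coreutils) disables line wrapping so the value stays a single token
RESOLVER=$(base64 -w 0 resolver.json)
```

Encode the loader config into $CONFIG the same way, then pass both to the mutator as in the command above.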

Related

Detect if argo workflow is given unused parameters

My team runs into a recurring issue where, if we misspell a parameter for our Argo workflows, that parameter gets ignored without error. For example, say I run the following submission command, where the true (optional) parameter is validation_data_config:
argo submit --from workflowtemplate/training \
-p output=$( artifacts resolve-url $BUCKET $PROJECT $TAG) \
-p tag=${TAG} \
-p training_config="$( yq . training.yaml )" \
-p training_data_config="$( yq . train-data.yaml )" \
-p validation-data-config="$( yq . validation-data.yaml )" \
-p wandb-project="cyclegan_c48_to_c384" \
-p cpu=4 \
-p memory="15Gi" \
--name "${NAME}" \
--labels "project=${PROJECT},experiment=${EXPERIMENT},trial=${TRIAL}"
The validation configuration is ignored and the job is run without validation metrics because I used hyphens instead of underscores.
I understand the parameters should use consistent hyphen/underscore naming, but we've also had this happen with e.g. the "memory" parameter.
Is there any way to detect this automatically, to have the submission error if a parameter is unused, or even to get a list of parameters for a given workflow template so I can write such detection myself?
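One option (a sketch, untested against a live cluster, and assuming the template declares its parameters under spec.arguments.parameters) is to fetch the declared parameter names with `argo template get training -o json | jq -r '.spec.arguments.parameters[].name'` and reject any submitted name not in that list. The validation logic itself could look like this, with the declared list stubbed into a variable so the snippet is self-contained:

```shell
# Stub: in practice, populate this from `argo template get ... -o json | jq ...`
declared="output
tag
training_config
training_data_config
validation_data_config
wandb_project
cpu
memory"

# Fail if any submitted -p name is not declared in the template
check_params() {
  for p in "$@"; do
    if ! printf '%s\n' "$declared" | grep -qx -- "$p"; then
      echo "unknown parameter: $p" >&2
      return 1
    fi
  done
}
```

Wrapping argo submit in a script that runs this check first would have caught the validation-data-config typo before the job started.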

How to make Powershell pass a list as argument to gcloud

I want to submit my neural network model to Google Cloud via the following command, as in the tutorial:
gcloud ai-platform jobs submit training ${JOB_NAME} \
--region=us-central1 \
--master-image-uri=gcr.io/cloud-ml-public/training/pytorch-gpu.1-10 \
--scale-tier=CUSTOM \
--master-machine-type=n1-standard-8 \
--master-accelerator=type=nvidia-tesla-p100,count=1 \
--job-dir=${JOB_DIR} \
--package-path=./trainer \
--module-name=trainer.task \
-- \
--train-files=gs://cloud-samples-data/ai-platform/chicago_taxi/training/small/taxi_trips_train.csv \
--eval-files=gs://cloud-samples-data/ai-platform/chicago_taxi/training/small/taxi_trips_eval.csv \
--num-epochs=10 \
--batch-size=100 \
--learning-rate=0.001
I was working in PowerShell and I have a problem with the master-accelerator argument, which should be a dictionary. I don't know how to pass such a value to gcloud. I have tried #{count=1; type=...}, but received a Bad syntax for dict arg: [#] error.
How can I pass a list of parameters in PowerShell such that the gcloud submit command accepts it?
I thank you in advance for your help.
EDIT
I have tried using delimiters such as ^^^^:^^^^, but this still does not work (Invalid delimeter).
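I can't verify this in a live PowerShell session, but gcloud's dict-valued flags accept a single comma-separated string, and PowerShell splits an unquoted comma-separated value into an array before gcloud ever sees it. Quoting the whole value, e.g. --master-accelerator="type=nvidia-tesla-p100,count=1", should keep it as one token. The same idea, sketched in POSIX shell:

```shell
# Build the dict-style flag as one quoted string so the shell passes it verbatim
ACCEL='type=nvidia-tesla-p100,count=1'
FLAG="--master-accelerator=$ACCEL"
```

Then pass "$FLAG" (quoted) to the gcloud ai-platform jobs submit training command in place of the bare flag.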

How to run schema registry container for sasl plain kafka cluster

I want to run the cp-schema-registry image on AWS ECS, so I am trying to get it to run on docker locally. I have a command like this:
docker run -e SCHEMA_REGISTRY_HOST_NAME=schema-registry \
-e SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS="1.kafka.address:9092,2.kafka.address:9092,3.kafka.address:9092" \
-e SCHEMA_REGISTRY_KAFKASTORE_SECURITY_PROTOCOL=SASL_PLAINTEXT \
confluentinc/cp-schema-registry:5.5.3
(I have replaced the kafka urls).
My consumers/producers connect to the cluster with params:
["sasl.mechanism"] = "PLAIN"
["sasl.username"] = <username>
["sasl.password"] = <password>
Docs seem to indicate there is a file I can create with these parameters, but I don't know how to pass this into the docker run command. Can this be done?
Thanks to OneCricketeer for the help above with the SCHEMA_REGISTRY_KAFKASTORE_SASL_JAAS_CONFIG var. The command ended up like this (I added port 8081:8081 so I could test with curl):
docker run -p 8081:8081 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry \
-e SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS="1.kafka.broker:9092,2.kafka.broker:9092,3.kafka.broker:9092" \
-e SCHEMA_REGISTRY_KAFKASTORE_SECURITY_PROTOCOL=SASL_SSL \
-e SCHEMA_REGISTRY_KAFKASTORE_SASL_MECHANISM=PLAIN \
-e SCHEMA_REGISTRY_KAFKASTORE_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="user" password="pass";' confluentinc/cp-schema-registry:5.5.3
Then test with curl localhost:8081/subjects and get [] as a response.

How Can I Debug apiserver Startup When No Logs Are Generated?

I am trying to install the aws-encryption-provider following the steps at https://github.com/kubernetes-sigs/aws-encryption-provider. After I added the --encryption-provider-config=/etc/kubernetes/aws-encryption-provider-config.yaml parameter to /etc/kubernetes/manifests/kube-apiserver.yaml, the apiserver process did not restart, nor did I see any error messages.
What technique can I use to see errors created when apiserver starts?
Realizing that the apiserver is running inside a docker container, I connected to one of my controller nodes using SSH. Then I started a container using the following command to get a shell prompt using the same docker image that apiserver is using.
docker run \
-it \
--rm \
--entrypoint /bin/sh \
--volume /etc/kubernetes:/etc/kubernetes:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /etc/pki:/etc/pki:ro \
--volume /etc/pki/ca-trust:/etc/pki/ca-trust:ro \
--volume /etc/pki/tls:/etc/pki/tls:ro \
--volume /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro \
--volume /etc/kubernetes/ssl:/etc/kubernetes/ssl:ro \
--volume /var/run/kmsplugin:/var/run/kmsplugin \
k8s.gcr.io/kube-apiserver:v1.18.5
Once inside that container, I could run the same command that is set up in kube-apiserver.yaml. This command was:
kube-apiserver \
--encryption-provider-config=/etc/kubernetes/aws-encryption-provider-config.yaml \
--advertise-address=10.250.203.201 \
...
--service-node-port-range=30000-32767 \
--storage-backend=etcd3
I elided the bulk of the command since you'll need to get specific values from your own kube-apiserver.yaml file.
Using this technique showed me the error message:
Error: error while parsing encryption provider configuration file
"/etc/kubernetes/aws-encryption-provider-config.yaml": error while parsing
file: resources[0].providers[0]: Invalid value:
config.ProviderConfiguration{AESGCM:(*config.AESConfiguration)(nil),
AESCBC:(*config.AESConfiguration)(nil), Secretbox:(*config.SecretboxConfiguration)
(nil), Identity:(*config.IdentityConfiguration)(nil), KMS:(*config.KMSConfiguration)
(nil)}: provider does not contain any of the expected providers: KMS, AESGCM,
AESCBC, Secretbox, Identity
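For reference, this error means that resources[0].providers[0] in the config file named none of the recognized provider keys. A minimal EncryptionConfiguration that does name one (a sketch only; the endpoint socket path is illustrative and must match however your aws-encryption-provider is actually deployed) looks roughly like:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          name: aws-encryption-provider
          endpoint: unix:///var/run/kmsplugin/socket.sock
          cachesize: 1000
          timeout: 3s
      - identity: {}
```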

Hasura use SSL certificates for Postgres connection

I can run Hasura from the Docker image.
docker run -d -p 8080:8080 \
-e HASURA_GRAPHQL_DATABASE_URL=postgres://username:password@hostname:port/dbname \
-e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
hasura/graphql-engine:latest
But I also have a Postgres instance that can only be accessed with three certificates:
psql "sslmode=verify-ca sslrootcert=server-ca.pem \
sslcert=client-cert.pem sslkey=client-key.pem \
hostaddr=$DB_HOST \
port=$DB_PORT \
user=$DB_USER dbname=$DB_NAME"
I don't see a configuration for Hasura that allows me to connect to a Postgres instance in such a way.
Is this something I'm supposed to pass into the database connection URL?
How should I do this?
You'll need to mount your certificates into the Docker container and then configure libpq (which is what Hasura uses underneath) to use the required certificates via its environment variables. It'll be something like this (I haven't tested this):
docker run -d -p 8080:8080 \
-v /absolute-path-of-certs-folder:/certs \
-e HASURA_GRAPHQL_DATABASE_URL=postgres://hostname:port/dbname \
-e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
-e PGSSLMODE=verify-ca \
-e PGSSLCERT=/certs/client-cert.pem \
-e PGSSLKEY=/certs/client-key.pem \
-e PGSSLROOTCERT=/certs/server-ca.pem \
hasura/graphql-engine:latest
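One caveat worth adding (not from the original answer): libpq refuses a client key file that is readable by group or others, so restrict its permissions before mounting the folder. A sketch, using a local stand-in for your real certs directory:

```shell
# libpq requires the client key to be private (0600) or it rejects the connection;
# "certs" and the file name here stand in for your real certificates folder
mkdir -p certs
touch certs/client-key.pem
chmod 0600 certs/client-key.pem
```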