What is the Concourse (fly) CLI command to list all the available resource types?

While troubleshooting some pipelines a week ago I stumbled upon a fly CLI command that lists the resource types available in Concourse (i.e. the types for which one would not need to declare resource_types in the pipeline).
Can someone help me dig out this command again?
I'll post an answer if I manage to find it again.

I think you are looking for this command, which lists all resource types available in Concourse:
fly -t <target-name> workers -d
The detailed worker output includes a resource types column, where you can check which types are available.

Related

Fetching Cloud Data Fusion runtime info

I want to pass the runid of a Data Fusion pipeline to some function upon pipeline completion, but I am not able to find any runtime variable which holds this value. Please help!
As an update to the previous answer, the first thing to do is to obtain the details of the deployed pipelines in a given namespace. For this, the following endpoint should be queried: '/v3/namespaces/${NAMESPACE}/apps', where ${NAMESPACE} is the namespace where the pipeline is deployed.
This endpoint returns a list of the pipelines deployed in the namespace ${NAMESPACE} (not the pipeline JSON, just a high-level description list). Once the pipeline list is obtained, to get the run metrics of a given pipeline, the following endpoint should be called: '/v3/namespaces/${NAMESPACE}/apps/${PIPELINE}/workflows/DataPipelineWorkflow/runs', where ${PIPELINE} is the name of the pipeline. This endpoint returns the details of all the runs for this pipeline, and this is where the run_id can be obtained. The field containing the run_id is actually called runid in this list.
With the run_id, you can then obtain all the run logs, for example by querying the endpoint '{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}/workflows/DataPipelineWorkflow/runs/{run["runid"]}/logs?start={run["start"]}&stop={run["start"]}'. The previous snippet is a Python snippet where run is a dictionary containing the run details of a particular run.
As explained in the CDAP microservice guide, to call these endpoints the CDAP endpoint must be obtained by running the command: gcloud beta data-fusion instances describe --project=${PROJECT} --location=${REGION} --format="value(apiEndpoint)" ${INSTANCE_ID}. The authentication token will also be needed; it can be obtained by running: gcloud auth print-access-token.
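Putting the pieces above together, here is a rough Python sketch (using the requests library) of those calls. cdap_endpoint, namespace and pipeline are placeholders you would fill in from the gcloud commands mentioned; this only illustrates the endpoints described above and is not official sample code.

import subprocess
import requests

# Placeholders: fill in from 'gcloud beta data-fusion instances describe ... --format="value(apiEndpoint)"'
cdap_endpoint = "<apiEndpoint from gcloud describe>"
namespace = "default"
pipeline = "<your pipeline name>"

# Token from 'gcloud auth print-access-token', as described above.
token = subprocess.check_output(["gcloud", "auth", "print-access-token"], text=True).strip()
headers = {"Authorization": f"Bearer {token}"}

# 1. List the pipelines deployed in the namespace (high-level description list).
apps = requests.get(f"{cdap_endpoint}/v3/namespaces/{namespace}/apps", headers=headers).json()

# 2. List the runs of one pipeline; each entry carries its run id in the 'runid' field.
runs = requests.get(
    f"{cdap_endpoint}/v3/namespaces/{namespace}/apps/{pipeline}/workflows/DataPipelineWorkflow/runs",
    headers=headers,
).json()

# 3. Fetch the logs of one run (here simply the first entry in the list); optional
#    start/stop query parameters, as in the endpoint quoted above, can bound the time window.
run = runs[0]
logs = requests.get(
    f"{cdap_endpoint}/v3/namespaces/{namespace}/apps/{pipeline}/workflows/DataPipelineWorkflow/runs/{run['runid']}/logs",
    headers=headers,
).text
print(run["runid"], len(logs))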
The correct answer has been provided by @Edwin Elia in the comment section:
Retrieving the run-id of a Data Fusion pipeline within its run or the predecessor pipeline's is not possible currently. Here is an enhancement that you can track that would make it possible.
As for retrieving the run_id value after pipeline completion, you should be able to use the REST API from the CDAP documentation to get information on the run, including the run-id.

Is there a repository for downloading sample component logs of OpenStack?

I am new to OpenStack. Is there a repository where I can download sample OpenStack logs of each component to be used for analysis? I need to look for logs of each component that have a log level of error or critical. I found some logs on GitHub and openstack.org, but they are mostly Nova, Glance and Neutron logs; for the other components there are no log files to be found. I need logs for Cinder, Swift, Heat and Ceilometer. Can someone help me with this?
Thanks in advance. :)
You can get logs from OpenStack Zuul, the CI infrastructure for OpenStack that tests all new code. They are not going to be exactly the same as regular production logs, as the test runs may contain experimental code, but they should give you an idea at least.
http://zuul.openstack.org/
If you click on one of the entries to expand the list and then pick one of the jobs, you should be able to get logs from all components used in that test.

How to run multiple Kubernetes jobs in sequence?

I would like to run a sequence of Kubernetes jobs one after another. It's okay if they are run on different nodes, but it's important that each one run to completion before the next one starts. Is there anything built into Kubernetes to facilitate this? Other architecture recommendations also welcome!
This requirement to add control flow, even if it's a simple sequential flow, is outside the scope of Kubernetes native entities as far as I know.
There are many workflow engine implementations for Kubernetes; most of them focus on solving CI/CD but are generic enough for you to use however you want.
Argo: https://applatix.com/open-source/argo/
Adds a custom resource definition to Kubernetes for a Workflow entity
Brigade: https://brigade.sh/
Takes a more serverless-like approach and is built on JavaScript, which is very flexible
Codefresh: https://codefresh.io
Has a unique approach where you can use the SaaS offering to get started easily, without complicated installation and maintenance, and you can point Codefresh at your Kubernetes cluster to run the workflows on.
Feel free to Google for "Kubernetes Workflow", and discover the right platform for yourself.
Disclaimer: I work at Codefresh
I would try to use CronJobs and set concurrencyPolicy to Forbid so that concurrent jobs are not run.
I have worked with IBM TWS (Workload Automation), which is a scheduler similar to cron where you can specify the dependencies between jobs.
There, you can specify a job to run only after its dependencies have run, using the follows keyword.
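For illustration, the core pattern that all of these approaches automate (create a Job, wait for it to run to completion, then create the next one) can be sketched with the official Kubernetes Python client. This is only a rough sketch, assuming the kubernetes package and a working kubeconfig are available; the job bodies are hypothetical placeholders.

import time
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

def run_job_and_wait(job_manifest, namespace="default", poll_seconds=10):
    # Create the Job, then poll its conditions until it completes or fails.
    batch.create_namespaced_job(namespace=namespace, body=job_manifest)
    name = job_manifest["metadata"]["name"]
    while True:
        status = batch.read_namespaced_job_status(name, namespace).status
        for cond in status.conditions or []:
            if cond.type == "Complete" and cond.status == "True":
                return
            if cond.type == "Failed" and cond.status == "True":
                raise RuntimeError(f"Job {name} failed")
        time.sleep(poll_seconds)

# Hypothetical placeholder jobs; each one just echoes its step number.
jobs = [
    {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"step-{i}"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "main",
                        "image": "busybox",
                        "command": ["sh", "-c", f"echo step {i}"],
                    }],
                    "restartPolicy": "Never",
                }
            }
        },
    }
    for i in (1, 2, 3)
]

for job in jobs:
    run_job_and_wait(job)  # each job runs to completion before the next starts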

Kubernetes command logging on Google Cloud Platform for PCI Compliance

Using Kubernetes' kubectl I can execute arbitrary commands on any pod such as kubectl exec pod-id-here -c container-id -- malicious_command --steal=creditcards
Should that ever happen, I would need to be able to pull up a log saying who executed the command and what command they executed. This includes the case where they decide to run something else by simply starting /bin/bash and then stealing data through the tty.
How would I see which authenticated user executed the command as well as the command they executed?
Audit logging is not currently offered, but the Kubernetes community is working to get it available in the 1.4 release, which should come around the end of September.
There are 3rd-party solutions that can solve the auditing issue, and if you're looking for PCI compliance, as the title implies, solutions exist that help solve the broader problem, not just auditing.
Here is a link to such a solution by Twistlock. https://info.twistlock.com/guide-to-pci-compliance-for-containers
Disclaimer: I work for Twistlock.

Best practice for getting RDS password to docker container on ECS

I am using Postgres Amazon RDS and Amazon ECS for running my docker containers.
The question is: what is the best practice for getting the username and password for the RDS database into the docker containers running on ECS?
I see a few options:
Build the credentials into the docker image. I don't like this, since everyone with access to the image can then get the password.
Put the credentials in the userdata of the launch configuration used by the autoscaling group for ECS. With this approach, all docker containers running on my ECS cluster have access to the credentials. I don't really like that either: if an attacker finds a security hole in any of my services (even services that do not use the database), they will be able to get the credentials for the database.
Put the credentials in an S3 bucket and limit the access to that bucket with an IAM role that the ECS instances have. Same drawbacks as putting them in the userdata.
Put the credentials in the Task Definition of ECS. I don't see any drawbacks here.
What are your thoughts on the best way to do this? Did I miss any options?
regards,
Tobias
Building the credentials into the container is never recommended. It makes the image hard to distribute and change.
Putting them into the ECS instances does not help your containers use them. The containers are isolated, and you'd end up with the credentials on all instances instead of just the ones running the containers that need them.
Putting them into S3 means you'll have to write that retrieval functionality into your container, and it's another place to keep configuration.
Putting them into your task definition is the recommended way. You can use the environment portion for this. It's flexible, and it's also how PaaS offerings like Heroku and Elastic Beanstalk pass DB connection strings to Ruby on Rails and other services. A last benefit is that it makes it easy to use your containers against different databases (like dev, test, prod) without rebuilding containers or building in weird functionality.
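As a rough sketch of what that looks like outside the console, a new task definition revision with the credentials in the environment section could be registered with boto3. All names, the image and the values below are made-up placeholders, not a recommendation of specific settings.

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="my-app",  # hypothetical task definition family
    containerDefinitions=[
        {
            "name": "my-app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
            "memory": 512,
            # Credentials exposed as plain environment variables; anyone who can
            # read the task definition can read these values.
            "environment": [
                {"name": "DB_HOST", "value": "mydb.example.us-east-1.rds.amazonaws.com"},
                {"name": "DB_USER", "value": "app_user"},
                {"name": "DB_PASSWORD", "value": "change-me"},
            ],
        }
    ],
)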
The accepted answer recommends configuring environment variables in the task definition. This configuration is buried deep in the ECS web console. You have to:
Navigate to Task Definitions
Select the correct task and revision
Choose to create a new revision (not allowed to edit existing)
Scroll down to the container section and select the correct container
Scroll down to the Env Variables section
Add your configuration
Save the configuration and task revision
Choose to update your service with the new task revision
This tutorial has screenshots that illustrate where to go.
Full disclosure: This tutorial features containers from Bitnami and I work for Bitnami. However the thoughts expressed here are my own and not the opinion of Bitnami.
For what it's worth, while putting credentials into environment variables in your task definition is certainly convenient, it's generally regarded as not particularly secure -- other processes can access your environment variables.
I'm not saying you can't do it this way -- I'm sure there are lots of people doing exactly this, but I wouldn't call it "best practice" either. Using Amazon Secrets Manager or SSM Parameter Store is definitely more secure, although getting your credentials out of there for use has its own challenges and on some platforms those challenges may make configuring your database connection much harder.
Still -- it seems like a good idea that anyone running across this question at least be aware that using the task definition for secrets is ... shall we say ... frowned upon.
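If you do go the Secrets Manager or Parameter Store route, a hedged sketch of what that can look like is below: the container definition's secrets field has ECS pull the value at launch time, so the plaintext never sits in the task definition itself. The ARNs and names are placeholders, and the task's execution role must be allowed to read the referenced secret.

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="my-app",
    # Execution role that is allowed to read the referenced secret (placeholder ARN).
    executionRoleArn="arn:aws:iam::123456789012:role/myEcsTaskExecutionRole",
    containerDefinitions=[
        {
            "name": "my-app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
            "memory": 512,
            # DB_PASSWORD is injected from Secrets Manager (or an SSM parameter)
            # instead of being stored as plaintext in the task definition.
            "secrets": [
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-password",
                }
            ],
        }
    ],
)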