How to collect log data of a specific namespace in OpenShift? - kubernetes

I have a cluster with many namespaces.
I'm trying to collect log data from a specific namespace in my OpenShift cluster, but it is collecting the data from all namespaces. I tried to follow the OpenShift documentation on logging, but there is no mention of scoping the log data.
I followed this documentation:
https://docs.openshift.com/container-platform/4.7/logging/cluster-logging.html
I'm using fluentd as the log collector.

With Cluster Logging on OpenShift, you can forward logs only from selected namespaces, or from Pods matching a label you choose. *1
A sample CR that forwards logs from the my-project namespace to the Elasticsearch instance deployed by Cluster Logging could look as follows:
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
name: instance
namespace: openshift-logging
spec:
inputs:
- name: my-app-logs
application:
namespaces:
- my-project
pipelines:
- name: my-app
inputRefs:
- my-app-logs
outputRefs:
- default
You can customize the inputs field as you like; it can also select specific Pods using a matchLabels expression (see the sketch below). *2
The default output means logs are sent to the default Elasticsearch instance managed by Cluster Logging.
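For example, a minimal sketch of an input that selects application logs only from Pods carrying a given label, following the pattern documented at *2; the input name, pipeline name, and label key/value are placeholders:
spec:
  inputs:
  - name: my-labeled-logs
    application:
      selector:
        matchLabels:
          app: my-app        # hypothetical Pod label; replace with your own
  pipelines:
  - name: my-labeled-pipeline
    inputRefs:
    - my-labeled-logs
    outputRefs:
    - default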
*1: https://docs.openshift.com/container-platform/4.11/logging/cluster-logging-external.html
*2: https://docs.openshift.com/container-platform/4.7/logging/cluster-logging-external.html#cluster-logging-collector-log-forward-logs-from-application-pods_cluster-logging-external

Related

Flink kubernetes deployment - how to provide S3 credentials from Hashicorp Vault?

I'm trying to deploy a Flink stream processor to a Kubernetes cluster with the help of the official Flink kubernetes operator.
The Flink app also uses Minio as its state backend. Everything worked fine until I tried to provide the credentials from Hashicorp Vault in the following way:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-app
  namespace: default
spec:
  serviceAccount: sa-example
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      serviceAccountName: default:sa-example
      containers:
        - name: flink-main-container
          # ....
  flinkVersion: v1_14
  flinkConfiguration:
    presto.s3.endpoint: https://s3-example-api.dev.net
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3p://example-flink/example-1/high-availability/
    high-availability.cluster-id: example-1
    high-availability.namespace: example
    high-availability.service-account: default:sa-example
    # presto.s3.access-key: *
    # presto.s3.secret-key: *
    presto.s3.path-style-access: "true"
    web.upload.dir: /opt/flink
  jobManager:
    podTemplate:
      apiVersion: v1
      kind: Pod
      metadata:
        name: job-manager-pod-template
        annotations:
          vault.hashicorp.com/namespace: "/example/dev"
          vault.hashicorp.com/agent-inject: "true"
          vault.hashicorp.com/agent-init-first: "true"
          vault.hashicorp.com/agent-inject-secret-appsecrets.yaml: "example/Minio"
          vault.hashicorp.com/role: "example-serviceaccount"
          vault.hashicorp.com/auth-path: auth/example
          vault.hashicorp.com/agent-inject-template-appsecrets.yaml: |
            {{- with secret "example/Minio" -}}
            presto.s3.access-key: {{.Data.data.accessKey}}
            presto.s3.secret-key: {{.Data.data.secretKey}}
            {{- end }}
When I comment out the presto.s3.access-key and presto.s3.secret-key config values in the flinkConfiguration, replace them with the Hashicorp Vault annotations listed above, and try to provide them programmatically at runtime:
val configuration: Configuration = getSecretsFromFile("/vault/secrets/appsecrets.yaml")
val env = org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionEnvironment(configuration)
I receive the following error message:
java.io.IOException: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider#5331f738: profile file cannot be null, com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper#bc0353f: Failed to connect to service endpoint: ]
at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.uploadObject(PrestoS3FileSystem.java:1278) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:1226) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52) ~[flink-s3-fs-presto-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.FileSystemBlobStore.put(FileSystemBlobStore.java:80) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.FileSystemBlobStore.put(FileSystemBlobStore.java:72) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobUtils.moveTempFileToStore(BlobUtils.java:385) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobServer.moveTempFileToStore(BlobServer.java:680) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:350) [flink-dist_2.12-1.14.2.jar:1.14.2]
at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:110) [flink-dist_2.12-1.14.2.jar:1.14.2]
I initially also tried to append the secrets to flink-conf.yaml in docker-entrypoint.sh, based on this documentation - Configure Access Credentials:
if [ -f '/vault/secrets/appsecrets.yaml' ]; then
  (echo && cat '/vault/secrets/appsecrets.yaml') >> $FLINK_HOME/conf/flink-conf.yaml
fi
The question is how to provide the S3 credentials at runtime, since the Flink operator mounts flink-conf.yaml from a ConfigMap and the mount is read-only (flink-conf.yaml: Read-only file system).
Thank you
There is no support for this in the Kubernetes operator. In fact, this is not a limitation of the Flink Kubernetes operator itself; it stems from missing support in Flink's native Kubernetes integration. There is a separate story tracking this on the Kubernetes operator side - FLINK-27491.
As a workaround, you can set up an init container and update the ConfigMap from that init container using the Kubernetes API after reading the secrets from Vault. The updated ConfigMap then has the secrets filled in by the init container, and those values are visible to the job manager and all of its task managers. The whole Flink cluster journey starts only after the init container has updated the ConfigMap, so the change is visible to the entire Flink cluster.
A simple example of updating a ConfigMap from an init container can be found here. In that example, the ConfigMap is updated with a simple curl command. In theory, you can use any lightweight client to update the ConfigMap like this.
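For illustration only, here is a rough sketch of such an init container inside the job manager pod template. The ConfigMap name, file paths, and image are assumptions; the linked example patches the ConfigMap with curl against the Kubernetes API, whereas this sketch uses kubectl to keep it short; and the pod's service account needs RBAC permission to get and update ConfigMaps:
# Hypothetical addition to spec.jobManager.podTemplate.spec in the FlinkDeployment.
initContainers:
  - name: inject-s3-credentials
    image: bitnami/kubectl:1.27        # any image that ships kubectl works
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Merge the Vault-rendered credentials into the Flink configuration and
        # replace the operator-created ConfigMap (its name is an assumption here)
        # so the job manager and task managers read the secrets from flink-conf.yaml.
        cat /opt/flink/conf/flink-conf.yaml /vault/secrets/appsecrets.yaml > /tmp/flink-conf.yaml
        kubectl create configmap flink-config-flink-app \
          --from-file=flink-conf.yaml=/tmp/flink-conf.yaml \
          --dry-run=client -o yaml | kubectl apply -f -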
A side note: if possible, I would suggest using an AWS IAM role rather than plain IAM secrets, as a role is more secure than static credentials.

Controller pattern in k8s operators?

"Operators make use of the controller pattern"
What is the controller pattern in k8s operators?
The controller pattern has been summarized in these 3 sentences:
A controller tracks at least one Kubernetes resource type. These objects have a spec field that represents the desired state. The controller(s) for that resource are responsible for making the current state come closer to that desired state.
Ref: https://kubernetes.io/docs/concepts/architecture/controller/#controller-pattern
Basically, Kubernetes controllers watch for their respective resource in a control loop. Once a controller finds a resource, it reads the desired state from the spec and does some work to make the cluster state match that desired state.
For example, say you have created a Deployment whose spec says you want 1 pod running your application. The Deployment controller sees this and creates 1 pod in the cluster to match your desired state. Now, if you update the Deployment spec to say you want 2 pods, the Deployment controller will see this change, as it is always watching Deployments, and create another pod in the cluster to match the desired state.
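As a concrete, hypothetical illustration of that loop, here is a minimal Deployment where only the replicas field encodes the desired state described above; the name and image are placeholders:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder name
spec:
  replicas: 1                 # desired state: 1 Pod; change this to 2 and re-apply,
                              # and the Deployment/ReplicaSet controllers create the
                              # extra Pod to match the new desired state
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.14.2   # placeholder image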
You can find more details about these in the following resources:
Inside of Kubernetes Controller
A deep dive into Kubernetes controllers
The Job controller is an example of a Kubernetes built-in controller. Built-in controllers manage state by interacting with the cluster API server.
Ref: https://kubernetes.io/docs/concepts/architecture/controller/#control-via-api-server
In my understanding, when you create an xxx-operator you add a new Kind to your cluster, and kubectl explain <kind-name> can then show the schema of the Kind you've added.
So with the xxx-operator installed, you can configure your app with a much simpler YAML. You can also use tools like Kustomize or Helm on top of this.
For example, run this first:
kubectl explain ZookeeperCluster
and you will get an error.
Now install the Helm chart pravega/zookeeper-operator:
helm install --create-namespace -n op-pravega-zk -- zookeeper-operator pravega/zookeeper-operator
The install prints the following output, which by itself is not very helpful:
NAME: zookeeper-operator
LAST DEPLOYED: Thu Nov 25 11:20:31 2021
NAMESPACE: op-pravega-zk
STATUS: deployed
REVISION: 1
TEST SUITE: None
But now, if you run this again:
kubectl explain ZookeeperCluster
you will get this:
KIND:     ZookeeperCluster
VERSION:  zookeeper.pravega.io/v1beta1

DESCRIPTION:
     ZookeeperCluster is the Schema for the zookeeperclusters API

FIELDS:
   apiVersion   <string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind         <string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata     <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec         <Object>
     ZookeeperClusterSpec defines the desired state of ZookeeperCluster

   status       <Object>
     ZookeeperClusterStatus defines the observed state of ZookeeperCluster
Now you can use it with a YAML like this (... means content is elided):
apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
name: zookeeper
namespace: pravega-zk
...
spec:
replicas: 3
image:
repository: pravega/zookeeper
...
pod:
serviceAccountName: zookeeper
storageType: persistence
persistence:
reclaimPolicy: Delete
spec:
storageClassName: local-hostpath
...
or like this (this Helm chart contains a ZookeeperCluster Kind):
helm install --create-namespace -n pravega-zk --set persistence.storageClassName=local-hostpath -- zookeeper pravega/zookeeper
Another example:
Create a Zookeeper cluster.
kubectl create --namespace zookeeper -f - <<EOF
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper
  namespace: zookeeper
spec:
  replicas: 1
EOF
Ref: https://banzaicloud.com/docs/supertubes/kafka-operator/install-kafka-operator/
By the way, after you install an operator you can run kubectl describe clusterrole zookeeper-operator to see what it is allowed to manage, and then run kubectl api-resources to find the new resource and the name of its Kind.

Kubectl get deployments, no resources

I've just started learning Kubernetes. In every tutorial the writer generally uses "kubectl .... deployments" to control the newly created deployments. Now, with those commands (e.g. kubectl get deployments) I always get the response No resources found in default namespace., and I have to use "pods" instead of "deployments" to make things work (which works fine).
Now my question is: what is causing this, and what is the difference between using a deployment and a pod? I've set the docker driver in the first minikube setup; does it have something to do with this?
First let's brush up on some terminology.
Pod - It's the basic building block of Kubernetes. It groups one or more containers (such as Docker containers), with shared storage/network, and a specification for how to run the containers.
Deployment - It is a controller which wraps Pod(s) and manages their life cycle, that is, it drives the actual state toward the desired state. There is one more layer in between Deployment and Pod, which is the ReplicaSet: a ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
[Diagram omitted: a Deployment wraps a ReplicaSet, which wraps the Pods. Source: I drew it!]
In your case, one of two things might have happened:
Either you created a Pod, not a Deployment. Therefore, when you do kubectl get deployment you don't see any resources. Note that when you create a Deployment it in turn creates a ReplicaSet for you, which then creates the defined Pods.
Or you created your Deployment in a different namespace. If that's the case, use this command to find your Deployments in that namespace: kubectl get deploy NAME_OF_DEPLOYMENT -n NAME_OF_NAMESPACE
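For contrast with the first case, here is a minimal bare Pod manifest (name and image are placeholders). It shows up under kubectl get pods but never under kubectl get deployments, because no controller wraps it:
apiVersion: v1
kind: Pod
metadata:
  name: my-app              # placeholder name
  labels:
    app: my-app
spec:
  containers:
  - name: my-app
    image: nginx:1.14.2     # placeholder image
    ports:
    - containerPort: 80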
More information to clarify your concepts:
The section inside spec.template below is what would be your Pod manifest if you were to create the Pod manually instead of taking the Deployment route. As I said earlier, in simple terms Deployments are a wrapper around your Pods, so anything outside spec.template is the configuration you need to define for how you want to manage (scaling, affinity, etc.) your Pods:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
A Deployment is a controller providing a higher level abstraction on top of Pods and ReplicaSets. A Deployment provides declarative updates for Pods and ReplicaSets. Deployments internally create ReplicaSets, within which Pods are created.
Use cases of Deployments are documented here.
One reason for No resources found in default namespace could be that you created the deployment in a specific namespace and not in the default namespace (see the example after the commands below).
You can see deployments in a specific namespace or in all namespaces via
kubectl get deploy -n namespacename
kubectl get deploy -A
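For instance, a manifest that sets metadata.namespace (the names here are placeholders) creates the Deployment in that namespace, so a plain kubectl get deployments, which only looks in the current (default) namespace, reports nothing:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-team        # placeholder; without "-n my-team" this Deployment is not listed
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.14.2   # placeholder image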

How to enable logging for third party containers in Kubernetes?

Similar to Docker, where you use the snippet below in a compose file to configure logging for third-party containers (mariadb, opentsdb, ...) so their logs show up in Kibana:
logging:
  driver: fluentd
  options:
    fluentd-address: "0.0.0.0:24224"
    tag: "docker.{{.ID}}"
How do I configure the same thing for Kubernetes?
Basically, you can use fluentd to collect logs and push them to third-party log storage (Stackdriver or Elasticsearch). To ensure that fluentd runs on every cluster node, we can use a DaemonSet object.
As an example, let's look at part of the manifest:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: fluentd
        image: quay.io/fluent/fluentd-kubernetes-daemonset
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        ...
This article describes the most important steps to get everything set up.
Get the Fluentd DaemonSet sources
We have created a Fluentd DaemonSet that has the proper rules and container image ready to get started:
https://github.com/fluent/fluentd-kubernetes-daemonset
Please grab a copy of the repository from the command line using GIT:
$ git clone https://github.com/fluent/fluentd-kubernetes-daemonset

How to configure a Kubernetes Multi-Pod Deployment

I would like to deploy an application cluster by managing my deployment via k8s Deployment object. The documentation has me extremely confused. My basic layout has the following components that scale independently:
API server
UI server
Redis cache
Timer/Scheduled task server
Technically, all 4 above belong in separate pods that are scaled independently.
My questions are:
Do I need to create pod.yml files and then somehow reference them in the deployment.yml file, or can a deployment file also embed pod definitions?
The K8s documentation seems to imply that the spec portion of a Deployment is equivalent to defining one pod. Is that correct? What if I want to declaratively describe multi-pod deployments? Do I need multiple deployment.yml files?
Pagid's answer has most of the basics. You should create 4 Deployments for your scenario. Each Deployment will create a ReplicaSet that schedules and supervises the collection of Pods for that Deployment.
Each Deployment will most likely also require a Service in front of it for access. I usually create a single yaml file that has a Deployment and the corresponding Service in it. Here is an example for an nginx.yaml that I use:
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    name: nginx
    targetPort: 80
    nodePort: 32756
  selector:
    app: nginx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginxdeployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginxcontainer
        image: nginx:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
Here is some additional information for clarification:
A Pod is not a scalable unit; a Deployment that schedules Pods is.
A Deployment is meant to represent a single group of Pods fulfilling a single purpose together.
You can have many Deployments work together in the virtual network of the cluster.
For accessing a Deployment that may consist of many Pods running on different nodes, you have to create a Service.
Deployments are meant to contain stateless services. If you need to store state, you need to create a StatefulSet instead (e.g. for a database service); see the sketch below.
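If the stateful case applies to you (e.g. the Redis cache in your list), a minimal sketch of a StatefulSet plus the headless Service it needs could look like this; the names, image, and storage size are placeholders, not anything from your setup:
apiVersion: v1
kind: Service
metadata:
  name: redis               # headless Service the StatefulSet attaches to
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2      # placeholder image
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:       # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi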
You can use the Kubernetes API reference for the Deployment, where you'll find that the spec->template field is of type PodTemplateSpec; together with the related comment (Template describes the pods that will be created.), this answers your questions. A longer description can of course be found in the Deployment user guide.
To answer your questions...
1) The Pods are managed by the Deployment, and defining them separately doesn't make sense, as they are created on demand by the Deployment. Keep in mind that there might be more replicas of the same pod type.
2) For each of the applications in your list, you'd have to define one Deployment - which also makes sense when it comes to differing replica counts and application rollouts.
3) You haven't asked this, but it's related - along with separate Deployments, each of your applications will also need a dedicated Service so the others can access it.
Additional information:
API server: use a Deployment
UI server: use a Deployment
Redis cache: use a StatefulSet
Timer/Scheduled task server: maybe use a StatefulSet (if your service keeps some state)