There are two ways to deploy the OpenTelemetry Collector on Kubernetes:
https://opentelemetry.io/docs/collector/deployment/
Agent and Gateway
My question is: when deploying the OpenTelemetry Collector as a DaemonSet, why do we still need the Agent?
https://www.aspecto.io/blog/opentelemetry-collector-guide/
agent mode
Also, is it a good approach to deploy the OpenTelemetry Collector as a DaemonSet without the Agent?
Before I get to the core of your question, note that the agent/gateway modes are not Kubernetes specific. This pattern is equally valid and applicable in case you're running your microservices on, say, virtual machines or ECS.
Now, let's dive deeper into why there are cases where it makes sense to use the OpenTelemetry collector in agent/gateway mode. At its core, it's about separation of concerns, enabling different teams to focus on things they care about.
The agent mode means running an OpenTelemetry collector "close" to the workload. In the context of Kubernetes, this could be a sidecar (another container running the collector in the app pod), you could run a deployment per namespace, and indeed you could run the collector as a DaemonSet. No matter how you run these collectors, the expectation would be that the dev or product team owns the collector (config) and with it decides what receivers are needed for workloads. For example, one team needs a Prometheus receiver, another team needs a StatsD receiver for their app. We'll get back to the outbound (exporter) side of the collector in a moment.
The gateway mode means a central (standalone) operation of the collector, typically owned by the platform team, enabling them to:
Centrally enforce policies such as filtering sensitive log items, making sampling decisions for traces, dropping certain metrics, and so forth.
Manage permissions and credentials in a central place. For example, in order to ingest metrics using the Prometheus Remote Write exporter into Amazon Managed Service for Prometheus, the collector needs to use an IAM role with a specific IAM policy attached. The same applies for OTLP exporters, requiring you to provide an API key.
Scale the collector: either by running a bunch of OpenTelemetry collectors behind a load balancer and scaling them horizontally, or by vertically scaling a single collector instance. By managing the collector scaling, the platform team can ensure that all signals (traces, metrics, logs) are reliably delivered to the backend destinations.
Now we get back to the communication between agents and gateway: this is done via the OpenTelemetry Protocol (OTLP). That is, using the OTLP exporter on the agent side and the OTLP receiver on the gateway side, which further simplifies both the task on the side of the dev/product teams and the platform team, providing for a secure and performant telemetry data transfer.
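To make the agent/gateway split concrete, here is a minimal sketch that builds the two collector configurations as Python dictionaries and prints them as JSON. The component names (`otlp`, `batch`, `prometheusremotewrite`) are standard collector components; the gateway address and the remote-write URL are made-up placeholders, and TLS and authentication settings are left out on purpose.

```python
import json

# Hypothetical in-cluster address of the gateway's OTLP/gRPC receiver.
GATEWAY_ENDPOINT = "otel-gateway.observability.svc.cluster.local:4317"
# Hypothetical backend URL, only the platform team's gateway knows about it.
REMOTE_WRITE_URL = "https://example.invalid/api/v1/remote_write"


def agent_config(gateway_endpoint):
    """Agent: receive OTLP from nearby workloads and forward it to the gateway."""
    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {"batch": {}},
        "exporters": {"otlp": {"endpoint": gateway_endpoint}},
        "service": {
            "pipelines": {
                signal: {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlp"],
                }
                for signal in ("traces", "metrics", "logs")
            }
        },
    }


def gateway_config(remote_write_url):
    """Gateway: receive OTLP from agents and export metrics to the backend."""
    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}}}},
        "processors": {"batch": {}},
        "exporters": {"prometheusremotewrite": {"endpoint": remote_write_url}},
        "service": {
            "pipelines": {
                "metrics": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["prometheusremotewrite"],
                }
            }
        },
    }


def undefined_components(config):
    """List pipeline entries that refer to a component the config never defines."""
    missing = []
    for name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append(f"{name}/{kind}/{component}")
    return missing


if __name__ == "__main__":
    print(json.dumps(agent_config(GATEWAY_ENDPOINT), indent=2))
    print(json.dumps(gateway_config(REMOTE_WRITE_URL), indent=2))
```

The point of the split shows up in the exporters: the agent only knows the gateway's OTLP endpoint, while backend URLs and credentials live in the gateway config alone.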
In the Kubernetes world, a typical/classic pattern is using Deployment for Stateless Applications and using StatefulSet for a stateful application.
I am using a vendor product (Ping Access) which is meant to be a stateless application (it plays the role of a Proxy in front of other Ping products such as Ping Federate).
The github repo for Ping Cloud (where they run these components as containers) shows them running Ping Access (a stateless application) as a Stateful Set.
I am reaching out to their support team to understand why anyone would run a Stateless application as a StatefulSet.
Are there other examples of such usage (as this appears strange/bizarre IMHO)?
I also observed a scenario where a customer is running a stateful app (Ping Federate) as a regular Deployment instead of hosting it as a StatefulSet.
The Ping Cloud repository does build and deploy Ping Federate as a StatefulSet.
Honestly, both these usages, running a stateless app as a StatefulSet (Ping Access) and running a stateful app as a deployment (Ping Federate) sound like classic anti-patterns.
Apart from the ability to attach dedicated Volumes to StatefulSets you get the following features of which some might be useful for stateless applications:
Ordered startup and shutdown of Pods with K8s doing them one by one in an ordered fashion.
Possibility to guarantee that not more than a single Pod is running at a time even during unscheduled Pod restarts.
Stable DNS names for Pods.
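As a small illustration of the stable naming and ordering, the sketch below derives the per-Pod DNS names a StatefulSet gets through its headless Service, plus the default ordered startup and shutdown sequence. The StatefulSet, Service, and namespace names used in the example call are made up.

```python
def statefulset_pod_names(name, replicas):
    """Pod names are <statefulset-name>-<ordinal>, with ordinals starting at 0."""
    return [f"{name}-{i}" for i in range(replicas)]


def statefulset_pod_dns_names(name, service, namespace, replicas,
                              cluster_domain="cluster.local"):
    """Stable per-Pod DNS names provided via the governing headless Service."""
    return [
        f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        for pod in statefulset_pod_names(name, replicas)
    ]


def startup_order(name, replicas):
    """With the default ordered policy, Pods are created from ordinal 0 upwards."""
    return statefulset_pod_names(name, replicas)


def shutdown_order(name, replicas):
    """Pods are terminated in reverse ordinal order."""
    return list(reversed(statefulset_pod_names(name, replicas)))


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for dns in statefulset_pod_dns_names("engine", "engine-headless", "ping", 3):
        print(dns)
```

A Deployment gives none of these guarantees: its Pod names carry random suffixes that change on every rollout.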
I can only speculate why Ping Federate uses a StatefulSet. Possibly, it has to do with access limitations of the downstream services it connects to.
The consumption of PingAccess is stateless, but the operation is very much stateful. Namely, the PingAccess admin console maintains a database for configuration, and part of that configuration includes clustered engine mapping and session keys.
Thus, if you were to take away the persistent volume, restarting the admin console would decouple all the engines in the cluster. The engines would then no longer receive configuration, and web session keys would be mismatched.
The ping-cloud-base repo uses a StatefulSet for engines not for persistent volumes, but for the StatefulSet naming scheme. I personally disagree with this and recommend using a Deployment for engines. The only downside is that you then have to remove orphaned engines from the admin configuration. Orphaned engines are engine configs that stay in the admin console database after the engine Deployment is rolled/updated. These can be removed through the admin UI or the API, and this is pretty easy to script in a hook.
Ideally, an application that is not a datastore would run without a persistent volume, but for the reasons mentioned above, the PingAccess admin console requires and acts like a persistent datastore, so I think a StatefulSet is okay there.
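The selection logic of such a cleanup hook could look roughly like the sketch below. It assumes each engine is registered in the admin console under the name of the Pod that runs it, which is an assumption for illustration and not something the product guarantees. The calls that list and delete engines through the admin API are deliberately left out, since the endpoints are not covered here.

```python
def find_orphaned_engines(registered_engines, running_pod_names):
    """Return the ids of engines whose Pod no longer exists.

    registered_engines: dict mapping engine name -> engine id, as read from
        the admin configuration (fetching it is not shown).
    running_pod_names: set of Pod names currently running in the cluster.
    """
    return sorted(
        engine_id
        for engine_name, engine_id in registered_engines.items()
        if engine_name not in running_pod_names
    )


if __name__ == "__main__":
    # Made-up data: two Pods were replaced during a rollout.
    registered = {
        "engine-7d9f-abcde": 1,
        "engine-7d9f-fghij": 2,
        "engine-5c4b-klmno": 3,
    }
    running = {"engine-5c4b-klmno", "engine-5c4b-pqrst"}
    print(find_orphaned_engines(registered, running))
```

The hook would then delete each returned id via the admin API, once the new Pods are ready.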
Finally, the Ping DevOps team focuses support on their helm chart (where engines are also deployments by default). I'd suspect the community and enterprise support is much larger there for folks deploying on their own. ping-cloud-base is a good place to pick up some hooks though.
I have some limitations with the rights required by Flink native deployment.
The prerequisites say
KubeConfig, which has access to list, create, delete pods and **services**, configurable
Specifically, my issue is that I cannot have a service account with the rights to create/remove services. Creating/removing pods is not an issue, but by policy services can only be created with an internal tool.
Is there any workaround for this?
Flink creates two services in the native Kubernetes integration.
Internal service, which is used for internal communication between the JobManager and TaskManagers. It is only created when HA is not enabled, since the HA service is used for leader retrieval when HA is enabled.
Rest service, which is used for accessing the web UI or REST endpoint. If you have other ways to expose the REST endpoint, or you are using application mode, it is in principle optional. However, it is currently always created, so I think you would need to change the Flink code to work around this.
I have a simple java based application deployed in Kubernetes. I want to get the average latency of requests sent to the application(GET and POST).
The Stackdriver Monitoring API has the latency details of the load balancer, but they can only be collected after 210 seconds, which is not sufficient in my case. How can I configure Kubernetes to get the latency details every 30 seconds (or 1 minute)?
I wish the solution to be independent of Java so that I can use it for any application I deploy.
On GKE, you can use Stackdriver Trace, which is GCP-specific. I am currently fighting with the Python client library. Hopefully the Java one is more mature.
Or you can use Jaeger, which is a CNCF project.
Use a Service Mesh
A Service Mesh will let you observe things like latency between your services without extra code for this in each application. Istio is one such implementation that is available on Google Kubernetes Engine.
Get uniform metrics and traces from any running applications without requiring developers to manually instrument their applications.
Istio’s monitoring capabilities let you understand how service performance impacts things upstream and downstream
See Istio on GCP
use a service mesh: software that helps you orchestrate, secure, and collect telemetry across distributed applications. A service mesh transparently oversees and monitors all traffic for your application, typically through a set of network proxies that sit alongside each microservice.
Welcome to the service mesh era
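Whichever source ends up producing the per-request timings (traces or the mesh proxies), the averaging over 30-second windows that the question asks for can be sketched as follows. The input format, a list of `(unix_timestamp, latency_ms)` pairs, is an assumption for illustration; real data would come from the tracing or metrics backend.

```python
from collections import defaultdict


def average_latency_per_window(samples, window_seconds=30):
    """Group (timestamp, latency_ms) samples into fixed windows and average them.

    Returns a dict mapping the window start time to the mean latency in that
    window. Windows without any request are simply absent from the result.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for timestamp, latency_ms in samples:
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[window_start][0] += latency_ms
        totals[window_start][1] += 1
    return {
        start: total / count
        for start, (total, count) in sorted(totals.items())
    }


if __name__ == "__main__":
    # Made-up samples: two requests in the first window, one in the second.
    samples = [(0, 100), (10, 200), (31, 50)]
    print(average_latency_per_window(samples))
```

An average hides outliers, so the same grouping is often used to compute percentiles as well.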
I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based node pool and schedule your pods there using taints and tolerations. Additionally, you can control the number of nodes in your node pool using autoscaling, so that you only use them when your pods need to be scheduled/run.
Consider that this requires an additional default-non-GPU-based nodepool, where system pods are being run.
For triggering, as long as your default pool is running, you'd be able to deploy your application and the autoscaling should start automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a library.
Finally, considering the nature of your current goal (3D rendering), it might be best to use Kubernetes Jobs. With these, you can complete a sporadic computational load, allowing the node pool to downsize once it is finished.
Wrapping up, you can have a minimum cluster with a zero-sized GPU-based nodepool that will autoscale when a tainted job is requested to be run there, and once the workload is finished, it should automatically downscale. These actions can be triggered from GAE, using one of the client libraries.
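A Job for such a setup could be described roughly as in the sketch below, which builds the manifest as a Python dictionary and prints it as JSON. The job name and image are placeholders, and the toleration key `nvidia.com/gpu` is an assumption that has to match whatever taint the GPU node pool actually carries. Submitting the manifest, with `kubectl` or a client library, is not shown.

```python
import json


def gpu_render_job(name, image, gpu_count=1):
    """Build a batch/v1 Job that requests GPUs and tolerates the GPU node taint."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed render automatically
            "ttlSecondsAfterFinished": 300,  # clean up the finished Job object
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # Assumed taint key; adjust to the taint on the GPU node pool.
                    "tolerations": [
                        {
                            "key": "nvidia.com/gpu",
                            "operator": "Exists",
                            "effect": "NoSchedule",
                        }
                    ],
                    "containers": [
                        {
                            "name": "renderer",
                            "image": image,
                            # Requesting a GPU is what forces the Pod onto the
                            # GPU pool and triggers a scale-up from zero.
                            "resources": {
                                "limits": {"nvidia.com/gpu": str(gpu_count)}
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and image, for illustration only.
    print(json.dumps(gpu_render_job("render-task-1", "example.invalid/renderer:latest"), indent=2))
```

Once the Job completes and no other Pods need the GPU nodes, the autoscaler can shrink the pool back to zero.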
We have a microservice architecture and there are REST services interacting with each other through HTTP. All of these services are hosted on a Kubernetes cluster. Do we need to have explicit authentication for such service interaction or does Kubernetes provide enough security for it?
Kubernetes provides only orchestration for your containerized applications. It helps you to run, update, and scale your services and provides a way of delivering traffic to them inside the cluster. Most of the Kubernetes security relates to traffic management and role-based administration of the cluster.
Some additional tools like Istio can provide you secure communication between pods and some other traffic management capabilities.
Applications in pods should have their own capabilities of providing Authentication and Authorization based on local files/databases or network services like LDAP or OpenID etc.
It depends purely on how you design and architect your system, and how you create an SDD for it. While designing one, security hardening must be considered and given priority. The software and tools bring their features, but how you adopt them is what matters. Kubernetes is no exception.
You are running your microservices over HTTP, and in a production system you cannot assume that your system is secure just because it is running in a Kubernetes cluster. Kubernetes brings useful features from a security perspective, such as RBAC, CRD, etc., as described in "Kubernetes 1.8 Security, Workloads and Feature Depth". Still, leveraging only these features is not sufficient. The internal services should be as secure as the external ones. The following are a few things you should take care of once you are running your workload in a Kubernetes cluster:
Scan all your docker images for vulnerability testing.
Use RBAC over ABAC and assign optimum privileges to respective teams.
Configure a security context for a pod running your service.
Avoid unauthorized internal access to service data and protect all micro-services end-points.
Encryption keys should be rotated over a certain period of time.
The datastore like etcd for your kubernetes cluster must be secured.
Only admin should have access to kubectl.
Use token based validation and enable authentication on all REST api calls.
Continuously monitor all the services, logs for analysis, health checks, and all the processes running inside containers.
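As an illustration of the token-based validation item, the sketch below signs and verifies a short-lived service token with an HMAC over a shared secret, using only the Python standard library. The token layout and the claim names (`sub`, `exp`) are choices made for this example; it is a hand-rolled sketch of the idea rather than an implementation of a standard such as JWT.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(secret, service, ttl_seconds=300, now=None):
    """Create '<payload>.<signature>', both parts base64url-encoded."""
    issued_at = time.time() if now is None else now
    payload = json.dumps(
        {"sub": service, "exp": issued_at + ttl_seconds}, sort_keys=True
    ).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(secret, token, now=None):
    """Return the calling service's name, or None if the token is invalid."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["exp"] < current:
        return None
    return claims["sub"]


if __name__ == "__main__":
    secret = b"shared-secret-for-illustration"
    token = issue_token(secret, "orders-service")
    print(verify_token(secret, token))
```

The receiving service would call `verify_token` on every incoming REST request and reject the call when it returns `None`.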
Hope this helps.