Is it possible to push metrics to Grafana Agent from a Java application with OpenTelemetry agent instrumentation?

When I'm setting up my Grafana Agent, I see only the "scraping" option, so the agent itself needs to poll my Java app (e.g. the /actuator/metrics endpoint). I'm wondering if it's possible to have a "push" approach instead.
Basically, I want the same thing we can already do to collect traces: add the OpenTelemetry Java agent to our application, set up a gRPC/HTTP receiver on the agent side, and that's it. But it seems that you cannot have a receivers block in the metrics section (while one exists in the traces section).
I'm using Grafana Agent v0.31, if it matters.
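For reference, this is the kind of push setup that already works for traces in the Agent's config; a minimal sketch, assuming static mode and a placeholder remote-write endpoint:

traces:
  configs:
    - name: default
      receivers:
        otlp:
          protocols:
            grpc:
            http:
      remote_write:
        - endpoint: tempo.example.com:4317

What I'm looking for is an equivalent receivers block under the metrics section, instead of scrape_configs.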

Related

Could open-telemetry collector be used without instrumentation?

We are building an application with Argo Workflows. We are wondering if we could just set up an OpenTelemetry Collector inside our Kubernetes cluster and start using it as a stdout exporter into the Elastic stack. I couldn't find information on whether OTel can export logs without instrumentation. Any thoughts?
The short answer is: yes. For logs you don't need instrumentation, in principle.
However, log support is still in development, so you'd need to track upstream and deal with the fact that you're operating against a moving target. There are a number of upstream components you can use; for example, you can combine the Filelog Receiver and the Elasticsearch Exporter in a pipeline. I recently did a POC and demo (just ignore the custom collector part and use upstream) that you could use as a starting point.
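A minimal sketch of such a collector pipeline, assuming the collector-contrib distribution (the log path and Elasticsearch endpoint are placeholders):

receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]

exporters:
  elasticsearch:
    endpoints: [ https://elasticsearch.example.com:9200 ]

service:
  pipelines:
    logs:
      receivers: [ filelog ]
      exporters: [ elasticsearch ]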
As indicated on the following OpenTelemetry (OTel) documentation page:
https://opentelemetry.io/docs/concepts/data-collection/
the Receiver component of the OTel Collector can be either push- or pull-based. For example, using the following OpenTelemetry Java instrumentation agent:
https://github.com/open-telemetry/opentelemetry-java-instrumentation
a Spring Boot jar can be bootstrapped with such an agent, without the need to change any code inside the Spring Boot app, and still be able to push telemetry data to the OTel Collector. Another example is to use a puller instance of the OTel Collector to pull telemetry data out of a Redis runtime, as in the following example:
https://signoz.io/blog/redis-opentelemetry/
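To make the push case concrete, here is a sketch of attaching the Java agent to a Spring Boot jar; the endpoint and service name are placeholders:

java -javaagent:./opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=http://otel-collector.example.com:4317 \
     -Dotel.service.name=my-spring-app \
     -jar my-spring-app.jar

The agent instruments the application at startup and pushes telemetry over OTLP, with no changes to the application code.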

Deploying and Update Process of On Premise Kubernetes Environment Application

We are developing a microservice-based system that is orchestrated using Kubernetes. Part of our use case is supplying our clients with an on-premise installation, where they receive an image (VMDK / QCOW2) with the whole system deployed.
One of our main challenges is handling the update process of such a system. Currently the plan is to have an API endpoint that will receive an encrypted and signed package containing all the images and an update shell script. The API endpoint will start an asynchronous process that will extract the images and execute the shell script, which eventually should call Kubernetes to update all the images with the new code.
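For illustration, a minimal sketch of what such an update script might do, assuming a containerd-based cluster and a deployment named my-service (both hypothetical):

#!/bin/sh
set -e
# Load the delivered images into the node's container runtime
for img in ./images/*.tar; do
  ctr -n k8s.io images import "$img"
done
# Point the deployment at the new image tag and wait for the rollout
kubectl set image deployment/my-service my-service=registry.local/my-service:v2
kubectl rollout status deployment/my-service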
The question is where this API endpoint should be defined?
Should it be in a special "Maintenance" service that sits outside of Kubernetes and controls it? This service would be updated last, in case its own code also needs updating.
Or should it be part of one of the microservice containers that run inside Kubernetes? But then this image may itself be among the updated images, so any API that should report the update status can become unavailable mid-update.
What is the common way to expose an interface for a system update or system deployment wizard process?
Thanks!

Application Performance monitoring on Swisscom Application Cloud

I am investigating options for monitoring our installation in Swisscom's Cloud Foundry. My objectives are the following:
monitor performance indicators for deployed applications (such as CPU, disk, memory)
monitor performance indicators for services (slow queries, number of queries, ideally also some metrics on hitting quotas)
So far, I understand the options are the following (including some BUTs):
I used a very nice TOP cf-plugin (github)
This works very well. It seems that it registers itself to get the required firehose nozzles and consume data.
That is very useful for tracing / ad-hoc monitoring, but not very good for serious infrastructure monitoring.
Another way I found is to use firehose-syslog solution.
This can be deployed as an app that (as far as I understand) does the job in a similar way to the TOP cf plugin.
The problem is that it requires a registered client so it can authenticate with the Doppler endpoint. For some reason, the top-cf-plugin does that automatically / in another way.
The last option I am considering is to build the monitoring into the app itself (using a special buildpack).
That can be done, for example, with Datadog. But it also seems to require a dedicated UAA client to register the nozzle.
I would like to check if somebody is (or was) on a similar road and has some findings.
Eventually I would like to raise the following questions towards the Swisscom community support:
Is it possible to register a UAA client to be able to ingest events through the firehose nozzle from an external service? (This requires admin credentials, if I read correctly.)
Is there an alternative way to authenticate with the nozzle (for example, using a special user and their authentication token)?
Is there any alternative way to monitor CF deployments in Swisscom? And is there a paper, blog post or other form of documentation that would be helpful in this respect (also for other users of AppCloud)?
Since it requires admin permissions, we cannot give out UAA clients for the firehose.
However, there are different ways to get metrics in context of a user.
CF API
You can obtain basic metrics of a specific app by polling the CF API:
https://apidocs.cloudfoundry.org/5.0.0/apps/get_detailed_stats_for_a_started_app.html
However, since you have to poll (and separately for each app), it's not the recommended way.
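A quick sketch of that polling with the cf CLI (my-app is a placeholder):

APP_GUID=$(cf app my-app --guid)
cf curl "/v2/apps/$APP_GUID/stats"

This returns per-instance CPU, memory and disk usage for the app.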
Metrics in syslog drain
CF allows devs to forward their logs to syslog drains; in more recent versions, CF also sends metrics to this syslog drain (see https://docs.cloudfoundry.org/devguide/deploy-apps/streaming-logs.html#container-metrics).
For example, you could use Swisscom's Elasticsearch service to store these metrics and then analyze them using Kibana.
Metrics using loggregator (firehose)
The firehose allows streaming logs to clients for two types of roles:
Streaming all logs to admins (which requires a UAA client with admin permissions) and streaming app logs and metrics to devs with permissions in the app's space. This is also what the cf logs command uses. cf top also works this way (it enumerates all apps and streams the logs of each app).
However, you will find out that most open source tools that leverage the firehose only work in admin mode, since they're written for the platform operator.
Of course you also have the possibility to monitor your app by instrumenting it (the white-box approach), for example by configuring Spring Actuator in a Spring Boot app or by including an agent from your favourite APM vendor (Dynatrace, AppDynamics, ...).
I guess this is the most common approach; we've seen a lot of teams have success by instrumenting their applications, especially since advanced monitoring requires you to create your own metrics anyway, and the CPU/memory metrics provided by the firehose are not that powerful in a microservice world.
However, option 2 would be worth a try as well, especially since the ELK stack's metric support is getting better and better.
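For the white-box route, a minimal sketch of exposing metrics with Spring Boot Actuator (standard Spring Boot 2.x properties; the endpoint list is just an example):

# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics

With the spring-boot-starter-actuator dependency on the classpath, the app then serves /actuator/metrics, which an agent or scraper can consume.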

mongodb mms monitoring agent does not find group members

I have installed the latest MongoDB MMS agent (6.5.0.456) on Ubuntu 16.04 and initialised the replica set. Hence I am running a single-node replica set with the monitoring agent enabled. The agent works fine; however, it does not seem to actually find the replica set member:
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:170] Received new configuration: Primary agent, Assigned 0 out of 0 plus 0 chunk monitor(s)
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:182] Nothing to do. Either the server detected the possibility of another monitoring agent running, or no Hosts are configured on the Group.
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Run:199] Done. Sleeping for 55s...
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:746] Performing discovery with 0 hosts
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:803] Received discovery responses from 0/0 requests after 891ns
I can see two processes for monitor agents:
/bin/sh -c /usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config >> /var/log/mongodb-mms/monitoring-agent.log 2>&1
/usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config
However, if I terminate one, it also tears down the other, so I do not think that is the problem.
So, the question is: what is the Group that the agent is referring to? Where is that configured? Or how do I find out which Group the agent refers to, and how do I check whether the group is configured correctly?
The rs.config() output looks fine, with one replica set member whose host field looks just fine. I can use that value to connect to the instance using the mongo command. No auth is configured.
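For reference, this is how I check that host value in the mongo shell (the hostname shown is a placeholder):

> rs.conf().members[0].host
myhost.example.com:27017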
EDIT
It kind of looks like Cloud Manager now needs to be configured with a seed host; then it starts to discover all the other nodes in the replica set. This seems to be different from pre-Cloud-Manager days, when the agent was able to track the replica set by itself - if I remember correctly... Probably there is still an easier way to get this done, so I am leaving this question open for now...
So, the question is: what is the Group that the agent is referring to? Where is that configured? Or how do I find out which Group the agent refers to, and how do I check whether the group is configured correctly?
Configuration values for the Cloud Manager agent (such as mmsGroupId and mmsApiKey) are set in the config file, which is /etc/mongodb-mms/monitoring-agent.config by default. The agent needs this information in order to communicate with the Cloud Manager servers.
For more details, see Install or Update the Monitoring Agent and Monitoring Agent Configuration in the Cloud Manager documentation.
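A sketch of the relevant lines in that config file; the ID, key, and base URL here are placeholders:

# /etc/mongodb-mms/monitoring-agent.config
mmsGroupId=<your-group-id>
mmsApiKey=<your-api-key>
mmsBaseUrl=https://cloud.mongodb.com

The Group ID is what the "no Hosts are configured on the Group" log line refers to: it identifies your Cloud Manager project (projects were formerly called groups).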
It kind of looks like Cloud Manager now needs to be configured with a seed host; then it starts to discover all the other nodes in the replica set.
Unless a MongoDB process is already managed by Cloud Manager automation, I believe it has always been the case that you need to add an existing MongoDB process to monitoring to start the process of initial topology discovery. Once a deployment is monitored, any changes in deployment membership should automatically be discovered by the Cloud Manager agent.
Production deployments should have authentication and access control enabled, so in addition to adding a seed hostname and port via the Cloud Manager UI, you usually need to provide appropriate credentials.

Cloud foundry app status or health notification?

Is there a way to get a notification when a Cloud Foundry application fails or becomes unreachable? I mean, to register for some deployed app, and if the status of the application changes to failed or similar, receive a notification.
On Pivotal Cloud Foundry, when an app crashes, an event is emitted through the firehose.
The PCF Metrics tile, available from Pivotal, can be deployed to your PCF foundation. PCF Metrics tracks all events for apps running on the foundation, and they are accessible to developers (through Apps Manager). I believe the Metrics tile keeps history for up to two weeks. I am not aware of any alerting capabilities in the PCF Metrics tile that will prompt you when an app crashes (I could be wrong; if so, please correct me).
Other approaches are to implement event-logging tools like Splunk, New Relic, etc. They support alerts, but you will have to build those yourself.
API monitoring tools like AppDynamics, Apigee, and New Relic provide alerting and can notify you when the response time of an app has degraded (as when your app has crashed). This approach is a little more involved; you may need to add an agent to your buildpack, depending on the tool you choose.
IMHO there is no such built-in feature for Cloud Foundry, but IBM Cloud offers the Availability Monitoring service to monitor apps and send out alerts in case of unavailability or other similar events. The service is part of the DevOps category in the IBM Cloud catalog.
There is also Alert Notification to manage alerts, notify the right groups via all kinds of channels, and track the alert status. For your question, you should start with Availability Monitoring and then work out how those events are handled.
You can use the cf events appname command to get a list of all events for the application; this will print out recent events such as application crashes.
If you run cf events appname -v, you will see the JSON REST calls the cf CLI makes to Cloud Foundry.
You can use the Cloud Foundry Java Client to write your own code to interact with Cloud Foundry.
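As a sketch of what that could look like with the reactor-based cf-java-client (host, credentials, org, space, and app name are all placeholders):

import org.cloudfoundry.operations.DefaultCloudFoundryOperations;
import org.cloudfoundry.operations.applications.ApplicationDetail;
import org.cloudfoundry.operations.applications.GetApplicationRequest;
import org.cloudfoundry.reactor.DefaultConnectionContext;
import org.cloudfoundry.reactor.client.ReactorCloudFoundryClient;
import org.cloudfoundry.reactor.tokenprovider.PasswordGrantTokenProvider;

public class AppStatusCheck {
    public static void main(String[] args) {
        DefaultConnectionContext connectionContext = DefaultConnectionContext.builder()
                .apiHost("api.example.com")            // placeholder CF API host
                .build();
        PasswordGrantTokenProvider tokenProvider = PasswordGrantTokenProvider.builder()
                .username("user")                      // placeholder credentials
                .password("secret")
                .build();
        DefaultCloudFoundryOperations operations = DefaultCloudFoundryOperations.builder()
                .cloudFoundryClient(ReactorCloudFoundryClient.builder()
                        .connectionContext(connectionContext)
                        .tokenProvider(tokenProvider)
                        .build())
                .organization("my-org")                // placeholder org and space
                .space("my-space")
                .build();

        // Fetch the app's details and compare running vs. requested instances
        ApplicationDetail detail = operations.applications()
                .get(GetApplicationRequest.builder().name("my-app").build())
                .block();
        System.out.println(detail.getRunningInstances() + "/" + detail.getInstances()
                + " instances running; state: " + detail.getRequestedState());
    }
}

Polling this (or the events API) on a schedule and sending a notification when the numbers diverge is one way to build the alerting yourself.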
Another thing you can do is stream your application logs to any syslog-compatible log aggregation service, for example Splunk, and then have Splunk monitor for app crash events in the log. You can read how to configure app log streaming in the docs.
This functionality is scheduled to be available with PCF Metrics 1.5 and can be seen on PWS (Pivotal Web Services) in alpha mode.
The functionality is available under the Monitors tab inside PCF Metrics (1.5).
Webhook notifications (e.g. Slack) can be configured for a number of events (including, as you discussed, crashes).
You can create a user-provided service and add a syslog drain URL, then bind the service to your application. Any events that happen will then be sent as logs to the URL you have provided.
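A minimal sketch with the cf CLI; the drain URL and app name are placeholders:

cf create-user-provided-service my-drain -l syslog://logs.example.com:514
cf bind-service my-app my-drain
cf restage my-app

The receiving end (Splunk, Papertrail, an ELK stack, ...) can then alert on crash events it sees in the stream.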