InfluxDB container: high CPU, I/O, and network usage on Raspberry Pi

I'm running Home Assistant + InfluxDB 1.8.10 on my Raspberry Pi 4 (8 GB).
A few days ago I noticed that my InfluxDB container is running high on CPU, I/O, and network usage.
Earlier it sat at around 1-5% CPU; now the container stats show ~250%...
(screenshot: Portainer container stats)
Do you have any idea what to do?
Thank you in advance,
Chris
At first I thought my SD card was dying, so I moved all my data to a new SSD...
The load average dropped from 7-8 to 4-5, but the container stats stayed the same.
(screenshot: Pi system stats)
Then I moved InfluxDB to its own container and removed it from Home Assistant, but still no success.
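
For anyone hitting the same thing: high CPU and I/O in InfluxDB 1.x is often caused by exploding series cardinality or by long-running queries, and both can be checked directly. A minimal sketch using the InfluxDB 1.x Python client (host, port, and credentials are assumptions; adjust for your setup):

    # pip install influxdb  (the 1.x client library)
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086)  # assumed connection details

    # Series cardinality per database: a runaway number here is a common
    # cause of high CPU and I/O in InfluxDB 1.x.
    for db in client.get_list_database():
        result = client.query("SHOW SERIES CARDINALITY", database=db["name"])
        print(db["name"], list(result.get_points()))

    # Queries currently executing on the server (long-running ones stand out).
    print(list(client.query("SHOW QUERIES").get_points()))

If the cardinality numbers are huge, trimming what Home Assistant forwards to InfluxDB (its include/exclude filters) is usually the first fix to try.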

Related

HAProxy reverse SSL termination: memory keeps growing. Memory leak?

I have HAProxy 2.5.1 in an SSL-termination configuration running in a container of a Kubernetes pod; the backend is a Scala app that runs in another container of the same pod.
I have seen that I can put 500K connections on the setup, at which point HAProxy's RSS memory usage is 20 GB. If I remove the traffic and wait 15 minutes, the RSS drops to 15 GB, but if I repeat the same exercise one or two more times, HAProxy's RSS hits 30 GB and HAProxy gets killed, since I have a 30 GB limit for HAProxy in the pod.
The question here is: is this continuous memory growth expected?
Here is the incoming traffic chart:
And here is the memory usage chart, which shows how, after 3 cycles of placing and removing load, the RSS reached 30 GB and the process got killed. (As an observation, the two charts use different timezones, but they belong to the same execution.)
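
For what it's worth, a quick way to record whether the RSS really returns to a baseline between load cycles is to sample it from /proc inside the container. A minimal sketch, assuming a Linux container where HAProxy runs as PID 1 (that PID is an assumption):

    import time

    HAPROXY_PID = 1  # assumption: HAProxy is PID 1 inside its container

    def rss_kb(pid: int) -> int:
        # VmRSS in /proc/<pid>/status is reported in kB.
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        return 0

    # Sample once a minute; compare the idle baselines across load cycles.
    while True:
        print(time.strftime("%H:%M:%S"), rss_kb(HAPROXY_PID), "kB")
        time.sleep(60)

A steadily rising idle baseline across cycles is what distinguishes a genuine leak from an allocator merely holding on to freed memory.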
We switched from an Alpine-based image (musl) to a glibc-based image, and that solved the problem. We got a 5x increase in connection rate, and the memory growth is gone too.

Evaluation of thousands of metrics in InfluxDB

I'm trying to evaluate thousands of metrics using checks, but my machine can't keep up with the computation. I tried tasks too.
PC: a notebook with a Core i5 (8 threads) and 16 GB RAM.
I'm running InfluxDB in Docker (6 threads and 8 GB RAM allowed).
Do you have any idea where the problem is?
Or can InfluxDB even compute this many metrics?
Thanks!
I solved it on the InfluxDB community forum: https://community.influxdata.com/t/evaluation-of-thousands-metrics/19422
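
The linked thread has the details; independent of it, a pattern that often helps with thousands of per-metric checks is collapsing them into a single query that evaluates every series in one pass. A rough sketch with the InfluxDB 2.x Python client (URL, token, org, bucket, field, and threshold are all placeholders, not values from this thread):

    # pip install influxdb-client  (the 2.x client library)
    from influxdb_client import InfluxDBClient

    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

    # One query that checks all series at once instead of one check per metric.
    flux = '''
    from(bucket: "metrics")
      |> range(start: -5m)
      |> filter(fn: (r) => r._field == "value")
      |> mean()
      |> filter(fn: (r) => r._value > 100.0)
    '''
    for table in client.query_api().query(flux):
        for record in table.records:
            print("over threshold:", record.values.get("_measurement"), record.get_value())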

Running multiple containers on the same Service Fabric node

I have a Windows Service Fabric node with 4 cores, and I want to host 3 containerized stateless services on it, where each Windows container is allocated 1 core to read a message from a queue and process it. I ran some experiments and got these results:
1 container running on the node: a message takes ~18 sec to be processed; avg CPU usage per container: 24.7%; memory usage: 1 GB
2 containers running on the node: a message takes ~25 sec to be processed; avg CPU usage per container: 24.4%; memory usage: 1 GB
3 containers running on the node: a message takes ~35 sec to be processed; avg CPU usage per container: 24.6%; memory usage: 1 GB
I thought containers were supposed to be isolated, and I expected the processing time to stay constant at ~18 sec regardless of the number of containers, but in this case adding a container seems to affect the processing time in the other containers. Each container is set to use 1 core, so they shouldn't be overstepping into each other's resources, and the CPU is not reaching full utilization. Even if the CPU were the bottleneck here, I'd expect at least 2 containers to be able to run with ~18 sec processing times.
Is there a logical explanation for these results? Is it not possible to run multiple containers on the same Service Fabric host without affecting each other's performance, even when there are enough compute resources? How big could the Service Fabric overhead possibly be when running multiple containers on the same node?
Thanks!
Your container is not only using CPU but also memory and I/O (disk, network), which can also become bottlenecks.
To see the overhead of SF, run the containers outside of SF and check whether it makes a difference.
Use a machine with more memory, and after that, try using an SSD drive. See if that increases performance.
To avoid per-process overhead, consider using a single container and having multiple threads do the message processing in parallel. Make sure to assign it 3 cores.
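
To illustrate that last suggestion, here is the shape of one process with parallel workers, sketched in Python (the actual service is presumably .NET or Java, and the queue client below is a stub, not a real SDK):

    from concurrent.futures import ThreadPoolExecutor

    def receive_from_queue() -> str:
        return "msg"  # stub: a real queue SDK call goes here

    def process(message: str) -> None:
        pass  # the actual message processing goes here

    def worker() -> None:
        # Each worker loops forever, pulling and processing messages.
        while True:
            process(receive_from_queue())

    # One worker per assigned core (3 cores, per the suggestion above);
    # the pool runs until the service is stopped. For CPU-bound work in
    # Python specifically, a ProcessPoolExecutor would be the better fit.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(3):
            pool.submit(worker)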

Elastic Cloud APM Server - Queue is full

I have many Java microservices running in a Kubernetes cluster. All of them have APM agents sending data to an APM server in our Elastic Cloud cluster.
Everything was working fine, but suddenly every microservice started receiving the error shown below in its logs.
I tried restarting the cluster and increasing the hardware capacity, and I tried to follow the hints, but with no success.
Note: the disk is almost empty and the memory usage is fine.
Everything is on version 7.5.2.
I deleted all the indices related to APM, and everything worked after a few minutes.
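
If you take that route, the cleanup can be scripted against the Elasticsearch index API. A sketch with Python's requests (the endpoint, credentials, and the apm-* pattern are assumptions, and the DELETE is destructive, so list first and verify):

    import requests

    ES_URL = "https://your-cluster.es.example.com:9243"  # assumption: your Elastic Cloud endpoint
    AUTH = ("elastic", "changeme")                       # assumption: adjust credentials

    # List the matching indices first to verify what would be removed.
    print(requests.get(f"{ES_URL}/_cat/indices/apm-*?v", auth=AUTH).text)

    # Then delete them (destructive -- the APM data in these indices is gone).
    resp = requests.delete(f"{ES_URL}/apm-*", auth=AUTH)
    print(resp.status_code, resp.text)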
For better performance you can fine-tune these fields in the apm-server.yml file:
queue.mem.events: the internal queue size; increase it. Aim for queue.mem.events = output.elasticsearch.worker * output.elasticsearch.bulk_max_size. The default is 4096.
output.elasticsearch.worker: increase it; the default is 1.
output.elasticsearch.bulk_max_size: increase it; the default of 50 is very low.
Example: for my use case I used the following settings for 2 APM Server nodes and 3 Elasticsearch nodes (1 master, 2 data nodes):

    queue.mem.events: 40000
    output.elasticsearch.worker: 4
    output.elasticsearch.bulk_max_size: 10000
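
Note that these settings are read at startup, so the APM Server has to be restarted for them to take effect.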

How to get accurate JMeter load testing with multiple web service pods

I am new to JMeter load testing. We have set up distributed JMeter load testing on AWS EC2, with 1 master and 5 slaves.
I am testing my web service endpoint (HTTP Request sampler) in JMeter.
Details:
I tested my web service and reached 1900 threads with a 0% error rate on 1 pod with 8Gi RAM.
Then I deployed my web service on 2 pods (replicas) with 8Gi RAM each, but now I get a non-zero error rate even though my thread count was set to only 2100.
I checked the logs and there were no errors; I checked the pods' health and it was fine; I checked the database CPU utilization and it was fine too.
Our expectation is that, since we have 2 pods, the service should accommodate 2x the 1900 threads (just like what happened with 1 pod).
Did I miss something I should check? I hope someone can shed some light on this :(
It has been bugging me for 12 hours now.
Thank you in advance.
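
One thing worth ruling out before blaming the pods is whether the load balancer is actually spreading traffic evenly across both replicas. A rough sketch in Python (it assumes your service exposes the serving pod's name, e.g. in an X-Pod-Name response header; that header is an assumption about your service, not a Kubernetes default):

    import collections
    import requests

    URL = "http://your-service.example.com/endpoint"  # placeholder endpoint

    hits = collections.Counter()
    for _ in range(200):
        resp = requests.get(URL, timeout=5)
        # Assumption: the service echoes the pod name in a response header.
        hits[resp.headers.get("X-Pod-Name", "unknown")] += 1

    # Heavily skewed counts point at the load balancer or connection reuse
    # rather than at the pods themselves.
    print(hits)

Also note that JMeter's HTTP samplers use keep-alive by default, and long-lived connections can pin a remote engine's traffic to a single pod depending on how the service is exposed.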