Ethstat / Interface: Grafana + CollectD not showing the correct value (MB/s)

I use Grafana with CollectD (and Graphite) to monitor my network usage on my server.
I use the 'Interface' Plugin of CollectD and display the graphs like this:
alias(scale(nonNegativeDerivative(collectd.graph_host.interface-eth0.if_octets.rx), 0.00000095367431640625), 'download')
When I initiate a download with a speed limit, the download runs for approx. 10 minutes, but only a short peak is shown (the green line is the download).
Do I have to use some other metrics? I also tried the 'ethstat' plugin, but that has so many options, none of which I understand!
Is there any beginner documentation? I only found the CollectD docs, which I read, but they don't explain what the ethstat metrics actually mean.

No, there isn't any beginner documentation about the meaning of the ethstat metrics in collectd. This is because the ethstat plugin reports statistics collected by ethtool on your system, and the ethtool stats are vendor-specific.
To point you in the right direction, run ethtool -S eth0
That should show you names and values similar to what collectd is reporting.
Now run ethtool -i eth0 and find your driver info.
Then google your driver name and find out what statistics your card reports and what they mean. It may involve reading Linux driver source code, but don't be too scared of that. What you want is probably in the comments, not the code.
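For example, a rough sketch (eth0 and the driver name will differ on your system):
# List the NIC's vendor-specific statistics; the names should match what collectd's ethstat plugin reports
ethtool -S eth0
# Show which driver the interface uses (e.g. e1000e, ixgbe, virtio_net), so you know whose docs/source to search
ethtool -i eth0 | grep '^driver'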

Related

Perf collection on kubernetes pods

I am trying to find performance bottlenecks by using the perf tool on a kubernetes pod. I have already set the following on the instance hosting the pod:
"kernel.kptr_restrict" = "0"
"kernel.perf_event_paranoid" = "0"
However, I have two problems.
When I collect samples with perf record -a -F 99 -g -p <PID> --call-graph dwarf and feed them to Speedscope or, similarly, into a flamegraph, I still see question marks (???). For the process whose CPU usage breakdown I want to see (C++ based), the ??? frames sit at the top of the stack and the system calls fall below them. The main process is the one with ??? all over it.
I tried running perf top and it says
Failed to mmap with 1 (Operation not permitted)
My questions are:
To run perf top, what permissions do I need to change on the host instance of the pod?
Which other settings do I need to change at the instance level so I don't see any more ??? showing up in perf's output? I would like to see the function call stack of the process, not just the system calls. See the following stack:
The host OS is Ubuntu.
Zooming in on the first system call, you would see this, but that only gives me a fraction of the CPU time spent, and only the system calls.
UPDATE/ANSWER:
I was able to run perf top by setting "kernel.perf_event_paranoid" = "-1". However, as seen in the image below, the process I'm trying to profile (I've blacked out the name) shows only addresses instead of function names. I tried running them through addr2line, but it says addr2line: 'a.out': No such file.
How can I get the addresses to resolve to function names on the pod? Is that even possible?
I was also able to fix the address-to-function mapping with perf top. The problem was that I was running perf from a different container than the one where the process was running (same pod, different container). There may be a way to supply the extra information, but simply moving perf into the container running the process fixed it.
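A rough sketch of the combination described above, with placeholder pod/container names (adjust to your setup; <PID> is the target process inside that container):
# On the host node, relax the perf-related sysctls mentioned above
sudo sysctl -w kernel.perf_event_paranoid=-1
sudo sysctl -w kernel.kptr_restrict=0
# Run perf from inside the SAME container as the profiled process,
# so it can see the container's binaries and resolve symbols
kubectl exec -it <pod-name> -c <app-container> -- perf top -p <PID>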

OpenZFS cluster setup with Corosync, DRBD & Pacemaker

 
I am trying to set up a ZFS cluster on two nodes running Enterprise Storage OS (ESOS). This is based on Red Hat, and I am running the newest ESOS production release (4.0.12).
I have been reading up on this for a bit, and I think I finally understand that I have to use Corosync, DRBD and Pacemaker for this to be done correctly.
However, I haven't done anything like this before, and I still have some questions about the different components.
 
The complete setup is like the following:
2 ESOS nodes running a ZFS active/passive cluster.
3 ESXi hosts connecting to this cluster using iSCSI. These are connected using fiber.
The 2 ESOS nodes got a dedicated 10G fiber link for synchronization.
 
First off, I am not able to find any answer to whether this configuration is even possible to achieve, considering I am using ZFS.
If I understand what I have read correctly, you configure a shared iSCSI initiator address when this is set up. Then you use that on ESXi, and Corosync, DRBD & Pacemaker do the rest on the SAN side of things. Have I understood this correctly?
Corosync uses rings to communicate data between the two hosts (I'm not so sure about this one, nor what it means exactly).
Do I need to use all three components (Corosync, DRBD & Pacemaker), and in essence, what do they actually do?
In the different guides I have been reading, I have seen Asymmetric Logical Unit Access (ALUA) mentioned a couple of times. Can this be used to instruct the iSCSI initiators which SAN node to use, so that I don't have to use a shared initiator?
Does anyone by any chance know of a website where someone has done something like this?
I will try this one tomorrow, and see if it helps me in the right direction: https://marcitland.blogspot.com/2013/04/building-using-highly-available-esos.html
 
Thanks.
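Not a full answer, but to make the "rings" part a bit more concrete: in a two-node Corosync (2.x) setup, each ring is just a redundant communication path between the nodes, defined in /etc/corosync/corosync.conf, roughly like this (names and addresses below are made up; your dedicated 10G sync link would typically be ring0):
totem {
    version: 2
    cluster_name: esos-zfs        # hypothetical name
    transport: udpu               # unicast between the two nodes
    rrp_mode: passive             # use the second ring only if the first fails
}
nodelist {
    node {
        nodeid: 1
        ring0_addr: 10.10.10.1    # node A on the dedicated 10G sync link
        ring1_addr: 192.168.1.1   # node A on a management network, as a backup ring
    }
    node {
        nodeid: 2
        ring0_addr: 10.10.10.2    # node B
        ring1_addr: 192.168.1.2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1                   # special-case quorum for a two-node cluster
}
In broad terms, Corosync provides the membership/messaging layer, DRBD keeps the underlying block devices replicated over the sync link, and Pacemaker sits on top and decides which node currently imports the pool and exports the iSCSI target.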

Prometheus Exporter is unreachable, what is the way to investigate?

I would assume this is a more or less common case.
Sometimes we can observe gaps in time series data in Prometheus.
After investigation, we found:
Prometheus was up the whole time, and data from other exporters was present.
According to the "up" metric, the exporter was unreachable.
The exporter pod was alive.
The exporter application itself also appears to have been alive, judging by some messages in syslog.
Hence, I can conclude that either we have a network problem, which I have no idea how to debug in Kubernetes, or Prometheus ignores one exporter (usually the same one) from time to time.
Thanks for any hints.
One thing you can do to confirm the availability of the exporter is to scrape it periodically yourself (using a script with curl, for example), or to use a scraping tool such as Metricat (https://metricat.dev/).
If you set the interval small enough, you might catch the small windows of unavailability.
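A minimal sketch of the manual approach, assuming the exporter serves metrics on port 9100 (the address is a placeholder; point it at your exporter's pod IP or service):
EXPORTER=http://10.0.0.42:9100/metrics   # placeholder address
while true; do
  # log a timestamp, the HTTP status code and the total request time; give up after 5s
  echo "$(date -Is) $(curl -s -o /dev/null -w '%{http_code} %{time_total}s' --max-time 5 "$EXPORTER")"
  sleep 5
done
Correlating failures in that log with the gaps in the "up" metric should tell you whether the exporter itself stalls or whether only Prometheus's scrapes are affected.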

EFK - Have preconfigured filter by container that will appear in Kibana

I've got the EFK stack installed on kubernetes following this addon: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch
What I want to achieve is having all the logs of the same pod together, and maybe some other filters as well. But I don't want to configure the filters in Kibana through the GUI; I'd like to have them preconfigured, so that some of my known containers (the ones I want to monitor) are set up in advance and installed along with Kibana, rather than needing an additional import/export step. I'd like the predefined filters to work so that, immediately after the installation, I can go to "Discover", select the pod name I want to see, and then see all its logs in the format:
My understanding, this being the first time I use this tech, is close to zero, but I believe that configuring the filter in the fluentd-configmap.yml with the correct parameters should do the trick; however, none of my attempts has changed what I see in Kibana.
Am I looking in the correct place for this, or is this filter not meant for this use and I'm completely wasting my time? How could I create this filter in any case?
Any help, even if it is only a hint, would be appreciated.
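Not a complete answer, but for illustration, this is the kind of filter you could add to fluentd-configmap.yml. It is only a sketch using fluentd's grep filter with a made-up pod-name pattern, and note that it controls which records get shipped to Elasticsearch rather than creating a saved filter inside Kibana:
<filter kubernetes.**>
  @type grep
  <regexp>
    # keep only records from pods whose name matches; adjust the pattern to your containers
    key $.kubernetes.pod_name
    pattern /^(my-app|my-other-app)-/
  </regexp>
</filter>
If what you really want is a pre-created "Discover" search per pod, that is a Kibana saved object rather than a fluentd setting, so it would have to be provisioned on the Kibana side.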

How to enable systemd collector in docker-compose.yml file for node exporter

Hi, I'm new to Prometheus. I have a task to make Prometheus show systemd service metrics (I use Grafana for visualization). I'm using the stefanprodan/dockprom example as my starting point, but I couldn't find how to enable the systemd collector for node exporter in the node exporter section of docker-compose.yml while leaving all the collectors that are enabled by default. I also need help getting that data into Grafana. I would appreciate example code, or a place where I could find an adequate explanation of how to do it for dummies, because I'm not experienced. Thanks in advance.
In order to enable the systemd collector in node_exporter, the command line flag --collector.systemd needs to be passed to the exporter (reference). The default collectors will remain enabled, so you don't need to worry about that.
To pass that flag to the application, you need to add it to the command portion of the nodeexporter section of the Docker Compose file (here).
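As a rough sketch of what the nodeexporter service might look like after the change (only --collector.systemd is the addition; the other flags and volumes are whatever dockprom already ships, abbreviated here):
nodeexporter:
  image: prom/node-exporter:latest
  command:
    # ...keep the flags dockprom already passes (procfs/sysfs paths, filesystem excludes, etc.)...
    - '--collector.systemd'   # enables the systemd collector; default collectors stay enabled
  volumes:
    # assumption: when node_exporter runs in a container, the systemd collector may also
    # need access to the host's D-Bus socket in order to talk to systemd
    - /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket:ro
  restart: unless-stopped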
As for sending the data to Grafana: as long as you have your Prometheus data source configured in Grafana, those metrics will show up automatically -- you don't need to update the Prometheus->Grafana connection when adding or removing metrics (or really ever, after the initial setup).