When setting up metrics and telemetry for my API, is it an anti-pattern to track something like "request-latency" as a metric (possibly in addition to tracking it as a span)?
For example, say my API makes a request to another API in order to generate a response. If I want to track latency information such as:
My API's response latency
The latency for the request from my API to the upstream API
DB request latency
Etc.
That seems like a good candidate for using a span but I think it would also be helpful to have it as a metric.
Is it a bad practice to duplicate the OTEL data capture (as both a metric and a span)?
I can likely extract that information and avoid duplication, but it might be simpler to log it as a metric as well.
Thanks in advance for your help.
I would say traces and metrics each have their own use cases. Traces usually have a short retention period (AWS X-Ray: 30 days), and you can only generate metrics from traces over a short time window (AWS X-Ray: 24 hours). If you need a longer period, those queries become expensive (and slow). So metrics stored in a time-series DB are the better fit for longer-term stats.
BTW: there is also the experimental Span Metrics Processor, which you can use to generate Prometheus metrics directly from spans with the OTEL Collector - no additional app instrumentation/code needed.
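If you do decide to capture it both ways in your own code, a minimal sketch with the OpenTelemetry Python SDK could look like the following (the instrument names, attributes, and the call_upstream helper are illustrative assumptions, not a prescribed schema):

```python
# Sketch: record upstream-call latency as both a span and a histogram metric.
# Instrument names and attributes are illustrative, not a prescribed schema.
import time
from opentelemetry import trace, metrics

tracer = trace.get_tracer("my.api")
meter = metrics.get_meter("my.api")
latency_ms = meter.create_histogram(
    "upstream.request.latency", unit="ms",
    description="Latency of calls from my API to the upstream API",
)

def call_upstream(client, url):
    with tracer.start_as_current_span("upstream.request") as span:
        span.set_attribute("http.url", url)
        start = time.perf_counter()
        response = client.get(url)            # the actual upstream call
        elapsed_ms = (time.perf_counter() - start) * 1000
        latency_ms.record(elapsed_ms, {"peer.service": "upstream-api"})
        return response
```

Whether the duplication is worth it mostly comes down to whether you want the metric's cheap long-term aggregation in addition to the span's per-request detail.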
We currently have a GKE environment with several HPAs for different deployments. All of them work just fine out of the box, but sometimes our users still experience some delay during peak hours.
Usually this delay is the time it takes the new instances to start and become ready.
What I'd like is a way to have an HPA that could predict usage and scale eagerly before it is needed.
The simplest implementation I could think of is an HPA that takes the average usage of previous days and scales up or down in advance (say 10 minutes earlier) based on the historic usage for the current time frame.
Is there anything like that in vanilla k8s or GKE? I was unable to find anything of the sort in GCP's docs.
If you want to scale your applications based on events/custom metrics, you can use KEDA (Kubernetes-based Event Driven Autoscaler), which supports scaling based on GCP Stackdriver, Datadog, or Prometheus metrics (and many other scalers).
What you need to do is create a query that returns the CPU usage at CURRENT_TIMESTAMP - 23H50M (or the aggregated value for the last week), then define thresholds to scale your application up or down.
If you have trouble doing this with your monitoring tool, you can create a custom metrics API that queries the monitoring API and aggregates the values (with the time shift) before exposing them to the metrics-api scaler.
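For what it's worth, here is a minimal sketch of such a shim for the metrics-api scaler, assuming a Prometheus backend; the query, URL, port, and /shifted-cpu path are illustrative assumptions:

```python
# Sketch of a "time-shifted" metrics endpoint that KEDA's metrics-api scaler
# can poll. The PromQL query, Prometheus URL, and port are assumptions.
import time
import requests
from flask import Flask, jsonify

app = Flask(__name__)
PROMETHEUS_URL = "http://prometheus:9090/api/v1/query"
QUERY = 'avg(rate(container_cpu_usage_seconds_total{namespace="prod"}[5m]))'
SHIFT_SECONDS = 23 * 3600 + 50 * 60   # ~24h back, minus 10 minutes

@app.route("/shifted-cpu")
def shifted_cpu():
    # Evaluate the query at "now minus the shift", so the autoscaler sees
    # the load that was observed at this time yesterday, 10 minutes early.
    ts = time.time() - SHIFT_SECONDS
    resp = requests.get(PROMETHEUS_URL, params={"query": QUERY, "time": ts})
    result = resp.json()["data"]["result"]
    value = float(result[0]["value"][1]) if result else 0.0
    return jsonify({"value": value})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

KEDA only needs an HTTP endpoint that returns JSON it can read a number from, so the shim stays small.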
I have a PromQL query that looks at max latency per quantile and displays the data in Grafana, but it shows data from a pod that was redeployed and no longer exists. The pod's data is still within the 15-day retention period.
Here's the query: max(latency{quantile="..."})
The max latency it finds is from the period when the pod was throttling; shortly after that the pod was redeployed and went back to normal, and now I want to look only at the max latency of what is currently live.
All the information I found so far about staleness says it should be filtered out behind the scenes, but that doesn't seem to be happening in my current setup, and I cannot figure out what I should change.
When I manually add the specific instance ID to the query it works well, but the ID changes every time the pod is redeployed: max(latency{quantile="...", exported_instance="ID"})
Here is a long list of similar questions I found; some are unanswered, and some are not asking the same thing. The ideas that are somewhat relevant but don't solve the problem in a sustainable way are:
Suggestions from the links below that were not helpful
change staleness period, won't work because it affects the whole system
restart Prometheus, won't work because it can't be done every time a pod is redeployed
list each graph per machine, won't work with a max query
Links to similar questions
How do I deal with old collected metrics in Prometheus? - switch Prometheus -> ELK, i.e. log-based monitoring
Get data from prometheus only from last scrape iteration - staleness is a relevant concept; in Singlestat it shows how to use only the current value
Grafana dashboard showing deleted information from prometheus - default retention is 15 days; hide machines with a checkbox
How can I delete old Jobs from Prometheus? - manual query/restart
grafana variable still catch old metrics info - update prometheus targets
Clear old data in Grafana - delete with prometheus settings
https://community.grafana.com/t/prometheus-push-gateway/18835 - not answered
https://www.robustperception.io/staleness-and-promql - explains how the new staleness works, without examples
The end goal is to display the max latency across all sources that are currently live, dropping data from sources that no longer exist.
You can use the auto-generated metric named up to isolate the metrics you need from the rest. You can easily tell from the up metric which sources are offline.
up{job="", instance=""}:
1 if the instance is healthy, i.e. reachable,
or 0 if the scrape failed.
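For example, something along these lines (a sketch that assumes the latency series and up share an instance label; if your series only carry exported_instance, the matching label has to be adjusted): max(latency{quantile="..."} and on(instance) (up == 1)). This keeps only the latency series whose source is currently being scraped successfully.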
I'm contemplating on whether to use MongoDB or Kafka for a time series dataset.
At first sight obviously it makes sense to use Kafka since that's what it's built for. But I would also like some flexibility in querying, etc.
Which brought me to question: "Why not just use MongoDB to store the timestamped data and index them by timestamp?"
Naively, this feels like it has a similar benefit to Kafka (in that it's indexed by time offset) but with more flexibility. But then again, I'm sure there are plenty of reasons why people use Kafka instead of MongoDB for this type of use case.
Could someone explain some of the reasons why one may want to use Kafka instead of MongoDB in this case?
I'll interpret this question as: you're trying to collect metrics over time.
Yes, Kafka topics have configurable time-based retention, and I doubt you'd be using topic compaction, because your messages would likely be of the form (time, value), so the time could not repeat anyway.
Kafka also provides stream-processing libraries so that you can compute averages, min/max, outliers & anomalies, top K, etc. over windows of time.
However, while processing all that data is great and useful, your consumers would be stuck doing linear scans of this data, not easily able to query slices of it for any given time range. And that's where time indexes (not just a start index, but also an end) would help.
So, sure, you can use Kafka to create a backlog of queued metrics and process/filter them over time, but I would suggest consuming that data into a proper database, because I assume you'll want to query it more easily and potentially create some visualizations over that data.
With that architecture, you could have your highly available Kafka cluster holding onto data for some amount of time, while your downstream systems don't necessarily have to be online all the time in order to receive events. Once they are, they'd consume from the last available offset and pick up where they left off.
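As a rough sketch of that consume-into-a-database step, assuming a metrics topic with JSON (time, value) messages and the kafka-python and pymongo clients (topic, server, and field names are assumptions):

```python
# Sketch: drain a Kafka topic of (time, value) metrics into MongoDB so the
# data can be queried by time range. Topic name, servers, and field names
# are assumptions for illustration.
import json
from kafka import KafkaConsumer        # pip install kafka-python
from pymongo import MongoClient, ASCENDING

consumer = KafkaConsumer(
    "metrics",
    bootstrap_servers=["localhost:9092"],
    group_id="metrics-sink",
    enable_auto_commit=True,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

collection = MongoClient("mongodb://localhost:27017")["telemetry"]["metrics"]
collection.create_index([("time", ASCENDING)])   # enables time-range queries

for message in consumer:
    # Each message is expected to look like {"time": ..., "value": ...}.
    collection.insert_one(message.value)

# Later, a time-slice query that Kafka alone can't answer directly:
# collection.find({"time": {"$gte": start, "$lt": end}})
```

Because the consumer tracks its offset, the sink can be offline for a while and then catch up from where it left off.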
As the comments above point out, neither Kafka nor MongoDB is well suited as a time-series DB with flexible query capabilities, for the reasons that @Alex Blex explained well.
Depending on the requirements for processing speed vs. query flexibility vs. data size, I would consider the following choices:
Cassandra [best processing speed, best/good data size limits, worst query flexibility]
TimescaleDB on top of PostgresDB [good processing speed, good/OK data size limits, good query flexibility]
ElasticSearch [good processing speed, worst data size limits, best query flexibility + visualization]
P.S. by "processing" here I mean ingestion, partitioning, and roll-ups where needed
P.P.S. I picked those options that are most widely used now, in my opinion, but there are dozens and dozens of other options and combinations, and many more selection criteria to use - would be interested to hear about other engineers' experiences!
I have an issue with designing a database in MongoDB.
In general, the system will continuously gather user insight data (e.g. likes, retweets, views) from different social media APIs (Twitter API, Instagram API, Facebook API), each channel at a different rate, while also saving each insight every hour as historical data. These current real-time insights should be viewable by users on the website.
Should I save the current insight data in a cache and the historical insight data in documents?
What is the expected write rate and query rate?
At what rate will the dataset grow? These are the key questions that will determine the size and topology of your MongoDB cluster. If your write rate does not exceed the write capacity of a single node, then you should be able to host your data on a single replica set. However, this assumes that your data set is not large (>1TB). At that size, recovery from a single node failure can be time-consuming (it will not cause an outage, but the longer a single node is down, the higher the risk of a second node failing).
In both cases (write capacity exceeds a single node, or the dataset is larger than 1TB) the rough guidance is that this is the time to consider a sharded cluster. Design of a sharded cluster is beyond the scope of a single StackOverflow answer.
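To make the cache-vs-documents part of the question concrete, one common shape is a small "current" collection (or cache) that the website reads, plus an append-only "history" collection with a time index; this is only a sketch with assumed collection and field names, not something prescribed by the answer above:

```python
# Sketch: one "current" document per (channel, post) that the website reads,
# plus hourly snapshots for history. Collection and field names are assumed.
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["insights"]
db.history.create_index([("channel", ASCENDING), ("ts", ASCENDING)])

def record_insight(channel, post_id, likes, views):
    now = datetime.now(timezone.utc)
    # Upsert the real-time view the website queries.
    db.current.update_one(
        {"channel": channel, "post_id": post_id},
        {"$set": {"likes": likes, "views": views, "updated_at": now}},
        upsert=True,
    )
    # Append an immutable hourly snapshot (call this once per hour per post).
    db.history.insert_one(
        {"channel": channel, "post_id": post_id,
         "likes": likes, "views": views, "ts": now}
    )
```

The write and growth questions above then mostly concern the history collection, since the current collection stays roughly constant in size.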
I have multiple servers/workers going through a task queue making API requests (Django with Memcached, and Celery for the queue). The API is limited to 10 requests a second. How can I rate limit it so that the total number of requests across all servers doesn't exceed the limit?
I've looked through some of the related rate-limiting questions, but I'm guessing they are focused on a more linear, non-concurrent scenario. What sort of approach should I take?
Have you looked at RateLimiter from the Guava project? They introduced this class in one of the recent releases, and it seems to partially satisfy your needs.
Admittedly it won't calculate the rate limit across multiple nodes in a distributed environment, but what you could do is configure the rate limit dynamically based on the number of nodes that are running (i.e. for 5 nodes you'd have a rate limit of 2 API requests a second).
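In the asker's Celery setup, the same divide-the-budget-per-node idea can be expressed with Celery's built-in per-worker rate_limit option (a sketch; the task name, broker URL, and the five-worker assumption are illustrative):

```python
# Sketch: per-worker rate limiting in Celery. With 5 workers and a global
# budget of 10 requests/second, each worker is capped at 2/s. The task name
# and broker URL are assumptions for illustration.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task(rate_limit="2/s")   # enforced per worker, not globally
def call_upstream_api(payload):
    ...  # perform the rate-limited API request here
```

Note that Celery enforces rate_limit per worker instance, so this only approximates a global limit and has to be adjusted whenever the number of workers changes.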
I have been working on an open-source project called Limitd to solve this exact problem. Although I don't have clients for technologies other than Node yet, the protocol and the idea are simple.
Your feedback is very welcome.
I solved that problem, though unfortunately not for your technology: bandwidth-throttle/token-bucket
If you want to implement it, here's the idea of the implementation:
It's a token bucket algorithm that converts the contained tokens into a timestamp of when the bucket was last completely empty. Every consumption updates this timestamp (under a lock) so that every process shares the same state.
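A minimal sketch of that idea for the asker's Django/Memcached stack might look like this (this is not the original library's code, just the algorithm re-expressed; capacity, rate, and key names are assumptions):

```python
# Sketch: a shared token bucket whose only state is the timestamp at which
# the bucket was last empty, stored in the shared Django cache (Memcached).
# Capacity, rate, and key names are illustrative assumptions.
import time
from django.core.cache import cache

CAPACITY = 10          # bucket size: at most 10 tokens
RATE = 10.0            # refill rate: 10 tokens per second
STATE_KEY = "api_bucket_empty_at"   # when the bucket was last empty
LOCK_KEY = "api_bucket_lock"

def try_consume(tokens=1):
    """Return True if `tokens` could be taken from the shared bucket."""
    # cache.add is atomic, so it works as a crude distributed lock.
    if not cache.add(LOCK_KEY, "1", timeout=1):
        return False  # another worker holds the lock; treat as over limit
    try:
        now = time.time()
        empty_at = cache.get(STATE_KEY) or now
        # Tokens available = time elapsed since empty * rate, capped.
        available = min(CAPACITY, (now - empty_at) * RATE)
        if available < tokens:
            return False
        # Consuming tokens moves the "empty at" timestamp forward.
        cache.set(STATE_KEY, now - (available - tokens) / RATE, timeout=None)
        return True
    finally:
        cache.delete(LOCK_KEY)
```

A Celery task would call try_consume() before each API request and retry (or re-queue itself) when it returns False.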