MongoDB Atlas dedicated cluster: Should I be concerned about 'Restarts in last hour' alerts? - mongodb

We’re using a standard 3-node Atlas replicaset in a dedicated cluster (M10, Mongo 6.0.3, AWS) and have configured an alert if the ‘Restarts in last hour is’ rule exceeds 0 for any node.
https://www.mongodb.com/docs/atlas/reference/alert-conditions/#mongodb-alert-Restarts-in-Last-Hour-is
We’re seeing this alert fire every now and then, and we’re wondering what it means for a node in a dedicated cluster and whether it is something to be concerned about, since I don’t think we have any control over it. Should we disable this rule or increase the restart threshold?
Thanks in advance for any advice.
(Note I've asked this over at the Mongo community support site also, but haven't received any traction yet so asking here too)

I got an excellent response on my question at the Mongo community support site:
A node restarting is not necessarily a cause for concern. However, you should investigate the cause of the restart itself to better determine if this is an issue or not. You should take a look at your Project Activity Feed to see if you can determine why the nodes are restarting. I understand you have noted this is an M10 cluster so you should have access to the MongoDB logs, you also can check those to try determine the cause of the node restart. If you do not have access to the logs, you can consider working with Atlas in-app chat support to diagnose the issue.
It’s always good to keep the alerts active, as they can indicate a potential problem as soon as they occur. You can consider increasing the restart threshold to reduce alert noise after concluding whether the restarts are expected or not.
In my case, having checked the activity feed, I was able to match up all the alerts we were seeing to Mongo version auto-updates on the nodes. We still wanted to keep the alert, so we've increased our threshold to fire on >1 restarts per hour rather than >0 restarts, assuming that auto-updates won't be applied multiple times in the same hour.
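To illustrate why raising the threshold from >0 to >1 silences a single auto-update restart while still catching repeated restarts, here is a minimal sketch. It only models the counting logic described above; the function names and the sample timestamps are hypothetical, and it is not how Atlas evaluates the rule internally.

```python
from datetime import datetime, timedelta


def restarts_in_last_hour(restart_times, now):
    """Count restarts that fall within the hour ending at `now`."""
    window_start = now - timedelta(hours=1)
    return sum(1 for t in restart_times if window_start < t <= now)


def alert_fires(restart_times, now, threshold):
    """Fire when the count in the last hour is strictly above `threshold`."""
    return restarts_in_last_hour(restart_times, now) > threshold


# Hypothetical sample data: one restart (e.g. an auto-update) versus two.
now = datetime(2023, 1, 10, 12, 0)
single_restart = [datetime(2023, 1, 10, 11, 40)]
repeated_restarts = [datetime(2023, 1, 10, 11, 40), datetime(2023, 1, 10, 11, 55)]
```

With a threshold of 0 the single restart fires the alert; with a threshold of 1 it does not, but two restarts within the same hour still do.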

Related

Issues with matching service in Cadence

Two days ago, we started presenting some issues with our cadence setup.
The first thing we noticed is the Open workflows were not disappearing from the list once they completed. For example this workflow appears as Open in the list:
But when you click on it, you will see that it’s actually completed:
At the same time this started to happen, we noticed that several workflows would take quite a long time to complete, and several of them would get stuck in the “Schedule” state and never go any further. After checking the logs, the only error we saw was this:
{"level":"error","ts":"2021-03-06T19:12:04.865Z","msg":"Persistent store operation failure","service":"cadence-matching","component":"matching-engine","wf-task-list-name":"cadence-sys-history-scanner-tasklist-0","wf-task-list-type":1,"store-operation":"create-task","error":"InternalServiceError{Message: CreateTasks operation failed. Error : Request on table cadence.tasks with ttl of 630720000 seconds exceeds maximum supported expiration date of 2038-01-19T03:14:06+00:00. In order to avoid this use a lower TTL, change the expiration date overflow policy or upgrade to a version where this limitation is fixed. See CASSANDRA-14092 for more details.}","wf-task-list-name":"cadence-sys-history-scanner-tasklist-0","wf-task-list-type":1,"number":6300094,"next-number":6300094,"logging-call-at":"taskWriter.go:176","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/cadence/common/log/loggerimpl/logger.go:134\ngithub.com/uber/cadence/service/matching.(*taskWriter).taskWriterLoop\n\t/cadence/service/matching/taskWriter.go:176"}
Does somebody have an idea of why this is happening?
The first one is because visibility sampling is enabled by default (to protect the default core DB). You can disable it by setting system.enableVisibilitySampling to false.
But if you do that, it’s better to separate the visibility store and the default store into different database clusters, so that visibility doesn’t bring down the default (core data model) DB.
see more in https://github.com/uber/cadence/issues/3884
The second is a bug fixed in 0.16.0
It should be resolved if you upgrade server.
See https://github.com/uber/cadence/pull/3627
and https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/recoveringTtlYear2038Problem.html
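The arithmetic behind the error can be checked with a short sketch. The TTL and the maximum expiration date are copied from the log line above; the helper names are hypothetical, and the check only mirrors the comparison the error message describes, not Cassandra's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the error message in the log line.
TTL_SECONDS = 630720000
MAX_EXPIRATION = datetime(2038, 1, 19, 3, 14, 6, tzinfo=timezone.utc)


def expires_at(write_time, ttl_seconds=TTL_SECONDS):
    """Expiration timestamp of a row written at `write_time` with this TTL."""
    return write_time + timedelta(seconds=ttl_seconds)


def overflows(write_time, ttl_seconds=TTL_SECONDS):
    """True when the expiration lands past the maximum supported date."""
    return expires_at(write_time, ttl_seconds) > MAX_EXPIRATION


# Timestamp of the failing write, taken from the "ts" field of the log line.
log_time = datetime(2021, 3, 6, 19, 12, 4, tzinfo=timezone.utc)
ttl_days = TTL_SECONDS // 86400  # 7300 days, i.e. 20 years of 365 days
```

A 20-year TTL applied to a write in March 2021 expires in 2041, which is past the 2038-01-19 limit, so the write is rejected.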

Way to configure notifications/alerts for a kubernetes pod which is reaching 90% memory and which is not exposed to internet(backend microservice)

I am currently working on an alerting/notification solution for microservices deployed on Kubernetes as frontend and backend services. On multiple occasions, backend services have been unable to restart, or have reached 90% of their allocated pod memory limit, when they run out of memory. To identify such pods we want an alert mechanism that fires when they fail or approach saturation. We have Prometheus and Grafana as monitoring services but have not been able to configure alerts, as I have quite limited knowledge of them. Any suggestions or references with a detailed way of achieving this would be helpful. Please do let me know.
I did search the internet for this, but almost all results point to node-level or cluster-level monitoring only. :(
The Query used to check the memory usage is :
sum (container_memory_working_set_bytes{image!="",name=~"^k8s_.*",namespace=~"^$namespace$",pod_name=~"^$deployment-[a-z0-9]+-[a-z0-9]+"}) by (pod_name)
I saw this recently on google. It might be helpful to you. https://groups.google.com/u/1/g/prometheus-users/c/1n_z3cmDEXE?pli=1
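One possible direction, sketched under assumptions: divide the working-set query from the question by the pod's memory limit and compare the ratio to 0.9. The sketch below only builds the PromQL string, which could then be used as the `expr` of a Prometheus alerting rule or a Grafana alert. It assumes the cAdvisor metric `container_spec_memory_limit_bytes` is scraped with the same labels as `container_memory_working_set_bytes`, and that the `pod_name` label from the question is present (newer Kubernetes versions expose `pod` instead). The function name and its arguments are hypothetical.

```python
def pod_memory_ratio_expr(namespace, deployment, threshold=0.9):
    """Build a PromQL expression that is true for pods above `threshold` of their limit."""
    # Same label matchers as the query in the question, with the Grafana
    # variables replaced by concrete values.
    selector = (
        'image!="",name=~"^k8s_.*",'
        f'namespace="{namespace}",'
        f'pod_name=~"^{deployment}-[a-z0-9]+-[a-z0-9]+"'
    )
    used = f"sum(container_memory_working_set_bytes{{{selector}}}) by (pod_name)"
    # "> 0" drops pods without a memory limit, which would otherwise divide by zero.
    limit = f"(sum(container_spec_memory_limit_bytes{{{selector}}}) by (pod_name) > 0)"
    return f"{used} / {limit} > {threshold}"


# Hypothetical namespace and deployment names, for illustration only.
expr = pod_memory_ratio_expr("production", "orders-backend")
print(expr)
```

Because the alert is evaluated inside Prometheus from metrics scraped in the cluster, it does not matter that the backend pods are not exposed to the internet.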

Sync two offline masters when network available

I have a use case where I need to set up two physical stations at a venue. Each station will be running a couple of app servers and a mongodb server.
I can't rely on the venue's internet access so I need my app to be able to work offline and "sync" the dbs every once in a while.
I initially thought about having two masters that would somehow sync with a remote one but TIL that master-master replication is not possible with mongodb.
I've read about the active-active approach, however, that won't let me write to a different shard when offline.
I'm running out of ideas, any recommendation would be greatly appreciated.
------ Update on what I'm trying to achieve:
I'm working with a venue that has two entrances. The idea is to be able to capture some information from people attending the events (name, email, etc). After getting registered we will print a name tag with some of the info.
Everything sounds pretty easy. However, if possible, I would like to not rely on the venue's network (internet), and that's where I started struggling to figure out what the best approach is. I guess what I want is to have a remote mongo but, if the network goes down, somehow keep saving records locally and send them to the remote mongo instance when the network is available again.
Extra considerations:
- Events last a couple of days, some people lose their name tag overnight, they should be able to go to either of the entrances and get it reprinted. So we should be able to find their info even if they registered in entrance A but they are asking for a reprint in entrance B.
More questions:
- Am I overthinking it? Maybe venue's network + a 4G/LTE modem as a backup should be enough? I would prefer not relying on it tho.
I believe you're overthinking things. Here's what I would do if faced with a similar situation:
From the description, it doesn't sound like the two sites need to be connected in real time at all. I would create a server on Entry A, another in Entry B, and consolidate their data each day after the day ended if required. This is because:
It's unlikely that one person will register at both sites within a single day. If they lose their tag on that day, I'd just tell them to go back to where they registered earlier and get it reprinted there. Worst case, you'll create a duplicate entry (it should be obvious which is the duplicate, since no one would lose their tag within seconds), but I would not anticipate hundreds of people all losing their tags within a day.
If the attendee lost their tag overnight, both servers will have synced data and should be able to reprint.
If you're concerned about the venue's Wifi access, just run cables from the server to the printing stations.
Personally, I would argue that the overnight sync is not really needed at all (see the likelihood of people registering twice). I would just collect the data from both servers after the event ended. That is, unless you have specific needs for the combined data from both entries during the 2nd day.
Note: please make sure you're running a minimum of 3-node replica set. Running a standalone instance for prod environment is not recommended. Hardware/disk corruption is a common event.
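The end-of-day consolidation suggested above could look roughly like the following sketch. It assumes each station exports its registrations as a list of dicts with `email` and `registered_at` fields (hypothetical field names; timestamps are ISO 8601 strings in the same timezone so they sort as text), and it treats two records with the same email as duplicates, keeping the earliest one.

```python
def merge_registrations(station_a, station_b):
    """Merge two stations' records, keeping the earliest record per email."""
    merged = {}
    # Process records oldest first so the first one seen per email wins.
    for record in sorted(station_a + station_b, key=lambda r: r["registered_at"]):
        key = record["email"].strip().lower()
        merged.setdefault(key, record)
    return list(merged.values())


# Hypothetical sample data for the two entrances.
entrance_a = [
    {"name": "Ana", "email": "ana@example.com", "registered_at": "2023-05-01T09:00:00"},
]
entrance_b = [
    {"name": "Ben", "email": "ben@example.com", "registered_at": "2023-05-01T09:05:00"},
    {"name": "Ana", "email": "Ana@example.com", "registered_at": "2023-05-01T15:30:00"},
]
combined = merge_registrations(entrance_a, entrance_b)
```

The combined list can then be loaded into both stations' databases overnight, so either entrance can reprint a tag the next day.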

Why should I store kubernetes deployment configuration into source control if kubernetes already keeps track of it?

One of the documented best practices for Kubernetes is to store the configuration in version control. It is mentioned in the official best practices and also summed up in this Stack Overflow question. The reason is that this is supposed to speed-up rollbacks if necessary.
My question is, why do we need to store this configuration if this is already stored by Kubernetes and there are ways with which we can easily go back to a previous version of the configuration using for example kubectl? An example is a command like:
kubectl rollout history deployment/nginx-deployment
Isn't storing the configuration an unnecessary duplication of a piece of information that we will then have to keep synchronized?
The reason I am asking this is that we are building a configuration service on top of Kubernetes. The user will interact with it to configure multiple deployments, I was wondering if we should keep a history of the Kubernetes configuration and the content of configMaps in a database for possible roll backs or if we should just rely on kubernetes to retrieve the current configuration and rolling back to previous versions of the configuration.
You can use Kubernetes as your store of configuration, to your point, it's just that you probably shouldn't want to. By storing configuration as code, you get several benefits:
Configuration changes get regular code reviews.
They get versioned, are diffable, etc.
They can be tested, linted, and whatever else you desired.
They can be refactored, share code, and be documented.
And all this happens before actually being pushed to Kubernetes.
That may seem bad ("but then my configuration is out of date!"), but keep in mind that configuration is actually never in date - just because you told Kubernetes you want 3 replicas running doesn't mean there are, or if there were that 1 isn't temporarily down right now, and so on.
Configuration expresses intent. It takes a different process to actually notice when your intent changes or doesn't match reality, and make it so. For Kubernetes, that storage is etcd and it's up to the master to, in a loop forever, ensure the stored intent matches reality. For you, the storage is source control and whatever process you want, automated or not, can, in a loop forever, ensure your code eventually becomes reflected in Kubernetes.
The rollback command, then, is just a very fast shortcut to "please do this right now!". It's for when your configuration intent was wrong and you don't have time to fix it. As soon as you roll back, you should chase your configuration and update it there as well. In a sense, this is indeed duplication, but it's a rare event compared to the normal flow, and the overall benefits outweigh this downside.
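The "loop forever" idea can be sketched as a single reconciliation step: compare the intent (what source control says) with what is observed in the cluster and derive the actions needed. This is a toy model with plain dicts, not the Kubernetes controller code; the function name and the sample specs are hypothetical.

```python
def reconcile(desired, actual):
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Intent as stored in source control versus state observed in the cluster.
in_git = {"nginx": {"replicas": 3}, "api": {"replicas": 2}}
in_cluster = {"nginx": {"replicas": 1}, "legacy": {"replicas": 1}}
plan = reconcile(in_git, in_cluster)
```

Running such a step repeatedly, by hand or from a pipeline, is what keeps the cluster converging on whatever is committed.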
A Kubernetes cluster doesn't store your configuration; it runs it, just as your server runs your application code.

Timeline of kubernetes events

I would like to be able to see all of the various things that happened to a kube cluster on a timeline, including when nodes were found to be dead, when new nodes were added, when pods crashed and when they were restarted.
So far the best that we have found is kubectl get event but that seems to have a few limitations:
it doesn't go back in time that far (I'm not sure how far it goes back. A day?)
it combines similar events and orders the resulting list by the time of the latest event in each group. This makes it impossible to know what happened during some time range since events in that range may have been combined with later events outside the range.
One idea that I have is to write a pod that will use the API to watch the stream of events and log them to a file. This would let us control retention and it seems that events that occur while we are watching will not be combined, solving the second problem as well.
What are other people doing about this?
My understanding is that Kubernetes itself dedups events, documented here:
https://github.com/kubernetes/kubernetes/blob/master/docs/design/event_compression.md
Once that happens, there is no way to get the individual events back.
See https://github.com/kubernetes/kubernetes/issues/36304 for complaints how that loses info. https://github.com/kubernetes/kubernetes/pull/46034 at least improved the message. See also https://github.com/kubernetes/enhancements/pull/1291 KEP for recent discussion and proposal to improve usability in kubectl.
How long are events retained? Their "time-to-live" is apparently controlled by the kube-apiserver --event-ttl option, which defaults to 1 hour:
https://github.com/kubernetes/kubernetes/blob/da53a247633/cmd/kube-apiserver/app/options/options.go#L71-L72
You can raise this. Might require more resources for etcd — from what I saw in some 2015 github discussions, event TTL used to be 2 days, and events were the main thing stressing etcd...
In a pinch, it might be possible to figure out what happened earlier from various logs, especially the kubelet logs.
Saving events
Running kubectl get event -o yaml --watch into a persistent file sounds like a simple thing to do. I think when you watch events as they arrive, you see them pre-dedup.
Heapster can send events to some of the supported sinks:
https://github.com/kubernetes/heapster/blob/master/docs/sink-configuration.md
Eventrouter can send events to various sinks: https://github.com/heptiolabs/eventrouter/tree/master/sinks
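Once events have been saved, turning them into a readable timeline is mostly a matter of sorting. A minimal sketch, assuming the saved data is the JSON list produced by `kubectl get events -o json` (an object with an `items` array of Event objects); the helper names and the sample events are hypothetical.

```python
import json


def event_to_line(event):
    """Format one Event object as a single timeline line."""
    obj = event.get("involvedObject", {})
    when = event.get("lastTimestamp") or event.get("firstTimestamp") or ""
    return "{} {} {}/{} {}: {}".format(
        when,
        event.get("type", ""),
        obj.get("kind", ""),
        obj.get("name", ""),
        event.get("reason", ""),
        event.get("message", ""),
    )


def timeline(event_list_json):
    """Return timeline lines sorted by timestamp (UTC ISO 8601 sorts as text)."""
    items = json.loads(event_list_json).get("items", [])
    return sorted(event_to_line(e) for e in items)


# Hypothetical sample in the shape of `kubectl get events -o json`.
sample = json.dumps({"items": [
    {"lastTimestamp": "2023-01-10T11:55:00Z", "type": "Normal", "reason": "Started",
     "message": "Started container web", "involvedObject": {"kind": "Pod", "name": "web-1"}},
    {"lastTimestamp": "2023-01-10T11:40:00Z", "type": "Warning", "reason": "NodeNotReady",
     "message": "Node is not ready", "involvedObject": {"kind": "Node", "name": "node-a"}},
]})
lines = timeline(sample)
```

Note that a snapshot taken this way still contains already-combined events; only events captured as they arrive avoid that.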
Have you checked out the pod specific events tab in the Dashboard?
Some events from a cluster I have running in GKE:
kubernetes/heapster can persist events to GCL and InfluxDB, but for now there is no API to access the stored data.