K8s - Node alerts - kubernetes

How can I configure GCP to send me alerts when node events (create/shutdown) happen?
I would like to receive an email alerting me about the cluster scaling.
Thanks

First, note that you can retrieve such events in Stackdriver Logging by using the following filter:
logName="projects/[PROJECT_NAME]/logs/cloudaudit.googleapis.com%2Factivity" AND
(
protoPayload.methodName="io.k8s.core.v1.nodes.create" OR
protoPayload.methodName="io.k8s.core.v1.nodes.delete"
)
This filter will retrieve only audit activity log entries (cloudaudit.googleapis.com%2Factivity) in your project [PROJECT_NAME], corresponding to a node creation event (io.k8s.core.v1.nodes.create) or deletion (io.k8s.core.v1.nodes.delete).
To be alerted when such a log is generated, there are multiple possibilities.
You could configure a sink to a Pub/Sub topic based on this filter, and then trigger a Cloud Function when a filtered log entry is created. This Cloud Function would define the logic to send you an email. This is probably the solution I'd choose, since this use case is described in the documentation.
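As a rough illustration (not taken from that documentation), a Pub/Sub-triggered Cloud Function in Python that emails the event could look something like the sketch below; the SMTP host, credentials, and addresses are placeholders you would replace with your own mail provider or API:
import base64
import json
import smtplib
from email.message import EmailMessage

def notify_node_event(event, context):
    """Background Cloud Function triggered by a Pub/Sub message from the log sink."""
    # The audit log entry arrives base64-encoded in the Pub/Sub message.
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    method = entry["protoPayload"]["methodName"]          # e.g. io.k8s.core.v1.nodes.create
    node = entry["protoPayload"].get("resourceName", "")  # e.g. core/v1/nodes/<node-name>

    msg = EmailMessage()
    msg["Subject"] = f"GKE node event: {method}"
    msg["From"] = "alerts@example.com"   # placeholder
    msg["To"] = "me@example.com"         # placeholder
    msg.set_content(f"Audit log entry for {node}:\n\n{json.dumps(entry, indent=2)}")

    # Placeholder SMTP relay; you could also call a mail API (SendGrid, Mailgun, ...) here.
    with smtplib.SMTP("smtp.example.com", 587) as smtp:
        smtp.starttls()
        smtp.login("alerts@example.com", "app-password")
        smtp.send_message(msg)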
Otherwise, you could define a logs-based metric based on this filter (or one logs-based metric for creation and another for deletion), and configure an alert in Stackdriver Monitoring when this logs-based metric increases. This alert could be configured to send an email. However, I wouldn't suggest implementing this, because this is not a real "alert" (in the sense of "something went wrong"), but rather information. You probably don't want incidents opened in Stackdriver Monitoring every time a node is created or deleted. But you can keep the idea of one or more logs-based metrics and process them with a custom application.

For a faster alternative to GCP sinks, you may also consider using internal Kubernetes node watchers.
You can see an example in https://github.com/notify17/k8s-node-watcher-example/blob/5fc3f802de69f65866cc8f37c4b0e721835ea5b9/main.go#L83.
This example uses Notify17 to generate notifications directly to your browser or mobile phone.
The relevant code is:
// Sets up the nodes watcher
watcher, err := api.Nodes().Watch(listOptions)
// ...

ch := watcher.ResultChan()

for event := range ch {
    node, ok := event.Object.(*v1.Node)
    // ...
    switch event.Type {
    case watch.Added:
        // ...
        // Triggers a Notify17 notification for the ADDED event
        notify17(httpClient,
            "Node added", fmt.Sprintf("Node %s has been added", node.Name))
    case watch.Deleted:
        // ...
        // Triggers a Notify17 notification for the DELETED event
        notify17(httpClient,
            "Node deleted", fmt.Sprintf("Node %s has been deleted", node.Name))
    }
    // ...
}
You can test out this approach by following the instructions provided in the README.
Note: the drawback of this method is that, if the node the watcher pod is running on gets deleted/killed unsafely, there is a chance the event will not be triggered for that node. If the node is deleted gracefully instead, as in the case of the cluster autoscaler, the pod will most likely be recreated on a new node before the old node gets deleted, and the notification will still be triggered.
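If you would rather not use Go, the same node watch can be done with the official Kubernetes Python client. This is a rough sketch only: it assumes the pod runs in-cluster with RBAC permission to list/watch nodes, and the notification call is left as a placeholder.
from kubernetes import client, config, watch

def notify(message):
    # Placeholder: send an email, hit a webhook, call Notify17, etc.
    print(message)

def watch_nodes():
    config.load_incluster_config()  # or config.load_kube_config() outside the cluster
    v1 = client.CoreV1Api()
    w = watch.Watch()
    # Streams ADDED / MODIFIED / DELETED events for Node objects
    for event in w.stream(v1.list_node):
        node_name = event["object"].metadata.name
        if event["type"] == "ADDED":
            notify(f"Node {node_name} has been added")
        elif event["type"] == "DELETED":
            notify(f"Node {node_name} has been deleted")

if __name__ == "__main__":
    watch_nodes()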

Related

Kogito - wait until data from multiple endpoints is received

I am using Kogito with Quarkus. I have set up one DRL rule and am using a BPMN configuration. As can be seen below, currently one endpoint is exposed, which starts the process. All the needed data is received from the initial request; it is then evaluated and the process goes on.
I would like to extend the workflow to have two separate endpoints. One to provide the age of the person and another to provide the name. The process must wait until all needed data is gathered before it proceeds with evaluation.
Has anybody come across a similar solution?
Technically you could use a signal or message to add more data into a process instance before you execute the rules over the entire data, see https://docs.kogito.kie.org/latest/html_single/#ref-bpmn-intermediate-events_kogito-developing-process-services.
In order to do that you need some sort of correlation between these events; otherwise, how do you know that name event 1 should be matched with age event 1? If you can keep the process instance id, then the second event can either hit a REST endpoint for that specific process instance or send it a message via a message broker.
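As a rough, hypothetical illustration of the REST option (the process id persons and the signal names ageReceived/nameReceived are placeholders; the actual paths and payloads depend on the intermediate events you model and on the endpoints Kogito generates, so check the generated OpenAPI document):
import requests

BASE = "http://localhost:8080"
process_id = "persons"            # hypothetical BPMN process id
instance_id = "some-instance-id"  # process instance id returned when the process was started

# Hypothetical signal endpoints generated for intermediate catch events
requests.post(f"{BASE}/{process_id}/{instance_id}/nameReceived", json={"name": "Ada"})
requests.post(f"{BASE}/{process_id}/{instance_id}/ageReceived", json={"age": 42})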
You could also have your own custom logic to aggregate the events and only start a new process instance once your criterion of complete data is met. There are also plans in Kogito to extend the capabilities of how correlation is done, allowing for instance the use of process variables as the identifier. For example, if you use person.id as the correlation, a name event and an age event with the same id would signal the same process instance. Hope this info helps.

How to avoid publishing duplicate data to Kafka via Kafka Connect and Couchbase Eventing, when replicating Couchbase data across multiple data centers with XDCR

My buckets are:
MyDataBucket: application saves its data on this bucket.
MyEventingBucket: a Couchbase Eventing function extracts the 'currentState' field from MyDataBucket and saves it in this bucket.
Also, I have a Kafka Couchbase connector that pushes data from MyEventingBucket to a Kafka topic.
When we had a single data center, there wasn't any problem. Now, we have three data centers. We replicate our data with XDCR between data centers and we work as active-active. So, write requests can be from any data center.
When data is replicated to the other data centers, the Eventing service runs in all data centers, and the same data is pushed three times (because we have three data centers) to Kafka by the Kafka connector.
How can we avoid pushing duplicate data to Kafka?
PS: Of course, we could run the Eventing service or Kafka connector in only one data center, so the data is published to Kafka just once. But this is not a good solution, because we would be affected when a problem occurs in that data center. This was the main reason for using multiple data centers.
Obviously in a perfect world XDCR would just work with Eventing on the replicated bucket.
I put together an Eventing-based workaround to overcome issues in an active/active XDCR configuration - it is a bit complex, so I thought working code would be best. This is one way to perform the solution that Matthew Groves alluded to.
Documents are tagged, and you have a "cluster_state" document shared via XDCR (see comments in the code) to coordinate which cluster is "primary", as you only want one cluster to fire the Eventing function.
I will give the code for an Eventing function "xcdr_supression_700" for version 7.0.0; with a minor change it will also work for 6.6.5.
Note, newer Couchbase releases have more functionality WRT Eventing and allow the Eventing function to be simplified, for example:
Advanced Bucket Accessors in 6.6+, specifically couchbase.replace(), can use CAS and prevent potential races (note Eventing does not allow locking).
Timers have been improved and can be overwritten in 6.6+, thus simplifying the logic needed to determine if a timer is an orphan.
Constant Alias bindings in 7.X allow the JavaScript Eventing code to be identical between clusters, changing just a setting for each cluster.
Setting up XDCR and Eventing
The following code will successfully suppress all extra Eventing mutations on a bucket called "common" or in 7.0.X a keyspace of "common._default._default" with an active/active XDCR replication.
The example is for two (2) clusters but may be extended. This code is 7.0 specific (I can supply a 6.5.1 variant if needed - please DM me)
PS: The only thing it does is log a message (in the cluster that is processing the function). You can just set up two one-node clusters; I named my clusters "couch01" and "couch03". It is pretty easy to set up and test to ensure that mutations in your bucket are only processed once across two clusters with active/active XDCR.
The Eventing Function is generic WRT the JavaScript, BUT it does require a different constant alias on each cluster; see the comment just under the OnUpdate(doc, meta) entry point.
/*
PURPOSE: suppress duplicate mutations by Eventing when we use an Active/Active XDCR setup

Make two clusters "couch01" and "couch03", each with bucket "common" (if 7.0.0, keyspace "common._default._default")
On cluster "couch01", set up XDCR replication of common from "couch01" => "couch03"
On cluster "couch03", set up XDCR replication of common from "couch03" => "couch01"
This is an active / active XDCR configuration.

We process all documents in "common" except those with "type": "cluster_state"; the documents can contain anything
{
    "data": "...something..."
}

We add "owner": "cluster" to every document; in this sample I have two clusters "couch01" and "couch03"
We add "crc": "crc" to every document; in this sample I have two clusters "couch01" and "couch03"
If either the "owner" or "crc" property does not exist, we will add the properties ourselves to the document
{
    "data": "...something...",
    "owner": "couch01",
    "crc": "a63a0af9428f6d2d"
}

A document must exist with KEY "cluster_state"; when things are perfect it looks like the following:
{
    "type": "cluster_state",
    "ci_offline": {"couch01": false, "couch03": false },
    "ci_backups": {"couch03": "couch01", "couch01": "couch03" }
}

Note ci_offline is an indicator that the cluster is down. For example, if a document has "owner": "couch01"
and "ci_offline": {"couch01": true, "couch03": false }, then the cluster "couch03" will take ownership and the
documents will be updated accordingly. An external process (ping/verify CB is running, etc.) runs every minute
or so and then updates the "cluster_state" if a change in cluster state occurs; however, prior to updating
ci_offline to "true", the Eventing Function on that cluster should either be undeployed or paused. In addition,
re-enabling the cluster by setting the flag ci_offline to "false" must be done before the Function is resumed or
re-deployed.

The ci_backups tells which cluster is a backup for which cluster, pretty simple for two clusters.

If you have timers, when the timer fires you MUST check if doc.owner is correct; if not, ignore the timer, i.e.
do nothing. In addition, when you "take ownership" you will need to create a new timer. Finally, all timers should
have an id such that if we ping-pong ci_offline the timer will be overwritten; this implies 6.6.0+, else you
need to do even more work to suppress orphaned timers.

The 'near' identical Function will be deployed on both clusters "couch01" and "couch03". Make sure you have
a constant binding for 7.0.0 of THIS_CLUSTER "couch01" or THIS_CLUSTER "couch03", or for 6.6.0 uncomment the
appropriate var statement at the top of OnUpdate(). Next you should have a bucket binding of src_bkt to
keyspace "common._default._default" for 7.0.0, or to bucket "common" in 6.6.0, in mode read+write.
*/
function OnUpdate(doc, meta) {
    // ********************************
    // MUST MATCH THE CLUSTER AND ALSO THE DOC "cluster_state"
    // *********
    // var THIS_CLUSTER = "couch01"; // this could be a constant binding in 7.0.0, in 6.X we uncomment one of these to match the cluster name
    // var THIS_CLUSTER = "couch03"; // this could be a constant binding in 7.0.0, in 6.X we uncomment one of these to match the cluster name
    // ********************************
    if (doc.type === "cluster_state") return;
    var cs = src_bkt["cluster_state"]; // extra bucket op: read the state of the clusters
    if (cs.ci_offline[THIS_CLUSTER] === true) return; // this cluster is marked offline, do nothing.
    // ^^^^^^^^
    // IMPORTANT: when an external process marks cs.ci_offline[THIS_CLUSTER] back to false (as
    // in this cluster becomes online) it is assumed that the Eventing function was undeployed
    // (or was paused) when it was set "true" and will be redeployed or resumed AFTER it is set "false".
    // The order of this procedure is very important, else mutations will be lost.
    var orig_owner = doc.owner;
    var fallback_cluster = cs.ci_backups[THIS_CLUSTER]; // this cluster is the fallback for the fallback_cluster
    /*
    if (!doc.crc && !doc.owner) {
        doc.owner = fallback_cluster;
        src_bkt[meta.id] = doc;
        return; // the fallback cluster NOT THIS CLUSTER is now the owner, the fallback
                // cluster will then add the crc property, as we just made a mutation in that
                // cluster via XDCR
    }
    */
    if (!doc.crc && !doc.owner) {
        doc.owner = THIS_CLUSTER;
        orig_owner = doc.owner;
        // use CAS to avoid a potential 'race' between clusters
        var result = couchbase.replace(src_bkt, meta, doc);
        if (result.success) {
            // log('success adv. replace: result', result);
        } else {
            // log('lost to other cluster, failure adv. replace: id', meta.id, 'result', result);
            // re-read
            doc = src_bkt[meta.id];
            orig_owner = doc.owner;
        }
    }
    // logic to take over a failed cluster's data, requires updating "cluster_state"
    if (orig_owner !== THIS_CLUSTER) {
        if (orig_owner === fallback_cluster && cs.ci_offline[fallback_cluster] === true) {
            doc.owner = THIS_CLUSTER; // Here update the doc's owner
            src_bkt[meta.id] = doc;   // This cluster will now process this doc's mutations.
        } else {
            return; // this isn't the fallback cluster.
        }
    }
    var crc_changed = false;
    if (!doc.crc) {
        var cur_owner = doc.owner;
        delete doc.owner;
        doc.crc = crc64(doc); // crc DOES NOT include doc.owner && doc.crc
        doc.owner = cur_owner;
        crc_changed = true;
    } else {
        var cur_owner = doc.owner;
        var cur_crc = doc.crc;
        delete doc.owner;
        delete doc.crc;
        doc.crc = crc64(doc); // crc DOES NOT include doc.owner && doc.crc
        doc.owner = cur_owner;
        if (cur_crc != doc.crc) {
            crc_changed = true;
        } else {
            return;
        }
    }
    if (crc_changed) {
        // update the data with the new crc, to suppress duplicate XDCR processing and a re-deploy from Everything
        // we could use CAS here, but at this point only one cluster will update the doc, so we cannot have races.
        src_bkt[meta.id] = doc;
    }
    // This is the action on a fresh unprocessed mutation, here it is just a log message.
    log("A. Doc created/updated", meta.id, 'THIS_CLUSTER', THIS_CLUSTER, 'offline', cs.ci_offline[THIS_CLUSTER],
        'orig_owner', orig_owner, 'owner', doc.owner, 'crc_changed', crc_changed, doc.crc);
}
Make sure you have two buckets prior to importing "xcdr_supression_700.json" or "xcdr_supression_660.json"
For the 1st cluster's (couch01) setup, pay attention to the constant alias: you will need to ensure you have THIS_CLUSTER set to "couch01".
For the 2nd cluster's (couch03) setup, pay attention to the constant alias: you will need to ensure you have THIS_CLUSTER set to "couch03".
Now if you are running version 6.6.5 you do not have Constant Alias bindings (which act as globals in your Eventing function's JavaScript), thus the requirement to uncomment the appropriate variable, for example for cluster couch01:
function OnUpdate(doc, meta) {
    // ********************************
    // MUST MATCH THE CLUSTER AND ALSO THE DOC "cluster_state"
    // *********
    var THIS_CLUSTER = "couch01"; // this could be a constant binding in 7.0.0, in 6.X we uncomment one of these to match the cluster name
    // var THIS_CLUSTER = "couch03"; // this could be a constant binding in 7.0.0, in 6.X we uncomment one of these to match the cluster name
    // ********************************
    // .... code removed (see the prior code example) ....
}
Some comments/details:
You may wonder why we need to use a CRC function and store its value in the document undergoing XDCR.
The CRC function, crc64(), built into Eventing is used to detect a non-change or a mutation possibly due to an XDCR document update. The use of the CRC and the properties "owner" and "crc" allows a) the determination of the owning cluster and b) the suppression of the Eventing function when the mutation is due to an XDCR cluster-to-cluster copy, based on the "active" cluster.
Note that when updating the CRC in the document as part of a timer function, the OnUpdate(doc, meta) entry point of the Eventing function will be triggered again. If you have timers, when the timer fires you MUST check whether doc.owner is correct; if it is not, ignore the timer, i.e. do nothing. In addition, when you "take ownership" you will need to create a new timer. Finally, all timers should have an id such that if we ping-pong cluster_state.ci_offline the timer will be overwritten; this implies you must use version 6.6.0+, else you need to do even more work to determine, when a timer fires, whether it is orphaned and then suppress any action. Be very careful in older Couchbase versions, because in 6.5 you cannot overwrite a timer by its id and all timer ids should be unique.
Any mutation made to the source bucket by an Eventing function is suppressed, or not seen, by that Eventing function, whether the document is mutated by the main JavaScript code or by a timer callback. Yet these mutations will be seen via XDCR active/active replication in the other cluster.
As to using Eventing timers, pay attention to the comment I put in the prior paragraph about overwriting and suppressing, especially if you insist on using Couchbase Server 6.5, which is getting a bit long in the tooth so to speak.
Concerning the responsibility to update the cluster_state document, it is envisioned that this would be a periodic script outside of Couchbase, run from a Linux cron, that does "aliveness" tests with a manual override. Be careful here, as you can easily go "split brain" due to a network partitioning issue.
A comment about the cluster_state: this document is subject to XDCR; it is a persistent document that the active/active replication makes appear to be a single inter-cluster document. If a cluster is "down", changing it on the live cluster will result in it replicating when the "down" cluster is recovered.
Deploy/Undeploy will either process all current documents via the DCP mutation stream all over again (feed boundary == Everything) or only process items or mutations occurring after the time of deployment (feed boundary == From now). So you need careful coding in the first case to prevent acting on the same document twice, and you will miss mutations in the second case.
It is best to design your Eventing Functions to be idempotent, where there is no additional effect if they are called more than once with the same input parameters. This can be achieved by storing state in the documents that are processed, so you never reprocess them on a re-deploy.
Pause/Resume: invoking Pause will create a checkpoint and shut down the Eventing processing. Then on a Resume, the DCP stream will start from the checkpoint (for each vBucket), and you will not miss a single mutation, subject to DCP dedup. Furthermore, all "active" timers that would have fired during the "pause" will fire as soon as possible (typically within the next 7-second timer scan interval).
Best
Jon Strabala
Principal Product Manager - Couchbase

What are the limits on actor events in Service Fabric?

I am currently testing the scaling of my application and I ran into something I did not expect.
The application is running on a 5 node cluster, it has multiple services/actortypes and is using a shared process model.
For some component it uses actor events as a best effort pubsub system (There are fallbacks in place so if a notification is dropped there is no issue).
The problem arises when the number of actors grows (aka subscription topics). The actor service is partitioned into 100 partitions at the moment.
The number of topics at that point is around 160,000, where each topic is subscribed to 1-5 times (on the nodes where it is needed) with an average of 2.5 subscriptions (roughly 400k subscriptions).
At that point communications in the cluster start breaking down, new subscriptions are not created, unsubscribes are timing out.
But it is also affecting other services; internal calls to a diagnostics service are timing out (asking each of the 5 replicas). This is probably due to the resolving of partition/replica endpoints, as the outside calls to the web page are fine (these endpoints use the same technology/code stack).
The event viewer is full of warnings and errors like:
EventName: ReplicatorFaulted Category: Health EventInstanceId {c4b35124-4997-4de2-9e58-2359665f2fe7} PartitionId {a8b49c25-8a5f-442e-8284-9ebccc7be746} ReplicaId 132580461505725813 FaultType: Transient, Reason: Cancelling update epoch on secondary while waiting for dispatch queues to drain will result in an invalid state, ErrorCode: -2147017731
10.3.0.9:20034-10.3.0.13:62297 send failed at state Connected: 0x80072745
Error While Receiving Connect Reply : CannotConnect , Message : 4ba737e2-4733-4af9-82ab-73f2afd2793b:382722511 from Service 15a5fb45-3ed0-4aba-a54f-212587823cde-132580461224314284-8c2b070b-dbb7-4b78-9698-96e4f7fdcbfc
I've tried scaling the application without this subscribe model active, and I easily reach a workload twice as large without any issues.
So there are a couple of questions
Are there limits known/advised for actor events?
Would increasing the partition count or/and node count help here?
Is the communication interference logical? Why are other service endpoints having issues as well?
After time spent on the support ticket we found some info, so I will post my findings here in case it helps someone.
The actor events use a resubscription model to make sure they are still connected to the actor. By default this is done every 20 seconds. This meant a lot of resources were being used, and eventually the whole system was overloaded with loads of idle threads waiting to resubscribe.
You can decrease the load by setting resubscriptionInterval to a higher value when subscribing. The drawback is that the client will potentially miss events in the meantime (if a partition is moved).
To counteract the delay in resubscribing, it is possible to hook into the lower-level Service Fabric events. The following pseudo code was offered to me in the support call.
1. Register for endpoint change notifications for the actor service:
fabricClient.ServiceManager.ServiceNotificationFilterMatched += (o, e) =>
{
    var notification = ((FabricClient.ServiceManagementClient.ServiceNotificationEventArgs)e).Notification;
    /*
     * Add additional logic for optimizations
     * - check if the endpoint is not empty
     * - If multiple listeners are registered, check if the endpoint change notification is for the desired endpoint
     * Please note, all the endpoints are sent in the notification. User code should have the logic to cache the
     * endpoint seen during the subscription call and compare it with the newer one.
     */
    List<long> keys;
    if (resubscriptions.TryGetValue(notification.PartitionId, out keys))
    {
        foreach (var key in keys)
        {
            // 1. Unsubscribe the previous subscription by calling ActorProxy.UnsubscribeAsync()
            // 2. Resubscribe by calling ActorProxy.SubscribeAsync()
        }
    }
};

await fabricClient.ServiceManager.RegisterServiceNotificationFilterAsync(
    new ServiceNotificationFilterDescription(new Uri("<service name>"), true, true));
2. Change the resubscription interval to a value which fits your need.
3. Cache the partition id to actor id mapping. This cache will be used to resubscribe when the replica's primary endpoint changes (ref #1):
await actor.SubscribeAsync(handler, TimeSpan.FromHours(2) /* Tune the value according to the need */);

ResolvedServicePartition rsp;
((ActorProxy)actor).ActorServicePartitionClientV2.TryGetLastResolvedServicePartition(out rsp);

var keys = resubscriptions.GetOrAdd(rsp.Info.Id, key => new List<long>());
keys.Add(communicationId);
The above approach ensures the following:
The subscriptions are resubscribed at regular intervals
If the primary endpoint changes in between, actorproxy resubscribes from the service notification callback
This ends the pseudo code from the support call.
Answering my original questions:
Are there limits known/advised for actor events?
No hard limits, only resource usage.
Would increasing the partition count and/or node count help here? Partition count, no. Node count, maybe, but only if it means there are fewer subscribing entities on a node.
Is the communication interference logical? Why are other service endpoints having issues as well?
Yes, resource contention is the reason.

Graphite - Gather metrics only from active service instances

Let's say my Spring microservice processes data. Every time a successful processing event occurs, I update a Micrometer counter for metrics. This is registered to a Graphite registry.
registry = new GraphiteMeterRegistry(new GraphiteConfiguration(), Clock.SYSTEM, HierarchicalNameMapper.DEFAULT);
Counter counter = Counter.builder("process").tag("status","success").register(registry);
So far, it sounds good. But what if I have to create and deploy multiple instances of my service?
How do I get the aggregated count of all successful events from all the instances?
To illustrate my case further, I log the counter.count() value on each increment. Here is what I see:
<Instance 1> <time> <package-name> Count :122
<Instance 2> <time> <package-name> Count :53
So when I run the Graphite query in Grafana -
process.status.success.count
I tend to get the random count from either of these instances.
What I need is a query like -
process.service-instance.status.success.count
so that I can run a summarize() function in the end.
Update
Now I'm able to source data from all instances by getting the service instance ID. But that presents a new problem: since I restart my services time and again, and my service id changes every time, how do I source data from ONLY ACTIVE services?
process.*.status.success.count represents the aggregate count of ALL instances - dead or alive.
Never use instance ids for aggregation. When instances restart, instance ids will change. (Use instance-id for logging/debugging/record-keeping purpose only.)
Use service-id for aggregation.
For Micrometer, you can add the service name as a common tag.
registry.config().commonTags("service", "xyz-service");
Common tags are defined at registry level and every metric associated with that registry will have common tags added to it.
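For illustration only (the exact path depends on your name mapper and registry settings): with HierarchicalNameMapper.DEFAULT, the tags are folded into the hierarchical metric name sorted by tag key, so the counter above would be published under something like
process.service.xyz-service.status.success.count
which gives you a stable, instance-independent path to query and aggregate on.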
And for the dead-or-alive situation: the metric was pushed when the instance was alive, so if you want to know how many times some step ran, you'll need to consider that count.
To source data only from active instances, use a time filter. That will return data pushed by instances that were alive in that duration (why? because dead instances do not push metrics).

How to implement "trigger" for redis datastore?

I have a program which polls a certain key in the Redis datastore and does something when the value satisfies a certain condition.
However, I think periodically polling Redis is quite inefficient, so I'm wondering if there is a "trigger" mechanism for Redis: when the value changes and satisfies the condition, the trigger would be called. The trigger might be an RPC call, an HTTP message, or something else, so that I don't need to poll any more - just like the difference between polling and interrupts.
Is this possible?
You can use the Pub/Sub feature of Redis. It's exactly what you need given your circumstances as you described.
Essentially, you SUBSCRIBE to a "channel", and the other part of your application writes (PUBLISH) the value being changed to that channel. Your subscriber (consumer, the client that wants to know about the change) will get notified in virtually realtime.
Since Redis 2.8 (released 22 Nov 2013), there is now a feature called Keyspace Notifications which lets clients subscribe to special Pub/Sub channels for keyspace events, which you can use as a trigger on a certain key.
The feature is disabled by default because "while not very sensible the feature uses some CPU power." To enable, send a CONFIG SET command to configure the feature. For example, the following command will enable keyspace events for String commands:
> CONFIG SET notify-keyspace-events K$
OK
Next, use the regular pubsub SUBSCRIBE command to subscribe to the specially-named channel. For example, to listen to keyspace events on the mykey key in DB 0:
> SUBSCRIBE __keyspace@0__:mykey
Reading messages... (press Ctrl-C to quit)
Test out the feature by setting the key's value from another client:
> SET mykey myvalue
OK
You should receive a message in the subscribed client:
1) "message"
2) "__keyspace#0__:mykey"
3) "set"
After receiving the event, you can fetch the updated value and see if it satisfies the condition in your application code.
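To do the same from application code rather than redis-cli, a small sketch with the redis-py client might look like this (the key name mykey and the condition check are placeholders):
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
p = r.pubsub()
p.subscribe("__keyspace@0__:mykey")  # keyspace channel for key "mykey" in DB 0

for message in p.listen():
    if message["type"] != "message":
        continue                 # skip the initial subscribe confirmation
    event = message["data"]      # e.g. b"set", b"del", b"expired"
    if event == b"set":
        value = r.get("mykey")
        if value == b"expected-value":   # placeholder condition
            print("Condition satisfied, triggering action:", value)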
If you can use Pub/Sub, that's best. If for some reason that doesn't work, you could also use the (performance-impacting) MONITOR command, which will send you every command the server receives. That's probably not a good idea.
Specifically for lists, you have BLPOP, which will block the connection until a new item is available to pop from a list.
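For example, a blocking pop with redis-py might look roughly like this (the list key tasks is a placeholder):
import redis

r = redis.Redis()
# Blocks until an item is pushed to the "tasks" list (timeout=0 means wait forever)
key, item = r.blpop("tasks", timeout=0)
print("Got item:", item)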
How about using a message box to deal with this? For example, two messages (an AND operation) could trigger another message. I think this could be useful - like jBPM, but less complex than that.