CometD Seti - Is there a way to find the list of ServerSessions associated with a given userId?

Is there a way to find the list of ServerSession associated with the given userId?
public boolean associate(final String userId, final ServerSession session)
{
    if (session == null)
        throw new NullPointerException();
    LocalLocation location = new LocalLocation(userId, session);
    boolean wasAssociated = isAssociated(userId);
    boolean added = associate(userId, location);
    if (added)
    {
        session.addListener(location);
        if (_logger.isDebugEnabled())
            _logger.debug("Associated session {} to user {}", session, userId);
        if (!wasAssociated)
        {
            if (_logger.isDebugEnabled())
                _logger.debug("Broadcasting association addition for user {}", userId);
            // Let everyone in the cluster know that this session is here
            _session.getChannel(SETI_ALL_CHANNEL).publish(new SetiPresence(true, userId));
        }
    }
    return added;
}
If the user is already associated, Seti doesn't publish on SETI_ALL_CHANNEL to notify the other comets.
Detailed explanation:
I have implemented cometd oort cluster to push the database change notification to the browser.
Two nodes are set up as master/slave. The master node only receives notifications from the database.
Users are connected through the slave node.
When the user first handshakes with the slave node, Seti publishes a user presence added event.
Both nodes in the cluster know about the users in the cloud.
When the user refreshes the browser, a new (second) handshake is initiated. The userId (loginUserId) remains the same. Seti doesn't publish it to the cluster, which is correct by design.
After a certain time, the session from the first handshake is removed due to inactivity. The presence removed event is fired by Seti, which is also expected by design.
The slave node only knows that the user is connected through the second handshake; however, the master node doesn't know that the user is present in the cloud.
When a new event arrives from the database, the master node doesn't see the user in the cloud and thus no events are transferred to the browser. Moreover, the master and slave nodes are connected at this point in time; the set of users connected to the cluster is simply not synchronized between the two nodes.
I was thinking of calling seti.disassociate() and then seti.associate() for the same user with the session from the new handshake.
We can get the list of connected sessions:
seti.getOort().getBayeuxServer().getSessions();
// clientId - the previous client's session id needs to be identified
ServerSession serverSession = seti.getOort().getBayeuxServer().getSession(clientId);
Through the clientId, the ServerSession can be retrieved and the disassociation can be done for the user:
seti.disassociate(loginUserId, serverSession);
boolean association = seti.associate(loginUserId, serverSession); // new handshake - serverSession
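Putting this together, here is a minimal sketch of the idea. Note that this is not an official CometD API: Seti does not expose its userId-to-sessions mapping, so the sketch tags every session with its userId in a session attribute (the attribute name "loginUserId" is an assumption of this sketch) and scans BayeuxServer.getSessions() to find the session from the previous handshake.

import org.cometd.bayeux.server.BayeuxServer;
import org.cometd.bayeux.server.ServerSession;
import org.cometd.oort.Seti;

public class UserReassociation
{
    // Call this on every handshake of a logged-in user.
    public void reassociate(Seti seti, String loginUserId, ServerSession newSession)
    {
        BayeuxServer bayeux = seti.getOort().getBayeuxServer();
        for (ServerSession session : bayeux.getSessions())
        {
            // Sessions from previous handshakes tagged with the same userId.
            if (session != newSession && loginUserId.equals(session.getAttribute("loginUserId")))
                seti.disassociate(loginUserId, session);
        }
        // Tag and associate the session created by the new handshake.
        newSession.setAttribute("loginUserId", loginUserId);
        seti.associate(loginUserId, newSession);
    }
}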

If you have two sessions connected to the slave node, and one of them goes away, then Seti will not broadcast a presence removed event, because that is only fired when all associations for a userId on a node are gone.
This is exactly to avoid that one node still has an association, but other nodes think the user is gone from the cloud.
If you have evidence that this is happening, then it's a bug, but this specific behavior is tested in the CometD test suite.

Related

What are the limits on actor events in Service Fabric?

I am currently testing the scaling of my application and I ran into something I did not expect.
The application is running on a 5 node cluster, it has multiple services/actortypes and is using a shared process model.
Some components use actor events as a best-effort pub/sub system (there are fallbacks in place, so if a notification is dropped there is no issue).
The problem arises when the number of actors (i.e. subscription topics) grows. The actor service is partitioned into 100 partitions at the moment.
The number of topics at that point is around 160,000, where each topic is subscribed to 1-5 times (on the nodes where it is needed) with an average of 2.5 subscriptions (roughly 400k subscriptions).
At that point communication in the cluster starts breaking down: new subscriptions are not created and unsubscribes time out.
It also affects other services: internal calls to a diagnostics service time out (asking each of the 5 replicas). This is probably due to the resolving of partition/replica endpoints, as outside calls to the web page are fine (these endpoints use the same technology/code stack).
The event viewer is full of warnings and errors like:
EventName: ReplicatorFaulted Category: Health EventInstanceId {c4b35124-4997-4de2-9e58-2359665f2fe7} PartitionId {a8b49c25-8a5f-442e-8284-9ebccc7be746} ReplicaId 132580461505725813 FaultType: Transient, Reason: Cancelling update epoch on secondary while waiting for dispatch queues to drain will result in an invalid state, ErrorCode: -2147017731
10.3.0.9:20034-10.3.0.13:62297 send failed at state Connected: 0x80072745
Error While Receiving Connect Reply : CannotConnect , Message : 4ba737e2-4733-4af9-82ab-73f2afd2793b:382722511 from Service 15a5fb45-3ed0-4aba-a54f-212587823cde-132580461224314284-8c2b070b-dbb7-4b78-9698-96e4f7fdcbfc
I've tried scaling the application without this subscription model active, and I easily reach a workload twice as large without any issues.
So there are a couple of questions:
Are there limits known/advised for actor events?
Would increasing the partition count and/or node count help here?
Is the communication interference logical? Why are other service endpoints having issues as well?
After some time spent on the support ticket we found some info, so I will post my findings here in case it helps someone.
Actor events use a resubscription model to make sure the subscriber is still connected to the actor. By default this is done every 20 seconds. This meant a lot of resources were being used, and eventually the whole system was overloaded with loads of idle threads waiting to resubscribe.
You can decrease the load by setting resubscriptionInterval to a higher value when subscribing. The drawback is that the client will potentially miss events in the meantime (if a partition is moved).
To counteract the delay in resubscribing, it is possible to hook into the lower-level Service Fabric events. The following pseudo code was offered to me in the support call.
1. Register for endpoint change notifications for the actor service:
fabricClient.ServiceManager.ServiceNotificationFilterMatched += (o, e) =>
{
    var notification = ((FabricClient.ServiceManagementClient.ServiceNotificationEventArgs)e).Notification;
    /*
     * Add additional logic for optimizations
     * - check if the endpoint is not empty
     * - if multiple listeners are registered, check if the endpoint change notification is for the desired endpoint
     * Please note, all the endpoints are sent in the notification. User code should have the logic to cache the
     * endpoint seen during the subscription call and compare it with the newer one.
     */
    List<long> keys;
    if (resubscriptions.TryGetValue(notification.PartitionId, out keys))
    {
        foreach (var key in keys)
        {
            // 1. Unsubscribe the previous subscription by calling ActorProxy.UnsubscribeAsync()
            // 2. Resubscribe by calling ActorProxy.SubscribeAsync()
        }
    }
};
await fabricClient.ServiceManager.RegisterServiceNotificationFilterAsync(new ServiceNotificationFilterDescription(new Uri("<service name>"), true, true));
2. Change the resubscription interval to a value which fits your need.
3. Cache the partition id to actor id mapping. This cache will be used to resubscribe when the replica's primary endpoint changes (ref #1):
await actor.SubscribeAsync(handler, TimeSpan.FromHours(2) /*Tune the value according to the need*/);
ResolvedServicePartition rsp;
((ActorProxy)actor).ActorServicePartitionClientV2.TryGetLastResolvedServicePartition(out rsp);
var keys = resubscriptions.GetOrAdd(rsp.Info.Id, key => new List<long>());
keys.Add(communicationId);
The above approach ensures the following:
The subscriptions are resubscribed at regular intervals.
If the primary endpoint changes in between, the ActorProxy resubscribes from the service notification callback.
This ends the pseudo code from the support call.
Answering my original questions:
Are there limits known/advised for actor events?
No hard limits, only resource usage.
Would increasing the partition count and/or node count help here?
Increasing the partition count will not; increasing the node count might, but only if it means there are fewer subscribing entities per node.
Is the communication interference logical? Why are other service endpoints having issues as well?
Yes, resource contention is the reason.

Scala and playframework shared cache between nodes

I have a complex problem and I can't figure out the best solution to it.
This is the scenario:
I have N servers under a single load balancer and a Database.
All the servers connect to the database
All the servers run the same identical application
I want to implement a cache in order to decrease the response time and reduce the HTTP calls from server to database to a minimum.
I implemented it and it works like a charm on a single server... but I need to find a mechanism to update the caches on all the other servers when the data is no longer valid.
example:
I have server A and server B, both have their own cache.
At the first request from the outside, for example to get user information, server A replies.
Its cache is empty, so it needs to get the information from the database.
The second request goes to B; its cache is also empty, so it needs to get the information from the database.
The third request, again on server A: now the data is in the cache, so it replies immediately without a database request.
The fourth request, on server B, is a write request (for example, change the user name); server B can make the change in the database and update its own cache, invalidating the old user.
But server A still has the old, invalid user.
So I need a mechanism for server B to communicate to server A (or N other servers) to invalidate/update the data in the cache.
What is the best way to do this in Scala with the Play Framework?
Also, consider that in the future the servers may be geo-redundant, i.e. in different geographical locations, on different networks, served by different ISPs.
It would also be great to update all the other caches when a user is loaded (one server's database request updates all the servers' caches); this way all the servers are ready for future requests.
Hope I have been clear.
Thanks
Since you're using Play, which under the hood already uses Akka, I suggest using Akka Cluster Sharding. With this, the instances of your Play service would form a cluster (including failure detection, etc.) at startup, and organize between themselves which instance owns a particular user's information.
So proceeding through your requests, the first request to GET /userinfo/:uid hits server A. The request handler hashes uid (e.g. with murmur3: consistent hashing is important) and resolves it to, e.g., shard 27. Since the instances started, this is the first time we've had a request involving a user in shard 27, so shard 27 is created and let's say it gets owned by server A. We send a message (e.g. GetUserInfoFor(uid)) to a new UserInfoActor which loads the required data from the DB, stores it in its state, and replies. The Play API handler receives the reply and generates a response to the HTTP request.
For the second request, it's for the same uid, but hits server B. The handler resolves it to shard 27 and its cluster sharding knows that A owns that shard, so it sends a message to the UserInfoActor on A for that uid which has the data in memory. It replies with the info and the Play API handler generates a response to the HTTP request from the reply.
In this way, all subsequent requests (e.g. the third, the same GET hitting server A) for the user info will not touch the DB, no matter which server they hit.
For the fourth request, which let's say is POST /userinfo/:uid and hits server B, the request handler again hashes the uid to shard 27 but this time, we send, e.g., an UpdateUserInfoFor(uid, newInfo) message to that UserInfoActor on server A. The actor receives the message, updates the DB, updates its in-memory user info and replies (either something simple like Done or the new info). The request handler generates a response from that reply.
This works really well: I've personally seen systems using cluster sharding keep terabytes in memory and operate with consistent single-digit millisecond latency for streaming analytics with interactive queries. Servers crash, and the actors running on the servers get rebalanced to surviving instances.
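To make the message flow above concrete, here is a minimal sketch using Akka Cluster Sharding's typed API (shown with the Java DSL; the Scala DSL is analogous). The names UserInfoActor, GetUserInfoFor and UpdateUserInfoFor simply mirror the ones used above, the database access is stubbed out, and the usual cluster configuration (cluster actor provider, seed nodes) is assumed to be in place.

import java.time.Duration;
import java.util.concurrent.CompletionStage;

import akka.actor.typed.ActorRef;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.Behavior;
import akka.actor.typed.javadsl.Behaviors;
import akka.cluster.sharding.typed.javadsl.ClusterSharding;
import akka.cluster.sharding.typed.javadsl.Entity;
import akka.cluster.sharding.typed.javadsl.EntityRef;
import akka.cluster.sharding.typed.javadsl.EntityTypeKey;

public class UserInfoSharding {

    public interface Command {}

    public static final class GetUserInfoFor implements Command {
        public final ActorRef<String> replyTo;
        public GetUserInfoFor(ActorRef<String> replyTo) { this.replyTo = replyTo; }
    }

    public static final class UpdateUserInfoFor implements Command {
        public final String newInfo;
        public final ActorRef<String> replyTo;
        public UpdateUserInfoFor(String newInfo, ActorRef<String> replyTo) {
            this.newInfo = newInfo;
            this.replyTo = replyTo;
        }
    }

    public static final EntityTypeKey<Command> TYPE_KEY =
            EntityTypeKey.create(Command.class, "UserInfo");

    // One entity per uid; its in-memory state is the cached user info.
    private static Behavior<Command> userInfoActor(String uid, String cachedInfo) {
        return Behaviors.receive(Command.class)
                .onMessage(GetUserInfoFor.class, msg -> {
                    // A real implementation would load from the DB on first access.
                    msg.replyTo.tell(cachedInfo);
                    return Behaviors.same();
                })
                .onMessage(UpdateUserInfoFor.class, msg -> {
                    // A real implementation would write to the DB before caching the new value.
                    msg.replyTo.tell("Done");
                    return userInfoActor(uid, msg.newInfo);
                })
                .build();
    }

    // Call once at startup on every instance; sharding decides which node owns which shard.
    public static void init(ActorSystem<?> system) {
        ClusterSharding.get(system).init(
                Entity.of(TYPE_KEY, ctx -> userInfoActor(ctx.getEntityId(), "<not loaded>")));
    }

    // Called from a request handler: the message is routed to the owning node transparently.
    public static CompletionStage<String> getUserInfo(ActorSystem<?> system, String uid) {
        EntityRef<Command> ref = ClusterSharding.get(system).entityRefFor(TYPE_KEY, uid);
        return ref.<String>ask(replyTo -> new GetUserInfoFor(replyTo), Duration.ofSeconds(3));
    }
}

The request handler never talks to another server's cache directly: it only asks the EntityRef, and sharding guarantees a single owner per uid, which is exactly the strong-consistency trade-off discussed next.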
It's important to note that anything matching your requirements is a distributed system and you're requiring strong consistency, i.e. you're requiring that it be unavailable under a network partition (if B is unable to communicate an update to A, it has no choice but to fail the request). Once you start talking about geo-redundancy and multiple ISPs, you're going to see partitions pretty regularly. The only way to get availability under a network partition is to relax the consistency demand and accept that sometimes the GET will not incorporate the latest PUT/POST/DELETE.
This is probably not something that you want to build yourself. But there are plenty of distributed caches out there that you can use, such as Ehcache or Infinispan. I suggest you look into one of those two.

K8s - Node alerts

How can I configure GCP to send me alerts when node events (create/shutdown) happen?
I would like to receive email alerting me about the cluster scaling.
Thanks.
First, note that you can retrieve such events in Stackdriver Logging by using the following filter:
logName="projects/[PROJECT_NAME]/logs/cloudaudit.googleapis.com%2Factivity" AND
(
protoPayload.methodName="io.k8s.core.v1.nodes.create" OR
protoPayload.methodName="io.k8s.core.v1.nodes.delete"
)
This filter will retrieve only audit activity log entries (cloudaudit.googleapis.com%2Factivity) in your project [PROJECT_NAME], corresponding to a node creation event (io.k8s.core.v1.nodes.create) or deletion (io.k8s.core.v1.nodes.delete).
To be alerted when such a log is generated, there are multiple possibilities.
You could configure a sink to a Pub/Sub topic based on this filter, and then trigger a Cloud Function when a filtered log entry is created. This Cloud Function will define the logic to send you a mail. This is probably the solution I'd choose, since this use case is described in the documentation.
Otherwise, you could define a logs-based metric based on this filter (or one logs-based metric for creations and another for deletions), and configure an alert in Stackdriver Monitoring when the metric increases. This alert could be configured to send an email. However, I wouldn't suggest implementing this, because this is not a real "alert" (in the sense of "something went wrong"), but rather information. You probably don't want to have incidents opened in Stackdriver Monitoring every time a node is created or deleted. But you can keep the idea of one or multiple logs-based metrics and process them with a custom application.
For a faster way than using GCP sinks, you may also consider using internal Kubernetes node watchers.
You can see an example in https://github.com/notify17/k8s-node-watcher-example/blob/5fc3f802de69f65866cc8f37c4b0e721835ea5b9/main.go#L83.
This example uses Notify17 to generate notifications directly to your browser or mobile phone.
The relevant code is:
// Sets up the nodes watcher
watcher, err := api.Nodes().Watch(listOptions)
// ...
ch := watcher.ResultChan()
for event := range ch {
    node, ok := event.Object.(*v1.Node)
    // ...
    switch event.Type {
    case watch.Added:
        // ...
        // Triggers a Notify17 notification for the ADDED event
        notify17(httpClient,
            "Node added", fmt.Sprintf("Node %s has been added", node.Name))
    case watch.Deleted:
        // ...
        // Triggers a Notify17 notification for the DELETED event
        notify17(httpClient,
            "Node deleted", fmt.Sprintf("Node %s has been deleted", node.Name))
    }
    // ...
}
You can test out this approach by following the instructions provided in the README.
Note: the drawback of this method is that, if the node the pod is running on gets deleted/killed unsafely, there is a chance the event will not be triggered for that node. If the node is deleted gracefully instead, as in the case of the cluster autoscaler, then the pod will probably be recreated on a new node before the old node gets deleted, therefore triggering the notification.

Azure Service Bus topic with exclusive, autodelete subscriptions with a generated name

How can I create a topic and subscribe multiple independent subscribers to it, each with its own subscription, without specifying subscription names? If a subscriber disconnects, the corresponding subscription should be removed automatically. This case can be realised with a RabbitMQ server, for example for logging purposes: https://www.rabbitmq.com/tutorials/tutorial-three-dotnet.html.
In the .NET client, when we supply no parameters to QueueDeclare() we create a non-durable, exclusive, autodelete queue with a generated name.
If that is impossible, how can I wrap the .NET client to realise this case? Thanks.
As you mentioned in your comment, you can create a new subscription with a unique GUID as the subscription name when the client connects (or the app starts), and specify the SubscriptionDescription.AutoDeleteOnIdle property to set the TimeSpan idle interval after which the subscription is automatically deleted.
var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
var subscriptionname = Guid.NewGuid().ToString();
if (!namespaceManager.SubscriptionExists(topicname, subscriptionname))
{
    SqlFilter updatedMessagesFilter = new SqlFilter("mypro = 'test'");
    namespaceManager.CreateSubscription(
        new SubscriptionDescription(topicname, subscriptionname) { AutoDeleteOnIdle = TimeSpan.FromMinutes(5) },
        updatedMessagesFilter);
}
When the client disconnects, you can delete the subscription manually.
if (namespaceManager.SubscriptionExists(topicname, subscriptionname))
{
    namespaceManager.DeleteSubscription(topicname, subscriptionname);
}
Note: to guarantee that the subscription is always deleted, you can retain information about the client and its subscription name (the unique GUID) in external storage. Every time a client connects/reconnects, check whether a record exists in external storage indicating that a subscription previously used by this client has not been deleted yet; if such a record exists, delete that subscription before creating a new one.

Concerns about zookeeper's lock-recipe

While reading ZooKeeper's recipe for locks, I got confused. It seems that this recipe for distributed locks cannot guarantee that "at any snapshot in time no two clients think they hold the same lock". But since ZooKeeper is so widely adopted, if there were such a mistake in the reference documentation, someone would have pointed it out long ago, so what did I misunderstand?
Quoting the recipe for distributed locks:
Locks
Fully distributed locks that are globally synchronous, meaning at any snapshot in time no two clients think they hold the same lock. These can be implemented using ZooKeeper. As with priority queues, first define a lock node.
1. Call create( ) with a pathname of "locknode/guid-lock-" and the sequence and ephemeral flags set.
2. Call getChildren( ) on the lock node without setting the watch flag (this is important to avoid the herd effect).
3. If the pathname created in step 1 has the lowest sequence number suffix, the client has the lock and the client exits the protocol.
4. The client calls exists( ) with the watch flag set on the path in the lock directory with the next lowest sequence number.
5. If exists( ) returns false, go to step 2. Otherwise, wait for a notification for the pathname from the previous step before going to step 2.
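(For concreteness, here is a minimal sketch of those five steps against the plain ZooKeeper Java client. It is simplified: there is no per-client GUID in the node name, so a plain sort orders the children by sequence number, and error handling and reconnection logic are omitted.)

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. /locknode/lock-0000000042

    public SimpleZkLock(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws KeeperException, InterruptedException {
        // Step 1: create an ephemeral, sequential child.
        myNode = zk.create("/locknode/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            // Step 2: list the children without setting a watch (avoids the herd effect).
            List<String> children = zk.getChildren("/locknode", false);
            Collections.sort(children);
            int myIndex = children.indexOf(myNode.substring("/locknode/".length()));
            // Step 3: lowest sequence number means this client holds the lock.
            if (myIndex == 0)
                return;
            // Step 4: watch only the node with the next lowest sequence number.
            String previous = "/locknode/" + children.get(myIndex - 1);
            CountDownLatch notified = new CountDownLatch(1);
            // Step 5: if it is already gone, re-check; otherwise wait for the notification.
            if (zk.exists(previous, event -> notified.countDown()) == null)
                continue;
            notified.await();
        }
    }

    public void unlock() throws KeeperException, InterruptedException {
        zk.delete(myNode, -1);
    }
}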
Consider the following case:
Client1 successfully acquired the lock (in step 3), with ZooKeeper node "locknode/guid-lock-0";
Client2 created node "locknode/guid-lock-1", failed to acquire the lock, and is now watching "locknode/guid-lock-0";
Later, for some reason (say, network congestion), Client1 fails to send a heartbeat message to the ZooKeeper cluster on time, but Client1 is still working away, mistakenly assuming that it still holds the lock.
But, ZooKeeper may think Client1's session is timed out, and then
delete "locknode/guid-lock-0",
send a notification to Client2 (or maybe send the notification first?),
but cannot send a "session timeout" notification to Client1 in time (say, due to network congestion).
Client2 gets the notification, goes to step 2, and finds the only remaining node, "locknode/guid-lock-1", which it created itself; thus, Client2 assumes it holds the lock.
But at the same time, Client1 assumes it holds the lock.
Is this a valid scenario?
The scenario you describe could arise. Client 1 thinks it has the lock, but in fact its session has timed out, and Client 2 acquires the lock.
The ZooKeeper client library will inform Client 1 that its connection has been disconnected (but the client doesn't know the session has expired until it reconnects to the server), so the client can write code that assumes its lock has been lost if it has been disconnected for too long. But the thread which uses the lock needs to check periodically that the lock is still valid, which is inherently racy.
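For example, a sketch of the kind of check this suggests (the connect string and session timeout are placeholders; the flag must be consulted periodically by the thread doing the lock-protected work):

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class LockAwareConnection {
    // The thread holding the lock must check this flag before and during its critical work.
    public final AtomicBoolean lockPossiblyLost = new AtomicBoolean(false);

    public ZooKeeper connect() throws IOException {
        return new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {
            KeeperState state = event.getState();
            if (state == KeeperState.Disconnected || state == KeeperState.Expired) {
                // Assume the lock may be gone (a real implementation might only do this
                // after being disconnected for longer than the session timeout).
                lockPossiblyLost.set(true);
            } else if (state == KeeperState.SyncConnected) {
                lockPossiblyLost.set(false);  // still inherently racy, as noted above
            }
        });
    }
}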
...But, Zookeeper may think client1's session is timeouted, and then...
From the Zookeeper documentation:
The removal of a node will only cause one client to wake up since
each node is watched by exactly one client. In this way, you avoid
the herd effect.
There is no polling or timeouts.
So I don't think the problem you describe arises. It looks to me as though there could be a risk of hanging locks if something happens to the clients that created them, but the scenario you describe should not arise.
From the Packt book "ZooKeeper Essentials":
If there was a partial failure in the creation of the znode due to connection loss, it's possible that the client won't be able to correctly determine whether it successfully created the child znode. To resolve such a situation, the client can store its session ID in the znode data field or even as a part of the znode name itself. As a client retains the same session ID after a reconnect, it can easily determine whether the child znode was created by it by looking at the session ID.
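A hypothetical illustration of that suggestion with the plain ZooKeeper client (zk is an already-connected handle, "/locknode" and the naming scheme are just this sketch's convention, and exception handling is omitted):

// Creation: embed the session ID in the znode name (it could equally go into the data field).
String prefix = Long.toHexString(zk.getSessionId()) + "-lock-";
zk.create("/locknode/" + prefix, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

// After a connection loss and reconnect the session ID is unchanged, so the client can
// tell whether its earlier create( ) actually succeeded before retrying it.
boolean alreadyCreated = zk.getChildren("/locknode", false).stream()
        .anyMatch(child -> child.startsWith(prefix));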