Is having a channel per user a good idea in MQTT? - chat

I was developing a chat system for a company and played around with a lot of the tools available for doing so. One of my more interesting journeys was through the Phoenix framework (the Elixir language! phew).
I ended up using an MQTT-based server to manage chat. I had already used MQTT for device communication in a few IoT projects. I used an EMQ server as my broker and this JS library for both the FE and the BE.
Setting it up was a cakewalk.
Now I have a few questions as I add a few more features.
How should I scale my channels/messages ratio?
How many subscriptions are too many subscriptions?
I will have access to usage data, so I will have numbers to base these decisions on.
Any reading on these topics would be appreciated.
Adding a few facts about the application.
The chat is used in an application that more or less conducts meetings. Here are some rough figures:
Average size of a meeting = 25 people (can go up to 10,000)
Average number of meetings a day = 50 (currently)
Messages per minute in a meeting = 20
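Back of the envelope, if all 50 meetings were running at once that would be 50 × 20 = 1,000 messages per minute (~17/s) published across all meetings, and with ~25 subscribers per meeting roughly 25,000 deliveries per minute (~420/s) fanned out by the broker.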

How many subscriptions are too many subscriptions?
It depends on the infrastructure and resources you have available to build a cluster.
This is the link to the official documentation on how to build an EMQ cluster.
The cluster architecture of emqttd broker is based on distributed Erlang/OTP and Mnesia database. The cluster design could be summarized by the following two rules:
When a MQTT client SUBSCRIBE a Topic on a node, the node will tell all the other nodes in the cluster: I subscribed a Topic.
When a MQTT Client PUBLISH a message to a node, the node will lookup the Topic table and forward the message to nodes that subscribed the Topic.
Finally there will be a global route table (Topic -> Node) that replicated to all nodes in the cluster:
topic1 -> node1, node2
topic2 -> node3
topic3 -> node2, node4
---------          ---------
| Node1 |----------| Node2 |
---------          ---------
    |    \        /    |
    |      \    /      |
    |      /    \      |
    |    /        \    |
---------          ---------
| Node3 |----------| Node4 |
---------          ---------
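To make this concrete for a chat use case like yours, here is a minimal sketch using the mosquitto command-line clients (host and topic names are purely illustrative). With one shared topic per meeting, the route table gains a single entry per active meeting and each chat message is published once, letting the broker do the fan-out; with one topic per user, the route table grows with the number of users and every message has to be published once per participant:

# One shared topic per meeting: every participant subscribes and publishes here.
mosquitto_sub -h emq.example.com -t 'chat/meeting/42' -q 1
mosquitto_pub -h emq.example.com -t 'chat/meeting/42' -q 1 -m '{"from":"alice","text":"hello"}'
# One topic per user: senders (or the backend) must fan a message out to N
# per-user topics, or readers must fall back to a wildcard subscription.
mosquitto_pub -h emq.example.com -t 'chat/meeting/42/user/bob' -q 1 -m '{"from":"alice","text":"hello"}'
mosquitto_sub -h emq.example.com -t 'chat/meeting/42/user/+' -q 1

With your rough figures (~25 participants per meeting), a per-user scheme multiplies every publish by about 25, while a per-meeting scheme leaves that fan-out to the broker; per-user topics mainly pay off when you need per-recipient filtering, ACLs or offline delivery.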

Related

Kafka Metrics | Why do I see additional node IDs node--1 and node--2 in the metrics report

I have two brokers with IDs 1001 and 1002. In the output of the producer performance tool, the per-node metrics section displays metrics for node--1 and node--2, while my broker IDs are 1001 and 1002. What are the node--1/node--2 values? How should I interpret these metric values?
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--1} : 5.388
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--2} : 44.175
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-1001} : 59489.855
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-1002} : 7739.021
Nodes with a negative ID are nodes obtained from the bootstrap servers configuration.
Addresses in bootstrap.servers are used to discover the cluster, so the client connects to them to retrieve the cluster metadata. Once that is done, clients re-establish connections to the nodes they have discovered and use these new connections for all traffic. The bootstrap connections are usually not reused and are dropped later.
Not asked here but related:
If you look at consumers, you may also notice nodes with really large IDs (such as node-2147483646). These denote connections to group coordinator nodes.
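For reference, this kind of report can be reproduced with the producer performance tool shipped with Kafka, provided your version supports the --print-metrics flag (broker addresses and topic name below are placeholders):

# Prints the producer metrics at the end of the run, including the per-node
# producer-node-metrics shown above; entries tagged node--1/node--2 correspond
# to the bootstrap connections, node-1001/node-1002 to the discovered brokers.
kafka-producer-perf-test.sh --topic test-topic \
  --num-records 100000 --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092,broker2:9092 \
  --print-metrics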

Should topic partitions be replicated across all broker nodes in a Kafka cluster?

Though answers are available to questions similar to the above, my curiosity lies in the following scenario: suppose nodes n1-n5 are in a cluster, where topic t1 is on n1, n2 and n3, and topic t2 is on n3, n4 and n5. Now suppose p1 pushes messages to t1 and c1 consumes from t1, and similarly p2 and c2 for t2.
Here is where I have certain doubts:
Assume nodes n3-n5 are all down; p1 and c1 will still have an active connection to the cluster, which is kind of useless, as publishing and consuming fail anyway. (The metric connection_count being greater than 0 means there are connections to the cluster from either the producer or the consumer.)
Is it correct to replicate a topic to all nodes in a Kafka cluster?
Why do we give multiple node addresses in the bootstrap servers property if one address is sufficient?
Note: I am a beginner in the Kafka world and am still experimenting with a local setup to discover potential problems that might occur in the real world.
Why should it fail? Nodes n1 and n2 are still up and running, and assuming that the topic had replication-factor=3, all the data should still be accessible.
I'd say it depends. It won't hurt to replicate the topics across all nodes, but sometimes it is redundant (especially when you have a very high number of brokers in the cluster). For high availability, you should set at least replication-factor=3. This allows, for example, one broker to be taken down for maintenance and one more to fail unexpectedly.
bootstrap.servers is used to set up the connection to the Kafka cluster. One address is typically enough to access the whole cluster, but it is always best to provide all the addresses in case one of the servers is down. Note that clients (producers or consumers) make use of all brokers irrespective of which servers are specified in bootstrap.servers.
Example of 2 topics (each having 3 and 2 partitions respectively):
Broker 1:
+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|      Topic 2      |
|    Partition 1    |
+-------------------+
Broker 2:
+-------------------+
|      Topic 1      |
|    Partition 2    |
|                   |
|                   |
|      Topic 2      |
|    Partition 0    |
+-------------------+
Broker 3:
+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+
Note that data is distributed (and Broker 3 doesn't hold any data of topic 2).
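A layout like the one above could be created as follows (the broker address is a placeholder; the actual partition-to-broker assignment is chosen by Kafka, so the exact placement may differ):

kafka-topics.sh --bootstrap-server broker1:9092 --create \
  --topic topic1 --partitions 3 --replication-factor 1
kafka-topics.sh --bootstrap-server broker1:9092 --create \
  --topic topic2 --partitions 2 --replication-factor 1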
Topics should have a replication-factor > 1 (usually 2 or 3) so that when a broker is down, another one can serve the data of the topic. For instance, assume that we have a topic with 2 partitions and a replication-factor set to 2, as shown below:
Broker 1:
+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|                   |
|                   |
+-------------------+
Broker 2:
+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|      Topic 1      |
|    Partition 1    |
+-------------------+
Broker 3:
+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+
Now assume that Broker 2 has failed. Brokers 1 and 3 can still serve the data for topic 1. So a replication-factor of 3 is always a good idea, since it allows one broker to be taken down for maintenance and another one to fail unexpectedly. Therefore, Apache Kafka offers strong durability and fault-tolerance guarantees.
Note about Leaders:
At any time, only one broker can be a leader of a partition and only that leader can receive and serve data for that partition. The remaining brokers will just synchronize the data (in-sync replicas). Also note that when the replication-factor is set to 1, the leader cannot be moved elsewhere when a broker fails. In general, when all replicas of a partition fail or go offline, the leader will automatically be set to -1.
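You can inspect the current leader and in-sync replicas of each partition with the topics tool (the broker address and topic name are placeholders; older Kafka versions take --zookeeper instead of --bootstrap-server):

# Reports, per partition: the leader broker, the full replica list and the ISR.
kafka-topics.sh --bootstrap-server broker1:9092 --describe --topic topic1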
Assume nodes n3-n5 are all down; p1 and c1 will still have an active connection to the cluster, which is kind of useless, as publishing and consuming fail anyway. (The metric connection_count being greater than 0 means there are connections to the cluster from either the producer or the consumer.)
Answer: If all three brokers that hold your topic's replicas are down, then you cannot produce to or consume from that topic. To avoid this kind of situation, it is recommended to locate brokers in different racks and to provide broker.rack information in the broker configs.
broker.rack: Rack of the broker. This will be used in rack aware replication assignment for fault tolerance. Examples: RACK1, us-east-1d
Is it correct to replicate a topic to all nodes in a Kafka cluster?
Answer: It is entirely up to your fault-tolerance needs. If you replicate the topic to all 6 brokers, then you can tolerate up to 5 broker failures. (Of course, the min.insync.replicas and acks configs are also important: if the number of replicas is 6, min.insync.replicas=2 and acks=all, then you can tolerate up to 4 broker failures and still continue sending messages.)
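As a sketch of that combination (topic name and broker address are placeholders; on older Kafka versions kafka-configs.sh takes --zookeeper instead of --bootstrap-server):

# Require at least 2 in-sync replicas for acknowledged writes on the topic...
kafka-configs.sh --bootstrap-server broker1:9092 --alter \
  --entity-type topics --entity-name t1 --add-config min.insync.replicas=2
# ...and have the producer wait for acknowledgement from all in-sync replicas.
kafka-console-producer.sh --bootstrap-server broker1:9092 --topic t1 \
  --producer-property acks=all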
Why do we give multiple node addresses in the bootstrap servers property if one address is sufficient?
Answer: The bootstrap.servers config is used for the initial connection to the Kafka cluster. Yes, one address is enough, but what if the broker at that address is down? You cannot connect to the cluster. So it is recommended to provide more than one address, to avoid this kind of situation through redundancy.
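For example, a console consumer started like this can still bootstrap successfully even if one of the listed brokers happens to be down at startup (addresses and topic name are placeholders):

kafka-console-consumer.sh \
  --bootstrap-server broker1:9092,broker2:9092,broker3:9092 \
  --topic t1 --from-beginning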

Incremental Cooperative Rebalancing leads to unevenly balanced connectors

We have encountered a lot of unevenly balanced connectors on our setup since the upgrade to Kafka 2.3 (also with Kafka Connect 2.3), which should include the new Incremental Cooperative Rebalancing in Kafka Connect explained here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
Let me explain our setup a bit: we deploy multiple Kafka Connect clusters to dump Kafka topics to HDFS. A single Connect cluster is spawned for each hdfs-connector, meaning that at any time exactly one connector is running on a given Connect cluster. These clusters are deployed on top of Kubernetes with randomly selected IPs from a private pool.
Let's take an example. For this HDFS connector, we spawned a Connect cluster with 20 workers. 40 tasks should run on this cluster, so we would expect 2 tasks per worker. But as shown by the command below, when querying the Connect API after a while, the connector appears really unbalanced: some workers are not doing any work at all, while one of them has taken ownership of 28 tasks.
bash-4.2$ curl localhost:8083/connectors/connector-name/status|jq '.tasks[] | .worker_id' | sort |uniq -c
...
1 "192.168.32.53:8083"
1 "192.168.33.209:8083"
1 "192.168.34.228:8083"
1 "192.168.34.46:8083"
1 "192.168.36.118:8083"
1 "192.168.42.89:8083"
1 "192.168.44.190:8083"
28 "192.168.44.223:8083"
1 "192.168.51.19:8083"
1 "192.168.57.151:8083"
1 "192.168.58.29:8083"
1 "192.168.58.74:8083"
1 "192.168.63.102:8083"
Here we would expect the whole pool of workers to be used and the connector to be evenly balanced after a while. We would expect to have something like:
bash-4.2$ curl localhost:8083/connectors/connector-name/status|jq '.tasks[] | .worker_id' | sort |uniq -c
...
2 "192.168.32.185:8083"
2 "192.168.32.53:8083"
2 "192.168.32.83:8083"
2 "192.168.33.209:8083"
2 "192.168.34.228:8083"
2 "192.168.34.46:8083"
2 "192.168.36.118:8083"
2 "192.168.38.0:8083"
2 "192.168.42.252:8083"
2 "192.168.42.89:8083"
2 "192.168.43.23:8083"
2 "192.168.44.190:8083"
2 "192.168.49.219:8083"
2 "192.168.51.19:8083"
2 "192.168.55.15:8083"
2 "192.168.57.151:8083"
2 "192.168.58.29:8083"
2 "192.168.58.74:8083"
2 "192.168.59.249:8083"
2 "192.168.63.102:8083"
The second result was actually achieved by manually killing some workers and a bit of luck (we haven't found a proper way to force an even balance across the Connect cluster for now; it's more a process of trial and error until the connector is evenly balanced).
Has anyone already come across this issue and managed to solve it properly?

Zookeeper clarification on CAP

I would like to clarify my understanding of the CAP theorem.
For example, ZooKeeper is classified as CP (Consistent and Partition Tolerant).
What does this mean? In the event of a partition failure, does the system return consistent data?
Or does it mean that the moment there is a connectivity issue between the nodes in the ZK cluster, ZK is not available?
If yes, does that mean that when the nodes in the cluster are not able to talk to each other, the entire ZK ensemble goes down?
ZooKeeper serves requests as long as there is a quorum, meaning a majority of nodes are available. Since it needs a majority rather than all of the nodes, it is tolerant to network partitions.
It replicates data to all nodes (at least the quorum) to stay consistent.
If a leader cannot be elected (no quorum), ZooKeeper will fail requests, and this is why it is not highly available.
Typically 3 or 5 servers are used for ZooKeeper, and the quorum will be 2 or 3 nodes respectively.
Refer to this blog post for more details.
https://www.ibm.com/developerworks/library/bd-zookeeper/index.html
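A quick way to see this in practice is to ask each server for its role (hosts and ports are placeholders; on ZooKeeper 3.5+ the four-letter-word commands may need to be enabled via 4lw.commands.whitelist):

# Each server reports "Mode: leader" or "Mode: follower" while a quorum holds.
echo srvr | nc zk1.example.com 2181
echo srvr | nc zk2.example.com 2181
echo srvr | nc zk3.example.com 2181
# Or, on each host:
zkServer.sh status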

MSMQ Cluster losing messages on failover

I've got an MSMQ cluster set up with nodes (active/passive) that share a drive.
Here are the tests I'm performing. I send messages to the queue that are recoverable. I then take the MSMQ cluster group offline and then bring it online again.
Result: The messages are still there.
I then simulate failover by moving the group to node 2. Moves over successfully, but the messages aren't there.
I'm sending the messages as recoverable and the MSMQ cluster group has a drive that both nodes can access.
Anyone?
More Info:
The Quorum drive stays only on node 1.
I have two service/app groups. One MSMQ and one that is a generic service group.
Even more info:
When node 1 is active, I pump it full of messages, then fail over to node 2: 0 messages in the queue on node 2. Then I fail over back to node 1, and the messages are on node 1.
You haven't clustered MSMQ or aren't using clustered MSMQ properly.
What you are looking at are the local MSMQ services.
http://blogs.msdn.com/b/johnbreakwell/archive/2008/02/18/clustering-msmq-applications-rule-1.aspx
Cheers
John
==================================
OK, maybe the drive letter being used isn't consistently implemented.
What is the storage location being used by clustered MSMQ?
If you open this storage location up in Explorer from Node 1 AND Node 2 at the same time, are the folder contents exactly the same? If you create a text file via Node 1's Explorer window, does it appear after a refresh in Node 2's Explorer window?