Centralised Vs Decentralised ? Single Point of Failure? - distributed-computing

In centralised System, there are a number of terminals connected to a mainframe, so does this mean, this has a single point of failure ?

Related

What is meant by Distributed System?

I am reading about distributed systems and getting confused with what is really means?
I understand on high level, it means that set of different machines that work together to achieve a single goal.
But this definition seems too broad and loose. I would like to give some points to explain the reasons for my confusion:
I see lot of people referring the micro-services as distributed system where the functionalities like Order, Payment etc are distributed in different services, where as some other refer to multiple instances of Order service which possibly trying to serve customers and possibly use some consensus algorithm to come to consensus on shared state (eg. current Inventory level).
When talking about distributed database, I see lot of people talk about different nodes which possibly use to store/serve a part of user request like records with primary key from 'A-C' in first node 'D-F' in second node etc. On high level it looks like sharding.
When talking about distributed rate limiting. Some refer to multiple application nodes (so called distributed application nodes) using a single rate limiter, some other mention that the rate limiter itself has multiple nodes with a shared cache (like redis).
It feels that people use distributed systems to mention about microservices architecture, horizontal scaling, partitioning (sharding) and anything in between.
I am reading about distributed systems and getting confused with what is really means?
As commented by #ReinhardMänner, the good general term definition of distributed system (DS) is at https://en.wikipedia.org/wiki/Distributed_computing
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. The components interact with one another in order to achieve a common goal.
Anything that fits above definition can be referred as DS. All mentioned examples such as micro-services, distributed databases, etc. are specific applications of the concept or implementation details.
The statement "X being a distributed system" does not inherently imply any of such details and for each DS must be explicitly specified, eg. distributed database does not necessarily meaning usage of sharding.
I'll also draw from Wikipedia, but I think that the second part of the quote is more important:
A distributed system is a system whose components are located on
different networked computers, which communicate and coordinate their
actions by passing messages to one another from any system. The
components interact with one another in order to achieve a common
goal. Three significant challenges of distributed systems are:
maintaining concurrency of components, overcoming the lack of a global clock, and managing the independent failure of components. When
a component of one system fails, the entire system does not fail.
A system that constantly has to overcome these problems, even if all services are on the same node, or if they communicate via pipes/streams/files, is effectively a distributed system.
Now, trying to clear up your confusion:
Horizontal scaling was there with monoliths before microservices. Horizontal scaling is basically achieved by division of compute resources.
Division of compute requires dealing with synchronization, node failure, multiple clocks. But that is still cheaper than scaling vertically. That's where you might turn to consensus by implementing consensus in the application, or using a dedicated service e.g. Zookeeper, or abusing a DB table for that purpose.
Monoliths present 2 problems that microservices solve: address-space dependency (i.e. someone's component may crash the whole process and thus your component) and long startup times.
While microservices solve these problems, these problems aren't what makes them into a "distributed system". It doesn't matter if the different processes/nodes run the same software (monolith) or not (microservices), it matters that they are different processes that can't easily communicate directly (e.g. via function calls that promise not to fail).
In databases, scaling horizontally is also cheaper than scaling vertically, The two components of horizontal DB scaling are division of compute - effectively, a distributed system - and division of storage - sharding - as you mentioned, e.g. A-C, D-F etc..
Sharding of storage does not define distributed systems - a single compute node can handle multiple storage nodes. It's just that it's much more useful for a database that divides compute to also shard its storage, so you often see them together.
Distributed rate limiting falls under "maintaining concurrency of components". If every node does its own rate limiting, and they don't communicate, then the system-wide rate cannot be enforced. If they wait for each other to coordinate enforcement, they aren't concurrent.
Usually the solution is "approximate" rate limiting where components synchronize "occasionally".
If your components can't easily (= no latency) agree on a global rate limit, that's usually because they can't easily agree on a global anything. In that case, you're effectively dealing with a distributed system, even if all components just threads in the same process.
(that could happen e.g. if you plan to scale out but haven't done so yet, so you don't allow your threads to communicate directly.)

Can a Kafka-Cluster be cut in half?

Scenario: you have a Kafka-Cluster in different DCs but they are configured as one cluster. So there is no mirroring through MirrorMaker or something liket hat. The DCs are not very far from eatch other. But they are physically separated.
Now what do you have to do to ensure that the cluster is failsafe on BOTH SIDES if the connection between those two DCs is down? So on BOTH sides the producers and consumer should still work.
I would guess: you need multiple Zookeepers on both sides and multiple Kafka-Nodes.
But is that enough? Does the cluster heal itself after getting reconnected?
Thanks in advance.
I'm assuming your datacenters that "are not very far from eatch other" are basically Availability Zones (AZs).
It's pretty common to spread a cluster over multiples AZs. However it's usually not desired or possible that each "slice" can live on its own.
The immediate issue is Zookeeper which by design prevents split-brain scenarios. So if a ZK cluster is split only one "slice" (at best) will carry on working. So the brokers that are on a side of the non working ZK clusters will be non functional.
Then let's say it was possible to have both sides keep working. What happens when you joins both sides again?
Data is likely to have diverged as clients wrote data to each side separately. Now you could have the same partition with different messages for the same offset and no way to resolve the conflict as both options are "valid".
I hope this shows why this is not a possible solution. In practice, if an AZ goes offline, it is non functional until it is brought back online.
Clients that were connected to the offline AZ should reconnect to the other AZ (using multiple bootstrap servers) and clients that were in the failed AZ should be reprovisioned into another one.
If configured correctly, Kafka can survive an AZ outage (even though in practice, it's best to have 3 AZs) and keep all resources available. Also in this scenario, the cluster will automatically return to a good state when the failed AZ returns.

How to deploy zookeeper across multiple data centers and failover?

I would like to know about the existing approaches that are available when running Zookeeper across data centers?
One approach that I found after doing some research is to have observers. That approach is to have only one ensemble in the main data center with leader and follower. And having observers in the backup data center. When main datacenter crash, we select other datacenter as the new main data center and convert observers to leader/follower manually.
I would like to about better approaches to achieve the same.
Thanks
First I would like to point the cons of your solution which hopefully my solution would solve:
a) in case of main data center failure the recovery process is manual (I quote you: "convert observers to leader/follower manually")
b) only the main data center accepts writes -> in case of failure all data (when observer don't write logs) or only last updates (when observer do write logs) are lost
Because the question is about data centerS I'll consider that we have enough (DCs) to reach our objective: solving a. and b. while having an usable multi data center distributed ZK.
So, when having an even number of data centers (DC) one could use an additional DC only for getting an odd number of ZK nodes in the ensemble. When having e.g. 2 DCs than a 3rd one could be added; each DC could contain 1 rwZK (read-write ZK node) or, for better tolerance against failures, each DC could contain 3 rwZK organized as hierarchical quorums (both cases could benefit of ZK observers). Inside a DC all ZK clients should point only to the DC's ZK-group so the traffic remained between DCs would be only for e.g. leader election, writes. With this kind of setup one solves both a. and b. but loses write/recovery-performance because the writes/elections must be agreed between data centers: at least 2 DCs must agree on writes/elections with 2 ZK nodes agreement per DC (see hierarchical quorums). The intra-DC agreement should be fast enough hence won't matter much for the overall write agreement process; bottom line, approximately only the delay between DCs would matter. The disadvantages of this approach are:
- additional cost for the 3rd data center: this could be mitigated by using the company office (a guy did that) as the 3rd data center
- lost sessions because of inter-DC network latency and/or throughput: with high enough timeouts one could reach a maximum possible write-throughput (depending on inter-DC average network speed) so this solution would be valid only when that maximum is acceptable. Still, when using 1 rw-ZK per DC I guess there'll be not much difference to your solution because the writes from backup DC to main DC must travel between DCs too; but for your solution won't be inter-DCs write agreements or leader elections related communication so it's faster.
Other consideration:
Regardless of the chosen solution the inter-DCs communication should be secured and for this ZK offers no solution so tunneling or other approach must be implemented.
UPDATE
Another solution would be to still use an additional 3rd DC (or company office) but where to keep only the rw-ZKs (1, 3 or other odd number) while the other 2 DCs to only have observer-ZKs. The clients should still connect only to the DC's ZK servers but we no longer need hierarchical quorums. The gain here is that the write agreements and leader elections would be only inside the DC with rw-ZKs (let's call it arbiter DC). The disadvantages are:
- the arbiter DC is a single point of failure
- the write requests will still have to travel from observer DCs to arbiter DC

Difference between centralized and distributed computing

Can anyone tell me the differences between centralized and distributed computing?
Centralized
A system with centralized multiprocessor parallel architecture.In the late 1980 s Centralized systems have been progressively replaced by distributed systems.
characteristics of centralized system
Non autonomous components
usually homogeneous technology
Multiple users share the same resources at all time
single point of control
single point of failure
Distributed
set of tightly coupled programs executing on one or more computers which are interconnected through a network and coordinating their actions. These programs know about one another and carry out tasks that none could carry out in isolation
characteristics of distributed system
autonomous components
Mostly build using heterogeneous technology
System components may be used exclusively
Concurrent processes can execute
Multiple point of failure
Requirement of distributed system
Scalability- possibility of adding new hosts
openness- easily extended and modified
Heterogeneity-supports various H/W S/w platforms
Resource sharing- H/w, S/W and data
fault tolerance- ability to function correctly even if faults occur
Centralized: all calculations are done on one particular computer (system). Example: you have a dedicated server for calculating data.
Distributed: the calculation is distributed to multiple computers. Example: when you have a large amount of data then you can divide it and send each part to particular computers which will make the calculations for their part.
Main basic differences are:
distrib-systems have no global state
no shared memory
no shared variables
distrib-systems have no shared time clock
therefore order of events is difficult
distrib-systems can have race conditions
race conditions see http://en.wikipedia.org/wiki/Race_condition
So "computing" in a distrubuted environment is very difficult. Do you have concret question about programing models or whatever?
Centralized Systems
"In Centralized Systems,several jobs are done on a particular central processing unit(CPU)"
Distributed Systems
"In Distributed Systems,jobs are distributed among several processor.The Processor are interconnected by a computer network"
(Sheheryar ,NUML)
Briefly, Centralized computing, as the name itself depicts, is concerned with just a single server. The particular operation is being held at this server location and nowhere else.
Distributed computing is held where the system requirement is quite large, and the job is distributed to several processors and the solutions are then combined together, keeping in mind that the processors are interconnected by a computer network.
centralized system:is a system which computing is done at central location using terminals attached to central computer in brief (mainframe and dump terminals all computation is done on the mainframe through terminals )
distributed system:is a collection of independent computers that appear to its users as single coherent system where hardware is distributed consisting of n processing elements (processor and memory )also software is distributed where no centralized os each processing element has its own os ,no physically centralized file system and inter-process communication via message passing at lowest level
Big Note:the main differences is reliability. in distributed system if one machine crashes,the system as a whole can still survive
METHOD OF ARBITRATION In all but the simplest systems, more than one module may
need control of the bus.
In a centralized scheme, a single hardware device, referred to as a bus controller or arbiter, is responsible for allocating time on the bus.
In a distributed scheme, there is no central controller. Rather, each module contains access
control logic and the modules act together to share the bus.
in centralized system in case the server fails it affects the whole system because the server controls the whole operation
in D.S system incase a system fails it doesn't affect the operations of the other computers because they are independent and distributed in operations
Let us try to understand this with an example.
Say you are carrying a large amount of money. You are in a crowded train, where your pocket may be picked and you might lose money. What is the ideal strategy for carrying money?
Put all money in a single pocket: In this case, it is easy for you to just put the money in the pocket and be done. When you go back home, you can simply take out money from the pocket and count it. But wait. What if your pocket is picked? You lose ALL the money (bankrupt? eh!). Seems like it is not the best idea to store all the money in a SINGLE pocket. Let us think what else we can do
Divide your money: Put some of it in the left pocket, put some in the right pocket and maybe put some in your bag (which has a limited capacity). You need to devise a strategy to divide the money with you. Also, when you go back home, you will have to spend time collecting money from different pockets and collecting it at one place. However, we are in a better situation now. If one of our pocket (or bag) is picked, we do not lose ALL of the money. The chances of your bag, left pocket and the right pocket, all being picked is fairly low. With a little overhead of dividing money, you can now avoid losing all of your money. Isn’t that better?
This is how distributed systems work. They divide the information (money in your case) and keep it on different machines (pockets and bags for us). This way if one of the machine goes down, we are not at a big loss. That is, we do not have a single point of failure
Another important thing that distributed systems implement is data replication. They put replicas of same information in multiple machines. This way, if one of the machines goes down, we do not lose the information. So, we now have something called as fault tolerance.

Basics difference between distributed computing and interprocess communication?

I know the theritical definition for distributed computing and interproces communication.
But in real time I was not able to come to conclusion that when we go for distributed or interprocess.
Tell me some scenario where we can go for distributed computing or interprocess communication by example.
interprocess communication basically would mean comuniction b/t processes.
mostly this concept is used when studying parallel programming and studying the working of operating system.
this topic is to huge to explain, its a full subject, try googling interprocess communication and read the basic definations.
2)
my initial understanding is:-
imagine a office, why does it have several employees in one department? because many brains and men power is needed to bring one task to completion. one man can do the job but it might take days and what if he gets sick! so distributed...
now how to communicate between the porcesses/people doing there independent task of the job on different computers/different CPU's of the same computer/within different cabins of the same office building?
"shout!! hey i have done my work take the result and send more?? who is in charge here!! answer ****"
no right!
so here come the INTER PROCESS COMMUNICATION subject.
note:- please note i am also a learning person :-) so do not take the above as right without doing your own googling, i am not responsible for any .........
Interprocess communication is typically defined as communication between multiple processes on a single machine. Distributed computing is multiple processes being distributed across a network and executed on desired host boxes. To me it makes sense to implement a desired interprocess communication in the same fashion as the distributed processes transmit their results back to the distributor/ host. That way a weaker machine continues to be able to process data while a more powerful box runs a greater load.