How does blockchain consensus verify transactions?

A lot of blockchain documentation says that a blockchain uses consensus to verify transactions. But as I understand it, consensus only generates the hash to create the new block. I don't know why the documentation says that consensus verifies the transaction, when the transactions were created before consensus runs to produce the hash for the new block. The consensus flow does not care about its input (a group of transactions); it doesn't know whether that group of transactions is valid or not. Why does the blockchain documentation say that?

Before getting into details, I'd like to set some context.
First, what is a blockchain? A blockchain is a set of blocks where every next block depends on the previous one, and we can extend the chain as long as every new block is correct from the system's point of view.
An example of a blockchain (starting from a special genesis block): A->B->C->D
Every block, other than A, depends on the previous one - usually this is based on hashes and some other rules, but a blockchain creator may pick any set of rules.
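To make the hash linkage concrete, here is a minimal sketch in Python (a simplification for illustration; a real blockchain adds many more rules):

    import hashlib


    def block_hash(prev_hash: str, data: str) -> str:
        """A block's identity: a hash over its link to the predecessor and its data."""
        return hashlib.sha256((prev_hash + data).encode()).hexdigest()


    # Build a toy chain A -> B -> C -> D, starting from a genesis block A.
    chain = [{"prev": "", "data": "A (genesis)"}]
    for data in ["B", "C", "D"]:
        prev = block_hash(chain[-1]["prev"], chain[-1]["data"])
        chain.append({"prev": prev, "data": data})

    # A block is correct only if its "prev" field matches its predecessor's hash.
    for prev_block, block in zip(chain, chain[1:]):
        assert block["prev"] == block_hash(prev_block["prev"], prev_block["data"])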
Second, we need to agree on what a consensus is. In the blockchain world, consensus is a process to agree on the chain of blocks. There could be several chains, but the system will reach a consensus on which one is the right one.
Here is why this is important: let's say we have the chain from above, with the last block being D. In a distributed environment it is totally possible that more than one VALID block will be generated at the same time; and since the system is distributed, several new chains may emerge:
A->B->C->D->E1
A->B->C->D->E2
Consensus in blockchains allows the network as a whole to agree on which of these chains is the valid one. Two important properties of a consensus:
it is eventual: it will take a while before every node gets on the same page. In other words, if you ask different nodes at the same time which chain is the correct one, you may get different answers; but eventually, the network will agree on the same one
there are many different ways to pick the right chain; e.g. in Bitcoin, the system eventually agrees on the longest valid chain.
Now we can address the original question: "blockchain uses the consensus to verify transactions". A (distributed) blockchain uses consensus to agree on a chain, and it uses a set of rules to validate that every block is correct. Since every node runs software with the same set of rules, nodes will make the same decision on every block's correctness. But in the distributed case, more than one chain may emerge, as described above.
To verify transactions, a blockchain uses a set of rules applied locally to either accept or reject new blocks.
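A minimal sketch of how local validation and the "longest valid chain" rule fit together, again in simplified Python (only hash linkage is checked; real nodes validate much more, including the transactions themselves):

    import hashlib


    def block_hash(prev_hash: str, data: str) -> str:
        return hashlib.sha256((prev_hash + data).encode()).hexdigest()


    def is_valid_chain(chain) -> bool:
        """The local rule set: every node applies the same checks independently."""
        return all(
            block["prev"] == block_hash(prev["prev"], prev["data"])
            for prev, block in zip(chain, chain[1:])
        )


    def choose_chain(candidate_chains):
        """Consensus rule (Bitcoin-style): among valid chains, prefer the longest."""
        return max((c for c in candidate_chains if is_valid_chain(c)), key=len)

Note the split of responsibilities: is_valid_chain is the local rule set, choose_chain is the consensus rule for picking among competing valid chains.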
The documentation you mentioned does not sound right to me. Here is an example of a doc: https://www.investopedia.com/terms/c/consensus-mechanism-cryptocurrency.asp - I really like their first takeaway point: "A consensus mechanism refers to any number of methodologies used to achieve agreement, trust, and security across a decentralized computer network." - and, as I mentioned above, it is always worth asking what the consensus is about: in distributed blockchains, the consensus is about the chain itself.

Related

How does every block in a blockchain contain all others blocks data if it is immutable?

I'm kind of a starter here in the blockchain ecosystem, and I have some questions, but one really seems to bother me since it seems kind of contradictory.
When you generate a block, each block contains a header, which contains the previous block's hash, a hash of the current block's data, and some other things.
But if every block references the previous block with a hash that was generated from that block's data, and every block is immutable...
How can every block in a blockchain have every other block's data if the block is immutable, and if changing the data immediately changes the block's identity hash?
And another one...
What is the detailed process that happens when a transaction becomes available, and how does it get incorporated into the blockchain?
Maybe these questions seem kind of dumb to some, but I've been searching and no articles or videos have resolved my doubts.
Thanks to all, and have a nice day :)
Every block on the blockchain does not have every other block's data. You might have confused it with the fact that every node (a node is an instance connected to the blockchain's p2p network) has all the blocks written to the blockchain so far.
So the statement
Every block in a blockchain has every other block's data.
is not true. A block only has the data that was added to it at the time of creation; after that, no data can be added or removed from it.
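A minimal sketch of why that immutability holds in practice: changing a block's data changes its hash, which breaks the link stored in the next block (simplified Python, for illustration only):

    import hashlib


    def block_hash(prev_hash: str, data: str) -> str:
        return hashlib.sha256((prev_hash + data).encode()).hexdigest()


    genesis = {"prev": "", "data": "genesis"}
    b = {"prev": block_hash(genesis["prev"], genesis["data"]), "data": "txs 1"}
    c = {"prev": block_hash(b["prev"], b["data"]), "data": "txs 2"}

    # Tampering with b changes its hash, so c's stored link no longer matches:
    b["data"] = "FORGED txs"
    assert c["prev"] != block_hash(b["prev"], b["data"])  # tampering is detectable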
Now to your 2nd question:
What is the detailed process that happens when a transaction becomes available, and how does it get incorporated into the blockchain?
Let's say A transfers 1 BTC to B. First of all, in order to submit the transaction, you have to be connected to a blockchain node, or to a service that is connected to a blockchain node (the Bitcoin blockchain in this case).
Normally the wallet apps we use to transfer crypto are connected to blockchain nodes and post our transactions for us.
Once the transaction is given to a node to process, it is first validated and then added to what is called the mempool, which is essentially the place where all the transactions that are yet to be added to blocks live. Most nodes dedicate some space to the mempool. Nodes then pick up transactions from the mempool (prioritising the ones with higher fees, as they get to keep the fees) and add them to a block. This block is called a "candidate block".
Nodes start mining the candidate block, and if a node is lucky enough to mine it first, the block is broadcast to the whole network; everyone adds this block to their chain and starts mining the next block, repeating the same process as above.
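A toy sketch of the mempool and candidate-block step described above (Python, heavily simplified; validation and actual mining are omitted):

    import heapq

    mempool = []  # toy mempool: transactions waiting to be included in a block


    def submit(tx: str, fee: int) -> None:
        """After validation (omitted here), park the transaction in the mempool."""
        heapq.heappush(mempool, (-fee, tx))  # negate the fee to get a max-heap


    def build_candidate_block(max_txs: int) -> list:
        """Fill a candidate block with the highest-fee pending transactions."""
        return [heapq.heappop(mempool)[1] for _ in range(min(max_txs, len(mempool)))]


    submit("A -> B: 1 BTC", fee=50)
    submit("C -> D: 2 BTC", fee=10)
    print(build_candidate_block(max_txs=2))  # ['A -> B: 1 BTC', 'C -> D: 2 BTC']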
The above process is mainly for the Bitcoin blockchain, but other blockchains follow similar processes.
For beginners, instead of going through articles and YouTube videos, I would recommend studying these books to develop a thorough understanding of the underlying architecture. There are mainly two types of blockchains (within the crypto ecosystem): UTXO-based (older ones, mainly Bitcoin and its derivatives: LTC, Zcash, DASH, etc.) and account-based (newer ones: Ethereum, BNB, etc.).
For UTXO-based blockchains:
https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch01.html
For account-based blockchains:
https://www.amazon.com/Mastering-Ethereum-Building-Smart-Contracts/dp/1491971940

Chaos engineering best practice

I studied the Principles of Chaos and looked at some open source projects, such as chaosblade, which is open-sourced by Alibaba, and Mangle, by VMware.
These tools are both fault injection tools, and they do no analysis of the system under test.
According to the principles of chaos, we should
1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
2. Hypothesize that this steady state will continue in both the control group and the experimental group.
3. Introduce variables that reflect real-world events, like servers that crash, hard drives that malfunction, network connections that are severed, etc.
4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
So how do we do step 4? Should we use a monitoring system to watch some major metrics and check the status of the system after fault injection?
Are there any good suggestions or best practices?
So how do we do step 4? Should we use a monitoring system to watch some major metrics and check the status of the system after fault injection?
As always, the answer is: it depends... It depends on how you want to measure your hypothesis, on the hypothesis itself, and on the system. But normally it makes total sense to introduce metrics to improve observability.
If your hypothesis is something like "Our service can process 120 requests per second, even if one node fails", then you could measure that via metrics, yes; but you could also measure it via the requests you send and the responses you receive back. It is up to you.
But if your hypothesis is "I get a response for a request which was sent before a node goes down", then it makes more sense to verify this directly with the requests and responses.
On our project we use, for example, chaostoolkit, which lets you specify the hypothesis in JSON or YAML and the related actions to probe it.
So you can say: I have a steady state X, and if I do Y, then the steady state X should still be valid. The toolkit is also able to verify metrics if you want it to.
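For illustration, a minimal sketch of such a steady-state probe in Python; the /health endpoint, threshold, and sample count are all hypothetical placeholders for whatever defines your steady state:

    import urllib.request

    # Hypothetical health endpoint; substitute whatever metric defines your
    # steady state (error rate, p99 latency, throughput, ...).
    HEALTH_URL = "http://localhost:8080/health"


    def steady_state_ok(samples: int = 20, min_success_rate: float = 0.95) -> bool:
        """Probe the system and check the steady-state hypothesis still holds."""
        ok = 0
        for _ in range(samples):
            try:
                with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
                    ok += resp.status == 200
            except OSError:
                pass  # timeouts and refused connections count as failures
        return ok / samples >= min_success_rate


    assert steady_state_ok()  # steps 1/2: steady state holds before the experiment
    # ... inject the fault here (kill a node, sever a network link, ...) ...
    assert steady_state_ok()  # step 4: try to disprove the hypothesis afterwards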
The Principles of Chaos sit a bit above the actual testing: they reflect the philosophy of designed vs. actual system and system under injection vs. baseline, but they are a bit too abstract to apply in everyday testing. They are a way of reasoning, not a work-process methodology.
I think the "control group vs. experimental group" wording is an especially doubtful part: you stage a test (injection) in a controlled environment and try to catch whether there is a user-facing incident, an SLA breach of any kind, or a degradation. I do not see where the control group is if you test on a stand or a dedicated environment.
We use a very linear variety of chaos methodology which is:
find failure points in the system (based on architecture, critical user scenarios and history of incidents)
design chaos test scenarios (may be a single attack or a more elaborate sequence)
run tests, register results and reuse green for new releases
start tasks to fix red tests, verify the solutions when they are available
One may say we are actually using the Principles of Chaos in 1 and 2, but we tend to think of chaos testing as a quite linear and simple process.
Mangle 3.0 was released with an option for analysis using a resiliency score. Detailed documentation is available at https://github.com/vmware/mangle/blob/master/docs/sre-developers-and-users/resiliency-score.md

SOA Service Boundary and production support

Our development group is working towards building up a service catalog.
Right now, we have two groups: one to sell a product, another to service that product.
We have one particular service that calculates if the price of the product is profitable. When a sale occurs, the sale can be overridden by a manager. This sale must also be represented in another system to track various sales and the numbers must match. The parameters of profitability also change, and are different from month to month, but a sale may be based on the previous set of parameters.
Right now the sale profitability service only calculates the profit; it is exposed via a RESTful URI.
One group of developers has suggested that the profitability service also support these "manager overrides" and support a date parameter to calculate based on a previous date. Of course the sales group of developers disagrees. If the service won't support this, the servicing developers will have to do an ETL between the two systems for each product, instead of just calling the profitability service. Right now, since we do not have a set of services to handle this, production support gets the request and then has to update the one or more systems associated with that given product.
So, if a service works for a narrow slice, but an exception-based business process breaks it, does that mean the boundaries of the service are incorrect and need to account for the change in business process?
Two, does adding a date parameter extend the boundary of the service too much, or should it be expected that if the service already has the parameters, it would also have a history of those parameters? At this moment, we do not have a service that only stores the parameters, as no one has required a need for it.
If there is any clarification needed before an answer can be given, please let me know.
I think the key here is: how much pain would be incurred by both teams if an ETL were introduced between the two?
Not that I think you're over-analysing this, but if I may: you probably have an opinion that adding a date parameter to the service contract is bad, but you also dislike the idea of the ETL.
Well, strategy aside, I find these days my overriding instinct is to focus less on the technical ins and outs and more on the purely practical.
At the end of the day, ETL is simple, reliable, and relatively pain-free to implement; however, it comes with side effects. The main one is that you'll be coupling changes to your service's DB schema with an outside party, which will limit your options to evolve the service in the future.
On the other hand, allowing consumer demand to dictate service evolution is easy and low-friction, but also a rocky road, as you may become beholden to that consumer at the expense of others.
Another possibility is to allow the extra service parameters to be delivered to the consumer via a message, rather than across the same service. This would allow you to keep your service boundary intact and let the consumer hold the necessary parameters locally.
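For what it's worth, a minimal sketch of the "history of parameters" idea from the question, with all names and values hypothetical: the service keeps parameter sets keyed by effective date, so the as-of date becomes a lookup rather than a growing contract:

    from bisect import bisect_right
    from datetime import date

    # Hypothetical store of profitability parameters keyed by effective date.
    PARAMETER_HISTORY = [
        (date(2024, 1, 1), {"margin_floor": 0.10}),
        (date(2024, 2, 1), {"margin_floor": 0.12}),
    ]


    def parameters_as_of(as_of: date) -> dict:
        """Return the parameter set that was in effect on the given date."""
        effective_dates = [d for d, _ in PARAMETER_HISTORY]
        idx = bisect_right(effective_dates, as_of) - 1
        if idx < 0:
            raise ValueError(f"no parameters recorded on or before {as_of}")
        return PARAMETER_HISTORY[idx][1]


    def is_profitable(price: float, cost: float, as_of: date) -> bool:
        """Profitability against the parameters valid on the sale date."""
        return (price - cost) / price >= parameters_as_of(as_of)["margin_floor"]


    print(is_profitable(100.0, 89.0, date(2024, 1, 15)))  # True under January rules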
Apologies if this does not address your question directly.

How to handle two signals depending on each other?

I read Deprecating the Observer Pattern with Scala.React and found reactive programming very interesting.
But there is a point I can't figure out: the author described the signals as the nodes of a DAG (directed acyclic graph). Then what if you have two signals (or event sources, or models, whatever) depending on each other? I.e., "two-way binding", like a model and a view in web front-end programming.
Sometimes it's just inevitable, because the user can change the view, and the back-end (an asynchronous request, for example) can change the model, and you want each side to reflect the other's change immediately.
Loop dependencies in a reactive programming language can be handled with a variety of semantics. The one that appears to have been chosen in scala.React is that of synchronous reactive languages, specifically that of Esterel. You can find a good explanation of these semantics and their alternatives in the paper "The synchronous languages 12 years later" by Benveniste, A.; Caspi, P.; Edwards, S.A.; Halbwachs, N.; Le Guernic, P.; de Simone, R., available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1173191&tag=1 or http://virtualhost.cs.columbia.edu/~sedwards/papers/benveniste2003synchronous.pdf.
Replying to #Matt Carkci here, because a comment wouldn't suffice.
In the paper, section 7.1 Change Propagation, you have:
Our change propagation implementation uses a push-based approach based on a topologically ordered dependency graph. When a propagation turn starts, the propagator puts all nodes that have been invalidated since the last turn into a priority queue which is sorted according to the topological order, briefly level, of the nodes. The propagator dequeues the node on the lowest level and validates it, potentially changing its state and putting its dependent nodes, which are on greater levels, on the queue. The propagator repeats this step until the queue is empty, always keeping track of the current level, which becomes important for level mismatches below. For correctly ordered graphs, this process monotonically proceeds to greater levels, thus ensuring data consistency, i.e., the absence of glitches.
and later, in section 7.6 Level Mismatch:
We therefore need to prepare for an opaque node n to access another node that is on a higher topological level. Every node that is read from during n’s evaluation, first checks whether the current propagation level which is maintained by the propagator is greater than the node’s level. If it is, it proceed as usual, otherwise it throws a level mismatch exception containing a reference to itself, which is caught only in the main propagation loop. The propagator then hoists n by first changing its level to a level above the node which threw the exception, reinserting n into the propagation queue (since it’s level has changed) for later evaluation in the same turn and then transitively hoisting all of n’s dependents.
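If it helps to see the 7.1 loop in code, here is a minimal sketch (Python rather than Scala; levels are assumed to be precomputed, and the level-mismatch/hoisting machinery of 7.6 is omitted):

    import heapq


    class Node:
        def __init__(self, level, compute, dependents=()):
            self.level = level            # topological level in the dependency graph
            self.compute = compute        # revalidates the node; True if it changed
            self.dependents = list(dependents)


    def propagate(invalidated):
        """One propagation turn: validate nodes in ascending topological order."""
        queue = [(n.level, id(n), n) for n in invalidated]
        heapq.heapify(queue)
        while queue:
            _, _, node = heapq.heappop(queue)
            if node.compute():                 # state changed: enqueue dependents,
                for dep in node.dependents:    # which sit on greater levels
                    heapq.heappush(queue, (dep.level, id(dep), dep))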
While there's no mention of any topological constraint (cyclic vs. acyclic), something is not clear (at least to me).
First, there is the question of how the topological order is defined.
And then the implementation suggests that mutually dependent nodes would loop forever in the evaluation, through the exception mechanism explained above.
What do you think?
After scanning the paper, I can't find where they mention that the graph must be acyclic. There's nothing stopping you from creating cyclic graphs in dataflow/reactive programming. Acyclic graphs only allow you to create pipeline dataflow (e.g. Unix command-line pipes).
Feedback and cycles are a very powerful mechanism in dataflow. Without them you are restricted in the types of programs you can create. Take a look at Flow-Based Programming - Loop-Type Networks.
Edit after second post by pagoda_5b
One statement in the paper made me take notice...
For correctly ordered graphs, this process monotonically proceeds to greater levels, thus ensuring data consistency, i.e., the absence of glitches.
To me that says that loops are not allowed within the Scala.React framework. A cycle between two nodes would seem to cause the system to continually try to raise the level of both nodes forever.
But that doesn't mean that you have to encode the loops within their framework. It could be possible to have one path from the item you want to observe and then another, separate, path back to the GUI.
To me, it always seems that too much emphasis is placed on a programming system completing and giving one answer. Loops make it difficult to determine when to terminate. Libraries that use the term "reactive" tend to subscribe to this thought process. But that is just a result of the von Neumann architecture of computers... a focus on solving an equation and returning the answer. Libraries that shy away from loops seem to be worried about program termination.
Dataflow doesn't require a program to have one right answer or ever terminate. The answer is the answer at this moment of time due to the inputs at this moment. Feedback and loops are expected if not required. A dataflow system is basically just a big loop that constantly passes data between nodes. To terminate it, you just stop it.
Dataflow doesn't have to be so complicated. It is just a very different way to think about programming. I suggest you look at J. Paul Morrison's book "Flow-Based Programming" for a field-tested version of dataflow, or my book (once it's done).
Check your MVC knowledge. The view doesn't update the model, so it won't send signals to it; the controller updates the model. For a C/F converter, you would have two controllers (one for the F control, one for the C control). Both controllers would send signals to a single model (which stores the only real temperature, Kelvin, in a lossless format). The model sends signals to two separate views (one for the C view, one for the F view). No cycles.
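A minimal sketch of that wiring (illustrative names; Python standing in for whatever UI framework you use):

    class TemperatureModel:
        """Single source of truth (Kelvin); pushes updates to its views."""
        def __init__(self):
            self.kelvin = 273.15
            self.views = []

        def set_kelvin(self, kelvin):
            self.kelvin = kelvin
            for view in self.views:          # model -> views, one direction only
                view.render(kelvin)


    class CelsiusController:
        def __init__(self, model):
            self.model = model

        def user_typed(self, celsius):       # controller -> model, one direction only
            self.model.set_kelvin(celsius + 273.15)


    class FahrenheitController:
        def __init__(self, model):
            self.model = model

        def user_typed(self, fahrenheit):
            self.model.set_kelvin((fahrenheit - 32) * 5 / 9 + 273.15)


    class CelsiusView:
        def render(self, kelvin):
            print(f"{kelvin - 273.15:.1f} C")


    class FahrenheitView:
        def render(self, kelvin):
            print(f"{(kelvin - 273.15) * 9 / 5 + 32:.1f} F")


    model = TemperatureModel()
    model.views = [CelsiusView(), FahrenheitView()]
    CelsiusController(model).user_typed(100.0)   # both views update; no cycle forms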
Based on the answer from #pagoda_5b, I'd say that you are likely allowed to have cycles (7.6 should handle them, at the cost of performance), but you must guarantee that there is no infinite regress. For example, you could have the controllers also receive signals from the model, as long as you guaranteed that receipt of said signal never caused a signal to be sent back to the model.
I think the above is a good description, but it uses the word "signal" in a non-FRP style. "Signals" in the above are really messages. If the description in 7.1 is correct and complete, loops in the signal graph would always cause infinite regress, as processing the dependents of a node would cause the node itself to be processed, and vice versa, ad infinitum.
As #Matt Carkci said, there are FRP frameworks that allow loops, at least to a limited extent. They will either not be push-based, use non-strictness in interesting ways, enforce monotonicity, or introduce "artificial" delays so that when the signal graph is expanded along the temporal dimension (turning it into a value graph), the cycles disappear.

What's a unique, persistent alternative to MAC address?

I need to be able to repeatably, non-randomly, uniquely identify a server host, which may be arbitrarily virtualized and over which I have no control.
A MAC address doesn't work because in some virtualized environments, network interfaces don't have hardware addresses.
Generating a state file and saving it to disk doesn't work because the virtual machine may be cloned, thus duplicating the file.
The server's SSH host keys may be a candidate. They can be cloned like a state file, but in practice they generally aren't, because cloning them is such a security problem that it's a mistake not often made.
There's also /var/lib/dbus/machine-id, but that's dependent on dbus. (Thanks Preetam).
There's a cpuid but that's apparently deprecated. (Thanks Bruno Aguirre on Twitter).
Hostname is worth considering. Many systems like Chef already require unique hostnames. (Thanks Alfie John)
I'd like the solution to persist a long time, and certainly across server reboots and software restarts. Ultimately, I also know that users of my software will deprecate a host and want to replace it with another while keeping continuity of the data associated with it, so there are reasons a UUID might be considered mutable over the long term; but I don't particularly want a host to start considering itself unknown and re-register itself for no reason.
Are there any alternative persistent, unique identifiers for a host?
It really depends on what is meant by "persistent". For example, two VMs can't each open the same network socket to you, so even if they are bit-level clones of each other it is possible to tell them apart.
So, all that is required is sufficient information to tell the machines apart for whatever the duration of the persistence is.
If the duration of the persistence is the length of a network connection, then you don't need any identifiers at all -- the sockets themselves are unique.
If the persistence needs to be longer -- say, for the length of a boot -- then you can regenerate UUIDs whenever the system boots. (Note that a VM that is cloned would still have to reboot, unless you're hot-copying it.)
If it needs to be longer than that -- say, indefinitely -- then you can generate a UUID identifier on boot and save it to disk, but only use this as part of the identifying information of the machine. If the virtual machine is subsequently cloned, you will know this since you will have two machines reporting the same ID from different sources -- for instance, two different network sockets, different boot times, etc. Since you can tell them apart, you have enough information to differentiate the two cloned machines, which means you can take a subsequent action that forces further differentiation, like instructing each machine to regenerate its state file.
Ultimately, if a machine is perfectly cloned, then by definition you cannot tell which one was the "real one" to begin with, only that there are now two distinguishable machines.
Implying that you can tell the difference between the "real one" and the "cloned one" means that there is some state you can use to record the difference between the two, like the timestamp of when the virtual machine itself was created, in which case you can incorporate that into the state record.
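If it helps, a minimal sketch of the "UUID on disk plus runtime evidence" approach described above; the file path and registry shape are illustrative:

    import json
    import os
    import uuid

    STATE_FILE = "/var/lib/myapp/machine-id.json"  # hypothetical path


    def load_identity() -> str:
        """UUID generated on first boot and persisted; a clone duplicates it."""
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as f:
                return json.load(f)["id"]
        ident = str(uuid.uuid4())
        os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
        with open(STATE_FILE, "w") as f:
            json.dump({"id": ident}, f)
        return ident


    def check_in(ident: str, boot_time: float, peer_addr: str, registry: dict) -> str:
        """Server side: the same ID arriving from two sources exposes a clone."""
        seen = registry.setdefault(ident, (boot_time, peer_addr))
        if seen != (boot_time, peer_addr):
            return "clone suspected: instruct this machine to regenerate its state"
        return "ok"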
It looks like simple solutions have been ruled out.
So that could lead to complex solutions, like this protocol:
- Client sends tuple [ MAC addr, SSH public host key, sequence number ]
- If server receives this tuple as expected, server and client both increment sequence number.
- Otherwise server must determine what happened (was client cloned? did client move?), perhaps reaching a tentative conclusion and alerting a human to verify it.
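A minimal server-side sketch of that protocol (all names illustrative):

    # `known` maps (MAC address, SSH host key) -> the sequence number expected next.
    known = {}


    def handle_report(mac: str, ssh_host_key: str, seq: int) -> str:
        key = (mac, ssh_host_key)
        expected = known.get(key)
        if expected is None:
            known[key] = seq + 1      # first contact: start tracking this host
            return "registered"
        if seq == expected:
            known[key] = seq + 1      # both sides increment on success
            return "ok"
        # A stale number suggests a clone replayed old state, or the client was
        # restored from a snapshot or moved; escalate to a human to verify.
        return "mismatch: investigate (clone? snapshot restore? moved host?)"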
I don't think there is a straightforward "use X" solution based on the info available, but here are some general suggestions that might get you to a better spot.
If cloning from a "gold image", consider using some "first boot" logic to generate a unique ID. Config management systems like Chef, Puppet, or CFEngine provide some scaffolding to achieve this.
Consider a global state manager like ZooKeeper, specifically its atomic counter functionality. The same system could get a new ID over time, but it would be unique.
Also, this Stack Overflow question might give you some other direction. It references Twitter's approach to a similar problem.
If I understand correctly, you want a durable, globally unique identifier under these conditions:
An OS installation that can be cloned while running, so any state inside the VM won't work, and
Could be running in an arbitrary virtualization environment, so any state outside the VM won't work.
I realize this doesn't directly answer your question, but it really seems like either the design or the constraints need some substantial adjustment to accommodate a solution.