States of a 2-bit Branch Predictor - cpu-architecture

I was reading the dynamic branch prediction section in Chapter 5 of Computer Organization and Design: The Hardware/Software Interface 5th Edition by Patterson and Hennessy when I came across the following diagram for the states of the 2-bit predictor:
The 2-bit predictor should change its prediction only after it predicts wrong twice. But according to this diagram, when we start from the bottom left state, if the machine predicts "NOT TAKEN" twice when the branch should have been "TAKEN", then the top right PREDICT TAKEN state is reached. However, here the machine will change state to the bottom right PREDICT NOT TAKEN after predicting wrongly just once, when the branch should have been "NOT TAKEN".
Isn't that wrong behavior? Does this mean the state machine is wrong, or am I missing something?
From the bottom NOT TAKEN dark colored state, when the branch is TAKEN twice, you can see that the state reached is the light colored "unsure" state, whereas in my view it should have been the dark colored "sure" state, since the branch did the same action twice in a row.

From the bottom NOT TAKEN dark colored state, when the branch is TAKEN twice, you can see that the state reached is the light colored "unsure" state, whereas in my view it should have been the dark colored "sure" state, since the branch did the same action twice in a row.
The light-blue state predicts taken like you want it to after two successive taken branches. If the branch is taken from then on there will be no further mispredicts. I don't think your "should" is justified.
It's a 2-bit saturating counter; it takes 3 steps to get all the way from 00 to 11, corresponding to 3 steps along that graph.
Your idea could be implemented by using the 2 bits of state to record which way each of the last 2 branches went. But then how would you tell the difference between a loop branch that was not-taken once (falling out of the previous loop) and then taken once again (the first iteration of the next loop), versus a rarely-taken branch that was taken once? The actual scheme, as shown in the graph, mispredicts a loop branch only once per loop: on the last iteration, when it falls through. The first iteration the next time you enter the loop predicts correctly, returning the counter to strongly taken.
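The saturating-counter behavior and the once-per-loop mispredict can be sketched in a few lines of Python (a minimal model of the textbook state machine, not any particular CPU's implementation):

```python
# 2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken.
class TwoBitPredictor:
    def __init__(self, state=0):
        self.state = state  # 0 = strongly not-taken ... 3 = strongly taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch: taken 4 times, not-taken once at loop exit, then re-entered.
p = TwoBitPredictor(state=3)  # strongly taken
mispredicts = 0
for taken in [True] * 4 + [False] + [True] * 4:
    if p.predict() != taken:
        mispredicts += 1
    p.update(taken)
print(mispredicts)  # 1: only the fall-through on the last iteration
```

The single not-taken outcome only drops the counter from 3 to 2 ("weakly taken"), so the first iteration of the next loop entry still predicts correctly.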
You can find a detailed example of such a predictor in Raffzahn's answer on How does the 68060 branch predictor work? on retrocomputing.SE, including static prediction (backward taken, forward not-taken) when you get a BPB miss (no prediction entry for this branch).
A 2-bit predictor is very far from perfect; more advanced predictors also consider whether global history predicts this branch better than local. https://danluu.com/branch-prediction/

System dynamics SEIR infectious curve for 3 waves of Covid

Using system dynamics in AnyLogic, how can you model a simulation that will give an infectious curve of this nature (below picture) using SEIR?
I have tried to simulate this, but my graph just goes up and down; it does not oscillate as in the attached picture.
I need to simulate something similar to the graph for my assignment.
There should be three types of events in your model.
The first, let's call it "initial spread", is triggered at the start of your simulation.
The second, let's call it "winter season", is triggered annually in November/December.
The third, let's call it "mass vaccination" - you can decide when to trigger it and for which selection of your agents.
So the first two are global events of a sort, and the third is specific to some sub-population (this can make the third wave somewhat "toothy" if you trigger it at slightly different moments for different populations).
That's pretty much it.
Curious to see how your model will predict the fourth wave - the second winter season of your simulation. So keep us updated :)
There are a number of ways to model this. One of the simplest is to make one of your infection-rate parameters time-dependent, so that the infection rate increases or decreases with time.
See the example below.
I took the SIR model from the Cloud https://cloud.anylogic.com/model/d465d1f5-f1fc-464f-857a-d5517edc2355?mode=SETTINGS
and simply added an event that changes the Infectivity rate over time.
Changing the chart to show only infected people, the result now looked something like this.
(See the 3 waves that were created)
You will obviously want to use a parameter optimization experiment to get the parameter settings as close to reality as possible.
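The same idea can be sketched outside AnyLogic as a plain ODE loop. This is a minimal SIRS model with a seasonally forced infection rate; all rates are hypothetical, and the waning-immunity term (omega) is my addition so that susceptibles are replenished and later waves are possible:

```python
import math

def simulate(days=730, dt=1.0):
    S, I, R = 0.99, 0.01, 0.0
    beta0, gamma, omega = 0.25, 0.1, 0.01   # infection, recovery, waning
    history = []
    for step in range(int(days / dt)):
        t = step * dt
        # Seasonal forcing: infectivity peaks once a year ("winter season").
        beta = beta0 * (1.0 + 0.8 * math.sin(2 * math.pi * t / 365.0))
        dS = -beta * S * I + omega * R
        dI = beta * S * I - gamma * I
        dR = gamma * I - omega * R
        S += dS * dt; I += dI * dt; R += dR * dt
        history.append(I)
    return history

infected = simulate()   # plot this curve to see the successive waves
```

A parameter optimization experiment then amounts to fitting beta0, gamma, and the forcing amplitude to observed data.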

Why is the Branch Target Buffer designed as a cache?

The BHT is not a cache and it doesn't need to be because it is okay if a mistake is made when accessing it. The BTB, however, is designed as a cache because it always has to return either a hit or a miss. Why can't the BTB make a mistake?
The BTB can make mistakes, and mis-speculation will be detected when the jump instruction actually executes (or decodes, for direct branches), resulting in a bubble as the front-end re-steers to fetch from the right location.
Or with out-of-order speculative execution for indirect branches, potentially the core has to roll back to the last known-good state just like recovering from a wrong branch direction for a conditional branch.
A BTB can present a false hit and this is currently exploited by some implementations through the use of partial tags. (Similarly, one could have one entry per set be completely untagged; in a direct-mapped BTB, no entries would be tagged. A traditional set-associative design, having (partial) tags for each way, gives "free" miss detection as part of way selection.) As Peter Cordes' answer notes, this mistake can be detected and corrected later in the pipeline.
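A partial-tag false hit can be sketched in a few lines (the index and tag widths here are made up for illustration):

```python
# Sketch of a direct-mapped BTB with partial tags.
# Index = low bits of the PC; the tag keeps only a few of the remaining
# bits, so two different branch addresses can alias: a "false hit".
INDEX_BITS = 4   # 16 entries
TAG_BITS = 4     # partial tag: only 4 bits of the upper address

class PartialTagBTB:
    def __init__(self):
        self.entries = [None] * (1 << INDEX_BITS)  # (tag, target) or None

    def _split(self, pc):
        index = pc & ((1 << INDEX_BITS) - 1)
        tag = (pc >> INDEX_BITS) & ((1 << TAG_BITS) - 1)
        return index, tag

    def insert(self, pc, target):
        index, tag = self._split(pc)
        self.entries[index] = (tag, target)

    def lookup(self, pc):
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]  # "hit" -- possibly for a different branch
        return None          # miss

btb = PartialTagBTB()
btb.insert(0x12, 0x400)
# A PC that matches in the low INDEX_BITS + TAG_BITS bits aliases:
alias = 0x12 + (1 << (INDEX_BITS + TAG_BITS))  # 0x112
hit = btb.lookup(alias)  # false hit: returns 0x400 for the wrong branch
```

As noted above, the pipeline detects such mistakes when the branch resolves; the partial tag just trades a few false hits for smaller storage.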
Recognizing a BTB miss does allow the throttling of speculation. If the BTB is used to prefetch the instruction stream past an instruction cache miss, avoiding cache-polluting and bandwidth-wasting misspeculation can have a performance impact. When performance is limited by power or thermal considerations, avoiding misspeculation even when it would be detected and corrected quickly can save some power/heat generation and so potentially improve performance.
With a two-level BTB, a hit indication could allow the L2 BTB not to be accessed for that branch. Aside from energy efficiency, the L2 BTB may have been designed to provide lower bandwidth or to be shared with another closely coupled fetch engine (so bandwidth unused by one fetch engine could be used by another).
In addition, an indication of a BTB miss can be used to improve branch direction prediction. A miss indicates that the branch was likely not taken in recent history (whether not recently executed or not taken during recent execution); the branch direction predictor may choose to override a taken prediction (with the target calculated at decode) or may choose to treat the prediction as low confidence (e.g., using dynamic predication or giving priority to fetch from other threads). The former effectively filters out never-taken branches from the predictor (which is then allowed to have a destructive alias that predicts taken); both uses of a miss indication exploit the fact that stale branch information is less likely to be accurate.
A BTB can also provide a simple method of branch identification. A BTB miss predicts that the fetch does not contain a potentially taken branch (filtering out non-branches and never taken branches). This avoids the branch direction predictor having to predict not taken for non-branch instructions (or redirecting fetch after instruction decode on a BTB false hit when the branch direction predictor predicts taken). This adds non-branches to the filtering to avoid destructive aliasing. (A separate branch identifier could be used to filter non-branch instructions and to distinguish non-conditional, indirect, and return instructions, which might use different target predictors and might not need direction prediction.)
If the BTB provides a per address direction prediction or other information used for direction prediction, a miss indication could allow the direction predictor to use other methods to provide such information (e.g., static branch prediction). A static prediction may be not particularly accurate but it is likely to be more accurate than a "random" prediction with a taken bias (since never taken branches might never enter the BTB and replacement might be based on least recently taken); a "static" predictor could also exploit the fact that there was a BTB miss. If an agree predictor is used (where a static bias is xored with the prediction to reduce destructive aliasing, biased taken branches that are taken have the same predictor updating as biased not-taken branches that are not taken), a per-address bias is needed.
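The agree mechanism can be sketched as follows (a hypothetical single-bit version; real designs store saturating counters in the shared table):

```python
# "Agree" predictor sketch: the shared table stores whether the branch
# agrees with its per-address static bias, rather than storing the
# taken/not-taken direction itself.
def predict(bias_taken, agree_bit):
    # If the table says "agree", follow the bias; otherwise invert it.
    return bias_taken if agree_bit else not bias_taken

def update(bias_taken, outcome_taken):
    # Train the table entry toward "agree" when the outcome matches the bias.
    return outcome_taken == bias_taken

# Two branches with opposite biases that both follow their bias map to the
# same "agree" value, so aliasing between them is no longer destructive:
assert update(True, True) == update(False, False) == True
```

This is why the per-address bias matters: without it, two aliasing branches with opposite behavior would fight over the shared entry.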
An L1 BTB might also be integrated with the L1 instruction cache, particularly for branch address relative targets, such that not only is miss detection free (the tags for all ways are present) but the BTB provided target may not even be a prediction (avoiding the need to recalculate the target). This would require additional prediction resources for indirect branches (and an L2 BTB might be used to support prefetching under instruction cache misses) but can avoid significant redundant storage (as such branch instructions already store the offset).
Even though BTB miss determination is not necessary, it can be useful.

GitHub - Green lines next to text?

This is my first GitHub project. I want to write a report, but this green line keeps popping up next to my text and I don't want it to. This is what my text should be, but it appears like this: https://gyazo.com/f593db78a387623fcad31c744ac9a120
How can I control this?
System Hardware Final Report
The project designed shows the functioning behind computers.
The topics covered will include:
Introduction to Binary & Logic Gates
Commonly Used Parts
Logic Functions
The Timing Signal Generator
The Bus, Arithmetic Unit and Program Counter
Data Registers and the Memory Address Register
Program Memory
The Control Signal Generator
The outcome of this project will enable one to build a computer with a CPU (central processing unit) that can execute simple
instructions such as moving data from one register to another and performing simple arithmetic.
Introduction to Binary & Logic Gates
Breadboard
The breadboard will be used to set up the components and all circuits. It is divided into 8 parallel columns and contains 5 columns that
provide power. In fact, the breadboard is made up of two important supply connections: the VCC for the red
line and the GND for the blue line. The holes are all connected row-by-row in lines of 5.
Switches
This happens when you make changes to a file and then preview them (the comparison is called a diff). Green indicates something was added, while red indicates something was removed. The green lines will not stay once you commit the changes.

AlphaGo Zero board evaluation function uses multiple time steps as an input... Why?

According to AlphaGo Cheat Sheet, AlphaGo Zero uses a sequence of consecutive board configurations to encode its game state.
In theory, all the necessary information is contained in the latest state, and yet they include the previous 7 configurations.
Why did they choose to inject so much complexity?
What are they looking for?
AlphaGoZero
The sole reason is that in all of these games - Go, Chess, and Shogi - there is a repetition rule. This means the game is not fully observable from the current board position: there may be two identical positions with two very different evaluations. For example, in one Go position there may be a winning move, while in an identical Go position that move is either illegal, or one of the next few moves in the would-be-winning continuation creates an illegal position.
You could try feeding in only the current board position and handling repetitions in the tree only. But I think this would be weaker because the evaluation function would be wrong in some cases, leading to a horizon effect if that branch of the tree had not been explored deeply enough to correct the problem.
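The history dependence can be illustrated with a toy superko-style rule on a made-up 1-D "board" of toggling points (purely illustrative, nothing like real Go rules): a move that would recreate an earlier position is illegal, so two identical boards with different histories have different legal moves.

```python
# A board is a tuple of 0/1 values; a move toggles one point.
def apply_move(board, point):
    b = list(board)
    b[point] ^= 1
    return tuple(b)

# Superko-style rule: moves recreating a past position are illegal.
def legal_moves(board, history, points=range(3)):
    return [p for p in points if apply_move(board, p) not in history]

start = (0, 0, 0)
b1 = apply_move(start, 0)              # board (1, 0, 0)
print(legal_moves(b1, {start}))        # [1, 2]: undoing move 0 is illegal
print(legal_moves(b1, set()))          # [0, 1, 2]: same board, no history
```

An evaluation function that sees only `b1` cannot distinguish the two cases; feeding in recent history lets the network observe them as different states.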

Understanding multi-layer LSTM

I'm trying to understand and implement multi-layer LSTMs. The problem is I don't know how the layers connect. I have two thoughts in mind:
At each timestep, the hidden state H of the first LSTM becomes the input of the second LSTM.
At each timestep, the hidden state H of the first LSTM becomes the initial value for the hidden state of the second LSTM, and the input of the first LSTM becomes the input for the second LSTM.
Please help!
TL;DR: Each LSTM cell at time t and level l has an input x(t) and a hidden state h(l,t).
In the first layer, the inputs are the actual sequence input x(t) and the previous hidden state h(l,t-1); in each later layer, the input is the hidden state of the corresponding cell in the previous layer, h(l-1,t).
From https://arxiv.org/pdf/1710.02254.pdf:
To increase the capacity of GRU networks (Hermans and
Schrauwen 2013), recurrent layers can be stacked on top of
each other.
Since GRU does not have two output states, the same output hidden state h'2 is passed to the next vertical layer. In other words, the h1 of the next layer will be equal to h'2.
This forces GRU to learn transformations that are useful along depth as well as time.
I'll take help from colah's blog post, cutting it short to just the relevant part.
As the post's diagrams show, LSTMs have this chain-like structure, and each repeating module has four neural network layers.
The values that we pass to the next timestep (the cell state) and to the next layer (the hidden state) are basically the same, and they are the desired output. This output is based on our cell state, but is a filtered version of it. First, we run a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to pass.
We also pass the previous cell state (the top arrow into the next cell) to the next timestep, deciding with a sigmoid layer (the forget gate) how much of that information to keep, based on the new input and the input from the previous state.
Hope this helps.
In PyTorch, the multilayer LSTM implementation shows that the hidden state of the previous layer becomes the input to the next layer. So your first assumption is correct.
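The wiring of that first option can be sketched with a toy stand-in cell (a single tanh in place of the four-gate LSTM cell, with arbitrary scalar weights; only the layer-to-layer connections are the point here):

```python
import math

# Toy recurrent cell: new hidden state from current input and previous
# hidden state. A real LSTM cell would also carry a cell state.
def cell(x, h_prev, w_x=0.5, w_h=0.3):
    return math.tanh(w_x * x + w_h * h_prev)

def stacked_rnn(inputs, num_layers=2):
    h = [0.0] * num_layers           # one hidden state per layer
    outputs = []
    for x in inputs:
        layer_input = x
        for l in range(num_layers):
            # Layer l takes the output of layer l-1 at the SAME timestep
            # as its input (the questioner's first option).
            h[l] = cell(layer_input, h[l])
            layer_input = h[l]
        outputs.append(layer_input)  # top layer's hidden state
    return outputs

outs = stacked_rnn([1.0, 0.5, -0.2])
```

Each layer still keeps its own hidden state across timesteps; only the input to layers above the first changes, from x(t) to the lower layer's h(t).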
There's no definite answer. It depends on your problem and you should try different things.
The simplest thing you can do is to pipe the output from the first LSTM (not the hidden state) as the input to the second layer of LSTM (instead of applying some loss to it). That should work in most cases.
You can try to pipe the hidden state as well but I didn't see it very often.
You can also try other combinations. Say, for the second layer you input the output of the first layer together with the original input. Or you feed it the output of the first layer from both the current and the previous timestep.
It all depends on your problem and you need to experiment to see what works for you.